Assimilation Evaluation Toolkit

From Apertium
Revision as of 17:36, 2 July 2014 by Sereni

Project description

This page describes a work in progress[1]. The Assimilation evaluation toolkit is a set of programs that generates tasks for human evaluation of machine translation. Each task consists of sentences in the original language, a reference translation with keywords omitted, and the machine translation of those sentences. Each task also contains an answer key, not shown to the evaluator, used to check answer correctness. Tasks may be generated as standalone text files with automated checking, or as XML files for integration into the Appraise evaluation system.

Keyword extraction

Keywords are extracted from the text with the method described in [2]. This method favors longer keywords, which is unsuitable for text gapping, so keywords containing more than two words are filtered out. The algorithm requires a list of stopwords. To build it, we run the Apertium POS tagger and select as stopwords the words carrying the following tags (list to be refined): 'pr', 'vbser', 'def', 'ind', 'cnjcoo', 'det', 'rel', 'vaux', 'vbhaver', 'prn', 'itg'.
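The stopword collection and keyword filtering described above can be sketched as follows. This is a minimal illustration, not the toolkit's actual code: it assumes tagger output in the Apertium stream format (^surface/lemma&lt;tag&gt;...$), and the function names are hypothetical.

```python
import re

# Tags whose words are treated as stopwords (the list given above).
STOPWORD_TAGS = {'pr', 'vbser', 'def', 'ind', 'cnjcoo', 'det',
                 'rel', 'vaux', 'vbhaver', 'prn', 'itg'}

def stopwords_from_tagged(tagged_text):
    """Collect stopwords from POS-tagged text in the ^surface/lemma<tags>$ format."""
    stops = set()
    for match in re.finditer(r'\^([^/]+)/([^<$]+)((?:<[^>]+>)*)\$', tagged_text):
        surface, lemma, tags = match.groups()
        tagset = set(re.findall(r'<([^>]+)>', tags))
        if tagset & STOPWORD_TAGS:          # any stopword tag present
            stops.add(surface.lower())
    return stops

def filter_keywords(keywords, max_words=2):
    """Drop keyword phrases longer than two words: long phrases gap poorly."""
    return [kw for kw in keywords if len(kw.split()) <= max_words]
```

A candidate keyword like "special markup language" would thus be discarded, while "large collections" is kept.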

The toolkit also features a non-keyword gap generation mode, in which words are omitted at random regardless of their significance to the text.

Task generation

For task generation, four input files are needed: the original text, the machine translation, the reference translation, and its POS-tagged version. After keywords have been extracted, they are removed from the reference translation. Gap density can be varied; cf. output with gap densities of 30% and 70%:

Corpora in { gap } are large collections of texts enhanced with special markup. They allow linguists to search the texts by various { gap } in order to discover phenomena and patterns in the natural language.

Corpora in { gap } are large collections of { gap } with { gap }. They allow linguists to { gap } the { gap } by various parameters in order to { gap } and { gap } in the natural language.

Gap density can be specified in relation to the number of keywords, or to the total number of words in the text. In addition, the user may adjust gap contents by specifying parts of speech to be removed.
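The keyword-relative density mode above can be sketched as follows. This is a simplified illustration under assumed interfaces (the toolkit's real code may differ): density is taken as a fraction of the extracted keywords, and the removed words are recorded as the answer key.

```python
import random

def gap_text(text, keywords, density=0.3, seed=0):
    """Replace a share (density) of the extracted keywords with '{ gap }'.

    Density is relative to the number of keywords, one of the two modes
    described above; the answer key records the removed words in text order.
    """
    rng = random.Random(seed)
    n_gaps = max(1, int(len(keywords) * density))
    chosen = {k.lower() for k in rng.sample(keywords, n_gaps)}
    words = text.split()
    key = []
    for i, w in enumerate(words):
        if w.strip('.,;:!?').lower() in chosen:
            key.append(words[i])
            words[i] = '{ gap }'
    return ' '.join(words), key
```

Density relative to the total word count would differ only in how n_gaps is computed; part-of-speech filtering would add a tag check before a word is eligible for gapping.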

As an option, users may choose to display the lemmas of the omitted words in the gaps. In this case, evaluators are required to fill in the correct grammatical forms of the given words. This may help assess how well the MT system handles grammar in translation.
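The lemma display mode amounts to showing the lemma inside the gap while keeping the inflected surface form as the key. A minimal sketch, with hypothetical names and a parallel word/lemma representation assumed from the tagged reference translation:

```python
def gap_with_lemma(words, lemmas, positions):
    """Replace words at the given positions with '{ lemma }', so that
    evaluators must restore the correct grammatical form.

    words and lemmas are parallel lists; the key maps each gapped
    position to the original surface form.
    """
    out = list(words)
    key = {}
    for i in positions:
        key[i] = words[i]
        out[i] = '{ %s }' % lemmas[i]
    return ' '.join(out), key
```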

Multiple choice gaps

An additional task generation mode offers multiple-choice options for gaps. Each omitted word is assigned a list of similar words for the user to choose from during evaluation. The choices are picked from the same text by part of speech and grammatical tags: a choice must have the same part of speech as the original word, and it should share as many grammatical features as possible. The approach is described in [3] and [4]. An example of keyword choices generated for the two sentences above (the number of choices can be specified):

special / large / natural, linguists / parameters / patterns, search / make / discover.
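The distractor selection above (same part of speech, maximal overlap of grammatical tags) can be sketched like this. The tuple representation of tagged words and the function name are assumptions for illustration, not the toolkit's actual API:

```python
def pick_choices(target, candidates, n=3):
    """Pick distractors for a gapped word: candidates with the same part
    of speech as the target, ranked by the number of shared grammar tags.

    target and candidates are (word, pos, tagset) tuples drawn from the
    same text. Returns n choices including the correct word.
    """
    word, pos, tags = target
    same_pos = [c for c in candidates if c[1] == pos and c[0] != word]
    ranked = sorted(same_pos, key=lambda c: -len(tags & c[2]))
    return [c[0] for c in ranked[:n - 1]] + [word]
```

A real task generator would also shuffle the resulting list so the correct answer does not always appear in the same position.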

Progress

This section lists project progress according to the GSoC proposal.

Week 1: Created an algorithm for keyword extraction in simple gaps. Made a program that creates a gapped reference translation with keys, given the reference translation text and its tagged version.

Week 2: Multiple-choice gaps: created an algorithm for finding similar words for multiple choice based on POS and grammar tags. Updated code to create tasks with multiple-choice gaps. Added variable gap density. Added random word deletion (without keyword determination). Fixed bugs.

Week 3: Added task generation with lemmas in place of gaps. Added an option to select parts of speech to be removed. Adjusted the keyword removal algorithm to calculate scores based on lemmas (thus accounting for different word forms).

Week 4: Added modules to generate text-based task sets with the original text, a reference translation with all supported gap types, and an optional machine translation. Added a module that calculates the number of correct answers from tasks filled out by evaluators.

Week 5: Created a command-line interface for task generation and answer checking.

Weeks 6-7: Integrated gisting evaluation tasks into Appraise. Created a command-line interface for XML generation.

References and links

  1. Current code on GitHub
  2. Rose, Stuart, et al. "Automatic keyword extraction from individual documents." Text Mining (2010): 1-20.
  3. Trond Trosterud and Kevin Brubeck Unhammer. "Evaluating North Sámi to Norwegian assimilation RBMT." Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation (FreeRBMT 2012), June 2012.
  4. Jim O'Regan and Mikel L. Forcada (2013). "Peeking through the language barrier: the development of a free/open-source gisting system for Basque to English based on apertium.org." Procesamiento del Lenguaje Natural 51, 15-22.