== Project description ==

This page describes a work in progress<ref>Current code [https://github.com/Sereni/Appraise/tree/master/eval on github]</ref>. The [[Assimilation and Dissemination|Assimilation]] evaluation toolkit is a set of programs that generates tasks for human evaluation of machine translation. Each task consists of sentences in the original language, a reference translation with keywords omitted, and the machine translation of those sentences. The tasks also come with an answer key, not shown to the evaluator, which is used to determine answer correctness.

== How to use ==

To use the toolkit, download the files from [https://github.com/Sereni/Appraise/tree/master/eval github]. The Python files are required; you may either use the default text files for testing or provide your own data. The toolkit has one dependency, click, which can be installed with pip:

 $ pip install click

To generate tasks, run

 $ python gist_eval.py [OPTIONS] ORIGINAL REFERENCE TAGS [TASK] [KEYS]

ORIGINAL is the path to the untranslated text; REFERENCE is the path to the reference translation, from which the gapped text will be created; TAGS is the reference translation run through the Apertium tagger. TASK and KEYS are optional arguments that specify where to save the output files: the task itself and the answer keys, respectively. If they are left out, the defaults are task.txt and keys.txt in the script folder.

The following options are supported (an example invocation is given after the list):

* -mt, --machine FILENAME – the original text translated by Apertium, if the tasks should include the machine translation as a hint for the evaluator;
* -m, --mode [simple | choices | lemmas] – specifies the task mode: simple just removes the words, choices gives a choice of three options for each gap, and lemmas leaves the word's lemma in place, so that the user has to fill in the correct grammatical form. The default mode is 'simple';
* -k, --keyword – if on, the words to be removed are determined by the keyword selection algorithm; if off, they are selected at random;
* -d, --density – an integer from 1 to 100 specifying the gap density in percent. The default is 50;
* -r, --relative – if on, the gap density is calculated against the number of words selected for removal; if off, against the total number of words in the text (but no more than the number of keywords found, if keyword mode is on);
* -p, --pos – to remove only specific parts of speech, specify them here as a string of [http://wiki.apertium.org/wiki/List_of_symbols Apertium part-of-speech tags] separated by commas, e.g. 'vblex, n, adj'.
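
For example, a hypothetical invocation (the file names here are placeholders) that generates a multiple-choice task with keyword selection and 30% gap density:

 $ python gist_eval.py -k -m choices -d 30 original.txt reference.txt tags.txt task.txt keys.txt
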
To check the task completed by the user, run

 $ python text_checker.py TASK KEYS

TASK is the path to the filled-in task, and KEYS is the path to the answer keys that were generated with it. Note that the script assumes the structure of the filled-in task is unchanged, i.e. the evaluator only filled in or changed the words in the gaps. The script returns the number and percentage of correct answers based on the answer key.
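
The scoring step itself is simple; here is a minimal sketch (not the actual text_checker.py code, and it assumes the evaluator's answers and the keys have already been extracted as parallel lists of words):

<pre>
def score(answers, keys):
    """Return the number and percentage of correct answers."""
    correct = sum(a.strip().lower() == k.strip().lower()
                  for a, k in zip(answers, keys))
    pct = 100.0 * correct / len(keys) if keys else 0.0
    return correct, pct

# e.g. score(['corpora', 'texts'], ['corpora', 'parameters']) -> (1, 50.0)
</pre>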
   
 
== Keyword extraction ==

Keywords are extracted from the text with the method described in [2]. This method favors longer keywords, which is not suitable for text gapping, so keywords containing more than two words are filtered out. The algorithm requires a list of stopwords. For this, we run the text through the Apertium POS tagger and treat as stopwords the words carrying any of the following tags (list to be refined): 'pr', 'vbser', 'def', 'ind', 'cnjcoo', 'det', 'rel', 'vaux', 'vbhaver', 'prn', 'itg'.
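
A minimal sketch of this stopword selection, assuming the usual Apertium stream format ^surface/lemma<tag1><tag2>...$ for the tagger output (the toolkit's own parsing may differ):

<pre>
import re

STOP_TAGS = {'pr', 'vbser', 'def', 'ind', 'cnjcoo', 'det', 'rel',
             'vaux', 'vbhaver', 'prn', 'itg'}

# One tagged token looks like ^surface/lemma<tag1><tag2>...$
TOKEN = re.compile(r'\^([^/$^]+)/([^<$]+)((?:<[^>]+>)*)\$')

def stopwords(tagged_text):
    """Collect surface forms whose analysis carries any of the stop tags."""
    words = set()
    for surface, lemma, tags in TOKEN.findall(tagged_text):
        if STOP_TAGS & set(re.findall(r'<([^>]+)>', tags)):
            words.add(surface.lower())
    return words
</pre>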

The toolkit also features a non-keyword gap generation mode, in which words are omitted at random regardless of their significance for the text.

== Task generation ==

For task generation, four input files are needed: the original text, its machine translation, the reference translation, and the POS-tagged version of the reference. After the keywords have been extracted, they are removed from the reference translation. Gap density can be varied; cf. the output with gap densities of 30% and 70%:

Corpora in { gap } are large collections of texts enhanced with special markup. They allow linguists to search the texts by various { gap } in order to discover phenomena and patterns in the natural language.

Corpora in { gap } are large collections of { gap } with { gap }. They allow linguists to { gap } the { gap } by various parameters in order to { gap } and { gap } in the natural language.

Gap density can be specified in relation to the number of keywords, or to the total number of words in the text. In addition, the user may adjust gap contents by specifying parts of speech to be removed.
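
As a sketch of how the density setting could translate into a gap count under the two modes (the actual computation in gist_eval.py may differ):

<pre>
def gap_count(total_words, n_keywords, density, relative):
    """How many words to remove for a gap density of 1-100 percent.

    relative=True: density is taken as a share of the candidate words;
    relative=False: as a share of all words in the text, capped at the
    number of candidates available.
    """
    base = n_keywords if relative else total_words
    return min(base * density // 100, n_keywords)

# e.g. gap_count(200, 40, 50, relative=True)  -> 20 gaps
#      gap_count(200, 40, 50, relative=False) -> 40 (100 capped at 40)
</pre>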

As an option, users may choose to see the lemmas of the omitted words in the gaps. In this case, evaluators are required to fill in the correct grammatical forms of the given words. This may help to assess how well the MT system handles the translation of grammar.

== Multiple choice gaps ==

An additional task generation mode offers multiple choice options for the gaps. Each omitted word is assigned a list of similar words for the user to choose from during evaluation. The choices are picked from the same text by part-of-speech and grammar tags: a choice must be the same part of speech as the original word, and the two should share as many grammatical features as possible. The approach is described in [3] and [4]. An example of keyword choices generated on the two sentences above (the number of choices can be specified):

special / large / natural, linguists / parameters / patterns, search / make / discover.
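
A minimal sketch of the tag-overlap idea behind choice selection, using a made-up (word, pos, tags) representation (the toolkit's own data structures may differ):

<pre>
def pick_choices(target, candidates, n=2):
    """Pick n same-POS words sharing the most grammatical tags with target.

    target and candidates are (word, pos, tags) tuples, where tags is a
    set of grammatical features, e.g. ('texts', 'n', {'pl'}).
    """
    word, pos, tags = target
    pool = [(w, t) for w, p, t in candidates if p == pos and w != word]
    pool.sort(key=lambda wt: len(tags & wt[1]), reverse=True)
    return [w for w, _ in pool[:n]]

# e.g. pick_choices(('linguists', 'n', {'pl'}),
#                   [('parameters', 'n', {'pl'}), ('patterns', 'n', {'pl'}),
#                    ('language', 'n', {'sg'})])
# -> ['parameters', 'patterns']
</pre>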

== Progress ==

This section lists project progress according to the GSoC proposal.

Week 1: Created an algorithm for keyword extraction for simple gaps. Made a program that, given a reference translation and its tagged version, creates a gapped reference translation with answer keys.

Week 2: Multiple choice gaps: created an algorithm for finding similar words for multiple choice based on POS and grammar tags. Updated the code to create tasks with multiple choice gaps. Added variable gap density. Added random word deletion (without keyword determination). Fixed bugs.

Week 3: Added task generation with lemmas in place of gaps. Added an option to select the parts of speech to be removed. Adjusted the keyword removal algorithm to calculate scores based on lemmas (thus accounting for different word forms).

Week 4: Added modules to generate text-based task sets containing the original text, the reference translation with all supported gap types, and optionally the machine translation. Added a module that calculates the number of correct answers from tasks filled out by evaluators.

Week 5: Created the command-line interface for task generation and answer checking.

== References and links ==

# Current code [https://github.com/Sereni/Appraise/tree/master/eval on github]
# Rose, Stuart, et al. "Automatic keyword extraction from individual documents." Text Mining (2010): 1–20.
# Trond Trosterud and Kevin Brubeck Unhammer. "Evaluating North Sámi to Norwegian assimilation RBMT." Proceedings of the Third International Workshop on Free/Open-Source Rule-Based Machine Translation (FreeRBMT 2012), June 2012.
# Jim O'Regan and Mikel L. Forcada. "Peeking through the language barrier: the development of a free/open-source gisting system for Basque to English based on apertium.org." Procesamiento del Lenguaje Natural 51 (2013): 15–22.