
Meta-evaluation

Latest revision as of 04:32, 1 June 2019

Apertium language modules and translation pairs are subject to the following types of evaluation:

  • Morphology coverage / regression testing
  • Size of system
    • Number of stems in lexc, monodix, bidix
    • Number of disambiguation rules
    • Number of lexical selection rules
    • Number of transfer rules
  • Naïve coverage
    • Monolingual naïve coverage
    • Trimmed naïve coverage (i.e., using a trimmed dictionary)
  • Accuracy of analyser
    • Precision/Recall/F-score
  • Accuracy of translation
    • Overall accuracy (over parallel corpora): WER/PER/BLEU
    • Regression tests (pairs of phrases or sentences)
  • Cleanliness of translation output
    • Testvoc
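
As an illustration of the "Accuracy of analyser" item above, precision, recall and F-score can be computed by comparing the analyses an analyser returns for each form against a hand-checked gold standard. The following is only a minimal sketch: the function name, the dict-of-sets input layout and the example tags are assumptions made for the example, not the format of any existing Apertium tool.

```python
def precision_recall_f(gold, predicted):
    """Compare analyser output against gold analyses.

    Both arguments map a surface form to a set of analysis strings
    (a hypothetical layout chosen for this sketch).
    """
    true_pos = false_pos = false_neg = 0
    for form, gold_analyses in gold.items():
        predicted_analyses = predicted.get(form, set())
        true_pos += len(gold_analyses & predicted_analyses)   # correct analyses
        false_pos += len(predicted_analyses - gold_analyses)  # spurious analyses
        false_neg += len(gold_analyses - predicted_analyses)  # missing analyses
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score
```

Forms that appear only in the predicted output are ignored here; whether they should count as false positives depends on how the gold standard was built.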

Morphology coverage

The tools we have for this are aq-morftest from Apertium quality and morph-test.py.

Two complaints: they don't support directionality restrictions on tests, and they don't return error codes.

Naïve coverage

In theory, aq-covtest does this, but mostly people write their own scripts.

A generalised script that supports both hfst and lttoolbox binaries and arbitrary corpora would be useful. It should also (optionally) output hitparades (e.g., frequency lists of unknown forms in the corpus).
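
A rough sketch of the counting part of such a script is given below. It assumes the corpus has already been run through an analyser and that the output is in Apertium stream format, where each token looks like ^surface/analysis$ and an unknown form is echoed back with a leading asterisk (^foo/*foo$). The function name is made up for the example, and escaped characters inside tokens are not handled.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped ^, / or $ inside the surface form)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Return (coverage, hitparade) for analyser output in stream format.

    coverage is the fraction of tokens with at least one analysis;
    hitparade is a list of (unknown form, count), most frequent first.
    """
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # Unknown forms come back as ^form/*form$
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    coverage = known / total if total else 0.0
    return coverage, unknown.most_common()
```

Because it only looks at the analysed stream, the same function would work whether the analysis came from an hfst or an lttoolbox binary, as long as the output format is the same.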

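Translation accuracy

  • WER/PER: apertium-eval-translator.pl and apertium-eval-translator-line.pl work well but are a bit old, and could probably benefit from being rewritten in Python
  • BLEU: nothing existing
  • Regression testing: we have some once-off scripts for this?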
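
For reference, WER and PER (named on this page as overall accuracy measures over parallel corpora) can be sketched in a few lines of Python. This is not a port of the existing Perl scripts: WER here is word-level edit distance divided by reference length, and PER uses one common position-independent formulation, (max(len(ref), len(hyp)) - matching words) / len(ref). The exact definitions used by the Perl scripts may differ. References are assumed to be non-empty.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)  # assumes a non-empty reference


def per(reference, hypothesis):
    """Position-independent error rate: like WER but ignoring word order."""
    ref, hyp = reference.split(), hypothesis.split()
    # Multiset intersection counts words shared regardless of position
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

A reordered but otherwise correct translation gets a non-zero WER and a PER of zero, which is why the two are usually reported together.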

Translation cleanliness

There are several ways to test translation cleanliness that are good for different purposes:

  • morphology expansion testvoc ("standard testvoc")
  • prefixed morphology expansion testvoc ("testvoc lite")
  • corpus testvoc
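
All three variants come down to the same check on the translator's output: looking for words that still carry one of the marks Apertium conventionally attaches to problem tokens (* for an unknown word, @ for a lemma missing from the bilingual dictionary, # for a generation failure). A minimal sketch of that check, with a made-up function name and the simplifying assumption that a mark only counts at the start of a whitespace-delimited token:

```python
import re
from collections import Counter

# Conventional Apertium marks on problematic output words
MARKS = {
    "*": "unknown word",
    "@": "no bilingual dictionary entry",
    "#": "generation failure",
}

# A mark at the start of a whitespace-delimited token
MARKED_TOKEN = re.compile(r"(?<!\S)([*@#])(\S+)")


def find_unclean(translated_lines):
    """Count (mark, token) pairs found in translated output lines."""
    problems = Counter()
    for line in translated_lines:
        for mark, token in MARKED_TOKEN.findall(line):
            problems[(mark, token)] += 1
    return problems
```

A wrapper script could print the counts grouped by MARKS and exit with a non-zero status whenever the result is non-empty, so that the check can be used from a build or test run.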