Difference between revisions of "Meta-evaluation"

From Apertium
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
  +
<div style="float: right;">__TOC__</div>
 
Apertium language modules and translation pairs are subject to the following types of evaluation:
 
Apertium language modules and translation pairs are subject to the following types of evaluation:
   
Line 10: Line 11:
 
** Monolingual naïve coverage
 
** Monolingual naïve coverage
 
** Trimmed naïve coverage (i.e., using a trimmed dictionary)
 
** Trimmed naïve coverage (i.e., using a trimmed dictionary)
* Accuracy of analyses
+
* Accuracy of analyser
 
** Precision/Recall/F-score
 
** Precision/Recall/F-score
 
* Accuracy of translation
 
* Accuracy of translation
** WER/PER/BLEU
+
** Overall accuracy (over parallel corpora): WER/PER/BLEU
  +
** Regression tests (pairs of phrases or sentences)
* Clenliness of translation output
+
* Cleanliness of translation output
 
** Testvoc
 
** Testvoc
   
 
== Morphology coverage ==
 
== Morphology coverage ==
 
The tools we have for this are <code>aq-morftest</code> from [[Apertium quality]] and [[morph-test.py]].
 
The tools we have for this are <code>aq-morftest</code> from [[Apertium quality]] and [[morph-test.py]].
  +
  +
Two complaints: they don't support directionality restrictions on tests, and they don't return error codes.
   
 
== Naïve coverage ==
 
== Naïve coverage ==
Line 26: Line 30:
   
 
== Translation accuracy ==
 
== Translation accuracy ==
[[apertium-eval-translator.pl]] and [[apertium-eval-translator-line.pl]] work well but are a bit old, and could probably benefit from being rewritten in python
+
* WER/PER: [[apertium-eval-translator.pl]] and [[apertium-eval-translator-line.pl]] work well but are a bit old, and could probably benefit from being rewritten in python
  +
* BLEU: nothing existing
 
  +
* Regression testing: we have some once-off scripts for this?
   
 
== Translation cleanliness ==
 
== Translation cleanliness ==
   
There are several ways to test translation cleanliness. From simplest to most involved:
+
There are several ways to test translation cleanliness that are good for different purposes:
 
* morphology expansion testvoc ("standard testvoc")
 
* morphology expansion testvoc ("standard testvoc")
 
* prefixed morphology expansion testvoc ("testvoc lite")
 
* prefixed morphology expansion testvoc ("testvoc lite")
 
* corpus testvoc
 
* corpus testvoc
  +
  +
[[Category:Evaluation]]
  +
[[Category:Quality control]]

Latest revision as of 02:32, 1 June 2019

Apertium language modules and translation pairs are subject to the following types of evaluation:

  • Morphology coverage / regression testing
  • Size of system
    • Number of stems in lexc, monodix, bidix
    • Number of disambiguation rules
    • Number of lexical selection rules
    • Number of transfer rules
  • Naïve coverage
    • Monolingual naïve coverage
    • Trimmed naïve coverage (i.e., using a trimmed dictionary)
  • Accuracy of analyser
    • Precision/Recall/F-score
  • Accuracy of translation
    • Overall accuracy (over parallel corpora): WER/PER/BLEU
    • Regression tests (pairs of phrases or sentences)
  • Cleanliness of translation output
    • Testvoc

Morphology coverage

The tools we have for this are aq-morftest from Apertium quality and morph-test.py.

Two complaints: they don't support directionality restrictions on tests, and they don't return error codes.

Naïve coverage

In theory, aq-covtest does this, but mostly people write their own scripts.

A generalised script that supports hfst and lttoolbox binaries and arbitrary corpora would be useful. It should also (optionally) output hitparades (e.g., frequency lists of unknown forms in the corpus).
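As a rough illustration of what such a script would compute, here is a minimal sketch. It assumes the corpus has already been run through an analyser and that the output is in Apertium stream format (for example from lt-proc), where an unknown form is reported as ^form/*form$. The function name naive_coverage is made up for this example, and escaped characters in the stream are ignored for simplicity.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \/ are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Return (coverage, hitparade) for analyser output in stream format.

    coverage is the share of tokens that received at least one analysis;
    hitparade is a list of (unknown form, frequency), most frequent first.
    """
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # Unknown forms carry a single pseudo-analysis that starts with '*'.
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    coverage = known / total if total else 0.0
    return coverage, unknown.most_common()


if __name__ == "__main__":
    sample = "^the/the<det><def><sp>$ ^blorf/*blorf$ ^cat/cat<n><sg>$ ^blorf/*blorf$"
    coverage, hitparade = naive_coverage(sample)
    print(f"coverage: {coverage:.2%}")
    for form, count in hitparade:
        print(count, form)
```

Working on analyser output rather than calling the binaries directly keeps the sketch independent of whether the analyser is an hfst or an lttoolbox one.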

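Translation accuracy

  • WER/PER: apertium-eval-translator.pl and apertium-eval-translator-line.pl work well but are a bit old, and could probably benefit from being rewritten in Python
  • BLEU: nothing existing
  • Regression testing: we have some once-off scripts for this?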
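Of the translation accuracy metrics this page names (WER/PER/BLEU), word error rate is the simplest to sketch. The example below uses the standard definition: word-level edit distance between a reference translation and the system output, divided by the number of reference words. It is only an illustration of the metric, not a description of how the existing Perl scripts compute it, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing in output
            insertion = current[j - 1] + 1   # extra word in output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Tokenisation here is plain whitespace splitting; a real evaluation would need to decide how to treat punctuation and case.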

Translation cleanliness

There are several ways to test translation cleanliness that are good for different purposes:

  • morphology expansion testvoc ("standard testvoc")
  • prefixed morphology expansion testvoc ("testvoc lite")
  • corpus testvoc
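Whichever of these methods produces the input, the final check is the same: the translated output should contain no debug symbols. A minimal sketch of that check follows, assuming the usual Apertium conventions that * marks an unknown word, @ a form missing from the bilingual dictionary, and # a generation failure. The name find_unclean is invented for this example, and the pattern is a simplification that would also flag ordinary text such as hashtags.

```python
import re

# A debug symbol at the start of a word: '*' unknown word,
# '@' missing bilingual-dictionary entry, '#' generation failure.
MARKER = re.compile(r"(?<!\w)[#@*]\w+")


def find_unclean(lines):
    """Return (line number, offending token) pairs for translated output."""
    problems = []
    for number, line in enumerate(lines, start=1):
        for match in MARKER.finditer(line):
            problems.append((number, match.group()))
    return problems


if __name__ == "__main__":
    output = ["the house is big", "the #house is *blorf", "I see @casa"]
    for number, token in find_unclean(output):
        print(f"line {number}: {token}")
```

A wrapper script could exit with a non-zero status whenever the returned list is non-empty, which makes the check usable in automated builds.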