Meta-evaluation

Morphology coverage / regression testing
Size of system
- Number of stems in lexc, monodix, bidix
- Number of disambiguation rules
- Number of lexical selection rules
- Number of transfer rules
Naïve coverage
- Monolingual naïve coverage
- Trimmed naïve coverage (i.e., using a trimmed dictionary)
Accuracy of analyser
- Precision/Recall/F-score
Accuracy of translation
- Overall accuracy (over parallel corpora): WER/PER/BLEU
- Regression tests (pairs of phrases or sentences)
Cleanliness of translation output
- Testvoc
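As a rough illustration of the "number of stems" metric, the Python sketch below counts the `<e>` entries that sit directly inside `<section>` elements of a monodix or bidix, skipping the paradigm rows in `<pardefs>`. It assumes that one section entry corresponds to one stem, which overcounts lemmas that have several entries (e.g. direction-restricted variants). It does not cover lexc. The function name `count_section_entries` and the toy dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET


def count_section_entries(root: ET.Element) -> int:
    """Count <e> entries that are direct children of <section> elements.

    Rows inside <pardefs> are paradigm endings rather than stems, so they
    are not counted.
    """
    return sum(len(section.findall("e")) for section in root.iter("section"))


# Toy monodix fragment; for a real file use ET.parse(path).getroot().
SAMPLE = """<dictionary>
  <pardefs>
    <pardef n="house__n">
      <e><p><l></l><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="horse"><i>horse</i><par n="house__n"/></e>
  </section>
</dictionary>"""

print(count_section_entries(ET.fromstring(SAMPLE)))  # 2
```

The same function works unchanged on a bidix, since bidix entries also live in `<section>` elements.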

Morphology coverage

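The tools we have for this are aq-morftest (from Apertium quality) and morph-test.py.

Both have two shortcomings: they don't support restricting a test to one direction (e.g., analysis only or generation only), and they don't return error codes when tests fail, which makes them awkward to use in automated testing.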
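A minimal sketch of a test runner that addresses both complaints, assuming each test is a (surface form, analysis, direction) triple and that the analyser and generator are wrapped as Python callables returning sets of strings. The direction values, the function names `run_morph_tests` and `exit_status`, and the toy lexicon are all hypothetical; in practice the callables would wrap the compiled transducers. A direction restriction is useful for, e.g., a spelling variant that should be analysed but never generated.

```python
from typing import Callable, Iterable, List, Set, Tuple

# (surface form, analysis, direction); direction is "both", "analysis"
# (surface -> analysis only) or "generation" (analysis -> surface only).
MorphTest = Tuple[str, str, str]


def run_morph_tests(
    tests: Iterable[MorphTest],
    analyse: Callable[[str], Set[str]],
    generate: Callable[[str], Set[str]],
) -> List[str]:
    """Run each test in its allowed direction(s); return failure messages."""
    failures: List[str] = []
    for surface, analysis, direction in tests:
        if direction not in ("both", "analysis", "generation"):
            raise ValueError(f"unknown direction: {direction!r}")
        if direction in ("both", "analysis") and analysis not in analyse(surface):
            failures.append(f"analysis of {surface!r} lacks {analysis!r}")
        if direction in ("both", "generation") and surface not in generate(analysis):
            failures.append(f"generation of {analysis!r} lacks {surface!r}")
    return failures


def exit_status(failures: List[str]) -> int:
    """0 if everything passed, 1 otherwise, so CI can detect regressions."""
    return 1 if failures else 0


# Toy lexicon standing in for real transducers (hypothetical data):
# the analyser accepts "colour" and "color", the generator only produces "colour".
ANALYSES = {"colour": {"colour<n><sg>"}, "color": {"colour<n><sg>"}}
FORMS = {"colour<n><sg>": {"colour"}}

tests = [
    ("colour", "colour<n><sg>", "both"),
    ("color", "colour<n><sg>", "analysis"),  # variant: analyse, never generate
]
failures = run_morph_tests(
    tests,
    analyse=lambda s: ANALYSES.get(s, set()),
    generate=lambda a: FORMS.get(a, set()),
)
print(failures, exit_status(failures))  # [] 0
```

A wrapper script would end with `sys.exit(exit_status(failures))`. Without the `analysis` restriction, the `color` test would fail in the generation direction.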

Naïve coverage

In theory, aq-covtest does this, but in practice most people write their own scripts.

A generalised script that supports both HFST and lttoolbox binaries and arbitrary corpora would be useful. It should also (optionally) output hitparades (i.e., frequency lists of unknown forms in the corpus).
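The core of such a script can be sketched in Python, assuming the corpus has already been run through the analyser and the output is in the Apertium stream format, with unknown words marked by a `*` as lt-proc does (e.g. `^blorf/*blorf$`). Whether a given HFST setup uses the same marking is an assumption to check. Because it works on analyser output rather than on a specific binary, the same function serves both back ends. The name `naive_coverage` is hypothetical, punctuation tokens are counted like any other token, and backslash-escaped `^`, `/` and `$` are not handled.

```python
import re
from collections import Counter
from typing import Tuple

# One lexical unit in the stream format: ^surface/analysis1/analysis2$
# Simplification: escaped \^, \/ and \$ inside a unit are not handled.
LU_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(stream: str) -> Tuple[float, Counter]:
    """Return (coverage, hitparade) for analyser output.

    A token counts as unknown when its analysis part starts with '*'.
    The hitparade maps each unknown surface form to its frequency.
    """
    total = 0
    unknown: Counter = Counter()
    for surface, analyses in LU_RE.findall(stream):
        total += 1
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    coverage = known / total if total else 0.0
    return coverage, unknown


SAMPLE = "^the/the<det><def>$ ^blorf/*blorf$ ^houses/house<n><pl>$ ^blorf/*blorf$"
coverage, hitparade = naive_coverage(SAMPLE)
print(f"{coverage:.1%}", hitparade.most_common(10))  # 50.0% [('blorf', 2)]
```

`Counter.most_common(n)` gives the top of the hitparade directly; writing it out in full is a matter of iterating over `most_common()` with no argument.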

Translation accuracy

- WER/PER: apertium-eval-translator.pl and apertium-eval-translator-line.pl work well, but they are somewhat dated and could probably benefit from being rewritten in Python.
- BLEU: no existing tool.
- Regression testing: there are possibly some one-off scripts for this; to be confirmed.
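As a starting point for a Python rewrite, the sketch below implements the textbook definitions of WER (word-level edit distance divided by reference length), PER (the same but ignoring word order) and corpus-level BLEU with one reference per sentence and no smoothing. It is not a port of apertium-eval-translator.pl: the PER formula used here, (max(|hyp|, |ref|) − matches) / |ref|, is one common definition and may differ in detail from the Perl scripts. Tokenisation is plain whitespace splitting, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,              # delete h
                         cur[j - 1] + 1,           # insert r
                         prev[j - 1] + (h != r))   # substitute or match
        prev = cur
    return prev[-1]


def wer(hyp: str, ref: str) -> float:
    h, r = hyp.split(), ref.split()
    return edit_distance(h, r) / len(r)


def per(hyp: str, ref: str) -> float:
    """Position-independent error rate: like WER but ignoring word order."""
    h, r = hyp.split(), ref.split()
    matches = sum((Counter(h) & Counter(r)).values())
    return (max(len(h), len(r)) - matches) / len(r)


def bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU, one reference per sentence, no smoothing."""
    matched = [0] * max_n
    possible = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams = Counter(tuple(h[i:i + n]) for i in range(len(h) - n + 1))
            r_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
            matched[n - 1] += sum((h_ngrams & r_ngrams).values())  # clipped counts
            possible[n - 1] += max(len(h) - n + 1, 0)
    if min(matched) == 0:  # some n-gram order has no matches at all
        return 0.0
    log_precision = sum(math.log(m / p) for m, p in zip(matched, possible)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


print(wer("the cat sat on mat", "the cat sat on the mat"))  # 0.16666666666666666
```

Without smoothing, BLEU is 0 whenever no 4-gram matches, so it is only meaningful over a reasonably large corpus. For published scores an established implementation such as sacreBLEU is preferable, since tokenisation choices shift BLEU noticeably.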

Translation cleanliness

There are several ways to test translation cleanliness, each suited to a different purpose:

- morphology expansion testvoc ("standard testvoc")
- prefixed morphology expansion testvoc ("testvoc lite")
- corpus testvoc
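For corpus testvoc, one approach is to translate a corpus and collect the marked words in the output. The sketch below assumes Apertium's default debug marks are left on: `*` for words unknown to the analyser, `@` for words missing from the bilingual dictionary, and `#` for forms the generator could not produce. It is a heuristic: it only looks at marks at the start of a whitespace-separated token, strips trailing punctuation, and will report legitimate `@` or `#` in the source text (e-mail addresses, hashtags) as false positives. The name `corpus_testvoc` and the sample sentence are hypothetical.

```python
import re
from collections import Counter
from typing import Dict

# Debug marks in translation output (assumption: default marking is on):
#   *  unknown to the source analyser
#   @  analysed, but not found in the bilingual dictionary
#   #  transferred, but the target generator could not produce the form
MARKERS = {"*": "unknown", "@": "bidix", "#": "generation"}
# A mark at the start of a token, i.e. at string start or after whitespace.
MARKED_RE = re.compile(r"(?<!\S)([*@#])(\S+)")


def corpus_testvoc(output: str) -> Dict[str, Counter]:
    """Group marked words in translated text by error type, with frequencies."""
    errors: Dict[str, Counter] = {name: Counter() for name in MARKERS.values()}
    for marker, word in MARKED_RE.findall(output):
        word = word.rstrip(".,;:!?\"')")  # drop punctuation glued to the word
        if word:
            errors[MARKERS[marker]][word] += 1
    return errors


report = corpus_testvoc("the *blorf went to @maison and #casas. @maison again")
for kind, counts in report.items():
    print(kind, counts.most_common())
```

For cleanliness, the `bidix` and `generation` groups are the ones to drive to zero; `unknown` words are a coverage issue rather than a pipeline bug.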
