Apertium language modules and translation pairs are subject to the following types of evaluation:

  • Morphology coverage / regression testing
  • Size of system
    • Number of stems in lexc, monodix, bidix
    • Number of disambiguation rules
    • Number of lexical selection rules
    • Number of transfer rules
  • Naïve coverage
    • Monolingual naïve coverage
    • Trimmed naïve coverage (i.e., using a trimmed dictionary)
  • Accuracy of analyser
    • Precision/Recall/F-score
  • Accuracy of translation
    • Overall accuracy (over parallel corpora): WER/PER/BLEU
    • Regression tests (pairs of phrases or sentences)
  • Cleanliness of translation output
    • Testvoc

Morphology coverage

The tools we have for this are aq-morftest from Apertium quality and morph-test.py.

Both tools share two shortcomings: they do not support directionality restrictions on tests, and they do not return error codes, so a calling script cannot detect failures automatically.

Naïve coverage

In theory, aq-covtest does this, but in practice most people write their own scripts.

A generalised script that supports both hfst and lttoolbox binaries, as well as arbitrary corpora, would be useful. It should also (optionally) output hitparades, i.e., frequency lists of the unknown forms in the corpus.
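
As a rough illustration of what such a script would compute, here is a minimal sketch that takes analyser output and returns the naïve coverage together with a hitparade. It assumes the output is already in Apertium stream format (`^surface/analysis$`, with unknown forms analysed as `*surface`), as produced by lt-proc; hfst-proc output is assumed to look the same. The function name `naive_coverage` is hypothetical, and escaped characters inside tokens are not handled.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped '^', '/' or '$' inside the surface form.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Return (coverage, hitparade) for analyser output in stream format.

    A token counts as unknown when its analysis starts with '*'.
    The hitparade is a list of (form, count) pairs, most frequent first.
    """
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        if analyses.startswith("*"):
            unknown[surface] += 1
    known = total - sum(unknown.values())
    coverage = known / total if total else 0.0
    return coverage, unknown.most_common()


# Small made-up sample: four tokens, two of them unknown.
sample = "^the/the<det><def><sp>$ ^blorf/*blorf$ ^cats/cat<n><pl>$ ^blorf/*blorf$"
coverage, hitparade = naive_coverage(sample)
print(f"coverage: {coverage:.1%}")
for form, count in hitparade:
    print(count, form)
```

Working on the analyser's output rather than calling the binary directly keeps the sketch independent of whether the analyser is an hfst or an lttoolbox transducer; the corpus can be piped through either tool first.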

Translation accuracy
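
The overview lists WER among the metrics for overall accuracy over parallel corpora. As a reminder of what that figure measures, the following is a minimal sketch of word error rate for a single sentence pair: the word-level edit distance between the system output and the reference, divided by the reference length. Whitespace tokenisation and the function names are assumptions made for the example; this is not the implementation used by any Apertium tool.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            # Cost of matching or substituting the two current words.
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1,      # word missing from hyp
                               current[j - 1] + 1,   # extra word in hyp
                               substitution))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """WER = edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

For a whole corpus, the usual approach is to sum the edit distances and the reference lengths over all sentence pairs before dividing, rather than averaging per-sentence rates.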

Translation cleanliness

There are several ways to test translation cleanliness, each suited to a different purpose:

  • morphology expansion testvoc ("standard testvoc")
  • prefixed morphology expansion testvoc ("testvoc lite")
  • corpus testvoc
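
Whichever variant is used, the check itself amounts to looking for error markers in the translated output. The sketch below shows the idea for a corpus testvoc, assuming the usual Apertium convention that a word prefixed with `@` failed lexical transfer and a word prefixed with `#` failed generation. The function name `find_testvoc_errors` is hypothetical, and the input is assumed to be plain translated text, one sentence per line.

```python
import re

# Assumed markers: '@' = lexical transfer failure, '#' = generation failure.
# Only tokens that start with the marker are matched, so "C#" is ignored.
ERROR_TOKEN = re.compile(r"(?<!\S)([#@])(\S+)")


def find_testvoc_errors(lines):
    """Return (line_number, marker, form) for every marked token."""
    errors = []
    for number, line in enumerate(lines, start=1):
        for marker, form in ERROR_TOKEN.findall(line):
            errors.append((number, marker, form))
    return errors
```

A wrapper script could print the errors and exit with a non-zero status whenever the returned list is non-empty, which makes the check usable from a Makefile or a continuous integration job.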