Translation quality statistics
This page aims to give an overview of the quality of various translators available in the Apertium platform.
These measures are used in the table below:
- Word Error Rate (WER) and Position-independent Word Error Rate (PWER) are measures of post-editing effort. Each figure gives the expected number of words that need to be corrected per 100 words of running text, so a WER of 4.7% means that about 4.7 words in every 100 will need to be corrected by the post-editor. For both WER and PWER, lower is better.
- Bilingual Evaluation Understudy (BLEU) varies from 0 (bad) to 1 (perfect), so for BLEU, higher is better.
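The two error rates described above can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenisation and one common formulation of PWER (a bag-of-words comparison that ignores word order); the function names are hypothetical, and the figures in the table below were not necessarily computed this way.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference word
                            curr[j - 1] + 1,      # add a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER in percent; assumes a non-empty reference (hypothetical helper)."""
    ref, hyp = reference.split(), hypothesis.split()
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def pwer(reference, hypothesis):
    """Position-independent WER in percent: word order is ignored."""
    ref, hyp = reference.split(), hypothesis.split()
    # Multiset intersection counts words shared regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 100.0 * (max(len(ref), len(hyp)) - matches) / len(ref)


reference = "the cat sat on the mat"
hypothesis = "the cat on the mat sat"
print(round(wer(reference, hypothesis), 1))   # 33.3
print(round(pwer(reference, hypothesis), 1))  # 0.0
```

In this example the translation contains every reference word but in a different order, so PWER is 0 while WER still counts the misplaced word as two edits.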
Precise numbers may vary due to differences in how sentences are selected for evaluation. In some pairs unknown words are taken into account; in others they are not. Evaluations where unknown words are allowed will likely give more accurate numbers for post-editing effort, provided the corpus on which the evaluation was made resembles the corpus on which further translations will be made. Evaluations that do not allow unknown words give a better indication of the "best-case" performance of the transfer rules.
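As a rough sketch of how a test set might be split for the two kinds of evaluation, the snippet below separates sentence pairs whose machine translation contains unknown words from those that do not. It assumes unknown words are marked with a leading asterisk in the translator output (a marking this page does not itself describe), and the function names are hypothetical.

```python
def has_unknown(translation, marker="*"):
    """True if any token carries the (assumed) unknown-word marker."""
    return any(tok.startswith(marker) and len(tok) > len(marker)
               for tok in translation.split())


def split_by_unknowns(pairs, marker="*"):
    """Split (reference, translation) pairs into those without and with unknown words."""
    known, unknown = [], []
    for reference, translation in pairs:
        target = unknown if has_unknown(translation, marker) else known
        target.append((reference, translation))
    return known, unknown


pairs = [
    ("the house is red", "the house is red"),
    ("the dog barks", "the *hundo barks"),
]
known_only, with_unknowns = split_by_unknowns(pairs)
print(len(known_only), len(with_unknowns))  # 1 1
```

Scoring only the first list approximates the "best-case" evaluation, while scoring both lists together corresponds to an evaluation in which unknown words are allowed.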
|Date||Version||Direction||Unknown words||WER||PWER||BLEU||Reference / Notes|
|11th February 2011|| ||fr → eo||Yes||22.4%||20.6%||-||French_and_Esperanto/Quality_tests|
| || ||eo → fr|| ||-||-||-|| |
|19th September 2010||0.1.0||mk → en||No||43.96%||31.22%||-||Percentage is average of 1,000 words from SETimes and 1,000 from Wikipedia|
| || ||en → mk|| ||-||-|| || |
|31st August 2010||0.1.0||mk → bg||Yes||26.67%||25.39%||-||-|
| || ||bg → mk|| ||-||-|| || |
|12th October 2009||0.6.1||nno → nob||Yes||-||-||-||Unhammer and Trosterud, 2009 (two reference translations; as of Nov 2021 the same test set gives 0.862±0.011)|
| || ||nob → nno|| ||32.5%, 17.7%||-||0.74|| |
|March 2010||0.2.0||br → fr||No||38%||22%||-||Tyers, 2010|
| || ||fr → br|| ||-||-||-|| |
|12th October 2009||0.5.0||sv → da||Yes||30.3%||27.7%||-||Swedish_and_Danish/Evaluation|
| || ||da → sv|| ||-||-||-|| |
|2nd September 2009|| ||eu → es||Unknown||72.4%||39.8%||-||Ginestí-Rosell et al., 2009|
| || ||es → eu|| ||-||-||-|| |
|2nd January 2009|| ||cy → en||Unknown||55.7%||30.5%||-||Tyers and Donnelly, 2009|
| || ||en → cy|| ||-||-||-|| |
|8th May 2009||0.9.0||en → eo||Unknown||21.0%||19.0%||-||English_and_Esperanto/Evaluation|
| || ||eo → en|| ||-||-||-|| |
|15th May 2006|| ||es → pt||Unknown||4.7%||-||-||Armentano et al., 2006|
| || ||pt → es|| ||11.3%||-||-|| |
|10th May 2006|| ||oc → ca||Unknown||9.6%||-||-||Armentano and Forcada, 2006|
| || ||ca → oc|| ||-||-||-|| |
|28th July 2008|| ||pt → ca||Unknown||16.6%||-||-||Armentano and Forcada, 2008|
| || ||ca → pt|| ||14.1%||-||-|| |
|May 2009|| ||en → es||Unknown||-||-||0.1851||Sánchez-Martínez, 2009|
| || ||es → en|| ||-||-||0.1881|| |
Coverage and Dictionary size
The number of entries in a dictionary, as well as the number of corpus forms that get some analysis, may give an indication of the maturity of a language pair.
For most dictionaries, the wiki has at least dictionary-size numbers, and some also have coverage stats; see Category:Datastats. The stats for a given Apertium package are on a page named after the package (the name of its repository on GitHub) followed by "/stats", e.g. apertium-es-ca/stats. Some language pairs are split into several packages, so for nno-nob there are pages apertium-nno, apertium-nob and apertium-nno-nob, but for dictionary counts you should consult the last one.
(Stats pages currently do not show number of CG or transfer rules.)
The apertium.org usage stats give some indication of which pairs have the most users, which in turn might say something about quality. However, there may be various reasons why a pair sees a lot of use or only a little:
- some pairs are only offered for free by Apertium (e.g. nob-nno)
- for some pairs, there are very few speakers of one of the languages (though the pair itself may have high quality)
- and, of course, some pairs simply have very good quality (e.g. spa-cat)
References
- Armentano-Oller, C., Carrasco, R. C., Corbí-Bellot, A. M., Forcada, M. L., Ginestí-Rosell, M., Ortiz-Rojas, S., Pérez-Ortiz, J. A., Ramírez-Sánchez, G., Sánchez-Martínez, F. and Scalco, M. A. (2006) "Open-source Portuguese-Spanish machine translation". In Lecture Notes in Computer Science 3960 (Computational Processing of the Portuguese Language, Proceedings of the 7th International Workshop on Computational Processing of Written and Spoken Portuguese, PROPOR 2006), May 13-17, 2006, ME - RJ / Itatiaia, Rio de Janeiro, Brazil, pp. 50-59.
- Armentano-Oller, C. and Forcada, M. L. (2006) "Open-source machine translation between small languages: Catalan and Aranese Occitan". In Strategies for developing machine translation for minority languages (5th SALTMIL workshop on Minority Languages, organized in conjunction with LREC 2006, 22-28.05.2006), pp. 51-54.
- Armentano-Oller, C. and Forcada, M. L. (2008) "Reutilización de datos lingüísticos para la creación de un sistema de traducción automática para un nuevo par de lenguas". Procesamiento del Lenguaje Natural, No. 41, pp. 243-250.
- Ginestí-Rosell, M., Ramírez-Sánchez, G., Ortiz-Rojas, S., Tyers, F. M. and Forcada, M. L. (2009) "Development of a free Basque to Spanish machine translation system". Procesamiento del Lenguaje Natural, No. 43, pp. 185-197.
- Tyers, F. M. and Donnelly, K. (2009) "apertium-cy - a collaboratively-developed free RBMT system for Welsh to English". The Prague Bulletin of Mathematical Linguistics, No. 91, pp. 57-66.
- Sánchez-Martínez, F., Forcada, M. L. and Way, A. (2009) "Hybrid rule-based ‒ example-based MT: Feeding Apertium with sub-sentential translation units". In Proceedings of the 3rd Workshop on Example-Based Machine Translation, November 12-13, 2009, Dublin, Ireland, pp. 11-18.
- Unhammer, K. and Trosterud, T. (2009) "Reuse of free resources in machine translation between Nynorsk and Bokmål". In Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, edited by Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez and Francis M. Tyers. Alicante: Universidad de Alicante, Departamento de Lenguajes y Sistemas Informáticos, pp. 35-42.