Google Summer of Code/Midterm report 2011

From Apertium

Revision as of 21:07, 13 July 2011

Language pairs

For language pairs, there were two tasks. For some pairs, the task was to translate a news article without any diagnostics and evaluate the output. For the other pairs, the task was to create morphological analysers with 80% coverage.
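As a rough illustration of the coverage target, the sketch below counts the share of tokens that receive at least one analysis. It assumes the analyser output is in the Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token has a single analysis beginning with `*`. The function name `coverage` and the sample text are hypothetical, and escaped characters are not handled.

```python
import re

# Assumption: each token looks like ^surface/analysis1/analysis2$ and an
# unknown token carries one analysis that starts with '*'.
TOKEN = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return the percentage of tokens that have at least one analysis."""
    tokens = TOKEN.findall(analysed_text)
    if not tokens:
        return 0.0
    known = sum(1 for _surface, analyses in tokens
                if not analyses.startswith("*"))
    return 100.0 * known / len(tokens)


# Hypothetical analyser output: three known tokens and one unknown token.
sample = ("^the/the<det><def><sp>$ ^cat/cat<n><sg>$ "
          "^zzz/*zzz$ ^sleeps/sleep<vblex><pri><p3><sg>$")
print(f"{coverage(sample):.2f} %")
```

Run over a large corpus, a figure of 80 % or more from a count like this would correspond to the target stated above, under the assumptions listed.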

Turkish → Azerbaijani

See also: apertium-tr-az/dev/midterm
Statistics about input files
-------------------------------------------------------
Number of words in reference: 356
Number of words in test: 364
Number of unknown words (marked with a star) in test: 4
Percentage of unknown words: 1.10 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 14.29 %
Number of position-independent word errors: 50
Position-independent word error rate (PER): 13.74 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 1
Percentage of unknown words that were free rides: 25.00 %
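The percentages in this report are consistent with dividing each count by the number of words in the test translation rather than in the reference: 52 / 364 gives 14.29 %, 50 / 364 gives 13.74 %, and 4 / 364 gives 1.10 %. The sketch below shows a word-level edit distance and that percentage calculation. It is a minimal illustration, not the evaluation script used to produce these figures; the function names are hypothetical, and the position-independent error count is not reimplemented here.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hyp
                            curr[j - 1] + 1,      # extra word in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def percent(count, test_words):
    """Express a count as a percentage of the words in the test file."""
    return round(100.0 * count / test_words, 2)


# Figures taken from the Turkish → Azerbaijani report above.
test_words = 364
print("WER:", percent(52, test_words))
print("PER:", percent(50, test_words))
print("Unknown words:", percent(4, test_words))
```

The same division reproduces the figures for the other pairs in this report, for example 209 / 371 for the Turkish → Kyrgyz WER.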

Turkish → Kyrgyz

See also: apertium-tr-ky/dev/midterm
Statistics about input files
-------------------------------------------------------
Number of words in reference: 380
Number of words in test: 371
Number of unknown words (marked with a star) in test: 5
Percentage of unknown words: 1.35 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 209
Word error rate (WER): 56.33 %
Number of position-independent word errors: 197
Position-independent word error rate (PER): 53.10 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %

Serbo-Croatian → Macedonian

Statistics about input files
-------------------------------------------------------
Number of words in reference: 456
Number of words in test: 450
Number of unknown words (marked with a star) in test: 2
Percentage of unknown words: 0.44 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 284
Word error rate (WER): 63.11 %
Number of position-independent word errors: 220
Position-independent word error rate (PER): 48.89 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %

Slovenian → Spanish

Statistics about input files
-------------------------------------------------------
Number of words in reference: 487
Number of words in test: 455
Number of unknown words (marked with a star) in test: 36
Percentage of unknown words: 7.91 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 424
Word error rate (WER): 93.19 %
Number of position-independent word errors: 365
Position-independent word error rate (PER): 80.22 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 11
Percentage of unknown words that were free rides: 30.56 %

Maltese → Hebrew

Bengali → English