Difference between revisions of "Google Summer of Code/Midterm report 2011"
Revision as of 11:33, 12 July 2011
Language pairs
For language pairs, we had two tasks. For some pairs, the task was to translate a news article without any diagnostics and evaluate the output. For the other pairs, the task was to create morphological analysers with 80% coverage.
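As a rough illustration of the coverage target, the sketch below computes naive coverage: the share of tokens that receive at least one analysis. It assumes analyser output in the Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token is marked with a star (`^surface/*surface$`). The function name `naive_coverage` and the sample string are hypothetical and not taken from the report.

```python
import re


def naive_coverage(analysed_text):
    """Return the percentage of tokens that have at least one analysis.

    Assumes each token is wrapped as ^surface/analysis$ and that an
    unknown token carries a star right after the slash (^foo/*foo$).
    """
    units = re.findall(r"\^(.*?)\$", analysed_text)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return 100.0 * known / len(units)


# Hypothetical analyser output: three known tokens, one unknown.
sample = "^the/the<det>$ ^kat/*kat$ ^sat/sit<vblex><past>$ ^down/down<adv>$"
print(f"{naive_coverage(sample):.2f} %")
```

Under this definition, an analyser meets the 80% target when the function returns at least 80.0 for the chosen corpus.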
Turkish → Azerbaijani
Statistics about input files
-------------------------------------------------------
Number of words in reference: 356
Number of words in test: 364
Number of unknown words (marked with a star) in test: 4
Percentage of unknown words: 1.10 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 14.29 %
Number of position-independent word errors: 50
Position-independent word error rate (PER): 13.74 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 1
Percentage of unknown words that were free rides: 25.00 %
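To make the relation between the counts and the percentages explicit, here is a minimal sketch. The reported figures are consistent with dividing each error count by the number of words in the test file (for example, 52 / 364 ≈ 14.29 %). The word-level edit distance is a standard Levenshtein computation over tokens; it is an assumption that the evaluation script works this way, and the helper names `edit_distance` and `rate` are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def rate(errors, n_words):
    """Express an error count as a percentage of n_words, to two decimals."""
    return round(100.0 * errors / n_words, 2)


# Counts copied from the Turkish → Azerbaijani report.
words_in_test = 364
print(rate(52, words_in_test))  # WER from the edit distance
print(rate(50, words_in_test))  # PER from the position-independent errors
print(rate(4, words_in_test))   # percentage of unknown words
```

The same `rate` helper applied to the Turkish → Kyrgyz counts (209 and 197 errors over 371 test words) gives the percentages reported for that pair.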
Turkish → Kyrgyz
Statistics about input files
-------------------------------------------------------
Number of words in reference: 380
Number of words in test: 371
Number of unknown words (marked with a star) in test: 5
Percentage of unknown words: 1.35 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 209
Word error rate (WER): 56.33 %
Number of position-independent word errors: 197
Position-independent word error rate (PER): 53.10 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %