Difference between revisions of "Google Summer of Code/Midterm report 2011"
Latest revision as of 00:19, 3 July 2012
Language pairs
For language pairs, there were two tasks. For some pairs, the task was to translate a news article without any diagnostics and evaluate the output. For the other pairs, the task was to create morphological analysers with 80% coverage.
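As a rough illustration of the 80% coverage target, the sketch below computes naive coverage as the share of tokens that are not flagged as unknown. It assumes that unknown tokens carry a leading star, in line with the "marked with a star" wording in the evaluation output on this page; the function name `coverage` and the sample sentence are hypothetical and do not come from any of the projects.

```python
def coverage(tokens):
    """Return the percentage of tokens that are not marked as unknown.

    Assumption: an unknown token is written with a leading '*'.
    """
    if not tokens:
        raise ValueError("cannot compute coverage of an empty token list")
    unknown = sum(1 for token in tokens if token.startswith("*"))
    return 100.0 * (len(tokens) - unknown) / len(tokens)


# Hypothetical sample: six tokens, one of them unknown.
sample = "the *blorf sat on the mat".split()
print(f"{coverage(sample):.2f} %")
```

Under this definition, an analyser would meet the target when the result is at least 80 on a representative corpus.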
Turkish → Azerbaijani
See also: apertium-tr-az/dev/midterm (https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tr-az/dev/midterm)
Statistics about input files
-------------------------------------------------------
Number of words in reference: 356
Number of words in test: 364
Number of unknown words (marked with a star) in test: 4
Percentage of unknown words: 1.10 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 14.29 %
Number of position-independent word errors: 50
Position-independent word error rate (PER): 13.74 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 1
Percentage of unknown words that were free rides: 25.00 %
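The percentages reported for Turkish → Azerbaijani are consistent with dividing each count by the number of words in the test file (for example, 52 / 364 ≈ 14.29 %). The sketch below reproduces that arithmetic and shows one common way to obtain a word-level edit distance. It is an illustration under that assumption, not the evaluation tool's own code; the helper names `edit_distance` and `rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hyp
                            curr[j - 1] + 1,      # extra word in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def rate(count, test_words):
    """Express a count as a percentage of the test word count (assumed denominator)."""
    return 100.0 * count / test_words


# Figures taken from the Turkish → Azerbaijani report.
print(f"WER: {rate(52, 364):.2f} %")      # 14.29 %
print(f"PER: {rate(50, 364):.2f} %")      # 13.74 %
print(f"Unknown: {rate(4, 364):.2f} %")   # 1.10 %

# Toy sentences (hypothetical): one substitution and one missing word.
print(edit_distance("a b c d".split(), "a x c".split()))  # 2
```

The same division reproduces the figures for the other pairs on this page, for instance 209 / 371 ≈ 56.33 % for Turkish → Kyrgyz.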
Turkish → Kyrgyz
See also: apertium-tr-ky/dev/midterm (https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tr-ky/dev/midterm)
Statistics about input files
-------------------------------------------------------
Number of words in reference: 380
Number of words in test: 371
Number of unknown words (marked with a star) in test: 5
Percentage of unknown words: 1.35 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 209
Word error rate (WER): 56.33 %
Number of position-independent word errors: 197
Position-independent word error rate (PER): 53.10 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
Serbo-Croatian → Macedonian
Statistics about input files
-------------------------------------------------------
Number of words in reference: 456
Number of words in test: 453
Number of unknown words (marked with a star) in test: 2
Percentage of unknown words: 0.44 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 275
Word error rate (WER): 60.71 %
Number of position-independent word errors: 207
Position-independent word error rate (PER): 45.70 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
Slovenian → Spanish
Statistics about input files
-------------------------------------------------------
Number of words in reference: 487
Number of words in test: 457
Number of unknown words (marked with a star) in test: 33
Percentage of unknown words: 7.22 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 420
Word error rate (WER): 91.90 %
Number of position-independent word errors: 372
Position-independent word error rate (PER): 81.40 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 11
Percentage of unknown words that were free rides: 33.33 %