Google Summer of Code/Midterm report 2011


Latest revision as of 00:19, 3 July 2012

Language pairs

For language pairs, there were two tasks. For some pairs, the task was to translate a news article without any diagnostics and evaluate the output. For the other pairs, the task was to create morphological analysers with 80% coverage.
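To illustrate what evaluating the output involves, here is a minimal sketch of a word-level edit distance between a reference translation and a test translation, with unknown-word stars counted and then stripped. It is not the evaluation script used for this report. The function names and the sample sentences are hypothetical, and tokenisation is assumed to be plain whitespace splitting.

```python
def word_edit_distance(reference, test):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(test) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, test_word in enumerate(test, start=1):
            cost = 0 if ref_word == test_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the test output
                current[j - 1] + 1,      # extra word in the test output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def count_unknown(tokens):
    """Count tokens carrying the unknown-word mark (a leading star)."""
    return sum(1 for t in tokens if t.startswith("*") and len(t) > 1)


def strip_unknown_marks(tokens):
    """Remove the leading star from unknown words before scoring."""
    return [t[1:] if t.startswith("*") and len(t) > 1 else t for t in tokens]


# Hypothetical sample data, not taken from the report.
reference = "the cat sat on the mat".split()
test = "the *kat sat on mat".split()

unknown = count_unknown(test)
distance = word_edit_distance(reference, strip_unknown_marks(test))
```

In this toy case the test output has one unknown word, and after the star is stripped it differs from the reference by one substitution and one missing word.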

Turkish → Azerbaijani

See also: apertium-tr-az/dev/midterm (https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tr-az/dev/midterm)

Statistics about input files
-------------------------------------------------------
Number of words in reference: 356
Number of words in test: 364
Number of unknown words (marked with a star) in test: 4
Percentage of unknown words: 1.10 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 14.29 %
Number of position-independent word errors: 50
Position-independent word error rate (PER): 13.74 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 1
Percentage of unknown words that were free rides: 25.00 %
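The percentages in the block above can be reproduced from its raw counts. The sketch below assumes, based only on the figures themselves, that the unknown-word rate, WER and PER are all normalised by the number of words in the test output (364 here) rather than by the reference length, and that the free-ride percentage is taken over the number of unknown words. The helper name `percentage` is hypothetical.

```python
def percentage(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2) if whole else 0.0


# Raw counts copied from the Turkish → Azerbaijani block.
words_in_test = 364
unknown_words = 4
edit_distance = 52
position_independent_errors = 50
free_rides = 1

report = {
    "unknown": percentage(unknown_words, words_in_test),
    "wer": percentage(edit_distance, words_in_test),
    "per": percentage(position_independent_errors, words_in_test),
    "free_rides": percentage(free_rides, unknown_words),
}
```

Under these assumptions the recomputed values agree with the reported 1.10 %, 14.29 %, 13.74 % and 25.00 %. The same arithmetic also matches the figures reported for the other language pairs on this page.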

Turkish → Kyrgyz

See also: apertium-tr-ky/dev/midterm (https://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-tr-ky/dev/midterm)

Statistics about input files
-------------------------------------------------------
Number of words in reference: 380
Number of words in test: 371
Number of unknown words (marked with a star) in test: 5
Percentage of unknown words: 1.35 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 209
Word error rate (WER): 56.33 %
Number of position-independent word errors: 197
Position-independent word error rate (PER): 53.10 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %

Serbo-Croatian → Macedonian

Statistics about input files
-------------------------------------------------------
Number of words in reference: 456
Number of words in test: 453
Number of unknown words (marked with a star) in test: 2
Percentage of unknown words: 0.44 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 275
Word error rate (WER): 60.71 %
Number of position-independent word errors: 207
Position-independent word error rate (PER): 45.70 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %

Slovenian → Spanish

Statistics about input files
-------------------------------------------------------
Number of words in reference: 487
Number of words in test: 457
Number of unknown words (marked with a star) in test: 33
Percentage of unknown words: 7.22 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 420
Word error rate (WER): 91.90 %
Number of position-independent word errors: 372
Position-independent word error rate (PER): 81.40 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 11
Percentage of unknown words that were free rides: 33.33 %

Maltese → Hebrew

Bengali → English