Difference between revisions of "Google Summer of Code/Midterm report 2011"
		
		
		
		
		
		
Revision as of 15:17, 12 July 2011
Language pairs
The language-pair projects had one of two midterm tasks. For some pairs, the task was to translate a news article without any diagnostics and evaluate the output. For the other pairs, the task was to create morphological analysers with 80% coverage.
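As a rough illustration of the coverage target, the sketch below computes naive coverage: the share of tokens that receive at least one analysis. It assumes analyser output in the Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token's analysis starts with a star. The function name `naive_coverage` and the sample string (including its tags) are made up for the example, not taken from any of the analysers in this report.

```python
import re


def naive_coverage(analysed: str) -> float:
    """Return the fraction of tokens that received at least one analysis.

    Assumes each token looks like ^surface/analysis1/analysis2$ and that
    unknown tokens look like ^surface/*surface$ (assumed stream format).
    """
    units = re.findall(r"\^(.*?)\$", analysed)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


# Hypothetical analyser output: four tokens, one of them unknown.
sample = "^ev/ev<n><nom>$ ^xyz/*xyz$ ^kitap/kitap<n><nom>$ ^gel/gel<v><imp>$"
coverage = naive_coverage(sample)
print(f"{coverage:.0%} coverage, target met: {coverage >= 0.80}")
```

Under this definition, an analyser meets the 80% target when at most one token in five comes back unanalysed on the test corpus.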
Turkish → Azerbaijani
- See also: apertium-tr-az/dev/midterm
Statistics about input files
-------------------------------------------------------
Number of words in reference: 356
Number of words in test: 364
Number of unknown words (marked with a star) in test: 4
Percentage of unknown words: 1.10 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 52
Word error rate (WER): 14.29 %
Number of position-independent word errors: 50
Position-independent word error rate (PER): 13.74 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 1
Percentage of unknown words that were free rides: 25.00 %
Turkish → Kyrgyz
- See also: apertium-tr-ky/dev/midterm
Statistics about input files
-------------------------------------------------------
Number of words in reference: 380
Number of words in test: 371
Number of unknown words (marked with a star) in test: 5
Percentage of unknown words: 1.35 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 209
Word error rate (WER): 56.33 %
Number of position-independent word errors: 197
Position-independent word error rate (PER): 53.10 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
Serbo-Croatian → Macedonian
Statistics about input files
-------------------------------------------------------
Number of words in reference: 456
Number of words in test: 432
Number of unknown words (marked with a star) in test: 2
Percentage of unknown words: 0.46 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 305
Word error rate (WER): 70.60 %
Number of position-independent word errors: 262
Position-independent word error rate (PER): 60.65 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0.00 %
Slovenian → Spanish
Statistics about input files
-------------------------------------------------------
Number of words in reference: 487
Number of words in test: 454
Number of unknown words (marked with a star) in test: 39
Percentage of unknown words: 8.59 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 438
Word error rate (WER): 96.48 %
Number of position-independent word errors: 398
Position-independent word error rate (PER): 87.67 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 11
Percentage of unknown words that were free rides: 28.21 %
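The percentages reported for the four pairs are consistent with dividing each count by the number of words in the test file (not the reference), and the free-ride percentage with dividing by the number of unknown words. The sketch below reproduces that arithmetic and adds a plain word-level Levenshtein distance as one plausible reading of "Edit distance". The names `word_edit_distance`, `percent`, and `REPORTED` are introduced here for illustration, and the evaluation script's actual implementation is not shown in this report, so treat this as an assumption checked only against the figures above.

```python
def word_edit_distance(reference, test):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(test) + 1))
    for i, ref_word in enumerate(reference, 1):
        cur = [i]
        for j, test_word in enumerate(test, 1):
            cur.append(min(
                prev[j] + 1,                            # deletion
                cur[j - 1] + 1,                         # insertion
                prev[j - 1] + (ref_word != test_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def percent(count, total):
    """Return count / total as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Counts copied from the reports above:
# (test words, unknown words, edit distance, position-independent errors)
REPORTED = {
    "tr-az": (364, 4, 52, 50),
    "tr-ky": (371, 5, 209, 197),
    "sh-mk": (432, 2, 305, 262),
    "sl-es": (454, 39, 438, 398),
}

for pair, (test_words, unknown, edits, pi_errors) in REPORTED.items():
    print(pair,
          "unknown:", percent(unknown, test_words),
          "WER:", percent(edits, test_words),
          "PER:", percent(pi_errors, test_words))
```

Because the denominator is the test length, a WER close to 100 % (as for Slovenian → Spanish) means that nearly as many word edits as there are words in the output would be needed to reach the reference.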

