Difference between revisions of "English and Italian/Google Translate"
Revision as of 11:00, 20 March 2014
A basic evaluation of the Google Translate translation from English to Italian was made on 2014-03-20, following the Evaluation instructions and using apertium-eval-translator.pl from the latest trunk. We found a word error rate (WER) of 21.63 %.
Method
As a base we used about 1000 words of an English leaflet by Wikimedia and Creative Commons. The leaflet was originally translated from German, which accounts for some peculiarities, and its subject accounts for some specialized terminology. The two texts compared are the Google Translate output and its manual corrections.
Considerations:
- only ungrammatical passages and passages where the grammar changed the meaning were corrected,
- as well as some inconsistencies in translation and major lexical errors which didn't convey the original meaning at all;
- but errors which would not be evident without knowing the source were left alone, as were lexical choices which are disputable but not outright wrong,
- and the text wasn't made as fluent as would be required to completely hide its machine translation origin.
The second result was calculated after removing the whitespace incorrectly added around punctuation. The difference is very significant (WER 37.70 % instead of 21.63 %), which confirms our choice not to correct such whitespace errors, so as to avoid excess noise in the evaluation.
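The whitespace clean-up mentioned above can be sketched as follows. This is only an illustration: the exact rules used for the second result are not documented here, so the patterns (spaces before closing punctuation and just inside brackets), the function name `strip_punctuation_spacing` and the sample sentence are all assumptions.

```python
import re

# Assumption: "whitespace around punctuation" means a space before closing
# punctuation or right after an opening bracket. The real clean-up rules
# are not documented, so these patterns are illustrative only.
_SPACE_BEFORE = re.compile(r"\s+([,.;:!?)\]])")
_SPACE_AFTER_OPEN = re.compile(r"([(\[])\s+")


def strip_punctuation_spacing(text: str) -> str:
    """Remove spaces before closing punctuation and after opening brackets."""
    text = _SPACE_BEFORE.sub(r"\1", text)
    return _SPACE_AFTER_OPEN.sub(r"\1", text)


# Invented sample sentence, not taken from the leaflet.
sample = "La licenza ( CC BY-SA ) consente il riuso , anche commerciale ."
cleaned = strip_punctuation_spacing(sample)

print(cleaned)
# Whitespace-separated token counts before and after the clean-up.
print(len(sample.split()), "->", len(cleaned.split()))
```

Because a word-based evaluation splits on whitespace, each detached punctuation mark counts as a separate word. Cleaning only one of the two files therefore changes its word count and turns every affected position into an error, which may explain why such a clean-up adds so much noise.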
First revision
Common errors found:
- missing concordance of singular/plural and male/female between noun and adjective/pronoun;
- articles, especially definite article vs. no article;
- co-ordinated sentences and pronouns (those... who and the like).
$ perl apertium-eval-translator.pl -test MT.txt -ref postedit.txt
Test file: 'MT.txt'
Reference file 'postedit.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 994
Number of words in test: 984
Number of unknown words (marked with a star) in test:
Percentage of unknown words: 0.00 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 215
Word error rate (WER): 21.63 %
Number of position-independent correct words: 862
Position-independent word error rate (PER): 13.28 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 215
Word Error Rate (WER): 21.63 %
Number of position-independent correct words: 862
Position-independent word error rate (PER): 13.28 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0%
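For readers unfamiliar with the metric, the WER figure can be understood as a word-level edit distance divided by the number of reference words. The sketch below is not the code of apertium-eval-translator.pl. It assumes plain whitespace tokenisation and unit costs for insertion, deletion and substitution, and the two toy sentences are invented.

```python
from typing import Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over word tokens, every operation costing 1."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref_text: str, hyp_text: str) -> float:
    """WER in percent, assuming whitespace tokenisation."""
    ref = ref_text.split()
    hyp = hyp_text.split()
    return 100.0 * word_edit_distance(ref, hyp) / len(ref)


# Toy sentences invented for illustration: two substitutions over five words.
toy_wer = wer("le licenze libere sono importanti",
              "la licenze libero sono importanti")
print(f"{toy_wer:.2f} %")

# The figures reported above: edit distance 215 over 994 reference words.
reported_wer = 100.0 * 215 / 994
print(f"{reported_wer:.2f} %")
```

With the reported edit distance and reference length, the same ratio gives the 21.63 % shown in the output.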
Second revision
$ perl apertium-eval-translator.pl -test MT.txt -ref postedit.txt
Test file: 'MT.txt'
Reference file 'postedit.txt'

Statistics about input files
-------------------------------------------------------
Number of words in reference: 915
Number of words in test: 984
Number of unknown words (marked with a star) in test:
Percentage of unknown words: 0.00 %

Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 345
Word error rate (WER): 37.70 %
Number of position-independent correct words: 719
Position-independent word error rate (PER): 28.96 %

Results when unknown-word marks (stars) are not removed
-------------------------------------------------------
Edit distance: 345
Word Error Rate (WER): 37.70 %
Number of position-independent correct words: 719
Position-independent word error rate (PER): 28.96 %

Statistics about the translation of unknown words
-------------------------------------------------------
Number of unknown words which were free rides: 0
Percentage of unknown words that were free rides: 0%
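The PER values of both runs can be checked from the reported counts. The formula used by apertium-eval-translator.pl is not stated on this page. The sketch below assumes a common definition in which test words in excess of the reference length are counted as errors, and the helper name `per_percent` is hypothetical. Under that assumption both reported figures are reproduced.

```python
def per_percent(correct: int, ref_words: int, test_words: int) -> float:
    """Position-independent word error rate in percent.

    Assumed definition (not confirmed against the script's source):
    PER = 1 - (correct - max(0, test_words - ref_words)) / ref_words
    """
    surplus = max(0, test_words - ref_words)
    return 100.0 * (1 - (correct - surplus) / ref_words)


# Counts copied from the two outputs above.
first_per = per_percent(correct=862, ref_words=994, test_words=984)
second_per = per_percent(correct=719, ref_words=915, test_words=984)

print(f"First revision PER:  {first_per:.2f} %")
print(f"Second revision PER: {second_per:.2f} %")
```

In the second run the test file is 69 words longer than the reference (984 against 915), so under this definition the surplus alone accounts for a sizeable part of the increase.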