Evaluation kaz-tur Machine Translation System
The system has been evaluated by measuring translation quality: the error rate of the text produced by the system, compared with postedited versions of that same text.
The translation quality was measured using two metrics: word error rate (WER) and position-independent word error rate (PER). Both metrics are based on the Levenshtein distance (Levenshtein, 1965). Metrics based on word error rate were chosen so that the system could be compared against systems based on similar technology, and so that its usefulness in a real setting, that is, translating for dissemination, could be assessed.
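As a rough illustration of the two metrics, the sketch below computes word-level WER and one common bag-of-words formulation of PER for a single sentence pair. It is not the implementation used by apertium-eval-translator, and the example tokens are made up.

```python
# Minimal sketch of WER and PER for one sentence pair (illustrative only).
from collections import Counter

def levenshtein(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]

def wer(hyp, ref):
    """Word error rate: edit distance normalised by reference length."""
    return levenshtein(hyp, ref) / len(ref)

def per(hyp, ref):
    """Position-independent error rate: compares bags of words,
    ignoring word order (one common formulation)."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)

mt_output = "this is a example output".split()   # hypothetical MT output
postedited = "this is an example".split()        # hypothetical postedition
print(wer(mt_output, postedited), per(mt_output, postedited))
```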
| System | WER (%) | PER (%) |
|---|---|---|
| new-system | 20.87 | 19.98 |
| old-system | 45.77 | 41.69 |
Besides calculating WER and PER for our new Apertium Kazakh-Turkish MT system, we did the same for the old Apertium Kazakh-Turkish MT system, following the same procedure for both. We took a small (1,025-token) Kazakh text, a concatenation of several articles from Wikipedia, and translated it with the two MT systems. The output of each system was postedited independently, to avoid bias in favour of one particular system. We then calculated WER and PER for each using apertium-eval-translator (http://wiki.apertium.org/wiki/apertium-eval-translator).
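A minimal sketch of this procedure, driven from Python, is shown below. It assumes a local Apertium installation with the kaz-tur pair; the file names are placeholders, and the -test/-ref options of apertium-eval-translator are assumptions to check against its help output before use.

```python
# Sketch of the evaluation procedure: translate with the kaz-tur pair,
# then score the raw MT output against its postedited version.
import subprocess

SOURCE = "kaz.source.txt"               # hypothetical Kazakh source text
REFERENCE = "kaz-tur.postedited.txt"    # hypothetical postedited reference

# Translate the source text with the kaz-tur pair.
with open(SOURCE) as src, open("kaz-tur.mt.txt", "w") as out:
    subprocess.run(["apertium", "kaz-tur"], stdin=src, stdout=out, check=True)

# Score the MT output against the postedited reference
# (option names assumed; verify with apertium-eval-translator's usage message).
subprocess.run(["apertium-eval-translator",
                "-test", "kaz-tur.mt.txt",
                "-ref", REFERENCE],
               check=True)
```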
The text files used to calculate the evaluation were put on GitHub: https://github.com/apertium/apertium-kaz-tur/tree/master/eval.
Because the error rate was low for the new Apertium system, we chose to do a differential evaluation against the old Apertium system, to check whether the error rate was in fact lower. The differential evaluation results were added to the apertium-kaz-tur repository on GitHub: https://github.com/apertium/apertium-kaz-tur/blob/master/eval/test2.txt
We manually checked each translation output by the new structural-transfer module to see whether the rules it applied produced better or worse output than the old Apertium structural-transfer module. Out of 100 sentences, the new module produced acceptable output for 93% and the old module for 73%, a 20% difference between the two systems; about 7% of the sentences were bad in both outputs.
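The per-sentence judgements from this manual check could be tallied with a short script like the one below. The annotation format is hypothetical: one line per sentence with two manual marks ("ok" or "bad"), first for the new module and then for the old one.

```python
# Sketch of tallying manual per-sentence judgements from the differential
# evaluation (hypothetical annotation file and format).
def tally(path):
    new_ok = old_ok = both_bad = differ = total = 0
    with open(path) as f:
        for line in f:
            new_mark, old_mark = line.split()
            total += 1
            new_ok += new_mark == "ok"
            old_ok += old_mark == "ok"
            both_bad += new_mark == "bad" and old_mark == "bad"
            differ += new_mark != old_mark
    return {
        "new ok (%)": 100 * new_ok / total,
        "old ok (%)": 100 * old_ok / total,
        "judgements differ (%)": 100 * differ / total,
        "both bad (%)": 100 * both_bad / total,
    }

print(tally("judgements.txt"))  # hypothetical file of manual judgements
```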