Difference between revisions of "Курсы машинного перевода для языков России/Session 7"
Revision as of 09:32, 9 January 2012
Now that all of the basic aspects of creating a new MT system in Apertium have been covered, we come to the final, and possibly most important one. This session will cover the question of why we need data consistency, what we mean by quality and how to perform an evaluation. The practical will involve working with some of the methods that we use to assure consistency and quality in Apertium. It will also cover quality evaluation.
Theory
Consistency
Self-contained system
Quality
Evaluation
Vocabulary coverage
The coverage of a system is an indication of how much of the vocabulary it covers in a given corpus or domain. For an idea of what this means, we will try translating a sentence with different levels of coverage:
Source sentence (Bulgarian): Селскостопанските отрасли в Косово и Македония ще получат тласък.

Translation | Coverage |
---|---|
Селскостопанските отрасли en Косово и Македония ще получат тласък. | 11% |
Селскостопанските отрасли en Косово y Македония ще получат тласък. | 22% |
Селскостопанските отрасли en Косово y Македония получат тласък. | 33% |
Селскостопанските отрасли en Косово y Македония recibirá тласък. | 44% |
El agrícola отрасли en Косово y Македония recibirá тласък. | 55% |
El agrícola отрасли en Косово y Македония recibirá empujón. | 66% |
El sector agrícola en Косово y Македония recibirá empujón. | 77% |
El sector agrícola en Косово y Macedonia recibirá empujón. | 88% |
El sector agrícola en Kosovo y Macedonia recibirá empujón. | 100% |
Usually, coverage is given over a set of sentences, or corpus, rather than over a single sentence. In Apertium, the baseline coverage for releasing a new prototype translator is around 80%, that is, 2 unknown words in 10 for a given corpus. This is not enough to make revision of the output practical, except in the case of closely related languages.
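The calculation behind the percentages above can be sketched in a few lines of Python. This is a minimal illustration and not an Apertium tool: the function name `coverage` and the tiny `known_words` set are assumptions made for the example, and a real measurement would use the system's own analyser over a whole corpus.

```python
def coverage(tokens, known_words):
    """Return the fraction of tokens that are found in known_words."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


# The nine-word example sentence from the table, without the final full stop.
sentence = "Селскостопанските отрасли в Косово и Македония ще получат тласък".split()

# Hypothetical dictionary that only contains the two function words.
known_words = {"в", "и"}

print(f"{coverage(sentence, known_words):.0%}")  # prints 22%
```

For a corpus, the same function can be applied to the concatenated tokens of all sentences, so that long sentences count for more than short ones.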
Error rate
Coverage only gives you an idea of how many words you will have to change in the best case, that is, when the rest of the translation is correct. A more accurate indication of how many words you will have to change when using the translator is given by the post-editing word error rate (often abbreviated as WER). This is the number of changes (insertions, deletions, substitutions) needed to turn a machine-translated sentence into the version revised by a human translator, expressed as a percentage of the number of words in the sentence.
Taking the example above:
Step | Sentence | Changes | WER |
---|---|---|---|
Original | Селскостопанските отрасли в Косово и Македония ще получат тласък. | - | |
Machine translation | El sector agrícola en Kosovo y Macedonia recibirá empujón. | - | |
Substitute | El sector agricultura en Kosovo y Macedonia recibirá impulso. | 2/9 | |
Insert | El sector de la agricultura en Kosovo y Macedonia recibirá un impulso. | 3/9 | |
Revised | El sector de la agricultura en Kosovo y Macedonia recibirá un impulso. | 5/9 | 55.56% |
As with coverage, error rate evaluation is usually carried out on a corpus of sentences, so that it indicates how many words you are likely to have to change in a typical sentence.
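The worked example can be reproduced with a word-level edit distance. The sketch below is an illustration only and is not the Apertium evaluation tool; the function names are assumptions. It divides by the number of words in the machine translation, as the table does, whereas WER is often normalised by the length of the reference instead.

```python
def tokenize(text):
    """Split on whitespace and strip surrounding punctuation from each word."""
    return [word.strip(".,;:!?") for word in text.split()]


def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two lists of words."""
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(previous[j] + 1,          # delete hyp_word
                               current[j - 1] + 1,       # insert ref_word
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def word_error_rate(machine, revised):
    """Edits needed to reach the revised text, per machine-translated word."""
    hyp = tokenize(machine)
    ref = tokenize(revised)
    if not hyp:
        raise ValueError("machine translation is empty")
    # Assumption: normalise by the machine translation length, as in the table.
    return edit_distance(hyp, ref) / len(hyp)


machine = "El sector agrícola en Kosovo y Macedonia recibirá empujón."
revised = "El sector de la agricultura en Kosovo y Macedonia recibirá un impulso."

print(f"{word_error_rate(machine, revised):.2%}")  # prints 55.56%
```

The distance here is 5 (two substitutions and three insertions) over 9 words, which matches the last row of the table.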
When calculated over an appropriate corpus of the target translation domain, the combination of word error rate and coverage can give an idea of the usefulness of a machine translation system for a specific task. Of course, to determine if a system is useful for translators, a more thorough and case-specific evaluation needs to be made.
Practice
Word error rate
Apertium has a tool for calculating the word error rate between a reference translation and a machine translation. The objective of this practical is to try it out on the system you have created.
You will need two reference translations. The first will be the "original" text in the target language, which was created without post-editing. The second will be a post-edited version of the machine-translated text. When you are creating the post-edited version, take care to make only the minimal changes required to produce an adequate translation.