Difference between revisions of "Курсы машинного перевода для языков России/Session 7"

Revision as of 09:32, 9 January 2012

Now that all of the basic aspects of creating a new MT system in Apertium have been covered, we come to the final, and possibly most important, one. This session covers why we need data consistency, what we mean by quality, and how to perform an evaluation. The practical works through some of the methods that we use to assure consistency and quality in Apertium, including quality evaluation.

Theory

Consistency

Self-contained system

Quality

Evaluation

Vocabulary coverage

The coverage of a system is an indication of how much of the vocabulary it covers in a given corpus or domain. For an idea of what this means, we will try translating a sentence with different levels of coverage:

Source sentence: Селскостопанските отрасли в Косово и Македония ще получат тласък.

Coverage  Translation
11%       Селскостопанските отрасли en Косово и Македония ще получат тласък.
22%       Селскостопанските отрасли en Косово y Македония ще получат тласък.
33%       Селскостопанските отрасли en Косово y Македония получат тласък.
44%       Селскостопанските отрасли en Косово y Македония recibirá тласък.
55%       El agrícola отрасли en Косово y Македония recibirá тласък.
66%       El agrícola отрасли en Косово y Македония recibirá empujón.
77%       El sector agrícola en Косово y Македония recibirá empujón.
88%       El sector agrícola en Косово y Macedonia recibirá empujón.
100%      El sector agrícola en Kosovo y Macedonia recibirá empujón.

Usually, coverage is given over a set of sentences, or corpus, instead of over a single sentence. In Apertium, the baseline coverage for releasing a new prototype translator is around 80%, or 2 unknown words in 10 for a given corpus. This is not enough to make revision practical, except in the case of closely-related languages.
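
The corpus-level figure described above can be sketched in a few lines of Python. This is only an illustration, not the tooling Apertium itself uses: the `tokenize` and `coverage` helpers and the idea of representing the dictionary as a plain set of lower-cased word forms are assumptions made for the example.

```python
import re


def tokenize(text):
    """Split text into lower-cased word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def coverage(sentences, vocabulary):
    """Return the fraction of corpus tokens that appear in `vocabulary`.

    `vocabulary` is a hypothetical stand-in for the system's dictionary:
    a set of lower-cased word forms the translator is assumed to know.
    """
    total = 0
    known = 0
    for sentence in sentences:
        for token in tokenize(sentence):
            total += 1
            if token in vocabulary:
                known += 1
    return known / total if total else 0.0


corpus = ["Селскостопанските отрасли в Косово и Македония ще получат тласък."]
known_words = {"в", "и"}  # only two of the nine words are known
print(f"{coverage(corpus, known_words):.1%}")
```

With only "в" and "и" in the vocabulary, two of the nine words are covered, which corresponds to the 22% row of the table.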

Error rate

Coverage gives you an idea of how many words you will have to change in the best case, that is, assuming that the rest of the translation is correct. A more accurate indication of how many words you will have to change when using the translator is given by the post-edition word error rate (often abbreviated as WER). This is the percentage of changes (insertions, deletions, substitutions) between a machine-translated sentence and the same sentence after it has been revised by a human translator.

Taking the example above:

Step                 Changes  WER     Sentence
Original                              Селскостопанските отрасли в Косово и Македония ще получат тласък.
Machine translation                   El sector agrícola en Kosovo y Macedonia recibirá empujón.
   substitute        2/9              El sector agricultura en Kosovo y Macedonia recibirá impulso.
   insert            3/9              El sector de la agricultura en Kosovo y Macedonia recibirá un impulso.
Revised              5/9      55.56%  El sector de la agricultura en Kosovo y Macedonia recibirá un impulso.
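
The calculation in this example can be reproduced with a word-level edit distance. The sketch below is a minimal illustration, not the implementation of the Apertium tool: the function names are invented for the example, tokens are split on whitespace only, and the number of changes is divided by the length of the machine translation, as in the worked example (other conventions divide by the length of the revised sentence instead).

```python
def edit_distance(hyp, ref):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the token list `hyp` into the token list `ref`."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def post_edit_wer(machine, revised):
    """WER for one sentence, normalised by machine-translation length
    (an assumption taken from the worked example)."""
    hyp, ref = machine.split(), revised.split()
    if not hyp:
        raise ValueError("machine translation is empty")
    return edit_distance(hyp, ref) / len(hyp)


def corpus_wer(pairs):
    """WER over (machine, revised) pairs: total changes / total MT words."""
    changes = sum(edit_distance(m.split(), r.split()) for m, r in pairs)
    words = sum(len(m.split()) for m, _ in pairs)
    return changes / words


mt = "El sector agrícola en Kosovo y Macedonia recibirá empujón."
rev = "El sector de la agricultura en Kosovo y Macedonia recibirá un impulso."
print(f"{post_edit_wer(mt, rev):.2%}")
```

For the sentence pair above, the distance is five changes (two substitutions and three insertions) over nine machine-translated words, matching the 5/9 in the table.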

As with coverage, error rate evaluation is usually carried out on a corpus of sentences, so it gives you an indication of how many words you are likely to have to change in a given sentence.

When calculated over an appropriate corpus of the target translation domain, the combination of word error rate and coverage can give an idea of the usefulness of a machine translation system for a specific task. Of course, to determine if a system is useful for translators, a more thorough and case-specific evaluation needs to be made.


Practice

Word error rate

Apertium has a tool for calculating the word error rate between a reference translation and a machine translation. The objective of this practical is to try it out on the system you have created.

You will need two reference translations. The first is the "original" text in the target language, which was created without post-editing. The second is a post-edited version of the machine-translated text. When you are creating the post-edited version, take care to make only the minimal changes required to produce an adequate translation.