In natural languages, we can have many different types of ambiguity:
- part of speech ambiguity
- word-sense / lexical selection ambiguity
- syntactic ambiguity
- pragmatic ambiguity
Part of Speech ambiguity
The same form of a word may be a noun or a verb, plural or singular, etc.
For example, the form banks may be either a plural noun or a third-person present-tense verb:
- During the financial crisis, banks received $1.2 trillion in loans from the Government (noun)
- When a plane turns, it banks to give the lifting force of the wings a horizontal component (verb)
In Apertium, the output of the morphological analyser is ambiguous with respect to part of speech, and it shows this PoS ambiguity by giving several analyses for one word:
$ echo banks|apertium -d trunk/apertium-en-es/ en-es-anmor
^banks/bank<n><pl>/bank<vblex><pri><p3><sg>$
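To make the shape of that output concrete, here is a minimal Python sketch that splits one lexical unit into its surface form and its competing analyses. The helper name parse_lexical_unit is hypothetical (it is not part of Apertium), and the sketch assumes a single unit with no escaped special characters.

```python
import re

# Hypothetical helper, not part of Apertium: split one lexical unit of the
# form ^surface/analysis1/analysis2$ into its surface form and analyses.
# Assumption: the unit contains no escaped "/", "^" or "$" characters.
def parse_lexical_unit(unit):
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    # The first field is the surface form, the rest are candidate analyses.
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]          # text before the first tag
        tags = re.findall(r"<([^>]+)>", analysis)  # tag names without brackets
        parsed.append((lemma, tags))
    return surface, parsed

surface, analyses = parse_lexical_unit("^banks/bank<n><pl>/bank<vblex><pri><p3><sg>$")
print(surface)
for lemma, tags in analyses:
    print(lemma, tags)
```

Each (lemma, tags) pair is one candidate reading; more than one pair means the form is PoS-ambiguous.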
The "PoS-tagger" (PoS-disambiguator) removes the ambiguity by selecting the most likely (and hopefully the correct) analysis:
$ echo The banks|apertium -d trunk/apertium-en-es/ en-es-tagger
^The<det><def><sp>$ ^bank<n><pl>$
Note that we still call it PoS-ambiguity if the ambiguity is in "subtags" like infinitive vs present (bank can be not only a noun or a present tense verb, but also an infinitive verb).
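The idea of keeping exactly one analysis per word can be illustrated with a toy sketch in Python. This is not how the Apertium tagger decides: the real tagger picks the most likely analysis, while the sketch below uses a single hand-written rule (prefer a noun reading after a determiner). The function names choose_analysis and format_unit are hypothetical.

```python
# Toy disambiguator for illustration only; NOT the Apertium tagger.
# Assumed rule: after a determiner, prefer a noun reading; otherwise
# fall back to the first analysis listed.
def choose_analysis(analyses, previous_tags):
    """analyses: list of (lemma, tags); previous_tags: tags chosen for the previous word, or None."""
    if previous_tags and "det" in previous_tags:
        for lemma, tags in analyses:
            if "n" in tags:
                return lemma, tags
    return analyses[0]

def format_unit(lemma, tags):
    """Render a chosen analysis in the ^lemma<tag>...$ shape shown above."""
    return "^" + lemma + "".join("<%s>" % t for t in tags) + "$"

the = [("The", ["det", "def", "sp"])]
# The verb reading is listed first on purpose, so the rule has work to do.
banks = [("bank", ["vblex", "pri", "p3", "sg"]), ("bank", ["n", "pl"])]

chosen_the = choose_analysis(the, None)
chosen_banks = choose_analysis(banks, chosen_the[1])
print(format_unit(*chosen_the), format_unit(*chosen_banks))
```

Without the preceding determiner, the same function would return the first analysis in the list, which is why context matters for disambiguation.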
Word-Sense / Lexical Selection ambiguity
One form of a word, with a certain part of speech, can still have several possible meanings, and might have several possible translations.
Say that we know from context that the form banks is the plural noun (perhaps the previous word was the). It still has two possible meanings: "river bank", or "financial bank":
- She put the child in it and placed it among the reeds by the banks (river)
- During the financial crisis, banks received $1.2 trillion in loans from the Government (financial)
If we are translating into Spanish, this word-sense ambiguity becomes important, since the first sense translates to 'orillas', while the second translates to 'bancos'.
Note: a word-sense distinction listed in a dictionary is not necessarily relevant for machine translation! We don't have to figure out whether banks refers to the buildings of the banks or to the abstract financial institutions, since both translate to 'bancos'. For this reason, when we're talking about machine translation, we don't talk about word-sense disambiguation but about lexical selection: selecting the best possible translation for a given PoS analysis of a form.
Lexical selection happens after PoS disambiguation, either before or after bidix (bilingual dictionary) lookup, i.e. word translation. In general, good lexical selection is much less important for getting a good translation than good PoS disambiguation.
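As a rough illustration of lexical selection, the following Python sketch chooses between 'bancos' and 'orillas' for the plural noun bank by counting cue words in the sentence. This is a toy under stated assumptions, not an Apertium module: the CANDIDATES table, the cue words, and the function select_translation are all invented for the example.

```python
import re

# Hypothetical translation table: for one disambiguated analysis, the
# candidate translations and some hand-picked cue words for each.
# The first candidate acts as the default translation.
CANDIDATES = {
    ("bank", "n", "pl"): [
        ("bancos", {"financial", "loans", "crisis", "money"}),
        ("orillas", {"river", "reeds", "water", "shore"}),
    ],
}

def select_translation(analysis, sentence):
    """Pick the candidate whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    candidates = CANDIDATES[analysis]
    # max() keeps the first candidate on ties, so the default wins
    # when no cue word is present.
    best, _cues = max(candidates, key=lambda c: len(c[1] & words))
    return best

river = "She put the child in it and placed it among the reeds by the banks"
money = "During the financial crisis, banks received $1.2 trillion in loans from the Government"
print(select_translation(("bank", "n", "pl"), river))
print(select_translation(("bank", "n", "pl"), money))
```

Note that the lookup key is already a single PoS analysis: the sketch assumes PoS disambiguation has happened first, as described above.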
Currently there is no agreed-upon lexical selection module in Apertium (different pairs use different methods), but see Lexical selection for a list of methods, some more experimental than others.