In natural languages, we can have many different types of ambiguity, e.g.:
- part of speech ambiguity
- word-sense / lexical selection ambiguity
- syntactic ambiguity
- pragmatic ambiguity
Part of Speech ambiguity
The same form of a word may be a noun or a verb, plural or singular, etc.
To give an example, the form banks may be either a plural noun or a third-person singular present-tense verb:
- During the financial crisis, banks received $1.2 trillion in loans from the Government (noun)
- When a plane turns, it banks to give the lifting force of the wings a horizontal component (verb)
In Apertium, the output of the morphological analyser is ambiguous with respect to part of speech (PoS), and shows this PoS ambiguity by giving several analyses for one word:
$ echo banks|apertium -d trunk/apertium-en-es/ en-es-anmor
^banks/bank<n><pl>/bank<vblex><pri><p3><sg>$
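To make the structure of that output concrete, here is a minimal Python sketch that splits one such unit into its surface form and its competing analyses. The function name `parse_lexical_unit` is hypothetical, and the sketch assumes a single unit with no escaped characters or multiword forms, so it is not a full parser for the analyser's output.

```python
import re

def parse_lexical_unit(unit):
    """Split '^surface/analysis1/analysis2$' into (surface, [(lemma, [tags]), ...]).

    Assumption: one unit only, no escaped '/', '^' or '$' inside it.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    # Drop the '^' and '$' delimiters, then split surface form from analyses.
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]          # text before the first tag
        tags = re.findall(r"<([^>]+)>", analysis)  # every <tag> in order
        parsed.append((lemma, tags))
    return surface, parsed

surface, analyses = parse_lexical_unit("^banks/bank<n><pl>/bank<vblex><pri><p3><sg>$")
for lemma, tags in analyses:
    print(surface, "->", lemma, tags)
```

A unit with more than one entry in `analyses` is ambiguous; here the two entries differ in their first tag (`n` versus `vblex`), which is the PoS ambiguity described above.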
The "PoS-tagger" (PoS-disambiguator) removes the ambiguity by selecting the most likely (and hopefully the correct) analysis:
$ echo The banks|apertium -d trunk/apertium-en-es/ en-es-tagger
^The<det><def><sp>$ ^bank<n><pl>$
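The idea of picking one analysis from context can be illustrated with a toy Python sketch. This is not how Apertium's tagger is implemented: the preference table `PREFERRED_AFTER` and the function `choose_analysis` are hypothetical, hand-written for this example, and the tags `prn` and `adj` are assumed here rather than taken from the output above.

```python
# Hypothetical, hand-written preferences: which first tag is likely
# after a given previous tag. A real tagger would learn this from data.
PREFERRED_AFTER = {
    "det": ["n", "adj"],  # after a determiner, prefer a noun or adjective reading
    "prn": ["vblex"],     # after a pronoun, prefer a verb reading
}

def choose_analysis(previous_tag, analyses):
    """Return the analysis whose first tag is most preferred after previous_tag.

    Falls back to the first analysis when no preference applies.
    """
    for wanted in PREFERRED_AFTER.get(previous_tag, []):
        for lemma, tags in analyses:
            if tags and tags[0] == wanted:
                return lemma, tags
    return analyses[0]

# The two analyses of "banks" from the analyser output.
banks = [("bank", ["n", "pl"]), ("bank", ["vblex", "pri", "p3", "sg"])]

print(choose_analysis("det", banks))  # as in "The banks"
print(choose_analysis("prn", banks))  # as in "it banks"
```

With a determiner to the left the sketch keeps the noun reading, and with a pronoun to the left it keeps the verb reading, mirroring the two example sentences.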
Note that we still call it PoS-ambiguity if the ambiguity is in "subtags" like infinitive vs present (bank can be not only a noun or a present tense verb, but also an infinitive verb).
Word-Sense / Lexical Selection ambiguity
One form of a word, with a certain part of speech, can still have several possible meanings, and might have several possible translations.
Say that we know from context that the form banks is the plural noun (perhaps the previous word was the). It still has two possible meanings: "river bank", or "financial bank":
- She put the child in it and placed it among the reeds by the banks (river)
- During the financial crisis, banks received $1.2 trillion in loans from the Government (financial)
When translating to Spanish, this word-sense ambiguity becomes important, since the first sense translates to 'orillas', while the second translates to 'bancos'.
Note: just because a dictionary lists a word-sense distinction does not mean it is relevant for machine translation. We don't have to figure out whether banks refers to the buildings of the banks or to the abstract financial institutions; either way it translates to 'bancos'. For this reason, when talking about machine translation, we speak not of word sense disambiguation but of lexical selection: selecting the best possible translation of a certain PoS analysis of a form. In general, good lexical selection is less important for getting a good translation than good PoS disambiguation.
In Apertium, lexical selection (with lrx-proc) happens after bidix lookup (word translation), but before structural transfer rules. See Lexical selection for more information.
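As a rough illustration of what lexical selection has to decide, the Python sketch below chooses between the two Spanish translations of the noun bank by looking for cue words in the sentence. It does not reproduce lrx-proc or its rule format: the cue lists, the default translation, and the function `select_translation` are all assumptions made up for this example.

```python
# Hypothetical cue words for each Spanish translation of the noun "bank".
CUES = {
    "orilla": {"river", "reeds", "water", "shore"},
    "banco": {"financial", "loans", "money", "crisis"},
}
DEFAULT = "banco"  # assumed default when no cue word is present

def select_translation(sentence):
    """Pick the translation whose cue words overlap most with the sentence."""
    words = {w.strip(".,").lower() for w in sentence.split()}
    scores = {target: len(words & cues) for target, cues in CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT

river = "She put the child in it and placed it among the reeds by the banks"
money = "During the financial crisis, banks received $1.2 trillion in loans from the Government"

print(select_translation(river))
print(select_translation(money))
```

The point of the sketch is only that the choice is made among translations of one already disambiguated PoS analysis, using context, which is why this step sits after bidix lookup.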
Syntactic ambiguity
Consider the sentence Umberto saw the man with the spyglass. Here, there are two equally feasible interpretations: Umberto saw the man who had the spyglass, or Umberto used a spyglass to see the man. This is a syntactic ambiguity, since the phrase with the spyglass can be syntactically attached to (or dependent on, if that's more in line with your favourite theory of syntax) the phrase the man or the verb saw.
This is just one of many kinds of syntactic ambiguity.
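The two attachments of with the spyglass can be written down as two different trees over the same words. The following Python sketch uses nested tuples of the form (label, child, ...); the node labels S, VP, NP and PP and the helper `leaves` are illustrative assumptions, not tied to any particular theory of syntax or to any Apertium data structure.

```python
# Reading 1: the PP attaches to the noun phrase (the man has the spyglass).
np_attachment = (
    "S", "Umberto",
    ("VP", "saw", ("NP", "the man", ("PP", "with the spyglass"))),
)

# Reading 2: the PP attaches to the verb phrase (Umberto uses the spyglass).
vp_attachment = (
    "S", "Umberto",
    ("VP", "saw", ("NP", "the man"), ("PP", "with the spyglass")),
)

def leaves(tree):
    """Return the words of a tree, left to right, ignoring node labels."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the node label
        words.extend(leaves(child))
    return words

print(" ".join(leaves(np_attachment)))
print(" ".join(leaves(vp_attachment)))
```

Both trees yield the same sequence of words, yet the trees themselves differ, which is exactly what makes the sentence syntactically ambiguous.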
Pragmatic ambiguity
Pragmatic ambiguity is ambiguity that depends on context.
An example: If someone asks you "Headlights on?", they might be reminding you to turn them off, or on, or they might be asking if they themselves should turn them on. The sentence itself does not provide this background information.
See also
- Part-of-speech tagging
- Tagger training
- Constraint Grammar
- Lexical selection
- Word sense disambiguation
- Anaphora resolution