Difference between revisions of "Ambiguity"

From Apertium

Latest revision as of 18:06, 26 September 2016

In natural languages, we can have many different types of ambiguity, e.g.:

  • part of speech ambiguity
  • word-sense / lexical selection ambiguity
  • syntactic ambiguity
  • pragmatic ambiguity


Part of Speech ambiguity

The same form of a word may be a noun or a verb, plural or singular, etc.

To give an example, the form banks may be either a plural noun or a third-person singular present-tense verb:

  • During the financial crisis, banks received $1.2 trillion in loans from the Government (noun)
  • When a plane turns, it banks to give the lifting force of the wings a horizontal component (verb)

In Apertium, the output of the morphological analyser is ambiguous with respect to part of speech, and shows this PoS ambiguity by giving several analyses for one word:

$ echo banks|apertium -d trunk/apertium-en-es/ en-es-anmor
^banks/bank<n><pl>/bank<vblex><pri><p3><sg>$
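To make the shape of this output concrete, here is a minimal Python sketch that splits one such lexical unit into its surface form and its candidate analyses. It is only an illustration: the function name parse_lexical_unit is made up for this page, and the sketch assumes a single unit with no escaped characters, so it is not a full parser for the analyser's output.

```python
import re

def parse_lexical_unit(unit):
    """Split '^surface/analysis1/analysis2$' into (surface, analyses).

    Hypothetical helper for illustration; assumes no escaped characters.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("not a lexical unit: %r" % unit)
    # The first field is the surface form, the rest are the analyses.
    surface, *analyses = unit[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        lemma = analysis.split("<", 1)[0]           # text before the first tag
        tags = re.findall(r"<([^>]+)>", analysis)   # tag names without brackets
        parsed.append((lemma, tags))
    return surface, parsed

surface, analyses = parse_lexical_unit("^banks/bank<n><pl>/bank<vblex><pri><p3><sg>$")
print(surface)
for lemma, tags in analyses:
    print(lemma, tags)
```

Each analysis comes back as a lemma plus a list of tags, so the PoS ambiguity of banks shows up as a list with two entries.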

The "PoS-tagger" (PoS-disambiguator) and/or Constraint Grammar removes the ambiguity by selecting the most likely (and hopefully the correct) analysis:

$ echo The banks|apertium -d trunk/apertium-en-es/ en-es-tagger 
^The<det><def><sp>$ ^bank<n><pl>$

Note that we still call it PoS-ambiguity if the ambiguity is in "subtags" like infinitive vs present (bank can be not only a noun or a present tense verb, but also an infinitive verb).
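As a rough illustration of what disambiguation means, the following Python sketch picks one analysis per word using a single hand-written heuristic: after a determiner, prefer a noun reading. This is an assumption made for the example, not how apertium-tagger or Constraint Grammar actually decide; the function name pick_analysis and the fallback to the first listed analysis are likewise hypothetical.

```python
def pick_analysis(previous_tags, analyses):
    """Toy heuristic: after a determiner, prefer a noun reading.

    Hypothetical rule for illustration only, not the real tagger.
    """
    if "det" in previous_tags:
        for lemma, tags in analyses:
            if "n" in tags:
                return lemma, tags
    # Otherwise fall back to the first listed analysis.
    return analyses[0]

# Candidate analyses per word, using the tags shown in the output above.
the = [("The", ["det", "def", "sp"])]
banks = [("bank", ["n", "pl"]), ("bank", ["vblex", "pri", "p3", "sg"])]

chosen = []
previous_tags = []
for analyses in (the, banks):
    lemma, tags = pick_analysis(previous_tags, analyses)
    chosen.append((lemma, tags))
    previous_tags = tags

print(chosen)
```

For The banks, the sketch keeps the plural noun reading, which matches the tagger output shown above.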

Word-Sense / Lexical Selection ambiguity

One form of a word, with a certain part of speech, can still have several possible meanings, and might have several possible translations.

Say that we know from context that the form banks is the plural noun (perhaps the previous word was the). It still has two possible meanings: "river bank", or "financial bank":

  • She put the child in it and placed it among the reeds by the banks (river)
  • During the financial crisis, banks received $1.2 trillion in loans from the Government (financial)

If we were translating to Spanish, this word sense ambiguity becomes important, since the first sense translates to 'orillas', while the second translates to 'bancos'.

Note: just because a dictionary lists a word sense distinction, it isn't necessarily relevant for machine translation! We don't have to figure out whether banks refers to the buildings of the banks or to the abstract financial institutions; either way, it translates to 'bancos'. For this reason, when we're talking about machine translation, we don't talk about word sense disambiguation but about lexical selection: selecting the best possible translation of a certain PoS analysis of a form. In general, good lexical selection is less important for getting a good translation than good PoS disambiguation.

In Apertium, lexical selection (with lrx-proc) happens after bidix lookup (word-translation), but before structural transfer rules. See Lexical selection for more information.
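The idea of lexical selection can be sketched in a few lines of Python: given a sentence containing the plural noun banks, choose between 'orillas' and 'bancos' by counting cue words in the context. This is a toy under stated assumptions, not the lrx-proc rule format or algorithm; the cue word lists, the default choice and the name select_translation are all invented for the example.

```python
# Target-language candidates for the plural noun "banks" (from the text above).
TRANSLATIONS = {"river": "orillas", "financial": "bancos"}

# Hypothetical cue words; a real system would use rules or trained weights.
CUES = {
    "river": {"reeds", "river", "water"},
    "financial": {"loans", "crisis", "financial", "money"},
}

DEFAULT_SENSE = "financial"  # assumed default when no cue word is present

def select_translation(sentence):
    """Pick a translation for 'banks' by counting context cue words."""
    words = {w.strip(".,$").lower() for w in sentence.split()}
    scores = {sense: len(words & cues) for sense, cues in CUES.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        best = DEFAULT_SENSE
    return TRANSLATIONS[best]

print(select_translation(
    "She put the child in it and placed it among the reeds by the banks"))
print(select_translation(
    "During the financial crisis, banks received $1.2 trillion in loans "
    "from the Government"))
```

Note that the sketch never distinguishes bank buildings from financial institutions, since both would map to 'bancos' anyway; only distinctions that change the translation need to be selected.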

Syntactic ambiguity

Consider the sentence Umberto saw the man with the spyglass. Here, there are two equally feasible interpretations: Umberto saw the man who had the spyglass, or Umberto used a spyglass to see the man. This is a syntactic ambiguity, since the phrase with the spyglass can be syntactically attached to (or dependent on, if that's more in line with your favourite theory of syntax) the phrase the man or the verb saw.

This is just one of many kinds of syntactic ambiguity.
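The two readings of the spyglass sentence can be written down as two different trees over the same words. The Python sketch below encodes each tree as nested tuples and reports which node the phrase with the spyglass hangs from. The phrase labels (S, NP, VP, PP, V) are a simplified convention assumed for this example, and the helper names are hypothetical.

```python
# Each tree is (label, child, child, ...); leaves are plain strings.
NP_ATTACHMENT = (  # the man has the spyglass
    "S", ("NP", "Umberto"),
    ("VP", ("V", "saw"),
     ("NP", ("NP", "the", "man"), ("PP", "with", "the", "spyglass"))))

VP_ATTACHMENT = (  # Umberto uses the spyglass to see
    "S", ("NP", "Umberto"),
    ("VP", ("V", "saw"),
     ("NP", "the", "man"), ("PP", "with", "the", "spyglass")))

def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def parent_of_pp(tree, parent=None):
    """Return the label of the node directly above the PP, if any."""
    if isinstance(tree, str):
        return None
    if tree[0] == "PP":
        return parent
    for child in tree[1:]:
        found = parent_of_pp(child, tree[0])
        if found:
            return found
    return None

print(leaves(NP_ATTACHMENT) == leaves(VP_ATTACHMENT))
print(parent_of_pp(NP_ATTACHMENT), parent_of_pp(VP_ATTACHMENT))
```

Both trees yield exactly the same word sequence, which is why the sentence alone cannot tell us which structure was intended.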

Pragmatic ambiguity

Pragmatic ambiguity is ambiguity that depends on context.

An example: If someone asks you "Headlights on?", they might be reminding you to turn them off, or on, or they might be asking if they themselves should turn them on. The sentence itself does not provide this background information.

See also

External links