Difference between revisions of "Word-sense disambiguation"

 
(13 intermediate revisions by 6 users not shown)
Line 1: Line 1:
'''Word sense disambiguation''' is important in machine translation between less-closely related languages. The problem was elucidated most famously by Yehoshua Bar-Hillel, who asks us to consider the following sentence:
'''Word sense disambiguation''' means choosing between two ''meanings'' of the same word (we assume we already know the part of speech). This can be important in machine translation between less-closely related languages. The problem was elucidated most famously by Yehoshua Bar-Hillel, who asks us to consider the following sentence:


:Little John was looking for his toy box. Finally he found it. The box was in the pen.
:Little John was looking for his toy box. Finally he found it. The box was in the pen.


The word pen may have two meanings:
The word pen may have two meanings, the first being, "something you use to write with", the second being, "a container of some kind". To a human, the meaning is obvious, but Bar-Hillel claimed that without a "universal encyclopedia" a machine would never be able to deal with this problem.


#Something you use to write with
Figuring out which sense to use when a word is ambiguous is called ''word sense disambiguation'', and is a big research area.
#A container of some kind


To a human, the meaning is obvious, but Bar-Hillel claimed that without a "universal encyclopaedia" a machine would never be able to deal with this problem. Figuring out which sense to use when a word is ambiguous is called ''word sense disambiguation'', and is a big research area.
==Lextor==

{{main|Lextor}}
However, importantly, many of the possible meanings and nuances identified by lexicographers ''do not affect machine translation''. E.g. the English term ''hospital'' can refer to both an organisation and a concrete building, but regardless of which meaning is used in a sentence, in Norwegian it still becomes ''sjukehus''. Thus we use the term '''lexical selection''' when we speak of those word senses that matter to MT. More on this in the article [[Lexical selection]].


Lextor is the current word sense disambiguation module for Apertium, it works using statistics and requires 1) slightly pre-processed dictionaries and 2) corpora to train the module. For more information see the [[Lextor|main page]].


==Further reading==
==Further reading==


* Ide, N. and Véronis, J. (1998) "[http://www.up.univ-mrs.fr/~veronis/pdf/1998wsd.pdf Word Sense Disambiguation: The State of the Art]". ''Computational Linguistics'' 24(1)
* Ide, N. and Véronis, J. (1998) "[http://www.up.univ-mrs.fr/~veronis/pdf/1998wsd.pdf Word Sense Disambiguation: The State of the Art]". ''Computational Linguistics'' 24(1)
* Agirre, E. and Edmonds, P., editors (2007). "[http://ftsp.mercubuana.ac.id/30/J07-2005.pdf Word Sense Disambiguation: Algorithms and Applications]". Volume 33 of ''Text, Speech and Language Technology''
* Navigli, R. (2009) [http://www.dsi.uniroma1.it/~navigli/pubs/ACM_Survey_2009_Navigli.pdf Word Sense Disambiguation: A Survey]. ''ACM Comput. Surv.'' 41, 2, Article 10



[[Category:Development]]
[[Category:Lexical selection]]
[[Category:Documentation in English]]

Latest revision as of 10:38, 7 September 2012

Word sense disambiguation means choosing between different meanings of the same word (we assume the part of speech is already known). This can be important in machine translation between less closely related languages. The problem was most famously illustrated by Yehoshua Bar-Hillel, who asked us to consider the following passage:

Little John was looking for his toy box. Finally he found it. The box was in the pen.

The word pen may have two meanings:

  1. Something you use to write with
  2. A container of some kind

To a human, the intended meaning is obvious, but Bar-Hillel claimed that without a "universal encyclopaedia" a machine would never be able to deal with this problem. Working out which sense applies when a word is ambiguous is called word sense disambiguation, and it is a large research area.

Importantly, however, many of the meanings and nuances identified by lexicographers do not affect machine translation. For example, the English term hospital can refer either to an organisation or to a physical building, but whichever meaning is intended, in Norwegian it still becomes sjukehus. We therefore use the term lexical selection for those sense distinctions that matter to MT. More on this in the article Lexical selection.
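
The distinction between sense distinctions in general and those that matter to MT can be sketched in a few lines of Python. This is only an illustration, not how Apertium implements lexical selection: the sense inventory, the sense labels, and the function name are hypothetical, and the Norwegian translations given for pen are assumed for the example rather than taken from this article.

```python
# Hypothetical sense inventory: source word -> {sense label: target translation}.
# The "hospital" entry follows the example in the text; the "pen" translations
# are illustrative assumptions.
SENSE_TRANSLATIONS = {
    "hospital": {"organisation": "sjukehus", "building": "sjukehus"},
    "pen": {"writing instrument": "penn", "enclosure": "innhegning"},
}


def needs_lexical_selection(word, inventory=SENSE_TRANSLATIONS):
    """Return True if the word's senses map to more than one distinct translation."""
    translations = set(inventory[word].values())
    return len(translations) > 1


for word in SENSE_TRANSLATIONS:
    print(word, needs_lexical_selection(word))
```

Under these assumptions, hospital needs no disambiguation for translation purposes because both senses share one translation, whereas pen does.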


Further reading