Difference between revisions of "Word-sense disambiguation"

From Apertium
(lexsel is subcat of development)
Line 23:

  * Ide, N. and Véronis, J. (1998) "[http://www.up.univ-mrs.fr/~veronis/pdf/1998wsd.pdf Word Sense Disambiguation: The State of the Art]". ''Computational Linguistics'' 24(1)
- * Agirre, E. and Edmonds, P., editors (2007). "Word Sense Disambiguation: Algorithms and Applications". Volume 33 of ''Text, Speech and Language Technology''
+ * Agirre, E. and Edmonds, P., editors (2007). "[http://ftsp.mercubuana.ac.id/30/J07-2005.pdf Word Sense Disambiguation: Algorithms and Applications]". Volume 33 of ''Text, Speech and Language Technology''
  * Navigli, R. (2009) [http://www.dsi.uniroma1.it/~navigli/pubs/ACM_Survey_2009_Navigli.pdf Word Sense Disambiguation: A Survey]. ''ACM Comput. Surv.'' 41, 2, Article 10

Revision as of 13:26, 22 October 2010

Word sense disambiguation is important in machine translation between less closely related languages. The problem was elucidated most famously by Yehoshua Bar-Hillel, who asked us to consider the following text:

Little John was looking for his toy box. Finally he found it. The box was in the pen.

The word pen may have two meanings:

  1. Something you write with
  2. An enclosure of some kind, such as a child's playpen

To a human, the meaning is obvious, since a toy box cannot fit inside a writing instrument, but Bar-Hillel claimed that without a "universal encyclopaedia" a machine would never be able to deal with this problem. Choosing the right sense of an ambiguous word is called word sense disambiguation, and it is a large research area.

Lextor

Main article: Lextor

Lextor is the current word sense disambiguation module for Apertium. It is statistical and requires (1) slightly pre-processed dictionaries and (2) corpora to train the module. The module is turned off in most cases, as it does not provide an improvement over the baseline.
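
To give a rough idea of what "using statistics" and "corpora to train the module" can mean, here is a minimal Python sketch of a co-occurrence-based disambiguator. It is not Lextor's actual algorithm or data format. The sense labels, the toy training data and the function names `train` and `disambiguate` are all invented for illustration, and the fallback to the most frequent sense is only one possible baseline.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Count how often each context word co-occurs with each sense.

    tagged_sentences: iterable of (context_words, sense) pairs.
    This input format is an assumption made for the sketch.
    """
    sense_counts = Counter()
    cooccurrence = defaultdict(Counter)
    for context_words, sense in tagged_sentences:
        sense_counts[sense] += 1
        for word in context_words:
            cooccurrence[sense][word.lower()] += 1
    return sense_counts, cooccurrence


def disambiguate(context_words, sense_counts, cooccurrence):
    """Pick the sense whose training contexts overlap most with this context.

    When the overlap is tied (including no overlap at all), the more
    frequent sense in the training data wins.
    """
    def score(sense):
        overlap = sum(cooccurrence[sense][w.lower()] for w in context_words)
        return (overlap, sense_counts[sense])

    return max(sense_counts, key=score)


# Toy, hand-made training data for the ambiguous word "pen" (hypothetical).
TRAINING = [
    (["wrote", "letter", "with", "ink"], "pen/writing"),
    (["signed", "paper", "with", "ink"], "pen/writing"),
    (["ink", "ran", "out"], "pen/writing"),
    (["child", "played", "toy", "inside"], "pen/enclosure"),
    (["sheep", "kept", "inside"], "pen/enclosure"),
]

sense_counts, cooccurrence = train(TRAINING)

# Context taken from the Bar-Hillel example: "The box was in the pen."
print(disambiguate(["the", "toy", "box", "was", "in"], sense_counts, cooccurrence))

# No informative context: falls back to the most frequent sense.
print(disambiguate(["nothing", "relevant"], sense_counts, cooccurrence))
```

With this toy data the first call selects the enclosure sense because "toy" was seen with it in training, while the second call has no evidence and returns the most frequent sense. A module of this kind only helps if its choices are right more often than that fallback.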

See also

Further reading