Difference between revisions of "Word-sense disambiguation"

From Apertium

Revision as of 15:34, 11 August 2007

Word sense disambiguation is important in machine translation between less closely related languages. The problem was most famously illustrated by Yehoshua Bar-Hillel, who asks us to consider the following sentences:

Little John was looking for his toy box. Finally he found it. The box was in the pen.

The word pen may have two meanings:

  1. Something you use to write with
  2. An enclosure of some kind, such as a playpen

To a human the meaning is obvious, since a toy box cannot fit inside a writing instrument, but Bar-Hillel claimed that without a "universal encyclopaedia" a machine would never be able to deal with this problem. Choosing the correct sense of an ambiguous word is called word sense disambiguation, and it is a large and active area of research.

Lextor

Main article: Lextor

Lextor is the current word sense disambiguation module for Apertium. It is statistical, and training it requires two things: (1) slightly pre-processed dictionaries and (2) corpora. For more information, see the main article.
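To give a rough idea of what "works using statistics" can mean, here is a minimal sketch of a co-occurrence-based sense chooser. It is not Lextor's actual algorithm or data format: the function names (train, disambiguate), the sense labels and the tiny hand-made training set are all assumptions made for illustration. The idea is simply to count which context words were seen with each sense during training, and then pick the sense whose counts best match the context of a new occurrence.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count how often each context word was seen with each sense.

    `examples` is a list of (context_words, sense) pairs. This format is
    hypothetical and is not the format Lextor uses.
    """
    counts = defaultdict(Counter)
    for context, sense in examples:
        for word in context:
            counts[sense][word.lower()] += 1
    return counts


def disambiguate(counts, context):
    """Return the sense whose training contexts overlap most with `context`.

    Ties are broken alphabetically by sense label.
    """
    def score(sense):
        return sum(counts[sense][word.lower()] for word in context)

    return max(sorted(counts), key=score)


# Hand-made toy training data (an assumption, not a real corpus).
EXAMPLES = [
    (["wrote", "letter", "ink"], "pen/writing"),
    (["paper", "ink", "signed"], "pen/writing"),
    (["child", "played", "toys"], "pen/enclosure"),
    (["baby", "box", "toys"], "pen/enclosure"),
]

counts = train(EXAMPLES)
print(disambiguate(counts, ["John", "found", "toy", "box"]))  # pen/enclosure
```

With so little data the choice rests on a single shared word ("box"); a real module would be trained on large corpora, as described above.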

Further reading