Word-sense disambiguation

From Apertium
Revision as of 12:52, 11 August 2007 by Francis Tyers

Word sense disambiguation is important in machine translation between less closely related languages. The problem was most famously illustrated by Yehoshua Bar-Hillel, who asked us to consider the following sentences:

Little John was looking for his toy box. Finally he found it. The box was in the pen.

The word "pen" may have two meanings: the first is "something you use to write with", the second is "a container of some kind". To a human reader the intended meaning is obvious, but Bar-Hillel claimed that without a "universal encyclopedia" a machine would never be able to resolve the ambiguity.

Working out which sense is intended when a word is ambiguous is called word sense disambiguation, and it is an active research area.

Lextor

Main article: Lextor

Lextor is the current word sense disambiguation module for Apertium. It works statistically and requires (1) slightly pre-processed dictionaries and (2) corpora to train the module. For more information, see the main article.
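To give a rough idea of what "using statistics" can mean for this task, here is a minimal toy sketch of disambiguation by co-occurrence counts. It is not Lextor's actual algorithm or data format: the function names, the sense labels and the tiny training set are all hypothetical and chosen only to illustrate the Bar-Hillel example above.

```python
from collections import Counter, defaultdict


def train(tagged_contexts):
    """Count how often each context word appears with each sense.

    tagged_contexts: iterable of (context_words, sense) pairs.
    """
    counts = defaultdict(Counter)
    for context_words, sense in tagged_contexts:
        counts[sense].update(w.lower() for w in context_words)
    return counts


def disambiguate(counts, context_words):
    """Return the sense whose training contexts best match the given context."""
    def score(sense):
        # Sum the co-occurrence counts of every context word for this sense.
        return sum(counts[sense][w.lower()] for w in context_words)

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(counts), key=score)


# Hypothetical toy training data for the ambiguous word "pen".
TRAINING = [
    (["wrote", "letter", "ink"], "pen/writing"),
    (["ink", "paper", "signed"], "pen/writing"),
    (["child", "toy", "played"], "pen/enclosure"),
    (["box", "toy", "baby"], "pen/enclosure"),
]

model = train(TRAINING)
sense = disambiguate(model, ["little", "john", "toy", "box"])
print(sense)
```

A real module would train on large corpora rather than a handful of hand-written contexts, and would need smoothing and a fallback for contexts it has never seen.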

Further reading