Naïve Bayes classifier for lexical selection

Lexical selection is the task of choosing the best translation of a polysemous word in a sentence. One way of doing this is to use the other words in the sentence as context for choosing the appropriate translation. The relationship between a word's context and its candidate translations can be modelled with a naïve Bayes classifier, which treats the context words as conditionally independent of one another given the translation. Lexical selection is closely related to word sense disambiguation, and we will adopt the terminology of that field when discussing it.
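A naïve Bayes lexical selector picks the translation t that maximises P(t) multiplied by the product of P(c_i | t) over the context words c_i. The sketch below illustrates this, assuming that each training example is a bag of context words around a single ambiguous source word paired with its correct translation, and using add-one (Laplace) smoothing. The class `NaiveBayesLexicalSelector` and the toy Catalan examples for *estat* are invented for illustration. They are not taken from any particular system.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesLexicalSelector:
    """Pick a translation for one ambiguous source word from its context words.

    Hypothetical helper for illustration: scores each candidate translation t as
    log P(t) + sum_i log P(c_i | t), with add-alpha (Laplace) smoothing.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # smoothing constant
        self.translation_counts = Counter()         # count(t)
        self.context_counts = defaultdict(Counter)  # count(c, t)
        self.vocabulary = set()                     # all context words seen in training

    def train(self, examples):
        """examples: iterable of (context_words, translation) pairs."""
        for context, translation in examples:
            self.translation_counts[translation] += 1
            for word in context:
                self.context_counts[translation][word] += 1
                self.vocabulary.add(word)

    def log_score(self, context, translation):
        """Unnormalised log posterior of `translation` given `context`."""
        total = sum(self.translation_counts.values())
        score = math.log(self.translation_counts[translation] / total)  # log P(t)
        counts = self.context_counts[translation]
        denom = sum(counts.values()) + self.alpha * len(self.vocabulary)
        for word in context:
            if word not in self.vocabulary:
                continue  # ignore words never seen in training
            score += math.log((counts[word] + self.alpha) / denom)  # log P(c|t)
        return score

    def select(self, context):
        """Return the candidate translation with the highest score."""
        return max(self.translation_counts,
                   key=lambda t: self.log_score(context, t))


# Invented toy training data for the Catalan word "estat":
# "state" (the political entity) versus "been" (past participle).
TOY_EXAMPLES = [
    (["govern", "membre", "unió"], "state"),
    (["govern", "llei", "país"], "state"),
    (["he", "a", "barcelona"], "been"),
    (["has", "malalt", "ahir"], "been"),
]

if __name__ == "__main__":
    selector = NaiveBayesLexicalSelector()
    selector.train(TOY_EXAMPLES)
    print(selector.select(["govern", "país"]))  # state
    print(selector.select(["he", "malalt"]))    # been
```

The scores are summed as log probabilities instead of multiplying raw probabilities, which avoids numerical underflow when the context is long. A practical system would estimate the counts from a large corpus, not from a handful of hand-written examples.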

An extract from a paper dictionary showing some Catalan words and their English translations. Note the proper name Estats Units, the multiword L'estament mèdic, the polysemous words estat, estar and estalviar, and estanc, a word that is ambiguous in its lexical category.

