From Apertium


Feel free to edit/comment/spam/anything here

Some formalizing

IMHO the lexical selection (LS) problem can be reduced to a classification problem: given a word w and its context, choose one of the possible translations for w.

the context could be a text frame, a bag of words, a tf-idf-weighted array, etc.

the possible translations for w could maybe be obtained from WordNet, or from another dictionary?

We already have a set of attributes (srl and slr) to mark ambiguous words; it would be best to use those. en-ca and en-es have examples -- Jimregan 13:22, 21 June 2009 (UTC)

Awesome :D I'll give it a read in the next days :) Thanks a lot! -- Deadbeef 23:53, 30 June 2009 (UTC)

the classification problem can be solved in various ways: support vector machines, naive Bayes classifiers, decision trees, etc.
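
To make the classification framing above a bit more concrete, here is a minimal sketch in plain Python: a bag-of-words context window feeding a naive Bayes classifier with add-one smoothing. Everything in it is an assumption for illustration only: the names context_window and NaiveBayesSelector are made up, the tiny English-to-Spanish training set for "bank" is invented, and none of this is Apertium code or the API of any of the tools mentioned on this page.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, index, size=2):
    """Return the words around tokens[index] (the ambiguous word itself is excluded)."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


class NaiveBayesSelector:
    """Hypothetical helper: picks a translation label from a bag-of-words context."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # additive (Laplace) smoothing constant
        self.label_counts = Counter()           # how often each translation was seen
        self.word_counts = defaultdict(Counter) # per-translation context word counts
        self.vocab = set()

    def train(self, examples):
        for context, translation in examples:
            self.label_counts[translation] += 1
            for word in context:
                self.word_counts[translation][word] += 1
                self.vocab.add(word)

    def predict(self, context):
        total = sum(self.label_counts.values())
        best, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in context:
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Invented toy data: (sentence tokens, position of "bank", chosen translation)
TRAIN = [
    ("i deposited money at the bank yesterday".split(), 5, "banco"),
    ("the bank approved my loan".split(), 1, "banco"),
    ("we sat on the bank of the river".split(), 4, "orilla"),
    ("the river bank was muddy".split(), 2, "orilla"),
]

selector = NaiveBayesSelector()
selector.train([(context_window(toks, i, size=3), label) for toks, i, label in TRAIN])

test_tokens = "he fished from the bank of the river".split()
prediction = selector.predict(context_window(test_tokens, 4, size=3))
print(prediction)
```

The same skeleton should work with a tf-idf-weighted context or with another classifier (SVM, decision tree) swapped in for predict; naive Bayes is used here only because it fits in a few lines.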

It seems that the WSD problem can be handled with an Inductive Logic Programming-oriented approach, as this paper states:

I'm currently trying to introduce probabilistic reasoning into Aleph[1] (the Inductive Logic Programming framework cited in the paper) for a university project, and maybe it would be interesting to see how it could handle lexical selection.

Data Mining/Machine Learning tools supporting the classification task

I've tried many tools while taking AI and DM-related classes, like Weka[2] (which I've integrated into a Multi-Agent System to support agents in making decisions) or RapidMiner[3], but I think the most appropriate tool to use in this case could be Orange[4]. I'm now doing some experiments using its APIs from C++ and Python.

Some Bookmarks (please feel free to add more)

Using UMLS Concept Unique Identifiers (CUIs) for Word Sense Disambiguation in the Biomedical Domain:

Word Sense Disambiguation - Algorithms and Applications:

Word Sense Disambiguation: The State of the Art:

Word Sense Disambiguation (slides from the "Linguaggi e Traduttori" class):

Perl scripts doing WSD and mapping on UMLS ontologies:

Nice ACM survey on WSD:

Verb Semantics and Lexical Selection:

Parameter reduction in unsupervisedly trained sliding-window part-of-speech taggers: