Difference between revisions of "User:Deadbeef/LexicalSelection"
= Introduction =

Feel free to edit/comment/spam/anything here

Hello world!

= Some formalizing =
IMHO the LS (lexical selection) problem can be reduced to a classification problem:

:<math>\mathrm{classify}(\mathrm{word}\ w,\ \mathrm{context}\ c)\ \in\ \{\ t\ :\ t\ \mathrm{possible\ translation\ for}\ w\ \}.</math>

The context <math>c</math> could be a text frame (e.g. a window of surrounding words), a bag of words, a tf-idf-weighted array, etc.

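The context representations listed above could look roughly like this. It is only a sketch: the helper names (<code>text_frame</code>, <code>bag_of_words</code>, <code>tf_idf</code>), the window size and the two toy sentences are assumptions made up for illustration, and the tf-idf weighting uses one common smoothed variant among several.

```python
import math
from collections import Counter


def text_frame(tokens, index, size=2):
    """Return the words within `size` positions of tokens[index], excluding it."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def bag_of_words(tokens, index):
    """Count every word of the sentence except the target occurrence."""
    return Counter(tokens[:index] + tokens[index + 1:])


def tf_idf(bag, corpus_bags):
    """Weight each count in `bag` by a smoothed inverse document frequency."""
    n = len(corpus_bags)
    weights = {}
    for word, count in bag.items():
        # Number of contexts in the corpus that contain this word.
        df = sum(1 for other in corpus_bags if word in other)
        weights[word] = count * math.log((1 + n) / (1 + df))
    return weights


# Hypothetical toy corpus: two contexts of the ambiguous word "bank".
sentences = [
    "i sat on the bank of the river".split(),
    "the bank approved the loan".split(),
]
target = "bank"

frame = text_frame(sentences[0], sentences[0].index(target))
bags = [bag_of_words(s, s.index(target)) for s in sentences]
weights = tf_idf(bags[0], bags)
```

Words that occur in every context (here "the") get weight zero, so only the distinguishing words ("river", "sat", ...) contribute to the weighted array.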
Could the possible translations for <math>w</math> be obtained from WordNet, or from another dictionary?

{{comment|We already have a set of attributes (<code>srl</code> and <code>slr</code>) to mark ambiguous words; it would be best to use those. en-ca and en-es have examples -- [[User:Jimregan|Jimregan]] 13:22, 21 June 2009 (UTC)}}

{{comment|Awesome :D I'll give it a read in the next days :) Thanks a lot! -- [[User:Deadbeef|Deadbeef]] 23:53, 30 June 2009 (UTC)}}

The classification problem can be solved in various ways: support vector machines, naive Bayes classifiers, decision trees, etc.

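As a rough illustration of one of these options, here is a minimal naive Bayes sketch over bag-of-words contexts. Everything in it is an assumption made for the example: the class name <code>NaiveBayesSelector</code>, the add-one smoothing, and the toy training data (English "bank" translated into Spanish as "banco" or "orilla"). It is not how any existing tool does lexical selection.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSelector:
    """Multinomial naive Bayes over bag-of-words contexts (add-one smoothing)."""

    def __init__(self):
        self.label_counts = Counter()            # translation -> number of examples
        self.word_counts = defaultdict(Counter)  # translation -> context word counts
        self.vocabulary = set()

    def train(self, examples):
        """`examples` is an iterable of (context_words, translation) pairs."""
        for context, translation in examples:
            self.label_counts[translation] += 1
            for word in context:
                self.word_counts[translation][word] += 1
                self.vocabulary.add(word)

    def classify(self, context):
        """Return the translation with the highest log-posterior score."""
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for translation, count in self.label_counts.items():
            counts = self.word_counts[translation]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(count / total)  # log prior
            for word in context:
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best, best_score = translation, score
        return best


# Hypothetical toy training set for the ambiguous English word "bank".
examples = [
    (["money", "loan", "account"], "banco"),
    (["deposit", "money", "interest"], "banco"),
    (["river", "water", "fishing"], "orilla"),
    (["river", "boat", "shore"], "orilla"),
]

selector = NaiveBayesSelector()
selector.train(examples)
choice_river = selector.classify(["river", "fishing", "boat"])
choice_money = selector.classify(["loan", "interest"])
```

Here <code>classify</code> plays the role of the function in the formula above: the labels are the candidate translations of <math>w</math> and the context <math>c</math> is the list of surrounding words.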
It seems that the WSD (word sense disambiguation) problem can be handled with an Inductive Logic Programming-oriented approach, as this paper states: http://www.mt-archive.info/ACL-2007-Specia.pdf

I'm currently trying to introduce probabilistic reasoning into Aleph[http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph/aleph_toc.html] (the Inductive Logic Programming framework cited in the paper) for a university project, and it might be interesting to see how it could handle lexical selection.

= Data Mining/Machine Learning tools supporting the classification task =

I've tried many tools while taking AI and DM-related classes, such as Weka[http://www.cs.waikato.ac.nz/ml/weka/] (which I integrated into a Multi-Agent System to support agents in making decisions) and RapidMiner[http://www.rapidminer.com], but I think the most appropriate tool in this case could be Orange[http://www.ailab.si/Orange/]. I'm now experimenting with its APIs from C++ and Python.

= Some Bookmarks (please feel free to add more) =

Verb Semantics and Lexical Selection: http://www.ldc.upenn.edu/acl/P/P94/P94-1019.pdf

Parameter reduction in unsupervisedly trained sliding-window part-of-speech taggers: http://transducens.dlsi.ua.es/repositori/transducens/pubs/167/ranlp05.pdf

http://www.dlsi.ua.es/~mlf/docum/sanchezvillamil04p.pdf

http://www.dlsi.ua.es/~mlf/docum/sanchezvillamil05p.pdf
Latest revision as of 20:07, 6 July 2009
Using UMLS Concept Unique Identifiers (CUIs) for Word Sense Disambiguation in the Biomedical Domain: http://www.d.umn.edu/~tpederse/Pubs/amia07.pdf
Word Sense Disambiguation - Algorithms and Applications: http://www.wsdbook.org/
Word Sense Disambiguation: The State of the Art: http://sites.univ-provence.fr/~veronis/pdf/1998wsd.pdf
Word Sense Disambiguation (slide from the "Linguaggi e Traduttori" class): http://www.di.uniba.it/~semeraro/LT/WSD.pdf
Perl scripts doing WSD and mapping on UMLS ontologies: http://cuitools.sourceforge.net/
Nice ACM survey on WSD: http://www.dsi.uniroma1.it/~navigli/pubs/ACM_Survey_2009_Navigli.pdf