Difference between revisions of "User:Francis Tyers/Sandbox"
Revision as of 11:37, 7 October 2009
Lexical selection
Information
- Surface form -- tud etc.
- Lemma -- den etc.
- Category -- n.f etc.
- Syntax -- @SUBJ etc.
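As a rough illustration, the four kinds of information could be held together in one record per token. This is only a sketch: the `Token` class and its `to_cg` method are hypothetical names, and it assumes that the four examples above (tud, den, n.f, @SUBJ) describe the same token, which the notes do not state. The rendering follows the usual constraint grammar stream layout of a quoted word form followed by an indented reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One analysed token (hypothetical container, not part of any tool)."""
    surface: str   # surface form, e.g. "tud"
    lemma: str     # lemma, e.g. "den"
    category: str  # dot-separated category tags, e.g. "n.f"
    syntax: str    # syntactic function label, e.g. "@SUBJ"

    def to_cg(self) -> str:
        """Render the token as a single-reading cohort in CG stream layout."""
        tags = " ".join(self.category.split("."))
        return f'"<{self.surface}>"\n\t"{self.lemma}" {tags} {self.syntax}'


# Assumption: the four examples from the list belong to one token.
token = Token(surface="tud", lemma="den", category="n.f", syntax="@SUBJ")
print(token.to_cg())
```

Any of the four fields can then serve as the target or the context of a selection rule.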
Ideas
Inferring rules from collocations
- The bilingual dictionary has several translations for each ambiguous word.
- Rules are created to select between them based on context.
- For each ambiguous word in the bilingual dictionary, collocations (n-grams) are extracted from a source language corpus, for example for "hús":
  - reisa þetta hús og fullgjöra
  - niður þetta hús Guðs í
  - gjört fyrir hús Guðs himnanna
  - inn í hús Semaja Delajasonar
- For each ambiguous word, the collocations are run through the translator once per candidate entry in the bilingual dictionary, giving one translation per candidate.
- Each translation is scored against a target language corpus.
- Where the difference in score between one translation and another reaches a threshold, a rule is created in the form of:
MAP (sense1) ("hús") IF (1 ("Guðs"));
- Syntactic function labels could also be used as the context:
MAP (sense1) ("hús") IF (1 @SUBJ);
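The steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual implementation: `toy_translate`, `toy_score`, the bigram counts in `TOY_BIGRAMS`, and the labels `sense1` / `sense2` are all invented stand-ins, since the notes do not say which translator or scoring method would be used. Only the word immediately to the right is used as context, matching the `IF (1 ...)` form of the rule.

```python
from collections import Counter, defaultdict


def extract_collocations(sentences, word, window=2):
    """Return (tokens, index) pairs: an n-gram around each occurrence of `word`."""
    found = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token == word:
                start = max(0, i - window)
                found.append((tokens[start:i + window + 1], i - start))
    return found


def infer_rules(word, senses, collocations, translate, score, threshold):
    """Emit a MAP rule when one sense outscores the runner-up by `threshold`."""
    totals = defaultdict(lambda: defaultdict(float))
    for tokens, i in collocations:
        if i + 1 >= len(tokens):
            continue  # no following word to condition on
        following = tokens[i + 1]
        for sense in senses:
            totals[following][sense] += score(translate(tokens, i, sense))
    rules = []
    for following, by_sense in totals.items():
        ranked = sorted(by_sense.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) > 1 and ranked[0][1] - ranked[1][1] >= threshold:
            rules.append(f'MAP ({ranked[0][0]}) ("{word}") IF (1 ("{following}"));')
    return rules


# Hypothetical stand-in for the translator: substitute the sense label only.
def toy_translate(tokens, i, sense):
    return tokens[:i] + [sense] + tokens[i + 1:]


# Hypothetical stand-in for target-corpus statistics (invented counts).
TOY_BIGRAMS = Counter({("sense1", "Guðs"): 3, ("sense2", "Guðs"): 1})


def toy_score(tokens):
    """Sum the toy corpus counts of every adjacent token pair."""
    return sum(TOY_BIGRAMS[pair] for pair in zip(tokens, tokens[1:]))


corpus = [
    "reisa þetta hús og fullgjöra",
    "niður þetta hús Guðs í",
    "gjört fyrir hús Guðs himnanna",
    "inn í hús Semaja Delajasonar",
]
collocations = extract_collocations(corpus, "hús")
rules = infer_rules("hús", ["sense1", "sense2"], collocations,
                    toy_translate, toy_score, threshold=3)
print(rules)
```

With these invented counts, only the context "Guðs" separates the two senses by the threshold, so a single rule is produced. Conditioning on a syntactic label instead of a word would need tagged input rather than plain strings.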
- Advantages
  - Fairly straightforward -- the rules can be created automatically as constraint grammar rules.
  - Human readable and editable.
  - Doesn't require a parallel corpus.
  - Unsupervised.
- Disadvantages
  - Applying a large number of rules will be slow.
  - Might not work very well.
- Relevant prior work
  - Jin Yang (1999) "Towards the Automatic Acquisition of Lexical Selection Rules"