Difference between revisions of "User:Francis Tyers/Sandbox"
Revision as of 11:36, 7 October 2009
Lexical selection
Information
- Surface form -- tud etc.
- Lemma -- den etc.
- Category -- n.f etc.
- Syntax -- @SUBJ etc.
Ideas
Inferring rules from collocations
- The bilingual dictionary has several translations for each ambiguous word.
- Rules are created to select between them based on context.
- For each word in the bilingual dictionary, collocations (n-grams) are extracted from a source language corpus.
- reisa þetta hús og fullgjöra
- reisa þetta hús og fullgjöra
- niður þetta hús Guðs í
- gjört fyrir hús Guðs himnanna
- inn í hús Semaja Delajasonar
- For each ambiguous word, its collocations are run through the translator once for each of its entries in the bilingual dictionary.
- The resulting translations are scored against a target language corpus.
- Where the difference in score between one translation and another reaches a threshold, a rule is created in the form of:
MAP (sense1) ("hús") IF (1 ("Guðs"));
- Syntax could also be included.
MAP (sense1) ("hús") IF (1 @SUBJ);
Advantages
- Fairly straightforward: the rules can be generated automatically as constraint grammar rules.
- Human-readable and editable.
- Does not require a parallel corpus.
Disadvantages
- A large number of rules will slow processing down.
- Might not work very well.
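A minimal Python sketch of the procedure described under "Inferring rules from collocations" might look like the following. It is an illustration under stated assumptions, not an implementation of any existing tool: the function names, the window size, the sense-to-translation table and the threshold are all hypothetical, and `score` stands in for "translate the n-gram with this candidate, then score the output on a target language corpus", which the notes do not specify further.

```python
from collections import defaultdict


def extract_collocations(tokens, word, window=2):
    """Return (ngram, position) pairs for every occurrence of `word`.

    `ngram` is a tuple of up to `window` tokens on each side of the word,
    and `position` is the index of the word inside that tuple.
    """
    found = []
    for i, tok in enumerate(tokens):
        if tok == word:
            start = max(0, i - window)
            found.append((tuple(tokens[start:i + window + 1]), i - start))
    return found


def infer_rules(word, senses, collocations, score, threshold):
    """Emit one rule per following-word context where one sense clearly wins.

    senses:    dict mapping a sense tag to a candidate translation (assumed).
    score:     callable (ngram, translation) -> float; a hypothetical stand-in
               for translating the n-gram and scoring it on a target corpus.
    threshold: minimum gap between the best and second-best total score.
    """
    # totals[next_word][sense] accumulates scores over all matching n-grams.
    totals = defaultdict(lambda: defaultdict(float))
    for ngram, pos in collocations:
        if pos + 1 >= len(ngram):
            continue  # the word is n-gram final, so there is no context at position 1
        next_word = ngram[pos + 1]
        for sense, translation in senses.items():
            totals[next_word][sense] += score(ngram, translation)

    rules = []
    for next_word, by_sense in totals.items():
        ranked = sorted(by_sense.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue  # nothing to choose between
        (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
        if best_score - runner_up_score >= threshold:
            rules.append(f'MAP ({best}) ("{word}") IF (1 ("{next_word}"));')
    return rules


# Toy data for illustration only: the translations and scores are made up.
corpus = ("niður þetta hús Guðs í gjört fyrir hús Guðs himnanna "
          "inn í hús Semaja Delajasonar").split()
senses = {"sense1": "temple", "sense2": "house"}


def toy_score(ngram, translation):
    # Pretend the target corpus prefers "temple" when "Guðs" is in the n-gram.
    return 2.0 if "Guðs" in ngram and translation == "temple" else 1.0


rules = infer_rules("hús", senses, extract_collocations(corpus, "hús"),
                    toy_score, threshold=1.5)
print("\n".join(rules))
```

With this toy data only the "Guðs" context clears the threshold, so a single rule is produced. The sketch only looks at the word at position 1; conditioning on syntax tags such as @SUBJ would need analysed input rather than raw tokens.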