Difference between revisions of "User:Francis Tyers/Sandbox"

Revision as of 11:38, 7 October 2009

Lexical selection

The bilingual dictionary has several translations for each ambiguous word.
Rules are created to select between them based on context.
For each ambiguous word in the bilingual dictionary, collocations (n-grams) containing that word are extracted from a source language corpus. For example, for "hús":
- reisa þetta hús og fullgjöra
- reisa þetta hús og fullgjöra
- niður þetta hús Guðs í
- gjört fyrir hús Guðs himnanna
- inn í hús Semaja Delajasonar
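The extraction step could be sketched roughly as below. This is only an illustration: the function name `extract_collocations`, the whitespace tokenisation, and the window of two tokens on each side (chosen to match the five-word examples above) are all assumptions, not part of the notes.

```python
def extract_collocations(tokens, target, left=2, right=2):
    """Return every window of left + 1 + right tokens centred on `target`.

    Hypothetical helper: the window size is an assumption based on the
    five-word examples in the notes.
    """
    windows = []
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        # Skip occurrences too close to an edge to give a full window.
        if i - left < 0 or i + right >= len(tokens):
            continue
        windows.append(tokens[i - left : i + right + 1])
    return windows


# Toy corpus built from two of the examples above; a real corpus would
# contain full sentences.
corpus = [
    "reisa þetta hús og fullgjöra".split(),
    "niður þetta hús Guðs í".split(),
]
collocations = [w for sent in corpus for w in extract_collocations(sent, "hús")]
```

Identical windows are kept rather than deduplicated, so a collocation that occurs twice in the corpus is listed twice.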
For each ambiguous word, every collocation is run through the translator once per entry for that word in the bilingual dictionary, giving one candidate translation per entry.
Each candidate translation is then scored against a target language corpus.
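A minimal sketch of this translate-and-score loop follows. The notes do not say how the translator is invoked or how scoring works, so both are passed in as functions. `toy_translate`, `toy_score`, and the label `sense2` are hypothetical stand-ins used only to make the example run.

```python
from typing import Callable, Dict, Sequence


def score_candidates(
    collocation: Sequence[str],
    target: str,
    candidates: Sequence[str],
    translate: Callable[[Sequence[str], str, str], str],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Translate `collocation` once per candidate sense and score each output."""
    return {
        sense: score(translate(collocation, target, sense))
        for sense in candidates
    }


def toy_translate(collocation, target, sense):
    # Stand-in for the real translator: only substitutes the ambiguous word.
    return " ".join(sense if tok == target else tok for tok in collocation)


def toy_score(text):
    # Stand-in for a target language scorer: pretends "sense1 Guðs" is attested.
    return 1.0 if "sense1 Guðs" in text else 0.0


scores = score_candidates(
    ["hús", "Guðs"], "hús", ["sense1", "sense2"], toy_translate, toy_score
)
```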
Where the difference in score between one translation and another reaches a threshold, a rule is created in the form of:
- MAP (sense1) ("hús") IF (1 ("Guðs"));
Syntactic tags could also be used as context:
- MAP (sense1) ("hús") IF (1 @SUBJ);
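Rule creation from the scores could look something like the sketch below. The function `make_rule`, the choice to compare the best score with the runner-up, and the numeric values are assumptions for illustration. Only the shape of the emitted rule comes from the examples above.

```python
def make_rule(lemma, context_word, scores, threshold, position=1):
    """Return a MAP rule string if the best sense wins by at least `threshold`.

    `scores` maps each candidate sense to its score in the context where
    `context_word` appears at relative `position`. Returns None when no
    rule should be created.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return None  # nothing to choose between
    (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    if best_score - runner_up_score < threshold:
        return None  # difference too small to justify a rule
    return f'MAP ({best}) ("{lemma}") IF ({position} ("{context_word}"));'


# Hypothetical scores; "sense2" is an invented label for a second entry.
rule = make_rule("hús", "Guðs", {"sense1": 1.0, "sense2": 0.0}, threshold=0.5)
print(rule)
```

Returning a plain string keeps the output human readable, so the generated rules can be reviewed or edited by hand before being added to a grammar.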

- Fairly straightforward -- the rules can be created automatically in constraint grammar.
- Human readable / editable.
- Doesn't require parallel corpus -- although might work better with one.
- Unsupervised.

@@ Line 32: / Line 32: @@
 * Fairly straightforward -- the rules can be created automatically in constraint grammar.
 * Human readable / editable.
-* Doesn't require parallel corpus.
+* Doesn't require parallel corpus -- although might work better with one.
 * Unsupervised.