Difference between revisions of "User:Francis Tyers/Sandbox"
Revision as of 15:12, 7 October 2009
Lexical selection
Information
- Surface form -- tud etc.
- Lemma -- den etc.
- Category -- n.f etc.
- Syntax -- @SUBJ etc.
Ideas
For some things linguistic knowledge is better, or easier, and it is also easier to hack on. For other things statistics are better: they give wider coverage at lower cost. The lexical selection module(s) should allow both the use of rules and of statistics. Rules for things we "know", statistics for those we don't.
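One way to picture this split is a selector that consults hand-written rules first and falls back to target-language frequencies when no rule applies. The sketch below is illustrative only: `select_translation`, the rule format, the translations "house" and "temple", and the counts are all made-up assumptions, not part of any Apertium module.

```python
from typing import Callable, Dict, List, Optional

# A rule inspects the source lemma and its context words and returns
# either a translation or None (meaning "no opinion").
Rule = Callable[[str, List[str]], Optional[str]]


def select_translation(
    lemma: str,
    context: List[str],
    candidates: List[str],
    rules: List[Rule],
    target_counts: Dict[str, int],
) -> str:
    """Pick a translation: rules first, target-language frequency second."""
    for rule in rules:
        choice = rule(lemma, context)
        if choice is not None and choice in candidates:
            return choice
    # No rule fired: fall back to the most frequent candidate
    # (ties go to the earliest candidate in the list).
    return max(candidates, key=lambda c: target_counts.get(c, 0))


def hus_rule(lemma: str, context: List[str]) -> Optional[str]:
    # Hypothetical hand-written rule: "hús" next to "Guðs" -> "temple".
    if lemma == "hús" and "Guðs" in context:
        return "temple"
    return None


counts = {"house": 120, "temple": 7}  # made-up corpus counts
with_rule = select_translation(
    "hús", ["Guðs", "himnanna"], ["house", "temple"], [hus_rule], counts
)
without_rule = select_translation(
    "hús", ["þetta", "og"], ["house", "temple"], [hus_rule], counts
)
print(with_rule, without_rule)  # temple house
```

Putting the rules first means a known case is never overridden by a frequency that happens to point the other way. The statistics only decide when the rules have nothing to say.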
Inferring rules from collocations
Rules as described below are already used in apertium-cy-en, apertium-br-fr and apertium-sme-smj. This stage would be the first pass of lexical selection.
- The bilingual dictionary has several translations for each ambiguous word.
- Rules are created to select between them based on context.
- For each word in the bilingual dictionary, collocations (n-grams) are extracted from a source language corpus.
- reisa þetta hús og fullgjöra
- niður þetta hús Guðs í
- gjört fyrir hús Guðs himnanna
- inn í hús Semaja Delajasonar
- For each ambiguous word, these collocations are run through the translator once for each of its entries in the bilingual dictionary.
- Translations are scored on a target language corpus.
- Where the difference in score between one translation and another reaches a threshold, a rule is created in the form of:
MAP (sense1) ("hús") IF (1 ("Guðs"));
- Syntax could also be included.
MAP (sense1) ("hús") IF (1 @SUBJ);
- It would be interesting to see if rules can be learnt which use different discriminators (e.g. surface form, lemma, category or syntax).
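The steps above can be sketched roughly as follows. This is a toy outline, not an existing Apertium tool: `extract_collocations`, `infer_rules`, the `translate` and `score` callables and the threshold value are all hypothetical stand-ins for the real translator and a target-language model. For simplicity it only conditions on the word immediately to the right (position 1), as in the first example rule.

```python
from typing import Callable, List, Tuple

Collocation = Tuple[List[str], int]  # (tokens, index of the ambiguous word)


def extract_collocations(corpus: List[str], word: str, window: int = 2) -> List[Collocation]:
    """Return n-grams around `word`, with up to `window` tokens on each side."""
    found = []
    for line in corpus:
        tokens = line.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                start = max(0, i - window)
                found.append((tokens[start:i + window + 1], i - start))
    return found


def infer_rules(
    word: str,
    senses: List[str],  # needs at least two entries
    collocations: List[Collocation],
    translate: Callable[[List[str], str], str],
    score: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Emit a MAP rule wherever one sense beats the runner-up by `threshold`."""
    rules: List[str] = []
    for ngram, pos in collocations:
        if pos + 1 >= len(ngram):
            continue  # no right-hand neighbour to condition on
        scored = sorted(((score(translate(ngram, s)), s) for s in senses), reverse=True)
        (best, best_sense), (second, _) = scored[0], scored[1]
        if best - second >= threshold:
            rule = f'MAP ({best_sense}) ("{word}") IF (1 ("{ngram[pos + 1]}"));'
            if rule not in rules:
                rules.append(rule)
    return rules


# Toy stand-ins: a real setup would call the translator and a language model.
def toy_translate(ngram: List[str], sense: str) -> str:
    return " ".join(sense if tok == "hús" else tok for tok in ngram)


def toy_score(text: str) -> float:
    return 5.0 if "sense1 Guðs" in text else 1.0


corpus = ["gjört fyrir hús Guðs himnanna", "reisa þetta hús og fullgjöra"]
colls = extract_collocations(corpus, "hús")
rules = infer_rules("hús", ["sense1", "sense2"], colls, toy_translate, toy_score, 2.0)
print(rules)
```

With these toy scores only the first collocation separates the two senses, so a single rule comes out. The second collocation scores both senses equally and produces nothing.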
- Advantages
- Fairly straightforward -- the rules can be created automatically in constraint grammar.
- Human readable / editable.
- Doesn't require a parallel corpus, although it might work better with one.
- Unsupervised.
- Disadvantages
- A large number of rules will be slow to apply.
- Might not work very well.
- Relevant prior work
- Jin Yang (1999) "Towards the Automatic Acquisition of Lexical Selection Rules"
- Eckhard Bick (2005) "Dan2eng: Wide-Coverage Danish-English Machine Translation"
- Examples
- o huñvreal muioc'h eget o pediñ .
- Koulskoude e tiviz Francis pediñ e zaou vreur d'ober
- O fal a zo pediñ arzourien a bep seurt evel kizellerien
- bleunioù ha peadra da yac'haat o zreid hag o pediñ evito
- ha tu a oa bet d'al labourerien pediñ o familhoù hag o mignoned
- Raktresoù all a zo ivez : pediñ skrivagnerien a-benn eskemm ganto
- Sharon Stone eo bet an hini gwellañ evit pediñ an embregerien da zisammañ