Machine translation with Constraint Grammar
Latest revision as of 11:35, 26 August 2011
Constraint Grammar is pretty flexible; it even lets you shoot yourself in the foot.
==Input==
The input is a standard CG-format stream with dependency labels (the same can also be done with cg-proc and the Apertium stream format).
<pre>
"<Í>"
	"í" Pr @ADVL→ #1->3
"<upphavi>"
	"upphav" N Neu Sg Dat Indef @P← #2->1
"<skapti>"
	"skapa" V Ind Prt Sg @VMAIN #3->0
"<Gud>"
	"gudur" N Msc Sg Nom Indef @←SUBJ #4->3
"<himmal>"
	"himmal" N Msc Sg Acc Indef @←OBJ #5->3
"<og>"
	"og" CC @CC #6->5
"<jørð>"
	"jørð" N Fem Sg Acc Indef @←OBJ #7->5
"<.>"
	"." CLB #8->0
</pre>
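To make the layout of this stream concrete, here is a minimal Python sketch that reads it into cohorts. It is only an illustration and not part of the CG toolchain: the function name `parse_cg` and the dictionary layout are assumptions made for this example, and it assumes that reading lines are indented and that the dependency tag has the form `#N->M`.

```python
import re

# A word-form line looks like "<Gud>"; a reading line is indented and starts with a quoted lemma.
WORDFORM = re.compile(r'^"<(?P<form>.*)>"\s*$')
READING = re.compile(r'^\s+"(?P<lemma>[^"]*)"\s*(?P<tags>.*)$')
DEP = re.compile(r'^#(\d+)->(\d+)$')


def parse_cg(text):
    """Parse a CG stream into a list of cohorts (hypothetical helper, not a CG API)."""
    cohorts = []
    for line in text.splitlines():
        m = WORDFORM.match(line)
        if m:
            cohorts.append({"form": m.group("form"), "readings": []})
            continue
        m = READING.match(line)
        if m and cohorts:
            tags, dep = [], None
            for tag in m.group("tags").split():
                d = DEP.match(tag)
                if d:
                    # (own position, position of the head); head 0 means root
                    dep = (int(d.group(1)), int(d.group(2)))
                else:
                    tags.append(tag)
            cohorts[-1]["readings"].append(
                {"lemma": m.group("lemma"), "tags": tags, "dep": dep}
            )
    return cohorts


SAMPLE = (
    '"<skapti>"\n'
    '\t"skapa" V Ind Prt Sg @VMAIN #3->0\n'
    '"<Gud>"\n'
    '\t"gudur" N Msc Sg Nom Indef @←SUBJ #4->3\n'
)

cohorts = parse_cg(SAMPLE)
for cohort in cohorts:
    print(cohort["form"], cohort["readings"])
```

Each cohort keeps its surface form and a list of readings, which is the structure the rules in the following sections operate on.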
==Grammars==
===Lexical===
You can use some other system for lexical transfer (e.g. an Apertium bilingual dictionary), or you can do it directly in CG.
<pre>
$ cat /tmp/lexical_transfer.cg
SECTION

SUBSTITUTE ("í") ("in") ("í");
SUBSTITUTE ("upphav") ("beginning") ("upphav");
SUBSTITUTE ("himmal") ("heaven") ("himmal");
SUBSTITUTE ("og") ("and") ("og");
SUBSTITUTE ("jørð") ("earth") ("jørð");
SUBSTITUTE ("skapa") ("create") ("skapa");
SUBSTITUTE ("gudur") ("god") ("gudur");
</pre>
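Each of these rules amounts to looking up a lemma and replacing it. The following Python sketch shows the same idea as a dictionary lookup on a single reading line; `BIDIX` and `transfer_lemma` are hypothetical names chosen for this illustration, and the sketch assumes the lemma is the first quoted string on the line.

```python
# Lemma table mirroring the SUBSTITUTE rules of the lexical transfer grammar.
BIDIX = {
    "í": "in",
    "upphav": "beginning",
    "himmal": "heaven",
    "og": "and",
    "jørð": "earth",
    "skapa": "create",
    "gudur": "god",
}


def transfer_lemma(line):
    """Replace the leading quoted lemma of a reading line; tags stay untouched."""
    stripped = line.lstrip()
    indent = line[: len(line) - len(stripped)]
    if not stripped.startswith('"'):
        return line
    end = stripped.find('"', 1)
    if end == -1:
        return line
    lemma = stripped[1:end]
    # Unknown lemmas (and word-form lines such as "<og>") pass through unchanged.
    return indent + '"' + BIDIX.get(lemma, lemma) + '"' + stripped[end + 1:]


print(transfer_lemma('\t"upphav" N Neu Sg Dat Indef @P← #2->1'))
```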
===Movement===
Here we move a subject that stands to the right of its main verb to the left of that verb (V2 → SVO), and then relabel it from @←SUBJ to @SUBJ→.
<pre>
$ cat /tmp/movement.cg
SECTION

MOVE WITHCHILD (*) (@←SUBJ) BEFORE (-1* (@VMAIN)) ;
SUBSTITUTE (@←SUBJ) (@SUBJ→) (@←SUBJ) (1 (@VMAIN)) ;
</pre>
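The effect of the two rules can be pictured with a small Python sketch over a list of `(form, tags)` pairs. This is a simplification under stated assumptions: `move_subject` is a hypothetical helper, it only looks at the first @VMAIN in the sentence, and unlike `WITHCHILD` it does not carry dependents along or update dependency numbers.

```python
def move_subject(cohorts):
    """Move the first subject found right of the main verb to just before it, and relabel it."""
    result = list(cohorts)
    verb_idx = next(
        (i for i, (_, tags) in enumerate(result) if "@VMAIN" in tags), None
    )
    if verb_idx is None:
        return result
    for i in range(verb_idx + 1, len(result)):
        form, tags = result[i]
        if "@←SUBJ" in tags:
            # The subject now precedes the verb, so its label points rightwards.
            relabelled = ["@SUBJ→" if t == "@←SUBJ" else t for t in tags]
            del result[i]
            result.insert(verb_idx, (form, relabelled))
            break
    return result


sentence = [
    ("Í", ["Pr", "@ADVL→"]),
    ("upphavi", ["N", "@P←"]),
    ("skapti", ["V", "@VMAIN"]),
    ("Gud", ["N", "@←SUBJ"]),
    ("himmal", ["N", "@←OBJ"]),
]
moved = move_subject(sentence)
print([form for form, _ in moved])
```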
===Generation===
In this step we mark "beginning" as definite and then add the definite article before every definite noun.
<pre>
$ cat /tmp/generate.cg
SECTION

SUBSTITUTE (Indef) (Def) ("beginning") ;
ADDCOHORT ("<the>" "the" Det Def Sg) BEFORE (N Def) ;
</pre>
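As a rough Python analogue of these two rules, the sketch below flips Indef to Def on the lemma "beginning" and then inserts an article cohort before every definite noun. The name `add_articles` and the `(form, lemma, tags)` tuples are assumptions made for this illustration only.

```python
def add_articles(cohorts):
    """Mark "beginning" as definite, then put "the" before each definite noun."""
    out = []
    for form, lemma, tags in cohorts:
        if lemma == "beginning":
            tags = ["Def" if t == "Indef" else t for t in tags]
        if "N" in tags and "Def" in tags:
            # New cohort, analogous to what the ADDCOHORT rule inserts.
            out.append(("the", "the", ["Det", "Def", "Sg"]))
        out.append((form, lemma, tags))
    return out


sentence = [
    ("Í", "in", ["Pr"]),
    ("upphavi", "beginning", ["N", "Neu", "Sg", "Dat", "Indef"]),
    ("Gud", "god", ["N", "Msc", "Sg", "Nom", "Indef"]),
]
generated = add_articles(sentence)
print([lemma for _, lemma, _ in generated])
```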
===Morphological transfer===
We remove features that are not used on the target side: gender, case and indefiniteness.
<pre>
$ cat /tmp/morphtrans.cg
SECTION

SUBSTITUTE (Neu) (*) (Neu);
SUBSTITUTE (Fem) (*) (Fem);
SUBSTITUTE (Msc) (*) (Msc);
SUBSTITUTE (Nom) (*) (Nom);
SUBSTITUTE (Dat) (*) (Dat);
SUBSTITUTE (Acc) (*) (Acc);
SUBSTITUTE (Indef) (*) (Indef);
</pre>
...or...
<pre>
$ cat /tmp/morphtrans.cg
SECTION

LIST ToKill = Neu Fem Msc Nom Dat Acc Indef ;

SUBSTITUTE ToKill (*) $$ToKill ;
</pre>
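Either variant boils down to filtering a fixed set of tags out of every reading. A minimal Python sketch of that filter, with `TO_KILL` and `strip_tags` as hypothetical names for this example:

```python
# Tags with no counterpart on the target side, mirroring the ToKill list.
TO_KILL = {"Neu", "Fem", "Msc", "Nom", "Dat", "Acc", "Indef"}


def strip_tags(tags):
    """Drop every tag in TO_KILL, keeping the remaining tags in their original order."""
    return [t for t in tags if t not in TO_KILL]


print(strip_tags(["N", "Neu", "Sg", "Dat", "Indef", "@P←"]))
```

Note that Def is not in the set, so definiteness added in the generation step survives this pass.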
==Output==
Finally, run the whole thing: movement first, then lexical transfer, generation and morphological transfer.
<pre>
$ cat /tmp/in | vislcg3 --grammar /tmp/movement.cg | vislcg3 --grammar /tmp/lexical_transfer.cg | vislcg3 --grammar /tmp/generate.cg | vislcg3 --grammar /tmp/morphtrans.cg
"<Í>"
	"in" Pr @ADVL→ #1->5
"<the>"
	"the" Det Def Sg #2->2
"<upphavi>"
	"beginning" N Sg Def @P← #3->1
"<Gud>"
	"god" N Sg @SUBJ→ #4->5
"<skapti>"
	"create" V Ind Prt Sg @VMAIN #5->0
"<himmal>"
	"heaven" N Sg @←OBJ #6->5
"<og>"
	"and" CC @CC #7->6
"<jørð>"
	"earth" N Sg @←OBJ #8->6
"<.>"
	"." CLB #9->0
</pre>