Machine translation with Constraint Grammar

From Apertium
Revision as of 11:35, 26 August 2011 by Francis Tyers (talk | contribs)

Constraint Grammar is pretty flexible: it even lets you shoot yourself in the foot.

Input

The input is a standard CG-format stream with dependency labels (the same approach also works with cg-proc and the Apertium stream format).

"<Í>"
        "í" Pr @ADVL→ #1->3
"<upphavi>"
        "upphav" N Neu Sg Dat Indef @P← #2->1
"<skapti>"
        "skapa" V Ind Prt Sg @VMAIN #3->0
"<Gud>"
        "gudur" N Msc Sg Nom Indef @←SUBJ #4->3
"<himmal>"
        "himmal" N Msc Sg Acc Indef @←OBJ #5->3
"<og>"
        "og" CC @CC #6->5
"<jørð>"
        "jørð" N Fem Sg Acc Indef @←OBJ #7->5
"<.>"
        "." CLB #8->0

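To make the stream layout concrete, here is a minimal Python sketch of a reader for it. It assumes only what the example above shows: a word form on its own line as "<form>", followed by indented reading lines holding a quoted lemma, space-separated tags, and a trailing #self->parent dependency. The names parse_cg, READING and DEP are made up for this illustration and are not part of vislcg3.

```python
import re

# A reading line: leading whitespace, a quoted lemma, then tags.
READING = re.compile(r'^\s+"(?P<lemma>[^"]*)"(?P<rest>.*)$')
# The dependency tag: #self->parent (parent 0 means "root").
DEP = re.compile(r'#(\d+)->(\d+)')

def parse_cg(text):
    """Parse a CG stream into a list of cohorts (one dict per word form)."""
    cohorts = []
    for line in text.splitlines():
        if line.startswith('"<') and line.rstrip().endswith('>"'):
            # New cohort: strip the surrounding "< and >"
            cohorts.append({"form": line.strip()[2:-2], "readings": []})
            continue
        m = READING.match(line)
        if m and cohorts:
            tags = m.group("rest").split()
            dep = None
            if tags and DEP.fullmatch(tags[-1]):
                self_id, parent = DEP.fullmatch(tags.pop()).groups()
                dep = (int(self_id), int(parent))
            cohorts[-1]["readings"].append(
                {"lemma": m.group("lemma"), "tags": tags, "dep": dep}
            )
    return cohorts

sample = '''"<skapti>"
\t"skapa" V Ind Prt Sg @VMAIN #3->0
"<Gud>"
\t"gudur" N Msc Sg Nom Indef @←SUBJ #4->3
'''

for cohort in parse_cg(sample):
    print(cohort["form"], cohort["readings"])
```

The sketch ignores everything the example does not show (for instance multiple readings with sub-readings), so treat it as a reading aid rather than a full parser.
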
Grammars

Lexical

You can use some other system for lexical transfer (e.g. an Apertium bilingual dictionary), or you can do it directly in CG.

$ cat /tmp/lexical_transfer.cg 
SECTION
SUBSTITUTE ("í") ("in") ("í");
SUBSTITUTE ("upphav") ("beginning") ("upphav");
SUBSTITUTE ("himmal") ("heaven") ("himmal");
SUBSTITUTE ("og") ("and") ("og");
SUBSTITUTE ("jørð") ("earth") ("jørð");
SUBSTITUTE ("skapa") ("create") ("skapa");
SUBSTITUTE ("gudur") ("god") ("gudur");
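Each rule above replaces the source lemma with the target lemma in any reading that carries the source lemma, and leaves the other tags alone. As a rough illustration of that behaviour outside CG, the following Python sketch does the same with a dictionary lookup. BIDIX and transfer_lemma are hypothetical names for this example; the word pairs are copied from the grammar.

```python
# Word pairs taken from the SUBSTITUTE rules in lexical_transfer.cg.
BIDIX = {
    "í": "in",
    "upphav": "beginning",
    "himmal": "heaven",
    "og": "and",
    "jørð": "earth",
    "skapa": "create",
    "gudur": "god",
}

def transfer_lemma(lemma, tags):
    """Swap the lemma for its translation; unknown lemmas pass through unchanged."""
    return BIDIX.get(lemma, lemma), list(tags)

print(transfer_lemma("upphav", ["N", "Neu", "Sg", "Dat", "Indef", "@P←"]))
```

Unlike a real bilingual dictionary, this one-to-one table has no way to pick between several translations; in CG that choice would be expressed with context conditions on the rules.
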

Movement

Here we move a subject that stands to the right of its main verb to the verb's left (V2 → SVO), then relabel it so that its function tag points rightwards at the verb.

$ cat /tmp/movement.cg 

SECTION
MOVE WITHCHILD (*) (@←SUBJ) BEFORE (-1* (@VMAIN)) ;
SUBSTITUTE (@←SUBJ) (@SUBJ→) (@←SUBJ) (1 (@VMAIN)) ;
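The effect of the two rules can be sketched in Python as follows. This is an illustration of the idea, not of how vislcg3 works internally: it assumes one reading per cohort, represents each cohort as a dict, takes WITHCHILD (*) to mean "the subject plus everything that depends on it", and renumbers the dependency ids by position afterwards so that they match the new word order (the final output on this page shows ids renumbered in this way). The function names are hypothetical.

```python
def descendants(tokens, root_id):
    """Ids of root_id and of everything depending on it, directly or indirectly."""
    ids = {root_id}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t["parent"] in ids and t["id"] not in ids:
                ids.add(t["id"])
                changed = True
    return ids

def move_subject_before_verb(tokens):
    tokens = [dict(t) for t in tokens]  # work on copies, leave the input alone
    for subj in [t for t in tokens if "@←SUBJ" in t["tags"]]:
        pos = tokens.index(subj)
        # -1* (@VMAIN): the nearest main verb to the left of the subject
        verb = next((t for t in reversed(tokens[:pos]) if "@VMAIN" in t["tags"]), None)
        block_ids = descendants(tokens, subj["id"])
        if verb is None or verb["id"] in block_ids:
            continue
        block = [t for t in tokens if t["id"] in block_ids]
        rest = [t for t in tokens if t["id"] not in block_ids]
        at = rest.index(verb)
        tokens = rest[:at] + block + rest[at:]
        # Second rule: relabel only if the main verb now directly follows
        nxt = tokens.index(subj) + 1
        if nxt < len(tokens) and "@VMAIN" in tokens[nxt]["tags"]:
            subj["tags"] = ["@SUBJ→" if t == "@←SUBJ" else t for t in subj["tags"]]
    # Renumber ids by position; parent 0 (the root) stays 0
    new_id = {0: 0}
    new_id.update({t["id"]: i for i, t in enumerate(tokens, start=1)})
    return [{**t, "id": new_id[t["id"]], "parent": new_id[t["parent"]]} for t in tokens]

sentence = [
    {"id": 1, "form": "Í", "tags": ["Pr", "@ADVL→"], "parent": 3},
    {"id": 2, "form": "upphavi", "tags": ["N", "@P←"], "parent": 1},
    {"id": 3, "form": "skapti", "tags": ["V", "@VMAIN"], "parent": 0},
    {"id": 4, "form": "Gud", "tags": ["N", "@←SUBJ"], "parent": 3},
    {"id": 5, "form": "himmal", "tags": ["N", "@←OBJ"], "parent": 3},
    {"id": 6, "form": "og", "tags": ["CC", "@CC"], "parent": 5},
    {"id": 7, "form": "jørð", "tags": ["N", "@←OBJ"], "parent": 5},
    {"id": 8, "form": ".", "tags": ["CLB"], "parent": 0},
]

print([t["form"] for t in move_subject_before_verb(sentence)])
```
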

Generation

In this step we mark "beginning" as definite and then add the definite article before any definite noun.

$ cat /tmp/generate.cg

SECTION
SUBSTITUTE (Indef) (Def) ("beginning") ;
ADDCOHORT ("<the>" "the" Det Def Sg) BEFORE (N Def) ;
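Read procedurally, the two rules first swap Indef for Def on the reading with the lemma "beginning", and then insert a new "the" cohort in front of every noun that is tagged Def. The Python sketch below mimics this on a simplified representation, with one (form, lemma, tags) tuple per cohort. Both function names are invented for the example, and dependency numbering is left out.

```python
def mark_definite(cohorts, lemma="beginning"):
    """Replace Indef with Def on cohorts whose lemma matches."""
    out = []
    for form, lem, tags in cohorts:
        if lem == lemma:
            tags = ["Def" if t == "Indef" else t for t in tags]
        out.append((form, lem, list(tags)))
    return out

def add_definite_article(cohorts):
    """Insert a 'the' cohort before every cohort tagged both N and Def."""
    out = []
    for form, lem, tags in cohorts:
        if "N" in tags and "Def" in tags:
            out.append(("the", "the", ["Det", "Def", "Sg"]))
        out.append((form, lem, list(tags)))
    return out

cohorts = [
    ("Í", "in", ["Pr", "@ADVL→"]),
    ("upphavi", "beginning", ["N", "Neu", "Sg", "Dat", "Indef", "@P←"]),
    ("Gud", "god", ["N", "Msc", "Sg", "Nom", "Indef", "@←SUBJ"]),
]

for cohort in add_definite_article(mark_definite(cohorts)):
    print(cohort)
```
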

Morphological transfer

We remove unused features such as gender, case and indefiniteness.

$ cat /tmp/morphtrans.cg 

SECTION
SUBSTITUTE (Neu) (*) (Neu);
SUBSTITUTE (Fem) (*) (Fem);
SUBSTITUTE (Msc) (*) (Msc);
SUBSTITUTE (Nom) (*) (Nom);
SUBSTITUTE (Dat) (*) (Dat);
SUBSTITUTE (Acc) (*) (Acc);
SUBSTITUTE (Indef) (*) (Indef);

...or, more compactly, with a list...

$ cat /tmp/morphtrans.cg 

SECTION
LIST ToKill = Neu Fem Msc Nom Dat Acc Indef ;
SUBSTITUTE ToKill (*) $$ToKill ;
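Both variants amount to filtering a fixed set of tags out of every reading. As a point of comparison, here is the same operation as a short Python sketch; TO_KILL mirrors the ToKill list above and strip_tags is a name chosen for this example.

```python
# Same tags as the ToKill list in morphtrans.cg.
TO_KILL = {"Neu", "Fem", "Msc", "Nom", "Dat", "Acc", "Indef"}

def strip_tags(tags, to_kill=TO_KILL):
    """Return the tags in their original order, minus those in to_kill."""
    return [t for t in tags if t not in to_kill]

print(strip_tags(["N", "Msc", "Sg", "Nom", "Indef", "@←SUBJ"]))
print(strip_tags(["N", "Neu", "Sg", "Dat", "Def", "@P←"]))
```

Note that Def is not in the set, so a noun that was marked definite in the generation step keeps that tag.
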

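Output

Finally, we run the whole pipeline, chaining the four grammars in this order: movement, lexical transfer, generation, morphological transfer.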

$ cat /tmp/in | vislcg3 --grammar /tmp/movement.cg | vislcg3 --grammar /tmp/lexical_transfer.cg | vislcg3 --grammar /tmp/generate.cg | vislcg3 --grammar /tmp/morphtrans.cg 
"<Í>"
	"in" Pr @ADVL→ #1->5
"<the>"
	"the" Det Def Sg #2->2
"<upphavi>"
	"beginning" N Sg Def @P← #3->1
"<Gud>"
	"god" N Sg @SUBJ→ #4->5
"<skapti>"
	"create" V Ind Prt Sg @VMAIN #5->0
"<himmal>"
	"heaven" N Sg @←OBJ #6->5
"<og>"
	"and" CC @CC #7->6
"<jørð>"
	"earth" N Sg @←OBJ #8->6
"<.>"
	"." CLB #9->0