Machine translation with Constraint Grammar
Latest revision as of 11:35, 26 August 2011
Constraint Grammar is pretty flexible; it even lets you shoot your own feet off.
Input
The input is a standard CG-format stream with dependency labels (the same approach also works with cg-proc and the Apertium stream format).
"<Í>"
        "í" Pr @ADVL→ #1->3
"<upphavi>"
        "upphav" N Neu Sg Dat Indef @P← #2->1
"<skapti>"
        "skapa" V Ind Prt Sg @VMAIN #3->0
"<Gud>"
        "gudur" N Msc Sg Nom Indef @←SUBJ #4->3
"<himmal>"
        "himmal" N Msc Sg Acc Indef @←OBJ #5->3
"<og>"
        "og" CC @CC #6->5
"<jørð>"
        "jørð" N Fem Sg Acc Indef @←OBJ #7->5
"<.>"
        "." CLB #8->0
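To make the stream format concrete, here is a minimal Python sketch that reads such a stream into cohorts. It is not part of vislcg3; the names `Cohort` and `parse_cg` are hypothetical, and the sketch assumes one reading per cohort, as in the example above.

```python
import re
from typing import NamedTuple


class Cohort(NamedTuple):
    wordform: str      # surface form, e.g. "upphavi"
    lemma: str         # base form, e.g. "upphav"
    tags: list[str]    # morphological, syntactic and dependency tags


def parse_cg(stream: str) -> list[Cohort]:
    """Parse a CG stream, assuming exactly one reading per cohort."""
    cohorts = []
    wordform = None
    for line in stream.splitlines():
        if not line.strip():
            continue
        if line.startswith('"<'):
            # Wordform line: "<upphavi>" -> upphavi
            wordform = line.strip()[2:-2]
        elif line[0].isspace() and wordform is not None:
            # Reading line: indented, quoted lemma followed by tags
            m = re.match(r'\s+"([^"]*)"\s*(.*)', line)
            if m:
                cohorts.append(Cohort(wordform, m.group(1), m.group(2).split()))
    return cohorts


SAMPLE = '''"<Í>"
        "í" Pr @ADVL→ #1->3
"<upphavi>"
        "upphav" N Neu Sg Dat Indef @P← #2->1
'''

cohorts = parse_cg(SAMPLE)
for c in cohorts:
    print(c.wordform, c.lemma, c.tags)
```

The last tag of each reading (for example `#2->1`) is the dependency relation: cohort 2 depends on cohort 1.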
Grammars
Lexical
You can use some other system for lexical transfer (e.g. an Apertium bilingual dictionary), or you can do it directly in CG.
$ cat /tmp/lexical_transfer.cg 
SECTION
SUBSTITUTE ("í") ("in") ("í");
SUBSTITUTE ("upphav") ("beginning") ("upphav");
SUBSTITUTE ("himmal") ("heaven") ("himmal");
SUBSTITUTE ("og") ("and") ("og");
SUBSTITUTE ("jørð") ("earth") ("jørð");
SUBSTITUTE ("skapa") ("create") ("skapa");
SUBSTITUTE ("gudur") ("god") ("gudur");
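Writing one rule per lemma by hand gets tedious quickly. As a rough sketch, rules of the same shape could be generated from a word list; the function `substitute_rules` and the `BIDIX` dictionary below are hypothetical, and the sketch assumes lemmas contain no double quotes that would need escaping.

```python
def substitute_rules(pairs: dict[str, str]) -> str:
    """Build a CG grammar with one lemma-replacing SUBSTITUTE rule per pair."""
    lines = ["SECTION"]
    for src, trg in pairs.items():
        # Same shape as the hand-written rules: replace lemma src by trg
        # in readings that carry lemma src.
        lines.append(f'SUBSTITUTE ("{src}") ("{trg}") ("{src}");')
    return "\n".join(lines)


# Hypothetical mini dictionary taken from the example sentence.
BIDIX = {"í": "in", "upphav": "beginning", "skapa": "create"}

grammar = substitute_rules(BIDIX)
print(grammar)
```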
Movement
Here we move a subject that stands to the right of its main verb to the left of it (V2 → SVO), and then relabel it from @←SUBJ to @SUBJ→.
$ cat /tmp/movement.cg
SECTION
MOVE WITHCHILD (*) (@←SUBJ) BEFORE (-1* (@VMAIN)) ;
SUBSTITUTE (@←SUBJ) (@SUBJ→) (@←SUBJ) (1 (@VMAIN)) ;
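The effect of these two rules on the example sentence can be imitated in plain Python. This is only an illustration of the reordering, not how vislcg3 works internally: the function `move_subject` is hypothetical, it handles only the first postverbal subject, and it ignores the subject's dependents (which WITHCHILD would move along with it).

```python
def move_subject(cohorts):
    """Move the first @←SUBJ to just before the nearest @VMAIN on its left."""
    out = list(cohorts)
    for i, (lemma, tags) in enumerate(out):
        if "@←SUBJ" in tags:
            # Positions of main verbs to the left of the subject.
            verbs = [j for j in range(i) if "@VMAIN" in out[j][1]]
            if verbs:
                j = verbs[-1]  # nearest main verb to the left
                subj_lemma, subj_tags = out.pop(i)
                # Relabel: the subject now points rightwards at its verb.
                new_tags = ["@SUBJ→" if t == "@←SUBJ" else t for t in subj_tags]
                out.insert(j, (subj_lemma, new_tags))
            break
    return out


sentence = [
    ("í", ["Pr", "@ADVL→"]),
    ("upphav", ["N", "@P←"]),
    ("skapa", ["V", "@VMAIN"]),
    ("gudur", ["N", "@←SUBJ"]),
    ("himmal", ["N", "@←OBJ"]),
]

moved = move_subject(sentence)
print([lemma for lemma, _ in moved])
```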
Generation
In this step we mark "beginning" as definite and add the definite article before any definite noun.
$ cat /tmp/generate.cg
SECTION
SUBSTITUTE (Indef) (Def) ("beginning") ;
ADDCOHORT ("<the>" "the" Det Def Sg) BEFORE (N Def) ;
Morphological transfer
We remove features that are not needed on the target side, such as gender, case and indefiniteness.
$ cat /tmp/morphtrans.cg
SECTION
SUBSTITUTE (Neu) (*) (Neu);
SUBSTITUTE (Fem) (*) (Fem);
SUBSTITUTE (Msc) (*) (Msc);
SUBSTITUTE (Nom) (*) (Nom);
SUBSTITUTE (Dat) (*) (Dat);
SUBSTITUTE (Acc) (*) (Acc);
SUBSTITUTE (Indef) (*) (Indef);
...or...
$ cat /tmp/morphtrans.cg
SECTION
LIST ToKill = Neu Fem Msc Nom Dat Acc Indef ;
SUBSTITUTE ToKill (*) $$ToKill ;
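In effect, both versions of the grammar delete every tag listed in ToKill from each reading and leave the rest alone. A small Python sketch of that behaviour, with a hypothetical `strip_tags` helper that is not part of any CG tool:

```python
# The same tags as in the ToKill list above.
TO_KILL = {"Neu", "Fem", "Msc", "Nom", "Dat", "Acc", "Indef"}


def strip_tags(tags: list[str]) -> list[str]:
    """Drop every tag in TO_KILL, keeping the order of the remaining tags."""
    return [t for t in tags if t not in TO_KILL]


before = ["N", "Neu", "Sg", "Dat", "Indef", "@P←"]
after = strip_tags(before)
print(after)
```

Note that Def is not in the list, so a noun marked as definite in the generation step keeps that tag.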
Output
Finally, run the whole pipeline.
$ cat /tmp/in | vislcg3 --grammar /tmp/movement.cg | vislcg3 --grammar /tmp/lexical_transfer.cg | vislcg3 --grammar /tmp/generate.cg | vislcg3 --grammar /tmp/morphtrans.cg
"<Í>"
        "in" Pr @ADVL→ #1->5
"<the>"
        "the" Det Def Sg #2->2
"<upphavi>"
        "beginning" N Sg Def @P← #3->1
"<Gud>"
        "god" N Sg @SUBJ→ #4->5
"<skapti>"
        "create" V Ind Prt Sg @VMAIN #5->0
"<himmal>"
        "heaven" N Sg @←OBJ #6->5
"<og>"
        "and" CC @CC #7->6
"<jørð>"
        "earth" N Sg @←OBJ #8->6
"<.>"
        "." CLB #9->0
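If the list of grammars grows, the command line can be assembled rather than typed. The sketch below only builds the command string and does not run anything; `pipeline_command` is a hypothetical helper, and the only vislcg3 option it relies on is the `--grammar` flag used above.

```python
import shlex


def pipeline_command(input_path: str, grammars: list[str]) -> str:
    """Build a shell pipeline that feeds input_path through each grammar in order."""
    stages = [f"cat {shlex.quote(input_path)}"]
    stages += [f"vislcg3 --grammar {shlex.quote(g)}" for g in grammars]
    return " | ".join(stages)


GRAMMARS = [
    "/tmp/movement.cg",
    "/tmp/lexical_transfer.cg",
    "/tmp/generate.cg",
    "/tmp/morphtrans.cg",
]

command = pipeline_command("/tmp/in", GRAMMARS)
print(command)
```

The order of the list matters: morphological transfer runs last here, after generation has already used the definiteness tags.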