Difference between revisions of "Dependency based re-ordering"
Revision as of 12:09, 20 April 2009
Dependency parsers based on constraint grammar exist for a few languages that Apertium would like to handle (e.g. the Sámi languages and Faroese). It might therefore be a good idea to be able to do re-ordering before transfer (or during transfer) based on the dependency tree. This would not do lexical transfer, concordance or anything else, just LU reordering. The sister project Matxin already does something like this, so it would be worth looking there for ideas.
The first stage would be to modify cg-proc so that it outputs dependency information along with the lexical units. The second stage would be to write a module that builds a tree and performs movement operations on it. Special care would need to be taken with superblanks.
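One possible way of handling superblanks, sketched here purely as an assumption and not as something Apertium or cg-proc provides, is to leave every blank in its original slot and permute only the lexical units. The function name `reorder_lus` and the tags in the sample stream are made up for illustration, and escaped `\^` and `\$` characters are not handled.

```python
import re

# Lexical units are delimited by ^ and $ in the Apertium stream format.
# Simplification: escaped \^ and \$ inside a unit are not handled.
LU = re.compile(r"\^.*?\$")


def reorder_lus(stream, new_order):
    """Permute the lexical units, leaving blanks and superblanks in place.

    new_order[i] is the index of the original unit that ends up in slot i.
    """
    units = LU.findall(stream)
    blanks = LU.split(stream)  # always len(units) + 1 pieces
    if sorted(new_order) != list(range(len(units))):
        raise ValueError("new_order must be a permutation of the unit indices")
    out = [blanks[0]]
    for slot, index in enumerate(new_order):
        out.append(units[index])
        out.append(blanks[slot + 1])
    return "".join(out)


# Hypothetical stream: the tags and superblank contents are illustrative only.
STREAM = "[<p>]^Í<pr>$ ^upphav<n>$ ^skapa<vblex>$ ^Gud<np>$[</p>]"
print(reorder_lus(STREAM, [0, 1, 3, 2]))
# [<p>]^Í<pr>$ ^upphav<n>$ ^Gud<np>$ ^skapa<vblex>$[</p>]
```

With this policy, inline formatting that originally wrapped one word may end up around a different word after the move, which is one reason superblanks need special care.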
Examples
In the example below,
Í   upphavi    skapti   Gud  himmal  og   jørð
In  beginning  created  God  heaven  and  earth
`In the beginning God created the heavens and the earth'
The subject could be moved before the verb using the dependency information, while inserting the determiners, handling concordance and so on would be left to the rest of transfer. The benefit of using the dependency graph to move words around is that it works for constituents of arbitrary size, such as long NPs.
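The move described above can be sketched as follows. This is a minimal illustration, not an existing Apertium module: the tuple layout and the function names are assumptions, and the head indices are the ones given for this sentence in the annotation. The whole subtree of the subject is moved, which is what makes the approach independent of NP length.

```python
from collections import defaultdict

# (ord, form, head, function) for the example sentence; head 0 is the root.
TOKENS = [
    (1, "Í", 3, "ADVL"),
    (2, "upphavi", 1, "P"),
    (3, "skapti", 0, "VMAIN"),
    (4, "Gud", 3, "SUBJ"),
    (5, "himmal", 3, "OBJ"),
    (6, "og", 5, "CC"),
    (7, "jørð", 3, "OBJ"),
    (8, ".", 0, "CLB"),
]


def subtree(children, ord_):
    """Return the positions of a node and all its descendants, sorted."""
    found = [ord_]
    for child in children[ord_]:
        found.extend(subtree(children, child))
    return sorted(found)


def move_subject_before_verb(tokens):
    """Move each subject's whole subtree to just before its head."""
    children = defaultdict(list)
    for ord_, _form, head, _func in tokens:
        children[head].append(ord_)
    order = [t[0] for t in tokens]
    for ord_, _form, head, func in tokens:
        # Only move subjects that currently follow their head.
        if func == "SUBJ" and order.index(ord_) > order.index(head):
            block = subtree(children, ord_)
            order = [o for o in order if o not in block]
            at = order.index(head)
            order[at:at] = block
    forms = {t[0]: t[1] for t in tokens}
    return [forms[o] for o in order]


print(" ".join(move_subject_before_verb(TOKENS)))
# Í upphavi Gud skapti himmal og jørð .
```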
Annotation
"<Í>"
	"í" Pr @ADVL> #1->3
"<upphavi>"
	"upphav" N Neu Sg Dat Indef @P< #2->1
"<skapti>"
	"skapa" V Ind Prt Sg @VMAIN #3->0
"<Gud>"
	"gudur" N Msc Sg Acc Indef @<SUBJ #4->3
"<himmal>"
	"himmal" N Msc Sg Acc Indef @<OBJ #5->3
"<og>"
	"og" CC @CC #6->5
"<jørð>"
	"jørð" N Fem Sg Acc Indef @<OBJ #7->3
"<.>"
	"." CLB #8->0
Graph
                                       0
                                       |
(2)upphav-----(1)í[@ADVL]-------(3)skapa[@VMAIN]
                                    |     |
                                    |     |________(5)himmal[@OBJ]----(6)og
                            ________|                  |
                           |                           ---------(7)jørð
                    (4)gudur[@SUBJ]
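Before a graph like this can be built, the `#n->m` links have to be read from the annotation. The following is a rough sketch of such a reader, assuming the usual CG layout of one `"<form>"` line followed by an indented reading line; the function name `parse_cg` and the dictionary keys are illustrative choices, and only the first reading of each cohort is kept.

```python
import re

COHORT = re.compile(r'^"<(.*)>"\s*$')
READING = re.compile(r'^\s+"([^"]*)"\s+(.*?)\s+#(\d+)->(\d+)\s*$')

SAMPLE = '''"<Í>"
    "í" Pr @ADVL> #1->3
"<upphavi>"
    "upphav" N Neu Sg Dat Indef @P< #2->1
"<skapti>"
    "skapa" V Ind Prt Sg @VMAIN #3->0
'''


def parse_cg(text):
    """Parse CG output with dependency links into a list of dicts."""
    tokens, form = [], None
    for line in text.splitlines():
        cohort = COHORT.match(line)
        if cohort:
            form = cohort.group(1)
            continue
        reading = READING.match(line)
        if reading and form is not None:
            lemma, rest, ord_, head = reading.groups()
            parts = rest.split()
            # Syntactic function tags start with @; drop the direction marks.
            funcs = [p.strip("@<>") for p in parts if p.startswith("@")]
            tokens.append({
                "form": form,
                "lemma": lemma,
                "tags": [p for p in parts if not p.startswith("@")],
                "func": funcs[0] if funcs else None,
                "ord": int(ord_),
                "head": int(head),
            })
            form = None  # ignore any further readings of this cohort
    return tokens


for token in parse_cg(SAMPLE):
    print(token["ord"], token["form"], "->", token["head"], token["func"])
```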
Matxin
We could also try outputting Matxin format. Although Matxin uses chunks as well as nodes, we should be able to use node-based trees only. For example, the chunk-based tree
                      S1 (SENT)
                        __|_____
C2 (grup-verb) --------|  |     |----------------- C4 (F-term)
     |                    |
 N4 sacude            C3 (obj)
     |                    |
 C1 (subj)                |
     |                N5 Bagdad
N3 atentado
     |
------------
|          |
N1 Un  N2 triple
This could be represented with nodes only as:
                              N0 (SENT)
                                __|_____
         ----------------------|  |     |----------------- N6 . (F-term)
         |                        |
N4 sacude (grup-verb)      N5 Bagdad (obj)
         |
 N3 atentado (subj)
         |
    ------------
    |          |
  N1 Un    N2 triple
The above example in Faroese might come out something like:
<SENTENCE ord="0">
  <NODE form='skapti' lem='skapa' ord='3' mi='V.Ind.Prt.Sg' si='VMAIN'>
    <NODE form='Í' lem='Í' ord='1' mi='Pr' si='ADVL'>
      <NODE form='upphavi' lem='upphav' ord='2' mi='N.Neu.Sg.Dat.Indef' si='P'/>
    </NODE>
    <NODE form='Gud' lem='Gud' ord='4' mi='N.Prop.Sg.Nom' si='SUBJ'/>
    <NODE form='himmal' lem='himmal' ord='5' mi='N.Msc.Sg.Acc.Indef' si='OBJ'/>
  </NODE>
</SENTENCE>
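Nested output of this shape could be produced from a flat list of tokens with head indices roughly as follows. This is a sketch using Python's standard `xml.etree.ElementTree`; the tuple layout and the function name `to_sentence` are assumptions, and only the attributes shown in the example above are emitted, so it should not be read as the full Matxin format.

```python
import xml.etree.ElementTree as ET

# (ord, form, lemma, tags, function, head); head 0 attaches to SENTENCE.
TOKENS = [
    (1, "Í", "Í", "Pr", "ADVL", 3),
    (2, "upphavi", "upphav", "N.Neu.Sg.Dat.Indef", "P", 1),
    (3, "skapti", "skapa", "V.Ind.Prt.Sg", "VMAIN", 0),
    (4, "Gud", "Gud", "N.Prop.Sg.Nom", "SUBJ", 3),
    (5, "himmal", "himmal", "N.Msc.Sg.Acc.Indef", "OBJ", 3),
]


def to_sentence(tokens, sentence_ord=0):
    """Build a SENTENCE element with NODE children nested by head index."""
    sentence = ET.Element("SENTENCE", ord=str(sentence_ord))
    elements = {0: sentence}
    # First create every node, then attach each one to its head.
    for ord_, form, lem, mi, si, _head in tokens:
        elements[ord_] = ET.Element(
            "NODE", form=form, lem=lem, ord=str(ord_), mi=mi, si=si
        )
    for ord_, _form, _lem, _mi, _si, head in tokens:
        elements[head].append(elements[ord_])
    return sentence


sentence = to_sentence(TOKENS)
if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9 and later
    ET.indent(sentence)
print(ET.tostring(sentence, encoding="unicode"))
```

Creating all nodes before attaching them means a dependent may appear in the input before its head, as `Í` does here.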