Difference between revisions of "Dependency based re-ordering"

 
Latest revision as of 08:44, 3 October 2013

There are dependency parsers based on constraint grammar for a few languages which Apertium would like to treat (e.g. the Sámi languages and Faroese). It might be a nice idea to be able to do re-ordering before transfer (or during transfer) based on the dependency tree. This would not do lexical transfer, concordance or anything else, just reordering of lexical units (LUs). The sister project Matxin already does something like this, so it would be worth looking there for ideas.

The first stage would be to extend cg-proc so that it outputs dependency information along with the lexical units. The second stage would be to write a module that builds a tree from that output and performs the moving operations. Special care would need to be taken with superblanks.

Examples

In the example below,

Í  upphavi   skapti  Gud himmal og  jørð
In beginning created God heaven and earth

`In the beginning God created the heavens and the earth'

The subject could be moved before the verb using the dependency information, while inserting the determiners, doing concordance etc. would be left to the rest of transfer. The benefit of using the dependency graph to move things around is that a rule moves a node together with everything that depends on it, so NPs etc. of any length can be handled.

Annotation

"<Í>"
        "í" Pr @ADVL> #1->3 
"<upphavi>"
        "upphav" N Neu Sg Dat Indef @P< #2->1 
"<skapti>"
        "skapa" V Ind Prt Sg @VMAIN #3->0 
"<Gud>"
        "gudur" N Msc Sg Nom Indef @<SUBJ #4->3 
"<himmal>"
        "himmal" N Msc Sg Acc Indef @<OBJ #5->3 
"<og>"
        "og" CC @CC #6->5 
"<jørð>"
        "jørð" N Fem Sg Acc Indef @<OBJ #7->5
"<.>"
        "." CLB #8->0 
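As a rough illustration of what the tree-building module would have to read, the annotation above can be parsed into one record per word. This is only a sketch: the function name `parse_cg` and the record layout are made up for this page, and it assumes that each cohort has been fully disambiguated (only the first reading is kept) and that readings are indented as shown above.

```python
import re

# Matches a reading line such as:   "gudur" N Msc Sg Nom Indef @<SUBJ #4->3
READING = re.compile(
    r'^\s+"(?P<lemma>[^"]*)"\s+(?P<rest>.*?)\s*#(?P<id>\d+)->(?P<head>\d+)\s*$'
)

def parse_cg(text):
    """Return one dict per cohort, keeping only its first reading (hypothetical layout)."""
    nodes = []
    form = None
    for line in text.splitlines():
        stripped = line.strip()
        # A cohort line looks like "<Gud>" and starts in the first column.
        if line.startswith('"<') and stripped.endswith('>"'):
            form = stripped[2:-2]
            continue
        m = READING.match(line)
        if m and form is not None:
            parts = m.group('rest').split()
            nodes.append({
                'form': form,
                'lemma': m.group('lemma'),
                'tags': [p for p in parts if not p.startswith('@')],
                # Syntactic function label, e.g. @<SUBJ; None for punctuation.
                'func': next((p for p in parts if p.startswith('@')), None),
                'id': int(m.group('id')),
                'head': int(m.group('head')),
            })
            form = None  # skip any further readings of this cohort
    return nodes
```

Each record then carries the word's own position (`id`) and the position of its head (`head`, with 0 for the root), which is all a reordering module needs besides the function label.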

In Apertium

In Apertium format, this input might look something like:

^Í/í<pr><@ADVL→><#1→3>$ 
^upphavi/upphav<n><nt><sg><dat><ind><@P←><#2→1>$ 
^skapti/skapa<vblex><pri><sg><@VMAIN><#3→0>$ 
^Gud/gudur<n><m><sg><nom><ind><@←SUBJ><#4→3>$ 
^himmal/himmal<n><m><sg><acc><ind><@←OBJ><#5→3>$
^og/og<cnjcoo><@CC><#6→5>$
^jørð/jørð<n><f><sg><acc><ind><@←OBJ><#7→5>$
^./.<sent><#8→0>$

And then you might have a rule which looks something like:

<reorder>
  <pattern>
    <head><pattern-item n="vmain"/></head>
    <child><pattern-item n="subj"/></child>
    <child><pattern-item n="obj"/></child>
  </pattern>
  <out>
    <clip pos="2"/>  <!-- subject -->
    <clip pos="1"/>  <!-- verb -->
    <clip pos="3"/>  <!-- object -->
  </out>
</reorder>

(Well, ok it will probably look quite different -- format to be determined.)
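Whatever the final rule format turns out to be, the matching step would probably amount to finding a head with a given function label and then picking out its direct children by label. Here is a minimal sketch of that idea. The function `match_rule` and the record layout (`id`, `head`, `func` with the direction arrows stripped) are assumptions made for illustration, not an existing Apertium API.

```python
def match_rule(nodes, head_func, child_funcs):
    """Return [head_id, child_id, ...] for the first matching head, or None.

    The returned list is indexed like the rule: entry 0 is pos="1" (the head),
    entry 1 is pos="2" (the first <child>), and so on.
    """
    for h in nodes:
        if h['func'] != head_func:
            continue
        found = []
        for func in child_funcs:
            # Only direct dependents of the head are considered.
            child = next(
                (n['id'] for n in nodes if n['head'] == h['id'] and n['func'] == func),
                None,
            )
            if child is None:
                break
            found.append(child)
        else:
            return [h['id']] + found
    return None

# The example sentence, reduced to position, head and function label.
sentence = [
    {'id': 1, 'head': 3, 'func': 'ADVL'},
    {'id': 2, 'head': 1, 'func': 'P'},
    {'id': 3, 'head': 0, 'func': 'VMAIN'},
    {'id': 4, 'head': 3, 'func': 'SUBJ'},
    {'id': 5, 'head': 3, 'func': 'OBJ'},
    {'id': 6, 'head': 5, 'func': 'CC'},
    {'id': 7, 'head': 5, 'func': 'OBJ'},
    {'id': 8, 'head': 0, 'func': None},
]

print(match_rule(sentence, 'VMAIN', ['SUBJ', 'OBJ']))
```

Note that the second object (position 7) is not matched directly, because it hangs off position 5 rather than off the verb. It would travel along with its head when the object is moved.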

Giving the output:

^Í/í<pr><@ADVL→><#1→4>$ 
^upphavi/upphav<n><nt><sg><dat><ind><@P←><#2→1>$ 
^Gud/gudur<n><m><sg><nom><ind><@←SUBJ><#3→4>$ 
^skapti/skapa<vblex><pri><sg><@VMAIN><#4→0>$ 
^himmal/himmal<n><m><sg><acc><ind><@←OBJ><#5→4>$
^og/og<cnjcoo><@CC><#6→5>$
^jørð/jørð<n><f><sg><acc><ind><@←OBJ><#7→5>$
^./.<sent><#8→0>$

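Note how the words have been reordered and the positions updated: both a word's own number and any reference to a moved word as a head change. (It would also be nice to relabel @←SUBJ to @SUBJ→, since the subject now precedes its head.)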
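The position update can be sketched as follows: once the new surface order is known, every word gets its new position as its number, and every head reference is mapped through the same old-to-new table. The function name `renumber` and the record layout are hypothetical, chosen only for this illustration.

```python
def renumber(nodes, new_order):
    """Renumber words after a move.

    nodes:     list of dicts with at least 'id' and 'head' (head 0 = root)
    new_order: the old ids, listed in the new surface order
    """
    by_id = {n['id']: n for n in nodes}
    new_id = {old: pos for pos, old in enumerate(new_order, start=1)}
    new_id[0] = 0  # the artificial root keeps position 0
    return [
        dict(by_id[old], id=new_id[old], head=new_id[by_id[old]['head']])
        for old in new_order
    ]

# The example sentence before reordering.
sentence = [
    {'form': 'Í', 'id': 1, 'head': 3},
    {'form': 'upphavi', 'id': 2, 'head': 1},
    {'form': 'skapti', 'id': 3, 'head': 0},
    {'form': 'Gud', 'id': 4, 'head': 3},
    {'form': 'himmal', 'id': 5, 'head': 3},
    {'form': 'og', 'id': 6, 'head': 5},
    {'form': 'jørð', 'id': 7, 'head': 5},
    {'form': '.', 'id': 8, 'head': 0},
]

# Subject (old position 4) moved in front of the verb (old position 3).
svo = renumber(sentence, [1, 2, 4, 3, 5, 6, 7, 8])
for n in svo:
    print(f"{n['form']} #{n['id']}→{n['head']}")
```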

In this example, the subject that moved was a single word, but say we wanted to translate into a language which may have Object-Verb-Subject word order (like Tamil reported speech or… Klingon). The rule would then need only a slight change:

<reorder>
  <pattern>
    <head><pattern-item n="vmain"/></head>
    <child><pattern-item n="subj"/></child>
    <child><pattern-item n="obj"/></child>
  </pattern>
  <out>
    <clip pos="3"/>  <!-- object -->
    <clip pos="1"/>  <!-- verb -->
    <clip pos="2"/>  <!-- subject -->
  </out>
</reorder>

but the output would have the whole object phrase moved in front of the verb:

^Í/í<pr><@ADVL→><#1→6>$ 
^upphavi/upphav<n><nt><sg><dat><ind><@P←><#2→1>$ 
^himmal/himmal<n><m><sg><acc><ind><@←OBJ><#3→6>$
^og/og<cnjcoo><@CC><#4→3>$
^jørð/jørð<n><f><sg><acc><ind><@←OBJ><#5→3>$
^skapti/skapa<vblex><pri><sg><@VMAIN><#6→0>$ 
^Gud/gudur<n><m><sg><nom><ind><@←SUBJ><#7→6>$ 
^./.<sent><#8→0>$
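Moving a whole phrase falls out of the tree quite naturally: each child named in the rule stands for itself plus everything that depends on it. A small sketch of that idea follows. The names `subtree` and `reorder` are invented here, and the sketch assumes that the head and the moved phrases together occupy one contiguous stretch of the sentence; a real module would have to decide what to do when they do not.

```python
def subtree(nodes, root):
    """Ids of `root` and all of its descendants, in surface order."""
    ids = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n['head'] in ids and n['id'] not in ids:
                ids.add(n['id'])
                changed = True
    return sorted(ids)

def reorder(nodes, head, out):
    """Return the old ids in their new surface order.

    out: ids of the head and of some of its children, in the desired order.
    The head contributes only itself; each child contributes its whole subtree.
    """
    blocks = [[i] if i == head else subtree(nodes, i) for i in out]
    moved = [i for block in blocks for i in block]
    # The moved words are written back into the positions they vacated.
    replacement = dict(zip(sorted(moved), moved))
    in_order = sorted(n['id'] for n in nodes)
    return [replacement.get(i, i) for i in in_order]

# (id, head) pairs of the example sentence.
sentence = [{'id': i, 'head': h} for i, h in
            [(1, 3), (2, 1), (3, 0), (4, 3), (5, 3), (6, 5), (7, 5), (8, 0)]]

# Object (5), verb (3), subject (4): the object brings "og jørð" with it.
print(reorder(sentence, head=3, out=[5, 3, 4]))
```

The rule itself never mentions positions 6 and 7. They move only because they depend on position 5, which is the point of working on the tree rather than on a fixed-length pattern.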

Graph

                                       0
                                       |
 (2)upphav----- (1)í[@ADVL]-------(3)skapa[@VMAIN]
                                     |   | 
                                     |   |________ (5)himmal[@OBJ]----(6)og
                             ________|                | 
                            |                         ---------(7)jørð
                       (4)gudur[@SUBJ]

Matxin

We could also try outputting Matxin format. Although Matxin uses chunks as well as nodes, we should be able to just do node based trees, e.g.

                           S1 (SENT)
                          __|_____
  C2 (grup-verb) --------|   |  |----------------- C4 (F-term)
   |                         |
  N4 sacude                 C3 (obj)
   |                         |
  C1 (subj)                  |
   |                        N5 Bagdad
  N3 atentado
   |
  ------------
  |          |
 N1 Un      N2 triple

could be represented as:

                                 N0 (SENT)
                                  __|_____
            ----------------------|   |  |----------------- N6 . (F-term)
            |                         |
           N4 sacude (grup-verb)     N5 Bagdad (obj)
            |
           N3 atentado (subj)
            |
      ------------
      |          |
     N1 Un      N2 triple 

The above example in Faroese might come out something like:

<SENTENCE ord="1">
  <NODE form='skapti' lem='skapa' ord='3' mi='V.Ind.Prt.Sg' si='VMAIN'>
    <NODE form='Í' lem='Í' ord='1' mi='Pr' si='ADVL'>
      <NODE form='upphavi' lem='upphav' ord='2' mi='N.Neu.Sg.Dat.Indef' si='P'/>
    </NODE>
    <NODE form='Gud' lem='Gud' ord='4' mi='N.Prop.Sg.Nom' si='SUBJ'/>
    <NODE form='himmal' lem='himmal' ord='5' mi='N.Msc.Sg.Acc.Indef' si='OBJ'/>
  </NODE>
</SENTENCE>
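Producing this kind of nested output from a flat list of words with head references is straightforward. The sketch below uses Python's standard `xml.etree.ElementTree`. The attribute names are the ones shown above, while the function name `to_matxin` and the input record layout are assumptions, and no claim is made that the result validates against Matxin's actual DTD.

```python
import xml.etree.ElementTree as ET

def to_matxin(nodes):
    """Nest flat dependency records into SENTENCE/NODE elements."""
    sentence = ET.Element('SENTENCE', ord='1')
    elements = {0: sentence}  # head 0 attaches directly under SENTENCE
    for n in nodes:
        elements[n['id']] = ET.Element(
            'NODE', form=n['form'], lem=n['lem'],
            ord=str(n['id']), mi=n['mi'], si=n['si'],
        )
    # Attach in surface order so that siblings keep their original order.
    for n in nodes:
        elements[n['head']].append(elements[n['id']])
    return sentence

words = [
    {'id': 1, 'head': 3, 'form': 'Í', 'lem': 'í', 'mi': 'Pr', 'si': 'ADVL'},
    {'id': 2, 'head': 1, 'form': 'upphavi', 'lem': 'upphav',
     'mi': 'N.Neu.Sg.Dat.Indef', 'si': 'P'},
    {'id': 3, 'head': 0, 'form': 'skapti', 'lem': 'skapa',
     'mi': 'V.Ind.Prt.Sg', 'si': 'VMAIN'},
    {'id': 4, 'head': 3, 'form': 'Gud', 'lem': 'gudur',
     'mi': 'N.Msc.Sg.Nom.Indef', 'si': 'SUBJ'},
    {'id': 5, 'head': 3, 'form': 'himmal', 'lem': 'himmal',
     'mi': 'N.Msc.Sg.Acc.Indef', 'si': 'OBJ'},
]

tree = to_matxin(words)
print(ET.tostring(tree, encoding='unicode'))
```

A word whose head is 0 ends up directly under SENTENCE, so a sentence-final full stop annotated as #8->0 would become a second top-level NODE. Whether that is acceptable to Matxin would need checking.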