N-Stage transfer

From Apertium

The idea of n-stage transfer is to extend the apertium-interchunk module so that it can "merge" chunks, for example NP CC NP → NP.

This is similar to the idea of cascaded finite-state chunking described by Abney (1995).

Examples

1. The girl with the telescope shouted at the boy who saw the dog in the field.

The current chunk-based transfer would normally chunk this into:

[The girl] [with] [the telescope] [shouted] [at] [the boy] [who] [saw] [the dog] [in] [the field]
 NP         PREP   NP              V        PREP  NP        REL   V     NP       PREP  NP 

This is quite a shallow analysis. With more stages of chunking, we could unify some of those chunks into more coherent phrases. For example, the next stage might unify PREP NP → PP, then NP PP → NP, then V NP → VP and then NP REL VP → NP. We would end up with a more coherent, "deeper" analysis, which might look something like this:

 The girl  with   the telescope    shouted  at    the boy   who   saw   the dog  in    the field
 DET NOM   PREP   DET NOM          V        PREP  DET NOM   REL   V     DET NOM  PREP  DET NOM       *
 NP        PREP   NP               V        PREP  NP        REL   V     NP       PREP  NP            (PREP NP → PP)
 NP        PP                      V        PP              REL   V     NP       PP                  (NP PP → NP)
 NP                                V        PP              REL   V     NP                             (V NP → VP)
 NP                                V        PP              REL   VP                                 (NP REL VP → NP)
 NP                                V        NP

This would not give us any more "transfer power", as the rules would still be finite-state and non-recursive, but it would make certain tasks easier. In practice we would probably not go as far as five levels of interchunk, but even one more level could help a lot.
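To make the cascade concrete, here is a minimal Python sketch (not Apertium code) in which each stage is one left-to-right pass of a single rewrite rule over chunk labels. The names `Rule`, `apply_stage`, `cascade` and `RULES` are illustrative assumptions. Because every stage is a single non-recursive pass, the whole cascade stays finite-state, as described above.

```python
from typing import List, Sequence, Tuple

# A rule rewrites a fixed sequence of chunk labels into one label,
# e.g. (("PREP", "NP"), "PP") stands for PREP NP → PP.
Rule = Tuple[Tuple[str, ...], str]

def apply_stage(tags: Sequence[str], rule: Rule) -> List[str]:
    """One stage: a single left-to-right pass that merges every
    non-overlapping match of the rule's pattern into one chunk."""
    pattern, result = rule
    out: List[str] = []
    i = 0
    while i < len(tags):
        if tuple(tags[i:i + len(pattern)]) == pattern:
            out.append(result)   # merge the matched chunks
            i += len(pattern)
        else:
            out.append(tags[i])  # copy the chunk unchanged
            i += 1
    return out

def cascade(tags: Sequence[str], rules: Sequence[Rule]) -> List[List[str]]:
    """Apply one rule per stage and keep the output of every stage."""
    stages = [list(tags)]
    for rule in rules:
        stages.append(apply_stage(stages[-1], rule))
    return stages

# Hypothetical stage order, taken from the rules listed in example 1.
RULES: List[Rule] = [
    (("PREP", "NP"), "PP"),
    (("NP", "PP"), "NP"),
    (("V", "NP"), "VP"),
    (("NP", "REL", "VP"), "NP"),
]

if __name__ == "__main__":
    first_stage = ["NP", "PREP", "NP", "V", "PREP", "NP", "REL", "V", "NP", "PREP", "NP"]
    for stage in cascade(first_stage, RULES):
        print(" ".join(stage))
```

Run on the first-stage labels of example 1, it prints one line per stage, ending in `NP V PP REL VP`. With the rules applied strictly in this order, "the boy" is already inside a PP when the last stage runs, so NP REL VP → NP finds no match: the order of the stages matters as much as the rules themselves.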

2. My country's largest shopping centres

The current transfer chunks this into:

[My country]['s] [largest shopping centres]
NP          GEN  NP

An intermediate stage of the transfer could have a rule that joins NP + GEN + NP into a single NP chunk. This would avoid the considerable work of specifying, in the first stage, the many different word combinations that may form an NP.

In this example, the head of the new NP would be the second original NP, which means that the morphological information of the new chunk would be that of "largest shopping centres" (plural) and not that of "my country" (singular). This information is important so that the next stage of the transfer (the current interchunk module) can perform agreement operations:

              [My country]['s] [largest shopping centres] [will prepare] (...)
 1st stage:    NP<sg>      GEN  NP<pl>                     V
 2nd stage:    NP<pl>                                      V
 3rd stage:    NP<pl>                                      V<pl>
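The two extra stages in this table can be sketched in Python as follows. This is an illustration only: it assumes a simplified chunk record (category, words, number) instead of Apertium's real stream format, and the names `Chunk`, `merge_genitive` and `agree_verb` are hypothetical. What it shows is that the merged NP takes its number from the second NP (the head), so the later agreement rule only has to look at the chunk immediately before the verb.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    tag: str            # chunk category: "NP", "GEN", "V", ...
    words: List[str]    # surface words inside the chunk
    number: str = ""    # "sg", "pl", or "" if unmarked

def merge_genitive(chunks: List[Chunk]) -> List[Chunk]:
    """2nd stage: NP GEN NP -> NP, taking its number from the second NP (the head)."""
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        window = chunks[i:i + 3]
        if [c.tag for c in window] == ["NP", "GEN", "NP"]:
            possessor, gen, head = window
            words = possessor.words + gen.words + head.words
            out.append(Chunk("NP", words, head.number))
            i += 3
        else:
            out.append(chunks[i])
            i += 1
    return out

def agree_verb(chunks: List[Chunk]) -> List[Chunk]:
    """3rd stage: a V chunk right after an NP copies that NP's number."""
    out = [Chunk(c.tag, list(c.words), c.number) for c in chunks]  # copy, do not mutate input
    for prev, cur in zip(out, out[1:]):
        if prev.tag == "NP" and cur.tag == "V":
            cur.number = prev.number
    return out

if __name__ == "__main__":
    first = [
        Chunk("NP", ["My", "country"], "sg"),
        Chunk("GEN", ["'s"]),
        Chunk("NP", ["largest", "shopping", "centres"], "pl"),
        Chunk("V", ["will", "prepare"]),
    ]
    for stage in (first, merge_genitive(first), agree_verb(merge_genitive(first))):
        print("  ".join(f"{c.tag}<{c.number or '-'}>" for c in stage))
```

The three printed lines mirror the three stages above: `NP<sg> GEN<-> NP<pl> V<->`, then `NP<pl> V<->`, then `NP<pl> V<pl>`.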

Test implementation


from	Jacob Nordfalk <jacob.nordfalk@gmail.com>
to	Apertium-stuff <apertium-stuff@lists.sourceforge.net>
date	12 Apr 2009, 02:51
subject	Experimental 4-stage transfer introduced in Apertium!

Dear all,

I took the liberty of adding support for simple n-stage transfer to Apertium today, with 5 simple lines of code.

The lines I added to Apertium can be seen here:
http://apertium.svn.sourceforge.net/viewvc/apertium?view=rev&revision=9616

Basically, I added support for a new clip part:

<clip pos="3" part="x_pgcontent"/>

which gives the CONTENT INSIDE a chunk.
So from ^adj_nom<SN><nom>{^granda<adj><sg><2>$ ^hundo<n><m><sg><2>$}$
it gets ^granda<adj><sg><2>$ ^hundo<n><m><sg><2>$

This can be used to merge chunks.
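As a rough illustration of what part="x_pgcontent" is described as returning, here is a hypothetical Python helper (not Apertium's actual C++ code) that strips the chunk header and braces and keeps only the lexical units inside. It assumes unescaped input: escaped characters such as `\{` or `\$` are not handled.

```python
import re

# One chunk in the stream: ^header{content}$.
# Simplification: escaped characters (e.g. \{ or \$) are not handled.
CHUNK = re.compile(r"\^(?P<header>[^{]*)\{(?P<content>.*)\}\$")

def chunk_content(chunk: str) -> str:
    """Return the lexical units inside a single chunk, i.e. the behaviour
    described for part="x_pgcontent" (hypothetical helper, not Apertium code)."""
    m = CHUNK.fullmatch(chunk.strip())
    if m is None:
        raise ValueError(f"not a single chunk: {chunk!r}")
    return m.group("content")

if __name__ == "__main__":
    print(chunk_content("^adj_nom<SN><nom>{^granda<adj><sg><2>$ ^hundo<n><m><sg><2>$}$"))
    # prints: ^granda<adj><sg><2>$ ^hundo<n><m><sg><2>$
```

Concatenating the results of this helper for several adjacent chunks, and wrapping them in one new header, is the kind of merge the clip makes possible.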

I have put an example here:
http://apertium.svn.sourceforge.net/viewvc/apertium?view=rev&revision=9613

Try translating
big black cat's nice blue eyes
which gives
belaj bluaj okuloj de granda nigra kato

<SN>{granda nigra kato} <GEN>{de} <SN>{belaj bluaj okuloj}    ->    <SN>{belaj bluaj okuloj de granda nigra kato}
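The merge in this example can be sketched in Python on (tags, content) pairs, using the same simplified plain-word notation as the line above rather than real lexical units. The function names are hypothetical; the sketch only shows the reordering that the merged chunk performs, possessed + "de" + possessor, with the three contents concatenated into one new SN.

```python
from typing import List, Tuple

# A chunk as (tags, content); contents are simplified to plain words here.
Chunk = Tuple[str, str]

def merge_sn_gen_sn(chunks: List[Chunk]) -> List[Chunk]:
    """Rewrite SN GEN SN as one SN whose content is
    possessed + 'de' + possessor (Esperanto word order)."""
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        window = chunks[i:i + 3]
        if [tags for tags, _ in window] == ["<SN>", "<GEN>", "<SN>"]:
            (_, possessor), (_, gen), (_, possessed) = window
            out.append(("<SN>", " ".join([possessed, gen, possessor])))
            i += 3
        else:
            out.append(chunks[i])
            i += 1
    return out

def show(chunks: List[Chunk]) -> str:
    """Render chunks in the <TAGS>{content} notation used above."""
    return " ".join(f"{tags}{{{content}}}" for tags, content in chunks)

if __name__ == "__main__":
    before = [("<SN>", "granda nigra kato"), ("<GEN>", "de"), ("<SN>", "belaj bluaj okuloj")]
    print(show(merge_sn_gen_sn(before)))
    # prints: <SN>{belaj bluaj okuloj de granda nigra kato}
```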


Of course, the name part="x_pgcontent" is temporary (eXperimental) and shouldn't be counted on.

But please leave it in Apertium until someone implements some kind of improved ('proper') n-stage transfer support.

As Esperanto's grammar is simple, this very simple n-stage transfer support is satisfactory for most tasks (and can really make a big difference in this language pair).

In the meantime I will use this to improve the English-Esperanto pair and gain experience with n-stage transfer to share with the Apertium community. (To try it, you must of course svn up Apertium and install it with the patch.)

I have already identified the following problems:

- Case handling (Big black cat's nice blue eyes -> belaj bluaj okuloj de Granda nigra kato: the capital stays on "Granda" instead of moving to the new first word)

- Tag reference handling (an option to unpack <2>'s and <3>'s).
  This should not always happen, as it is sometimes good to keep the <2>'s and <3>'s and sometimes it is not.
  I can give some examples of this on request.
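Some background on the second problem: inside a chunk, a numeric tag such as <3> refers to the chunk's third tag. The is-en trace on this page shows it: ^come<vblex><3><4><5>$ inside ^come<SV><@+FMAINV><pres><p3><sg>{...}$ comes out of postchunk as ^come<vblex><pres><p3><sg>$. When chunks are merged, such references would otherwise point at the new chunk's tags, so "unpacking" means resolving them first. The merged chunk in that trace shows both choices: ^the<det><def><3>$ is kept (the new chunk's third tag is still <sg>), while ^see<vblex><past>$ has been resolved. A minimal Python sketch of the resolution step, with the hypothetical name `unpack_tag_refs`:

```python
import re
from typing import List

TAG_REF = re.compile(r"<(\d+)>")   # a numeric tag reference such as <2> or <3>

def unpack_tag_refs(content: str, chunk_tags: List[str]) -> str:
    """Replace each <n> in a chunk's content with the chunk's n-th tag (1-based).
    References with no matching tag are left untouched."""
    def resolve(m: "re.Match[str]") -> str:
        n = int(m.group(1))
        return chunk_tags[n - 1] if 1 <= n <= len(chunk_tags) else m.group(0)
    return TAG_REF.sub(resolve, content)

if __name__ == "__main__":
    verb_tags = ["<SV>", "<@+FMAINV>", "<pres>", "<p3>", "<sg>"]
    print(unpack_tag_refs("^come<vblex><3><4><5>$", verb_tags))
    # prints: ^come<vblex><pres><p3><sg>$
```

Whether to call such a step before a merge is exactly the per-rule option asked for above, since resolving every reference is not always wanted.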


Jacob


Working Example

In the is-en (Icelandic-English) pair:

spectie: Á morgun kemur maðurinn sem hann sá í gær.
spectie: ^Á morgun<ADV>{^tomorrow<adv>$}$ ^come<SV><@+FMAINV><pres><p3><sg>{^come<vblex><3><4><5>$}$ ^det_nom<SN><@←SUBJ><sg>{^the<det><def><3>$
         ^man<n><3>$}$ ^sem<REL>{^that<rel><an><mf><sp>$}$ ^prn<SN><@SUBJ→><p3><m><sg>{^prpers<prn><subj><p3><m><sg>$}$
         ^see<SV><@+FMAINV><past><p3><sg>{^see<vblex><3>$}$ ^í gær<ADV>{^yesterday<adv>$}$^sent<SENT>{^..<sent>$}$
spectie: (chunker, t1x)
spectie: 
spectie: ^Á morgun<ADV>{^tomorrow<adv>$}$ ^come<SV><@+FMAINV><pres><p3><sg>{^come<vblex><3><4><5>$}$ ^sn_rel_sn_v_adv<SN><@←SUBJ><sg>{^the<det><def><3>$
         ^man<n><3>$ ^that<rel><an><mf><sp>$ ^prpers<prn><subj><p3><m><sg>$ ^see<vblex><past>$ ^yesterday<adv>$}$^sent<SENT>{^..<sent>$}$
spectie: (interchunk1, t2x)
spectie: 
spectie: ^Á morgun<ADV>{^tomorrow<adv>$}$ ^sn_rel_sn_v_adv<SN><@←SUBJ><sg>{^the<det><def><3>$ ^man<n><3>$ ^that<rel><an><mf><sp>$ ^prpers<prn><subj><p3><m><sg>$
         ^see<vblex><past>$ ^yesterday<adv>$}$ ^come<SV><@+FMAINV><pres><p3><sg>{^come<vblex><3><4><5>$}$^sent<SENT>{^..<sent>$}$
spectie: (interchunk2, t3x)
spectie: 
spectie: ^Tomorrow<adv>$ ^the<det><def><sg>$ ^man<n><sg>$ ^that<rel><an><mf><sp>$ ^prpers<prn><subj><p3><m><sg>$ ^see<vblex><past>$ ^yesterday<adv>$
         ^come<vblex><pres><p3><sg>$^..<sent>$
spectie: (postchunk, t4x)
spectie: 
spectie: <rule comment="REGLA: SN_SUBJ_R REL SN_SUBJ_L FMAINV ADV → SN">
spectie: 
spectie: this is the rule in t2x

Translated output: Tomorrow the man that he saw yesterday comes.
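The two interchunk stages of this working example can be mimicked with a small Python sketch that works on (category, content) pairs instead of the real stream; English surface words stand in for the lexical units, which is an assumption made for readability. Interchunk 1 glues SN REL SN SV ADV into one SN, a simplified version of the SN_SUBJ_R REL SN_SUBJ_L FMAINV ADV → SN rule quoted above; interchunk 2 then needs only a three-chunk rule, ADV SV SN → ADV SN SV, to put the subject before the verb, however long the relative clause is. The helper names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Chunk = Tuple[str, str]                         # (category, content)
Action = Callable[[List[Chunk]], List[Chunk]]   # what to emit for a matched window

def run_stage(chunks: List[Chunk], pattern: Sequence[str], action: Action) -> List[Chunk]:
    """Single left-to-right pass: each match of `pattern` is replaced by action(match)."""
    out: List[Chunk] = []
    i, n = 0, len(pattern)
    while i < len(chunks):
        window = chunks[i:i + n]
        if [cat for cat, _ in window] == list(pattern):
            out.extend(action(window))
            i += n
        else:
            out.append(chunks[i])
            i += 1
    return out

def merge_into_sn(window: List[Chunk]) -> List[Chunk]:
    # Interchunk 1: the antecedent and its relative clause become one SN.
    return [("SN", " ".join(content for _, content in window))]

def subject_before_verb(window: List[Chunk]) -> List[Chunk]:
    # Interchunk 2: ADV SV SN -> ADV SN SV (undo the Icelandic verb-second order).
    adv, verb, subj = window
    return [adv, subj, verb]

stage1 = [("ADV", "tomorrow"), ("SV", "comes"), ("SN", "the man"), ("REL", "that"),
          ("SN", "he"), ("SV", "saw"), ("ADV", "yesterday"), ("SENT", ".")]
stage2 = run_stage(stage1, ("SN", "REL", "SN", "SV", "ADV"), merge_into_sn)
stage3 = run_stage(stage2, ("ADV", "SV", "SN"), subject_before_verb)

if __name__ == "__main__":
    print([cat for cat, _ in stage2])    # ['ADV', 'SV', 'SN', 'SENT']
    print([cat for cat, _ in stage3])    # ['ADV', 'SN', 'SV', 'SENT']
    print(" ".join(content for _, content in stage3))
```

The last line prints "tomorrow the man that he saw yesterday comes .", the same word order as the translated output above.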

References