Difference between revisions of "N-Stage transfer"

From Apertium

Revision as of 12:46, 23 March 2009

The idea of n-Stage transfer is to extend apertium-interchunk so that it can "merge" chunks, for example NP CC NP → NP.

This is similar to the idea of cascaded finite-state chunking, as described by Abney (1995).

Examples

1. The girl with the telescope shouted at the boy who saw the dog in the field.

The current chunk-based transfer would normally chunk this into:

[The girl] [with] [the telescope] [shouted] [at] [the boy] [who] [saw] [the dog] [in] [the field]
 NP         PREP   NP              V        PREP  NP        REL   V     NP       PREP  NP 

This is quite a shallow analysis. With more stages of chunking, we could unify some of those chunks into more coherent phrases. For example, the next stage might unify PREP NP → PP, then NP PP → NP, then V NP → VP, and finally NP REL VP → NP. We would end up with a deeper, more coherent analysis, which might look something like:

 The girl  with   the telescope    shouted  at    the boy   who   saw   the dog  in    the field
 DET NOM   PREP   DET NOM          V        PREP  DET NOM   REL   V     DET NOM  PREP  DET NOM       *
 NP        PREP   NP               V        PREP  NP        REL   V     NP       PREP  NP            (PREP NP → PP)
 NP        PP                      V        PP              REL   V     NP       PP                  (NP PP → NP)
 NP                                V        PP              REL   V     NP                           (V NP → VP)
 NP                                V        PP              REL   VP                                 (NP REL VP → NP)
 NP                                V        NP

This would not give us any more "transfer power", as the rules would still be finite-state, and non-recursive, but it would make certain tasks easier.
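The cascade described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: chunks are reduced to a flat list of tags, and the names `apply_rule`, `cascade`, `RULES` and `FIRST_STAGE` are hypothetical and not part of apertium-interchunk. Each rule gets exactly one left-to-right pass, in the order given in the text.

```python
def apply_rule(tags, pattern, result):
    """Replace each non-overlapping occurrence of `pattern` with `result`
    in a single left-to-right pass (no re-scanning of the output)."""
    out = []
    i = 0
    n = len(pattern)
    while i < len(tags):
        if tags[i:i + n] == pattern:
            out.append(result)
            i += n
        else:
            out.append(tags[i])
            i += 1
    return out


def cascade(tags, rules):
    """Run one pass per rule and return the tag sequence after every stage."""
    stages = [list(tags)]
    for pattern, result in rules:
        stages.append(apply_rule(stages[-1], pattern, result))
    return stages


# Merge rules in the order mentioned in the text (illustrative only).
RULES = [
    (["PREP", "NP"], "PP"),
    (["NP", "PP"], "NP"),
    (["V", "NP"], "VP"),
    (["NP", "REL", "VP"], "NP"),
]

# Output of the existing first-stage chunker for example 1.
FIRST_STAGE = ["NP", "PREP", "NP", "V", "PREP", "NP",
               "REL", "V", "NP", "PREP", "NP"]

if __name__ == "__main__":
    for stage in cascade(FIRST_STAGE, RULES):
        print(" ".join(stage))
```

Because every rule is applied a fixed number of times, the cascade stays finite-state and non-recursive. It also means the result depends on rule order: a rule such as NP REL VP → NP can only fire if an earlier stage has not already absorbed the NP into another chunk.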

2. My country's largest shopping centres

The current transfer chunks this into:

[My country]['s] [largest shopping centres]
NP          GEN  NP

An intermediate stage of the transfer could have a rule that joins NP + GEN + NP into a single NP chunk. This avoids the considerable work of specifying, in the first stage, the many different word combinations that can form an NP.

In this example, the head of the new NP would be the second original NP. This means that the morphological information of the new chunk would be that of "largest shopping centres" (plural) and not that of "my country" (singular). This information is important so that the next stage of the transfer (the current interchunk module) can perform agreement operations:

              [My country]['s] [largest shopping centres] [will prepare] (...)
 1st stage:    NP<sg>      GEN  NP<pl>                     V
 2nd stage:    NP<pl>                                      V
 3rd stage:    NP<pl>                                      V<pl>
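The three stages above can be mimicked with a small Python sketch. It is not how apertium-interchunk is implemented; the `Chunk` class and the functions `merge_genitive` and `agree_verb` are hypothetical names introduced for this example, and number is assumed to be the only feature that matters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    tag: str                      # chunk category, e.g. "NP", "GEN", "V"
    text: str                     # surface text, kept only for readability
    number: Optional[str] = None  # "sg", "pl", or None if not yet set


def merge_genitive(chunks: List[Chunk]) -> List[Chunk]:
    """2nd stage: NP GEN NP -> NP, taking the number of the second NP (the head)."""
    out = []
    i = 0
    while i < len(chunks):
        window = chunks[i:i + 3]
        if [c.tag for c in window] == ["NP", "GEN", "NP"]:
            possessor, genitive, head = window
            text = possessor.text + genitive.text + " " + head.text
            out.append(Chunk("NP", text, head.number))
            i += 3
        else:
            out.append(chunks[i])
            i += 1
    return out


def agree_verb(chunks: List[Chunk]) -> List[Chunk]:
    """3rd stage: a V directly after an NP receives that NP's number."""
    out = []
    for i, chunk in enumerate(chunks):
        if chunk.tag == "V" and i > 0 and chunks[i - 1].tag == "NP":
            out.append(Chunk(chunk.tag, chunk.text, chunks[i - 1].number))
        else:
            out.append(chunk)
    return out


# Assumed output of the first stage for example 2.
first_stage = [
    Chunk("NP", "My country", "sg"),
    Chunk("GEN", "'s"),
    Chunk("NP", "largest shopping centres", "pl"),
    Chunk("V", "will prepare"),
]

if __name__ == "__main__":
    second_stage = merge_genitive(first_stage)
    third_stage = agree_verb(second_stage)
    for chunk in third_stage:
        print(chunk.tag, chunk.number, chunk.text)
```

Taking the features from the head rather than from the first NP is what lets the later agreement step see a plural subject.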

References