N-Stage transfer
The idea of n-stage transfer is to extend apertium-interchunk so that it outputs the same format that it takes as input, which would allow it to be called more than once. It would also be useful to be able to "merge" chunks, for example NP CC NP → NP.
This is similar to the idea of cascaded finite-state chunking, as described by Abney (1996).
Example
- 1 - The girl with the telescope shouted at the boy who saw the dog in the field.
The current chunk-based transfer would normally chunk this into:
[The girl] [with] [the telescope] [shouted] [at] [the boy] [who] [saw] [the dog] [in] [the field]
NP PREP NP V PREP NP REL V NP PREP NP
This is quite a shallow analysis. With more stages of chunking, we could unify some of those chunks into more coherent phrases. For example, the next stage might unify PREP NP → PP, then NP PP → NP, then V NP → VP, and then NP REL VP → NP. We would end up with a more coherent, "deeper" analysis, which might look something like:
The girl with the telescope shouted at the boy who saw the dog in the field
DET NOM PREP DET NOM V PREP DET NOM REL V DET NOM PREP DET NOM
*
NP PREP NP V PREP NP REL V NP PREP NP
(PREP NP → PP)
NP PP V PP REL V NP PP
(NP PP → NP)
NP V PP REL V NP PP
(V NP → VP)
NP V PP REL VP
(NP REL VP → NP)
NP V NP
This would not give us any more "transfer power", as the rules would still be finite-state and non-recursive, but it would make certain tasks easier.
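The cascade in example 1 can be sketched in a few lines of Python. This is only an illustration, not how apertium-interchunk is implemented: the names `STAGES`, `apply_stage` and `cascade` are hypothetical, and the sketch assumes that each stage holds a single rule which is applied once, left to right, to non-overlapping matches.

```python
# Hypothetical cascade: one rewrite rule (pattern -> merged tag) per stage,
# in the order given in the example above.
STAGES = [
    (("PREP", "NP"), "PP"),
    (("NP", "PP"), "NP"),
    (("V", "NP"), "VP"),
    (("NP", "REL", "VP"), "NP"),
]


def apply_stage(tags, pattern, result):
    """Scan once from left to right, replacing each non-overlapping match."""
    out = []
    i = 0
    n = len(pattern)
    while i < len(tags):
        if tuple(tags[i:i + n]) == pattern:
            out.append(result)   # merge the matched chunks into one chunk
            i += n
        else:
            out.append(tags[i])  # no match here: copy the chunk unchanged
            i += 1
    return out


def cascade(tags):
    """Run every stage in order and return the tag sequence after each one."""
    trace = [list(tags)]
    for pattern, result in STAGES:
        tags = apply_stage(tags, pattern, result)
        trace.append(tags)
    return trace


if __name__ == "__main__":
    first_stage = "NP PREP NP V PREP NP REL V NP PREP NP".split()
    for step in cascade(first_stage):
        print(" ".join(step))
```

Run mechanically like this, the first three stages reduce the sequence to NP V PP REL VP, and the fourth rule then finds nothing to match, because "the boy" was already absorbed into a PP by the first stage. The order in which rules are placed in the cascade therefore matters. Because each stage reads and writes the same kind of sequence, stages can be added or reordered without changing the others.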
- 2 - My country's largest shopping centres
The current transfer chunks this into:
[My country]['s] [largest shopping centres]
NP GEN NP
An intermediate stage of the transfer could have a rule that joins NP + GEN + NP into a single NP chunk. This would avoid the considerable work of having to specify, in the first stage, the many different word combinations that may form an NP.
In this example, the head of the new NP would be the second original NP. This means that the morphological information of the new chunk would be that of "largest shopping centres" (plural) and not that of "my country" (singular). This information is important so that the next stage of the transfer (the current interchunk module) can perform agreement operations:
[My country]['s] [largest shopping centres] [will prepare] (...)
1st stage: NP<sg> GEN NP<pl> V
2nd stage: NP<pl> V
3rd stage: NP<pl> V<pl>
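The three stages of example 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the Apertium rule format: the `Chunk` class and the functions `merge_genitive` and `agree_subject_verb` are hypothetical names, and the agreement rule is simplified to "a V immediately after an NP copies that NP's number".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    tag: str                      # chunk category, e.g. "NP", "GEN", "V"
    text: str                     # surface words covered by the chunk
    number: Optional[str] = None  # "sg", "pl", or None when unspecified


def merge_genitive(chunks: List[Chunk]) -> List[Chunk]:
    """Intermediate stage: NP GEN NP -> NP, with the second NP as head."""
    out = []
    i = 0
    while i < len(chunks):
        window = chunks[i:i + 3]
        if [c.tag for c in window] == ["NP", "GEN", "NP"]:
            head = window[2]  # the new chunk inherits the head's number
            text = " ".join(c.text for c in window)
            out.append(Chunk("NP", text, head.number))
            i += 3
        else:
            out.append(chunks[i])
            i += 1
    return out


def agree_subject_verb(chunks: List[Chunk]) -> List[Chunk]:
    """Last stage (simplified): a V right after an NP takes the NP's number."""
    out = []
    for chunk in chunks:
        if chunk.tag == "V" and out and out[-1].tag == "NP":
            chunk = Chunk(chunk.tag, chunk.text, out[-1].number)
        out.append(chunk)
    return out


stage1 = [
    Chunk("NP", "My country", "sg"),
    Chunk("GEN", "'s"),
    Chunk("NP", "largest shopping centres", "pl"),
    Chunk("V", "will prepare"),
]
stage2 = merge_genitive(stage1)
stage3 = agree_subject_verb(stage2)

if __name__ == "__main__":
    for name, stage in (("1st", stage1), ("2nd", stage2), ("3rd", stage3)):
        print(name, "stage:", [(c.tag, c.number) for c in stage])
```

Had the merged chunk taken its number from the first NP instead, the verb would have been marked singular, which is why the choice of head matters for the later agreement stage.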
References
- Steven Abney (1996). "Partial Parsing via Finite-State Cascades". Natural Language Engineering, 2(4): 337-344.