Cascaded Interchunk

From Apertium
Revision as of 19:05, 12 January 2012 by BrendenD14

Chunking is based on source language patterns. It is used in language pairs such as English-Esperanto.

  • First, words are grouped into chunks by matching patterns like adj+noun or adj+adj+noun.
  • From each match, a ‘pseudo lemma’ is made with a tag containing the chunk type, normally ‘SN’ (noun phrase) or ‘SV’ (verb phrase).
  • Then, the chunks are reordered or modified by matching patterns of chunks.
  • After this, translation works on these pseudo words rather than on individual words.

Chunks for an English phrase may look like:

SN (The dog)    SV (played with)    SN (the boy)

"The dog" and "the boy" are noun phrases, so each is chunked as SN.

"played with" is a verb phrase, so it is chunked as SV.
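
The chunking of this example can be sketched in Python. This is a minimal illustration, not Apertium's implementation: the tag names (det, adj, n, vblex, pr), the PATTERNS table, and the chunk function are assumptions made for the example, and real Apertium rules are written in XML transfer files. The sketch scans left to right and takes the longest pattern that matches at each position.

```python
# Hypothetical pattern table: a sequence of part-of-speech tags -> chunk type.
PATTERNS = [
    (("det", "adj", "n"), "SN"),
    (("det", "n"), "SN"),
    (("adj", "n"), "SN"),
    (("vblex", "pr"), "SV"),
    (("vblex",), "SV"),
]

def chunk(tagged_words):
    """Group (word, tag) pairs into (chunk_type, words) pairs, using the
    longest pattern that matches at the current position."""
    chunks = []
    i = 0
    while i < len(tagged_words):
        best = None
        for tags, chunk_type in PATTERNS:
            window = tuple(tag for _, tag in tagged_words[i:i + len(tags)])
            if window == tags and (best is None or len(tags) > len(best[0])):
                best = (tags, chunk_type)
        if best is None:
            # No pattern matches: emit the word as a chunk of its own.
            word, tag = tagged_words[i]
            chunks.append((tag, [word]))
            i += 1
        else:
            tags, chunk_type = best
            words = [w for w, _ in tagged_words[i:i + len(tags)]]
            chunks.append((chunk_type, words))
            i += len(tags)
    return chunks

sentence = [("The", "det"), ("dog", "n"), ("played", "vblex"),
            ("with", "pr"), ("the", "det"), ("boy", "n")]
for chunk_type, words in chunk(sentence):
    print(chunk_type, "(" + " ".join(words) + ")")
```

With the tagged sentence above, this prints one SN chunk for "The dog", one SV chunk for "played with", and one SN chunk for "the boy".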

This method is used in shallow-transfer translation engines such as Apertium because it does not require parse trees (which are normally used in "deep transfer"). See Parse tree on Wikipedia.
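
To illustrate why no parse tree is needed, the sketch below treats a sentence as a flat list of chunks and applies a rule that matches a sequence of chunk types. The CHUNK_RULES table, the reorder function, and the SN SV SN to SV SN SN reordering are hypothetical, chosen only to show the mechanism; they are not taken from any Apertium language pair.

```python
# Hypothetical rules: a sequence of chunk types mapped to the order in which
# the matched chunks should be output.
CHUNK_RULES = {
    ("SN", "SV", "SN"): (1, 0, 2),  # illustrative: move the verb phrase first
}

def reorder(chunks):
    """Apply the first rule matching at each position of a flat chunk list."""
    out = []
    i = 0
    while i < len(chunks):
        for types, order in CHUNK_RULES.items():
            window = chunks[i:i + len(types)]
            if tuple(t for t, _ in window) == types:
                out.extend(window[k] for k in order)
                i += len(types)
                break
        else:
            # No rule matched here: copy the chunk through unchanged.
            out.append(chunks[i])
            i += 1
    return out

chunks = [("SN", "The dog"), ("SV", "played with"), ("SN", "the boy")]
print(reorder(chunks))
```

The rule only looks at the chunk types in a fixed-size window of the list, so no tree structure is built at any point.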