Difference between revisions of "Cascaded Interchunk"

From Apertium

Revision as of 19:05, 12 January 2012

Chunking

Chunking groups the words of a sentence into phrases ("chunks") based on source-language patterns. It is used in language pairs such as English-Esperanto.

  • First, words are grouped (and, where needed, reordered) into chunks by matching patterns like adj+noun or adj+adj+noun.
  • Then, the chunks themselves can be reordered.
  • Each chunk gets a ‘pseudo lemma’ with a tag containing its type, normally ‘SN’ (Noun Phrase) or ‘SV’ (Verb Phrase).
  • After this, translation works on these pseudo words, so the sentence is handled at phrase level rather than word by word.
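The steps above can be sketched in a few lines of Python. This is only an illustration of pattern-based chunking, not Apertium's actual implementation (Apertium expresses these rules in its own transfer rule files). The tag names, the `PATTERNS` table and the `chunk` function are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical pattern table: a sequence of word tags -> chunk type.
# 'SN' = noun phrase, 'SV' = verb phrase, as in the text above.
PATTERNS = {
    ("det", "adj", "adj", "noun"): "SN",
    ("det", "adj", "noun"): "SN",
    ("det", "noun"): "SN",
    ("adj", "adj", "noun"): "SN",
    ("adj", "noun"): "SN",
    ("noun",): "SN",
    ("verb", "prep"): "SV",
    ("verb",): "SV",
}
MAX_LEN = max(len(p) for p in PATTERNS)


def chunk(tagged: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group (word, tag) pairs into (chunk_type, words) using longest match first."""
    chunks = []
    i = 0
    while i < len(tagged):
        # Try the longest possible pattern at position i, then shorter ones.
        for n in range(min(MAX_LEN, len(tagged) - i), 0, -1):
            window = tagged[i:i + n]
            tags = tuple(tag for _, tag in window)
            if tags in PATTERNS:
                chunks.append((PATTERNS[tags], [word for word, _ in window]))
                i += n
                break
        else:
            # No pattern matched: pass the word through as an untyped chunk.
            chunks.append(("?", [tagged[i][0]]))
            i += 1
    return chunks


sentence = [("The", "det"), ("dog", "noun"), ("played", "verb"),
            ("with", "prep"), ("the", "det"), ("boy", "noun")]
result = chunk(sentence)
```

Trying longer patterns first means that adj+adj+noun is preferred over adj+noun when both could match at the same position.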

Chunks for an English phrase may look like:

SN (The dog) SV (played with) SN (the boy)

Both "The dog" and "the boy" are noun phrases, so each is chunked as SN.

"played with" is a verb phrase, so it is chunked as SV rather than SN.


This method is used in shallow-transfer translation engines such as Apertium because they do not build parse trees (which are normally used in "deep transfer"). See Parse tree on Wikipedia (http://en.wikipedia.org/wiki/Parse_tree).
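Because no parse tree is built, the chunked example phrase can be held as a flat list of chunks. The following sketch assumes that representation and shows how such a list might be printed and reordered. The `render` and `reorder_sn_sv_sn` functions are hypothetical, and the SN SV SN to SN SN SV rule is purely illustrative; it is not taken from any real language pair.

```python
from typing import List, Tuple

Chunk = Tuple[str, List[str]]  # (chunk type, words in the chunk)


def render(chunks: List[Chunk]) -> str:
    """Format chunks in the 'TYPE (words)' notation used in the example."""
    return " ".join(f"{kind} ({' '.join(words)})" for kind, words in chunks)


def reorder_sn_sv_sn(chunks: List[Chunk]) -> List[Chunk]:
    """Illustrative chunk-level rule: SN SV SN -> SN SN SV; otherwise unchanged."""
    kinds = [kind for kind, _ in chunks]
    if kinds == ["SN", "SV", "SN"]:
        return [chunks[0], chunks[2], chunks[1]]
    return chunks


example = [("SN", ["The", "dog"]), ("SV", ["played", "with"]), ("SN", ["the", "boy"])]
print(render(example))
# prints: SN (The dog) SV (played with) SN (the boy)
print(render(reorder_sn_sv_sn(example)))
# prints: SN (The dog) SN (the boy) SV (played with)
```

The rule only inspects the sequence of chunk types, never the words inside each chunk, which is what lets whole phrases be moved as single units.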