Chunking: A full example

From Apertium
Revision as of 17:26, 28 September 2008 by Jacob Nordfalk.

This will be a full example of chunking, built from the ground up.

We will look at Esperanto <-> English and try to translate the sentence "La libro estas bona" to "The book is good".

First, a brief overview of how three-stage transfer normally works:

  • First, the individual words are categorized and put into chunks (in the .t1x file).

At this stage, tags on the words can also be added, removed, or turned into 'pointers' that point to the tags of the enclosing chunk.

  • Then the chunks are reordered, combined and split (in the .t2x file).
  • Finally, each chunk is unpacked back into words (in the .t3x file): the pointers are replaced by the enclosing chunk's tags, and the chunk itself is thrown away.

If we follow "The book is good" through the system (here in the English-to-Esperanto direction), the input just before transfer is:

^The<det><def><sp>$ ^book<n><sg>$ 
^be<vbser><pres><p3><sg>$ 
^good<adj><sint>$
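Each lexical unit in this stream is written as ^lemma<tag>...<tag>$. As a way of making the format concrete, here is a minimal Python sketch that splits such a stream into lemmas and tag lists. It is only an illustration and is not part of Apertium. The function name parse_lus is hypothetical, and escaped special characters inside a lemma (for example a backslash-escaped ^ or $) are assumed not to occur.

```python
import re

# One lexical unit: ^lemma<tag1><tag2>...$
# Assumption: lemmas contain no unescaped '<', '^' or '$' (escapes not handled).
LU_RE = re.compile(r"\^([^<$^]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_lus(stream: str) -> list[tuple[str, list[str]]]:
    """Split an Apertium-style stream into (lemma, [tags]) pairs."""
    units = []
    for lemma, tag_str in LU_RE.findall(stream):
        units.append((lemma, TAG_RE.findall(tag_str)))
    return units


stream = "^The<det><def><sp>$ ^book<n><sg>$ ^be<vbser><pres><p3><sg>$ ^good<adj><sint>$"
for lemma, tags in parse_lus(stream):
    print(lemma, tags)
```

For the sentence above this yields four units, starting with ('The', ['det', 'def', 'sp']) and ending with ('good', ['adj', 'sint']).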

which is chunked into

^det_nom<SN><sg><nom>{^La<det><def><2><3>$ ^libro<n><2><3>$}$ 
^ser<SV><pres><p3><sg>{^esti<vbser><pres>$}$ 
^adj<SN><sg>{^bona<adj><sg><nom>$}$

Here 'det_nom' is the name of the chunk and <SN><sg><nom> are the chunk's tags. The content is {^La<det><def><2><3>$ ^libro<n><2><3>$}, where <2> and <3> are pointers to the chunk's second and third tags (<sg> and <nom> respectively). This allows us to change those values at chunk level later on, if necessary.
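The effect of the pointers can be modelled in a few lines. The sketch below is a toy stand-in for what happens to a single chunk in the .t3x stage. Each numeric tag <n> inside the chunk is replaced by the chunk's n-th tag, counting from 1 (so <2> becomes <sg> here), and then the chunk wrapper is dropped. The name resolve_pointers is hypothetical, and nested braces inside a chunk are assumed not to occur. In a real language pair this work is done by Apertium's postchunk module following the .t3x rules, not by code like this.

```python
import re

# One chunk: ^name<tags...>{contents}$
# Assumption: the contents contain no nested '{' or '}'.
CHUNK_RE = re.compile(r"\^([^<{$]*)((?:<[^>]+>)*)\{(.*?)\}\$")
TAG_RE = re.compile(r"<([^>]+)>")


def resolve_pointers(chunk: str) -> str:
    """Replace pointer tags like <2> in a chunk's words with the chunk's
    own tag at that 1-based position, and return the unwrapped words."""
    m = CHUNK_RE.fullmatch(chunk.strip())
    if m is None:
        raise ValueError(f"not a chunk: {chunk!r}")
    _name, tag_str, contents = m.groups()
    chunk_tags = TAG_RE.findall(tag_str)  # e.g. ['SN', 'sg', 'nom']

    def substitute(tag_match: re.Match) -> str:
        tag = tag_match.group(1)
        if tag.isdigit():
            # <1> is the first chunk tag, <2> the second, and so on.
            return f"<{chunk_tags[int(tag) - 1]}>"
        return tag_match.group(0)  # ordinary tags are kept as they are

    return TAG_RE.sub(substitute, contents)


chunk = "^det_nom<SN><sg><nom>{^La<det><def><2><3>$ ^libro<n><2><3>$}$"
print(resolve_pointers(chunk))
# ^La<det><def><sg><nom>$ ^libro<n><sg><nom>$
```

Because the words only hold pointers, changing the chunk's <sg> to <pl> before this step would change both words at once. That is the point of the design.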





Word/chunk reordering

Now that "La libro estas bona" -> "The book is good" works, let's look at chunk reordering. In Esperanto, you make a sentence into a question by putting "Ĉu" at the start of the sentence: "Ĉu la libro estas bona?" In English, the verb needs to come first: "Is the book good?".
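As a rough illustration of the kind of rule needed here, the following Python sketch operates on whole chunks. If the sentence starts with a question-particle chunk followed by a noun-phrase chunk (tag SN) and a verb chunk (tag SV), the sketch drops the particle and moves the verb chunk to the front. Several parts are assumptions: the Chunk class is hypothetical, and so are the chunk name "cxu" and its tag "QST". The lemmas are also assumed to be already translated to English. The rule only covers sentences like this one; English questions with verbs other than "be" need "do"-support, which is ignored here. In Apertium itself, this would be written as an interchunk rule in the .t2x file.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str         # chunk name, e.g. "det_nom"
    tags: list[str]   # chunk-level tags, e.g. ["SN", "sg", "nom"]
    words: list[str]  # target-language lemmas inside the chunk (simplified)


def reorder_question(chunks: list[Chunk]) -> list[Chunk]:
    """Toy interchunk rule: [particle] [SN] [SV] rest -> [SV] [SN] rest.

    The particle chunk is dropped, because English marks this kind of
    question by word order. Any other input is returned unchanged.
    """
    if (
        len(chunks) >= 3
        and chunks[0].name == "cxu"  # hypothetical name for the "Ĉu" chunk
        and "SN" in chunks[1].tags
        and "SV" in chunks[2].tags
    ):
        return [chunks[2], chunks[1]] + chunks[3:]
    return chunks


# "Ĉu la libro estas bona?" after lexical transfer (punctuation omitted)
sentence = [
    Chunk("cxu", ["QST"], []),  # hypothetical: "Ĉu" has no English lemma
    Chunk("det_nom", ["SN", "sg", "nom"], ["the", "book"]),
    Chunk("ser", ["SV", "pres", "p3", "sg"], ["be"]),
    Chunk("adj", ["SN", "sg"], ["good"]),
]

reordered = reorder_question(sentence)
print([c.name for c in reordered])                       # ['ser', 'det_nom', 'adj']
print(" ".join(w for c in reordered for w in c.words))   # be the book good
```

The verb chunk keeps its <pres><p3><sg> tags while it moves. Later stages can therefore still turn "be" into "is" after reordering.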