How Apertium Works

From Apertium
Revision as of 12:31, 1 December 2013 by Wei2912 (talk | contribs)

This walkthrough uses apertium-en-ca, the English-Catalan language pair, to demonstrate how text is translated.

Apertium first runs the source text through a deformatter, which separates the formatting from the text to be translated so that translation does not affect the formatting.

This is done by enclosing formatting in superblanks (square brackets) and escaping reserved characters, such as the slash, with backslashes.

This book is a <b>great</b> read.

is formatted into:

$ echo -n "This book is a <b>great</b> read." | apertium-deshtml
This book is a[ <b>]great[<\/b> ]read..[][

Note: There is no way for the formatter to know when there is no more input from stdin, so you may see an unclosed superblank. You can safely remove it.

All formatting and whitespace inside superblanks is ignored by the later stages of the pipeline.
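The deformatting step can be illustrated with a minimal Python sketch. This is not how apertium-deshtml is implemented; the function name deformat, the set of escaped characters, and the simple tag pattern are all assumptions made for illustration. It escapes a few reserved characters with backslashes, then wraps each run of HTML tags, together with the whitespace around it, in a superblank. It does not add the trailing ".[]" seen in the real output.

```python
import re

# Assumption: a small subset of reserved characters, enough for this example.
RESERVED = re.compile(r'([\\/\[\]^$@{}])')
# Assumption: anything of the form <...> is a tag; adjacent whitespace joins it.
TAG_RUN = re.compile(r'(?:\s*<[^>]+>)+\s*')

def deformat(text: str) -> str:
    """Toy deformatter: escape reserved characters, wrap tags in superblanks."""
    # Step 1: put a backslash in front of each reserved character.
    escaped = RESERVED.sub(r'\\\1', text)
    # Step 2: enclose each run of tags (plus surrounding whitespace) in [ ].
    return TAG_RUN.sub(lambda m: '[' + m.group(0) + ']', escaped)

print(deformat("This book is a <b>great</b> read."))
# This book is a[ <b>]great[<\/b> ]read.
```

Because the whitespace next to a tag moves inside the superblank, the translatable text that remains is just the words themselves.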

Afterwards, the text goes through morphological analysis.

$ echo "This book is a[ <b>]great[<\/b> ]read..[]" | lt-proc en-ca.automorf.bin
^This/This<det><dem><sg>/This<prn><tn><mf><sg>$ ^book/book<n><sg>/book<vblex><inf>/book<vblex><pres>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$[ <b>]^great/great<adj><sint>$[<\/b> ]^read/read<vblex><inf>/read<vblex><pres>/read<vblex><past>/read<vblex><pp>$^./.<sent>$^./.<sent>$[]

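Each word is tagged with all of its possible analyses: a lemma followed by PoS (part-of-speech) and other morphological tags. In the output above, for example, "book" is analysed both as a noun and as a verb. These tags matter because, without them, a word could be translated into a word with a different part of speech, which would cause major errors. A verb translated as a noun, for example, would garble the whole sentence.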
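To make the structure of the analyser output concrete, here is a minimal Python sketch that splits it into words and their candidate analyses. The function name parse_units is hypothetical, and the sketch assumes that no escaped ^, $ or / characters occur inside a unit and that the first tag of each analysis is its part of speech. It is an illustration of the format, not part of Apertium.

```python
import re

UNIT = re.compile(r'\^(.*?)\$')   # one lexical unit: ^surface/analysis/...$
TAG = re.compile(r'<([^>]+)>')    # one tag: <n>, <sg>, <vblex>, ...

def parse_units(stream: str):
    """Return a list of (surface form, [(lemma, [tags]), ...]) tuples."""
    units = []
    for match in UNIT.finditer(stream):
        # The first field is the surface form; the rest are analyses.
        surface, *analyses = match.group(1).split('/')
        readings = [(a.split('<', 1)[0], TAG.findall(a)) for a in analyses]
        units.append((surface, readings))
    return units

# A fragment of the analyser output shown above.
sample = ("^book/book<n><sg>/book<vblex><inf>/book<vblex><pres>$"
          "[ <b>]^great/great<adj><sint>$")

for surface, readings in parse_units(sample):
    # Assumption: the first tag of each reading is the part of speech.
    pos = sorted({tags[0] for _, tags in readings if tags})
    print(surface, pos)
# book ['n', 'vblex']
# great ['adj']
```

The superblank between the two units is skipped because it lies outside the ^...$ delimiters. A word with more than one part of speech in its list, such as "book" here, is ambiguous.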

More to be written...