Difference between revisions of "How Apertium Works"

From Apertium

Latest revision as of 17:08, 26 September 2016

How Apertium Works

This page uses the apertium-en-ca language pair (English and Catalan) to demonstrate how text is translated.

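Apertium first passes the source text through a deformatter, which separates the document's formatting from the text to be translated so that the formatting survives translation unchanged.

This is done by enclosing the formatting in superblanks (spans wrapped in square brackets) and escaping special characters with backslashes.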

This book is a <b>great</b> read.

is formatted into:

$ echo -n "This book is a <b>great</b> read." | apertium-deshtml
This book is a[ <b>]great[<\/b> ]read..[][

Note: The deformatter cannot tell when the input from stdin has ended, so the output may finish with an unclosed superblank (the trailing [ above). You can safely remove it.

Everything inside a superblank, formatting and whitespace alike, is ignored by the later stages and passed through untouched.
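To make the superblank convention concrete, here is a minimal Python sketch that splits deformatted text into translatable text and superblanks. It is not part of Apertium: the function name `split_superblanks` and the regular expression are assumptions made for illustration, and the sketch only covers the bracket and backslash rules described above.

```python
import re

# Either a complete superblank "[...]" or a run of ordinary text.
# A backslash escapes the next character, so "\]" does not close a
# superblank and "\[" does not open one.
TOKEN = re.compile(
    r"(?P<superblank>\[(?:\\.|[^\]\\])*\])"
    r"|(?P<text>(?:\\.|[^\[\\])+)"
)


def split_superblanks(stream):
    """Return (kind, chunk) pairs; kind is 'superblank' or 'text'.

    An unclosed '[' matches neither alternative and is skipped.
    """
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(stream)]


if __name__ == "__main__":
    deformatted = r"This book is a[ <b>]great[<\/b> ]read..[]"
    for kind, chunk in split_superblanks(deformatted):
        print(f"{kind:10} {chunk!r}")
```

Only the chunks labelled `text` would be candidates for translation; the superblanks would be carried along and restored when the output is reformatted.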

The deformatted text then goes through morphological analysis, performed by lt-proc with the compiled en-ca analyser.

$ echo "This book is a[ <b>]great[<\/b> ]read..[]" | lt-proc en-ca.automorf.bin
^This/This<det><dem><sg>/This<prn><tn><mf><sg>$ ^book/book<n><sg>/book<vblex><inf>/book<vblex><pres>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$[ <b>]^great/great<adj><sint>$[<\/b> ]^read/read<vblex><inf>/read<vblex><pres>/read<vblex><past>/read<vblex><pp>$^./.<sent>$^./.<sent>$[]

Each word is now wrapped between ^ and $ and tagged with PoS (part-of-speech) tags. Where a word is ambiguous, as "book" (noun or verb) is here, every possible analysis is listed, separated by slashes. The tags matter because a word translated with the wrong part of speech can derail the whole translation: a verb rendered as a noun, for example.
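The analyser output can be read mechanically. The following Python sketch parses each ^surface/analysis/...$ unit into its surface form and its candidate analyses. It is an illustration only, not Apertium code: `parse_analyses` and the regular expressions are hypothetical names, the sample string is abridged from the output above, and the sketch assumes no escaped brackets appear in the text.

```python
import re

SUPERBLANK = re.compile(r"\[(?:\\.|[^\]\\])*\]")    # "[...]" spans to skip
LEXICAL_UNIT = re.compile(r"\^((?:\\.|[^$\\])*)\$")  # "^...$" units
TAG = re.compile(r"<([^>]+)>")                       # "<n>", "<sg>", ...


def parse_analyses(stream):
    """Return a list of (surface, [(lemma, [tags]), ...]) tuples."""
    result = []
    # Drop superblanks first so formatting is never mistaken for a word.
    for match in LEXICAL_UNIT.finditer(SUPERBLANK.sub("", stream)):
        # Unescaped slashes separate the surface form from each analysis.
        surface, *analyses = re.split(r"(?<!\\)/", match.group(1))
        readings = []
        for analysis in analyses:
            lemma = analysis.split("<", 1)[0]
            readings.append((lemma, TAG.findall(analysis)))
        result.append((surface, readings))
    return result


if __name__ == "__main__":
    sample = (
        "^book/book<n><sg>/book<vblex><inf>/book<vblex><pres>$ "
        "^is/be<vbser><pri><p3><sg>$[ <b>]^great/great<adj><sint>$"
    )
    for surface, readings in parse_analyses(sample):
        # The first tag of each analysis is its part of speech.
        pos = sorted({tags[0] for _, tags in readings if tags})
        print(surface, pos)
```

In this sample, "book" is listed with both a noun and a verb reading, which is the kind of ambiguity that has to be resolved before the word can be translated safely.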

More to be written...