Difference between revisions of "How Apertium Works"

Revision as of 11:04, 27 November 2013

How Apertium Works

In this example, the apertium-en-ca language pair (English to Catalan) is used to demonstrate how Apertium translates text.

Apertium first passes the source text through a deformatter, which separates the formatting from the translatable text so that translation does not affect the formatting.

This is done by enclosing formatting in superblanks (square brackets) and escaping reserved characters, such as /, with backslashes.

This book is a <b>great</b> read.

is formatted into:

$ echo -n "This book is a <b>great</b> read." | apertium-deshtml
This book is a[ <b>]great[<\/b> ]read..[][

Note: The deformatter has no way of knowing when input from stdin has ended, so the output may end with an unclosed superblank ([). It can safely be removed.

Everything inside a superblank, including formatting and whitespace, is ignored by the translation stages and passed through unchanged.
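
The deformatting step described above can be approximated in a few lines of Python. This is only an illustrative sketch, not how apertium-deshtml is implemented: the function names are hypothetical, the set of reserved characters is an assumption, and the trailing ".[][" that the real tool appends is not reproduced.

```python
import re

# Assumption: characters treated as reserved and escaped with a backslash.
# The real deformatter may escape a different set.
RESERVED = re.compile(r"([\\/\^$\[\]{}@])")

# An HTML tag together with any whitespace directly around it.
TAG_WITH_SPACE = re.compile(r"\s*<[^>]+>\s*")


def escape(text):
    """Prefix every reserved character with a backslash."""
    return RESERVED.sub(r"\\\1", text)


def deformat(html):
    """Wrap each tag (plus adjacent whitespace) in a superblank [ ... ]."""
    out = []
    pos = 0
    for match in TAG_WITH_SPACE.finditer(html):
        out.append(escape(html[pos:match.start()]))   # translatable text
        out.append("[" + escape(match.group(0)) + "]")  # formatting
        pos = match.end()
    out.append(escape(html[pos:]))
    return "".join(out)


print(deformat("This book is a <b>great</b> read."))
# This book is a[ <b>]great[<\/b> ]read.
```

The whitespace next to each tag is moved inside the superblank, which is why "a" and "great" appear with no plain space between them in the output.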

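Afterwards, the text goes through morphological analysis. Each word is wrapped in ^...$, with the surface form followed by every possible analysis (lemma plus tags), separated by slashes.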

$ echo "This book is a[ <b>]great[<\/b> ]read..[]" | lt-proc en-ca.automorf.bin
^This/This<det><dem><sg>/This<prn><tn><mf><sg>$ ^book/book<n><sg>/book<vblex><inf>/book<vblex><pres>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$[ <b>]^great/great<adj><sint>$[<\/b> ]^read/read<vblex><inf>/read<vblex><pres>/read<vblex><past>/read<vblex><pp>$^./.<sent>$^./.<sent>$[]

Each word is tagged with PoS (part-of-speech) tags. These tags matter because the same word can have different translations depending on its part of speech, and choosing the wrong one causes major errors. For example, "book" is analysed above as both a noun and a verb; translating the noun as a verb would garble the whole sentence.
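
To make the analysis format more concrete, the following Python sketch splits an excerpt of the lt-proc output above into surface forms, lemmas and tags. It is a simplified illustration rather than an official parser: the function name is hypothetical, and it assumes that no escaped ^, $ or / characters occur inside a word.

```python
import re

# One lexical unit: everything between ^ and $ (assumes no escaped $ inside).
UNIT = re.compile(r"\^([^$]*)\$")
# One tag: the text between < and >.
TAG = re.compile(r"<([^>]+)>")


def parse_analyses(stream):
    """Return a list of (surface, [(lemma, [tags]), ...]) tuples."""
    words = []
    for match in UNIT.finditer(stream):
        surface, *readings = match.group(1).split("/")
        parsed = []
        for reading in readings:
            lemma = reading.split("<", 1)[0]
            parsed.append((lemma, TAG.findall(reading)))
        words.append((surface, parsed))
    return words


# Excerpt copied from the analyser output shown above.
sample = (r"^great/great<adj><sint>$[<\/b> ]"
          r"^read/read<vblex><inf>/read<vblex><pres>"
          r"/read<vblex><past>/read<vblex><pp>$")

for surface, readings in parse_analyses(sample):
    print(surface, len(readings), "analyses")
# great 1 analyses
# read 4 analyses
```

The superblank between the two words is skipped because it lies outside any ^...$ unit. A word with more than one analysis, such as "read" here, is ambiguous at this stage.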