Tips for translators

From Apertium

This page collects practical tips and tricks for using Apertium as a translator.

How do I make the translator ignore certain strings?

Use one of the XML-based formats, e.g. html, and put <apertium-notrans> tags around the text you don't want translated, e.g.:

$ echo "Translate me <apertium-notrans>don't translate me</apertium-notrans> but translate me" | apertium -f html en-es
Me traduzco <apertium-notrans>don't translate me</apertium-notrans> pero traducirme
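When the same terms (product names, for instance) have to be protected in many inputs, writing the tags by hand gets tedious. The following is a minimal sketch of two shell helpers, assuming a POSIX shell and sed: protect_term wraps every occurrence of a term in <apertium-notrans> tags, and strip_notrans removes the tags again, since, as the output above shows, they are passed through to the translation. Both names are hypothetical helpers, not part of Apertium.

```shell
#!/bin/sh
# protect_term TERM: wrap every occurrence of TERM on stdin in
# <apertium-notrans> tags, so an XML-based format (e.g. -f html) leaves it
# untranslated. Assumption: TERM is used as a sed basic regular expression,
# so regex metacharacters (. * [ ] ^ $ \) and "/" must be escaped by the caller.
protect_term() {
    sed "s/$1/<apertium-notrans>&<\/apertium-notrans>/g"
}

# strip_notrans: remove the <apertium-notrans> tags that the translator
# passes through to its output, keeping the protected text itself.
strip_notrans() {
    sed -e 's/<apertium-notrans>//g' -e 's/<\/apertium-notrans>//g'
}
```

Usage would look like `echo "Apertium is free software" | protect_term Apertium | apertium -f html en-es | strip_notrans`. Stripping the tags is optional; skip it if a later step needs to know which spans were left untranslated.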

The HTML format adds entities; I want plain (Unicode) symbols

When using the HTML format, most non-ASCII characters are turned into HTML entities:

$ echo "Today's <a id=\"foo\" href=\"http://time.org/\">date</a> is March 12th" | apertium -f html en-ca
Avui  <a id="foo" href="http://time.org/">la data</a> &eacute;s March 12&egrave;

This may not be what you want.

To avoid this, use the html-noent format instead (apertium -f html-noent ...).

Older versions of Apertium lack html-noent, so there you need a workaround: if you have Perl and the perl-html-parser package (which provides the HTML::Entities module) installed, append the following one-liner to the command:

perl -we 'use HTML::Entities; binmode(STDIN, ":utf8"); binmode(STDOUT, ":utf8"); while (<STDIN>) { print decode_entities($_); }'

e.g.

$ echo "Today's <a id=\"foo\" href=\"http://time.org/\">date</a> is March 12th" | apertium -f html en-ca | perl -we 'use HTML::Entities; binmode(STDIN, ":utf8"); binmode(STDOUT, ":utf8"); while (<STDIN>) { print decode_entities($_); }'
Avui  <a id="foo" href="http://time.org/">la data</a> és March 12è
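If HTML::Entities is not installed, the same decoding step can be sketched as a reusable shell function. This assumes python3 is available (its standard html.unescape handles named and numeric entities much like decode_entities); the function name decode_entities here is a hypothetical helper. Like the Perl version, it also decodes &lt;, &gt; and &amp;, so the result is plain text rather than guaranteed-valid HTML.

```shell
#!/bin/sh
# decode_entities: read text on stdin, replace HTML entities such as
# &eacute; or &#233; with the corresponding characters, write UTF-8 to stdout.
# Assumption: python3 is installed. Bytes are decoded and encoded as UTF-8
# explicitly so the result does not depend on the current locale.
decode_entities() {
    python3 -c '
import html, sys
text = sys.stdin.buffer.read().decode("utf-8")
sys.stdout.buffer.write(html.unescape(text).encode("utf-8"))
'
}
```

It then takes the place of the Perl one-liner at the end of the pipeline, e.g. `... | apertium -f html en-ca | decode_entities`.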

How do I use my translation memory (TMX) with Apertium?

See Translation memory.

See also