Difference between revisions of "Tips for translators"

==How do I use my translation memory (TMX) with Apertium?==
See [[Translation memory]].


==See also==
* [[Translation memory]] for translating TMX / .tmx files
* [[Translating QT Linguist TS-files]] for how to translate .ts files
* [[Translating gettext]] for how to translate .po files
* [[Translating subtitles]]
* [[Translating wikimedia]]
* [[Format handling]] for a list of built-in supported input/output formats





Revision as of 07:53, 5 April 2014

This page collects practical tips and tricks for using apertium as a translator.



How do I make the translator ignore certain strings?

Use one of the XML-based formats (selected with -f), e.g. html, and put <apertium-notrans> tags around the text you don't want translated. For example:

$ echo "Translate me <apertium-notrans>don't translate me</apertium-notrans> but translate me" |apertium en-es -f html
Me traduzco <apertium-notrans>don't translate me</apertium-notrans> pero traducirme
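
When the same term has to be protected in many places, adding the tags by hand gets tedious. The following is a minimal sketch, assuming a POSIX shell and sed; protect_term is a hypothetical helper name and is not part of Apertium. It wraps every literal occurrence of a term in <apertium-notrans> tags before the text reaches the translator.

```shell
# Hypothetical helper (not part of Apertium): reads text on stdin and
# wraps every occurrence of the literal term given as $1 in
# <apertium-notrans> tags.
protect_term() {
    term=$1
    # Escape characters that are special in a sed basic regular
    # expression, plus the "/" delimiter, so the term matches literally.
    escaped=$(printf '%s' "$term" | sed 's/[][\.*^$/]/\\&/g')
    # "&" in the replacement stands for the matched text.
    sed "s/$escaped/<apertium-notrans>&<\/apertium-notrans>/g"
}
```

Assuming an en-es pair is installed, the helper could then be placed in front of the translator, e.g. echo "Install Apertium first" | protect_term "Apertium" | apertium -f html en-es. The match is a plain substring match, so a term that also occurs inside longer words or inside existing tags will be wrapped there too.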

The HTML format adds entities, I want plain (Unicode) symbols

When using the HTML format, most non-ASCII characters are turned into HTML entities:

$ echo "Today's <a id="foo" href="http://time.org"/>date</a> is March 12th" |apertium -f html en-ca
Avui  <a id="foo" href=http://time.org/>la data</a> &eacute;s March 12&egrave;

This is not always what you want.

To get plain Unicode characters, use the html-noent format instead (i.e. pass -f html-noent).


With older versions of apertium you have to use this workaround: if you have perl and perl-html-parser installed, you can pipe the output through the following little script:

perl -we 'use HTML::Entities;binmode(STDOUT,":utf8");while(<STDIN>){print decode_entities($_);}'

e.g.

$ echo "Today's <a id="foo" href="http://time.org"/>date</a> is March 12th" |apertium -f html en-ca|perl -we 'use HTML::Entities; binmode(STDOUT, ":utf8");while(<STDIN>) { print decode_entities($_); }'
Avui  <a id="foo" href=http://time.org/>la data</a> és March 12è


See also