Difference between revisions of "Tips for translators"
Revision as of 07:48, 5 April 2014
This page collects practical tips and tricks for translators who use Apertium.
How do I make the translator ignore certain strings?
To ensure certain text is not translated, you can use the HTML format and put that text in e.g. an HTML attribute. Say you're translating strings for some software and you have the input string:
Today's date is DATE, and the weather outside is WEATHER.
Then you could change it to e.g.
Today's date is <a rel="DATE"/>, and the weather outside is <a rel="WEATHER"/>.
and translate it with apertium -f html, then strip the HTML you added once it has been translated, e.g.
$ cat input.html
Today's date is <a rel="DATE"/>, and the weather outside is <a rel="WEATHER"/>
$ apertium en-ca -f html input.html | sed 's%<a rel="\([^"]*\)"/>%\1%g'
Avui la data és DATE, i el temps exterior és WEATHER
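The wrap-and-strip steps above can be sketched as two small shell functions. This is only a sketch under stated assumptions: the function names protect and unprotect are hypothetical, the placeholder names DATE and WEATHER are hard-coded from the example, and it assumes the translator hands back the <a rel="..."/> tags unchanged.

```shell
# Hypothetical helper: wrap the known placeholders in <a rel="..."/> tags
# so that they travel through the HTML format as markup, not as text.
# The placeholder names are hard-coded here; adjust them for your strings.
protect() {
  sed -e 's%DATE%<a rel="DATE"/>%g' \
      -e 's%WEATHER%<a rel="WEATHER"/>%g'
}

# Hypothetical helper: strip the tags again, leaving the bare placeholder.
# This is the same sed expression as in the example above.
unprotect() {
  sed 's%<a rel="\([^"]*\)"/>%\1%g'
}

# Demonstration of the wrapping step only (no translation involved).
# prints: Today's date is <a rel="DATE"/>, and the weather outside is <a rel="WEATHER"/>.
printf '%s\n' "Today's date is DATE, and the weather outside is WEATHER." | protect
```

With these defined, the whole round trip would look like protect < input.txt | apertium -f html en-ca | unprotect, where input.txt is an assumed file holding the untagged strings.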
The HTML format adds entities, I want plain (Unicode) symbols
When using the HTML format, most non-ASCII characters are turned into HTML entities:
$ echo "Today's <a id="foo" href="http://time.org"/>date</a> is March 12th" |apertium -f html en-ca
Avui <a id="foo" href=http://time.org/>la data</a> és March 12è
This might not be preferable.
You can use the html-noent mode instead to avoid this.
With older versions of Apertium you have to use this hack: if you have perl and perl-html-parser installed, you can append the following little script to the command:
perl -we 'use HTML::Entities;binmode(STDOUT,":utf8");while(<STDIN>){print decode_entities($_);}'
e.g.
$ echo "Today's <a id="foo" href="http://time.org"/>date</a> is March 12th" |apertium -f html en-ca|perl -we 'use HTML::Entities; binmode(STDOUT, ":utf8");while(<STDIN>) { print decode_entities($_); }'
Avui <a id="foo" href=http://time.org/>la data</a> és March 12è
How do I use my translation memory (TMX) with Apertium?
See Translation memory.
See also
- Translating gettext for how to translate .po files
- Translating subtitles
- Translating wikimedia
- Format handling for a list of supported input/output formats