Difference between revisions of "Tips for translators"

Revision as of 11:29, 5 November 2011

Not translating certain parts of the text

To keep certain text from being translated, you can use the HTML format and put that text in e.g. an HTML attribute. Say you are translating strings for some software and you have the input string:

Today's date is DATE, and the weather outside is WEATHER.

Then you could change it to e.g.

Today's date is <a rel="DATE"/>, and the weather outside is <a rel="WEATHER"/>.

Translate it with apertium -f html, and strip the HTML you added from the output afterwards, e.g.

$ cat input.html 
Today's date is <a rel="DATE"/>, and the weather outside is <a rel="WEATHER"/>
$ apertium en-ca -f html input.html | sed 's%<a rel="\([^"]*\)"/>%\1%g'
Avui la data &eacute;s DATE, i el temps exterior &eacute;s WEATHER
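
The wrap-and-strip steps above can be scripted so that the placeholders need not be edited by hand. The following is only a minimal sketch: the function names protect and unprotect are hypothetical, the placeholder list (DATE, WEATHER) is taken from the example, and it assumes the placeholders never occur inside other words. The apertium call is left in a comment because it needs an installed en-ca pair.

```shell
#!/bin/sh
# Hypothetical helpers; the placeholder names are assumptions for illustration.

# Wrap each known placeholder in an <a rel="..."/> tag so that it travels
# through the HTML format as markup instead of as translatable text.
protect() {
    sed -e 's%DATE%<a rel="DATE"/>%g' \
        -e 's%WEATHER%<a rel="WEATHER"/>%g'
}

# Strip the tags again, keeping only the attribute value
# (the same sed expression as in the example above).
unprotect() {
    sed 's%<a rel="\([^"]*\)"/>%\1%g'
}

# Intended use (requires an installed en-ca pair):
#   protect < input.txt | apertium -f html en-ca | unprotect
# Round trip without the translation step:
printf '%s\n' "Today's date is DATE, and the weather outside is WEATHER." \
    | protect | unprotect
```

Without the translation step in between, the round trip prints the input string unchanged, which is a quick way to check that the two sed expressions are inverses of each other.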

The HTML format adds entities, I want plain (Unicode) symbols

When using the HTML format, most non-ASCII characters are turned into HTML entities:

$ echo "Today's <a href=\"http://time.org/\">date</a> is March 12th" | apertium -f html en-ca
Avui <a href="http://time.org/">la data</a> &eacute;s March 12&egrave;

This is often not what you want, e.g. when the output is meant to be read as plain text rather than rendered by a browser.

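If you have Perl and the HTML::Entities module installed (it is part of HTML-Parser, packaged as perl-html-parser on some distributions), you can append the following short script to the command. It reads standard input as UTF-8, decodes the entities line by line, and writes UTF-8:

perl -we 'use HTML::Entities; binmode(STDIN, ":utf8"); binmode(STDOUT, ":utf8"); while (<STDIN>) { print decode_entities($_); }'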

For example:

$ echo "Today's <a href=\"http://time.org/\">date</a> is March 12th" | apertium -f html en-ca | perl -we 'use HTML::Entities; binmode(STDIN, ":utf8"); binmode(STDOUT, ":utf8"); while (<STDIN>) { print decode_entities($_); }'
Avui <a href="http://time.org/">la data</a> és March 12è
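
Where the HTML::Entities module is not available, the same decoding step could be sketched with Python 3's standard library instead, whose html.unescape function converts named and numeric character references to Unicode. This is an alternative under stated assumptions, not part of Apertium: the wrapper name decode_entities_py is hypothetical, and it assumes python3 (3.7 or later) is on the PATH.

```shell
#!/bin/sh
# Hypothetical wrapper; assumes python3 (3.7+) is on the PATH.
decode_entities_py() {
    python3 -c '
import html, sys
# Read and write UTF-8 regardless of the current locale.
sys.stdin.reconfigure(encoding="utf-8")
sys.stdout.reconfigure(encoding="utf-8")
for line in sys.stdin:
    sys.stdout.write(html.unescape(line))
'
}

# Stand-in for the output of apertium -f html:
printf '%s\n' 'Avui la data &eacute;s March 12&egrave;' | decode_entities_py
```

Like the Perl version, this also decodes entities such as &amp;lt; and &amp;amp;, so the result is best treated as text for reading rather than as HTML to be parsed again.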

See also