Difference between revisions of "Tips for translators"

From Apertium
(14 intermediate revisions by one other user not shown)
Line 1: Line 1:
This page collects practical tips and tricks for ''using'' apertium as a translator.
==Not translating certain parts of the text==
To ensure certain text is not translated, you can use the HTML format and put it in e.g. an html attribute. Say you're translating for some software and you have the input string:
<pre>Today's date is DATE, and the weather outside is WEATHER.</pre>
Then you could change it to e.g.
<pre>Today's date is <a rel="DATE"/>, and the weather outside is <a rel="WEATHER"/>.</pre>
and translate it with apertium -f html, then strip the html you added after it's translated, e.g.



<pre>$ cat input.html
Today's date is <a rel="DATE"/>, and the weather outside is <a rel="WEATHER"/>

$ apertium en-ca -f html input.html | sed 's%<a rel="\([^"]*\)"/>%\1%g'
Avui la data &eacute;s DATE, i el temps exterior &eacute;s WEATHER</pre>
{{TOCD}}
==General tips==
If you're translating something that is to be published, you'll get the best results if you
* run a spellcheck on the source text before translating, and
* run a spellcheck on the target text after translating

Remember that any machine translated text needs to be [[Post-editing|post-edited]] before publication.

==Are there any graphical user interfaces or apps?==
Other than our web site http://apertium.org, see [[Tools#Tools_for_users_.2F_translators]]

==What do the funny symbols like */#@ mean?==
A star * means a word was unknown to the translator and passed through unchanged. For proper nouns, this is often OK, but other words might need manual correction. (Sometimes you might see other symbols such as # or @; these are [[Apertium stream format|debug symbols]] which indicate a bug in the translator.)

==How do I make the translator ignore certain strings?==
Use one of the XML-based modes, e.g. '''html''', and put <code><apertium-notrans></code> tags around the text you don't want translated. E.g.

<pre>
$ echo "Translate me <apertium-notrans>don't translate me</apertium-notrans> but translate me" |apertium en-es -f html
Me traduzco <apertium-notrans>don't translate me</apertium-notrans> pero traducirme
</pre>


==The HTML format adds entities, I want plain (Unicode) symbols==
When using the HTML format, most non-ASCII characters are turned into HTML entities:
<pre>$ echo "Today's <a id="foo" href="http://time.org"/>date</a> is March 12th" |apertium -f html en-ca
Avui <a id="foo" href=http://time.org/>la data</a> &eacute;s March 12&egrave;</pre>
This might not be preferable.


You can use the '''html-noent''' mode instead to avoid this.

With older versions of apertium you have to use this hack: if you have perl and perl-html-parser installed, you can append the following little script to the command:
<pre>perl -we 'use HTML::Entities;binmode(STDOUT,":utf8");while(<STDIN>){print decode_entities($_);}'</pre>
e.g.
<pre>$ echo "Today's <a id="foo" href="http://time.org"/>date</a> is March 12th" |apertium -f html en-ca|perl -we 'use HTML::Entities; binmode(STDOUT, ":utf8");while(<STDIN>) { print decode_entities($_); }'
Avui <a id="foo" href=http://time.org/>la data</a> és March 12è</pre>



==See also==
* [[Translation memory]] for translating TMX / .tmx files
* [[Translating QT Linguist TS-files]] for how to translate .ts files
* [[Translating gettext]] for how to translate .po files
* [[Translating JSON]] for how not to translate .json files
* [[Translating subtitles]]
* [[Translating wikimedia]]
* [[Format handling]] for a list of built-in supported input/output formats


[[Category:Using Apertium]]
[[Category:Documentation in English]]

Latest revision as of 18:56, 26 September 2016

This page collects practical tips and tricks for using apertium as a translator.


General tips

If you're translating something that is to be published, you'll get the best results if you

  • run a spellcheck on the source text before translating, and
  • run a spellcheck on the target text after translating

Remember that any machine translated text needs to be post-edited before publication.

Are there any graphical user interfaces or apps?

Other than our web site http://apertium.org, see the "Tools for users / translators" section of the Tools page.

What do the funny symbols like */#@ mean?

A star * means a word was unknown to the translator and passed through unchanged. For proper nouns, this is often OK, but other words might need manual correction. (Sometimes you might see other symbols such as # or @; these are debug symbols which indicate a bug in the translator.)
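
Since the star marks every unknown word, it can help to collect those words in one list before post-editing. The following is only a minimal sketch: the sample text and the function name `list_unknown` are made up for illustration, and it assumes each unknown word is a star followed directly by non-space characters, so trailing punctuation would end up in the list as well.

```shell
# Hypothetical translator output; in practice this would be piped from apertium.
translated='Avui *Frobnicate la data és bona i *Gizmo i *Frobnicate'

# Print each starred word once, without the leading star.
list_unknown() {
  grep -o '\*[^[:space:]]*' | sed 's/^\*//' | sort -u
}

printf '%s\n' "$translated" | list_unknown
```

The resulting list can be checked by hand: proper nouns can usually stay, while other words are candidates for manual correction.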

How do I make the translator ignore certain strings?

Use one of the XML-based modes, e.g. html, and put <apertium-notrans> tags around the text you don't want translated. E.g.

$ echo "Translate me <apertium-notrans>don't translate me</apertium-notrans> but translate me" |apertium en-es -f html
Me traduzco <apertium-notrans>don't translate me</apertium-notrans> pero traducirme
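
As the output above shows, the <apertium-notrans> tags are still present after translation. If they are not wanted in the final text, one possible way to drop them is a small sed filter. This is a sketch rather than an official apertium feature; the function name `strip_notrans` is hypothetical, and the sample line is the output shown above.

```shell
# Remove the <apertium-notrans> wrapper tags but keep the text between them.
strip_notrans() {
  sed -e 's%<apertium-notrans>%%g' -e 's%</apertium-notrans>%%g'
}

# Sample input copied from the translated output above.
printf '%s\n' "Me traduzco <apertium-notrans>don't translate me</apertium-notrans> pero traducirme" | strip_notrans
```

In a real pipeline the filter would simply be appended after the apertium command, in the same way as any other post-processing step.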

The HTML format adds entities, I want plain (Unicode) symbols

When using the HTML format, most non-ASCII characters are turned into HTML entities:

$ echo "Today's <a id="foo" href="http://time.org"/>date</a> is March 12th" |apertium -f html en-ca
Avui <a id="foo" href=http://time.org/>la data</a> &eacute;s March 12&egrave;

This might not be preferable.

You can use the html-noent mode instead to avoid this.


With older versions of apertium you have to use this hack: if you have perl and perl-html-parser installed, you can append the following little script to the command:

perl -we 'use HTML::Entities;binmode(STDOUT,":utf8");while(<STDIN>){print decode_entities($_);}'

e.g.

$ echo "Today's <a id="foo" href="http://time.org"/>date</a> is March 12th" |apertium -f html en-ca|perl -we 'use HTML::Entities; binmode(STDOUT, ":utf8");while(<STDIN>) { print decode_entities($_); }'
Avui  <a id="foo" href=http://time.org/>la data</a> és March 12è


See also