Translating subtitles

From Apertium
Revision as of 16:53, 5 February 2011 by Unhammer (talk | contribs) (a new howto)

If you want to translate subtitles with Apertium, you can use Apertium Subtitles to get translation suggestions one at a time, either from a local installation or from the server. It may be more efficient, though, to translate the whole file at once and then post-edit it in your favorite subtitling application (e.g. Gaupol or Jubler).

Apertium has no format filters for srt or sub files, but with some trickery we can still get them translated.

We'll use Translate Toolkit's sub2po to turn the subtitle file into a po-file, translate that with Pology's pomtrans and our local Apertium installation, and then use Translate Toolkit's po2sub to turn the result back into a subtitle file.

Install the prerequisites

On Arch Linux:

   # pacman -S translate-toolkit gaupol python2-chardet

(Translate Toolkit needs Gaupol in order to perform the conversion.) Pology is in the Arch User Repository; either download and build with makepkg or use yaourt:

   $ yaourt -S pology-svn

Convert, translate, convert back, post-edit

Say you have the file Sintel.es.srt that you want to translate into Catalan.
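Before starting, it can be worth confirming that the language pair is actually installed. The sketch below assumes that apertium -l prints one installed mode per line, and that the binary lives at /usr/local/bin/apertium (the path used further down on this page); has_mode is just a hypothetical helper name.

```shell
# APERTIUM is an assumed path (the one used on this page); adjust as needed.
APERTIUM="${APERTIUM:-/usr/local/bin/apertium}"

# has_mode MODE: succeed if "apertium -l" lists MODE on a line of its own
# (leading and trailing whitespace is ignored).
has_mode() {
    "$APERTIUM" -l | grep -Eq "^[[:space:]]*$1[[:space:]]*\$"
}
```

For example, has_mode es-ca || echo "es-ca is not installed" >&2 prints a warning when the Spanish to Catalan mode is missing.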

Convert it into a po-file like this:

  sub2po -i Sintel.es.srt -o Sintel.es-ca.po

This po-file will have the Spanish as the source text, and completely empty target entries. Run that po-file through Apertium:

  /opt/pology/bin/pomtrans -s es -t ca -T /usr/local/bin/apertium -M es-ca apertium Sintel.es-ca.po

where -T gives the path to your apertium installation, and -M is the mode to use. Now the po-file will have Catalan text in the target entries :-) Convert it back to a subtitle file, using the timestamps from the original subtitle file:

  po2sub --fuzzy -t Sintel.es.srt -i Sintel.es-ca.po -o Sintel.ca.srt

We pass --fuzzy so that fuzzy entries are included: by default (as it should), Pology's pomtrans marks all machine-translated text as fuzzy.
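To see how many entries pomtrans flagged, you can count the flag lines in the po-file. This is a rough sketch: count_fuzzy is a hypothetical helper, and it relies on the usual po convention that flags sit on comment lines starting with "#,".

```shell
# count_fuzzy FILE.po: print the number of entries flagged as fuzzy.
# Flag lines look like "#, fuzzy" or "#, fuzzy, c-format".
count_fuzzy() {
    grep -c '^#,.*fuzzy' "$1"
}
```

Running count_fuzzy Sintel.es-ca.po right after pomtrans should report roughly as many entries as there are subtitles in the file.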

Now you can post-edit Sintel.ca.srt in Jubler, Gaupol or whatever you prefer. Alternatively, you can edit the po-file itself in a po-editor like Virtaal or Lokalize before the po2sub step, although then you won't be able to compare with the movie as easily.
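The three steps above can be wrapped into one small shell function. This is only a sketch under a few assumptions: the paths to pomtrans and apertium are the ones used on this page, the input is named like NAME.SRC.srt, and the Apertium mode is called SRC-TGT. The names translate_subs and RUN are made up for this example.

```shell
# Assumed locations, taken from the commands on this page; adjust as needed.
POMTRANS="${POMTRANS:-/opt/pology/bin/pomtrans}"
APERTIUM="${APERTIUM:-/usr/local/bin/apertium}"
# Set RUN=echo to print the commands instead of running them (dry run).
RUN="${RUN:-}"

# translate_subs SRC TGT INPUT.srt
translate_subs() {
    src="$1"; tgt="$2"; input="$3"
    base="${input%.srt}"      # Sintel.es.srt -> Sintel.es
    base="${base%."$src"}"    # Sintel.es     -> Sintel
    po="$base.$src-$tgt.po"
    out="$base.$tgt.srt"

    # Each step runs only if the previous one succeeded.
    $RUN sub2po -i "$input" -o "$po" &&
    $RUN "$POMTRANS" -s "$src" -t "$tgt" -T "$APERTIUM" -M "$src-$tgt" apertium "$po" &&
    $RUN po2sub --fuzzy -t "$input" -i "$po" -o "$out"
}
```

Calling translate_subs es ca Sintel.es.srt writes Sintel.es-ca.po and Sintel.ca.srt; with RUN=echo set beforehand it only prints the three commands, which is handy for checking the file names first.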