Difference between revisions of "Preparing data for Moses factored training using Apertium"

From Apertium

Revision as of 12:26, 21 January 2010

This page describes how to preprocess a corpus with Apertium so that it can be used to train a factored Moses translation model.

Steps

Download parallel corpus

For example, the Danish-English part of Europarl:

$ wget http://www.statmt.org/europarl/v5/da-en.tgz

Clean and tag both sides of corpus
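
The cleaning half of this step can be sketched as follows. This is a minimal illustration, not a script shipped with Apertium or Moses: the function name `clean_parallel` and the 40-token limit are assumptions made for the example. It drops sentence pairs where either side is empty or too long, and normalises whitespace so that the two sides stay aligned line by line. Tagging is not shown, since the command depends on which Apertium language pair is installed.

```python
def clean_parallel(src_lines, trg_lines, max_len=40):
    """Return the sentence pairs that are safe to keep for training.

    A pair is dropped if either side is empty or has more than
    max_len whitespace-separated tokens (max_len is an assumed default).
    """
    kept = []
    for src, trg in zip(src_lines, trg_lines):
        src_tokens = src.split()
        trg_tokens = trg.split()
        # Skip pairs with an empty side: they carry no alignment information.
        if not src_tokens or not trg_tokens:
            continue
        # Skip pairs where either side is too long.
        if len(src_tokens) > max_len or len(trg_tokens) > max_len:
            continue
        # Re-join with single spaces to normalise whitespace.
        kept.append((" ".join(src_tokens), " ".join(trg_tokens)))
    return kept
```

Both sides must be filtered together, as here, so that line N of the source file still corresponds to line N of the target file.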


Convert to Moses factored format

Download the tagger-to-factored script:

$ wget http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools/tagger-to-factored.py
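
To illustrate the kind of conversion involved, here is a minimal sketch. It is not the `tagger-to-factored.py` script itself, whose behaviour is not described on this page. It assumes tagger output in which each lexical unit has the form `^surface/lemma<tag1><tag2>$` (surface forms shown), and it assumes three factors per token: surface form, lemma and first tag. The function name `to_factored` and the `unk` placeholder are hypothetical.

```python
import re

# One lexical unit, assumed to look like: ^surface/lemma<tag1><tag2>$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^<$]*)((?:<[^>]*>)*)\$")


def to_factored(tagged_line):
    """Convert one line of tagged text to 'surface|lemma|pos' tokens."""
    tokens = []
    for surface, lemma, tags in LEXICAL_UNIT.findall(tagged_line):
        tag_list = re.findall(r"<([^>]*)>", tags)
        # Use the first tag as the part of speech; unknown words have none.
        pos = tag_list[0] if tag_list else "unk"
        # Unknown words are marked with a leading '*' in the lemma position.
        lemma = lemma.lstrip("*") or surface
        # Moses separates factors with '|' and tokens with spaces,
        # so neither character may appear inside a factor.
        surface = surface.replace(" ", "_").replace("|", "_")
        lemma = lemma.replace(" ", "_").replace("|", "_")
        tokens.append("|".join([surface, lemma, pos]))
    return " ".join(tokens)
```

For example, `to_factored("^Houses/house<n><pl>$ ^are/be<vbser><pri>$")` gives `Houses|house|n are|be|vbser`. Units that join several lemmas with `+` are not matched by this pattern and would need separate handling.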