Difference between revisions of "Preparing data for Moses factored training using Apertium"

Revision as of 12:26, 21 January 2010

This page describes how to preprocess a corpus with Apertium so that it can be used to train a factored Moses model.

For example, the Danish-English section of Europarl:

$ wget http://www.statmt.org/europarl/v5/da-en.tgz

Download the tagger-to-factored script:

$ wget http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools/tagger-to-factored.py
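The conversion step can be illustrated with a minimal sketch. This is not the tagger-to-factored.py script itself, whose behaviour is not shown on this page. It assumes that each tagged token arrives as ^surface/lemma<tag>...$, and it emits three factors (surface, lemma, tags) separated by "|", which is the token layout Moses expects for factored input. The function names, the "." tag separator, and the "unk" placeholder are choices made for this example only.

```python
import re

# Assumed input: one lexical unit per token, written ^surface/analysis$,
# where the analysis is a lemma followed by zero or more <tag> groups.
UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]*)>")


def clean(factor):
    """Replace '|' and whitespace, which Moses reserves as separators."""
    return re.sub(r"[|\s]", "_", factor)


def to_factored(line):
    """Turn one line of assumed tagger output into 'surface|lemma|tags' tokens."""
    tokens = []
    for surface, analysis in UNIT.findall(line):
        # The lemma is the text before the first tag; unknown words carry a
        # leading '*', which is stripped here. Fall back to the surface form.
        lemma = analysis.split("<", 1)[0].lstrip("*") or surface
        # Join all tags with '.', or use a placeholder when there are none.
        tags = ".".join(TAG.findall(analysis)) or "unk"
        tokens.append("|".join(clean(f) for f in (surface, lemma, tags)))
    return " ".join(tokens)


if __name__ == "__main__":
    print(to_factored("^The/the<det><def><sp>$ ^houses/house<n><pl>$ ^Foo/*Foo$"))
```

The sketch keeps one output token per input unit so that sentence lengths on both sides of the corpus stay consistent. It does not handle backslash-escaped characters in the stream, and text outside ^...$ units is ignored.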

@@ Line 1: / Line 1: @@
 This page describes how to preprocess a corpus with Apertium so that it can be used to train a factored Moses model.
+==Steps==
+===Download parallel corpus===
+For example, the Danish-English section of Europarl:
+<pre>
+$ wget http://www.statmt.org/europarl/v5/da-en.tgz
+</pre>
+===Clean and tag both sides of corpus===
+<pre>
+</pre>
+===Convert to Moses factored format===
+Download the tagger-to-factored script:
+<pre>
+$ wget http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools/tagger-to-factored.py
+</pre>