Difference between revisions of "Preparing data for Moses factored training using Apertium"
Revision as of 12:26, 21 January 2010
This page describes how to preprocess a parallel corpus with Apertium so that it can be used to train a factored Moses model.
Steps
Download parallel corpus
For example, the Danish-English part of Europarl:
$ wget http://www.statmt.org/europarl/v5/da-en.tgz
Clean and tag both sides of corpus
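As a minimal sketch of the cleaning half of this step, the function below drops sentence pairs that are empty, too long, or very lopsided in length, and normalises whitespace. The function name and the thresholds (40 tokens, ratio 9) are illustrative assumptions, not values taken from this page. Moses also ships a clean-corpus-n.perl script for this kind of filtering. Tagging with Apertium is not covered by the sketch.

```python
def clean_parallel(src_lines, trg_lines, max_tokens=40, max_ratio=9.0):
    """Yield (source, target) pairs that pass simple length filters.

    Hypothetical helper: the name and the default thresholds are
    assumptions made for illustration only.
    """
    for src, trg in zip(src_lines, trg_lines):
        src_tokens, trg_tokens = src.split(), trg.split()
        # Drop pairs where either side is empty.
        if not src_tokens or not trg_tokens:
            continue
        # Drop pairs where either side is longer than max_tokens.
        if len(src_tokens) > max_tokens or len(trg_tokens) > max_tokens:
            continue
        # Drop pairs whose lengths differ by more than max_ratio.
        longer = max(len(src_tokens), len(trg_tokens))
        shorter = min(len(src_tokens), len(trg_tokens))
        if longer / shorter > max_ratio:
            continue
        # Re-join tokens so runs of whitespace collapse to single spaces.
        yield " ".join(src_tokens), " ".join(trg_tokens)
```

Both sides have to be filtered together, as here, so that line N of the source file still corresponds to line N of the target file afterwards.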
Convert to Moses factored format
Download the tagger-to-factored script:
$ wget http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools/tagger-to-factored.py
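To illustrate the kind of conversion involved, here is a rough sketch that turns Apertium stream-format lexical units into Moses factored tokens. It is not the contents of tagger-to-factored.py, which is not reproduced on this page. It assumes the tagger output includes surface forms, in the shape `^surface/lemma<tag1><tag2>$`, and it assumes the factor order surface|lemma|tags; the function name `to_factored` and the `unk` placeholder are hypothetical.

```python
import re

# One lexical unit with a surface form: ^surface/analysis$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")
# One morphological tag: <tag>
TAG = re.compile(r"<([^>]+)>")


def to_factored(tagged_line):
    """Convert one line of tagged text to 'surface|lemma|tags' tokens.

    Simplified sketch: escaped characters and joined analyses
    (lemma<tag>+lemma<tag>) are not handled specially.
    """
    tokens = []
    for surface, analysis in LEXICAL_UNIT.findall(tagged_line):
        if analysis.startswith("*"):
            # Unknown word: no tags available, use a placeholder factor.
            lemma, tags = analysis[1:], ["unk"]
        else:
            lemma = analysis.split("<", 1)[0]
            tags = TAG.findall(analysis) or ["unk"]
        factors = (surface, lemma, ".".join(tags))
        # Moses tokens may contain neither spaces nor '|'.
        tokens.append("|".join(f.replace(" ", "_").replace("|", "_") for f in factors))
    return " ".join(tokens)


print(to_factored("^Dogs/dog<n><pl>$ ^bark/bark<vblex><pres>$^./.<sent>$"))
# prints: Dogs|dog|n.pl bark|bark|vblex.pres .|.|sent
```

Joining the tags with a full stop keeps the whole morphological analysis in a single factor; splitting them into separate factors would be an equally valid design, depending on how the Moses model is configured.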