Preparing data for Moses factored training using Apertium


Revision as of 12:26, 21 January 2010 by Francis Tyers (talk | contribs)


This page describes how to preprocess a parallel corpus with Apertium so that it can be used to train a Moses factored translation model.


Steps

Download parallel corpus

For example, the Danish-English portion of Europarl:

$ wget http://www.statmt.org/europarl/v5/da-en.tgz

Clean and tag both sides of the corpus
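This page does not say how the cleaning is done. As a minimal sketch, assuming that "cleaning" means the usual length-based filtering of sentence pairs (Moses ships a clean-corpus-n.perl script for a similar purpose), the following Python function drops pairs that are empty, too long, or badly mismatched in length. The function name and the thresholds are illustrative assumptions, not part of Apertium or Moses. Tagging is left to the Apertium tagger for each language and is not shown here.

```python
def clean_parallel(src_lines, tgt_lines, max_tokens=80, max_ratio=9.0):
    """Yield (src, tgt) sentence pairs that pass simple length filters.

    Hypothetical helper: the name and default thresholds are assumptions
    chosen for illustration only.
    """
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop pairs where either side is empty.
        if n_src == 0 or n_tgt == 0:
            continue
        # Drop pairs where either side has too many tokens.
        if n_src > max_tokens or n_tgt > max_tokens:
            continue
        # Drop pairs whose token counts differ by too large a factor.
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        yield src, tgt
```

Both sides must be filtered together, as above, so that line N of the source file still corresponds to line N of the target file after cleaning.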

Convert to Moses factored format

Download the tagger-to-factored script:

$ wget http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-tools/tagger-to-factored.py
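The script itself is not reproduced on this page. To illustrate what such a conversion involves, here is a rough Python sketch that is not the actual tagger-to-factored.py. It assumes the tagged text is in Apertium stream format with one analysis per word, written as ^surface/lemma<tag1><tag2>$, and it emits Moses factored tokens of the form surface|lemma|tag, separated by spaces. Using the first tag as the part-of-speech factor, and "unk" for words without tags, are assumptions made for this example.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def to_factored(line):
    """Convert one line of tagged Apertium output to Moses factored format.

    Hypothetical helper, not the real tagger-to-factored.py.
    """
    tokens = []
    for surface, analysis in LEXICAL_UNIT.findall(line):
        tags = TAG.findall(analysis)
        # The lemma is everything before the first tag; unknown words
        # are marked with a leading asterisk, which is stripped here.
        lemma = analysis.split("<", 1)[0].lstrip("*")
        # Assumption: the first tag is the part of speech.
        pos = tags[0] if tags else "unk"
        # Moses separates tokens with spaces, so spaces inside a
        # multiword unit are replaced with underscores.
        surface = surface.replace(" ", "_")
        lemma = lemma.replace(" ", "_")
        tokens.append("|".join((surface, lemma, pos)))
    return " ".join(tokens)
```

For example, to_factored("^Huse/hus<n><nt><pl><ind>$") returns "Huse|hus|n" (the tags in this input are made up for illustration). A real conversion would also need to escape any "|" characters in the text, since Moses reserves that character as the factor separator.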
