Corpus based preposition selection - HOWTO

From Apertium

Revision as of 16:42, 20 August 2012 by Fpetkovski

The general algorithm for corpus-based preposition selection is as follows:

1. Download a parallel corpus.
2. Extract patterns that contain prepositions from the source-language corpus.
3. Align the patterns to their translations in the target-language corpus.
4. Extract the features and the label (the correct preposition from the target-language corpus) for classification.
5. Train a model.
6. Use the trained model in the pipeline.
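The steps above can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions, not the toolkit's implementation: it assumes a tiny, sentence-aligned English-Spanish corpus (hypothetical data), whitespace tokenisation, and hand-written preposition lists (`SOURCE_PREPS`, `TARGET_PREPS`, both hypothetical). In place of a real classifier it uses a simple frequency count keyed on the word before the preposition, backing off to the preposition alone.

```python
from collections import Counter, defaultdict

# Hypothetical preposition lists for an English -> Spanish toy example.
# A real setup would rely on the morphological analyser's tags instead.
SOURCE_PREPS = {"in", "on", "at"}
TARGET_PREPS = {"en", "sobre", "a", "de"}


def extract_patterns(sentence):
    """Step 2: yield (left word, preposition, right word) triples."""
    tokens = sentence.lower().split()
    for i, tok in enumerate(tokens):
        if tok in SOURCE_PREPS and 0 < i < len(tokens) - 1:
            yield tokens[i - 1], tok, tokens[i + 1]


def build_training_data(parallel_corpus):
    """Steps 3-4: align each pattern to its translation and label it with the target preposition."""
    data = []
    for src, tgt in parallel_corpus:
        patterns = list(extract_patterns(src))
        tgt_preps = [t for t in tgt.lower().split() if t in TARGET_PREPS]
        # Simplifying assumption: keep only sentence pairs with exactly one
        # preposition on each side, so the alignment is unambiguous.
        if len(patterns) == 1 and len(tgt_preps) == 1:
            data.append((patterns[0], tgt_preps[0]))
    return data


def train(data):
    """Step 5: count labels per (left word, preposition), plus a per-preposition backoff.

    The right-hand word is kept in the training data but ignored by this toy model.
    """
    by_head = defaultdict(Counter)
    by_prep = defaultdict(Counter)
    for (left, prep, _right), label in data:
        by_head[(left, prep)][label] += 1
        by_prep[prep][label] += 1
    return by_head, by_prep


def predict(model, left, prep):
    """Step 6: return the most frequent target preposition, or None if prep was never seen."""
    by_head, by_prep = model
    counts = by_head.get((left, prep)) or by_prep.get(prep)
    return counts.most_common(1)[0][0] if counts else None


# Step 1 stand-in: a tiny hand-written, sentence-aligned corpus (hypothetical data).
corpus = [
    ("she lives in madrid", "ella vive en madrid"),
    ("he works in london", "él trabaja en londres"),
    ("we arrived at noon", "llegamos a mediodía"),
    ("he depends on luck", "él depende de la suerte"),
    ("the book is on the table", "el libro está sobre la mesa"),
]
model = train(build_training_data(corpus))
print(predict(model, "depends", "on"))  # de (head word seen in training)
print(predict(model, "sleeps", "in"))   # en (backs off to the preposition alone)
```

Counting is used here only to keep the sketch dependency-free; any supervised classifier that maps the extracted features to a target preposition could replace `train` and `predict`.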

The toolkit for performing these tasks can be found here:
https://apertium.svn.sourceforge.net/svnroot/apertium/branches/gsoc2012/fpetkovski/morph-parser/

Extracting training data for your classifier

To extract training data for your classifier, you can use the preposition-extraction tool.
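This page does not document the preposition-extraction tool's input or output format, so the following is only a hypothetical sketch of what source-side extraction can look like, not the tool itself. It assumes the source text has already been run through an Apertium analyser and tagger, so every lexical unit has a single analysis of the form `^surface/lemma<tags>$` and prepositions carry the `<pr>` tag. The function names, the feature choice (lemma and first tag of each neighbour) and the tab-separated row layout are all illustrative assumptions.

```python
import re

# Assumed input: one disambiguated Apertium lexical unit per ^...$,
# in the form ^surface/lemma<tag1><tag2>...$ (a single analysis each).
LU_RE = re.compile(r"\^([^/$]*)/([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_units(line):
    """Return (surface, lemma, tags) for every lexical unit in a stream line."""
    return [(surface, lemma, TAG_RE.findall(tags))
            for surface, lemma, tags in LU_RE.findall(line)]


def preposition_features(line):
    """Yield a feature tuple for each <pr> unit that has a neighbour on both sides.

    Hypothetical features: lemma and first tag of the previous unit,
    the preposition's lemma, and lemma and first tag of the next unit.
    """
    units = parse_units(line)
    for i in range(1, len(units) - 1):
        _, lemma, tags = units[i]
        if tags and tags[0] == "pr":
            _, prev_lemma, prev_tags = units[i - 1]
            _, next_lemma, next_tags = units[i + 1]
            yield (prev_lemma, prev_tags[0] if prev_tags else "",
                   lemma,
                   next_lemma, next_tags[0] if next_tags else "")


def to_training_row(features, label):
    """Format one instance as a tab-separated line, with the label in the last column."""
    return "\t".join(features + (label,))


example = ("^She/prpers<prn><subj><p3><f><sg>$ ^lives/live<vblex><pri><p3><sg>$ "
           "^in/in<pr>$ ^Madrid/Madrid<np><loc>$^./.<sent>$")
for feats in preposition_features(example):
    # The label ("en" here) would come from the aligned target-language sentence.
    print(to_training_row(feats, "en"))
```

Keeping features and label on one tab-separated line per instance makes the output easy to feed to most classifier toolkits, but the real tool may use a different layout.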

Retrieved from "https://wiki.apertium.org/w/index.php?title=Corpus_based_preposition_selection_-_HOWTO&oldid=35999"