Corpus based preposition selection - HOWTO

From Apertium
Revision as of 16:43, 20 August 2012 by Fpetkovski (talk | contribs)

The general procedure for corpus-based preposition selection is as follows:

  • Download a parallel corpus
  • Extract patterns which contain prepositions from the source-language corpus
  • Align the patterns to their translations in the target-language corpus
  • Extract the features and the label (the correct preposition, taken from the target-language corpus) of each aligned pattern for classification
  • Train a model
  • Use the trained model in the pipeline
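
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the toolkit's actual implementation: the preposition list, the three-token context window, the hand-written word alignments, the English-Spanish toy corpus, and every function name below are hypothetical, and a simple count-based classifier with back-off stands in for whatever model you actually train.

```python
from collections import Counter, defaultdict

# Hypothetical closed list of source-language prepositions (English here).
SRC_PREPOSITIONS = {"in", "on", "at", "to", "of"}


def extract_patterns(tokens):
    """Yield (index, features) for every preposition in a tokenized sentence.

    The features are an assumed three-token window: previous word,
    preposition, next word.
    """
    for i, tok in enumerate(tokens):
        if tok in SRC_PREPOSITIONS:
            prev_tok = tokens[i - 1] if i > 0 else "<s>"
            next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
            yield i, (prev_tok, tok, next_tok)


def training_examples(src_tokens, tgt_tokens, alignment):
    """Pair each source pattern with its aligned target preposition (the label).

    `alignment` maps a source token index to a target token index.
    Producing it (for example with a word aligner) is outside this sketch.
    """
    for i, features in extract_patterns(src_tokens):
        if i in alignment:
            yield features, tgt_tokens[alignment[i]]


def train(examples):
    """Count labels per full feature tuple and per preposition alone."""
    full = defaultdict(Counter)
    backoff = defaultdict(Counter)
    for features, label in examples:
        full[features][label] += 1
        backoff[features[1]][label] += 1
    return full, backoff


def predict(model, features):
    """Return the most frequent label, backing off to the bare preposition."""
    full, backoff = model
    counts = full.get(features) or backoff.get(features[1])
    return counts.most_common(1)[0][0] if counts else None


# Invented toy parallel corpus: (source tokens, target tokens, alignment).
corpus = [
    (["i", "live", "in", "madrid"], ["vivo", "en", "madrid"], {2: 1}),
    (["the", "book", "is", "on", "the", "table"],
     ["el", "libro", "está", "en", "la", "mesa"], {3: 3}),
    (["i", "go", "to", "madrid"], ["voy", "a", "madrid"], {2: 1}),
]

examples = [ex for src, tgt, al in corpus
            for ex in training_examples(src, tgt, al)]
model = train(examples)
print(predict(model, ("live", "in", "paris")))
```

The back-off to the bare preposition lets the sketch produce an answer for contexts it has never seen; with a real corpus you would replace the counting model with a proper classifier and richer features.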

The general toolkit for performing these tasks can be found here.

Extracting training data for your classifier

To extract training data for your classifier, you can use the preposition-extraction tool.