Difference between revisions of "Corpus based preposition selection - HOWTO"

From Apertium

Revision as of 16:40, 20 August 2012

The general algorithm for corpus-based preposition selection is as follows:

  • Download a parallel corpus
  • Extract patterns which contain prepositions from the source-language corpus
  • Align the patterns to their translations in the target-language corpus
  • Extract the features and the label (the correct preposition in the target-language corpus) for classification
  • Train a model
  • Use the trained model in the pipeline
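
The steps above can be sketched on toy data as follows. This is only an illustration, not the toolkit's implementation: the preposition list, the function names, the three context features and the English-Spanish sentences are all hypothetical, the word alignment is assumed to come from an external aligner, and a simple frequency table with back-off stands in for whatever classifier you actually train.

```python
from collections import Counter, defaultdict

# Hypothetical closed list of source-language prepositions (toy example).
SOURCE_PREPOSITIONS = {"in", "on", "at"}


def extract_instances(src_tokens, tgt_tokens, alignment):
    """Yield (features, label) pairs, one per aligned source preposition.

    `alignment` maps a source token index to a target token index and is
    assumed to be produced by an external word aligner.
    """
    for i, token in enumerate(src_tokens):
        if token not in SOURCE_PREPOSITIONS or i not in alignment:
            continue
        prev_word = src_tokens[i - 1] if i > 0 else "<s>"
        next_word = src_tokens[i + 1] if i + 1 < len(src_tokens) else "</s>"
        features = (token, prev_word, next_word)
        label = tgt_tokens[alignment[i]]  # the target-language preposition
        yield features, label


def train(instances):
    """Count labels per full feature tuple and per source preposition."""
    by_context = defaultdict(Counter)
    by_preposition = defaultdict(Counter)
    for features, label in instances:
        by_context[features][label] += 1
        by_preposition[features[0]][label] += 1
    return by_context, by_preposition


def predict(model, features):
    """Return the most frequent label, backing off to the preposition alone."""
    by_context, by_preposition = model
    for table, key in ((by_context, features), (by_preposition, features[0])):
        if key in table:
            return table[key].most_common(1)[0][0]
    return None  # preposition never seen in training


# Toy parallel corpus: (source sentence, target sentence, alignment).
corpus = [
    ("the cat sleeps on the bed", "el gato duerme en la cama", {3: 3}),
    ("we arrive at five", "llegamos a las cinco", {2: 1}),
    ("he lives in Madrid", "vive en Madrid", {2: 1}),
]
instances = [
    pair
    for src, tgt, alignment in corpus
    for pair in extract_instances(src.split(), tgt.split(), alignment)
]
model = train(instances)

print(predict(model, ("on", "sleeps", "the")))   # context seen in training
print(predict(model, ("in", "works", "Paris")))  # unseen context, backs off
```

In a real setup the features would more likely come from the morphological analysis of the surrounding words than from raw surface forms, and the frequency table would be replaced by a proper classifier.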

The general toolkit for performing these tasks can be found here: apertium.svn.sourceforge.net/svnroot/apertium/branches/gsoc2012/fpetkovski/morph-parser/

Extracting training data for your classifier

To extract training data for your classifier, use the preposition-extraction tool.