Difference between revisions of "Corpus based preposition selection - HOWTO"

From Apertium

Revision as of 16:43, 20 August 2012

The general algorithm for corpus-based preposition selection is as follows:

  • Download a parallel corpus
  • Extract patterns which contain prepositions from the source-language corpus
  • Align the patterns to their translations in the target-language corpus
  • Extract the features and the label (the correct preposition in the target-language corpus) for classification
  • Train a model
  • Use the trained model in the pipeline
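
The extraction, labelling and training steps above can be illustrated with a minimal sketch. It is not the toolkit's implementation: the preposition lists, the toy English-Spanish corpus, the word alignments (assumed to be already given as source-index to target-index maps) and the function names extract_examples, train and predict are all hypothetical, and the "model" is a simple frequency count with back-off rather than a real classifier.

```python
from collections import Counter, defaultdict

# Hypothetical closed lists of prepositions; a real system would rely on
# morphological analysis or POS tags instead.
SRC_PREPS = {"in", "on", "at", "to"}
TGT_PREPS = {"en", "a", "sobre"}


def extract_examples(src_tokens, tgt_tokens, alignment):
    """Return (features, label) pairs for one aligned sentence pair.

    alignment maps a source token index to a target token index and is
    assumed to come from an external word aligner.
    """
    examples = []
    for i, tok in enumerate(src_tokens):
        if tok not in SRC_PREPS or i not in alignment:
            continue
        label = tgt_tokens[alignment[i]]
        if label not in TGT_PREPS:
            continue  # the aligned target word is not a preposition
        prev_tok = src_tokens[i - 1] if i > 0 else "<s>"
        next_tok = src_tokens[i + 1] if i + 1 < len(src_tokens) else "</s>"
        examples.append(((prev_tok, tok, next_tok), label))
    return examples


def train(examples):
    """Count labels per full context and per preposition alone (back-off)."""
    counts = defaultdict(Counter)
    for (prev_tok, prep, next_tok), label in examples:
        counts[(prev_tok, prep, next_tok)][label] += 1
        counts[(prep,)][label] += 1
    return counts


def predict(model, prev_tok, prep, next_tok):
    """Pick the most frequent label for the context, backing off if unseen."""
    for key in ((prev_tok, prep, next_tok), (prep,)):
        if key in model:
            return model[key].most_common(1)[0][0]
    return None


# Toy parallel corpus: (source tokens, target tokens, alignment).
corpus = [
    ("the cat sleeps on the table".split(), "el gato duerme sobre la mesa".split(), {3: 3}),
    ("she lives in Madrid".split(), "ella vive en Madrid".split(), {2: 2}),
    ("he works in the house".split(), "trabaja en la casa".split(), {2: 1}),
    ("we go to Madrid".split(), "vamos a Madrid".split(), {2: 1}),
]

examples = [ex for src, tgt, align in corpus for ex in extract_examples(src, tgt, align)]
model = train(examples)
```

Calling predict(model, "lives", "in", "Madrid") looks up the full context first and falls back to the preposition alone when the context was never seen. In a real pipeline the features would normally include lemmas and tags, and the counts would be replaced by a trained classifier.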

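The general toolkit for performing these tasks is available in the Apertium SVN repository: https://apertium.svn.sourceforge.net/svnroot/apertium/branches/gsoc2012/fpetkovski/morph-parser/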

Extracting training data for your classifier

To extract training data for your classifier, use the preposition-extraction tool.