Corpus based preposition selection - HOWTO
Revision as of 16:43, 20 August 2012 by Fpetkovski
The general algorithm for performing corpus-based preposition selection is as follows:
- Download a parallel corpus
- Extract patterns which contain prepositions from the source-language corpus
- Align the patterns to their translations in the target-language corpus
- Extract the features and the label (the correct preposition, taken from the target-language corpus) for classification
- Train a model
- Use the trained model in the pipeline
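The "train a model" and "use the trained model" steps can be illustrated with a minimal sketch. It assumes each training instance is a dictionary of categorical features plus the target-language preposition as the label. The learner here is a plain Naive Bayes classifier with add-one smoothing, chosen only for illustration: it is not necessarily what the toolkit uses. The class name, feature names and the toy English-Spanish instances are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# (features, label): features are hypothetical categorical attributes of the
# source preposition; label is the target-language preposition.
Instance = Tuple[Dict[str, str], str]


class NaiveBayesPrepositionModel:
    """Toy Naive Bayes classifier; a stand-in for whatever learner is actually used."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant
        self.label_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def train(self, instances: List[Instance]) -> None:
        # Count labels and "name=value" feature occurrences per label.
        for features, label in instances:
            self.label_counts[label] += 1
            for name, value in features.items():
                key = f"{name}={value}"
                self.feature_counts[label][key] += 1
                self.vocab.add(key)

    def predict(self, features: Dict[str, str]) -> Optional[str]:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log P(label) + sum over features of smoothed log P(feature | label)
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + self.alpha * len(self.vocab)
            for name, value in features.items():
                num = self.feature_counts[label][f"{name}={value}"] + self.alpha
                score += math.log(num / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Hypothetical toy training data (English source, Spanish target).
    training_data = [
        ({"prep": "in", "left": "live", "right": "madrid"}, "en"),
        ({"prep": "in", "left": "work", "right": "june"}, "en"),
        ({"prep": "to", "left": "go", "right": "madrid"}, "a"),
    ]
    model = NaiveBayesPrepositionModel()
    model.train(training_data)
    print(model.predict({"prep": "to", "left": "go", "right": "school"}))    # a
    print(model.predict({"prep": "in", "left": "live", "right": "school"}))  # en
```

In a real pipeline the predicted label would replace the default dictionary translation of the preposition. How that hook is wired into the translator is not described here and is left open.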
The general toolkit for performing these tasks can be found here.
Extracting training data for your classifier
To extract training data for your classifier, you can use the preposition-extraction tool.
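The sketch below shows roughly what "extracting training data" means. It is not the preposition-extraction tool itself, whose interface is not documented here. It assumes a tokenized source sentence, a tokenized target sentence and a word alignment given as (source index, target index) pairs, such as a statistical word aligner produces. For every aligned source preposition, it emits a feature dictionary (the preposition and its neighbouring words) and a label (the aligned target word). The preposition list, feature names and example sentences are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny list of source-language (English) prepositions.
SOURCE_PREPOSITIONS = {"in", "on", "at", "to", "with", "of"}


def extract_instances(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: List[Tuple[int, int]],
) -> List[Tuple[Dict[str, str], str]]:
    """Return (features, label) pairs, one per aligned source preposition."""
    # Map each source index to the target indices it is aligned with.
    src_to_tgt: Dict[int, List[int]] = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, []).append(t)

    instances = []
    for i, token in enumerate(src_tokens):
        word = token.lower()
        if word not in SOURCE_PREPOSITIONS:
            continue
        targets = src_to_tgt.get(i)
        if not targets:
            # Unaligned preposition: no reliable label, so skip it.
            continue
        features = {
            "prep": word,
            "left": src_tokens[i - 1].lower() if i > 0 else "<s>",
            "right": src_tokens[i + 1].lower() if i + 1 < len(src_tokens) else "</s>",
        }
        # Label: the leftmost target word aligned to the preposition.
        label = tgt_tokens[min(targets)].lower()
        instances.append((features, label))
    return instances


if __name__ == "__main__":
    src = ["I", "live", "in", "Madrid"]
    tgt = ["Vivo", "en", "Madrid"]
    links = [(0, 0), (1, 0), (2, 1), (3, 2)]
    print(extract_instances(src, tgt, links))
    # [({'prep': 'in', 'left': 'live', 'right': 'madrid'}, 'en')]
```

Taking the leftmost aligned target word is a simplification. Prepositions that align to several target words, or to none, would need a more careful policy in practice.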