Difference between revisions of "Corpus based preposition selection - HOWTO"
Fpetkovski (Created page with '=== Extracting training data for your classifier ===')
Revision as of 16:40, 20 August 2012
The general algorithm for corpus-based preposition selection is as follows:
- Download a parallel corpus
- Extract patterns which contain prepositions from the source-language corpus
- Align the patterns to their translations in the target-language corpus
- Extract the features and the label (the correct preposition, taken from the target-language corpus) for classification
- Train a model
- Use the trained model in the pipeline
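The steps above can be illustrated with a minimal, self-contained sketch. It is not the toolkit's implementation: the preposition lists, the toy English-Spanish corpus, and the function names (extract_examples, train, predict) are all hypothetical. It also assumes sentence-aligned input, replaces real word alignment with a naive positional pairing, and uses a frequency lookup with backoff in place of a real classifier.

```python
from collections import Counter, defaultdict

# Hypothetical closed preposition lists for a toy English -> Spanish pair.
SRC_PREPS = {"in", "on", "at", "to"}
TGT_PREPS = {"en", "a", "de", "sobre"}


def extract_examples(src_sentence, tgt_sentence):
    """Yield (features, label) pairs from one aligned sentence pair."""
    src = src_sentence.lower().split()
    tgt = tgt_sentence.lower().split()
    src_hits = [i for i, w in enumerate(src) if w in SRC_PREPS]
    tgt_hits = [w for w in tgt if w in TGT_PREPS]
    # Naive alignment (an assumption): the k-th source preposition is paired
    # with the k-th target preposition. Sentence pairs whose preposition
    # counts differ are skipped rather than guessed at.
    if len(src_hits) != len(tgt_hits):
        return
    for i, label in zip(src_hits, tgt_hits):
        prev_word = src[i - 1] if i > 0 else "<s>"
        next_word = src[i + 1] if i + 1 < len(src) else "</s>"
        # Features: the source preposition and its immediate neighbours.
        yield (src[i], prev_word, next_word), label


def train(pairs):
    """Count labels per full feature tuple and per source preposition."""
    full = defaultdict(Counter)
    backoff = defaultdict(Counter)
    for src, tgt in pairs:
        for features, label in extract_examples(src, tgt):
            full[features][label] += 1
            backoff[features[0]][label] += 1
    return full, backoff


def predict(model, features):
    """Return the most frequent label, backing off to the preposition alone."""
    full, backoff = model
    for table, key in ((full, features), (backoff, features[0])):
        if key in table:
            return table[key].most_common(1)[0][0]
    return None  # source preposition never seen in training


# Toy stand-in for a downloaded parallel corpus.
corpus = [
    ("I live in Madrid", "Vivo en Madrid"),
    ("The book is on the table", "El libro está sobre la mesa"),
    ("We go to Madrid", "Vamos a Madrid"),
    ("She is in the house", "Ella está en la casa"),
]

model = train(corpus)
print(predict(model, ("in", "live", "madrid")))
print(predict(model, ("on", "sits", "a")))
```

In a real pipeline the lookup tables would be replaced by a trained classifier, and the positional pairing by proper word alignment; the sketch only shows how the extracted features and the target-language label fit together.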
The general toolkit for performing these tasks can be found here: apertium.svn.sourceforge.net/svnroot/apertium/branches/gsoc2012/fpetkovski/morph-parser/
Extracting training data for your classifier
To extract training data for your classifier, use the preposition-extraction tool.