Difference between revisions of "Ideas for Google Summer of Code/Improvements in lexical-selection module"
Revision as of 21:09, 18 March 2013
Implement a number of optimisations to the lexical selection module. The lexical selection module in Apertium is currently a prototype, and there are many optimisations that could make it faster, more efficient, and easier to install and use.
Tasks
- Script/program for finding possibly missing bidix entries from an aligned parallel corpus.
- Do proper processing of tags in all scripts.
- Remove unused and redundant scripts.
- Work on a way to trim non-significant features from the maximum-entropy models.
- Rewrite the LRXProcessor::processME and LRXProcessor::process methods so that they share more code and are more modularised. Having a 650 line method is not something I'm proud of ;__;
- Make sure that capitalisation, any tag and any character work as expected.
- more here
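As a rough illustration of the first task, the sketch below counts word pairs that are aligned in a parallel corpus but absent from the bilingual dictionary, and reports the frequent ones as candidate missing entries. Everything here is an assumption made for the example: the function name, the input format (token lists plus a list of index pairs from a word aligner) and the representation of the bidix as a set of (source, target) pairs. A real script would work on tagged lexical forms and read the actual bidix file.

```python
from collections import Counter


def find_missing_bidix_candidates(aligned_sentences, bidix_pairs, min_count=2):
    """Return ((source, target), count) for aligned pairs not in the bidix.

    aligned_sentences: iterable of (src_tokens, trg_tokens, alignment), where
        alignment is a list of (src_index, trg_index) pairs (hypothetical format).
    bidix_pairs: set of (source, target) pairs already in the dictionary.
    min_count: pairs seen fewer times than this are treated as alignment noise.
    """
    counts = Counter()
    for src_tokens, trg_tokens, alignment in aligned_sentences:
        for i, j in alignment:
            pair = (src_tokens[i].lower(), trg_tokens[j].lower())
            if pair not in bidix_pairs:
                counts[pair] += 1
    # most_common() sorts by descending frequency
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Toy data, invented for illustration only.
aligned = [
    (["the", "station"], ["la", "estación"], [(0, 0), (1, 1)]),
    (["a", "station"], ["una", "estación"], [(0, 0), (1, 1)]),
    (["the", "season"], ["la", "estación"], [(0, 0), (1, 1)]),
    (["the", "season"], ["la", "temporada"], [(0, 0), (1, 1)]),
    (["a", "season"], ["una", "estación"], [(0, 0), (1, 1)]),
]
bidix = {
    ("the", "la"),
    ("a", "una"),
    ("station", "estación"),
    ("season", "temporada"),
}

candidates = find_missing_bidix_candidates(aligned, bidix)
for (src, trg), n in candidates:
    print(f"{src}\t{trg}\t{n}")
```

The frequency threshold is a deliberately simple filter; a more careful version might normalise by how often the source word occurs, so that rare but consistent translations are not discarded.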
Coding challenge
- Install Apertium and the constraint-based lexical selection module
- Run through the "Generating lexical-selection rules from a parallel corpus" HOWTO for a language pair of your choice.