Ideas for Google Summer of Code/Improvements in lexical-selection module
Latest revision as of 18:06, 22 March 2013
Implement a number of optimisations to the lexical selection module. The lexical selection module in Apertium is currently a prototype, and there are many optimisations that could make it faster and more efficient, as well as easier to install and use.
Tasks
- Script/program for finding possibly missing bidix entries from an aligned parallel corpus.
- Do proper processing of tags in all scripts.
- Remove unused and redundant scripts.
- Work on a way to trim non-significant features from the maximum-entropy models.
- Rewrite the LRXProcessor::processME and LRXProcessor::process methods so that they share more code and are more modularised. Having a 650 line method is not something I (Francis Tyers) am proud of ;__;
- Make sure that capitalisation, any tag and any character work as expected.
- more here
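
As an illustration of what "proper processing of tags" might mean in the helper scripts, here is a minimal sketch in Python. It assumes readings in the Apertium stream format, of the form lemma<tag1><tag2>, with reserved characters escaped by a backslash. The function name split_reading is hypothetical and not taken from the existing scripts. The sketch covers only simple readings and ignores multiword constructs such as joined readings.

```python
def split_reading(reading):
    """Split 'lemma<tag1><tag2>' into ('lemma', ['tag1', 'tag2']).

    Hypothetical helper: assumes a single simple reading in the
    Apertium stream format, with backslash-escaped reserved characters.
    """
    lemma = []      # characters of the lemma
    tags = []       # completed tags, without angle brackets
    current = None  # None while outside a tag, a list while inside <...>
    i = 0
    while i < len(reading):
        ch = reading[i]
        if ch == "\\" and i + 1 < len(reading):
            # Escaped character: keep the backslash and the next character
            # verbatim, so an escaped '<' is never treated as a tag opener.
            target = lemma if current is None else current
            target.append(reading[i:i + 2])
            i += 2
            continue
        if ch == "<" and current is None:
            current = []                    # start of a tag
        elif ch == ">" and current is not None:
            tags.append("".join(current))   # end of a tag
            current = None
        elif current is not None:
            current.append(ch)              # inside a tag
        else:
            lemma.append(ch)                # outside any tag
        i += 1
    return "".join(lemma), tags


if __name__ == "__main__":
    print(split_reading("house<n><pl>"))
```

Walking the string character by character, rather than matching with a regular expression, is one way to keep escaped angle brackets in a lemma from being mistaken for tags.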
Coding challenge
- Install Apertium and the constraint-based lexical selection module
- Run through the Generating lexical-selection rules from a parallel corpus HOWTO for a language pair of your choice.
Frequently asked questions
- none yet, ask us something! :)