Difference between revisions of "Ideas for Google Summer of Code/Improvements in lexical-selection module"

Latest revision as of 18:06, 22 March 2013

Tasks

Script/program for finding possibly missing bidix entries from an aligned parallel corpus.
Do proper processing of tags in all scripts.
Remove unused and redundant scripts.
Work on a way to trim non-significant features from the maximum-entropy models.
Rewrite the LRXProcessor::processME and LRXProcessor::process methods so that they share more code and are more modularised. Having a 650-line method is not something I (Francis Tyers) am proud of ;__;
Make sure that capitalisation, any tag and any character work as expected.
more here
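
As a rough illustration of the first task, here is a minimal sketch of how candidate missing bidix entries might be counted from a word-aligned corpus. Everything in it is an assumption made for illustration: the function names are hypothetical and are not part of the existing lexical-selection scripts, tokens are assumed to be bare lemmas with tags already stripped, alignments are assumed to be in the common "i-j" (Pharaoh-style) text format, and the bidix is assumed to have been loaded into a set of (source lemma, target lemma) pairs beforehand.

```python
from collections import Counter


def parse_alignment(line):
    """Parse an alignment line such as '0-0 1-2' into (src, trg) index pairs."""
    return [tuple(int(i) for i in pair.split("-")) for pair in line.split()]


def missing_bidix_candidates(corpus, bidix, min_count=2):
    """Return aligned lemma pairs that are absent from the bidix.

    corpus    -- iterable of (source_lemmas, target_lemmas, alignment_line)
    bidix     -- set of (source_lemma, target_lemma) pairs (assumed format)
    min_count -- report only pairs seen at least this many times
    """
    counts = Counter()
    for src_tokens, trg_tokens, alignment in corpus:
        for i, j in parse_alignment(alignment):
            pair = (src_tokens[i], trg_tokens[j])
            if pair not in bidix:
                counts[pair] += 1
    # Most frequent candidates first; rare pairs are often alignment noise.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Toy data, invented purely for this example.
    toy_bidix = {("estacion", "station"), ("tren", "train")}
    toy_corpus = [
        (["estacion", "tren"], ["station", "train"], "0-0 1-1"),
        (["estacion"], ["season"], "0-0"),
        (["estacion"], ["season"], "0-0"),
    ]
    for pair, n in missing_bidix_candidates(toy_corpus, toy_bidix):
        print(pair, n)
```

The frequency threshold is there because automatic word alignments are noisy; a real script would probably also need to handle tags and multiword units, which this sketch ignores.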

Coding challenge

Install Apertium and the constraint-based lexical selection module
Run through the Generating lexical-selection rules from a parallel corpus HOWTO for a language pair of your choice.

Frequently asked questions

none yet, ask us something! :)

@@ Line 1: / Line 1: @@
+{{TOCD}}
-Implement a number of optimisations to the lexical selection module. The lexical selection module in Apertium is currently a prototype. There are many optimisations that could be made to make it more efficient and faster.
+Implement a number of optimisations to the lexical selection module. The lexical selection module in Apertium is currently a prototype. There are many optimisations that could be made to make it more efficient and faster, and easier to install and use.
 ==Tasks==
+* Script/program for finding possibly missing bidix entries from an aligned parallel corpus.
-* Make the module process word by word, instead of sentence by sentence.
+* Do proper processing of tags in all scripts.
-* Move away from using regular expressions as transitions, to using lemma/tag pairs.
+* Remove unused and redundant scripts.
+* Work on a way to trim non-significant features from the maximum-entropy models.
+* Rewrite the <code>LRXProcessor::processME</code> and <code>LRXProcessor::process</code> methods so that they share more code and are more modularised. Having a 650 line method is not something I ([[User:Francis Tyers|Francis Tyers]]) am proud of ;__;
+* Make sure that capitalisation, any tag and any character work as expected.
+* ''more here''
 ==Coding challenge==
 * Install Apertium and the [[constraint-based lexical selection module]]
-* Run through the [[Generating lexical-selection rules from a parallel corpus]] HOWTO.
+* Run through the [[Generating lexical-selection rules from a parallel corpus]] HOWTO for a language pair of your choice.
 ==Frequently asked questions==
+* none yet, ''[[contact|ask us]] something!'' :)
 ==See also==
