User:Fpetkovski/GSOC 2013 Application - Improving the lexical selection module

The lexical selection module in Apertium is currently a prototype, and many optimisations could make it faster and more efficient. There are a number of scripts for learning lexical-selection rules, but they are not particularly well written. Part of the task will be to rewrite these scripts so that they handle all relevant corner cases.

The project idea is described on the Apertium wiki: http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Improvements_in_lexical-selection_module

TODO list:

Merge the four different implementations of irstlm_ranker into a single implementation.
Move lex-learner to lex-tools.
Write a script or program for finding possibly missing bidix entries in an aligned parallel corpus.
Process tags properly in all scripts.
Remove unused and redundant scripts.
Find a way to trim non-significant features from the maximum-entropy models.
Rewrite the LRXProcessor::processME and LRXProcessor::process methods so that they share more code and are more modular. A single 650-line method is hard to read and maintain.
Make sure that capitalisation, "any tag" and "any character" matching work as expected.
Ensure that all scripts process escaped characters correctly, e.g. ^ \ / $ < >
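Several of the items above (proper tag processing, "any character" handling and escaped characters) come down to tokenising the Apertium stream format in one consistent way. The following is a minimal Python sketch of that idea. It assumes a lexical unit has the form ^surface/analysis1/analysis2$, that each analysis is a lemma followed by <tag> sequences, and that a backslash escapes the next character. The function names are hypothetical. The escape set covers only the characters listed above; the real format may reserve further characters.

```python
from __future__ import annotations

# Assumption: only the special characters named in the TODO list are escaped;
# the real stream format may reserve more.
SPECIAL = set("^\\/$<>")


def escape(text: str) -> str:
    """Backslash-escape every special character in a surface form or lemma."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def unescape(text: str) -> str:
    """Drop escaping backslashes, keeping the escaped characters literally."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def split_unescaped(text: str, sep: str) -> list[str]:
    """Split on `sep` only where it is not escaped; escapes are kept as-is."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # leave the escape for later stages
            i += 2
            continue
        if ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def parse_analysis(analysis: str) -> tuple[str, list[str]]:
    """Split e.g. 'house<n><pl>' into ('house', ['n', 'pl']).

    An escaped '<' or '>' is part of the lemma and never opens or closes a tag.
    """
    lemma_chars: list[str] = []
    tags: list[str] = []
    current_tag: list[str] | None = None  # None while outside a tag
    i = 0
    while i < len(analysis):
        ch = analysis[i]
        if ch == "\\" and i + 1 < len(analysis):
            target = lemma_chars if current_tag is None else current_tag
            target.append(analysis[i + 1])
            i += 2
            continue
        if ch == "<" and current_tag is None:
            current_tag = []
        elif ch == ">" and current_tag is not None:
            tags.append("".join(current_tag))
            current_tag = None
        elif current_tag is None:
            lemma_chars.append(ch)
        else:
            current_tag.append(ch)
        i += 1
    if current_tag is not None:
        raise ValueError(f"unterminated tag in {analysis!r}")
    return "".join(lemma_chars), tags


def parse_lexical_unit(lu: str) -> tuple[str, list[tuple[str, list[str]]]]:
    """Parse '^surface/analysis1/...$' into the surface form and its analyses."""
    if len(lu) < 2 or not lu.startswith("^") or not lu.endswith("$"):
        raise ValueError(f"not a lexical unit: {lu!r}")
    body = lu[1:-1]
    # An odd number of trailing backslashes means the final '$' is escaped.
    if (len(body) - len(body.rstrip("\\"))) % 2:
        raise ValueError(f"unterminated lexical unit: {lu!r}")
    surface, *analyses = split_unescaped(body, "/")
    # The surface form is returned unchanged apart from unescaping, so its
    # capitalisation is preserved for later rule matching.
    return unescape(surface), [parse_analysis(a) for a in analyses]
```

For example, parse_lexical_unit("^Houses/house<n><pl>$") yields ("Houses", [("house", ["n", "pl"])]), and an escaped slash such as ^a\/b/a\/b<n>$ is not mistaken for an analysis separator. Sharing one escape-aware splitter across all scripts, instead of ad hoc str.split("/") calls, is one way to keep these corner cases in a single place.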
