User:Mlforcada/sandbox/GSoC

Task	Difficulty	Description	Rationale	Requirements	Interested mentors
Easy dictionary maintenance	2. Hard	Write code that simplifies the maintenance of the single-word part of Apertium monolingual and bilingual dictionaries. This involves building an application that (1) parses the dictionaries and reads the open-class (noun, adjective, verb) single-word part into a form amenable to simple, database-like treatment, setting aside the remaining (hard-to-treat) part; (2) lets the user easily add words (together with their inflection paradigms) through a friendly user interface; and (3) combines the extended single-word data with the remaining data into Apertium monolingual and bilingual dictionaries ready to be compiled. Ideas and code from Apertium-dixtools could be useful.	Apertium dictionaries are very heterogeneous, but much of the development of a language pair consists of adding single words to monolingual and bilingual dictionaries, and work on this part of the dictionaries is crucial for coverage and usefulness. Currently, dictionary maintenance is difficult because it involves editing an XML file directly. This may be slowing down the development of many language pairs.	Knowledge of XML, XSLT, and a programming language that supports XML processing and makes it easy to write a user interface	Mikel L. Forcada
Hybrid MT	2. Hard	Building Apertium-Marclator rule-based/corpus-based hybrids	Both the rule-based machine translation system Apertium and the corpus-based machine translation system Marclator chunk the input in some way and use a relatively straightforward left-to-right translation strategy. Hybridization of the two has been explored before, but there are other ways to organize it that should be explored (the mentor is happy to discuss them). Hybridization may make it easier to adapt Apertium to a particular corpus by using chunk pairs derived from it.	Knowledge of Java, C++, and scripting languages, and an appreciation for research-like coding projects	Mlforcada
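
As a rough illustration of the "Easy dictionary maintenance" idea, the sketch below adds one single-word entry to a monolingual dictionary using Python's standard `xml.etree.ElementTree`. It is only a sketch under stated assumptions, not a proposed design: the dictionary fragment is a heavily simplified sample (real Apertium dictionaries also contain an alphabet, symbol definitions, and full paradigm bodies), and the function name `add_word` and the sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified sample dictionary (assumption: real .dix files contain much more).
SAMPLE_DIX = """<dictionary>
  <pardefs>
    <pardef n="house__n"/>
  </pardefs>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
  </section>
</dictionary>"""


def add_word(root, lemma, stem, paradigm):
    """Append <e lm=lemma><i>stem</i><par n=paradigm/></e> to the main section.

    Hypothetical helper: returns True if an entry was added, False if the same
    lemma/paradigm pair is already present.
    """
    # Refuse paradigms that the dictionary does not define.
    known = {p.get("n") for p in root.iter("pardef")}
    if paradigm not in known:
        raise ValueError(f"unknown paradigm: {paradigm}")

    section = root.find("section[@id='main']")
    # Skip exact duplicates (same lemma attached to the same paradigm).
    for entry in section.findall("e"):
        par = entry.find("par")
        if entry.get("lm") == lemma and par is not None and par.get("n") == paradigm:
            return False

    entry = ET.SubElement(section, "e", {"lm": lemma})
    ET.SubElement(entry, "i").text = stem
    ET.SubElement(entry, "par", {"n": paradigm})
    return True


root = ET.fromstring(SAMPLE_DIX)
add_word(root, "mouse", "mouse", "house__n")
print(ET.tostring(root, encoding="unicode"))
```

A real tool would also have to preserve the parts of the dictionary it does not understand (multiword entries, comments, restrictions) and write them back unchanged, which is the harder part of the task described above.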
