
Task ideas for Google Code-in

Revision as of 20:46, 26 October 2010 by Jimregan (Talk | contribs)


This is the task ideas page for Google Code-in. Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

The People column lists the people you should contact to request further information. The Time column gives the minimum estimated amount of time that should be spent on the task. It does not include the time taken to install and set up Apertium.

Task list

Area | Difficulty | Title | Description | Time (hours) | People
code | 2. Medium | Cross a language pair: Occitan-French | Using apertium-crossdics, build a dictionary for Occitan-French from Occitan-Catalan and Catalan-French, and clean up the result. | 4–10 | Francis Tyers
code | 2. Medium | Cross a language pair: Aragonese-Catalan | Using apertium-crossdics, build a dictionary for Aragonese-Catalan from Aragonese-Spanish and Spanish-Catalan, and clean up the result. | 4–10 | Jimregan
code | 1. Hard | Convert existing resource | Take an existing linguistic resource and adapt it to be used in Apertium. For example, take a morphological analyser for Punjabi in Functional Morphology and convert it to lttoolbox, or take a Kurdish morphology in Alexina and convert it to HFST. | 8–10 | Francis Tyers, Jimregan
quality | 3. Easy | Quality evaluation | Perform a human post-editing evaluation of one of our non-evaluated pairs. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator, and then correcting the output. Then use apertium-eval-translator to calculate the Word Error Rate. The text should be at least 4,000 words long. | 4–8 | Francis Tyers
research | 3. Easy | Create manually tagged corpora | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the analyser and tagger, and replacing incorrect analyses with the correct ones. | | Jimregan
research | 3. Easy | Catalogue resources | Pick an under-resourced language of your choice (e.g. Chechen, Guaraní, Aromanian, Chuvash, Swazi, ...) and catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for it, along with the licences they are under. | | Francis Tyers, Jimregan
quality | 1. Hard | Improve a language pair | Find some faults in an existing language pair and fix them. In particular, look at minor→major pairs, e.g. Welsh-English, Basque-Spanish, Breton-French. | | Francis Tyers, Jimregan, Mikel L. Forcada
translation | 2. Medium | Translate the HOWTO | Translate the new language pair HOWTO into another language, and go through it for a new pair of languages. When finished, upload the result to the Incubator. | 2–3 | Francis Tyers, Jimregan
documentation | 2. Medium | Document undocumented features | Find a feature that is not covered in the existing documentation (e.g. cascaded interchunk transfer), and write about it. | | Mikel L. Forcada
outreach | 3. Easy | Encourage interest in Apertium among Wikipedians | Some smaller Wikipedias could really benefit from having someone translate articles from bigger Wikipedias. Where the languages are close, using Apertium could make this more efficient (e.g. Bulgarian WP has 107,355 articles, while Macedonian WP has 42,112, less than half as many). Candidates should check with the local Wikipedia community before proceeding. It would be good for smaller Wikipedias to know that Apertium exists, is free software, and can be useful. | 1–4 | Francis Tyers
training | 3. Easy | Simple step-by-step "become a developer" guide | Write a simple step-by-step guide (on the wiki) for pre-university students (of varying levels of computer literacy) to install a development version of Apertium and start doing development or polishing tasks like the ones above. | | Mikel L. Forcada
code | 2. Medium | NSIS script | Write an NSIS script to install the Cygwin version of Apertium on Windows. | | Jimregan
user interface | 1. Hard | Design a user-friendly interface for Apertium | Apertium does not currently have a friendly user interface for translators. Look at other translation software on the market, and sketch out some ideas for how to design a user interface. This will not require programming, but could, for example, involve using Glade to demonstrate the ideas. | | Jimregan
outreach | 2. Medium | Write a quick guide on 'What Apertium can and cannot do to help you with your homework' | Students around the world use Apertium (and other MT systems) to do their second-language homework. The guide would summarize the do's and don'ts, and could even elaborate on how students using Apertium for their homework could discover ways in which Apertium could be improved. | | Mikel L. Forcada
research | 2. Medium | Contrastive analysis for a language pair | Create a set of test sentences (see the various 'Pending tests' and 'Regression tests' pages on the Wiki) for a new or existing language pair. The tests should cover as many features of the languages in question as possible. Some of the examples might be found in a grammar; others might need to be invented. This will not involve programming, only grammatical analysis. | 4–6 | Francis Tyers
quality | 3. Easy | Thorough checkup of bn-en morphological analyser | While the current bn-en morphological analyser has pretty good coverage, it should be higher. Part of the reason is that many verbs have one or two surface forms that differ slightly from the regular ones, and the analyser misses them. Using lt-expand it is possible to generate all forms of the verbs, check these manually, and then rebuild the analyser file using another script (already in the pair). This checking will require a native speaker of or expert on the Bengali language. | | Abu Zaher
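For the quality evaluation task, it may help to see what a Word Error Rate actually measures. The following is a minimal sketch, not the implementation used by apertium-eval-translator: it assumes plain whitespace tokenisation, and the function name `word_error_rate` is made up for illustration. It counts the word-level insertions, deletions and substitutions needed to turn the raw machine translation into the post-edited text, and divides by the length of the post-edited text.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    hypothesis: raw machine translation output
    reference:  the same text after human post-editing
    (Whitespace tokenisation is an assumption made for this sketch.)
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the hypothesis words
    # processed so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = prev[j - 1] + (0 if hyp_word == ref_word else 1)
            deletion = prev[j] + 1       # drop a hypothesis word
            insertion = curr[j - 1] + 1  # add a missing reference word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    raw = "the cat sat on mat"
    post_edited = "the cat sat on the mat"
    print(f"WER: {word_error_rate(raw, post_edited):.2%}")
```

A lower value means the translator's output needed fewer corrections. For a real evaluation, use apertium-eval-translator itself so that results are comparable with other pairs.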
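The idea behind the two "Cross a language pair" tasks can be sketched as a join over a shared pivot language. The code below is only an illustration of that idea under simplifying assumptions: dictionaries are treated as plain lists of word pairs, the function name `cross_dictionaries` and the toy entries are hypothetical, and it does not reflect how apertium-crossdics works internally or the format of real Apertium dictionaries.

```python
from collections import defaultdict


def cross_dictionaries(a_to_pivot, pivot_to_c):
    """Join two bilingual word lists through their shared pivot language.

    a_to_pivot: iterable of (word_a, pivot_word) pairs
    pivot_to_c: iterable of (pivot_word, word_c) pairs
    Returns a sorted list of candidate (word_a, word_c) pairs.
    """
    # Index the second dictionary by its pivot-language side.
    pivot_index = defaultdict(set)
    for pivot_word, word_c in pivot_to_c:
        pivot_index[pivot_word].add(word_c)

    crossed = set()
    for word_a, pivot_word in a_to_pivot:
        for word_c in pivot_index.get(pivot_word, ()):
            crossed.add((word_a, word_c))
    return sorted(crossed)


if __name__ == "__main__":
    # Toy entries for illustration only.
    occitan_catalan = [("ostal", "casa"), ("gat", "gat")]
    catalan_french = [("casa", "maison"), ("gat", "chat")]
    for oc, fr in cross_dictionaries(occitan_catalan, catalan_french):
        print(oc, "->", fr)
```

Because a pivot word can be ambiguous, a join like this produces wrong candidates as well as right ones, which is why both tasks ask you to clean up the result.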