Task ideas for Google Code-in (2012)

This is the task ideas page for Google Code-in 2012 (http://www.google-melange.com/gci/homepage/google/gci2012). Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

For current GCI task ideas, see Task ideas for Google Code-in.

The Mentors column lists the people you should contact to request further information. Each task is estimated to take an experienced developer at most 2 hours. However:

This estimate does not include the time needed to install and set up Apertium.
The estimate assumes an experienced developer; you may find that you spend more time on a task because of the learning curve.

Categories:

code: Tasks related to writing or refactoring code
documentation: Tasks related to creating/editing documents and helping others learn more
research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
quality: Tasks related to testing and ensuring code is of high quality
interface: Tasks related to user experience research or user interface design and interaction

Task list

Category	Title	Description	Mentors
code	Write lexical selection rules	Write 50 lexical selection rules for 10 words. For further information, you can consult the getting started guide.	Francis Tyers
code	Add entries to transfer lexicon	Add 100 entries to a transfer lexicon of your choice. This will involve adding lexical transfer entries, which consist of a translation, and its corresponding grammatical features.	Francis Tyers - Hrvoje Peradin - Unhammer - Firespeaker
code	Write transfer rules	Write 5 transfer rules for a new language pair. You can find basic documentation in the New language pair HOWTO and more in-depth (but incomplete) documentation in the long introduction to transfer rules.	Jimregan - Hrvoje Peradin - Unhammer - Firespeaker
code	Dictionary conversion	Write a conversion module for an existing dictionary for apertium-dixtools.	Jimregan
code	Dictionary conversion in python	Write a conversion module for an existing free bilingual dictionary to lttoolbox format using Python.	Firespeaker
code	Apertium support for Morfologik	Add support to morfologik for reading Apertium-format dictionaries.	Jimregan
code	Write disambiguation rules for apertium-tur	Write 3 disambiguation rules for apertium-tur. For further information, contact your mentor.	Zfe
code	Write disambiguation rules for apertium-sh-sl	Write 5 disambiguation rules for Slovene. For further information, contact your mentor.	Zfe - Hrvoje Peradin
research	Write a contrastive grammar	Using a grammar, document 20 sample cases of grammatical differences between two languages.	Francis Tyers - Zfe - Hrvoje Peradin - Unhammer - Firespeaker
research	Disambiguating text	You will be given ambiguous sentences in a language, totalling 500 words. Your job is to pick the correct morphological reading in context. First read the page "morphologically disambiguating text" and then ask your mentor for more information.	Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
quality	Profile apertium-lex-tools	Take a large corpus, and run it through apertium-lex-tools to find out how long the program spends in each part of the code. You can use tools such as `valgrind` and `gprof`.	Francis Tyers
interface	Design a nice JavaScript drop-down box	Taking as inspiration the Google Translate drop-down box, design a similar drop-down for Apertium. Will require knowledge of JavaScript and possibly jQuery.	Francis Tyers - Hrvoje Peradin - Firespeaker
interface	Dictionary lookup	Integrate the JavaScript dictionary lookup tool into the translation interface (AWI), to offer alternative translations where available.	Jimregan
interface	Google n-grams visualisation	Design an interface to compare possible translations using Google N-Grams	Jimregan
quality	System quality control	Read 500 words of machine translation output and report on translation errors	Francis Tyers - Zfe - Hrvoje Peradin - Unhammer - Firespeaker
quality	Input correction	Write 10 rules for LanguageTool for common errors that affect translation	Jimregan
quality	Post-correction rules	Write 10 rules for LanguageTool to fix Apertium-generated errors	Jimregan
quality	New release check	Compare released language pairs with their SVN version, to see which language pairs need a new release	Jimregan
quality	Testvoc	Help prepare a language pair for a new release by fixing 20 dictionary entries with generation errors	Jimregan - Hrvoje Peradin - Unhammer
documentation	Check installation instructions	Check that the installation instructions are up to date and work. Report any problems.	Francis Tyers - Zfe - Unhammer
documentation	Check new language pair howto	Read through the new language pair howto, follow the steps, and check to see if it works.	Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
documentation	Check new language with lttoolbox howto	Read through the new language with lttoolbox howto, follow the steps, and check to see if it works.	Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
documentation	Check new language with HFST howto	Read through the new language with HFST howto, follow the steps, and check to see if it works.	Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
interface	Design the new webpage of apertium.org	Design, using HTML+CSS. The page should validate with the W3C validator.	Francis Tyers - Zfe - Hrvoje Peradin
interface	Interface the new website with the webservice	Interface the new website with the webservice provided by Apertium. JavaScript knowledge required.	Francis Tyers - Zfe - Hrvoje Peradin
research	Categorise words	For 100 words find the right inflection paradigm. You can learn more about inflectional paradigms on the page "Monodix basics".	Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
research	Catalogue resources	Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.), along with the licences they are under. See for example the page Aromanian which was documented last year.	Francis Tyers - Zfe - Unhammer
research	Make a list of potential language pairs.	Make a list of pairs of closely-related languages. For each pair of languages, collect information about Wikipedia size, number of editors, if there are existing MT systems or not. Contact your mentor for further information.	Francis Tyers - Zfe
research	Make a 50-sentence translation memory	Make a 50-sentence translation memory using text found on Wikipedia. This will involve finding articles which are translations of each other, and putting the equivalent sentences in an XML file.	Francis Tyers - Zfe - Unhammer
documentation	Document the mecab tag set	see Japanese. Mecab is a morphological analysis and part-of-speech tagging module for Japanese. The tags are written in Japanese. We'd like to find translations for each of the tags, along with example word forms for each tag. This task requires knowledge of Japanese.	Francis Tyers - Kanmuri
documentation	Document the Turmorph tag set	Document, with the use of sample sentences, the tag set used by turmorph.	Zfe
research	Investigate CIA replacement options	The CIA bot was an IRC bot which reported SVN commits to our IRC channel. Unfortunately the service has been offline for some time. This task is to investigate other options for commit reporting to IRC which are compatible with having our SVN in SourceForge.	Francis Tyers - Unhammer
code	Write a morphological transducer	Write a morphological analyser to analyse a paragraph of text. This will involve reading through the morphological analyser Howtos (e.g. HFST, lttoolbox), choosing a language you want to work with, and going through the process for that language. It is not expected to be complete, but should be able to analyse a paragraph of text of your choosing.	Firespeaker - Francis Tyers - Unhammer
research	Design some localised stickers	Design some Apertium stickers with the Apertium logo, localised to your country or region.	Francis Tyers - Firespeaker
code	Implement IBM model 1 alignment	Implement IBM model 1 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++.	Francis Tyers
code	Implement IBM model 2 alignment	Implement IBM model 2 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++.	Francis Tyers
code	Implement IBM model 3 alignment	Implement IBM model 3 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++.	Francis Tyers
code	Implement IBM model 4 alignment	Implement IBM model 4 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++.	Francis Tyers
code	Implement IBM model 5 alignment	Implement IBM model 5 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++.	Francis Tyers
research	Build a translation memory	You will be given some free parallel text in two languages (e.g. the Bible, Parliament proceedings, etc.) and asked to sentence-align it. The difference between this task and the Wikipedia one is that in the Wikipedia one you need to search for the documents, whereas in this one the documents will be provided.	Francis Tyers - Zfe
code	Design a testvoc script for biltrans	Design a testvoc script which can deal with pairs which have ambiguous bilingual dictionaries. It should test each of the entries in turn.	Francis Tyers
code	Port paradigm chopper to Python 3/ElementTree	Take the `paradigm-chopper.py` from Speling tools and port it to use Python 3 and ElementTree instead of Python 2 and 4suite.	Francis Tyers
code	Port speling tools to Python 3	Port the speling tools (except `paradigm chopper`) to Python 3.	Francis Tyers
code	Improve paradigm review	Make paradigm review in speling tools sort paradigms by frequency of use.	Francis Tyers
code	Clean up and document yasmet	YASMET (www-i6.informatik.rwth-aachen.de/web/Software/YASMET.html) is a small toolkit for maximum entropy modelling, only around 130 lines of code. The task is to deobfuscate and document it. The `.cc` file can be found in the Apertium SVN.	Francis Tyers
code	Extract a category from Wiktionary	Screen scrape and extract inflectional information in speling format for a given category from Wiktionary.	Francis Tyers
code	Extract translations from Wiktionary	Screen scrape and extract translations for a given category from Wiktionary.	Francis Tyers
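
For the IBM model 1 alignment task above, the following is a minimal sketch of the usual EM training loop, not a reference solution. It assumes the parallel corpus has already been read into pairs of token lists; in the real task each token would be a lexical unit from Apertium stream format, and reading that format is left out here. The names `train_ibm_model1`, `align` and `TOY_CORPUS` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: (source tokens, target tokens) per sentence pair.
TOY_CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target_word | source_word) with EM (IBM Model 1)."""
    # Prepend a NULL token so a target word may align to no source word.
    pairs = [(["NULL"] + list(src), list(tgt)) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform start for every word pair

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        # E-step: share each target word among the source words of its sentence.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)


def align(src, tgt, t):
    """For each target word, return the index of its best source word (-1 = NULL)."""
    src = ["NULL"] + list(src)
    links = []
    for f in tgt:
        best = max(range(len(src)), key=lambda i: t.get((f, src[i]), 0.0))
        links.append(best - 1)
    return links


if __name__ == "__main__":
    table = train_ibm_model1(TOY_CORPUS)
    print(align(["das", "Haus"], ["the", "house"], table))
```

Model 1 ignores word order, so the table `t` is the only parameter being learned; the higher IBM models in the list add position and fertility parameters on top of it.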
