Task ideas for Google Code-in

This is the task ideas page for Google Code-in (http://www.google-melange.com/gci/homepage/google/gci2012). Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

The Mentors column lists the people you should contact to request further information. Each task is estimated to take an experienced developer at most 2 hours. However:

  1. This estimate does not include the time needed to install and set up Apertium.
  2. The estimate assumes an experienced developer, so you may spend longer on a task because of the learning curve.

Categories:

  • code: Tasks related to writing or refactoring code
  • documentation: Tasks related to creating/editing documents and helping others learn more
  • research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
  • quality: Tasks related to testing and ensuring code is of high quality
  • interface: Tasks related to user experience research or user interface design and interaction

Task list

Category | Title | Description | Mentors
code | Write lexical selection rules | Write 50 lexical selection rules for 10 words. For further information, you can consult the getting started guide. | Francis Tyers
code | Add entries to transfer lexicon | Add 100 entries to a transfer lexicon of your choice. This will involve adding lexical transfer entries, which consist of a translation and its corresponding grammatical features. | Francis Tyers, Hrvoje Peradin, Unhammer, Firespeaker
code | Write transfer rules | Write 5 transfer rules for a new language pair. You can find basic documentation in the New language pair HOWTO and more in-depth (but incomplete) documentation in the long introduction to transfer rules. | Jimregan, Hrvoje Peradin, Unhammer, Firespeaker
code | Dictionary conversion | Write a conversion module for an existing dictionary for apertium-dixtools. | Jimregan
code | Dictionary conversion in Python | Write a conversion module for an existing free bilingual dictionary using Python. | Firespeaker
code | Apertium support for Morfologik | Add support to Morfologik for reading Apertium-format dictionaries. | Jimregan
code | Write disambiguation rules for apertium-tur | Write 3 disambiguation rules for apertium-tur. For further information, contact your mentor. | Zfe
code | Write disambiguation rules for apertium-sh-sl | Write 5 disambiguation rules for Slovene. For further information, contact your mentor. | Zfe, Hrvoje Peradin
research | Write a contrastive grammar | Using a grammar, document 20 sample cases of grammatical differences between two languages. | Francis Tyers, Zfe, Hrvoje Peradin, Unhammer, Firespeaker
research | Disambiguating text | You will be given ambiguous sentences in a language, totalling 500 words. Your job is to pick the correct morphological reading in context. First read the page "morphologically disambiguating text" and then ask your mentor for more information. | Francis Tyers, Zfe, Hrvoje Peradin, Unhammer
quality | Profile apertium-lex-tools | Take a large corpus and run it through apertium-lex-tools to find out how long the program spends in each part of the code. You can use tools such as valgrind and gprof. | Francis Tyers
interface | Design a nice JavaScript drop-down box | Taking the Google Translate drop-down box as inspiration, design a similar drop-down for Apertium. Requires knowledge of JavaScript and possibly jQuery. | Francis Tyers, Hrvoje Peradin, Firespeaker
interface | Dictionary lookup | Integrate the JavaScript dictionary lookup tool into the translation interface (AWI) to offer alternative translations where available. | Jimregan
interface | Google n-grams visualisation | Design an interface to compare possible translations using Google N-Grams. | Jimregan
quality | System quality control | Read 500 words of machine translation output and report on translation errors. | Francis Tyers, Zfe, Hrvoje Peradin, Unhammer, Firespeaker
quality | Input correction | Write 10 rules for LanguageTool for common errors that affect translation. | Jimregan
quality | Post-correction rules | Write 10 rules for LanguageTool to fix Apertium-generated errors. | Jimregan
quality | New release check | Compare released language pairs with their SVN versions to see which language pairs need a new release. | Jimregan
quality | Testvoc | Help prepare a language pair for a new release by fixing 20 dictionary entries with generation errors. | Jimregan, Hrvoje Peradin, Unhammer
documentation | Check installation instructions | Check that the installation instructions are up to date and work. Report any problems. | Francis Tyers, Zfe, Unhammer
documentation | Check new language pair howto | Read through the new language pair howto, follow the steps, and check to see if it works. | Francis Tyers, Zfe, Hrvoje Peradin, Unhammer
documentation | Check new language with lttoolbox howto | Read through the new language with lttoolbox howto, follow the steps, and check to see if it works. | Francis Tyers, Zfe, Hrvoje Peradin, Unhammer
documentation | Check new language with HFST howto | Read through the new language with HFST howto, follow the steps, and check to see if it works. | Francis Tyers, Zfe, Hrvoje Peradin, Unhammer
interface | Design the new webpage of apertium.org | Design the page using HTML+CSS. The page should validate with the W3C validator. | Francis Tyers, Zfe, Hrvoje Peradin
interface | Interface the new website with the webservice | Interface the new website with the webservice provided by Apertium. JavaScript knowledge required. | Francis Tyers, Zfe, Hrvoje Peradin
research | Categorise words | For 100 words, find the right inflection paradigm. You can learn more about inflectional paradigms on the page "Monodix basics". | Francis Tyers, Zfe, Hrvoje Peradin, Unhammer
research | Catalogue resources | Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.), along with the licences they are under. See for example the page Aromanian, which was documented last year. | Francis Tyers, Zfe, Unhammer
research | Make a list of potential language pairs | Make a list of pairs of closely-related languages. For each pair, collect information about Wikipedia size, number of editors, and whether MT systems already exist. Contact your mentor for further information. | Francis Tyers, Zfe
research | Make a 50-sentence translation memory | Make a 50-sentence translation memory using text found on Wikipedia. This will involve finding articles which are translations of each other, and putting the equivalent sentences in an XML file. | Francis Tyers, Zfe, Unhammer
documentation | Document the MeCab tag set | See Japanese. MeCab is a morphological analysis and part-of-speech tagging module for Japanese. The tags are written in Japanese. We'd like to find translations for each of the tags, along with example word forms for each tag. This task requires knowledge of Japanese. | Francis Tyers, Kanmuri
documentation | Document the Turmorph tag set | Document, using sample sentences, the tag set used by turmorph. | Zfe
research | Investigate CIA replacement options | The CIA bot was an IRC bot which reported SVN commits to our IRC channel. Unfortunately the service has been offline for some time. This task is to investigate other options for commit reporting to IRC which are compatible with having our SVN in SourceForge. | Francis Tyers, Unhammer
code | Write a morphological transducer | Write a morphological analyser to analyse a paragraph of text. This will involve reading through the morphological analyser howtos (e.g. HFST, lttoolbox), choosing a language you want to work with, and going through the process for that language. It is not expected to be complete, but it should be able to analyse a paragraph of text of your choosing. | Firespeaker, Francis Tyers, Unhammer
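
The translation memory task above asks for equivalent sentences to be put in an XML file, but does not say which format to use. As a rough sketch, assuming the mentors accept TMX (a common XML format for translation memories; this is an assumption, so check with your mentor first), the following Python shows one way to turn a list of aligned sentence pairs into such a file. The function name build_tmx, the creationtool value and the example sentences are all hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Clark notation for the xml:lang attribute; ElementTree writes it as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, trg_lang):
    """Build a TMX tree from (source, target) sentence pairs.

    Assumption: TMX 1.4 layout (header + body of <tu> translation units).
    """
    tmx = ET.Element("tmx", {"version": "1.4"})
    ET.SubElement(tmx, "header", {
        "creationtool": "gci-tm-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "datatype": "plaintext",
        "segtype": "sentence",
        "adminlang": "en",
        "srclang": src_lang,
        "o-tmf": "none",
    })
    body = ET.SubElement(tmx, "body")
    for src, trg in pairs:
        # One translation unit per aligned sentence pair.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (trg_lang, trg)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


if __name__ == "__main__":
    # Placeholder sentences; a real submission would use 50 pairs
    # taken from Wikipedia articles that are translations of each other.
    example_pairs = [
        ("Hello.", "Hola."),
        ("Thank you.", "Gracias."),
    ]
    tree = build_tmx(example_pairs, "en", "es")
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

To save the result instead of printing it, tree.write("tm.tmx", encoding="utf-8", xml_declaration=True) writes the same tree to a file. Building the XML with a library rather than by string concatenation means characters such as & and < in the sentences are escaped automatically.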