Difference between revisions of "Task ideas for Google Code-in"

From Apertium
Line 20: Line 20:
 
! Category !! Title !! Description !! Mentors
 
 
|-
 
  +
| {{sc|code}} || -|| - || -
| {{sc|code}} || Write lexical selection rules || Write 50 [[constraint-based lexical selection module|lexical selection rules]] for 10 words. For further information, you can consult the [[How to get started with lexical selection rules|getting started guide]]. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Add entries to transfer lexicon || Add 100 entries to a transfer lexicon of your choice. This will involve adding lexical transfer entries, which consist of a translation, and its corresponding grammatical features. || [[User:Francis Tyers|Francis Tyers]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]] - [[User:Firespeaker|Firespeaker]]
 
|-
 
| {{sc|code}} || Write transfer rules || Write 5 transfer rules for a new language pair. You can find basic documentation in the ''[[New language pair HOWTO]]'' and more in-depth (but incomplete) documentation in the [[A_long_introduction_to_transfer_rules|long introduction to transfer rules]]. || [[User:Jimregan|Jimregan]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]] - [[User:Firespeaker|Firespeaker]]
 
|-
 
| {{sc|code}} || Dictionary conversion || Write a conversion module for an existing dictionary for apertium-dixtools. || [[User:Jimregan|Jimregan]]
 
|-
 
| {{sc|code}} || Dictionary conversion in Python || Write a conversion module for an existing free bilingual dictionary to [[lttoolbox]] format using Python. || [[User:Firespeaker|Firespeaker]]
 
|-
 
| {{sc|code}} || Apertium support for Morfologik || Add support to [[morfologik]] for reading Apertium-format dictionaries. || [[User:Jimregan|Jimregan]]
 
|-
 
| {{sc|code}} || Write disambiguation rules for apertium-tur || Write 3 disambiguation rules for [[apertium-tur]]. For further information, contact your mentor. || [[User:Zfe|zfe]]
 
|-
 
| {{sc|code}} || Write disambiguation rules for apertium-sh-sl || Write 5 disambiguation rules for Slovene. For further information, contact your mentor. || [[User:Zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]]
 
|-
 
| {{sc|research}} || Write a contrastive grammar || Using a grammar, document 20 sample cases of grammatical differences between two languages. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]] - [[User:Firespeaker|Firespeaker]]
 
|-
 
| {{sc|research}} || Disambiguating text || You will be given ambiguous sentences in a language, totalling 500 words. Your job is to pick the correct morphological reading in context. First read the page "[[morphologically disambiguating text]]" and then ask your mentor for more information. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|quality}} || Profile apertium-lex-tools || Take a large corpus and run it through apertium-lex-tools to find out how much time the program spends in each part of the code. You can use tools such as <code>valgrind</code> and <code>gprof</code>. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|interface}} || Design a nice JavaScript drop-down box || Taking the Google Translate drop-down box as inspiration, design a similar drop-down for Apertium. This will require knowledge of JavaScript and possibly jQuery. || [[User:Francis Tyers|Francis Tyers]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Firespeaker|Firespeaker]]
 
|-
 
| {{sc|interface}} || Dictionary lookup || Integrate the JavaScript dictionary lookup tool into the translation interface (AWI), to offer alternative translations where available. || [[User:Jimregan|Jimregan]]
 
|-
 
| {{sc|interface}} || Google n-grams visualisation || Design an interface to compare possible translations using Google N-Grams || [[User:Jimregan|Jimregan]]
 
|-
 
| {{sc|quality}} || System quality control || Read 500 words of machine translation output and report on translation errors || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]] - [[User:Firespeaker|Firespeaker]]
 
|-
 
| {{sc|quality}} || Input correction || Write 10 rules for [[LanguageTool]] for common errors that affect translation || [[User:Jimregan|Jimregan]]
 
|-
 
| {{sc|quality}} || Post-correction rules || Write 10 rules for [[LanguageTool]] to fix Apertium-generated errors || [[User:Jimregan|Jimregan]]
 
|-
 
| {{sc|quality}} || New release check || Compare released language pairs with their SVN version, to see which language pairs need a new release || [[User:Jimregan|Jimregan]]
 
|-
 
| {{sc|quality}} || Testvoc || Help prepare a language pair for a new release by fixing 20 dictionary entries with generation errors || [[User:Jimregan|Jimregan]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|documentation}} || Check installation instructions || Check that the installation instructions are up to date and work. Report any problems. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|documentation}} || Check new language pair howto || Read through the [[new language pair howto]], follow the steps, and check to see if it works. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|documentation}} || Check new language with lttoolbox howto || Read through the [[Starting a new language with lttoolbox|new language with lttoolbox howto]], follow the steps, and check to see if it works. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|documentation}} || Check new language with HFST howto || Read through the [[Starting a new language with HFST|new language with HFST howto]], follow the steps, and check to see if it works. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|interface}} || Design the new webpage of apertium.org || Design, using HTML+CSS. The page should validate with the W3C validator. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]]
 
|-
 
| {{sc|interface}} || Interface the new website with the webservice || Interface the new website with the webservice provided by Apertium. JavaScript knowledge required. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]]
 
|-
 
| {{sc|research}} || Categorise words || For 100 words find the right inflection paradigm. You can learn more about inflectional paradigms on the page "[[Monodix basics]]". || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Krvoje|Hrvoje Peradin]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|research}} || Catalogue resources || Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.), along with the licences they are under. See for example the page [[Aromanian]] which was documented last year. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|research}} || Make a list of potential language pairs || Make a list of pairs of closely-related languages. For each pair of languages, collect information about Wikipedia size, number of editors, and whether MT systems already exist. Contact your mentor for further information. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]]
 
|-
 
| {{sc|research}} || Make a 50-sentence translation memory || Make a 50-sentence [[translation memory]] using text found on Wikipedia. This will involve finding articles which are translations of each other, and putting the equivalent sentences in an XML file. || [[User:Francis Tyers|Francis Tyers]] - [[User:zfe|Zfe]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|documentation}} || Document the mecab tag set || See [[Japanese]]. Mecab is a morphological analysis and part-of-speech tagging module for Japanese. The tags are written in Japanese. We'd like to find translations for each of the tags, along with example word forms for each tag. This task requires knowledge of Japanese. || [[User:Francis Tyers|Francis Tyers]] - [[User:Kanmuri|Kanmuri]]
 
|-
 
| {{sc|documentation}} || Document the [[Turmorph]] tag set || Document, using sample sentences, the tag set used by Turmorph. || [[User:zfe|Zfe]]
 
|-
 
| {{sc|research}} || Investigate CIA replacement options || The CIA bot was an [[IRC]] bot which reported [[SVN]] commits to our IRC channel. Unfortunately the service has been offline for some time. This task is to investigate other options for commit reporting to IRC which are compatible with having our SVN in SourceForge. || [[User:Francis Tyers|Francis Tyers]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|code}} || Write a morphological transducer || Write a morphological analyser to analyse a paragraph of text. This will involve reading through the morphological analyser Howtos (e.g. [[Starting a new language with HFST|HFST]], [[Starting a new language with lttoolbox|lttoolbox]]), choosing a language you want to work with, and going through the process for that language. It is not expected to be complete, but should be able to analyse a paragraph of text of your choosing. || [[User:Firespeaker|Firespeaker]] - [[User:Francis Tyers|Francis Tyers]] - [[User:Unhammer|Unhammer]]
 
|-
 
| {{sc|research}} || Design some localised stickers || Design some Apertium stickers with the Apertium logo, localised to your country or region. || [[User:Francis Tyers|Francis Tyers]] - [[User:Firespeaker|Firespeaker]]
 
|-
 
| {{sc|code}} || Implement IBM model 1 alignment || Implement IBM model 1 alignment for a tagged parallel corpus in [[Apertium stream format]]. Use Python or C++. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Implement IBM model 2 alignment || Implement IBM model 2 alignment for a tagged parallel corpus in [[Apertium stream format]]. Use Python or C++. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Implement IBM model 3 alignment || Implement IBM model 3 alignment for a tagged parallel corpus in [[Apertium stream format]]. Use Python or C++. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Implement IBM model 4 alignment || Implement IBM model 4 alignment for a tagged parallel corpus in [[Apertium stream format]]. Use Python or C++. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Implement IBM model 5 alignment || Implement IBM model 5 alignment for a tagged parallel corpus in [[Apertium stream format]]. Use Python or C++. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|research}} || Build a translation memory || You will be given some free parallel text in two languages (e.g. the Bible, Parliament proceedings, etc.) and asked to sentence-align it. This task differs from the Wikipedia one in that there you need to search for the documents yourself, whereas here the documents will be provided. || [[User:Francis Tyers|Francis Tyers]] - [[User:Zfe|Zfe]]
 
|-
 
| {{sc|code}} || Design a testvoc script for biltrans || Design a [[testvoc]] script which can deal with pairs which have ambiguous bilingual dictionaries. It should test each of the entries in turn. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Port paradigm chopper to Python 3/ElementTree || Take the <code>paradigm-chopper.py</code> from [[Speling tools]] and port it to use Python 3 and ElementTree instead of Python 2 and 4Suite. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Port speling tools to Python 3 || Port the [[speling tools]] (except <code>paradigm chopper</code>) to Python 3. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Improve paradigm review || Make paradigm review in [[speling tools]] sort paradigms by frequency of use. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Clean up and document yasmet || [http://www-i6.informatik.rwth-aachen.de/web/Software/YASMET.html YASMET] is a small toolkit for maximum entropy modelling -- only around 130 lines of code. The task is to deobfuscate and document it. The <code>.cc</code> file can be found in apertium [[SVN]] [http://www-i6.informatik.rwth-aachen.de/web/Software/YASMET.html here]. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Extract a category from Wiktionary || Screen scrape and extract inflectional information in [[speling format]] for a given category from Wiktionary. || [[User:Francis Tyers|Francis Tyers]]
 
|-
 
| {{sc|code}} || Extract translations from Wiktionary || Screen scrape and extract translations for a given category from Wiktionary. || [[User:Francis Tyers|Francis Tyers]]
 
 
|-
 
 
|}
 
|}
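
For the "Implement IBM model 1 alignment" task above, the core expectation-maximisation loop might look roughly like the following. This is only a minimal sketch under some assumptions: the corpus is taken to be already tokenised into lists of strings (reading [[Apertium stream format]] is not shown), the NULL word is left out for brevity, and the function name <code>train_ibm1</code> is hypothetical rather than part of any Apertium tool.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate IBM Model 1 lexical probabilities with EM.

    pairs: list of (f_tokens, e_tokens) sentence pairs (hypothetical input
    format; parsing Apertium stream format is left out of this sketch).
    Returns a dict t where t[(f, e)] approximates P(f | e).
    Simplification: no NULL word is added to the e side.
    """
    f_vocab = {f for f_sent, _ in pairs for f in f_sent}
    # Start from a uniform distribution over the f vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned to anything

        # E-step: spread each f word over the e words of its sentence.
        for f_sent, e_sent in pairs:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share

        # M-step: renormalise the expected counts per e word.
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})

    return t


if __name__ == "__main__":
    toy = [
        ("das Haus".split(), "the house".split()),
        ("das Buch".split(), "the book".split()),
        ("ein Buch".split(), "a book".split()),
    ]
    table = train_ibm1(toy)
    for (f, e), p in sorted(table.items()):
        print(f, e, round(p, 3))
```

On a toy corpus like the one in the <code>__main__</code> block, words that co-occur consistently should end up with the highest probabilities; a real submission would additionally need the NULL word and a reader for the tagged corpus.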

Revision as of 23:00, 9 October 2013

This is the task ideas page for Google Code-in (http://www.google-melange.com/gci/homepage/google/gci2012). Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

The Mentors column lists the people you should contact to request further information. Each task is estimated to take an experienced developer at most 2 hours; however:

  1. This estimate does not include the time taken to install and set up Apertium.
  2. The estimate assumes an experienced developer, so you may find that you spend more time on the task because of the learning curve.

Categories:

  • code: Tasks related to writing or refactoring code
  • documentation: Tasks related to creating/editing documents and helping others learn more
  • research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
  • quality: Tasks related to testing and ensuring code is of high quality
  • interface: Tasks related to user experience research or user interface design and interaction

Task list

Category Title Description Mentors
code - - -
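
The translation memory tasks in the list ask for aligned sentences to be put in an XML file. One possible way to do that from Python is sketched below. It assumes the TMX 1.4 layout (the task text only says "an XML file", so TMX is an assumption here), and the function name <code>build_tmx</code> and the header values are purely illustrative.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name out as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, trg_lang):
    """Build a TMX-style tree from (source, target) sentence pairs.

    Assumes TMX 1.4; the header values are placeholders, not project policy.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, trg in pairs:
        # One translation unit per aligned sentence pair.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (trg_lang, trg)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


if __name__ == "__main__":
    tree = build_tmx([("Hello.", "Hola.")], "en", "es")
    tree.write("memory.tmx", encoding="utf-8", xml_declaration=True)
```

Building the file with an XML library rather than by string concatenation means characters such as <code>&amp;</code> and <code>&lt;</code> in the sentences are escaped automatically.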