Task ideas for Google Code-in (2012)

This is the task ideas page for Google Code-in 2012 (http://www.google-melange.com/gci/homepage/google/gci2012). Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

For current GCI task ideas, see Task ideas for Google Code-in

The Mentors column lists the people you should contact to request further information. Each task is estimated to take an experienced developer at most 2 hours. However:

  1. This estimate does not include the time needed to install and set up Apertium.
  2. The estimate assumes an experienced developer; you may find that you spend more time on the task because of the learning curve.

Categories:

  • code: Tasks related to writing or refactoring code
  • documentation: Tasks related to creating/editing documents and helping others learn more
  • research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
  • quality: Tasks related to testing and ensuring code is of high quality
  • interface: Tasks related to user experience research or user interface design and interaction

Task list

Category | Title | Description | Mentors
code | Write lexical selection rules | Write 50 lexical selection rules for 10 words. For further information, consult the getting started guide. | Francis Tyers
code | Add entries to transfer lexicon | Add 100 entries to a transfer lexicon of your choice. This involves adding lexical transfer entries, each consisting of a translation and its corresponding grammatical features. | Francis Tyers - Hrvoje Peradin - Unhammer - Firespeaker
code | Write transfer rules | Write 5 transfer rules for a new language pair. You can find basic documentation in the New language pair HOWTO and more in-depth (but incomplete) documentation in the long introduction to transfer rules. | Jimregan - Hrvoje Peradin - Unhammer - Firespeaker
code | Dictionary conversion | Write a conversion module for an existing dictionary for apertium-dixtools. | Jimregan
code | Dictionary conversion in Python | Write a conversion module, in Python, that converts an existing free bilingual dictionary to lttoolbox format. | Firespeaker
code | Apertium support for Morfologik | Add support to Morfologik for reading Apertium-format dictionaries. | Jimregan
code | Write disambiguation rules for apertium-tur | Write 3 disambiguation rules for apertium-tur. For further information, contact your mentor. | Zfe
code | Write disambiguation rules for apertium-sh-sl | Write 5 disambiguation rules for Slovene. For further information, contact your mentor. | Zfe - Hrvoje Peradin
research | Write a contrastive grammar | Using a grammar, document 20 sample cases of grammatical differences between two languages. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer - Firespeaker
research | Disambiguating text | You will be given ambiguous sentences in a language, totalling 500 words. Your job is to pick the correct morphological reading in context. First read the page "morphologically disambiguating text" and then ask your mentor for more information. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
quality | Profile apertium-lex-tools | Take a large corpus and run it through apertium-lex-tools to find out how much time the program spends in each part of the code. You can use tools such as valgrind and gprof. | Francis Tyers
interface | Design a nice JavaScript drop-down box | Taking the Google Translate drop-down box as inspiration, design a similar drop-down for Apertium. Requires knowledge of JavaScript and possibly jQuery. | Francis Tyers - Hrvoje Peradin - Firespeaker
interface | Dictionary lookup | Integrate the JavaScript dictionary lookup tool into the translation interface (AWI) to offer alternative translations where available. | Jimregan
interface | Google n-grams visualisation | Design an interface to compare possible translations using Google N-Grams. | Jimregan
quality | System quality control | Read 500 words of machine translation output and report on translation errors. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer - Firespeaker
quality | Input correction | Write 10 rules for LanguageTool for common errors that affect translation. | Jimregan
quality | Post-correction rules | Write 10 rules for LanguageTool to fix Apertium-generated errors. | Jimregan
quality | New release check | Compare released language pairs with their SVN versions to see which language pairs need a new release. | Jimregan
quality | Testvoc | Help prepare a language pair for a new release by fixing 20 dictionary entries with generation errors. | Jimregan - Hrvoje Peradin - Unhammer
documentation | Check installation instructions | Check that the installation instructions are up to date and work. Report any problems. | Francis Tyers - Zfe - Unhammer
documentation | Check new language pair howto | Read through the new language pair howto, follow the steps, and check to see if it works. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
documentation | Check new language with lttoolbox howto | Read through the new language with lttoolbox howto, follow the steps, and check to see if it works. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
documentation | Check new language with HFST howto | Read through the new language with HFST howto, follow the steps, and check to see if it works. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
interface | Design the new webpage of apertium.org | Design the page using HTML and CSS. The page should validate with the W3C validator. | Francis Tyers - Zfe - Hrvoje Peradin
interface | Interface the new website with the webservice | Interface the new website with the webservice provided by Apertium. JavaScript knowledge required. | Francis Tyers - Zfe - Hrvoje Peradin
research | Categorise words | For 100 words, find the right inflection paradigm. You can learn more about inflectional paradigms on the page "Monodix basics". | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer
research | Catalogue resources | Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.), along with the licences they are under. See for example the page Aromanian, which was documented last year. | Francis Tyers - Zfe - Unhammer
research | Make a list of potential language pairs | Make a list of pairs of closely-related languages. For each pair, collect information about Wikipedia size, number of editors, and whether MT systems already exist. Contact your mentor for further information. | Francis Tyers - Zfe
research | Make a 50-sentence translation memory | Make a 50-sentence translation memory using text found on Wikipedia. This involves finding articles that are translations of each other and putting the equivalent sentences in an XML file. | Francis Tyers - Zfe - Unhammer
documentation | Document the MeCab tag set | See Japanese. MeCab is a morphological analysis and part-of-speech tagging module for Japanese. The tags are written in Japanese. We'd like to find translations for each of the tags, along with example word forms for each tag. This task requires knowledge of Japanese. | Francis Tyers - Kanmuri
documentation | Document the Turmorph tag set | Document, using sample sentences, the tag set used by turmorph. | Zfe
research | Investigate CIA replacement options | The CIA bot was an IRC bot which reported SVN commits to our IRC channel. Unfortunately the service has been offline for some time. This task is to investigate other options for commit reporting to IRC which are compatible with having our SVN in SourceForge. | Francis Tyers - Unhammer
code | Write a morphological transducer | Write a morphological analyser to analyse a paragraph of text. This involves reading through the morphological analyser howtos (e.g. HFST, lttoolbox), choosing a language you want to work with, and going through the process for that language. The analyser is not expected to be complete, but it should be able to analyse a paragraph of text of your choosing. | Firespeaker - Francis Tyers - Unhammer
research | Design some localised stickers | Design some Apertium stickers with the Apertium logo, localised to your country or region. | Francis Tyers - Firespeaker
code | Implement IBM model 1 alignment | Implement IBM model 1 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++. | Francis Tyers
code | Implement IBM model 2 alignment | Implement IBM model 2 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++. | Francis Tyers
code | Implement IBM model 3 alignment | Implement IBM model 3 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++. | Francis Tyers
code | Implement IBM model 4 alignment | Implement IBM model 4 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++. | Francis Tyers
code | Implement IBM model 5 alignment | Implement IBM model 5 alignment for a tagged parallel corpus in Apertium stream format. Use Python or C++. | Francis Tyers
research | Build a translation memory | You will be given some free parallel text in two languages (e.g. the Bible, Parliament proceedings, etc.) and asked to sentence-align it. The difference between this task and the Wikipedia one is that in the Wikipedia task you need to search for the documents, whereas in this one the documents are provided. | Francis Tyers - Zfe
code | Design a testvoc script for biltrans | Design a testvoc script which can deal with pairs which have ambiguous bilingual dictionaries. It should test each of the entries in turn. | Francis Tyers
code | Port paradigm chopper to Python 3/ElementTree | Take the paradigm-chopper.py from Speling tools and port it to use Python 3 and ElementTree instead of Python 2 and 4Suite. | Francis Tyers
code | Port speling tools to Python 3 | Port the speling tools (except paradigm chopper) to Python 3. | Francis Tyers
code | Improve paradigm review | Make paradigm review in speling tools sort paradigms by frequency of use. | Francis Tyers
code | Clean up and document yasmet | YASMET (www-i6.informatik.rwth-aachen.de/web/Software/YASMET.html) is a small toolkit for maximum entropy modelling, only around 130 lines of code. The task is to deobfuscate and document it. The .cc file can be found in the Apertium SVN. | Francis Tyers
code | Extract a category from Wiktionary | Screen scrape and extract inflectional information in speling format for a given category from Wiktionary. | Francis Tyers
code | Extract translations from Wiktionary | Screen scrape and extract translations for a given category from Wiktionary. | Francis Tyers
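
To give an idea of what the IBM model 1 alignment task in the list above involves, here is a minimal Python sketch of the standard EM training loop for IBM Model 1. It is an illustration only, not an Apertium tool: the function names (lexical_units, ibm_model1, align) are hypothetical, and the parser assumes that each lexical unit in the tagged stream looks like ^lemma<tag1><tag2>$ and contains no escaped characters.

```python
import re
from collections import defaultdict

# Assumption: tagged lexical units look like ^lemma<tag1><tag2>$ with no escapes.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def lexical_units(line):
    """Return the lexical units found in one line of tagged stream output."""
    return LEXICAL_UNIT.findall(line)


def ibm_model1(pairs, iterations=10):
    """Estimate translation probabilities t[(f, e)] = P(f | e) with EM.

    `pairs` is a list of (source_tokens, target_tokens) sentence pairs.
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        for src, tgt in pairs:
            src_null = ["NULL"] + src  # NULL lets a target word stay unaligned
            for f in tgt:
                norm = sum(t[(f, e)] for e in src_null)
                for e in src_null:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts per source word
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(src, tgt, t):
    """For each target token, pick the most probable source token (or NULL)."""
    src_null = ["NULL"] + src
    return [max(src_null, key=lambda e: t.get((f, e), 0.0)) for f in tgt]


if __name__ == "__main__":
    corpus = [
        (["the", "house"], ["das", "Haus"]),
        (["the", "book"], ["das", "Buch"]),
        (["a", "book"], ["ein", "Buch"]),
    ]
    table = ibm_model1(corpus)
    print(align(["the", "house"], ["das", "Haus"], table))
```

In a real submission the tokens would be the tagged lexical units returned by lexical_units rather than plain words, so that the alignment is learned between lemma and tag combinations. Models 2 to 5 add distortion and fertility parameters on top of this lexical translation table.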