Task ideas for Google Code-in (2012)
This is the task ideas page for Google Code-in 2012 (http://www.google-melange.com/gci/homepage/google/gci2012). Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.
For current GCI task ideas, see Task ideas for Google Code-in
The Mentors column lists the people you should contact to request further information. Each task is estimated to take an experienced developer at most 2 hours. However:
- this does not include the time taken to install and set up Apertium.
- the estimate assumes an experienced developer; you may find that you spend more time on the task because of the learning curve.
Categories:
- code: Tasks related to writing or refactoring code
- documentation: Tasks related to creating/editing documents and helping others learn more
- research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
- quality: Tasks related to testing and ensuring code is of high quality.
- interface: Tasks related to user experience research or user interface design and interaction
Task list
Category | Title | Description | Mentors |
---|---|---|---|
code | Write lexical selection rules | Write 50 lexical selection rules for 10 words. For further information, you can consult the getting started guide. | Francis Tyers |
code | Add entries to transfer lexicon | Add 100 entries to a transfer lexicon of your choice. This will involve adding lexical transfer entries, which consist of a translation, and its corresponding grammatical features. | Francis Tyers - Hrvoje Peradin - Unhammer - Firespeaker |
code | Write transfer rules | Write 5 transfer rules for a new language pair. You can find basic documentation in the New language pair HOWTO and more in-depth (but incomplete) documentation in the long introduction to transfer rules. | Jimregan - Hrvoje Peradin - Unhammer - Firespeaker |
code | Dictionary conversion | Write a conversion module for an existing dictionary for apertium-dixtools. | Jimregan |
code | Dictionary conversion in python | Write a conversion module for an existing free bilingual dictionary to lttoolbox format using Python. | Firespeaker |
code | Apertium support for Morfologik | Add support to morfologik for reading Apertium-format dictionaries. | Jimregan |
code | Write disambiguation rules for apertium-tur | Write 3 disambiguation rules for apertium-tur. For further information, contact your mentor. | Zfe |
code | Write disambiguation rules for apertium-sh-sl | Write 5 disambiguation rules for Slovene. For further information, contact your mentor. | Zfe - Hrvoje Peradin |
research | Write a contrastive grammar | Using a grammar, document 20 sample cases of grammatical differences between two languages. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer - Firespeaker |
research | Disambiguating text | You will be given ambiguous sentences in a language, totalling 500 words. Your job is to pick the correct morphological reading in context. First read the page "morphologically disambiguating text" and then ask your mentor for more information. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer |
quality | Profile apertium-lex-tools | Take a large corpus, and run it through apertium-lex-tools to find out how long the program spends in each part of the code. You can use tools such as valgrind and gprof. | Francis Tyers |
interface | Design a nice JavaScript drop-down box | Taking the Google Translate drop-down box as inspiration, design a similar drop-down for Apertium. This will require knowledge of JavaScript and possibly jQuery. | Francis Tyers - Hrvoje Peradin - Firespeaker |
interface | Dictionary lookup | Integrate the JavaScript dictionary lookup tool into the translation interface (AWI), to offer alternative translations where available. | Jimregan |
interface | Google n-grams visualisation | Design an interface to compare possible translations using Google N-Grams | Jimregan |
quality | System quality control | Read 500 words of machine translation output and report on translation errors | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer - Firespeaker |
quality | Input correction | Write 10 rules for LanguageTool for common errors that affect translation | Jimregan |
quality | Post-correction rules | Write 10 rules for LanguageTool to fix Apertium-generated errors | Jimregan |
quality | New release check | Compare released language pairs with their SVN version, to see which language pairs need a new release | Jimregan |
quality | Testvoc | Help prepare a language pair for a new release by fixing 20 dictionary entries with generation errors | Jimregan - Hrvoje Peradin - Unhammer |
documentation | Check installation instructions | Check that the installation instructions are up to date and work. Report any problems. | Francis Tyers - Zfe - Unhammer |
documentation | Check new language pair howto | Read through the new language pair howto, follow the steps, and check to see if it works. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer |
documentation | Check new language with lttoolbox howto | Read through the new language with lttoolbox howto, follow the steps, and check to see if it works. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer |
documentation | Check new language with HFST howto | Read through the new language with HFST howto, follow the steps, and check to see if it works. | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer |
interface | Design the new webpage of apertium.org | Design, using HTML+CSS. The page should validate with the W3C validator. | Francis Tyers - Zfe - Hrvoje Peradin |
interface | Interface the new website with the webservice | Interface the new website with the webservice provided by Apertium. JavaScript knowledge required. | Francis Tyers - Zfe - Hrvoje Peradin |
research | Categorise words | For 100 words find the right inflection paradigm. You can learn more about inflectional paradigms on the page "Monodix basics". | Francis Tyers - Zfe - Hrvoje Peradin - Unhammer |
research | Catalogue resources | Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.), along with the licences they are under. See for example the page Aromanian which was documented last year. | Francis Tyers - Zfe - Unhammer |
research | Make a list of potential language pairs. | Make a list of pairs of closely-related languages. For each pair of languages, collect information about Wikipedia size, number of editors, if there are existing MT systems or not. Contact your mentor for further information. | Francis Tyers - Zfe |
research | Make a 50-sentence translation memory | Make a 50-sentence translation memory using text found on Wikipedia. This will involve finding articles which are translations of each other, and putting the equivalent sentences in an XML file. | Francis Tyers - Zfe - Unhammer |
documentation | Document the mecab tag set | See Japanese. Mecab is a morphological analysis and part-of-speech tagging module for Japanese. The tags are written in Japanese. We'd like to find translations for each of the tags, along with example word forms for each tag. This task requires knowledge of Japanese. | Francis Tyers - Kanmuri |
documentation | Document the Turmorph tag set | Document, with the use of sample sentences, the tag set used by turmorph. | Zfe |
research | Investigate CIA replacement options | The CIA bot was an IRC bot which reported SVN commits to our IRC channel. Unfortunately the service has been offline for some time. This task is to investigate other options for commit reporting to IRC which are compatible with having our SVN in SourceForge. | Francis Tyers - Unhammer |
code | Write a morphological transducer | Write a morphological analyser to analyse a paragraph of text. This will involve reading through the morphological analyser Howtos (e.g. HFST, lttoolbox), choosing a language you want to work with, and going through the process for that language. It is not expected to be complete, but should be able to analyse a paragraph of text of your choosing. | Firespeaker - Francis Tyers - Unhammer |
research | Design some localised stickers | Design some Apertium stickers with the Apertium logo, localised to your country or region. | Francis Tyers - Firespeaker |
code | Implement IBM model 1 alignment | Implement IBM model 1 alignment for a tagged parallel corpus in Apertium stream format. Use python or C++. | Francis Tyers |
code | Implement IBM model 2 alignment | Implement IBM model 2 alignment for a tagged parallel corpus in Apertium stream format. Use python or C++. | Francis Tyers |
code | Implement IBM model 3 alignment | Implement IBM model 3 alignment for a tagged parallel corpus in Apertium stream format. Use python or C++. | Francis Tyers |
code | Implement IBM model 4 alignment | Implement IBM model 4 alignment for a tagged parallel corpus in Apertium stream format. Use python or C++. | Francis Tyers |
code | Implement IBM model 5 alignment | Implement IBM model 5 alignment for a tagged parallel corpus in Apertium stream format. Use python or C++. | Francis Tyers |
research | Build a translation memory | You will be given some free parallel text in two languages (e.g. the Bible, Parliament proceedings, etc.) and asked to sentence-align it. The difference between this task and the Wikipedia one is that in the Wikipedia one you need to search for the documents yourself, whereas in this one the documents are provided. | Francis Tyers - Zfe |
code | Design a testvoc script for biltrans | Design a testvoc script which can deal with pairs which have ambiguous bilingual dictionaries. It should test each of the entries in turn. | Francis Tyers |
code | Port paradigm chopper to python3/elementtree | Take the paradigm-chopper.py from Speling tools and port it to use Python 3 and ElementTree instead of Python 2 and 4suite. | Francis Tyers |
code | Port speling tools to python3 | Port the speling tools (except paradigm chopper) to Python 3. | Francis Tyers |
code | Improve paradigm review | Make paradigm review in speling tools sort paradigms by frequency of use. | Francis Tyers |
code | Clean up and document yasmet | YASMET (www-i6.informatik.rwth-aachen.de/web/Software/YASMET.html) is a small toolkit for maximum entropy modelling, only around 130 lines of code. The task is to deobfuscate and document it. The .cc file can be found in the Apertium SVN. | Francis Tyers |
code | Extract a category from Wiktionary | Screen scrape and extract inflectional information in speling format for a given category from Wiktionary. | Francis Tyers |
code | Extract translations from Wiktionary | Screen scrape and extract translations for a given category from Wiktionary. | Francis Tyers |
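
As a starting point for the "Implement IBM model 1 alignment" task in the list above, here is a minimal sketch of the expectation-maximisation training loop in Python. It is an illustration under stated assumptions, not a reference solution. Tokens are treated as opaque strings (for a tagged corpus they could be lexical units such as `house<n><sg>`), and parsing of the Apertium stream format is left out. The function names `train_ibm_model1` and `align`, the `<NULL>` token and the toy corpus are all hypothetical.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical placeholder for "aligned to nothing"


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(target | source) with EM.

    pairs: list of (source_tokens, target_tokens) tuples.
    Returns a dict mapping (target_token, source_token) to a probability.
    """
    target_vocab = {tw for _, tgt in pairs for tw in tgt}
    if not target_vocab:
        return {}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source token
        # E-step: distribute each target token over the source tokens
        for src, tgt in pairs:
            src_null = [NULL] + list(src)
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src_null)
                for sw in src_null:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise the expected counts per source token
        t = defaultdict(float)
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return dict(t)


def align(src, tgt, t):
    """For each target token, return the index of its most probable
    source token, or -1 if the NULL token is the best choice."""
    src_null = [NULL] + list(src)
    result = []
    for tw in tgt:
        best = max(range(len(src_null)),
                   key=lambda i: t.get((tw, src_null[i]), 0.0))
        result.append(best - 1)
    return result


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    corpus = [
        ("das Haus".split(), "the house".split()),
        ("das Buch".split(), "the book".split()),
        ("ein Buch".split(), "a book".split()),
    ]
    table = train_ibm_model1(corpus)
    print(align("das Buch".split(), "the book".split(), table))
```

Model 1 ignores word order, so the only parameters are the lexical probabilities; models 2 to 5 add position, fertility and distortion parameters on top of a table like this one.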