Task ideas for Google Code-in

This is the task ideas page for Google Code-in (http://code.google.com/gci). Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

The People column lists the people to contact for further information. The Time column gives the estimated minimum amount of time to spend on the task; it does not include the time taken to install and set up Apertium.

Task list

Area	Difficulty	Title	Description	Time (hours)	People
code	2. Medium	Cross a language pair: Occitan-French	Using apertium-crossdics, build a dictionary for Occitan-French from Occitan-Catalan and Catalan-French, and clean up the result.	8—12	Mireia Ginestí
code	2. Medium	Cross a language pair: Aragonese-Catalan	Using apertium-crossdics, build a dictionary for Aragonese-Catalan from Aragonese-Spanish and Spanish-Catalan, and clean up the result.	8—12	Jimregan
code	2. Medium	Cross a language pair: Romanian-French	Using apertium-crossdics, build a dictionary for Romanian-French from Romanian-Spanish and Spanish-French, and clean up the result.	8—12	Gramirez
code	2. Medium	Cross a language pair: Romanian-Italian	Using apertium-crossdics, build a dictionary for Romanian-Italian from Romanian-Spanish and Spanish-Italian, and clean up the result.	8—12	Gramirez
code	1. Hard	Convert existing resource: Urdu morphological analyser	Take Muhammad Humayoun's Urdu Morphology and convert to lttoolbox format.	8—10	Francis Tyers
code	1. Hard	Convert existing resource: Punjabi morphological analyser	Take Muhammad Humayoun's Punjabi Morphology and convert to lttoolbox format.	8—10	Francis Tyers
code	1. Hard	Convert existing resource: Kurdish morphological analyser	Take the Alexina Kurdish Morphology and convert to lttoolbox format.	8—10	Francis Tyers
code	3. Easy	Convert existing resource: Reta Vortaro Belarusian-Esperanto	Take the Belarusian-Esperanto lexicon and convert to lttoolbox format.	2—4	Jimregan
code	3. Easy	Convert existing resource: Reta Vortaro Breton-Esperanto	Take the Breton-Esperanto lexicon and convert to lttoolbox format.	2—4	Jacob_Nordfalk
code	3. Easy	Convert existing resource: Reta Vortaro Bulgarian-Esperanto	Take the Bulgarian-Esperanto lexicon and convert to lttoolbox format.	2—4	Hectoralos
code	3. Easy	Convert existing resource: Reta Vortaro Czech-Esperanto	Take the Czech-Esperanto lexicon and convert to lttoolbox format.	2—4	Jimregan
code	3. Easy	Convert existing resource: Reta Vortaro Finnish-Esperanto	Take the Finnish-Esperanto lexicon and convert to lttoolbox format.	2—4	Hectoralos
code	3. Easy	Convert existing resource: Reta Vortaro German-Esperanto	Take the German-Esperanto lexicon and convert to lttoolbox format.	2—4	Hectoralos
code	3. Easy	Convert existing resource: Reta Vortaro Greek-Esperanto	Take the Greek-Esperanto lexicon and convert to lttoolbox format.	2—4	Jacob_Nordfalk
code	3. Easy	Convert existing resource: Reta Vortaro Hebrew-Esperanto	Take the Hebrew-Esperanto lexicon and convert to lttoolbox format.	2—4	Jacob_Nordfalk
code	3. Easy	Convert existing resource: Reta Vortaro Hungarian-Esperanto	Take the Hungarian-Esperanto lexicon and convert to lttoolbox format.	2—4	Jacob_Nordfalk
code	3. Easy	Convert existing resource: Reta Vortaro Italian-Esperanto	Take the Italian-Esperanto lexicon and convert to lttoolbox format.	2—4	Hectoralos
code	3. Easy	Convert existing resource: Reta Vortaro Dutch-Esperanto	Take the Dutch-Esperanto lexicon and convert to lttoolbox format.	2—4	Hectoralos
code	3. Easy	Convert existing resource: Reta Vortaro Persian-Esperanto	Take the Persian-Esperanto lexicon and convert to lttoolbox format.	2—4	Hectoralos
code	3. Easy	Convert existing resource: Reta Vortaro Polish-Esperanto	Take the Polish-Esperanto lexicon and convert to lttoolbox format.	2—4	Jimregan
code	3. Easy	Convert existing resource: Reta Vortaro Portuguese-Esperanto	Take the Portuguese-Esperanto lexicon and convert to lttoolbox format.	2—4	Hectoralos
code	3. Easy	Convert existing resource: Reta Vortaro Russian-Esperanto	Take the Russian-Esperanto lexicon and convert to lttoolbox format.	2—4	Jimregan
code	3. Easy	Convert existing resource: Reta Vortaro Slovakian-Esperanto	Take the Slovakian-Esperanto lexicon and convert to lttoolbox format.	2—4	Jimregan
code	3. Easy	Convert existing resource: Reta Vortaro Swedish-Esperanto	Take the Swedish-Esperanto lexicon and convert to lttoolbox format.	2—4	Jacob_Nordfalk
code	3. Easy	Convert existing resource: Reta Vortaro Turkish-Esperanto	Take the Turkish-Esperanto lexicon and convert to lttoolbox format.	2—4	Jimregan
code	2. Medium	Convert Apertium resources: nn-nb for FreeDict	Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Nynorsk-Bokmål dictionary.	2—4	Piotr Bański
code	2. Medium	Convert Apertium resources: es-ca for FreeDict	Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Spanish-Catalan dictionary.	2—4	Piotr Bański
code	2. Medium	Convert Apertium resources: is-en for FreeDict	Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Icelandic-English dictionary.	2—4	Piotr Bański
code	2. Medium	Convert Apertium resources: es-ast for FreeDict	Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Spanish-Asturian dictionary.	2—4	Piotr Bański
code	2. Medium	Convert Apertium resources: oc-ca for FreeDict	Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Occitan-Catalan dictionary.	2—4	Piotr Bański
code	2. Medium	Convert Apertium resources: mk-bg for FreeDict	Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Macedonian-Bulgarian dictionary.	2—4	Piotr Bański
code	2. Medium	Convert Apertium resources: mk-en for FreeDict	Apertium's lexicons would make an excellent start for bilingual dictionaries. FreeDict currently has no Macedonian-English dictionary.	2—4	Piotr Bański
code	3. Easy	Convert existing resource: English-Slovakian dictionary	Take MSAS/MASS and convert to lttoolbox format.	1—4	Zdenko Podobný
code	2. Medium	Convert existing resource: Slovakian morphological analyser	Take the morphological analyser distributed with LanguageTool and convert to lttoolbox format.	1—4	Zdenko Podobný
code	2. Medium	Convert existing resource: Polish-Slovakian transfer rules	Many of the existing rules in Apertium's pl-cs system originated in pl-sk. Take the new rules in pl-cs and apply them to pl-sk. No knowledge of Polish, Slovakian, or Czech is required, though it will help.	1—4	Zdenko Podobný
outreach	3. Easy	Apertium on Macedonian Wikipedia	Bulgarian Wikipedia has 107,355 articles, while Macedonian Wikipedia has 42,112, less than half as many. Translate some articles from Bulgarian Wikipedia to Macedonian Wikipedia using Apertium, and then post-edit them. Before you start, explain to the local Wikipedia community what you are doing.	1—4	Francis Tyers
outreach	3. Easy	Apertium on Occitan Wikipedia	Catalan Wikipedia has 290,059 articles, while Occitan Wikipedia has 22,579, less than a tenth as many. Translate some articles from Catalan Wikipedia to Occitan Wikipedia using Apertium, and then post-edit them. Before you start, explain to the local Wikipedia community what you are doing.	1—4	Francis Tyers
outreach	3. Easy	Apertium on Asturian Wikipedia	Spanish Wikipedia has 663,567 articles, while Asturian Wikipedia has 13,869, only about a fiftieth as many. Translate some articles from Spanish Wikipedia to Asturian Wikipedia using Apertium, and then post-edit them. Before you start, explain to the local Wikipedia community what you are doing.	1—4	Francis Tyers
quality	3. Easy	Thorough checkup of bn-en morphological analyser	The current bn-en morphological analyser has fairly good coverage, but it could be higher. Part of the reason is that many verbs have one or two surface forms that differ slightly from the regular ones, and the analyser misses them. Using lt-expand, it is possible to generate all forms of the verbs; these can then be checked manually, and the analyser file rebuilt with another script (already in the pair). This checking requires a native speaker of, or an expert in, Bengali.	2—4	Abu Zaher
code	2. Medium	NSIS script	Write an NSIS script to install the Cygwin version of Apertium on Windows.	2—6	Jimregan
code	2. Medium	Dixtools: TEI export	Take the code from Dix2CC.java or Dix2Tiny.java and adapt it to export dictionaries in TEI P5 format, suitable for FreeDict. This project is suitable for someone interested in learning Java.	2—4	Jimregan
research	2. Medium	Contrastive analysis: Macedonian and Albanian	Create a set of test sentences (see the various 'Pending tests' and 'Regression tests' pages on the wiki) for translation between Macedonian and Albanian. The tests should cover as many features of the languages as possible. Some examples may be found in a grammar; others may need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	2. Medium	Contrastive analysis: Kurdish and Persian	Create a set of test sentences (see the various 'Pending tests' and 'Regression tests' pages on the wiki) for translation between Kurdish and Persian. The tests should cover as many features of the languages as possible. Some examples may be found in a grammar; others may need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	2. Medium	Contrastive analysis: Hindi and Urdu	Create a set of test sentences (see the various 'Pending tests' and 'Regression tests' pages on the wiki) for translation between Hindi and Urdu. The tests should cover as many features of the languages as possible. Some examples may be found in a grammar; others may need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	2. Medium	Contrastive analysis: Finnish and Estonian	Create a set of test sentences (see the various 'Pending tests' and 'Regression tests' pages on the wiki) for translation between Finnish and Estonian. The tests should cover as many features of the languages as possible. Some examples may be found in a grammar; others may need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	2. Medium	Contrastive analysis: Spanish and Italian	Create a set of test sentences (see the various 'Pending tests' and 'Regression tests' pages on the wiki) for translation between Spanish and Italian. The tests should cover as many features of the languages as possible. Some examples may be found in a grammar; others may need to be invented. This will not involve programming, only grammatical analysis.	4—6	Gramirez
research	2. Medium	Contrastive analysis: Catalan and Sardinian	Create a set of test sentences (see the various 'Pending tests' and 'Regression tests' pages on the wiki) for translation between Catalan and Sardinian. The tests should cover as many features of the languages as possible. Some examples may be found in a grammar; others may need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	2. Medium	Contrastive analysis: Italian and Sardinian	Create a set of test sentences (see the various 'Pending tests' and 'Regression tests' pages on the wiki) for translation between Italian and Sardinian. The tests should cover as many features of the languages as possible. Some examples may be found in a grammar; others may need to be invented. This will not involve programming, only grammatical analysis.	4—6	Deadbeef
research	3. Easy	Catalogue resources: Aromanian	Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under.		Francis Tyers
translation	2. Medium	Translate the HOWTO: Polish	Translate the new language pair HOWTO into Polish.	5—8	Jimregan
translation	2. Medium	Translate the HOWTO: Slovakian	Translate the new language pair HOWTO into Slovakian.	5—8	Jimregan
translation	2. Medium	Translate the HOWTO: Italian	Translate the new language pair HOWTO into Italian.	5—8	Deadbeef
translation	2. Medium	Translate the HOWTO: Norwegian Nynorsk	Translate the new language pair HOWTO into Nynorsk.	5—8	Unhammer
translation	2. Medium	Translate the HOWTO: Norwegian Bokmål	Translate the new language pair HOWTO into Bokmål.	5—8	Unhammer
quality	3. Easy	Quality evaluation: Spanish and French	Perform a human post-editing evaluation of the Spanish and French language pair. This involves taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator, correcting the output by hand, and then using apertium-eval-translator to calculate the Word Error Rate. Use at least 2,000 words of text.	4—8	Francis Tyers
quality	3. Easy	Quality evaluation: Spanish and Occitan	Perform a human post-editing evaluation of the Spanish and Occitan language pair. This involves taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator, correcting the output by hand, and then using apertium-eval-translator to calculate the Word Error Rate. Use at least 2,000 words of text.	4—8	Mireia Ginestí
quality	3. Easy	Quality evaluation: Spanish and Asturian	Perform a human post-editing evaluation of the Spanish and Asturian language pair. This involves taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator, correcting the output by hand, and then using apertium-eval-translator to calculate the Word Error Rate. Use at least 2,000 words of text.	4—8	Francis Tyers
user interface	1. Hard	Design a user-friendly interface for Apertium	Apertium does not currently have a friendly user interface for translators. Look at other translation software on the market, and sketch out some ideas for how a user interface could be designed. This will not require programming, but could, for example, involve using Glade to demonstrate the ideas.		Jimregan
training	3. Easy	Step-by-step "become a developer" guide	Write a simple step-by-step guide (on the wiki) for pre-university students (of varying levels of computer literacy) to install a development version of Apertium and make a single change in a language pair. This should include everything, from checking out with SVN to requesting committer access on SourceForge. Document everything you do!	2—3	Mikel L. Forcada
training	3. Easy	Step-by-step "constraint grammar" guide	Write a simple step-by-step guide (on the wiki) for pre-university students (of varying levels of computer literacy) to install Constraint Grammar, fix 5 disambiguation problems in a single sentence, and commit the changes to the incubator. This should include everything, from checking out with SVN to requesting committer access on SourceForge. Document everything you do!	2—3	Unhammer
quality	3. Easy	Release freshness	Go through all 25 released pairs and note down the date of their last release and how many dictionary entries and rules they have. Then go to SVN, look at the module for each released pair, and find out how many dictionary entries and rules it has there. Put this into a spreadsheet and email the mailing list. Why? Our release cycle is very slow, and we often have pairs in trunk with substantial improvements that have not been released.	2—4	Francis Tyers
outreach	3. Easy	Translate the Wikipedia article on Apertium: Macedonian	Translate the article on Apertium into Macedonian for the Macedonian Wikipedia.	0.5—1	Francis Tyers
outreach	3. Easy	Translate the Wikipedia article on Apertium: Aragonese	Translate the article on Apertium into Aragonese for the Aragonese Wikipedia.	0.5—1	Jimregan
documentation	3. Easy	Create a dictionary crossing guide	Create a full guide to crossing dictionaries, using notes that will be provided.	2—3	Jimregan
outreach	3. Easy	Write a quick guide: 'What Apertium can and cannot do to help you with your homework'	Students around the world use Apertium (and other MT systems) to do their second-language homework. The guide would summarize the dos and don'ts, and could even explain how students using Apertium for their homework can discover ways in which Apertium could be improved.	2—3	Mikel L. Forcada
documentation	3. Easy	Document undocumented features: manpages	Work through each of the manpages in apertium and lttoolbox, checking that each of the options listed by --help is documented.	2—4	Jimregan
research	3. Easy	Create manually tagged corpora: Occitan	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct ones.	2—4	Mireia Ginestí
research	3. Easy	Create manually tagged corpora: Italian	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct ones.	2—4	Mireia Ginestí
research	3. Easy	Create manually tagged corpora: Catalan	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking the corpus in the es-ca package and adapting it to the multiwords that are present in en-ca but absent from es-ca.	2—4	Mireia Ginestí
research	3. Easy	Create manually tagged corpora: Polish	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct ones. It may be preferable to use LanguageTool's tagger.	2—4	Jimregan
research	3. Easy	Create manually tagged corpora: Czech	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct ones. It may be preferable to use LanguageTool's tagger.	2—4	Jimregan
research	3. Easy	Create manually tagged corpora: Slovakian	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct ones. It may be preferable to use LanguageTool's tagger.	2—4	Jimregan
research	3. Easy	Create manually tagged corpora: Russian	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct ones. It may be preferable to use LanguageTool's tagger.	2—4	Jimregan
research	3. Easy	Create manually tagged corpora: Ukrainian	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct ones. It may be preferable to use LanguageTool's tagger.	2—4	Jimregan
quality	1. Hard	Improve a language pair: Welsh-English	Find some faults in Welsh-English and fix them.	8—12	Francis Tyers
quality	1. Hard	Improve a language pair: Breton-French	Find some faults in Breton-French and fix them.	8—12	Francis Tyers
quality	1. Hard	Improve a language pair: Basque-Spanish	Find some faults in Basque-Spanish and fix them.	8—12	Mireia Ginestí
documentation	2. Medium	Document undocumented features: cascaded interchunk	Update the Apertium manual to document cascaded interchunk.	4—8	Mikel L. Forcada
documentation	2. Medium	Document undocumented features: transliteration	Update the Apertium manual to document the transliteration features in lttoolbox.	4—8	Francis Tyers
quality	1. Hard	Fix some tagger errors in Swedish->Danish	apertium-sv-da could be improved with a Constraint Grammar. Find 10 sentences that are translated wrongly because of tagging errors, and write CG rules to fix them. The student should have a good knowledge of Swedish, or at least of another Scandinavian language.	8—12	Unhammer
quality	3. Easy	Improve Swedish-Danish	Add 50 nouns you feel are missing in translations from Swedish to Danish.	3—6	Jacob Nordfalk
quality	3. Easy	Improve English-Esperanto	Add 50 words you feel are missing in translations from English to Esperanto.	3—6	Jacob Nordfalk
quality	3. Easy	Improve Spanish-Esperanto	Add 50 words you feel are missing in translations from Spanish to Esperanto.	3—6	Hectoralos
quality	3. Easy	Improve Catalan-Esperanto	Add 50 words you feel are missing in translations from Catalan to Esperanto.	3—6	Hectoralos
quality	3. Easy	Improve Irish-Manx Gaelic coverage	I can provide a list of the most common Irish words not covered by the bilingual dictionary, along with their English translations. Manx translations are needed for these.	3—6	Kevin Scannell
quality	3. Easy	Add gender information to Manx dictionary	Most nouns in the Manx dictionary already have gender information; look up and add any that are missing.	3—6	Kevin Scannell
quality	3. Easy	Proofread Albanian analyser	We have a morphological analyser for Albanian, but it was written by a non-native speaker and needs to be checked.	6—10	Francis Tyers
translation	3. Easy	Proofread Catalan-Sardinian dictionary	Go through the Catalan-Sardinian dictionary and check the entries; there are only around a thousand of them.	1—2	Francis Tyers
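
The dictionary-crossing tasks (Occitan-French, Aragonese-Catalan, Romanian-French, Romanian-Italian) rest on one idea: two bilingual dictionaries that share a pivot language are joined on the pivot-language word. The sketch below illustrates that join on plain word pairs. It is a simplified assumption-based example, not the apertium-crossdics implementation, which works on .dix files and also has to reconcile morphological tags; the function name `cross` and the toy word lists are hypothetical.

```python
from collections import defaultdict


def cross(src_pivot, pivot_tgt):
    """Cross two bilingual word lists through a shared pivot language.

    src_pivot: list of (source_word, pivot_word) pairs, e.g. Occitan-Catalan
    pivot_tgt: list of (pivot_word, target_word) pairs, e.g. Catalan-French
    Returns a sorted list of candidate (source_word, target_word) pairs.
    (Hypothetical helper for illustration; not part of apertium-crossdics.)
    """
    # Index the second dictionary by its pivot-language side.
    by_pivot = defaultdict(set)
    for pivot, tgt in pivot_tgt:
        by_pivot[pivot].add(tgt)

    # Each source word inherits every target reachable through its pivot words.
    result = set()
    for src, pivot in src_pivot:
        for tgt in by_pivot.get(pivot, ()):
            result.add((src, tgt))
    return sorted(result)


# Toy Occitan-Catalan and Catalan-French entries (illustrative only).
oc_ca = [("ostal", "casa"), ("gat", "gat")]
ca_fr = [("casa", "maison"), ("gat", "chat")]
print(cross(oc_ca, ca_fr))
```

Because every target of an ambiguous pivot word is carried over, the raw output contains false candidates; this is why each crossing task also asks you to clean up the result.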

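The quality-evaluation tasks ask for the Word Error Rate (WER) between the raw translation and its post-edited version, calculated with apertium-eval-translator. As a hedged illustration of what that number means (not the tool's own code), WER is the word-level edit distance (substitutions + insertions + deletions) between the machine output and the post-edited reference, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and whitespace tokenisation is an assumption; the real tool may tokenise differently.

```python
def word_error_rate(mt_output: str, post_edited: str) -> float:
    """WER of raw MT output, using the post-edited text as the reference.

    Hypothetical helper; tokenises by whitespace only.
    """
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("reference text is empty")

    # prev[j] = edit distance between the first i-1 hypothesis words
    # and the first j reference words (row i-1 of the Levenshtein table).
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hypothesis word h
                curr[j - 1] + 1,         # insert reference word r
                prev[j - 1] + (h != r),  # substitute (free if the words match)
            )
        prev = curr
    return prev[len(ref)] / len(ref)


print(word_error_rate("a b c", "a x c"))
```

For example, if post-editing produces a 2,000-word reference and 300 word-level edits were needed, the WER is 300 / 2,000 = 15%.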