Task ideas for Google Code-in
This is the task ideas page for Google Code-in (http://code.google.com/gci). Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.
The people column lists the people you should contact for further information. The time column gives the minimum estimated time, in hours, that should be spent on the task; it does not include time taken to install or set up Apertium.
Task list
Area | Difficulty | Title | Description | Time (hours) | People |
---|---|---|---|---|---|
code | 2. Medium | Cross a language pair: Occitan-French | Using apertium-crossdics, build a dictionary for Occitan-French from Occitan-Catalan and Catalan-French, and clean up the result. | 8—12 | Mireia Ginestí |
code | 2. Medium | Cross a language pair: Aragonese-Catalan | Using apertium-crossdics, build a dictionary for Aragonese-Catalan from Aragonese-Spanish and Spanish-Catalan, and clean up the result. | 8—12 | Jimregan |
code | 2. Medium | Cross a language pair: Romanian-French | Using apertium-crossdics, build a dictionary for Romanian-French from Romanian-Spanish and Spanish-French, and clean up the result. | 8—12 | Francis Tyers |
code | 2. Medium | Cross a language pair: Romanian-Italian | Using apertium-crossdics, build a dictionary for Romanian-Italian from Romanian-Spanish and Spanish-Italian, and clean up the result. | 8—12 | Francis Tyers |
code | 1. Hard | Convert existing resource: Urdu morphological analyser | Take Muhammad Humayoun's Urdu Morphology and convert to lttoolbox format. | 8—10 | Francis Tyers |
code | 1. Hard | Convert existing resource: Punjabi morphological analyser | Take Muhammad Humayoun's Punjabi Morphology and convert to lttoolbox format. | 8—10 | Francis Tyers |
code | 1. Hard | Convert existing resource: Kurdish morphological analyser | Take the Alexina Kurdish Morphology and convert to lttoolbox format. | 8—10 | Francis Tyers |
outreach | 3. Easy | Apertium on Macedonian Wikipedia | Bulgarian WP has 107,355 articles, Macedonian WP has 42,112, less than half as many. Translate some articles from Bulgarian Wikipedia to Macedonian Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand. | 1—4 | Francis Tyers |
outreach | 3. Easy | Apertium on Occitan Wikipedia | Catalan WP has 290,059 articles, Occitan WP has 22,579, less than a tenth as many. Translate some articles from Catalan Wikipedia to Occitan Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand. | 1—4 | Francis Tyers |
outreach | 3. Easy | Apertium on Asturian Wikipedia | Spanish WP has 663,567 articles, Asturian WP has 13,869, about a fiftieth as many. Translate some articles from Spanish Wikipedia to Asturian Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand. | 1—4 | Francis Tyers |
quality | 3. Easy | Thorough checkup of bn-en morphological analyser | While the current bn-en morphological analyser has fairly good coverage, it could be higher. Part of the reason is that many verbs have one or two surface forms that differ slightly from the regular ones, and the analyser misses them. Using lt-expand, it is possible to generate all forms of the verbs, check them manually, and then rebuild the analyser file using another script (already in the pair). This check requires a native speaker of, or an expert in, Bengali. | | Abu Zaher |
code | 2. Medium | NSIS script | Write an NSIS script to install the Cygwin version of Apertium on Windows. | 2—6 | Jimregan |
research | 2. Medium | Contrastive analysis: Macedonian and Albanian | Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Macedonian and Albanian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. | 4—6 | Francis Tyers |
research | 2. Medium | Contrastive analysis: Kurdish and Persian | Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Kurdish and Persian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. | 4—6 | Francis Tyers |
research | 2. Medium | Contrastive analysis: Hindi and Urdu | Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Hindi and Urdu. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. | 4—6 | Francis Tyers |
research | 2. Medium | Contrastive analysis: Finnish and Estonian | Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Finnish and Estonian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis. | 4—6 | Francis Tyers |
research | 3. Easy | Catalogue resources: Aromanian | Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under. | | Francis Tyers |
translation | 2. Medium | Translate the HOWTO: Polish | Translate the new language pair HOWTO into Polish. | 5—8 | Jimregan |
translation | 2. Medium | Translate the HOWTO: Slovakian | Translate the new language pair HOWTO into Slovakian. | 5—8 | Jimregan |
translation | 2. Medium | Translate the HOWTO: Italian | Translate the new language pair HOWTO into Italian. | 5—8 | Deadbeef |
translation | 2. Medium | Translate the HOWTO: Norwegian Nynorsk | Translate the new language pair HOWTO into Nynorsk. | 5—8 | Unhammer |
translation | 2. Medium | Translate the HOWTO: Norwegian Bokmål | Translate the new language pair HOWTO into Bokmål. | 5—8 | Unhammer |
quality | 3. Easy | Quality evaluation: Spanish and French | Perform a human post-editing evaluation of the Spanish and French language pair. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator, correcting the output, and then using apertium-eval-translator to calculate the Word Error Rate. At least 2,000 words of text should be used. | 4—8 | Francis Tyers |
quality | 3. Easy | Quality evaluation: Spanish and Occitan | Perform a human post-editing evaluation of the Spanish and Occitan language pair. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator, correcting the output, and then using apertium-eval-translator to calculate the Word Error Rate. At least 2,000 words of text should be used. | 4—8 | Mireia Ginestí |
quality | 3. Easy | Quality evaluation: Spanish and Asturian | Perform a human post-editing evaluation of the Spanish and Asturian language pair. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator, correcting the output, and then using apertium-eval-translator to calculate the Word Error Rate. At least 2,000 words of text should be used. | 4—8 | Francis Tyers |
user interface | 1. Hard | Design a user-friendly interface for Apertium | Apertium does not currently have a friendly user interface for translators. Look at other translation software on the market, and sketch out some ideas for how to design a user interface. This will not require programming, but could, for example, involve using Glade to demonstrate the ideas. | | Jimregan |
training | 3. Easy | Step-by-step "become a developer" guide | Write a simple step-by-step guide (on the wiki) for pre-university students (of varying levels of computer literacy) to install a development version of Apertium and make a single change in a language pair. This should include everything, from checking out with SVN to requesting committer access on SourceForge. Document everything you do! | 2—3 | Mikel L. Forcada |
training | 3. Easy | Step-by-step "constraint grammar" guide | Write a simple step-by-step guide (on the wiki) for pre-university students (of varying levels of computer literacy) to install Constraint Grammar, fix 5 disambiguation problems in a single sentence, and then commit the result to the incubator. This should include everything, from checking out with SVN to requesting committer access on SourceForge. Document everything you do! | 2—3 | Unhammer |
quality | 3. Easy | Release freshness | Go through all 25 released pairs and note down the date of their last release and how many dictionary entries and rules they have. Then go to SVN, look at the module for each released pair, and find out how many dictionary entries and rules it has. Put this into a spreadsheet and email the mailing list. Why? Our release cycle is very slow, and we often have pairs in trunk which have substantial improvements but have not been released. | 2—4 | Francis Tyers |
outreach | 3. Easy | Translate the Wikipedia article on Apertium: Macedonian | Translate the article on Apertium into Macedonian for the Macedonian Wikipedia | 30m-1h | Francis Tyers |
outreach | 3. Easy | Translate the Wikipedia article on Apertium: Aragonese | Translate the article on Apertium into Aragonese for the Aragonese Wikipedia | 30m-1h | Jimregan |
documentation | 3. Easy | Create a dictionary crossing guide | Create a full guide to crossing dictionaries, using notes that will be provided. | 2—3 | Jimregan |
outreach | 3. Easy | Write a quick guide on 'What Apertium can and cannot do to help you with your homework' | Students around the world use Apertium (and other MT systems) to do their second-language homework. The guide would summarize the do's and don'ts, and could even elaborate on how students using Apertium for their homework could discover ways in which Apertium could be improved. | 2—3 | Mikel L. Forcada |
documentation | 3. Easy | Document undocumented features: manpages | Work through each of the manpages in apertium and lttoolbox, checking that each of the options listed by --help is documented. | 2—4 | Jimregan |
research | 3. Easy | Create manually tagged corpora: Occitan | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. | 2—4 | Mireia Ginestí |
research | 3. Easy | Create manually tagged corpora: Italian | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. | 2—4 | Mireia Ginestí |
research | 3. Easy | Create manually tagged corpora: Catalan | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking the corpus in the es-ca package and adapting it to account for the multiwords that are present in en-ca but absent from es-ca. | 2—4 | Mireia Ginestí |
research | 3. Easy | Create manually tagged corpora: Polish | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. | 2—4 | Jimregan |
research | 3. Easy | Create manually tagged corpora: Czech | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. | 2—4 | Jimregan |
research | 3. Easy | Create manually tagged corpora: Slovakian | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. | 2—4 | Jimregan |
research | 3. Easy | Create manually tagged corpora: Russian | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. | 2—4 | Jimregan |
research | 3. Easy | Create manually tagged corpora: Ukrainian | Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (such as from Wikipedia), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. It may be preferable to use LanguageTool's tagger. | 2—4 | Jimregan |
quality | 1. Hard | Improve a language pair: Welsh-English | Find some faults in Welsh-English and fix them. | 8—12 | Francis Tyers |
quality | 1. Hard | Improve a language pair: Breton-French | Find some faults in Breton-French and fix them. | 8—12 | Francis Tyers |
quality | 1. Hard | Improve a language pair: Basque-Spanish | Find some faults in Basque-Spanish and fix them. | 8—12 | Mireia Ginestí |
documentation | 2. Medium | Document undocumented features: cascaded interchunk | Update the Apertium manual to document cascaded interchunk. | 4—8 | Mikel L. Forcada |
documentation | 2. Medium | Document undocumented features: transliteration | Update the Apertium manual to document the transliteration features in lttoolbox. | 4—8 | Francis Tyers |
quality | 1. Hard | Fix some tagger errors in Swedish->Danish | apertium-sv-da could be improved with a Constraint Grammar. Find 10 sentences that get wrong translations due to tagging errors, and write CG rules to fix them. The student should have good knowledge of Swedish, or at least of another Scandinavian language. | 8—12 | Unhammer |
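
The quality evaluation tasks in the table ask for a Word Error Rate. For orientation, here is a minimal Python sketch of how such a figure is commonly computed: the word-level edit distance between the raw machine translation and the post-edited text, divided by the number of words in the post-edited text. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration only; apertium-eval-translator may tokenise and normalise differently, so use the tool itself for the numbers you report.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    reference  -- the post-edited (corrected) translation
    hypothesis -- the raw machine translation output
    Illustrative sketch only; not the apertium-eval-translator implementation.
    """
    ref = reference.split()   # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    post_edited = "the cat sat on the mat"
    raw_output = "the cat sit on mat"
    print(f"WER: {word_error_rate(post_edited, raw_output):.2%}")
```

A lower value means fewer words had to be changed during post-editing. Because the figure is normalised by the reference length, it can exceed 100% when the raw output is much longer than the corrected text.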