Difference between revisions of "Task ideas for Google Code-in"

Revision as of 22:44, 26 October 2010

This is the task ideas page for Google Code-in. Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

The people column lists the people you should contact for further information. The time column gives the minimum estimated time that should be spent on the task; it does not include the time taken to install and set up Apertium.

Task list

Area	Difficulty	Title	Description	Time (hours)	People
code	2. Medium	Cross a language pair: Occitan-French	Using apertium-crossdics, build a dictionary for Occitan-French from Occitan-Catalan and Catalan-French, and clean up the result.	4—10	Francis Tyers
code	2. Medium	Cross a language pair: Aragonese-Catalan	Using apertium-crossdics, build a dictionary for Aragonese-Catalan from Aragonese-Spanish and Spanish-Catalan, and clean up the result.	4—10	Jimregan
code	1. Hard	Convert existing resource: Urdu morphological analyser	Take Muhammad Humayoun's Urdu Morphology and convert to lttoolbox format.	8—10	Francis Tyers
code	1. Hard	Convert existing resource: Punjabi morphological analyser	Take Muhammad Humayoun's Punjabi Morphology and convert to lttoolbox format.	8—10	Francis Tyers
code	1. Hard	Convert existing resource: Kurdish morphological analyser	Take the Alexina Kurdish Morphology and convert to lttoolbox format.	8—10	Francis Tyers
outreach	3. Easy	Apertium on Macedonian Wikipedia	Bulgarian WP has 107,355 articles, Macedonian WP has 42,112, less than half as many. Translate some articles from Bulgarian Wikipedia to Macedonian Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand.	1—4	Francis Tyers
outreach	3. Easy	Apertium on Occitan Wikipedia	Catalan WP has 290,059 articles, Occitan WP has 22,579, less than a tenth as many. Translate some articles from Catalan Wikipedia to Occitan Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand.	1—4	Francis Tyers
outreach	3. Easy	Apertium on Asturian Wikipedia	Spanish WP has 663,567 articles, Asturian WP has 13,869, roughly a fiftieth as many. Translate some articles from Spanish Wikipedia to Asturian Wikipedia using Apertium, and then postedit them. Explain to the local Wikipedia community what you are doing beforehand.	1—4	Francis Tyers
quality	3. Easy	Thorough checkup of bn-en morphological analyser	The current bn-en morphological analyser has fairly good coverage, but it could be higher. Part of the reason is that many verbs have one or two slightly different surface forms that deviate from the regular ones, and the analyser misses them. Using lt-expand, it is possible to generate all forms of the verbs, check them manually, and then rebuild the analyser file using another script (already in the pair). This check requires a native speaker of Bengali or an expert on the language.		Abu Zaher
code	2. Medium	NSIS script	Write an NSIS script to install the Cygwin version of Apertium on Windows.		Jimregan
research	2. Medium	Contrastive analysis: Macedonian and Albanian	Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Macedonian and Albanian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	2. Medium	Contrastive analysis: Kurdish and Persian	Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Kurdish and Persian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	2. Medium	Contrastive analysis: Hindi and Urdu	Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Hindi and Urdu. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	2. Medium	Contrastive analysis: Finnish and Estonian	Create a set of test sentences (see various pages of 'Pending tests' and 'Regression tests' on the Wiki) for translation between Finnish and Estonian. The tests should cover as many features of the languages as possible. Some of the examples might be able to be found in a grammar, others might need to be invented. This will not involve programming, only grammatical analysis.	4—6	Francis Tyers
research	3. Easy	Catalogue resources: Aromanian	Catalogue all the available resources (grammatical descriptions, wordlists, dictionaries, spellcheckers, papers, corpora, etc.) for Aromanian, along with the licences they are under.		Francis Tyers
translation	2. Medium	Translate the HOWTO: Polish	Translate the new language pair HOWTO into Polish, and go through it for a new pair of languages. When finished, upload to the Incubator.	2—3	Jimregan
translation	2. Medium	Translate the HOWTO: Italian	Translate the new language pair HOWTO into Italian, and go through it for a new pair of languages. When finished, upload to the Incubator.	2—3	Deadbeef
quality	3. Easy	Quality evaluation: Spanish and French	Perform a human post-editing evaluation of the Spanish and French language pair. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator and then altering the output to be correct. Then use apertium-eval-translator to calculate the Word Error Rate. The minimum amount of text should be 4,000 words.	4—8	Francis Tyers
quality	3. Easy	Quality evaluation: Spanish and Occitan	Perform a human post-editing evaluation of the Spanish and Occitan language pair. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator and then altering the output to be correct. Then use apertium-eval-translator to calculate the Word Error Rate. The minimum amount of text should be 4,000 words.	4—8	Francis Tyers
quality	3. Easy	Quality evaluation: Spanish and Asturian	Perform a human post-editing evaluation of the Spanish and Asturian language pair. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator and then altering the output to be correct. Then use apertium-eval-translator to calculate the Word Error Rate. The minimum amount of text should be 4,000 words.	4—8	Francis Tyers
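
The Word Error Rate mentioned in the quality evaluation tasks can be illustrated with a short sketch. This is not the implementation used by apertium-eval-translator; it is a minimal word-level edit distance, assuming whitespace tokenisation and assuming that the postedited text is the reference and the raw translator output is the hypothesis. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumptions (not taken from apertium-eval-translator): tokens are
    split on whitespace, and substitutions, insertions and deletions
    each cost 1.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    raw_output = "the cat sat on mat"
    postedited = "the cat sat on the mat"
    print(f"WER: {word_error_rate(raw_output, postedited):.2%}")
```

A lower value means that fewer words had to be changed during postediting.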

Make more specific

Area	Difficulty	Title	Description	Time (hours)	People
research	3. Easy	Create manually tagged corpora	Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (see above), running it through the analyser and tagger, and replacing incorrect analyses with the correct one.		Jimregan
quality	1. Hard	Improve a language pair	Find some faults in an existing language pair and fix them. In particular minor→major pairs, e.g. Welsh-English, Basque-Spanish, Breton-French.		Francis Tyers, Jimregan, Mikel L. Forcada
documentation	2. Medium	Document undocumented features	Find a feature that can't be found in the existing documentation (e.g. cascaded interchunk transfer), and write about it.		Mikel L. Forcada
documentation	2. Medium	Create a dictionary crossing guide	Extract a tutorial guide to using crossdics from jimregan's brain.	2—3	Jimregan
training	3. Easy	Simple step-by-step "become a developer" guide	Write a simple step-by-step guide (on the wiki) for pre-university students (of varying levels of computer literacy) to install a development version of Apertium and start doing development or polishing tasks like the ones above.		Mikel L. Forcada
user interface	1. Hard	Design a user-friendly interface for Apertium	Apertium does not currently have a friendly user interface for translators. Look at other translation software on the market, and sketch out some ideas for how to design a user interface. This will not require programming, but could, for example, involve using Glade to demonstrate the ideas.		Jimregan
outreach	2. Medium	Write a quick guide on 'What Apertium can and cannot do to help you with your homework'	Students around the world use Apertium (and other MT systems) to do their second-language homework. The guide would summarize the do's and don'ts, and could even elaborate on how students using Apertium for their homework could discover ways in which Apertium could be improved.		Mikel L. Forcada
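
For the "Create manually tagged corpora" task, it can help to first list the words that actually have more than one analysis, since those are the ones the tagger has to choose between. The sketch below assumes analyser output in the Apertium stream format, where each lexical unit looks like `^surface/analysis1/analysis2$`, and assumes the text contains no escaped `^`, `/` or `$` characters. The function name `ambiguous_units` and the sample input are illustrative only.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2/...$
# Assumption: no escaped '^', '/' or '$' characters inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/^$]+)/([^$]+)\$")


def ambiguous_units(stream: str) -> list[tuple[str, list[str]]]:
    """Return (surface form, analyses) for every unit with 2+ analyses."""
    result = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface = match.group(1)
        analyses = match.group(2).split("/")
        if len(analyses) > 1:
            result.append((surface, analyses))
    return result


if __name__ == "__main__":
    # Illustrative input, not taken from a real analyser run.
    sample = "^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$ ^tinto/tinto<adj><m><sg>$"
    for surface, analyses in ambiguous_units(sample):
        print(surface, "->", ", ".join(analyses))
```

Each listed word can then be checked by hand against the analysis the tagger selected.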

@@ Line 54: / Line 54: @@
 {|class="wikitable sortable"
 ! Area             !! Difficulty     !! Title                 !! Description !! Time<br/>(hours) !! People
-|-
-|align=center| {{sc|quality}}       || 3.&nbsp;Easy   || Quality evaluation    || Perform a human post-edition evaluation of one of our non-evaluated pairs. This will involve taking some free text (e.g. from Wikipedia or Wikinews), running it through the translator and then altering the output to be correct. Then using apertium-eval-translator to calculate the Word Error Rate. The minimum amount of text should be 4,000 words. ||align=center| 4&mdash;8 ||[[User:Francis Tyers|Francis&nbsp;Tyers]]
 |-
 |align=center| {{sc|research}}       || 3.&nbsp;Easy   || Create manually tagged corpora    || Fix tagging errors in a piece of analysed text, for use in tagger training. This will involve taking some free text (see above), running it through the analyser and tagger, and replacing incorrect analyses with the correct one. || || [[User:Jimregan|Jimregan]]
