Difference between revisions of "Task ideas for Google Code-in"

From Apertium

Revision as of 17:03, 19 October 2010

An informal spot for outlining ideas for the Google Code-in (GCI).

Task list

| Area | Difficulty | Title | Description | People |
|---|---|---|---|---|
| Code | 2. Medium | Cross a language pair | Take two language pairs, use apertium-crossdics and clean up the resulting bilingual dictionary. For instance, build Occitan-French from Occitan-Catalan and Catalan-French. | Francis Tyers |
| Code | 1. Hard | Convert an existing resource | Take an existing linguistic resource and adapt it for use in Apertium. For example, take a morphological analyser for Punjabi written in Functional Morphology and convert it to lttoolbox. | Francis Tyers |
| Quality | 3. Easy | Quality evaluation | Perform a human post-editing evaluation of one of our non-evaluated pairs: take some free text (e.g. from Wikipedia or Wikinews), run it through the translator, correct the output by hand, and then use apertium-eval-translator to calculate the word error rate (WER). | Francis Tyers |
| Research | 3. Easy | Catalogue resources | Pick an under-resourced language (e.g. Chechen, Guaraní, Aromanian, Chuvash, Swazi) and catalogue all the resources available for it (grammatical descriptions, wordlists, dictionaries, spellcheckers, corpora, etc.), along with the licences they are under. | Francis Tyers |
| Quality | 1. Hard | Improve a language pair | Find some faults in an existing language pair and fix them, in particular in minor→major pairs such as Welsh-English, Basque-Spanish and Breton-French. | Francis Tyers |
| Translation | 2. Medium | Translate the HOWTO | Translate the new language pair HOWTO into another language, then work through it with a new pair of languages. When finished, upload the new pair to the Incubator. | Francis Tyers |
| Documentation | 2. Medium | Document undocumented features | Find a feature that is not covered by the existing documentation (e.g. cascaded interchunk transfer) and write about it. | |
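
The idea behind crossing two language pairs can be illustrated with a toy sketch. The Python below is a hypothetical simplification, not how apertium-crossdics itself works: it assumes each bilingual dictionary is just a list of (source lemma, target lemma) pairs and ignores the tags and .dix structure that real Apertium dictionaries carry. It composes Occitan-Catalan with Catalan-French through the shared pivot language, then lists the entries that would need human cleanup: words with several candidate translations and words that end up with none. The sample entries are illustrative only, not taken from the real dictionaries.

```python
from collections import defaultdict

# Hypothetical representation: a bilingual dictionary as (source, target)
# lemma pairs. Real .dix entries also carry part-of-speech and other tags.
Pairs = list[tuple[str, str]]


def cross(ab: Pairs, bc: Pairs) -> dict[str, set[str]]:
    """Compose an A-B and a B-C dictionary through the pivot language B."""
    b_to_c: dict[str, set[str]] = defaultdict(set)
    for b, c in bc:
        b_to_c[b].add(c)
    crossed: dict[str, set[str]] = defaultdict(set)
    for a, b in ab:
        # Every C translation of the pivot word b is a candidate for a;
        # if b has no C translation, a is kept with an empty set.
        crossed[a].update(b_to_c.get(b, set()))
    return dict(crossed)


def ambiguous(crossed: dict[str, set[str]]) -> list[str]:
    """Source words with more than one candidate translation."""
    return sorted(a for a, cs in crossed.items() if len(cs) > 1)


def untranslated(crossed: dict[str, set[str]]) -> list[str]:
    """Source words that received no translation through the pivot."""
    return sorted(a for a, cs in crossed.items() if not cs)


# Toy Occitan-Catalan and Catalan-French entries (illustrative only).
oc_ca = [("ostal", "casa"), ("aiga", "aigua"), ("gat", "gat")]
ca_fr = [("casa", "maison"), ("casa", "domicile"), ("aigua", "eau")]

oc_fr = cross(oc_ca, ca_fr)
print(ambiguous(oc_fr))     # ['ostal']
print(untranslated(oc_fr))  # ['gat']
```

In this sketch the "clean up" step is left to a person: ambiguous entries would be checked and pruned by hand, and untranslated ones added manually.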

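The word error rate in the quality evaluation task is, in its usual definition, the word-level edit distance (substitutions, insertions and deletions) between the raw machine translation and its post-edited version, divided by the number of words in the post-edited version. The Python below is a minimal sketch of that definition only; it is not apertium-eval-translator's code, and that tool's tokenisation and handling of punctuation may differ. The function name word_error_rate is hypothetical.

```python
def word_error_rate(mt_output: str, post_edited: str) -> float:
    """Word-level edit distance from the MT output to the post-edited text,
    divided by the number of words in the post-edited text."""
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited text is empty")
    # Before processing MT word i, prev[j] is the edit distance between the
    # first i-1 MT words and the first j post-edited words (Levenshtein DP).
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete an MT word
                cur[j - 1] + 1,          # insert a post-edited word
                prev[j - 1] + (h != r),  # substitute, or keep if equal
            )
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("he go to school", "he goes to school"))  # 0.25
```

Here one substituted word out of four post-edited words gives a WER of 0.25; lower is better, and 0.0 means the post-editor changed nothing.
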
Other

  1. Outreach: Write a quick guide on 'What Apertium can and cannot do to help you with your homework'.
  2. Quality assurance: Make some concrete improvements to a language pair. These might be in disambiguation, transfer or vocabulary; minor→major pairs such as Welsh-English, Basque-Spanish and Breton-French would particularly benefit.
  3. Training: Write a simple step-by-step guide (on the wiki) for pre-university students of varying levels of computer literacy, explaining how to install a development version of Apertium and start on development or polishing tasks like the ones above, so that they can become young Apertium developers. The guide may reuse or link to existing material.
  4. User interface: Update apertium-tolk and apertium-dbus.