Difference between revisions of "Task ideas for Google Code-in"
TommiPirinen (talk | contribs) (→Task ideas: addede descriptions) |
TommiPirinen (talk | contribs) |
||
Line 85: | Line 85: | ||
|description=Use d3 globe to make an apertium language/pair viewer (like [[pairviewer]]), maybe based on [https://www.jasondavies.com/maps/rotate/ this] or [http://bl.ocks.org/KoGor/5994804 this] or [http://bl.ocks.org/dwtkns/4973620 this]. [http://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/mapviewer/langdata/apertium-languages.tsv This file] contains coordinates of Apertium languages.|mentors=Firespeaker|type=code|tags=js,html,maps}} |
|description=Use d3 globe to make an apertium language/pair viewer (like [[pairviewer]]), maybe based on [https://www.jasondavies.com/maps/rotate/ this] or [http://bl.ocks.org/KoGor/5994804 this] or [http://bl.ocks.org/dwtkns/4973620 this]. [http://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/mapviewer/langdata/apertium-languages.tsv This file] contains coordinates of Apertium languages.|mentors=Firespeaker|type=code|tags=js,html,maps}} |
||
{{Taskidea |
{{Taskidea |
||
− | |title=make a thing to detect contexts where a path in a compiled transducer begins with a whitespace |
+ | |title=make a thing to detect contexts where a path in a compiled transducer begins with a whitespace |
+ | |description=}} |
||
{{Taskidea |
{{Taskidea |
||
− | |title=make the lt-comp compiler print a warning when a path begins with a whitespace. |
+ | |title=make the lt-comp compiler print a warning when a path begins with a whitespace. |
+ | |description=A common mistake in dix files is stray whitespace; the compilation tool should detect this automatically and issue a warning to the user.}} |
||
{{Taskidea |
{{Taskidea |
||
|title=apertium-mar-hin: make the TL morph for any part of speech less daft |
|title=apertium-mar-hin: make the TL morph for any part of speech less daft |
||
+ | |description=Some morph in Marathi or Hindi are currently daft. |
||
|tags=morphology|mentors=vin-ivar}} |
|tags=morphology|mentors=vin-ivar}} |
||
{{Taskidea |
{{Taskidea |
||
|title=add indic scripts/formal latin transliterations |
|title=add indic scripts/formal latin transliterations |
||
− | |description= to |
+ | |description=Transliteration is a way to write text in a different script. Currently some Indic scripts are handled only by a WX transliterator|tags=python|mentors=vin-ivar}} |
{{Taskidea| |
{{Taskidea| |
||
− | + | title=apertium-hin: more consistency with apertium-mar for verbs|tags=morphology|mentors=vin-ivar |
|
+ | |description= Verbs in Marathi and Hindi are handled inconsistently. |
||
+ | |type=code}} |
||
{{Taskidea| |
{{Taskidea| |
||
− | title=apertium-mar: replace cases with postpositions|tags=morphology|tags=morphology|mentors=vin-ivar |
+ | title=apertium-mar: replace cases with postpositions|tags=morphology|tags=morphology|mentors=vin-ivar |
+ | |description=Marathi cases are postpositions |
||
+ | |type=code}} |
||
{{Taskidea| |
{{Taskidea| |
||
− | title=apertium-mar: fix modals and quasi-modals|tags=morphology|mentors=vin-ivar |
+ | title=apertium-mar: fix modals and quasi-modals|tags=morphology|mentors=vin-ivar |
+ | |description=Modals in Marathi need fixing |
||
+ | |type=code}} |
||
{{Taskidea |
{{Taskidea |
||
|type=code |
|type=code |
||
|title=refactor x file in apy |
|title=refactor x file in apy |
||
+ | |description=Reorganise apy code to be more readable, maintainable and so forth. |
||
|mentors=Putti}} |
|mentors=Putti}} |
||
{{Taskidea |
{{Taskidea |
||
|type=documentation |
|type=documentation |
||
|title=add docstrings to x file in apy |
|title=add docstrings to x file in apy |
||
+ | |description=docstrings are a way to document python code that can be generated into documentation on the web or in python. See following PEPs in python.org |
||
|mentors=Putti}} |
|mentors=Putti}} |
||
{{Taskidea |
{{Taskidea |
||
Line 114: | Line 125: | ||
|mentors=Putti, (sushain, unhammer ?)}} |
|mentors=Putti, (sushain, unhammer ?)}} |
||
{{Taskidea |
{{Taskidea |
||
+ | |type=code |
||
|title=add 1 transfer rule |
|title=add 1 transfer rule |
||
+ | |description=Transfer rules are parts of translation process dealing with re-arranging, adding and deleting words. See also [[Short introduction to transfer]] |
||
|mentors=Fran, vinit}} |
|mentors=Fran, vinit}} |
||
{{Taskidea |
{{Taskidea |
||
+ | |type=code |
||
|title=add 50 entries to a bidix |
|title=add 50 entries to a bidix |
||
+ | |description=Bilingual dictionary (bidix) contains word-to-word translations between languages, e.g. cat-chat or cat-Katze in English to French or German respectively. Add 50 of such word-translations to languages you know. |
||
|mentors=Fran, vinit}} |
|mentors=Fran, vinit}} |
||
{{Taskidea |
{{Taskidea |
||
+ | |type=code |
||
|title=write 10 lexical selection rules |
|title=write 10 lexical selection rules |
||
− | |description=Write 10 lexical selection rules for a pair already set up with lexical selection |
+ | |description=Write 10 lexical selection rules for a pair already set up with [[lexical selection]] |
|mentors=Fran, vinit}} |
|mentors=Fran, vinit}} |
||
{{Taskidea |
{{Taskidea |
||
+ | |type=code |
||
|title=write 10 constraint grammar rules |
|title=write 10 constraint grammar rules |
||
+ | |description=[[Constraint grammar]] is a rule-based approach of selecting linguistic readings from ambiguous cases, to improve translation quality etc. See introduction cG here: |
||
|mentors=Fran, vinit}} |
|mentors=Fran, vinit}} |
||
{{Taskidea |
{{Taskidea |
||
Line 132: | Line 150: | ||
|mentors=Firespeaker}} |
|mentors=Firespeaker}} |
||
⚫ | |||
− | |||
+ | |title=apertium-hun: match existing apertium-hun paradigms with morphdb.hu |
||
+ | |description=Morphdb.hu is another implementation of Hungarian morphology, that has a large lexicon. In order to convert it to apertium format, the classification of the words needs to be mapped to one used in apertium.}} |
||
{{Taskidea|type=code|mentors=Flammie|tags= |
{{Taskidea|type=code|mentors=Flammie|tags= |
||
|title=apertium-hun: convert hunmorph.db into apertium |
|title=apertium-hun: convert hunmorph.db into apertium |
||
− | |description=one of: |
+ | |description=one of: See prerequisite task above. }} |
− | {{Taskidea|type=code|mentors=Flammie|tags= |
+ | {{Taskidea|type=code|mentors=Flammie|tags=fin,dix |
− | |title=apertium- |
+ | |title=apertium-fin-eng: go through lexicon for potential rubbish words) |
+ | |description=Apertium's Finnish–English dictionary has been converted from projects, like Finnwordnet, that have a lot of pairs unsuitable for MT; find and delete them from the file. |
||
⚫ | |||
+ | }} |
||
− | |title=apertium-fin-eng: go through lexicon for potential rubbish words)}} |
||
− | {{Taskidea|type=code|mentors=Flammie|tags= |
+ | {{Taskidea|type=code|mentors=Flammie|tags=eng,dix |
|title=apertium-fin-eng: add words from apertium-fin-eng to apertium-eng |
|title=apertium-fin-eng: add words from apertium-fin-eng to apertium-eng |
||
− | |description=grep for English words in apertium-fin-eng.fin-eng.dix and classify them according to paradgims. |
+ | |description=grep for English words in apertium-fin-eng.fin-eng.dix and classify them according to paradigms. See also: [[Apertium English]]}} |
− | {{Taskidea|type=code|mentors=Flammie|tags= |
+ | {{Taskidea|type=code|mentors=Flammie|tags=apy |
|title=apertium-apy: add i/o formats) |
|title=apertium-apy: add i/o formats) |
||
+ | |description=Currently APY web queries get responses in ad hoc json format. Research and implement interoperabilities with further formats, such as: }} |
||
− | |e.g. for interoperability with web apps and well-known apis }} |
||
⚫ | |||
− | |||
⚫ | |||
|title=apertium-apy: write metadata about apertium language pairs |
|title=apertium-apy: write metadata about apertium language pairs |
||
|description=CMDI format that can be deployed for CLARIN stuffs}} |
|description=CMDI format that can be deployed for CLARIN stuffs}} |
||
− | {{Taskidea|type=code|mentors=Flammie|tags= |
+ | {{Taskidea|type=code|mentors=Flammie|tags=apy |
|title=apertium-apy: make more parts of apertium-pipeline on web |
|title=apertium-apy: make more parts of apertium-pipeline on web |
||
+ | |description=apertium.org has a web service interface for getting translations or morphological analyses. This should be extended for other functions of apertium as well. more information: [[Apertium Apy]].}} |
||
− | |description=available through API (e.g. disambig, etc.) )}} |
||
⚫ | |||
− | |||
⚫ | |||
⚫ | |||
+ | |There exists a version from last GSOC of apertium.org translator where user can suggest fixes to unknown word translations among other things, but this is not deployed to server. )}} |
||
⚫ | |||
− | {{Taskidea|type=code|mentors=Flammie|tags= |
+ | {{Taskidea|type=code|mentors=Flammie|tags=apy |
− | |title=Further developments to suggest a word |
+ | |title=Further developments to suggest a word |
+ | |description=Currently suggested words may be added to wiki by a service, it would make sense to also have e.g. chance to login and get attributed as contributor, as well as other stuff )}} |
||
{{Taskidea|type=code|mentors=Fran |
{{Taskidea|type=code|mentors=Fran |
Revision as of 22:26, 30 October 2016
This is the task ideas page for Google Code-in. Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.
The mentors column lists the people you should contact to request further information. Each task is estimated to take an experienced developer at most 2 hours; however:
- this does not include time taken to install / set up apertium.
- this is the time an experienced developer is expected to need; you may find that you spend more time on the task because of the learning curve.
Categories:
- code: Tasks related to writing or refactoring code
- documentation: Tasks related to creating/editing documents and helping others learn more
- research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
- quality: Tasks related to testing and ensuring code is of high quality.
- interface: Tasks related to user experience research or user interface design and interaction
You can find descriptions of some of the mentors here: List_of_Apertium_mentors.
Task ideas
type | title | description | tags | mentors | |||
---|---|---|---|---|---|---|---|
code | Fix a memory leak in matxin-transfer | hargle bargle | c++ | Fran | |||
research | See if you can precompile xpath expressions or xslt stylesheets | hargle bargle | parsing | Fran | |||
research | Review literature on linearisation of dependency trees | hargle bargle | parsing | Fran | |||
Manually annotate/Tag text in Apertium format | Fran | ||||||
code | Convert Chukchi Nouns to HFST/lexc | Fran | |||||
code | Convert Chukchi Numerals to HFST/lexc | Fran | |||||
code | Convert Chukchi Adjectives to HFST/lexc | Fran | |||||
Make a (web) viewer for parallel treebanks | (also for viewing diff annotation for same sentence) | ||||||
Write a script to convert a UD treebank | for a given language to a format suitable for training the perceptron tagger | ||||||
Train the perceptron tagger for a language | Perceptron is | Fran | |||||
Design an annotation tool for disambiguation | cf. webanno, corpus.mari-language.org, brat | ||||||
Design an annotation tool for adding dependencies | Like c.f. | ||||||
Train lexical selection rules from a large parallel corpus for a language pair | Fran | ||||||
Document how to set up the experiments for weighted transfer rules | Fran | ||||||
convert UD treebank to apertium tags, use unigram tagger | (see #apertium logs 2016-06-22) | ||||||
Write a script to extract sentences from CoNLL-U | where they have the same tokenisation as Apertium. | Fran | |||||
convert [1] to apertium-style documentation | |||||||
code | Implement `lt-print --strings` lt-print -s | c++ | Fran | ||||
code | Implement lt-expand -n | Implement an algorithm that prints out a transducer but only follows n cycles. | c++ | Fran, wei2912 | |||
code | in-browser globe with apertium languages as points | Use d3 globe to make an apertium language/pair viewer (like pairviewer), maybe based on this or this or this. This file contains coordinates of Apertium languages. | js,html,maps | Firespeaker | |||
make a thing to detect contexts where a path in a compiled transducer begins with whitespace | |||||||
make the lt-comp compiler print a warning when a path begins with whitespace. | A common mistake in dix files is stray whitespace; the compilation tool should detect this automatically and issue a warning to the user. | ||||||
apertium-mar-hin: make the TL morph for any part of speech less daft | Some of the morphology for Marathi or Hindi is currently daft. | morphology | vin-ivar | ||||
add indic scripts/formal latin transliterations | Transliteration is a way to write text in a different script. Currently some Indic scripts are handled only by a WX transliterator | python | vin-ivar | ||||
code | apertium-hin: more consistency with apertium-mar for verbs | Verbs in Marathi and Hindi are handled inconsistently. | morphology | vin-ivar | |||
code | apertium-mar: replace cases with postpositions | Marathi cases are postpositions | morphology | vin-ivar | |||
code | apertium-mar: fix modals and quasi-modals | Modals in Marathi need fixing | morphology | vin-ivar | |||
code | refactor x file in apy | Reorganise apy code to be more readable, maintainable and so forth. | Putti | ||||
documentation | add docstrings to x file in apy | Docstrings are a way to document Python code; they can be turned into documentation on the web or read from within Python. See the following PEPs on python.org | Putti | ||||
code, quality | write 10 unit tests for apy | Putti, (sushain, unhammer ?) | |||||
code | add 1 transfer rule | Transfer rules are the part of the translation process that deals with rearranging, adding and deleting words. See also Short introduction to transfer | Fran, vinit | ||||
code | add 50 entries to a bidix | A bilingual dictionary (bidix) contains word-to-word translations between languages, e.g. cat-chat or cat-Katze for English to French or German respectively. Add 50 such word translations for languages you know. | Fran, vinit | ||||
code | write 10 lexical selection rules | Write 10 lexical selection rules for a pair already set up with lexical selection | Fran, vinit | ||||
code | write 10 constraint grammar rules | Constraint grammar (CG) is a rule-based approach to selecting linguistic readings in ambiguous cases, to improve translation quality etc. See the introduction to CG here: | Fran, vinit | ||||
research, documentation | Document resources for a language | Document resources for a language without resources already documented on the wiki | Firespeaker | ||||
design | apertium-hun: match existing apertium-hun paradigms with morphdb.hu | Morphdb.hu is another implementation of Hungarian morphology that has a large lexicon. To convert it to apertium format, its classification of words needs to be mapped to the one used in apertium. | hun,dix | Flammie | |||
code | apertium-hun: convert hunmorph.db into apertium | See prerequisite task above. | Flammie | ||||
code | apertium-fin-eng: go through lexicon for potential rubbish words | Apertium's Finnish–English dictionary has been converted from projects, like Finnwordnet, that have a lot of pairs unsuitable for MT; find and delete them from the file. | fin,dix | Flammie | |||
code | apertium-fin-eng: add words from apertium-fin-eng to apertium-eng | grep for English words in apertium-fin-eng.fin-eng.dix and classify them according to paradigms. See also: Apertium English | eng,dix | Flammie | |||
code | apertium-apy: add i/o formats | Currently APY web queries get responses in an ad hoc JSON format. Research and implement interoperability with further formats, such as: | apy | Flammie | |||
code | apertium-apy: write metadata about apertium language pairs | CMDI format that can be deployed for CLARIN | apy | Flammie | |||
code | apertium-apy: make more parts of apertium-pipeline on web | apertium.org has a web service interface for getting translations or morphological analyses. This should be extended to other functions of apertium as well. More information: Apertium Apy. | apy | Flammie | |||
??? | Deploy suggest-a-word feature in apertium.org | A version of the apertium.org translator from the last GSoC lets users suggest fixes to unknown-word translations, among other things, but it is not deployed to the server. | apy | Flammie | |||
code | Further developments to suggest a word | Currently suggested words may be added to the wiki by a service; it would make sense to also offer e.g. a chance to log in and get attributed as a contributor, as well as other features | apy | Flammie | |||
code | Fix ordering of dependencies in CG matxin format | Fran |
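For the whitespace-detection tasks in the table above, one possible starting point is a check on the dix source rather than on the compiled transducer. The sketch below is only an illustration: the function name find_leading_whitespace and the sample dictionary are hypothetical, and it assumes the problem shows up as literal leading whitespace in the text of <l>, <r> or <i> elements. It is not how lt-comp works internally.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_leading_whitespace(dix_xml: str) -> List[Tuple[str, str]]:
    """Return (tag, text) pairs for <l>, <r> and <i> elements whose text starts with whitespace."""
    root = ET.fromstring(dix_xml)
    hits = []
    for elem in root.iter():
        if elem.tag in ("l", "r", "i"):
            text = elem.text or ""
            # text[:1] is "" for empty elements, and "".isspace() is False
            if text[:1].isspace():
                hits.append((elem.tag, text))
    return hits


# Hypothetical two-entry dictionary; the first entry has a stray leading space.
SAMPLE = """<dictionary>
  <section id="main" type="standard">
    <e><p><l> cat</l><r>cat<s n="n"/></r></p></e>
    <e><p><l>dog</l><r>dog<s n="n"/></r></p></e>
  </section>
</dictionary>"""

for tag, text in find_leading_whitespace(SAMPLE):
    print(f"warning: <{tag}> starts with whitespace: {text!r}")
```

A real implementation inside the compiler would also need to report file and line positions, which this sketch leaves out.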
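For the docstring task, the example below shows roughly what a documented function can look like, following the general conventions of PEP 257 (a one-line summary, a blank line, then details). The function pair_name is hypothetical and is not taken from the apy code base.

```python
def pair_name(source: str, target: str) -> str:
    """Return the name of a language pair, such as "eng-spa".

    The arguments are language codes: ``source`` is the language being
    translated from and ``target`` the language being translated to.
    """
    return f"{source}-{target}"


# The docstring is stored on the function object and is what help() displays.
print(pair_name.__doc__)
```

The same text is what documentation generators pick up, so a clear first line matters most.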
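For the bidix task, entries are usually typed by hand, but a small helper can show the shape of one entry. This is a simplified sketch: the helper name bidix_entry is hypothetical, it gives both sides the same single part-of-speech tag, and real entries often need more tags (for example gender on French or German nouns).

```python
from xml.sax.saxutils import escape


def bidix_entry(left: str, right: str, pos: str = "n") -> str:
    """Build one bidix <e> element pairing a left-side word with a right-side word."""
    # Spaces inside multiword expressions are written as <b/> in dix files.
    left_text = escape(left).replace(" ", "<b/>")
    right_text = escape(right).replace(" ", "<b/>")
    return (
        f'<e><p><l>{left_text}<s n="{pos}"/></l>'
        f'<r>{right_text}<s n="{pos}"/></r></p></e>'
    )


# The cat-chat and cat-Katze pairs from the task description.
print(bidix_entry("cat", "chat"))
print(bidix_entry("cat", "Katze"))
```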