Difference between revisions of "Task ideas for Google Code-in"
Revision as of 22:37, 4 November 2016
This is the task ideas page for Google Code-in. Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.
The mentors column lists the people you should contact to request further information. Each task is estimated to take an experienced developer at most 2 hours. However:
- this does not include the time taken to install and set up Apertium.
- this is the time an experienced developer is expected to need; you may find that you spend more time on the task because of the learning curve.
Categories:
- code: Tasks related to writing or refactoring code
- documentation: Tasks related to creating/editing documents and helping others learn more
- research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
- quality: Tasks related to testing and ensuring code is of high quality.
- interface: Tasks related to user experience research or user interface design and interaction
You can find descriptions of some of the mentors here: List_of_Apertium_mentors.
Task ideas
| type | title | description | tags | mentors | beginner? |
|---|---|---|---|---|---|
| code | Refactor/merge the main "processing" functions of lrx-proc | lrx-proc has two modes, "-m" mode and default mode. Each is implemented by its own huge function, and the two functions are nearly identical. Refactor the code to remove the redundancy, and run tests on lots of text with several language pairs to ensure there are no regressions. | c++ | Fran, Unhammer | |
| code | Profile and improve speed of lrx-proc | lrx-proc is slower than it should be. There is probably some low-hanging fruit. Try profiling it and implementing an improvement. | c++ | Fran, Unhammer | |
| research | See if you can precompile XPath expressions or XSLT stylesheets | An XSLT stylesheet is a program for transforming XML trees. An XPath expression is a way of specifying a node set in an XML tree. Investigate the possibility of pre-compiling either stylesheets or XPath expressions. | parsing | Fran | |
| research | Review literature on linearisation of dependency trees | A dependency tree is an intermediate representation of a sentence with no implicit word order. Linearisation is finding the appropriate word order for a dependency tree. Do a survey of the available literature and write up a review. | parsing | Fran, Schindler | |
| research | Manually annotate/tag text in Apertium format | Take some running text, analyse it using an Apertium analyser, then manually disambiguate the result. | | Fran | |
| code | Convert Chukchi Nouns to HFST/lexc | There is a freely available lexicon of Chukchi, a language spoken in the north-east of Russia. The objective of this task is to convert the part of the lexicon covering nouns to lexc format, which is a formalism for specifying concatenative morphology. | | Fran | |
| code | Convert Chukchi Numerals to HFST/lexc | There is a freely available lexicon of Chukchi, a language spoken in the north-east of Russia. The objective of this task is to convert the part of the lexicon covering numerals to lexc format, which is a formalism for specifying concatenative morphology. | | Fran | |
| code | Convert Chukchi Adjectives to HFST/lexc | There is a freely available lexicon of Chukchi, a language spoken in the north-east of Russia. The objective of this task is to convert the part of the lexicon covering adjectives to lexc format, which is a formalism for specifying concatenative morphology. | | Fran | |
| interface | Make a design for a web-based viewer for parallel treebanks | (also for viewing diff annotation for same sentence) | HTML,CSS | Fran, Schindler | |
| code | Write a script to convert a UD treebank | for a given language to a format suitable for training the perceptron tagger | | | |
| research | Train the perceptron tagger for a language | The perceptron tagger is a new part-of-speech tagger that was developed for Apertium in the Summer of Code. Take a language from languages and train the tagger for that language. | | Fran | |
| interface | Design an annotation tool for disambiguation | Similar to, e.g., webanno, corpus.mari-language.org, brat | | | |
| interface | Design an annotation tool for adding dependencies | Similar to, e.g., brat | | | |
| code | Train lexical selection rules | from a large parallel corpus for a language pair | | Fran | |
| documentation | Document how to set up the experiments for weighted transfer rules | | | Fran | |
| code | convert UD treebank to apertium tags, use unigram tagger | (see #apertium logs 2016-06-22) | | | |
| code | Write a script to extract sentences from CoNLL-U | where they have the same tokenisation as Apertium. | | Fran | |
| documentation | convert [1] to apertium-style documentation | | | Schindler | |
| code | Implement `lt-print --strings` (`lt-print -s`) | | c++ | Fran, wei2912 | |
| code | Implement lt-expand -n | Implement an algorithm that prints out a transducer but only follows n cycles. | c++ | Fran, wei2912 | |
| code | in-browser globe with apertium languages as points | Use d3 globe to make an apertium language/pair viewer (like pairviewer), maybe based on this or this or this. This file contains coordinates of Apertium languages. | js,html,maps | Firespeaker, kvld | |
| code | Write a program to detect contexts where a path in a compiled transducer begins with a whitespace | | c++ | | |
| code | Make the lt-comp compiler print a warning when a path begins with a whitespace | A common mistake in dix files is stray whitespace in some places. The compilation tool should detect this automatically and issue a warning to the user. | c++ | | |
| | apertium-mar-hin: make the TL morph for any part of speech less daft | Some of the morphology in Marathi or Hindi is currently daft. | morphology | vin-ivar | |
| | add Indic scripts/formal Latin transliterations | Transliteration is a way of writing text in a different script. Currently some Indic scripts are only supported by a WX transliterator. | python | vin-ivar | |
| code | apertium-hin: more consistency with apertium-mar for verbs | Verbs in Marathi and Hindi are handled inconsistently. | morphology | vin-ivar | |
| code | apertium-mar: replace cases with postpositions | Marathi cases are postpositions | morphology | vin-ivar | |
| code | apertium-mar: fix modals and quasi-modals | Modals in Marathi need fixing | morphology | vin-ivar | |
| code | refactor x file in apy | Reorganise apy code to be more readable, maintainable and so forth. | | Putti | |
| documentation | add docstrings to x file in apy | Docstrings are a way to document Python code; they can be turned into documentation on the web or read from within Python. See the following PEPs on python.org | | Putti, vin-ivar | |
| quality | write 10 unit tests for apy | | | Putti, Unhammer, (sushain?) | |
| code | add 1 transfer rule | Transfer rules are the part of the translation process that deals with rearranging, adding and deleting words. See also Short introduction to transfer | | Fran, vin-ivar, zfe, kvld | |
| code | add 50 entries to a bidix | A bilingual dictionary (bidix) contains word-to-word translations between languages, e.g. cat-chat or cat-Katze for English to French or German respectively. Add 50 such word translations for languages you know. | | Fran, vin-ivar, zfe, kvld, Schindler | |
| code | write 10 lexical selection rules | Write 10 lexical selection rules for a pair already set up with lexical selection | | Fran, vin-ivar, zfe, Unhammer | |
| code | write 10 constraint grammar rules | Constraint grammar is a rule-based approach to selecting linguistic readings in ambiguous cases, to improve translation quality etc. See the introduction to CG here: | | Fran, vin-ivar, zfe, kvld, Unhammer | |
| research | Document resources for a language | Document resources for a language without resources already documented on the wiki | | Firespeaker, vin-ivar, zfe, Schindler | |
| research | Write a contrastive grammar | Document 6 differences between two (preferably related) languages and where they would need to be addressed (morph analysis, transfer, etc) | | vin-ivar, (fran? firespeaker?, zfe?, Schindler | X |
| design | apertium-hun: match existing apertium-hun paradigms with morphdb.hu | Morphdb.hu is another implementation of Hungarian morphology that has a large lexicon. In order to convert it to apertium format, the classification of the words needs to be mapped to the one used in apertium. | hun,dix | Flammie | |
| code | apertium-hun: convert hunmorph.db into apertium | one of: See prerequisite task above. | | Flammie | |
| code | apertium-fin-eng: go through lexicon for potential rubbish words | Apertium's Finnish–English dictionary has been converted from projects, like Finnwordnet, that have a lot of pairs unsuitable for MT. Find and delete them from the file. | fin,dix | Flammie | |
| code | apertium-fin-eng: add words from apertium-fin-eng to apertium-eng | grep for English words in apertium-fin-eng.fin-eng.dix and classify them according to paradigms. See also: Apertium English | eng,dix | Flammie | |
| code | apertium-apy: add i/o formats | Currently APY web queries get responses in an ad hoc JSON format. Research and implement interoperability with further formats, such as: | apy | Flammie | |
| code | apertium-apy: write metadata about apertium language pairs | In CMDI format, which can be deployed for CLARIN | apy | Flammie | |
| code | apertium-apy: make more parts of apertium-pipeline on web | apertium.org has a web service interface for getting translations or morphological analyses. This should be extended to other functions of apertium as well. More information: Apertium Apy. | apy | Flammie | |
| ??? | Finish suggest-a-word feature so it can be deployed to apertium.org | A version of the apertium.org translator from last GSoC lets users suggest fixes to unknown word translations, among other things, but it is not deployed to the server. | apy | Flammie | |
| code | Further developments to suggest a word | Currently suggested words may be added to the wiki by a service. It would make sense to also offer, e.g., the chance to log in and be attributed as a contributor, as well as other things. | apy | Flammie | |
| code | Fix ordering of dependencies in CG matxin format | | | Fran | |
| code | CG syntax highlighting plugin for a text editor | Write a syntax file for your favourite text editor that provides fancy syntax highlighting for Constraint Grammar | | vin-ivar, Unhammer, (Flammie?) | |
| code | Package apertium-lint to install to a prefix | apertium-lint currently installs with pip; modify that to allow passing a flag for installing it to a prefix | | vin-ivar | |
| quality | Fix a bug in Apertium html-tools | Fix a currently open issue with html-tools in consultation with your mentor. | multi,html,js,html-tools | Unhammer,Firespeaker,Kira | |
| quality | Fix a bug in Apertium APy | Fix a currently open issue with APy in consultation with your mentor. | multi,python,apy | Unhammer,Firespeaker,Kira | |
| code | Script to get resources from GF | Write a script to scrape words from one particular paradigm in GF and make it usable in Apertium. | | vin-ivar | |
| code | Create a list of text editors compatible with different scripts | Create a list of ten text editors and document how well they handle human text (Latin), RTL text (Arabic), combining characters (Devanagari), etc. Document any bugs with e.g. copy/paste and tab indentation. | | vin-ivar | |
| code | Write a script to strip apertium morphological information from CoNLL-U files | Write a script to strip apertium morphological information from CoNLL-U files so the dependency trees can be rendered okay by the online tools. | | vin-ivar | |
| research | Investigate FST backends for Swype-type input | Investigate what options exist for implementing an FST (of the sort used in Apertium spell checking) for auto-correction into an existing open source Swype-type input method on Android. You don't need to do any coding, but you should determine what would need to be done to add an FST backend into the software. Write up your findings on the Apertium wiki. | spelling,android | Firespeaker | |
| code | Fix a memory leak in matxin-transfer | The matxin-transfer program is a component of the Matxin MT system, a sister system to Apertium. Run valgrind on the code and find and fix a memory leak. There may be several. | c++ | Fran | |
| code | Write a tool to help test bidix coherence | This tool will generate a file with each lemma of the main categories (at least nouns, adjectives and verbs) found in a bidix. This file will then be translated to the second language and back to the first one. Looking for changes makes it possible to detect transfer problems and changes of meaning. | | Bech | possible |
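
As a starting point for the bidix coherence task in the last row, here is a minimal Python sketch of the round-trip idea. It is not an existing Apertium tool. It makes several assumptions: the bidix uses the usual `<e><p><l>…</l><r>…</r></p></e>` layout, the main categories are tagged `n`, `adj` and `vblex`, and the two translation directions are passed in as plain functions. The names `extract_lemmas`, `round_trip_changes` and `SAMPLE_BIDIX` are made up for illustration, and the sample entries are invented.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the first tags of nouns, adjectives and verbs in the bidix.
MAIN_CATEGORIES = frozenset({"n", "adj", "vblex"})

# Invented sample data in the usual bidix layout (not taken from a real pair).
SAMPLE_BIDIX = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>cat<s n="n"/></l><r>chat<s n="n"/><s n="m"/></r></p></e>
    <e><p><l>big<s n="adj"/></l><r>grand<s n="adj"/></r></p></e>
    <e><p><l>and<s n="cnjcoo"/></l><r>et<s n="cnjcoo"/></r></p></e>
    <e><p><l>ice<b/>cream<s n="n"/></l><r>glace<s n="n"/><s n="f"/></r></p></e>
  </section>
</dictionary>"""


def extract_lemmas(bidix_xml, categories=MAIN_CATEGORIES):
    """Return the left-side lemmas whose first tag is in `categories`.

    Simplification: <b/> is read as a space; text inside <g> groups is ignored.
    """
    root = ET.fromstring(bidix_xml)
    lemmas = []
    for left in root.iter("l"):
        tags = [s.get("n") for s in left.findall("s")]
        if not tags or tags[0] not in categories:
            continue
        parts = [left.text or ""]
        for child in left:
            if child.tag == "b":
                parts.append(" ")
            parts.append(child.tail or "")
        lemma = "".join(parts).strip()
        if lemma:
            lemmas.append(lemma)
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(lemmas))


def round_trip_changes(lemmas, forward, backward):
    """Translate each lemma there and back; report the ones that changed."""
    changes = []
    for lemma in lemmas:
        there = forward(lemma)
        back = backward(there)
        if back != lemma:
            changes.append((lemma, there, back))
    return changes


if __name__ == "__main__":
    # Stand-in translators; a real tool would call the installed language pair.
    to_l2 = {"cat": "chat", "big": "grand", "ice cream": "glace"}
    to_l1 = {"chat": "cat", "grand": "tall", "glace": "ice"}
    found = extract_lemmas(SAMPLE_BIDIX)
    for lemma, there, back in round_trip_changes(
        found, lambda w: to_l2.get(w, w), lambda w: to_l1.get(w, w)
    ):
        print(f"{lemma} -> {there} -> {back}")
```

Passing the translators in as functions keeps the comparison logic testable without an Apertium installation. In a real tool each function would probably wrap a call to the `apertium` command for the relevant direction, and the lemmas would need suitable morphological context (for example a default form) before translation.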