Difference between revisions of "Task ideas for Google Code-in"

From Apertium
(minor)
(27 intermediate revisions by 9 users not shown)
Line 23: Line 23:
   
 
==Task ideas==
 
==Task ideas==
<table class="sortable wikitable">
+
<table class="sortable wikitable" style="display: none">
  +
<!-- THE TASKS NEED TO BE HIDDEN FOR NOW,
  +
but feel free to remove style="display: none" to preview changes to this page.
  +
Just remember to put it back before saving
  +
JNW 2017-10-30
  +
-->
 
<tr><th>type</th><th>title</th><th>description</th><th>tags</th><th>mentors</th><th>bgnr?</th><th>multi?</th><th>duplicates</th></tr>
 
<tr><th>type</th><th>title</th><th>description</th><th>tags</th><th>mentors</th><th>bgnr?</th><th>multi?</th><th>duplicates</th></tr>
 
{{Taskidea
 
{{Taskidea
Line 30: Line 35:
 
|description=Document resources for a language without resources already documented on the Apertium wiki. [[Task ideas for Google Code-in/Documentation of resources|read more...]]
 
|description=Document resources for a language without resources already documented on the Apertium wiki. [[Task ideas for Google Code-in/Documentation of resources|read more...]]
 
|tags=wiki, languages
 
|tags=wiki, languages
|mentors=Jonathan, Vin
+
|mentors=Jonathan, Vin, Xavivars, Marc Riera
 
|multi=40
 
|multi=40
 
|beginner=yes
 
|beginner=yes
Line 37: Line 42:
 
|title=Write a contrastive grammar
 
|title=Write a contrastive grammar
 
|description=Document 6 differences between two (preferably related) languages and where they would need to be addressed in the [[Apertium pipeline]] (morph analysis, transfer, etc). Use a grammar book/resource for inspiration. Each difference should have no fewer than 3 examples. Put your work on the Apertium wiki under [[Language1_and_Language2/Contrastive_grammar]]. See [[Farsi_and_English/Pending_tests]] for an example of a contrastive grammar that a previous GCI student made.
 
|description=Document 6 differences between two (preferably related) languages and where they would need to be addressed in the [[Apertium pipeline]] (morph analysis, transfer, etc). Use a grammar book/resource for inspiration. Each difference should have no fewer than 3 examples. Put your work on the Apertium wiki under [[Language1_and_Language2/Contrastive_grammar]]. See [[Farsi_and_English/Pending_tests]] for an example of a contrastive grammar that a previous GCI student made.
|mentors=Vin, Jonathan, Fran
+
|mentors=Vin, Jonathan, Fran, mlforcada
 
|tags=wiki, languages
 
|tags=wiki, languages
 
|beginner=yes
 
|beginner=yes
Line 54: Line 59:
 
}}
 
}}
 
{{Taskidea|type=code|mentors=Fran, Masha, Jonathan, Vin
 
{{Taskidea|type=code|mentors=Fran, Masha, Jonathan, Vin
|tags=annotation, annotatrix, javascript
+
|tags=annotation, annotatrix, javascript, dependencies
 
|title=SDparse to CoNLL-U converter in JavaScript
 
|title=SDparse to CoNLL-U converter in JavaScript
 
|description=SDparse is a format for describing dependency trees; each dependency looks like relation(head, dependent). CoNLL-U is another
 
|description=SDparse is a format for describing dependency trees; each dependency looks like relation(head, dependent). CoNLL-U is another
 
format for describing dependency trees. Make a converter between the two formats. You will probably need to learn more about the specifics of these formats. The GitHub issue is [https://github.com/jonorthwash/ud-annotatrix/issues/88 here].
 
format for describing dependency trees. Make a converter between the two formats. You will probably need to learn more about the specifics of these formats. The GitHub issue is [https://github.com/jonorthwash/ud-annotatrix/issues/88 here].
 
}}
 
}}
{{Taskidea|type=quality|mentors=Fran, Masha
+
{{Taskidea|type=quality|mentors=Fran, Masha, Vin
 
|tags=annotation, annotatrix
 
|tags=annotation, annotatrix
 
|title=Write a test for the format converters in annotatrix
 
|title=Write a test for the format converters in annotatrix
Line 70: Line 75:
 
|description=It is possible to detect invalid trees (such as those that have cycles). We would like to write a function to detect those kinds of trees and advise the user. The GitHub issue is [https://github.com/jonorthwash/ud-annotatrix/issues/96 here].
 
|description=It is possible to detect invalid trees (such as those that have cycles). We would like to write a function to detect those kinds of trees and advise the user. The GitHub issue is [https://github.com/jonorthwash/ud-annotatrix/issues/96 here].
 
}}
 
}}
{{Taskidea|type=documentation|mentors=Fran, Masha, Jonathan
+
{{Taskidea|type=documentation|mentors=Fran, Masha, Jonathan, Vin
|tags=annotation, annotatrix
+
|tags=annotation, annotatrix, dependencies
 
|title=Write a tutorial on how to use annotatrix to annotate a dependency tree
 
|title=Write a tutorial on how to use annotatrix to annotate a dependency tree
 
|description=Give step-by-step instructions for annotating a dependency tree with Annotatrix. Make sure you include all possibilities in the app, for example tokenisation options.
 
|description=Give step-by-step instructions for annotating a dependency tree with Annotatrix. Make sure you include all possibilities in the app, for example tokenisation options.
 
}}
 
}}
{{Taskidea|type=documentation|mentors=Fran, Masha
+
{{Taskidea|type=documentation|mentors=Fran, Masha, Vin
|tags=annotation, annotatrix, video
+
|tags=annotation, annotatrix, video, dependencies
 
|title=Make a video tutorial on annotating a dependency tree using the [https://github.com/jonorthwash/ud-annotatrix/ UD annotatrix software].
 
|title=Make a video tutorial on annotating a dependency tree using the [https://github.com/jonorthwash/ud-annotatrix/ UD annotatrix software].
 
|description=Give step-by-step instructions for annotating a dependency tree with Annotatrix. Make sure you include all possibilities available in the app, for example tokenisation options.
 
|description=Give step-by-step instructions for annotating a dependency tree with Annotatrix. Make sure you include all possibilities available in the app, for example tokenisation options.
Line 87: Line 92:
 
$ svn diff --old apertium-pol.pol.dix@73196 --new apertium-pol.pol.dix@73199 > changes.diff
 
$ svn diff --old apertium-pol.pol.dix@73196 --new apertium-pol.pol.dix@73199 > changes.diff
 
}}
 
}}
{{Taskidea|type=quality|mentors=fotonzade, Jonathan
+
{{Taskidea|type=quality|mentors=fotonzade, Jonathan, Xavivars, Marc Riera, mlforcada
 
|tags=xml, dictionaries, svn
 
|tags=xml, dictionaries, svn
 
|title=Add 200 new entries to a bidix to language pair %AAA%-%BBB%
 
|title=Add 200 new entries to a bidix to language pair %AAA%-%BBB%
|description=Our translation systems require large lexicons so as to provide production-quality coverage of any input data. This task requires the student to add 500 new words to a bidirectional dictionary.
+
|description=Our translation systems require large lexicons so as to provide production-quality coverage of any input data. This task requires the student to add 200 new words to a bidirectional dictionary.
 
|multi=yes
 
|multi=yes
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
{{Taskidea|type=quality|mentors=fotonzade, Jonathan
+
{{Taskidea|type=quality|mentors=fotonzade, Jonathan, Xavivars, Marc Riera, mlforcada
 
|tags=xml, dictionaries, svn
 
|tags=xml, dictionaries, svn
 
|title=Add 500 new entries to a bidix to language pair %AAA%-%BBB%
 
|title=Add 500 new entries to a bidix to language pair %AAA%-%BBB%
Line 100: Line 105:
 
|multi=yes
 
|multi=yes
 
}}
 
}}
{{Taskidea|type=quality|mentors=fotonzade|tags=disambiguation, svn
+
{{Taskidea|type=quality|mentors=fotonzade, Xavivars, Marc Riera, mlforcada
  +
|tags=disambiguation, svn
 
|title=Disambiguate 500 tokens of text in %AAA%
 
|title=Disambiguate 500 tokens of text in %AAA%
 
|description=Run some text through a morphological analyser and disambiguate the output. Contact the mentor beforehand to approve the choice of language and text.
 
|description=Run some text through a morphological analyser and disambiguate the output. Contact the mentor beforehand to approve the choice of language and text.
Line 128: Line 134:
 
|title=conllu parser and searching
 
|title=conllu parser and searching
 
|description=Write a script (preferably in python3) that will parse files in conllu format, and perform basic searches, such as "find a node that has an nsubj relation to another node that has a noun POS" or "find all nodes with a cop label and a past feature"
 
|description=Write a script (preferably in python3) that will parse files in conllu format, and perform basic searches, such as "find a node that has an nsubj relation to another node that has a noun POS" or "find all nodes with a cop label and a past feature"
|tags=python,dependencies
+
|tags=python, dependencies
|mentors=Jonathan, Fran, Wei En
+
|mentors=Jonathan, Fran, Wei En, Anna
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 160: Line 166:
 
|type=code
 
|type=code
 
|title=add an option for reverse compiling to the [[lsx module]]
 
|title=add an option for reverse compiling to the [[lsx module]]
|mentors=Jonathan, Fran, Wei En, Irene
+
|mentors=Jonathan, Fran, Wei En, Irene, Xavivars
 
|description=this should be simple as it can just leverage the existing lttoolbox options for left-right / right-left compiling
 
|description=this should be simple as it can just leverage the existing lttoolbox options for left-right / right-left compiling
 
|tags=C++, transducers, lsx
 
|tags=C++, transducers, lsx
  +
}}{{Taskidea
  +
|type=quality, code
  +
|title=clean up lsx-comp
  +
|mentors=Jonathan, Fran, Wei En, Irene, Xavivars
  +
|description=remove extraneous functions from lsx-comp and clean up the code
  +
|tags=C++, transducers, lsx
  +
}}{{Taskidea
  +
|type=quality, code
  +
|title=clean up lsx-proc
  +
|mentors=Jonathan, Fran, Wei En, Irene, Xavivars
  +
|description=remove extraneous functions from lsx-proc and clean up the code
  +
|tags=C++, transducers, lsx
  +
}}{{Taskidea
  +
|type=documentation
  +
|title=document usage of the lsx module
  +
|mentors= Irene
  +
|description= document which language pairs have included the lsx module in their packages, which have beta-tested the lsx module, and which are good candidates for including support for lsx. Add your findings to [[Lsx_module/supported_languages | this wiki page]]
  +
|tags=C++, transducers, lsx
  +
|beginner=yes
 
}}{{Taskidea
 
}}{{Taskidea
 
|type=quality
 
|type=quality
  +
|title=beta testing the lsx-module
|title=remove extraneous functions from lsx-comp and clean up the code
 
 
|mentors=Jonathan, Fran, Wei En, Irene
 
|mentors=Jonathan, Fran, Wei En, Irene
  +
|description= [[Lsx_module#Creating_the_lsx-dictionary|create an lsx dictionary]] for any relevant and existing language pair that doesn't yet support it, adding 10-30 entries to it. Thoroughly test to make sure the output is as expected. Report bugs/non-supported features and add them to [[Lsx_module#Future_work| future work]]. Document your tested language pair by listing it under [[Lsx_module#Beta_testing]] and in [[Lsx_module/supported_languages | this wiki page]]
|description=
 
 
|tags=C++, transducers, lsx
 
|tags=C++, transducers, lsx
  +
|multi=yes
  +
|dup=yes
 
}}{{Taskidea
 
}}{{Taskidea
|type=quality
+
|type=code
|title=remove extraneous functions from lsx-proc and clean up the code
+
|title=fix an lsx bug / add an lsx feature
 
|mentors=Jonathan, Fran, Wei En, Irene
 
|mentors=Jonathan, Fran, Wei En, Irene
  +
|description= if you've done the above task (beta testing the lsx-module) and discovered any bugs or unsupported features, fix them.
|description=
 
 
|tags=C++, transducers, lsx
 
|tags=C++, transducers, lsx
  +
|multi=yes
  +
|dup=yes
 
}}{{Taskidea
 
}}{{Taskidea
 
|type=code
 
|type=code
Line 183: Line 212:
 
}}{{Taskidea
 
}}{{Taskidea
 
|type=quality,code
 
|type=quality,code
  +
|tags=issues
 
|title=fix any open ticket
 
|title=fix any open ticket
 
|description=Fix any open ticket in any of our issues trackers: [https://sourceforge.net/p/apertium/tickets/ main], [https://github.com/goavki/apertium-html-tools/issues html-tools], [https://github.com/goavki/phenny/issues begiak]. When you claim this task, let your mentor know which issue you plan to work on.
 
|description=Fix any open ticket in any of our issues trackers: [https://sourceforge.net/p/apertium/tickets/ main], [https://github.com/goavki/apertium-html-tools/issues html-tools], [https://github.com/goavki/phenny/issues begiak]. When you claim this task, let your mentor know which issue you plan to work on.
Line 262: Line 292:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] supports translation using language variants. However, we do not have first-class style/interface support for it. This task requires speaking with mentors/reading existing discussion to understand the problem and then produce design mockups for a solution. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/82 #82]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] supports translation using language variants. However, we do not have first-class style/interface support for it. This task requires speaking with mentors/reading existing discussion to understand the problem and then produce design mockups for a solution. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/82 #82]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan, Fran, Shardul
+
|mentors=Sushain, Jonathan, Fran, Shardul, Xavivars
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 269: Line 299:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Significant progress has been made towards providing a dictionary-style interface within [https://github.com/goavki/apertium-html-tools html-tools]. This task requires refining the existing [https://github.com/goavki/apertium-html-tools/pull/184 PR] by de-conflicting it with master and resolving the interface concerns discussed [https://github.com/goavki/apertium-html-tools/pull/184#issuecomment-323597780 here]. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/105 #105]) and asynchronous discussion should occur there.
 
|description=Significant progress has been made towards providing a dictionary-style interface within [https://github.com/goavki/apertium-html-tools html-tools]. This task requires refining the existing [https://github.com/goavki/apertium-html-tools/pull/184 PR] by de-conflicting it with master and resolving the interface concerns discussed [https://github.com/goavki/apertium-html-tools/pull/184#issuecomment-323597780 here]. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/105 #105]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan
+
|mentors=Sushain, Jonathan, Xavivars
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 276: Line 306:
 
|tags=html, css, web
 
|tags=html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] has inline styles. These are hard to maintain and widely considered bad style. This task requires surveying the uses, removing all of them in a clean manner, i.e. semantically, and re-enabling the linter rule that will prevent them going forward. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/114 #114]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] has inline styles. These are hard to maintain and widely considered bad style. This task requires surveying the uses, removing all of them in a clean manner, i.e. semantically, and re-enabling the linter rule that will prevent them going forward. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/114 #114]) and asynchronous discussion should occur there.
|mentors=Sushain, Shardul
+
|mentors=Sushain, Shardul, Xavivars
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
Line 454: Line 484:
 
}}{{Taskidea
 
}}{{Taskidea
 
|type=research
 
|type=research
|mentors=Fran
+
|mentors=Fran, Xavivars
 
|title=Phrasebooks and frequency
 
|title=Phrasebooks and frequency
 
|description=Apertium generally handles phrasebook-style sentences poorly in most languages. Try translating "what's up" from English to Spanish. The objective of this task is to look for phrasebook/filler type sentences/utterances in parallel corpora of film subtitles and on the internet and order them by frequency/generality. Frequency is the number of times you see the utterance; generality is the number of different places you see it.
 
|description=Apertium generally handles phrasebook-style sentences poorly in most languages. Try translating "what's up" from English to Spanish. The objective of this task is to look for phrasebook/filler type sentences/utterances in parallel corpora of film subtitles and on the internet and order them by frequency/generality. Frequency is the number of times you see the utterance; generality is the number of different places you see it.
Line 470: Line 500:
 
{{Taskidea
 
{{Taskidea
 
|type=research
 
|type=research
|mentors=Vin
+
|mentors=Vin, Jonathan, Anna
 
|title=Create a UD-Apertium morphology mapping
 
|title=Create a UD-Apertium morphology mapping
|description=Choose a language that has a Universal Dependencies treebank and tabulate a potential set of Apertium morph labels based on the (universal) UD morph labels
+
|description=Choose a language that has a Universal Dependencies treebank and tabulate a potential set of Apertium morph labels based on the (universal) UD morph labels. See Apertium's [[list of symbols]] and [http://universaldependencies.org/ UD]'s POS and feature tags for the labels.
|tags=morphology, ud
+
|tags=morphology, ud, dependencies
 
|beginner=
 
|beginner=
 
|multi=5
 
|multi=5
Line 479: Line 509:
 
{{Taskidea
 
{{Taskidea
 
|type=research
 
|type=research
|mentors=Vin
+
|mentors=Vin, Jonathan, Anna
 
|title=Create an Apertium-UD morphology mapping
 
|title=Create an Apertium-UD morphology mapping
 
|description=Choose a language that has an Apertium morphological analyser and adapt it to convert the morphology to UD morphology
 
|description=Choose a language that has an Apertium morphological analyser and adapt it to convert the morphology to UD morphology
|tags=morphology, ud
+
|tags=morphology, ud, dependencies
 
|beginner=
 
|beginner=
 
|multi=5
 
|multi=5
Line 499: Line 529:
 
|mentors=Vin
 
|mentors=Vin
 
|title=Create a syntactic analogy corpus for a particular POS/language.
 
|title=Create a syntactic analogy corpus for a particular POS/language.
|description=Refer to the syntactic section of [this paper](https://www.aclweb.org/anthology/N/N16/N16-2002.pdf). Try to create a data set with more than 2000 * 8 = 16000 entries for a particular POS with any language, using a large corpus for frequency.
+
|description=Refer to the syntactic section of [https://www.aclweb.org/anthology/N/N16/N16-2002.pdf this paper]. Try to create a data set with more than 2000 * 8 = 16000 entries for a particular POS with any language, using a large corpus for frequency.
 
|tags=morphology, embeddings
 
|tags=morphology, embeddings
 
|beginner=
 
|beginner=
Line 520: Line 550:
 
|tags=python,morphology
 
|tags=python,morphology
 
|beginner=
 
|beginner=
  +
}}
  +
{{Taskidea
  +
|type=research,quality
  +
|mentors=Shardul, Jonathan
  +
|tags=issues, python
  +
|title=Clean up open issues in [https://github.com/goavki/apertium-html-tools/issues html-tools], [https://github.com/goavki/phenny/issues begiak], or [https://github.com/goavki/apertium-apy/issues APy]
  +
|description=Go through issue threads for [https://github.com/goavki/apertium-html-tools/issues html-tools], [https://github.com/goavki/phenny/issues begiak], or [https://github.com/goavki/apertium-apy/issues APy], and find issues that have been solved in the code but are still open on GitHub. (The fact that they have been solved may not be evident from the comments thread alone.) Once you find such an issue, comment on the thread explaining what code/commit fixed it and how it behaves at the latest revision.
  +
|multi=15
  +
}}
  +
{{Taskidea
  +
|type=code,quality
  +
|mentors=Shardul, Jonathan
  +
|tags=tests, python, IRC
  +
|title=Get [https://github.com/goavki/phenny begiak] to build cleanly
  +
|description=Currently, [https://github.com/goavki/phenny begiak] does not build cleanly because of a number of failing tests. Find what is causing the tests to fail, and either fix the code or the tests if the code has changed its behavior. Document all your changes in the PR that you create.
  +
}}
  +
{{Taskidea
  +
|type=quality
  +
|mentors=Jonathan, Ilnar
  +
|title=Find stems in the Kazakh treebank that are not in the Kazakh analyser
  +
|description=There are quite a few analyses in the [http://svn.code.sf.net/p/apertium/svn/languages/apertium-kaz/texts/puupankki/puupankki.kaz.conllu Kazakh treebank] that don't exist in the [[apertium-kaz|Kazakh analyser]]. Find as many examples of missing stems as you can. Feel free to write a script to automate this so it's as exhaustive (and non-exhausting:) as possible. You may either add what you find to the analyser yourself, commit a list of the missing stems to apertium-kaz/dev, or send a list to your mentor so that they may do one of these.
  +
|tags=treebank, Kazakh, analyses
  +
|beginner=yes
  +
}}
  +
{{Taskidea
  +
|type=quality
  +
|mentors=Jonathan, Ilnar
  +
|title=Find missing analyses in the Kazakh treebank that are not in the Kazakh analyser
  +
|description=There are quite a few analyses in the [http://svn.code.sf.net/p/apertium/svn/languages/apertium-kaz/texts/puupankki/puupankki.kaz.conllu Kazakh treebank] that don't exist in the [[apertium-kaz|Kazakh analyser]]. Find as many examples of missing analyses (for existing stems) as you can. Feel free to write a script to automate this so it's as exhaustive (and non-exhausting:) as possible. You may commit a list of the missing analyses to apertium-kaz/dev or send a list to your mentor so that they may do this.
  +
|tags=treebank, Kazakh, analyses
  +
|beginner=yes
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Use apertium-init to bootstrap a new language module
  +
|description=Use [[Apertium-init]] to bootstrap a new language module that doesn't currently exist in Apertium. To see if a language is available, check [[languages]] and [[incubator]], and especially ask on IRC. Add enough stems and morphology to the module so that it analyses and generates at least 100 correct forms. Check your code into Apertium's codebase. [[Task ideas for Google Code-in/Add words from frequency list|Read more about adding stems...]]
  +
|tags=languages, bootstrap, dictionaries
  +
|beginner=yes
  +
|multi=25
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Use apertium-init to bootstrap a new language pair
  +
|description=Use [[Apertium-init]] to bootstrap a new translation pair between two languages which have monolingual modules already in Apertium. To see if a translation pair has already been made, check our [[SVN]] repository, and especially ask on IRC. Add 100 common stems to the dictionary. Check your work into Apertium's codebase.
  +
|tags=languages, bootstrap, dictionaries, translators
  +
|beginner=yes
  +
|multi=25
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan, mlforcada
  +
|title=Add a transfer rule to an existing translation pair
  +
|description=Add a transfer rule to an existing translation pair that fixes an error in translation. Document the rule on the [http://wiki.apertium.org/ Apertium wiki] by adding a [[regression testing|regression tests]] page similar to [[English_and_Portuguese/Regression_tests]] or [[Icelandic_and_English/Regression_tests]]. Check your code into Apertium's codebase. [[Task ideas for Google Code-in/Add transfer rule|Read more...]]
  +
|tags=languages, bootstrap, transfer
  +
|multi=25
  +
|dup=5
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Add stems to an existing translation pair
  +
|description=Add 1000 common stems to the dictionary of an existing translation pair. Check your work into Apertium's codebase. [[Task ideas for Google Code-in/Add words from frequency list|Read more about adding stems...]]
  +
|tags=languages, bootstrap, dictionaries, translators
  +
|multi=25
  +
|dup=5
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Write 10 lexical selection rules for an existing translation pair
  +
|description=Add 10 lexical selection rules to an existing translation pair. Check your work into Apertium's codebase. [[Task ideas for Google Code-in/Add lexical-select rules|Read more...]]
  +
|tags=languages, bootstrap, lexical selection, translators
  +
|multi=25
  +
|dup=5
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Write 10 constraint grammar rules for an existing language module
  +
|description=Add 10 constraint grammar rules to an existing language that you know. Check your work into Apertium's codebase. [[Task ideas for Google Code-in/Add constraint-grammar rules|Read more...]]
  +
|tags=languages, bootstrap, constraint grammar
  +
|multi=25
  +
|dup=5
  +
}}
  +
{{Taskidea
  +
|type=code,interface
  +
|mentors=Jonathan
  +
|title=Paradigm generator webpage
  +
|description=Write a standalone webpage that makes queries (through JavaScript) to an [[apertium-apy]] server to fill in morphological forms based on morphological tags that are hidden throughout the body of the page. For example, say you have the verb "say", and some tags like inf, past, pres.p3.sg; these forms would be filled in as "say", "said", "says".
  +
|tags=javascript, html, apy
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Anna
  +
|title=Train a new model for syntactic function labeller
  +
|description=Choose one of the languages Apertium uses in language pairs and prepare training data for the labeller from its UD treebank: replace UD tags with Apertium tags, parse the treebank, create fastText embeddings. Then train a new model on this data and evaluate its accuracy.
  +
|tags=python, UD, embeddings, machine learning
  +
|multi=5
  +
}}
  +
{{Taskidea
  +
|type=code,quality
  +
|mentors=Anna
  +
|title=Tuning a learning rate for syntactic function labeller's RNN
  +
|description=The syntactic function labeller uses an RNN to train on and predict the syntactic functions of words. Current models can be improved by tuning training parameters, e.g. the learning rate.
  +
|tags=python, machine learning
 
}}
 
}}
 
</table>
 
</table>
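The "conllu parser and searching" task in the table above asks for a script that parses CoNLL-U files and runs basic searches. What follows is only a minimal sketch of one possible approach, assuming the standard ten tab-separated CoNLL-U columns. The function names (`parse_conllu`, `find_dependents`, `find_by_label_and_feature`) and the sample sentence are hypothetical and are not part of any Apertium tool.

```python
from typing import Dict, List

# Column names of the CoNLL-U format, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

Token = Dict[str, str]


def parse_conllu(text: str) -> List[List[Token]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):  # skip malformed lines
            continue
        if "-" in cols[0] or "." in cols[0]:
            continue                  # skip multiword ranges and empty nodes
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


def find_dependents(sentence: List[Token], deprel: str,
                    head_upos: str) -> List[Token]:
    """Tokens with relation `deprel` to a head whose UPOS is `head_upos`."""
    by_id = {tok["id"]: tok for tok in sentence}
    found = []
    for tok in sentence:
        head = by_id.get(tok["head"])  # None for the root (head "0")
        if tok["deprel"] == deprel and head and head["upos"] == head_upos:
            found.append(tok)
    return found


def find_by_label_and_feature(sentence: List[Token], deprel: str,
                              feature: str) -> List[Token]:
    """Tokens with relation `deprel` that carry `feature`, e.g. Tense=Past."""
    return [tok for tok in sentence
            if tok["deprel"] == deprel and feature in tok["feats"].split("|")]


# Hypothetical sample sentence, built with explicit tab separators.
SAMPLE = "\n".join([
    "# text = John was a teacher",
    "1\tJohn\tJohn\tPROPN\t_\tNumber=Sing\t4\tnsubj\t_\t_",
    "2\twas\tbe\tAUX\t_\tTense=Past\t4\tcop\t_\t_",
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tteacher\tteacher\tNOUN\t_\tNumber=Sing\t0\troot\t_\t_",
    "",
])

sentences = parse_conllu(SAMPLE)
subjects = [t["form"] for t in find_dependents(sentences[0], "nsubj", "NOUN")]
past_cops = [t["form"] for t in
             find_by_label_and_feature(sentences[0], "cop", "Tense=Past")]
```

The two search helpers correspond to the two example queries quoted in the task description. A real solution would probably want a more general query mechanism than one function per query shape.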

Revision as of 10:50, 15 November 2017


This is the task ideas page for Google Code-in. Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

The mentors column lists the people you should contact to request further information. Each task is estimated to take an experienced developer at most 2 hours; however:

  1. this estimate does not include the time needed to install and set up Apertium (and relevant tools).
  2. this is the time an experienced developer is expected to need, so you may spend more time on the task because of the learning curve.

Categories:

  • code: Tasks related to writing or refactoring code
  • documentation: Tasks related to creating/editing documents and helping others learn more
  • research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
  • quality: Tasks related to testing and ensuring code is of high quality.
  • interface: Tasks related to user experience research or user interface design and interaction

Clarification of "multiple task" types

  • multi = number of students who can do a given task
  • dup = number of times a student can do the same task

You can find descriptions of some of the mentors here.

Task ideas
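One of the annotatrix tasks asks for a function that detects invalid dependency trees, such as those containing cycles. The actual task targets the JavaScript code of UD Annotatrix. The Python sketch below only illustrates the idea and assumes a tree is given as a mapping from token id to head id, with 0 standing for the root. The function name `find_cycle` and that input shape are assumptions made for this example.

```python
from typing import Dict, List


def find_cycle(heads: Dict[int, int]) -> List[int]:
    """Return the token ids of one cycle, or [] if the heads form no cycle.

    `heads` maps each token id to the id of its head; 0 means the root.
    """
    ON_PATH, DONE = 1, 2
    state: Dict[int, int] = {}
    for start in heads:
        path: List[int] = []
        node = start
        # Follow head pointers until we reach the root, an unknown id,
        # or a token that has already been visited.
        while node != 0 and node in heads and node not in state:
            state[node] = ON_PATH
            path.append(node)
            node = heads[node]
        if state.get(node) == ON_PATH:
            # We walked back into the current path: that tail is a cycle.
            return path[path.index(node):]
        for visited in path:
            state[visited] = DONE
    return []


# Hypothetical examples: a valid tree and a tree where 1 -> 2 -> 3 -> 1.
valid_tree = {1: 2, 2: 0, 3: 2}
cyclic_tree = {1: 2, 2: 3, 3: 1, 4: 0}
print(find_cycle(valid_tree))
print(find_cycle(cyclic_tree))
```

Each token is visited at most once, so the check runs in time linear in the number of tokens. The returned ids could be used to tell the user which arcs to fix.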