Difference between revisions of "Task ideas for Google Code-in"

(46 intermediate revisions by 11 users not shown)
Line 23: Line 23:
   
 
==Task ideas==
 
==Task ideas==
<table class="sortable wikitable">
+
<table class="sortable wikitable" style="display: none">
  +
<!-- THE TASKS NEED TO BE HIDDEN FOR NOW,
  +
but feel free to remove style="display: none" to preview changes to this page.
  +
Just remember to put it back before saving
  +
JNW 2017-10-30
  +
-->
 
<tr><th>type</th><th>title</th><th>description</th><th>tags</th><th>mentors</th><th>bgnr?</th><th>multi?</th><th>duplicates</th></tr>
 
<tr><th>type</th><th>title</th><th>description</th><th>tags</th><th>mentors</th><th>bgnr?</th><th>multi?</th><th>duplicates</th></tr>
  +
{{Taskidea
  +
|type=research
  +
|title=Document resources for a language
  +
|description=Document resources for a language without resources already documented on the Apertium wiki. [[Task ideas for Google Code-in/Documentation of resources|read more...]]
  +
|tags=wiki, languages
  +
|mentors=Jonathan, Vin, Xavivars, Marc Riera
  +
|multi=40
  +
|beginner=yes
  +
}}{{Taskidea
  +
|type=research
  +
|title=Write a contrastive grammar
  +
|description=Document 6 differences between two (preferably related) languages and where they would need to be addressed in the [[Apertium pipeline]] (morph analysis, transfer, etc). Use a grammar book/resource for inspiration. Each difference should have no fewer than 3 examples. Put your work on the Apertium wiki under [[Language1_and_Language2/Contrastive_grammar]]. See [[Farsi_and_English/Pending_tests]] for an example of a contrastive grammar that a previous GCI student made.
  +
|mentors=Vin, Jonathan, Fran, mlforcada
  +
|tags=wiki, languages
  +
|beginner=yes
  +
|multi=40
  +
}}
 
{{Taskidea|type=interface|mentors=Fran, Masha, Jonathan
 
{{Taskidea|type=interface|mentors=Fran, Masha, Jonathan
 
|tags=annotation, annotatrix
 
|tags=annotation, annotatrix
Line 37: Line 59:
 
}}
 
}}
 
{{Taskidea|type=code|mentors=Fran, Masha, Jonathan, Vin
 
{{Taskidea|type=code|mentors=Fran, Masha, Jonathan, Vin
|tags=annotation, annotatrix, javascript
+
|tags=annotation, annotatrix, javascript, dependencies
 
|title=SDparse to CoNLL-U converter in JavaScript
 
|title=SDparse to CoNLL-U converter in JavaScript
 
|description=SDparse is a format for describing dependency trees; each entry looks like relation(head, dependent). CoNLL-U is another
 
|description=SDparse is a format for describing dependency trees; each entry looks like relation(head, dependent). CoNLL-U is another
 
format for describing dependency trees. Make a converter between the two formats. You will probably need to learn more about the specifics of these formats. The GitHub issue is [https://github.com/jonorthwash/ud-annotatrix/issues/88 here].
 
format for describing dependency trees. Make a converter between the two formats. You will probably need to learn more about the specifics of these formats. The GitHub issue is [https://github.com/jonorthwash/ud-annotatrix/issues/88 here].
 
}}
 
}}
{{Taskidea|type=quality|mentors=Fran, Masha
+
{{Taskidea|type=quality|mentors=Fran, Masha, Vin
 
|tags=annotation, annotatrix
 
|tags=annotation, annotatrix
 
|title=Write a test for the format converters in annotatrix
 
|title=Write a test for the format converters in annotatrix
Line 53: Line 75:
 
|description=It is possible to detect invalid trees (such as those that have cycles). We would like to write a function to detect those kinds of trees and advise the user. The GitHub issue is [https://github.com/jonorthwash/ud-annotatrix/issues/96 here].
 
|description=It is possible to detect invalid trees (such as those that have cycles). We would like to write a function to detect those kinds of trees and advise the user. The GitHub issue is [https://github.com/jonorthwash/ud-annotatrix/issues/96 here].
 
}}
 
}}
{{Taskidea|type=documentation|mentors=Fran, Masha, Jonathan
+
{{Taskidea|type=documentation|mentors=Fran, Masha, Jonathan, Vin
|tags=annotation, annotatrix
+
|tags=annotation, annotatrix, dependencies
 
|title=Write a tutorial on how to use annotatrix to annotate a dependency tree
 
|title=Write a tutorial on how to use annotatrix to annotate a dependency tree
 
|description=Give step-by-step instructions for annotating a dependency tree with Annotatrix. Make sure you include all possibilities in the app, for example tokenisation options.
 
|description=Give step-by-step instructions for annotating a dependency tree with Annotatrix. Make sure you include all possibilities in the app, for example tokenisation options.
 
}}
 
}}
{{Taskidea|type=documentation|mentors=Fran, Masha
+
{{Taskidea|type=documentation|mentors=Fran, Masha, Vin
|tags=annotation, annotatrix, video
+
|tags=annotation, annotatrix, video, dependencies
 
|title=Make a video tutorial on annotating a dependency tree using the [https://github.com/jonorthwash/ud-annotatrix/ UD annotatrix software].
 
|title=Make a video tutorial on annotating a dependency tree using the [https://github.com/jonorthwash/ud-annotatrix/ UD annotatrix software].
 
|description=Give step-by-step instructions for annotating a dependency tree with Annotatrix. Make sure you include all possibilities available in the app, for example tokenisation options.
 
|description=Give step-by-step instructions for annotating a dependency tree with Annotatrix. Make sure you include all possibilities available in the app, for example tokenisation options.
Line 70: Line 92:
 
$ svn diff --old apertium-pol.pol.dix@73196 --new apertium-pol.pol.dix@73199 > changes.diff
 
$ svn diff --old apertium-pol.pol.dix@73196 --new apertium-pol.pol.dix@73199 > changes.diff
 
}}
 
}}
{{Taskidea|type=quality|mentors=fotonzade, Jonathan
+
{{Taskidea|type=quality|mentors=fotonzade, Jonathan, Xavivars, Marc Riera, mlforcada
 
|tags=xml, dictionaries, svn
 
|tags=xml, dictionaries, svn
 
|title=Add 200 new entries to a bidix to language pair %AAA%-%BBB%
 
|title=Add 200 new entries to a bidix to language pair %AAA%-%BBB%
|description=Our translation systems require large lexicons so as to provide production-quality coverage of any input data. This task requires the student to add 500 new words to a bidirectional dictionary.
+
|description=Our translation systems require large lexicons so as to provide production-quality coverage of any input data. This task requires the student to add 200 new words to a bidirectional dictionary.
 
|multi=yes
 
|multi=yes
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
{{Taskidea|type=quality|mentors=fotonzade, Jonathan
+
{{Taskidea|type=quality|mentors=fotonzade, Jonathan, Xavivars, Marc Riera, mlforcada
 
|tags=xml, dictionaries, svn
 
|tags=xml, dictionaries, svn
 
|title=Add 500 new entries to a bidix to language pair %AAA%-%BBB%
 
|title=Add 500 new entries to a bidix to language pair %AAA%-%BBB%
Line 83: Line 105:
 
|multi=yes
 
|multi=yes
 
}}
 
}}
{{Taskidea|type=quality|mentors=fotonzade|tags=disambiguation, svn
+
{{Taskidea|type=quality|mentors=fotonzade, Xavivars, Marc Riera, mlforcada
  +
|tags=disambiguation, svn
 
|title=Disambiguate 500 tokens of text in %AAA%
 
|title=Disambiguate 500 tokens of text in %AAA%
 
|description=Run some text through a morphological analyser and disambiguate the output. Contact the mentor beforehand to approve the choice of language and text.
 
|description=Run some text through a morphological analyser and disambiguate the output. Contact the mentor beforehand to approve the choice of language and text.
Line 96: Line 119:
 
{{Taskidea
 
{{Taskidea
 
|type=documentation
 
|type=documentation
|mentors=Jonathan
+
|mentors=Jonathan, Flammie
 
|title=add comments to .dix file symbol definitions
 
|title=add comments to .dix file symbol definitions
 
|tags=dix
 
|tags=dix
Line 111: Line 134:
 
|title=conllu parser and searching
 
|title=conllu parser and searching
 
|description=Write a script (preferably in python3) that will parse files in conllu format, and perform basic searches, such as "find a node that has an nsubj relation to another node that has a noun POS" or "find all nodes with a cop label and a past feature"
 
|description=Write a script (preferably in python3) that will parse files in conllu format, and perform basic searches, such as "find a node that has an nsubj relation to another node that has a noun POS" or "find all nodes with a cop label and a past feature"
|tags=python,dependencies
+
|tags=python, dependencies
|mentors=Jonathan, Fran, Wei En
+
|mentors=Jonathan, Fran, Wei En, Anna
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 143: Line 166:
 
|type=code
 
|type=code
 
|title=add an option for reverse compiling to the [[lsx module]]
 
|title=add an option for reverse compiling to the [[lsx module]]
|mentors=Jonathan, Fran, Wei En, Irene
+
|mentors=Jonathan, Fran, Wei En, Irene, Xavivars
 
|description=this should be simple as it can just leverage the existing lttoolbox options for left-right / right-left compiling
 
|description=this should be simple as it can just leverage the existing lttoolbox options for left-right / right-left compiling
 
|tags=C++, transducers, lsx
 
|tags=C++, transducers, lsx
  +
}}{{Taskidea
  +
|type=quality, code
  +
|title=clean up lsx-comp
  +
|mentors=Jonathan, Fran, Wei En, Irene, Xavivars
  +
|description=remove extraneous functions from lsx-comp and clean up the code
  +
|tags=C++, transducers, lsx
  +
}}{{Taskidea
  +
|type=quality, code
  +
|title=clean up lsx-proc
  +
|mentors=Jonathan, Fran, Wei En, Irene, Xavivars
  +
|description=remove extraneous functions from lsx-proc and clean up the code
  +
|tags=C++, transducers, lsx
  +
}}{{Taskidea
  +
|type=documentation
  +
|title=document usage of the lsx module
  +
|mentors= Irene
  +
|description=Document which language pairs include the lsx module in their packages, which have beta-tested the lsx module, and which are good candidates for lsx support. Add your findings to [[Lsx_module/supported_languages | this wiki page]].
  +
|tags=C++, transducers, lsx
  +
|beginner=yes
 
}}{{Taskidea
 
}}{{Taskidea
 
|type=quality
 
|type=quality
  +
|title=beta testing the lsx-module
|title=remove extraneous functions from lsx-comp and clean up the code
 
 
|mentors=Jonathan, Fran, Wei En, Irene
 
|mentors=Jonathan, Fran, Wei En, Irene
  +
|description=[[Lsx_module#Creating_the_lsx-dictionary|Create an lsx dictionary]] for any relevant existing language pair that doesn't yet support it, adding 10-30 entries to it. Test thoroughly to make sure the output is as expected. Report bugs and unsupported features, and add them to [[Lsx_module#Future_work|future work]]. Document your tested language pair by listing it under [[Lsx_module#Beta_testing]] and in [[Lsx_module/supported_languages|this wiki page]].
|description=
 
 
|tags=C++, transducers, lsx
 
|tags=C++, transducers, lsx
  +
|multi=yes
  +
|dup=yes
 
}}{{Taskidea
 
}}{{Taskidea
|type=quality
+
|type=code
|title=remove extraneous functions from lsx-proc and clean up the code
+
|title=fix an lsx bug / add an lsx feature
 
|mentors=Jonathan, Fran, Wei En, Irene
 
|mentors=Jonathan, Fran, Wei En, Irene
  +
|description=If you've done the "beta testing the lsx-module" task and discovered any bugs or unsupported features, fix them.
|description=
 
 
|tags=C++, transducers, lsx
 
|tags=C++, transducers, lsx
  +
|multi=yes
  +
|dup=yes
 
}}{{Taskidea
 
}}{{Taskidea
 
|type=code
 
|type=code
 
|title=script to test coverage over wikipedia corpus
 
|title=script to test coverage over wikipedia corpus
|mentors=Jonathan, Wei En
+
|mentors=Jonathan, Wei En, Shardul
 
|description=Write a script (in python or ruby) that in one mode checks out a specified language module to a given directory, compiles it (or updates it if it already exists), and then gets the most recent nightly Wikipedia archive for that language and runs coverage over it (keeping as much in RAM as possible). In another mode, it compiles the language pair in a Docker instance that it then disposes of after successfully running coverage. Scripts already exist in Apertium for finding where a Wikipedia is, extracting a Wikipedia archive into a text file, and running coverage.
 
|description=Write a script (in python or ruby) that in one mode checks out a specified language module to a given directory, compiles it (or updates it if it already exists), and then gets the most recent nightly Wikipedia archive for that language and runs coverage over it (keeping as much in RAM as possible). In another mode, it compiles the language pair in a Docker instance that it then disposes of after successfully running coverage. Scripts already exist in Apertium for finding where a Wikipedia is, extracting a Wikipedia archive into a text file, and running coverage.
 
|tags=python, ruby, wikipedia
 
|tags=python, ruby, wikipedia
 
}}{{Taskidea
 
}}{{Taskidea
 
|type=quality,code
 
|type=quality,code
  +
|tags=issues
 
|title=fix any open ticket
 
|title=fix any open ticket
 
|description=Fix any open ticket in any of our issue trackers: [https://sourceforge.net/p/apertium/tickets/ main], [https://github.com/goavki/apertium-html-tools/issues html-tools], [https://github.com/goavki/phenny/issues begiak]. When you claim this task, let your mentor know which issue you plan to work on.
 
|description=Fix any open ticket in any of our issue trackers: [https://sourceforge.net/p/apertium/tickets/ main], [https://github.com/goavki/apertium-html-tools/issues html-tools], [https://github.com/goavki/phenny/issues begiak]. When you claim this task, let your mentor know which issue you plan to work on.
|mentors=Jonathan, Wei En, Sushain
+
|mentors=Jonathan, Wei En, Sushain, Shardul
 
|multi=25
 
|multi=25
 
|dup=10
 
|dup=10
Line 177: Line 224:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, apertium.org and generally any [https://github.com/goavki/apertium-html-tools html-tools] installation fails lots of Chrome audit tests. As many as possible should be fixed. Ones that require substantial work should be filed as tickets and measures should be taken to prevent problems from reappearing (e.g. a test or linter rule). More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/201 #201]) and asynchronous discussion should occur there.
 
|description=Currently, apertium.org and generally any [https://github.com/goavki/apertium-html-tools html-tools] installation fails lots of Chrome audit tests. As many as possible should be fixed. Ones that require substantial work should be filed as tickets and measures should be taken to prevent problems from reappearing (e.g. a test or linter rule). More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/201 #201]) and asynchronous discussion should occur there.
|mentors=Jonathan, Sushain
+
|mentors=Jonathan, Sushain, Shardul
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 184: Line 231:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] uses Bootstrap 3.x. Bootstrap 4 beta is out and we can upgrade (hopefully)! If an upgrade is not possible, you should document why it's not and ensure that it's easy to upgrade when the blockers are removed. More information may be available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/200 #200]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] uses Bootstrap 3.x. Bootstrap 4 beta is out and we can upgrade (hopefully)! If an upgrade is not possible, you should document why it's not and ensure that it's easy to upgrade when the blockers are removed. More information may be available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/200 #200]) and asynchronous discussion should occur there.
|mentors=Sushain
+
|mentors=Sushain, Shardul
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
Line 192: Line 239:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] has an "APy" mode where users can easily test out the API. However, it doesn't display the actual URL of the API endpoint and it would be nice to show that to the user. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/147 #147]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] has an "APy" mode where users can easily test out the API. However, it doesn't display the actual URL of the API endpoint and it would be nice to show that to the user. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/147 #147]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan
+
|mentors=Sushain, Jonathan, Shardul
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
Line 200: Line 247:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] has no tests (sad!). This task requires researching what solutions there are for testing jQuery based web applications and putting one into place with a couple tests as a proof of concept. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/116 #116]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] has no tests (sad!). This task requires researching what solutions there are for testing jQuery based web applications and putting one into place with a couple tests as a proof of concept. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/116 #116]) and asynchronous discussion should occur there.
|mentors=Sushain
+
|mentors=Sushain, Shardul
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 207: Line 254:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] is capable of translating files. However, this translation does not always result in the file immediately being downloaded to the user on all browsers. It would be awesome if it did! This task requires researching what solutions there are and evaluating them against each other; it may result in a conclusion that it just isn't possible (yet). More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/97 #97]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] is capable of translating files. However, this translation does not always result in the file immediately being downloaded to the user on all browsers. It would be awesome if it did! This task requires researching what solutions there are and evaluating them against each other; it may result in a conclusion that it just isn't possible (yet). More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/97 #97]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan, Unhammer
+
|mentors=Sushain, Jonathan, Unhammer, Shardul
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 214: Line 261:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] relies on an API endpoint to translate documents, files, etc. However, when this API is down the interface also breaks! This task requires fixing this breakage. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/207 #207]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] relies on an API endpoint to translate documents, files, etc. However, when this API is down the interface also breaks! This task requires fixing this breakage. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/207 #207]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan
+
|mentors=Sushain, Jonathan, Shardul
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
Line 222: Line 269:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] is capable of displaying results/allowing input for RTL languages in a LTR context (e.g. we're translating Arabic in an English website). However, this doesn't always look exactly how it should look, i.e. things are not aligned correctly. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/49 #49]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] is capable of displaying results/allowing input for RTL languages in a LTR context (e.g. we're translating Arabic in an English website). However, this doesn't always look exactly how it should look, i.e. things are not aligned correctly. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/49 #49]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan
+
|mentors=Sushain, Jonathan, Shardul
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
Line 230: Line 277:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=There has been much demand for [https://github.com/goavki/apertium-html-tools html-tools] to support an interface for users making suggestions regarding e.g. incorrect translations (cf. Google Translate). An interface was designed for this purpose. However, since it has been a while since anyone touched it, the code now conflicts with the current master branch. This task requires de-conflicting this [https://github.com/goavki/apertium-html-tools/pull/74 branch] with master and providing screenshot/video(s) of the interface to show that it functions. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/74 #74]) and asynchronous discussion should occur there.
 
|description=There has been much demand for [https://github.com/goavki/apertium-html-tools html-tools] to support an interface for users making suggestions regarding e.g. incorrect translations (cf. Google Translate). An interface was designed for this purpose. However, since it has been a while since anyone touched it, the code now conflicts with the current master branch. This task requires de-conflicting this [https://github.com/goavki/apertium-html-tools/pull/74 branch] with master and providing screenshot/video(s) of the interface to show that it functions. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/74 #74]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan
+
|mentors=Sushain, Jonathan, Shardul
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 237: Line 284:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] supports website translation. However, if asked to translate itself, weird things happen and the interface does not properly load. This task requires figuring out the root problem and correcting the fault. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/203 #203]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] supports website translation. However, if asked to translate itself, weird things happen and the interface does not properly load. This task requires figuring out the root problem and correcting the fault. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/203 #203]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan
+
|mentors=Sushain, Jonathan, Shardul
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
Line 245: Line 292:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] supports translation using language variants. However, we do not have first-class style/interface support for it. This task requires speaking with mentors/reading existing discussion to understand the problem and then produce design mockups for a solution. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/82 #82]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] supports translation using language variants. However, we do not have first-class style/interface support for it. This task requires speaking with mentors/reading existing discussion to understand the problem and then produce design mockups for a solution. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/82 #82]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan, Fran
+
|mentors=Sushain, Jonathan, Fran, Shardul, Xavivars
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 252: Line 299:
 
|tags=javascript, html, css, web
 
|tags=javascript, html, css, web
 
|description=Significant progress has been made towards providing a dictionary-style interface within [https://github.com/goavki/apertium-html-tools html-tools]. This task requires refining the existing [https://github.com/goavki/apertium-html-tools/pull/184 PR] by de-conflicting it with master and resolving the interface concerns discussed [https://github.com/goavki/apertium-html-tools/pull/184#issuecomment-323597780 here]. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/105 #105]) and asynchronous discussion should occur there.
 
|description=Significant progress has been made towards providing a dictionary-style interface within [https://github.com/goavki/apertium-html-tools html-tools]. This task requires refining the existing [https://github.com/goavki/apertium-html-tools/pull/184 PR] by de-conflicting it with master and resolving the interface concerns discussed [https://github.com/goavki/apertium-html-tools/pull/184#issuecomment-323597780 here]. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/105 #105]) and asynchronous discussion should occur there.
|mentors=Sushain, Jonathan
+
|mentors=Sushain, Jonathan, Xavivars
 
}}
 
}}
 
{{Taskidea
 
{{Taskidea
Line 259: Line 306:
 
|tags=html, css, web
 
|tags=html, css, web
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] has inline styles. These are not very maintainable and are widely considered bad style. This task requires surveying the uses, removing all of them in a clean manner, i.e. semantically, and re-enabling the linter rule that will prevent them going forward. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/114 #114]) and asynchronous discussion should occur there.
 
|description=Currently, [https://github.com/goavki/apertium-html-tools html-tools] has inline styles. These are not very maintainable and are widely considered bad style. This task requires surveying the uses, removing all of them in a clean manner, i.e. semantically, and re-enabling the linter rule that will prevent them going forward. More information is available in the issue tracker ([https://github.com/goavki/apertium-html-tools/issues/114 #114]) and asynchronous discussion should occur there.
|mentors=Sushain
+
|mentors=Sushain, Shardul, Xavivars
 
|bgnr=yes
 
|bgnr=yes
 
}}
 
}}
Line 369: Line 416:
 
|mentors=Jonathan
 
|mentors=Jonathan
 
|tags=javascript
 
|tags=javascript
  +
}}{{Taskidea
  +
|type=code
  +
|title=Scrape Crimean Tatar Quran translation from a website
  +
|description=Bible and Quran translations often serve as a parallel corpus useful for solving NLP tasks because both texts are available in many languages. Your goal in this task is to write a program in the language of your choice which scrapes the Quran translation in the Crimean Tatar language available on the following website: http://crimean.org/islam/koran/dizen-qurtnezir/. You can adapt the scraper described on the [[Writing a scraper]] page or write your own from scratch. The output should be plain text in Tanzil format ('text with aya numbers'). You can see examples of that format on http://tanzil.net/trans/ page. When scraping, please be polite and request data at a reasonable rate.
  +
|mentors=Ilnar, Jonathan, fotonzade
  +
|tags=scraper
  +
}}{{Taskidea
  +
|type=code
  +
|title=Scrape Quran translations from a website
  +
|description=Bible and Quran translations often serve as a parallel corpus useful for solving NLP tasks because both texts are available in many languages. Your goal in this task is to write a program in the language of your choice which scrapes the Quran translations available on the following website: http://www.quran-ebook.com/. You can adapt the scraper described on the [[Writing a scraper]] page or write your own from scratch. The output should be plain text in Tanzil format ('text with aya numbers'). You can see examples of that format on the http://tanzil.net/trans/ page. Before starting, check whether the translation is already available on the Tanzil project's page (no need to re-scrape those, but you should use them to test the output of your program). Although the format of the translations seems to be the same and thus your program is expected to work for all of them, the translations we are most interested in are the following: [http://www.quran-ebook.com/azerbaijan_version2/1.html Azerbaijani version 2], [http://www.quran-ebook.com/bashkir_version/index_ba.html Bashkir], [http://www.quran-ebook.com/chechen_version/index_cech.html Chechen], [http://www.quran-ebook.com/karachayevo_version/index_krc.html Karachay] and [http://www.quran-ebook.com/kyrgyzstan_version/index_kg.html Kyrgyz]. When scraping, please be polite and request data at a reasonable rate.
  +
|mentors=Ilnar, Jonathan, fotonzade
  +
|tags=scraper
  +
}}{{Taskidea
  +
|type=documentation
  +
|title=Unified documentation on Apertium visualisers
  +
|description=There are currently three prototype visualisers for the translation pairs Apertium offers: the [https://github.com/jonorthwash/Apertium-Global-PairViewer Apertium Globe Viewer], the [http://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/pairviewer/apertium.html apertium pair viewer], and the [http://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/family-visualizations/ language family visualisation tool]. Make a page on the Apertium wiki that showcases these three visualisers and links to further documentation on each. If documentation for any of them is available somewhere other than the Apertium wiki, then (assuming compatible licenses) integrate it into the Apertium wiki, with a link back to the original.
  +
|mentors=Jonathan
  +
|tags=wiki, visualisers
  +
}}{{Taskidea|type=research|mentors=Jonathan
  +
|title=Investigate FST backends for Swype-type input
  +
|description=Investigate what options exist for implementing an FST (of the sort used in Apertium [[spell checking]]) for auto-correction into an existing open source Swype-type input method on Android. You don't need to do any coding, but you should determine what would need to be done to add an FST backend into the software. Write up your findings on the Apertium wiki.
  +
|mentors=Jonathan
  +
|tags=spelling,android
  +
}}{{Taskidea|type=research|mentors=Jonathan
  +
|title=Tesseract interface for Apertium languages
  +
|description=Find out what it would take to integrate Apertium or voikkospell into Tesseract. Thoroughly document the available options on the wiki.
  +
|tags=spelling,ocr
  +
}}{{Taskidea
  +
|type=documentation
  +
|mentors=Jonathan, Shardul
  +
|title=Integrate documentation of the Apertium deformatter/reformatter into system architecture page
  +
|description=Integrate documentation of the Apertium deformatter and reformatter into the wiki page on the [[Apertium system architecture]].
  +
|tags=wiki, architecture
  +
}}{{Taskidea
  +
|type=documentation
  +
|mentors=Jonathan, Shardul
  +
|title=Document a full example through the Apertium pipeline
  +
|description=Come up with an example sentence that could hypothetically rely on each stage of the [[Apertium pipeline]], and show the input and output of each stage under the [[Apertium_system_architecture#Example_translation_at_each_stage|Example translation at each stage]] section on the Apertium wiki.
  +
|tags=wiki, architecture
  +
}}{{Taskidea
  +
|type=documentation
  +
|mentors=Jonathan, Shardul
  +
|title=Create a visual overview of structural transfer rules
  +
|description=Based on an [https://wikis.swarthmore.edu/ling073/Structural_transfer existing overview of Apertium structural transfer rules], come up with a visual presentation of transfer rules that shows what parts of a set of rules correspond to which changes in input and output, and also which definitions are used where in the rules. Get creative—you can do this all in any format easily viewed across platforms, especially as a webpage using modern effects like those offered by d3 or similar.
  +
|tags=wiki, architecture, visualisations, transfer
  +
}}{{Taskidea
  +
|type=documentation
  +
|mentors=Jonathan
  +
|title=Complete the Linguistic Data chart on Apertium system architecture wiki page
  +
|description=With the assistance of the Apertium community (our [[IRC]] channel) and the resources available on the Apertium wiki, fill in the remaining cells of the table in the "Linguistic data" section of the [[Apertium system architecture]] wiki page.
  +
|tags=wiki, architecture
  +
|beginner=yes
  +
}}{{Taskidea
  +
|type=research
  +
|mentors=Fran
  +
|title=Do a literature review on anaphora resolution
  +
|description=Anaphora resolution (see the [[anaphora resolution|wiki page]]) is the task of determining what a pronoun or other referring item refers to. Do a literature review and write up common methods with their success rates.
  +
|tags=anaphora, rbmt, engine
  +
|beginner=
  +
}}{{Taskidea
  +
|type=research
  +
|mentors=Fran
  +
|title=Write up grammatical tables for a grammar of a language that Apertium doesn't have an analyser for
  +
|description=Many descriptive grammars have useful tables that can be used for building morphological analysers. Unfortunately, they are in Google Books or on paper and are not easily processable by machine. The objective is to find a grammar of a language for which Apertium doesn't have a morphological analyser and write up the tables on a wiki page.
  +
|tags=grammar, books, data-entry
  +
|beginner=
  +
}}{{Taskidea
  +
|type=research
  +
|mentors=Fran, Xavivars
  +
|title=Phrasebooks and frequency
  +
|description=Apertium generally handles phrasebook-style sentences poorly in most languages. Try translating "what's up" from English to Spanish. The objective of this task is to look for phrasebook/filler type sentences/utterances in parallel corpora of film subtitles and on the internet and order them by frequency/generality. Frequency is the number of times you see the utterance; generality is the number of different places you see it.
  +
|tags=phrasebook, translation
  +
|beginner=
  +
}}
  +
{{Taskidea
  +
|type=research
  +
|mentors=Flammie
  +
|title=Hungarian Open Source dictionaries
  +
|description=There are currently 3+ open source Hungarian resources for morphological analysis/dictionaries. Study and document how to install them and how to extract the words and their inflectional information, and tabulate some examples of similarities and differences between their word classes, tags, etc. See [[Hungarian]] for more info.
  +
|tags=hungarian
  +
|beginner=
  +
}}
  +
{{Taskidea
  +
|type=research
  +
|mentors=Vin, Jonathan, Anna
  +
|title=Create a UD-Apertium morphology mapping
  +
|description=Choose a language that has a Universal Dependencies treebank and tabulate a potential set of Apertium morph labels based on the (universal) UD morph labels. See Apertium's [[list of symbols]] and [http://universaldependencies.org/ UD]'s POS and feature tags for the labels.
  +
|tags=morphology, ud, dependencies
  +
|beginner=
  +
|multi=5
  +
}}
  +
{{Taskidea
  +
|type=research
  +
|mentors=Vin, Jonathan, Anna
  +
|title=Create an Apertium-UD morphology mapping
  +
|description=Choose a language that has an Apertium morphological analyser and adapt it to convert the morphology to UD morphology.
  +
|tags=morphology, ud, dependencies
  +
|beginner=
  +
|multi=5
  +
}}
  +
{{Taskidea
  +
|type=research
  +
|mentors=Vin
  +
|title=Create a full verbal paradigm for an Indo-Aryan language
  +
|description=Choose a regular verb and create a paradigm with all possible tense/aspect/mood inflections for an Indo-Aryan language (except Hindi or Marathi). Use Masica's grammar as a reference.
  +
|tags=morphology, indo-aryan
  +
|beginner=
  +
|multi=10
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Vin
  +
|title=Create a syntactic analogy corpus for a particular POS/language
  +
|description=Refer to the syntactic section of [https://www.aclweb.org/anthology/N/N16/N16-2002.pdf this paper]. Try to create a data set with more than 2000 * 8 = 16000 entries for a particular POS in any language, using a large corpus for frequency.
  +
|tags=morphology, embeddings
  +
|beginner=
  +
|multi=5
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Vin
  +
|title=Envision and create a quick utility for tasks like morphological lookup
  +
|description=Many tasks like morphological analysis are annoying to do because they involve navigating to the right directory, typing out an entire pipeline, etc. Write a bash script to simplify some of these procedures, taking into account the install paths and prefixes if necessary. E.g. echo "hargle" \| lt-proc ~/analysers/apertium-eng/eng.automorf.bin ==> morph "hargle" eng
  +
|tags=bash, scripting
  +
|beginner=yes
  +
|multi=10
  +
}}
  +
{{Taskidea
  +
|type=research,code
  +
|mentors=Vin
  +
|title=Use open-source OCR to convert open-source non-text news corpora to text. Evaluate an analyser's coverage on them.
  +
|description=Many languages that have online newspapers do not store the news as actual text but instead use images or GIFs. Find a newspaper for a language that lacks news text online (e.g. Marathi), check licenses, find an OCR tool, and scrape a reasonably large corpus from the images if doing so would not violate CC/GPL. Evaluate the morphological analyser on it.
  +
|tags=python,morphology
  +
|beginner=
  +
}}
  +
{{Taskidea
  +
|type=research,quality
  +
|mentors=Shardul, Jonathan
  +
|tags=issues, python
  +
|title=Clean up open issues in [https://github.com/goavki/apertium-html-tools/issues html-tools], [https://github.com/goavki/phenny/issues begiak], or [https://github.com/goavki/apertium-apy/issues APy]
  +
|description=Go through issue threads for [https://github.com/goavki/apertium-html-tools/issues html-tools], [https://github.com/goavki/phenny/issues begiak], or [https://github.com/goavki/apertium-apy/issues APy], and find issues that have been solved in the code but are still open on GitHub. (The fact that they have been solved may not be evident from the comments thread alone.) Once you find such an issue, comment on the thread explaining what code/commit fixed it and how it behaves at the latest revision.
  +
|multi=15
  +
}}
  +
{{Taskidea
  +
|type=code,quality
  +
|mentors=Shardul, Jonathan
  +
|tags=tests, python, IRC
  +
|title=Get [https://github.com/goavki/phenny begiak] to build cleanly
  +
|description=Currently, [https://github.com/goavki/phenny begiak] does not build cleanly because of a number of failing tests. Find what is causing the tests to fail, and fix either the code or, if the code's behavior has changed, the tests. Document all your changes in the PR that you create.
  +
}}
  +
{{Taskidea
  +
|type=quality
  +
|mentors=Jonathan, Ilnar
  +
|title=Find stems in the Kazakh treebank that are not in the Kazakh analyser
  +
|description=There are quite a few analyses in the [http://svn.code.sf.net/p/apertium/svn/languages/apertium-kaz/texts/puupankki/puupankki.kaz.conllu Kazakh treebank] that don't exist in the [[apertium-kaz|Kazakh analyser]]. Find as many examples of missing stems as you can. Feel free to write a script to automate this so it's as exhaustive (and non-exhausting:) as possible. You may either add what you find to the analyser yourself, commit a list of the missing stems to apertium-kaz/dev, or send a list to your mentor so that they may do one of these.
  +
|tags=treebank, Kazakh, analyses
  +
|beginner=yes
  +
}}
  +
{{Taskidea
  +
|type=quality
  +
|mentors=Jonathan, Ilnar
  +
|title=Find analyses in the Kazakh treebank that are missing from the Kazakh analyser
  +
|description=There are quite a few analyses in the [http://svn.code.sf.net/p/apertium/svn/languages/apertium-kaz/texts/puupankki/puupankki.kaz.conllu Kazakh treebank] that don't exist in the [[apertium-kaz|Kazakh analyser]]. Find as many examples of missing analyses (for existing stems) as you can. Feel free to write a script to automate this so it's as exhaustive (and non-exhausting:) as possible. You may commit a list of the missing analyses to apertium-kaz/dev or send a list to your mentor so that they may do this.
  +
|tags=treebank, Kazakh, analyses
  +
|beginner=yes
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Use apertium-init to bootstrap a new language module
  +
|description=Use [[Apertium-init]] to bootstrap a new language module that doesn't currently exist in Apertium. To see if a language is available, check [[languages]] and [[incubator]], and especially ask on IRC. Add enough stems and morphology to the module so that it analyses and generates at least 100 correct forms. Check your code into Apertium's codebase. [[Task ideas for Google Code-in/Add words from frequency list|Read more about adding stems...]]
  +
|tags=languages, bootstrap, dictionaries
  +
|beginner=yes
  +
|multi=25
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Use apertium-init to bootstrap a new language pair
  +
|description=Use [[Apertium-init]] to bootstrap a new translation pair between two languages which have monolingual modules already in Apertium. To see if a translation pair has already been made, check our [[SVN]] repository, and especially ask on IRC. Add 100 common stems to the dictionary. Check your work into Apertium's codebase.
  +
|tags=languages, bootstrap, dictionaries, translators
  +
|beginner=yes
  +
|multi=25
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan, mlforcada
  +
|title=Add a transfer rule to an existing translation pair
  +
|description=Add a transfer rule to an existing translation pair that fixes an error in translation. Document the rule on the [http://wiki.apertium.org/ Apertium wiki] by adding a [[regression testing|regression tests]] page similar to [[English_and_Portuguese/Regression_tests]] or [[Icelandic_and_English/Regression_tests]]. Check your code into Apertium's codebase. [[Task ideas for Google Code-in/Add transfer rule|Read more...]]
  +
|tags=languages, bootstrap, transfer
  +
|multi=25
  +
|dup=5
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Add stems to an existing translation pair
  +
|description=Add 1000 common stems to the dictionary of an existing translation pair. Check your work into Apertium's codebase. [[Task ideas for Google Code-in/Add words from frequency list|Read more about adding stems...]]
  +
|tags=languages, bootstrap, dictionaries, translators
  +
|multi=25
  +
|dup=5
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Write 10 lexical selection rules for an existing translation pair
  +
|description=Add 10 lexical selection rules to an existing translation pair. Check your work into Apertium's codebase. [[Task ideas for Google Code-in/Add lexical-select rules|Read more...]]
  +
|tags=languages, bootstrap, lexical selection, translators
  +
|multi=25
  +
|dup=5
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Jonathan
  +
|title=Write 10 constraint grammar rules for an existing language module
  +
|description=Add 10 constraint grammar rules to an existing language module for a language that you know. Check your work into Apertium's codebase. [[Task ideas for Google Code-in/Add constraint-grammar rules|Read more...]]
  +
|tags=languages, bootstrap, constraint grammar
  +
|multi=25
  +
|dup=5
  +
}}
  +
{{Taskidea
  +
|type=code,interface
  +
|mentors=Jonathan
  +
|title=Paradigm generator webpage
  +
|description=Write a standalone webpage that makes queries (through JavaScript) to an [[apertium-apy]] server to fill in morphological forms based on morphological tags that are hidden throughout the body of the page. For example, say you have the verb "say", and some tags like inf, past, pres.p3.sg: these forms would get filled in as "say", "said", "says".
  +
|tags=javascript, html, apy
  +
}}
  +
{{Taskidea
  +
|type=code
  +
|mentors=Anna
  +
|title=Train a new model for syntactic function labeller
  +
|description=Choose one of the languages Apertium uses in language pairs and prepare training data for the labeller from its UD treebank: replace UD tags with Apertium tags, parse the treebank, and create fastText embeddings. Then train a new model on this data and evaluate its accuracy.
  +
|tags=python, UD, embeddings, machine learning
  +
|multi=5
  +
}}
  +
{{Taskidea
  +
|type=code,quality
  +
|mentors=Anna
  +
|title=Tune the learning rate for the syntactic function labeller's RNN
  +
|description=The syntactic function labeller uses an RNN for training and predicting syntactic functions of words. Current models can be improved by tuning training parameters, e.g. the learning rate.
  +
|tags=python, machine learning
 
}}
 
}}
 
 
</table>
 
</table>
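
For the Kazakh treebank tasks listed in the table, a script along the following lines could be a starting point. This is only a minimal sketch: it assumes that the LEMMA column (the third of the ten tab-separated CoNLL-U columns) holds the stem, and that the set of stems already known to the analyser has been obtained separately. The function names and the toy data are invented for illustration and do not come from apertium-kaz.

```python
def conllu_lemmas(lines):
    """Yield the LEMMA field of every ordinary token line in CoNLL-U input."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line between sentences, or a comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range (e.g. 3-4) or empty node (e.g. 5.1)
        yield cols[2]


def missing_stems(lines, known_stems):
    """Return the sorted treebank lemmas that are absent from known_stems."""
    return sorted(set(conllu_lemmas(lines)) - set(known_stems))


# Toy input with placeholder tokens (not real treebank data).
SAMPLE = (
    "# sent_id = toy-1\n"
    "1\tfoos\tfoo\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbars\tbar\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3-4\tbazqux\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "3\tbaz\tbaz\tNOUN\t_\t_\t2\tobj\t_\t_\n"
    "4\tqux\tqux\tADP\t_\t_\t3\tcase\t_\t_\n"
    "\n"
)

# KNOWN stands in for stems extracted from the analyser (hypothetical).
KNOWN = {"foo", "qux"}

if __name__ == "__main__":
    for stem in missing_stems(SAMPLE.splitlines(), KNOWN):
        print(stem)
```

To run this on the real treebank, pass an open file object instead of the toy lines. How best to extract the analyser's known stems is left open here and is worth discussing with your mentor.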
   

Revision as of 10:50, 15 November 2017

This is the task ideas page for Google Code-in. Here you can find ideas for interesting tasks that will improve your knowledge of Apertium and help you get into the world of open-source development.

The mentors column lists the people you should contact for further information. Each task is estimated to take an experienced developer at most 2 hours. However:

  1. this estimate does not include the time needed to install and set up Apertium (and relevant tools);
  2. this is the time an experienced developer is expected to need, so you may spend longer on a task because of the learning curve.

Categories:

  • code: Tasks related to writing or refactoring code
  • documentation: Tasks related to creating/editing documents and helping others learn more
  • research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
  • quality: Tasks related to testing and ensuring code is of high quality.
  • interface: Tasks related to user experience research or user interface design and interaction

Clarification of "multiple task" types

  • multi = number of students who can do a given task
  • dup = number of times a student can do the same task

You can find descriptions of some of the mentors here.

Task ideas