Difference between revisions of "Task ideas for Google Code-in"

Revision as of 22:25, 3 September 2017

code: Tasks related to writing or refactoring code
documentation: Tasks related to creating/editing documents and helping others learn more
research: Tasks related to community management, outreach/marketing, or studying problems and recommending solutions
quality: Tasks related to testing and ensuring code is of high quality.
interface: Tasks related to user experience research or user interface design and interaction

You can find descriptions of some of the mentors here: List_of_Apertium_mentors.

Task ideas

type	title	description	tags	mentors	bgnr?	multi?
code	Refactor/merge the main "processing" functions of lrx-proc	lrx-proc has two modes, "-m" mode and default mode. Each is implemented by its own huge function, and the two functions are nearly identical. Refactor the code to remove the redundancy, and run tests on lots of text with several language pairs to ensure no regressions.	c++	Fran, Unhammer
code	Profile and improve speed of lrx-proc	lrx-proc is slower than it should be. There is probably some low-hanging fruit. Try profiling it and implementing an improvement.	c++	Fran, Unhammer
research	See if you can precompile xpath expressions or xslt stylesheets	An XSLT stylesheet is a program for transforming XML trees. An Xpath expression is a way of specifying a node set in an XML tree. Investigate the possibility of pre-compiling either stylesheets or xpath expressions.	parsing	Fran
research	Review literature on linearisation of dependency trees	A dependency tree is an intermediate representation of a sentence with no implicit word order. Linearisation is finding the appropriate word order for a dependency tree. Do a survey of the available literature and write up a review.	parsing	Fran, Schindler
research	Manually annotate/Tag text in Apertium format	Take some running text, analyse it using an Apertium analyser then manually disambiguate the result.		Fran
code	Convert Chukchi Nouns to HFST/lexc	There is a freely available lexicon of Chukchi, a language spoken in the north-east of Russia. The objective of this task is to convert part of the lexicon covering nouns to lexc format, which is a formalism for specifying concatenative morphology.		Fran
code	Convert Chukchi Numerals to HFST/lexc	There is a freely available lexicon of Chukchi, a language spoken in the north-east of Russia. The objective of this task is to convert part of the lexicon covering numerals to lexc format, which is a formalism for specifying concatenative morphology.		Fran
code	Convert Chukchi Adjectives to HFST/lexc	There is a freely available lexicon of Chukchi, a language spoken in the north-east of Russia. The objective of this task is to convert part of the lexicon covering adjectives to lexc format, which is a formalism for specifying concatenative morphology.		Fran
interface	Make a design for a web-based viewer for parallel treebanks	(also for viewing diff annotation for same sentence)	dependencies,parallel,web	Fran,Jonathan
code	Write a script to convert a UD treebank	for a given language to a format suitable for training the perceptron tagger
research	Train the perceptron tagger for a language	The perceptron tagger is a new part-of-speech tagger that was developed for Apertium in the Summer of Code. Take a language from languages and train the tagger for that language.		Fran
interface	Design an annotation tool for disambiguation	cf. webanno, corpus.mari-language.org, brat	disambiguation,annotation	Fran,Jonathan
interface	Design an annotation tool for adding dependencies	cf. brat	dependencies,annotation	Fran,Jonathan
code	Train lexical selection rules	from a large parallel corpus for a language pair		Fran
documentation	Document how to set up the experiments for weighted transfer rules			Fran
code	convert UD treebank to apertium tags, use unigram tagger	(see #apertium logs 2016-06-22)
code	Write a script to extract sentences from CoNLL-U	where they have the same tokenisation as Apertium.		Fran, wei2912
documentation	convert [1] to apertium-style documentation			Schindler
code	Implement `lt-print --strings` (`lt-print -s`)		c++	Fran, wei2912
code	Implement lt-expand -n	Implement an algorithm that prints out a transducer but only follows n cycles.	c++	Fran, wei2912
code	in-browser globe with apertium languages as points	Use d3 globe to make an apertium language/pair viewer (like pairviewer), maybe based on this or this or this. This file contains coordinates of Apertium languages.	js,html,maps	Jonathan, kvld
code	Write a program to detect contexts where a path in a compiled transducer begins with a whitespace		c++
code	Make the lt-comp compiler print a warning when a path begins with a whitespace.	A common mistake in dix files is stray whitespace in the wrong places. The compilation tool should detect this automatically and issue a warning to the user.	c++
	apertium-mar-hin: make the TL morph for any part of speech less daft	Some of the morphology in Marathi or Hindi is currently daft.	morphology	vin-ivar
	add indic scripts/formal latin transliterations	Transliteration is a way of writing text in a different script. Currently some Indic scripts are handled only by a WX transliterator.	python	vin-ivar
code	apertium-hin: more consistency with apertium-mar for verbs	Verbs in Marathi and Hindi are handled inconsistently.	morphology	vin-ivar
code	apertium-mar: replace cases with postpositions	Marathi cases are postpositions	morphology	vin-ivar
code	apertium-mar: fix modals and quasi-modals	Modals in Marathi need fixing	morphology	vin-ivar
code	refactor x file in apy	Reorganise apy code to be more readable, maintainable and so forth.		Putti
documentation	add docstrings to x file in apy	Docstrings are a way to document Python code; they can be turned into documentation on the web or read from within Python. See the relevant PEPs on python.org.		Putti, vin-ivar
quality	write 10 unit tests for apy			Putti, Unhammer, (sushain?)
code	add 1 transfer rule	Transfer rules are the part of the translation process that deals with rearranging, adding and deleting words. See also Short introduction to transfer		Fran, vin-ivar, zfe, kvld
code	add 50 entries to a bidix	A bilingual dictionary (bidix) contains word-to-word translations between languages, e.g. cat-chat for English to French or cat-Katze for English to German. Add 50 such word translations for languages you know.		Fran, vin-ivar, zfe, kvld, Schindler
code	write 10 lexical selection rules	Write 10 lexical selection rules for a pair already set up with lexical selection		Fran, vin-ivar, zfe, Unhammer
code	write 10 constraint grammar rules	Constraint grammar is a rule-based approach to selecting linguistic readings in ambiguous cases, to improve translation quality etc. See introduction CG here:		Fran, vin-ivar, zfe, kvld, Unhammer
research	Document resources for a language	Document resources for a language without resources already documented on the wiki. read more...		Jonathan, vin-ivar, zfe, Schindler	X	X
research	Write a contrastive grammar	Document 6 differences between two (preferably related) languages and where they would need to be addressed (morph analysis, transfer, etc). Use a grammar book/resource for inspiration. Each difference should have no fewer than 3 examples. Put your work on the Apertium wiki under Language1_and_Language2/Contrastive_grammar. See Farsi_and_English/Pending_tests for an example of a contrastive grammar that a previous GCI student made.		vin-ivar, Jonathan, Fran, zfe, Schindler	X	X
research	apertium-hun: match existing apertium-hun paradigms with morphdb.hu	Morphdb.hu is another implementation of Hungarian morphology, that has a large lexicon. In order to convert it to apertium format, the classification of the words needs to be mapped to one used in apertium.	hun,dix	Flammie
code	apertium-hun: convert hunmorph.db into apertium	one of: See prerequisite task above.		Flammie
code	apertium-fin-eng: go through lexicon for potential rubbish words	Apertium's Finnish–English dictionary has been converted from projects, like Finnwordnet, that have a lot of pairs unsuitable for MT. Find and delete them from the file.	fin,dix	Flammie
code	apertium-fin-eng: add words from apertium-fin-eng to apertium-eng	grep for English words in apertium-fin-eng.fin-eng.dix and classify them according to paradigms. See also: Apertium English	eng,dix	Flammie
code	apertium-apy: add i/o formats	Currently APY web queries get responses in an ad hoc JSON format. Research and implement interoperability with further formats, such as:	apy	Flammie
code	apertium-apy: write metadata about apertium language pairs	Write the metadata in CMDI format so that it can be deployed for CLARIN.	apy	Flammie
code	apertium-apy: make more parts of apertium-pipeline on web	apertium.org has a web service interface for getting translations or morphological analyses. This should be extended for other functions of apertium as well. more information: Apertium Apy.	apy	Flammie
code	Finish suggest-a-word feature so it can be deployed to apertium.org	A version of the apertium.org translator from the last GSoC lets users suggest fixes to unknown word translations, among other things, but it has not been deployed to the server.	apy	Flammie
code	Further developments to suggest a word	Currently suggested words may be added to the wiki by a service. It would make sense to also offer, e.g., the chance to log in and be credited as a contributor, as well as other features.	apy	Flammie
code	Fix ordering of dependencies in CG matxin format			Fran
code	CG syntax highlighting plugin for a text editor	Write a syntax file for your favourite text editor that provides fancy syntax highlighting for Constraint Grammar		vin-ivar, Unhammer, (Flammie?)
code	Package apertium-lint to install to a prefix	apertium-lint currently installs with pip, modify that to allow passing a flag for installing it to a prefix		vin-ivar
quality	Fix a bug in Apertium html-tools	Fix a currently open issue with html-tools in consultation with your mentor.	multi,html,js,html-tools	Unhammer, Jonathan, Kira		X
quality	Fix a bug in Apertium APy	Fix a currently open issue with APy in consultation with your mentor.	multi,python,apy	Unhammer, Jonathan, Kira		X
code	Script to get resources from GF	Write a script to scrape words from one particular paradigm in GF and make it usable in Apertium.		vin-ivar
code	Create a list of text editors compatible with different scripts	Create a list of ten text editors and document their status with representing human text (Latin), RTL text (Arabic), combining characters (Devanagari), etc. Document any bugs with eg. copy/paste and tab indentation.		vin-ivar
code	Write a script to strip apertium morphological information from CONLL-U files	Write a script to strip apertium morphological information from CONLL-U files so the dependency trees can be rendered okay by the online tools.		vin-ivar
research	Investigate FST backends for Swype-type input	Investigate what options exist for implementing an FST (of the sort used in Apertium spell checking) for auto-correction into an existing open source Swype-type input method on Android. You don't need to do any coding, but you should determine what would need to be done to add an FST backend into the software. Write up your findings on the Apertium wiki.	spelling,android	Jonathan
code	Fix a memory leak in matxin-transfer	The matxin-transfer program is a component of the Matxin MT system, a sister system to Apertium. Run valgrind on the code and find and fix a memory leak. There may be several.	c++	Fran
code	Write a tool to help test the coherence of a bidix	This tool will generate a file with each lemma of the main categories (at least nouns, adjectives and verbs) found in a bidix. This file will then be translated to the second language and back to the first one. Looking for changes makes it possible to detect transfer problems and changes of meaning.		Bech
quality	fix any begiak issue	Fix any open issue for begiak (Apertium's IRC bot), to be chosen in consultation with your mentor.	python,irc	Jonathan, sushain, wei2912		X
quality	merge phenny upstream into begiak	Merge upstream patches etc. into begiak (Apertium's IRC bot).	git,irc	Jonathan, sushain, Unhammer, wei2912
quality	open a pull request for merging begiak modules into upstream	Open a pull request to merge features from begiak (Apertium's IRC bot) into upstream.	git,irc	Jonathan, sushain, Unhammer, wei2912
code	begiak interface to Apertium's web API	Write a module for begiak (Apertium's IRC bot) that provides access to at least one feature of APy (Apertium's web API). You may want to base the code off begiak's Apertium translation module (which may not be in 100% working order...).	irc,apy	Jonathan, sushain, Unhammer, wei2912		X
research	tesseract interface for apertium languages	Find out what it would take to integrate apertium or voikkospell into tesseract. Document thoroughly available options on the wiki.	spelling,ocr	Jonathan
interface	Abstract the formatting for the Html-tools interface.	The interface for html-tools (Apertium's website framework) should be easily customisable so that people can make it look how they want. The task is to abstract the formatting and make one or more new stylesheets to change the appearance. This is basically making a way of "skinning" the interface.	css,html-tools	Jonathan,sushain
quality	improvements to lexc plugin for vim	A vim syntax definition file for lexc is presented on the following wiki page: Apertium-specific conventions for lexc#Syntax highlighting in vim. This plugin works, but it has some issues: (1) comments on LEXICON lines are not highlighted as comments, (2) editing lines with comments (or similar) can be really slow, (3) the lexicon a form points at is not highlighted distinctly from the form (e.g., in the line «асқабақ:асқабақ N1 ; ! "pumpkin"», N1 should be highlighted somehow). Modify or rewrite the plugin to fix these issues.	vim	Jonathan
code	Write a transliteration plugin for mediawiki	Write a mediawiki plugin similar in functionality (and perhaps implementation) to the way the Kazakh-language wikipedia's orthography changing system works (documented by a previous GCI student). It should be able to be directed to use any arbitrary mode from an apertium mode file installed in a pre-specified path on a server.	php	Jonathan
documentation	add comments to .dix file symbol definitions		dix	Schindler
documentation	find symbols that aren't on the list of symbols page	Go through symbol definitions in Apertium dictionaries in svn (.lexc and .dix format), and document any symbols you don't find on the List of symbols page. This task is fulfilled by adding at least one class of related symbols (e.g., xyz_*) or one major symbol (e.g., abc), along with notes about what it means.	wiki,lexc,dix	Schindler
code	conllu parser and searching	Write a script (preferably in python3) that will parse files in conllu format, and perform basic searches, such as "find a node that has an nsubj relation to another node that has a noun POS" or "find all nodes with a cop label and a past feature"	python,dependencies	Jonathan, Fran
code	group and count possible lemmas output by guesser	Currently a "guesser" version of Apertium transducers can output a list of possible analyses for unknown forms. Develop a new pipeline, preferably with shell scripts or python, that uses a guesser on all unknown forms in a corpus, takes the list of all possible analyses, and outputs a hit count of the most common combinations of lemma and POS tag.	guesser,transducers,shellscripts	Jonathan, Fran
code	vim mode/tools for annotating dependency corpora in CG3 format	includes formatting, syntax highlighting, navigation, adding/removing nodes, updating node numbers, etc.	vim,dependencies,CG3	Jonathan, Fran
code	vim mode/tools for annotating dependency corpora in CoNLL-U format	includes formatting, syntax highlighting, navigation, adding/removing nodes, updating node numbers, etc.	vim,dependencies,conllu	Jonathan, Fran
quality	figure out one-to-many bug in the lsx module		C++,transducers,lsx	Jonathan, Fran
code	add an option for reverse compiling to the lsx module	this should be simple as it can just leverage the existing lttoolbox options for left-right / right-left compiling	C++,transducers,lsx	Jonathan, Fran
quality	remove extraneous functions from lsx-comp and clean up the code		C++,transducers,lsx	Jonathan, Fran
quality	remove extraneous functions from lsx-proc and clean up the code		C++,transducers,lsx	Jonathan, Fran
code	script to test coverage over wikipedia corpus	Write a script (in python or ruby) that in one mode checks out a specified language module to a given directory, compiles it (or updates it if it already exists), and then gets the most recent nightly wikipedia archive for that language and runs coverage over it (as much in RAM as possible). In another mode, it compiles the language pair in a docker instance that it then disposes of after successfully running coverage.	python,ruby,wikipedia	Jonathan
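
As a starting point for the "conllu parser and searching" task in the table above, here is a minimal sketch in python3. It assumes the standard ten tab-separated CoNLL-U columns. The names parse_conllu and find_dependents and the sample sentence are made up for illustration and are not part of any existing Apertium tool.

```python
# Names of the ten CoNLL-U columns, in file order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Yield each sentence in `text` as a list of token dicts."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment, e.g. "# text = ...".
            continue
        token = dict(zip(FIELDS, line.split("\t")))
        # Skip multiword-token ranges (1-2) and empty nodes (1.1).
        if not token["id"].isdigit():
            continue
        sentence.append(token)
    if sentence:
        yield sentence


def find_dependents(sentence, deprel, head_upos):
    """Return tokens with relation `deprel` whose head has POS `head_upos`."""
    by_id = {tok["id"]: tok for tok in sentence}
    hits = []
    for tok in sentence:
        head = by_id.get(tok["head"])  # None for the root (head "0")
        if tok["deprel"] == deprel and head is not None \
                and head["upos"] == head_upos:
            hits.append(tok)
    return hits


# Hypothetical sample sentence, built with explicit tabs.
SAMPLE = "\n".join([
    "# text = Dogs are animals",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "3", "nsubj", "_", "_"]),
    "\t".join(["2", "are", "be", "AUX", "_", "Tense=Pres", "3", "cop", "_", "_"]),
    "\t".join(["3", "animals", "animal", "NOUN", "_", "Number=Plur", "0", "root", "_", "_"]),
    "",
])

for sent in parse_conllu(SAMPLE):
    print([tok["form"] for tok in find_dependents(sent, "nsubj", "NOUN")])
```

The search shown corresponds to the first example in the task description (an nsubj relation to a node with a noun POS). A search on features, such as the cop-plus-past example, would additionally need to split the feats column on "|" and "=".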

@@ Line 350: / Line 350: @@
 |title=script to test coverage over wikipedia corpus
 |mentors=Jonathan
-|description=Write a script (in python or ruby) that in one mode checks out a specified language module to a given directory, compiles it (or updates it if already existant), and then gets the most recently nightly wikipedia archive for that language and runs coverage over it (as much in RAM if possible).  In another mode, it compiles the language pair in a ___ that it disposes of after successfully running coverage.
+|description=Write a script (in python or ruby) that in one mode checks out a specified language module to a given directory, compiles it (or updates it if already existant), and then gets the most recently nightly wikipedia archive for that language and runs coverage over it (as much in RAM if possible).  In another mode, it compiles the language pair in a docker instance that it then disposes of after successfully running coverage.
 |tags=python,ruby,wikipedia
 }}
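
The "runs coverage" step in the description above could be sketched as follows. This is only a naive illustration, not the script the task asks for: it assumes the analyser output is in Apertium stream format, where each lexical unit looks like ^surface/analysis$ and an unknown form has an analysis starting with "*". It ignores escaped characters, and the function name coverage and the sample line are made up.

```python
import re

# One lexical unit in stream format: ^surface/analysis1/analysis2$
# (simplified: escaped characters inside a unit are not handled).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return (known, total) counts for analyser output in stream format."""
    known = 0
    total = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # An unknown form is marked by an analysis starting with '*'.
        if not match.group(2).startswith("*"):
            known += 1
    return known, total


# Hypothetical analyser output for three tokens, one of them unknown.
sample = "^the/the<det><def><sp>$ ^blorf/*blorf$ ^cats/cat<n><pl>$"
known, total = coverage(sample)
print(f"{known}/{total} = {known / total:.1%}")
```

Reading the corpus line by line and summing the two counts would keep memory use low, which may matter for a full wikipedia archive.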
