Task ideas for Google Code-in/Getting started

This page describes some steps you can take to get involved with the Apertium project through Google Code-in. First of all, thanks for reading! We're very enthusiastic about getting new contributors to Apertium and about helping spread our passion for language technology.

First steps

So, what are the first steps?

  • Talk to us! This is the most important step! Nothing in Apertium is too hard with the right amount of help. And we like helping, so just get in contact. The best way to contact us is on IRC, and the best way to use IRC is with a client like irssi,[1] weechat[2] or hexchat.[3] A good tip is to stay in the channel even if no one is talking when you join: people are in different time zones, and channel activity varies with the time of day.
Here's a list of the IRC nicks and wiki usernames of some of the mentors who are regulars on IRC:
GCI name                IRC nick                   wiki username
Jonathan W              firespeaker, jonorthwash   Firespeaker
Francis Tyers           spectie, spectei, spectre  Francis Tyers
Maria Shejanova         maryszmary                 Masha
Aida Sundetova          aida27                     Aida
Kevin Brubeck Unhammer  Unhammer                   Unhammer
Vinit Ravishankar       vin-ivar                   Vin-ivar
Memduh Gökırmak         fotonzade
Sushain Cherivirala     sushain, sushain97         Sushain
Xavi Ivars              xavivars                   Xavi Ivars
Irene Tang              irene_                     Irene
Shardul Chiplunkar      shardulc                   Shardulc
Anna Kondratjeva        deltamachine               deltamachine
Vinay Singh             SilentFlame                SilentFlame
Matthew Marting         m5w, m5w_                  M5w
Tommi Pirinen           Flammie
Inari Listenmaa         inariksit                  Inariksit
Marc Riera              mrieratrad                 Marc Riera


  • Install Apertium: Not all tasks require Apertium to be installed, but if you're planning to work with Apertium, it's a good idea to do this early.
  • Find an interesting task:

Useful guidelines

Things you might want to know.

Access

For some tasks, you may need access to Apertium resources, like the wiki or our Subversion repository. Usually this is no problem: you just need to ask a mentor or an org admin (on IRC; see above).

Tasks on GitHub

For tasks relating to code on GitHub (e.g., begiak, APy, and html-tools), you just need to clone the relevant repository, make your changes, and submit a pull request.

"Fix any bug" tasks

For tasks that point you at a repository and ask you to fix any bug, you should decide on a bug and tell your mentor which one you want to work on when you claim the task. You are also encouraged to come onto IRC (see above) and discuss with a mentor ahead of time which bug would be a good one to work on given your background.

Where is Apertium code?

Apertium code is housed in several places:

  • Most code, including the core tools, translation and language modules, and a number of other things, lives in our svn repo. The language data is found in the following places:
    • /languages - where stable monolingual language packages live
    • /incubator - where the initial stages of language data development take place, and sometimes stagnate
    • /nursery - where translation modules that have begun to become useful/usable live
    • /staging - where translation modules that are nearly ready, but not yet ready for production-environment use, live
    • /trunk - where translation modules that are fully developed and considered stable live; the main code base, etc. is also here
  • Many tools are also in svn, specifically /trunk/apertium-tools.
  • Several tools live on GitHub, including begiak (our IRC bot), APy (our web API), and html-tools (our website framework). The latter two are synchronised back into svn (in /trunk/apertium-tools), but the main development occurs on GitHub.
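
To make the directory layout above a little more concrete, here is a minimal Python sketch that reads the top-level directory from an svn path and ranks translation-module directories from least to most mature. The ordering (incubator, nursery, staging, trunk) is inferred from the descriptions above, and the names MATURITY_ORDER, top_level_dir and maturity are hypothetical, made up for this illustration; nothing here is part of any Apertium tool.

```python
# Top-level svn directories, ordered from least to most mature according
# to the descriptions above. The ordering and all names in this sketch
# are assumptions made for illustration.
MATURITY_ORDER = ["incubator", "nursery", "staging", "trunk"]


def top_level_dir(svn_path: str) -> str:
    """Return the first path component, e.g. 'nursery' for '/nursery/apertium-xxx-yyy'."""
    parts = [p for p in svn_path.split("/") if p]
    if not parts:
        raise ValueError("empty path")
    return parts[0]


def maturity(svn_path: str) -> int:
    """Return 0 (incubator) up to 3 (trunk); raise ValueError for other directories."""
    top = top_level_dir(svn_path)
    if top not in MATURITY_ORDER:
        raise ValueError(f"{top!r} is not one of {MATURITY_ORDER}")
    return MATURITY_ORDER.index(top)
```

For example, maturity("/staging/apertium-xxx-yyy") is greater than maturity("/nursery/apertium-xxx-yyy"), while a path under /languages raises ValueError because that directory holds monolingual packages rather than a stage in this ordering.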

Language and translation modules

  • Most translation modules are named in the form apertium-xxx-yyy, meaning the module translates from language xxx to language yyy (and potentially the other way around).
    • Some older language modules use two-letter abbreviations, like apertium-xx-yy, but the standard now is three-letter.
    • Monolingual language modules are named apertium-xxx, where xxx is the ISO 639-3 code for the language
    • All but some older translation modules rely on monolingual language modules
  • Some monolingual language modules are based on HFST, and some are based on lttoolbox.
  • You can install pre-compiled language and translation modules for end-user use from our package repositories, but if you'd like to work on the data, you need to download the relevant one(s) and compile it/them yourself.
  • You can install pre-compiled core tools from our package repositories for end-user use or for developing language modules, but if you'd like to work on a particular tool, you need to download and compile it yourself.
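
The naming conventions above can be sketched as a small Python helper that tells monolingual modules (apertium-xxx) apart from translation modules (apertium-xxx-yyy) and flags the older two-letter style. This is only an illustration: the names ModuleName and parse_module_name are hypothetical, and the check is purely syntactic. It does not verify that a code is a real ISO 639-3 code, so any name of the same shape will also match.

```python
import re
from typing import NamedTuple, Optional

# "apertium-" followed by one or two lowercase codes of two or three letters.
_MODULE_RE = re.compile(r"^apertium-([a-z]{2,3})(?:-([a-z]{2,3}))?$")


class ModuleName(NamedTuple):
    kind: str                # "monolingual" or "translation"
    first: str               # the only code, or the first code of a pair
    second: Optional[str]    # the second code of a pair, else None
    old_style: bool          # True if any code uses the older two-letter form


def parse_module_name(name: str) -> Optional[ModuleName]:
    """Split a module name into its language codes; return None if it does not fit."""
    match = _MODULE_RE.match(name)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    codes = [code for code in (first, second) if code is not None]
    old_style = any(len(code) == 2 for code in codes)
    kind = "translation" if second is not None else "monolingual"
    return ModuleName(kind, first, second, old_style)
```

With the placeholder names used on this page, parse_module_name("apertium-xxx-yyy") gives a translation module with codes xxx and yyy, parse_module_name("apertium-xxx") gives a monolingual module, and parse_module_name("apertium-xx-yy") is marked as old style. A name like "apertium-tools" returns None because "tools" is longer than three letters.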

Links