Task ideas for Google Code-in/Getting started
This page describes some steps you can take to get involved with the Apertium project in Google Code-in. First of all, thanks for reading! We're very enthusiastic about getting new contributors to Apertium and helping spread our passion for language technology.
So, what are the first steps?
- Talk to us! This is the most important step! Nothing in Apertium is too hard with the right amount of help. And we like helping, so just get in contact. The best way to contact us is on IRC, and the best way to use IRC is with a client like irssi, weechat or hexchat. A good tip is to hang out on IRC, even if no-one is talking when you enter. People can be in different time zones, and channel activity peaks at different times of day. Here's a list of the IRC nicks and wiki usernames of the mentors:
GCI name | IRC nick | wiki username
Jonathan W | firespeaker, jonorthwash | Firespeaker
- Install Apertium: Not all tasks require Apertium to be installed, but if you're planning to work with Apertium, it's a good idea to do this early.
- Find an interesting task:
Things you might want to know
For some tasks, you may need access to Apertium resources, like the wiki or our Subversion repository. Usually this is no problem: you just need to ask a mentor or an org admin (on IRC, as described above).
Tasks on GitHub
"Fix any bug" tasks
For tasks that point you at a repository and ask you to fix any bug, you should decide on a bug and tell your mentor which one you want to work on when you claim the task. You are also encouraged to come onto IRC (see above) and discuss with a mentor ahead of time which bug might be a good one to work on given your background.
Where is Apertium code?
Apertium code is housed in several places:
- Most code, including the core tools, translation and language modules, and a number of other things, lives in our svn repo. The language data is found in the following places:
- /languages - where stable monolingual language packages live
- /incubator - where the initial stages of language data development take place, and where development sometimes stagnates
- /nursery - where translation modules that have begun to become useful/usable live
- /staging - where translation modules that are nearly ready, but still not quite ready for production-environment use, live
- /trunk - where translation modules that are fully developed and considered stable live; also here is the main code base, etc.
- Many tools are also in svn, specifically /trunk/apertium-tools.
- Several tools live on GitHub, including begiak (our IRC bot), APy (our web API), and html-tools (our website framework). The latter two of these are synchronised back into SVN (in /trunk/apertium-tools), but the main development occurs on GitHub.
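As a quick way to keep the svn layout straight, the directory list above can be written down as a small lookup table. This is only an illustrative sketch: the names `STAGES` and `describe_location` are made up for this example and are not part of any Apertium tool, and the descriptions are paraphrased from the list above.

```python
# Top-level directories of the svn repo, as listed above, each with a short
# paraphrase of what lives there. This table is an illustration only.
STAGES = {
    "languages": "stable monolingual language packages",
    "incubator": "initial stages of language data development",
    "nursery": "translation modules that have begun to become useful",
    "staging": "translation modules that are nearly ready for production",
    "trunk": "stable translation modules and the main code base",
}


def describe_location(path):
    """Return the description for the top-level directory of a repo path."""
    # "/trunk/apertium-tools" -> "trunk"
    top = path.strip("/").split("/")[0]
    return STAGES.get(top, "unknown location")


if __name__ == "__main__":
    for example in ("/trunk/apertium-tools", "/nursery", "/somewhere-else"):
        print(example, "->", describe_location(example))
```

A path whose first component is not in the table simply comes back as "unknown location", so the helper never raises on unexpected input.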
Language and translation modules
- Most translation modules are structured in the form of apertium-xxx-yyy, meaning it's a module that translates from language xxx to language yyy (and potentially the other way around).
- Some older language modules use two-letter abbreviations, like apertium-xx-yy, but the standard now is three-letter.
- Monolingual language modules are named apertium-xxx, where xxx is the ISO 639-3 code for the language
- All but some older translation modules rely on monolingual language modules
- Some monolingual language modules are based on HFST, and some are based on lttoolbox.
- You can install pre-compiled language and translation modules for end-user use from our package repositories, but if you'd like to work on the data, you need to download the relevant one(s) and compile it/them yourself.
- You can install pre-compiled core tools from our package repositories for end-user use or for developing language modules, but if you'd like to work on a particular tool, you need to download and compile it yourself.
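The naming convention described in this list can be checked mechanically. The sketch below is a minimal illustration, not an official Apertium utility: the function name `classify_module` and the returned dictionary layout are assumptions made for this example, and the pattern only checks the shape of a name, not whether the codes are real ISO 639-3 codes or whether the module exists. The module names in the demo are used purely as examples of each shape.

```python
import re

# "apertium-" followed by one or two language codes, each two or three
# lowercase letters. Two-letter codes correspond to the older style.
MODULE_RE = re.compile(r"^apertium-([a-z]{2,3})(?:-([a-z]{2,3}))?$")


def classify_module(name):
    """Describe a module name, or return None if it does not fit the pattern."""
    match = MODULE_RE.match(name)
    if match is None:
        return None
    first, second = match.groups()
    if second is None:
        # A single code: a monolingual language module (apertium-xxx).
        return {"kind": "monolingual", "languages": [first]}
    # Two codes: a translation module (apertium-xxx-yyy or apertium-xx-yy).
    return {
        "kind": "translation",
        "languages": [first, second],
        "legacy_codes": len(first) == 2 or len(second) == 2,
    }


if __name__ == "__main__":
    for name in ("apertium-kaz-tat", "apertium-kaz", "apertium-es-ca", "apertium-tools"):
        print(name, "->", classify_module(name))
```

Names that do not follow the convention, such as apertium-tools, return `None` rather than being guessed at.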