Google Code-in/Application 2015

Some fields were shortened to fit the actual application form.
Organisation id*
  • apertium
Organisation name*
  • The Apertium project
Organisation description*

Apertium develops a free/open-source platform for machine translation and language technology. Apertium also develops linguistic data for many languages, with a focus on lesser-resourced and marginalised languages, though larger languages are covered too. The platform, including data for tens of language pairs, a translation engine and auxiliary tools, is being developed around the world, both in universities and companies (e.g. Prompsit Language Engineering) and by a growing number of independent free-software developers. There are currently 40 published language pairs within the project (including a number of "firsts", for example Spanish–Occitan, Breton–French, Basque–Spanish, North Sámi–Norwegian Bokmål and Kazakh–Tatar), and many more in development.


natural language processing, machine translation, grammar, python, c++, linguistics, languages

Organisation home page url*
Main organisation license*
  • GNU GPL 2.0/3.0
  • Veteran
Backup Admin*

unhammer, jnw

Why would your organisation like to participate in Google Code-in 2015?*

Apertium is really keen on participating in Google Code-in for the following reasons:

In previous years we have really benefited from GCI. In any free software project, there are tasks that get pushed to the bottom of a developer's to-do list but are not big enough for a GSoC project. We have found GCI students immensely helpful with these: for instance, annotating corpora that are needed to train Apertium modules, or finding bugs in the handling of formatting that lead to broken document translation. Some GCI projects have also become crucial pieces of code: for example, Nathan Maxon's Kazakh analyser, which went on to be key to developing the first Kazakh–Tatar MT system, and Pim Otte's Afrikaans–Dutch system, which he presented at an international MT conference. In addition, our new website and translation server both started out as GCI projects, with students such as Sushain Cherivirala contributing long after GCI and eventually mentoring other students.

As Apertium is a project that focuses a lot on marginalised languages, GCI gives us a chance to get in touch with the next generation of speakers and to show them how they can help their languages develop and gain esteem. Language shift (abandoning one's language after perceiving that it is not useful in wider spheres of communication) often occurs at this age, and if we can show young language users that their language is useful, that other people care, and that there is no barrier to its use in the 'electronic' space, then that might give it a better chance of survival.

Getting kids involved early in Apertium also ensures a steady flow of new developers for the project and, most importantly, reinforces one of the main tenets of what is sometimes called Responsible Research and Innovation: successful development has to involve society, and Apertium development does too. Teenagers are a particularly active part of digital society.

Teaching the kids is simply very rewarding: helping them out, answering questions, explaining things. When they get it, it's like a spark goes off, and even if it has taken a long time to explain, it's a really great feeling. In fact, some of our mentors are university instructors who too often find that their students are not very good at programming and that teaching them is hard and unrewarding; working with pre-university students may keep them motivated in the face of such a hard task and also give them clues on how to teach programming at university level.

Finally, when students understand a part of Apertium well enough to explain it to themselves and to their mentors, they can produce excellent outreach or reference materials that help disseminate Apertium.

What years has your organisation participated in Google Summer of Code? Please indicate the years you have participated in Google Code-in or GHOP if applicable.*

2009 (GSoC only), 2010 (GSoC and GCI), 2011 (GSoC and GCI), 2012 (GSoC and GCI), 2013 (GSoC and GCI), 2014 (GSoC and GCI).

Please provide a link to your tasks page. This is one of the most important parts of your application as it lets us see what type of work you plan to have the students work on for Google Code-in and shows you already have some ideas of the types of tasks students would work on. Please be sure to include at least 4 tasks from each of the 5 categories. This is similar to the Google Summer of Code Ideas page. *

What programming languages does your organisation use?*
  • C++, Python, Java, Bash, HTML, JavaScript, XML
What is the main development mailing list for your organisation? This question will be shown to students who would like to get more information about applying to your organisation for Google Code-in 2015. If your organisation uses more than one list, please make sure to include a description of the list so students know which to use.*
  • (general list: most traffic here)
What is the main IRC channel for your organisation?*
  • #apertium on
Please tell us about how your organisation has prepared for Google Code-in, including what (and how many) mentors and organisation administrators have agreed to help, what your schedule and response time will be during the holidays (and otherwise during the contest period) and how you plan to deal with unresponsive mentors.*

We have four organisation administrators: Francis Tyers, Jonathan North Washington, Mikel L. Forcada and Kevin Brubeck Unhammer.

We have around 15 mentors who will be taking part. They are spread across a variety of time zones, ranging from CST (UTC-6) to MSK (UTC+3).

In addition to these mentors, there will be plenty of help available to students, as there are always Apertium developers hanging out on the Apertium IRC channel. Most of our mentors spend much of their free time on the channel, hacking and helping other developers hack, simply because we enjoy it. For those of us who work, the 'holidays' are really when we are most active in Apertium. In past Google Code-ins, our organisation has had no problem responding to students in time. Mentors who are on the IRC channel less often have still been able to guide student work effectively through Melange.

If for some reason a mentor becomes unresponsive (in our experience, it would have to be either task overload or 'force majeure'!), administrators will be on call to reassign the task to another mentor or evaluate it themselves.