Google Summer of Code/Application 2012
- The Apertium project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalised languages, but also work with more widely-spoken languages.
- The platform, including data for a large number of language pairs, a translation engine and auxiliary tools, is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge role.
- There are currently 27 published language pairs within the project (including a number of "firsts" — for example Aragonese—Spanish, Turkish—Kyrgyz, Spanish—Occitan, Breton—French, and Basque—Spanish among others), and several more in development.
- Main Organization License
GNU General Public Licence version 2.0 (GPL2)
- Backup admin
- What is the URL for your ideas page?
- What is the main IRC channel for your organization?
- What is the main development mailing list for your organization?
- Why is your organization applying to participate in Google Summer of Code 2012? What do you hope to gain by participating?
- We are very interested in seeing Apertium improve as both a research and development platform, and also as a platform for spreading free/open-source software in the translation world. As a whole, as in GSoCs 2009-2011, and GCI 2010/2011, we will benefit from increased participation from outside the core group of developers: we will get new or improved resources which will help to improve translation quality for users and developers alike.
- We have found that although it is possible to attract developers interested in working on language pairs, it is more difficult to find developers who are interested in working on the engine, so we hope to find students interested in "diving a bit deeper".
- Did your organization participate in past GSoCs? If so, please summarize your involvement and the successes and challenges of your participation.
- Apertium took part in GSoC in 2009, 2010 and 2011. We received 9 slots in 2009, 9 again in 2010, and 11 in 2011. We are very happy with the results of our participation. Our main successes and challenges are described below:
- Getting useful results: 9 out of 11 of our GSOC projects last year were successful.
- Getting maintainable results: at least of our GSOC projects from last year has had a developer from outside the original project.
- Finding new developers: 4 out of 11 of our GSOC students are still (a year later) regular committers, and all have started to work outside their original projects.
- Selecting applicants: Our selection process worked much better in 2011 than in previous years.
- Selecting applicants: There is still room for improvement, streamlining how applications are dealt with, and getting all mentors involved.
- Getting the final furlong: Many of our GSOC projects were successful, in that the code worked, but they needed some finishing touches to be release-worthy. Encouraging students to do this proved in some cases difficult.
- Persuading students to publicise their results: in 2009 we got around half of our students to present their work to the wider community, and in 2010 two (though two students who completed their projects outside of GSoC also presented their work), but some either did not expect to have the time or we weren't persuasive enough. This year, FreeRBMT is running a bit later, so we hope our 2011 GSOC students will still take part.
- If your organization has not previously participated in Google Summer of Code, have you applied in the past? If so, for what year(s)?
- Does your organization have an application template you would like to see students use? If so, please provide it now.
update this - Francis Tyers 08:06, 5 February 2012 (UTC)
- We expect students to contact us using IRC or e-mail; we will make sure we get the following information from all applicants:
- Name, e-mail address, and other information that may be useful for contact
- Why is it you are interested in machine translation?
- Why is it that you are interested in the Apertium project?
- Which of the published tasks are you interested in? What do you plan to do?
- Applicants should also include a two- to eight-page proposal, including a title, reasons why Google and Apertium should sponsor it, a description of how and who it will benefit, and a detailed work plan including, if possible, a schedule with milestones and deliverables. Include time needed to think, to program, to document and to disseminate.
- List your skills and give evidence of your qualifications. Tell us your current field of study, major, etc.
- Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.
- Please list any non-Summer-of-Code plans you have for the Summer, especially employment and class-taking. Be specific about schedules and time commitments. We would like to be sure you have at least 30 free hours a week to develop for our project.
- What criteria did you use to select the individuals who will act as mentors for your organization? Please be as specific as possible.
update this - Francis Tyers 08:06, 5 February 2012 (UTC)
Mikel L. Forcada is a professor of Computer Science and has led all of the research that has been done at the Universitat d'Alacant in the field of machine translation. He is responsible for much of the current design of Apertium. Mikel was mentor for the successful dictionary interface project in GSoC 2010.
Jacob Nordfalk is an associate professor of Computer Science and the author of several Danish-language books on programming in Java. He is the primary developer on the English--Esperanto pair and has also done a lot of work on apertium-dixtools. He was mentor for the successful Swedish--Danish project in 2009's GSOC, and the successful Java port of Apertium in 2010.
Sergio Ortiz Rojas is the senior programmer at Prompsit Language Engineering and is responsible for most of the engine code in Apertium; he is, therefore, the developer of reference when it comes to developing new code for the platform. He was mentor for 2009's successful lttoolbox-java project, 2010's unsuccessful VM project and 2011's successful VM project.
Jimmy O'Regan is based in Ireland. He is the instigator and developer of the English--Polish language pair, and also works on, well, almost everything. He was a mentor for 2009's successful apertium-service project, 2010's successful Czech-Polish MT project, and 2011's unsuccessful Wikipedia extraction project.
Kevin Scannell is head of Computer Science at Saint Louis University. He is known in the free software community for his work on Irish, and has been working on Irish--Scottish Gaelic in Apertium. He was a mentor in both 2009 and 2011. Although his students were unsuccessful, we are happy to have him back again this year: third time lucky!
Trond Trosterud is a lecturer in Linguistics at the University of Tromsø. He has worked on language technology for many years and was mentor on 2009's successful Norwegian Nynorsk--Norwegian Bokmål project, and acted as co-mentor on both the multiwords and Sámi-Finnish MT projects.
Francis Tyers is a graduate student of Computer Science at the Universitat d'Alacant. He is the main developer of several language packages and has worked on several more. He mentored 2009's successful multi-engine MT project, 2010's successful French-Portuguese MT project and 2011's successful Serbo-Croatian--Macedonian and Turkish--Azerbaijani projects.
Kevin Unhammer is studying for a Master's degree in Computational Linguistics at the University of Bergen, Norway. He was a successful student in 2009's GSoC, who went on to mentor the successful multiwords project in 2010 and the successful Maltese--Hebrew project in 2011.
- What is your plan for dealing with disappearing students?
- Students will be encouraged to let us know how they want to break up their time, and to try and plan for holidays and absences. This will avoid both mentors and students wasting time. If a mentor reports the unscheduled disappearance of a student (72-hour silence), the student will be contacted by the administrators. If the silence persists, the student's task will be frozen and we will report to Google.
- What is your plan for dealing with disappearing mentors?
- It is quite unlikely, since all of the mentors are very active developers with long-term commitment to the project; they are people we have met face-to-face at conferences, workshops or even in daily life. Students will be instructed to contact the administrators if a mentor fails to respond adequately. The administrators will examine the situation; if disappearance (48-hour silence) is confirmed, the student will be assigned a different mentor and Google will be informed.
- What steps will you take to encourage students to interact with your project's community before, during and after the program?
- Developers who have been chosen as mentors will be available for as long as possible on the #apertium IRC channel, or another agreed-on messaging system, so that students may receive guidance with any problem they have during development and before deciding which task to select.
- As we did in past years, we will try to get them involved as early as possible in the project, by granting them developer status, so they can modify code and data as any other developer would.
- For the past three years, we have organised an academic workshop, FreeRBMT (FreeRBMT2009, FreeRBMT2011, FreeRBMT2012)
- Are you a new organization who has a Googler or other organization to vouch for you? If so, please list their name(s) here.
- Not applicable
- Are you an established or larger organization who would like to vouch for a new organization applying this year? If so, please list their name(s) here.
HFST is based at the University of Helsinki. A PhD student in their project, Tommi Pirinen, mentored for Apertium in 2010, and his project was a great success. Their group is responsive to emails and very eager to help people use their software.