Google Summer of Code/Application 2008

From Apertium

Notes for applicants here: selection criteria, and advice for mentors

Answers to the descriptive questions should probably be 2--3 paragraphs at most, according to advice from #gsoc.

Fill out the application form here.


Describe your organization.
  • Two organizations team up for GSoC. One is the Transducens research group of the Universitat d'Alacant (Alacant, Spain); the other one is Prompsit Language Engineering. These two organizations are currently responsible for most of the development taking place in the Apertium open-source machine translation platform.
  • Apertium is a platform for developing rule-based machine translation systems. It was initially targeted at closely related languages (particularly the Romance languages), where it is possible to get a very high degree of accuracy in translation. Recent developments have made it possible to create systems that translate between less closely related languages. We have 10 published language pairs, and three more currently in development.
Why is your organization applying to participate in GSoC 2008? What do you hope to gain by participating?
  • Both organizations are very interested in seeing Apertium improve in many different directions: the Universitat, mainly because most of its research in the field of machine translation is based on Apertium components; Prompsit, because it bases its business on providing Apertium-based services.
  • Apertium as a whole will benefit from increased participation from outside the core group of developers: we will get new or improved tools which will help to improve translation quality for users and developers alike.
Did your organization participate in past GSoCs? If so, please summarize your involvement and the successes and challenges of your participation.
  • n/a
If your organization has not previously participated in GSoC, have you applied in the past? If so, for what year(s)?
  • n/a
Who will your organization administrator be? Please include Google Account information.
  • Mikel L. Forcada (Grup Transducens, Universitat d'Alacant), <mikel.forcada at>
What license(s) does your project use?
  • GNU GPL 2.0
What is the URL for your ideas page?
What is the main development mailing list or forum for your organization?
What is the main IRC channel for your organization?
  • #apertium on
Does your organization have an application template you would like to see students use? If so, please provide it now.
  • We expect students to contact us using IRC or e-mail; we will make sure we get the following information from all applicants:
  • Name and e-mail address
  • Current field of study / major
  • Whether they have previously programmed in an open-source project
  • Why they are interested in machine translation
  • Why they are interested in the Apertium project
  • Which task they are interested in, and why
Who will be your backup organization administrator? Please include Google Account information.
  • Gema Ramírez Sánchez (Prompsit Language Engineering S.L.), <gramirez at>
Who will your mentors be? Please include Google Account information.
  • Francis Tyers <francis.tyers at>
  • Mikel L. Forcada <mikel.forcada at>
  • Jimmy O'Regan <joregan at>
  • Felipe Sánchez Martínez <fsanchez at>
  • Sergio Ortiz Rojas <sergio.ortiz at>
  • Wynand Winterbach <wynand.winterbach at>
What criteria did you use to select these individuals as mentors? Please be as specific as possible.
  • They are all developers at the Apertium project:
  • Francis Tyers is one of the main coordinators of development being done outside the Universitat d'Alacant or Prompsit. He is a graduate student at the Universitat d'Alacant and also works for Prompsit Language Engineering. He has been responsible for the current visibility of Apertium in Debian and Ubuntu, has set up the Apertium wiki, takes care of the #apertium IRC channel, etc.
  • Mikel L. Forcada is a Professor of Computer Science and has led all of the research that has been done at the Universitat d'Alacant in the field of machine translation. He is responsible for much of the current design of Apertium.
  • Jimmy O'Regan is based in Ireland; he is the instigator and developer of the English--Polish language pair, and also works on Irish. He has also been a writer for the Linux Gazette.
  • Felipe Sánchez Martínez is a graduate student at the Universitat d'Alacant under the supervision of Mikel L. Forcada. He coded the part-of-speech tagger of Apertium and maintains the packages apertium-tagger-training-tools and apertium-transfer-tools, which allow developers of Apertium language-pair data to induce the part-of-speech tagger and an initial set of translation rules from monolingual and bilingual corpora.
  • Sergio Ortiz-Rojas is the senior programmer at Prompsit Language Engineering and is responsible for most of the code in Apertium (except the parts written by Felipe Sánchez Martínez); he is, therefore, the developer of reference when it comes to developing new code for the platform.
  • Wynand Winterbach, M.Sc. in Mathematics, is based in South Africa; he is the author of two GUIs for Apertium, apertium-view and apertium-tolk. He has also written a D-Bus module, and has been working on the English--Afrikaans language pair.
What is your plan for dealing with disappearing students?

Students will be encouraged to let us know how they want to break up their time, and to plan for holidays and absences. This will keep both mentors and students from wasting time. If a mentor reports the unscheduled disappearance of a student (72 hours of silence), the administrators will contact the student. If the silence persists, the student's task will be frozen and we will report to Google.

What is your plan for dealing with disappearing mentors?

This is quite unlikely, since all of the mentors are very active developers with a long-term commitment to the project. We originally had "call the police" here, but a more down-to-earth strategy would be the following:

Students will be instructed to contact the administrators if a mentor fails to respond adequately. The administrators will examine the situation; if the disappearance (48 hours of silence) is confirmed, they will assign a different mentor to the student and inform Google.

What steps will you take to encourage students to interact with your project's community before, during and after the program?
  • We will make sure most developers are available for as long as possible on the #apertium IRC channel, so that students can get guidance with any problem they may have during development or before deciding which task to select.
  • We will try to get students involved in the project as early as possible by granting them developer status, so that they can modify code and data as any other developer would.
  • Depending on the number of projects chosen for development, we will organise an optional workshop in Alacant so that the students may present their work to the wider group.
What will you do to ensure that your accepted students stick with the project after GSoC concludes?
  • Whenever their work has a relevant research or development component, we will make sure they can use it as part of their undergraduate or graduate work, and we will offer guidance when they write papers.