Difference between revisions of "Google Summer of Code/Application 2013"
Revision as of 10:33, 19 March 2013
Application
- 1. Organisation name*
Apertium
- 2. Organisation description*
The Apertium project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalised languages, but also work with more widely-spoken languages.
The platform, including data for a large number of language pairs, a translation engine and auxiliary tools, is being developed around the world, largely in universities and companies (e.g. Prompsit Language Engineering), but independent free-software developers also play a huge role.
There are currently 33 published language pairs within the project (including a number of "firsts" — for example Aragonese—Spanish, Spanish—Occitan, Breton—French, and Basque—Spanish among others), and several more in development.
[GEMA] Apertium has a special focus on lowering the barrier to creating linguistic resources for any language, ideally to be used for MT, but also reusable for other purposes (e.g. grammar checking, morphological analysis, PoS tagging, etc.).
- 3. Organisation home page url*
- 4. Main organisation license*
GNU General Public Licence
- 5. Veteran/New*
Veteran
- 6. Backup Admin*
- 7. If you chose "veteran" in the dropdown above, please summarise your involvement and the successes and challenges of your participation. Please also list your pass/fail rate for each year.
Apertium took part in GSoC in 2009, 2010, 2011 and 2012. We received 9 slots in 2009, 9 again in 2010, 11 in 2011, and 12 in 2012 although we gave one slot back to the pool, making 11. We are very happy with the results of our participation. Our main successes and challenges are described below:
Successes:
- Getting useful results: 9 out of 11 projects were successful in that they produced useful, working code, and 6 of the projects were released, which means that the code got to a sufficient level to be let into the world.
- Getting maintainable results: 5 out of the 11 projects have had outside developers (i.e. neither the students nor their mentors) work on them.
- Attracting and keeping new developers: Out of our 11 GSOC students last year, 8 are still working with us, and 3 have become very regular committers. Several of our GSOC students last year also helped us out with mentoring for the GCI.
- Selecting applicants: We continued refining our selection process, and found it worked even better in 2012 than in 2011.
Challenges:
- Getting students to work quickly: [GEMA] Apertium is a complex eight-module pipeline that combines linguistics and computer engineering knowledge. Getting started is not always straightforward, and we make a special effort to break the problems to be addressed by students into small, isolated pieces.
- Getting the final furlong: Many of our GSOC projects were successful, in that the code worked, but they needed some finishing touches to be release-worthy. Encouraging students to do this proved in some cases difficult.
- Persuading students to publicise their results: In 2009 we got around half of our students to present their work to the wider community, and in 2010 two (though two students who completed their projects outside of GSoC also presented their work), but some either didn't have the time or we weren't persuasive enough. In 2011/2012 we had one student present their work.
Pass/fail rate by year: check these
- 2009: 8 pass, 1 fail
- 2010: 8 pass, 1 fail
- 2011: 9 pass, 2 fail
- 2012: 10 pass, 1 fail
- 8. Why is your organisation applying to participate in Google Summer of Code 2013? What do you hope to gain by participating?*
[GEMA]
Apertium is applying again for two main reasons:
- Apertium likes Google Summer of Code: it is a programme that supports open-source as much as we do!
- Apertium needs Google Summer of Code: it is an incredible opportunity for us to spread the word, to attract newcomers and to improve the platform
What we hope to gain by participating is more students getting to know open-source, contributing to open-source and, especially if they are passionate about languages and computers, contributing to Apertium.
- 9. What is the URL for your Ideas list?*
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
- 10. What is the main development mailing list for your organisation?*
apertium-stuff@lists.sourceforge.net
- 11. What is the main IRC channel for your organisation?*
#apertium irc.freenode.net
- 12. What criteria did you use to select your mentors for this year's program? Please be as specific as possible.*
- Active contributors:
- Knowledgeable in their field:
- Enough time to spare:
- Experience with mentoring:
- 13. What is your plan for dealing with disappearing mentors?*
- 14. What steps will you take to encourage students to interact with your project's community before and during the program?
First, we encourage all of our students to visit our IRC channel (#apertium @ freenode) as often as possible, even before the start of the program, since that helps them find a suitable mentor and a useful project that they can work on. We strongly advise them to read our wiki pages and manuals, use our system, try to break it and fix it, and finally tell us about it. As a result, students get familiar with Apertium before the coding period starts, which increases their chances of ending up with a successful project.
In addition, we define coding challenges for each of the proposed projects, which serve both as an entry task and as a means of getting our students familiar with Apertium and involved in our community in the early stages of the program.
Finally, during the coding stage, we talk to our students on a daily basis and give them suggestions and advice when they get stuck. We urge them to keep to the project plan they made when applying, and assist them when they fall behind.
- 15. What will you do to encourage that your accepted students stick with the project after Google Summer of Code concludes?*
We have found that the following has helped us have quite a high retention rate in previous years:
- Helping students publish papers at conferences, or assisting with academic work.
- Organising a workshop where students can present their work to the wider community
- Encouraging students to get involved in mentoring themselves, through the GCI programme
- Passing on information about MSc and PhD positions, and academic and other grants
- 16. Are you an established or larger organisation who would like to vouch for a new organisation applying this year? If so, please list their name(s) here.