Ideas for Google Summer of Code/Adopt a language pair
This project involves writing linguistic data, including morphological rules and transfer rules, which are specified in a declarative language. A good introduction is the Apertium New Language Pair HOWTO; see also Contributing to an existing pair. If the pair has adequate dictionaries but a poor tagger (disambiguator), a GSoC project might include writing a good Constraint Grammar for the pair.
The coding challenge for this task is to:
- Install Apertium (see Minimal installation from SVN)
- Go through the HOWTO
- Go through the MT course here (or here)
- Write a translator that translates as much of this story as possible; the minimum is one sentence. (Other translations of the story are here.)
- If the story has not yet been translated into the languages of your pair, translate it into those languages first.
- Upload your work to Apertium SVN.
If you don't complete it all, don't worry! We take many things into account when assessing your application. However, your application should include the URL of any work you do for the coding challenge.
Frequently asked questions
- Can I do a pair with language x and language y?
  - Yes, there are no restrictions. But you should take the following into consideration: (a) Are there existing machine translation (MT) systems for this pair? (b) If there are existing systems, how good are they, and could you do better in three months? (c) How closely related are the two languages? (d) How many resources already exist for the pair? (e) Are there any mentors who can evaluate your work?
- Do I need to have GNU/Linux installed, or can I use another operating system?
  - In theory you can use any operating system. In practice, unless you are using GNU/Linux or macOS, you are going to have a hard time, as the mentors cannot offer you support with other operating systems. If you are using Windows, you may want to check out VirtualBox.
- What programming languages do I need to know?
  - For making a language pair, you don't need to know any specific programming language. Knowing a scripting language will be really helpful, but most of the work is done in Apertium's own linguistic formalisms, which are based on XML. To get an idea of what these formalisms look like, you should work through the New Language Pair HOWTO.
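As a rough illustration of what "linguistic formalisms based on XML" means, the sketch below embeds a tiny, heavily simplified dictionary fragment in the style of an Apertium monolingual dictionary and expands it into surface forms with Python's standard library. The entries, the paradigm name `house__n` and the `expand` helper are made up for this example, and the fragment omits parts a real dictionary needs. In practice the Apertium tools do this work, so treat it as a reading aid, not as how the toolchain operates.

```python
import xml.etree.ElementTree as ET

# Illustrative, simplified dictionary fragment (not a complete, valid file).
DIX = """<dictionary>
  <sdefs>
    <sdef n="n"/>
    <sdef n="sg"/>
    <sdef n="pl"/>
  </sdefs>
  <pardefs>
    <pardef n="house__n">
      <e><p><l></l><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="cat"><i>cat</i><par n="house__n"/></e>
  </section>
</dictionary>"""


def expand(dix_xml):
    """Map each surface form to its lemma plus tags (hypothetical helper)."""
    root = ET.fromstring(dix_xml)

    # Collect each paradigm as a list of (surface suffix, tag string) pairs.
    paradigms = {}
    for pardef in root.iter("pardef"):
        endings = []
        for pair in pardef.iter("p"):
            suffix = pair.find("l").text or ""
            tags = "".join("<%s>" % s.get("n") for s in pair.find("r").iter("s"))
            endings.append((suffix, tags))
        paradigms[pardef.get("n")] = endings

    # Combine every stem in the main section with the endings of its paradigm.
    forms = {}
    for entry in root.find("section").findall("e"):
        stem = entry.find("i").text
        for suffix, tags in paradigms[entry.find("par").get("n")]:
            forms[stem + suffix] = entry.get("lm") + tags
    return forms


if __name__ == "__main__":
    for surface, analysis in sorted(expand(DIX).items()):
        print(surface, "->", analysis)
```

The point of the example is that adding a word is a matter of writing one declarative entry that points at an existing paradigm; no procedural code is needed for each new word.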
Previous GSoC projects
And pairs which were adopted in past years: