Ideas for Google Summer of Code/Add a new variety to an existing language

From Apertium
Revision as of 19:18, 25 January 2023 by Hectoralos (talk | contribs)

Take a released language and define a new language variety for it, e.g. Quebec French or Provençal Occitan. This will involve adding new words and morphological forms and labelling them as belonging to the new variety. Then add the new variety to one or more released language pairs without diminishing the quality of the pre-existing varieties. This will involve working with dictionaries, lexical selection rules, transfer rules, scripting and corpora. The objective is to facilitate the generation of varieties for pluricentric languages and for languages with weak standardisation.

Coding challenge

  • Find a language pair of your choice in Apertium and install it (see Minimal installation from SVN).
  • Translate 2,000 words of text (e.g. four articles of 500 words) using Apertium.
  • Post-edit the translated text to make a reference translation.
  • Use two of the articles (input text and post-edited translation) to improve the translator.
    • Add all the words, and cover all the structures with transfer rules.
  • Evaluation: calculate the improvement that you were able to make on these two articles and on the two held-out articles.
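
The evaluation step can be made concrete with a word error rate (WER), the metric mentioned in the FAQ on this page. The following is only a minimal sketch, not Apertium's own evaluation tooling: the function name word_error_rate and the sample sentences are made up for illustration, and tokenisation by plain whitespace splitting is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; tokens are whitespace-separated.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences standing in for a post-edited reference and for
    # the translator's output before and after the improvements.
    reference = "the cat sat on the mat"
    output_before = "the cat sit in the mat"
    output_after = "the cat sat in the mat"

    wer_before = word_error_rate(reference, output_before)
    wer_after = word_error_rate(reference, output_after)
    print(f"WER before: {wer_before:.2%}")
    print(f"WER after:  {wer_after:.2%}")
    print(f"Improvement: {wer_before - wer_after:.2%}")
```

Running the same comparison separately on the two articles used for development and on the two held-out articles shows whether the gains carry over to text the translator was not tuned on.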

Frequently asked questions

What if my pair is composed of two popular languages, for instance two official languages of the EU?
Then this task will be hard. Pairs with huge parallel corpora, such as the 24 official EU languages or the 3 EU working languages, tend to have very good statistical machine translation software already. Your work will be valuable, and accepted, only if you manage to reach comparable quality. For instance, if Europarl+Moses gets a word error rate (WER) of 15-20%, we'd be happy with 25%.
What happens if I don't reach the expected results?
It's no big deal! GSoC is a scholarship, not a service contract. If you don't deliver what was agreed or expected, you'll fail the midterm or final evaluation and lose the corresponding stipend installment(s). At least you tried! And, hopefully, learnt a lot in the process.

More: ask us something! :)

See also