Ideas for Google Summer of Code/Make a language pair state-of-the-art

From Apertium

Take a released language pair and drastically improve its performance, both in coverage and in translation quality. The work involves dictionaries, transfer rules, scripting, and corpora. The objective is to make an Apertium language pair state-of-the-art, or close to it, in translation quality. Concretely, that means raising coverage to 95-98% on a range of corpora and cutting word error rate by 30-50% in relative terms. For example, if the current word error rate is 30%, it should be reduced to 15-21%.
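The two targets above can be made concrete with a small sketch. It assumes that coverage means "naive coverage" (the share of tokens that receive at least one analysis) and that the analyser output is in the Apertium stream format, where an unknown word appears as ^word/*word$. The function names naive_coverage and target_wer_range are hypothetical, and the regular expression ignores escaped characters, so treat this as an illustration rather than a measuring tool.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped '/', '^' or '$' inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units with at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An unknown word carries a single pseudo-analysis starting with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def target_wer_range(current_wer: float, low_cut: float = 0.30, high_cut: float = 0.50):
    """Return (best, worst) WER after a relative reduction of low_cut..high_cut."""
    return current_wer * (1 - high_cut), current_wer * (1 - low_cut)


if __name__ == "__main__":
    sample = "^the/the<det><def><sp>$ ^cat/cat<n><sg>$ ^blorp/*blorp$"
    print(f"coverage: {naive_coverage(sample):.1%}")
    best, worst = target_wer_range(0.30)
    print(f"target WER: {best:.0%} to {worst:.0%}")
```

The reduction is treated as relative to the current error rate, which is why a 30% starting point maps to a range whose upper end is 21%.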

Coding challenge

  • Choose a language pair.
  • Translate 2,000 words of text with it (e.g. four articles of 500 words each).
  • Post-edit the output to make a reference translation.
  • Use two of the articles to improve the translator.
    • Add all the missing words, and cover all the structures with transfer rules.
  • Calculate the improvement you were able to make on these two articles, and on the two held-out articles.
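The last step of the challenge can be sketched as follows. This is a minimal example that assumes word error rate is the word-level edit distance between the machine output and the post-edited reference, divided by the reference length, with simple whitespace tokenisation. The helper names are hypothetical, and an existing evaluation script can be used instead.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER of the hypothesis against the post-edited reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference translation is empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(wer_before: float, wer_after: float) -> float:
    """Relative WER improvement, e.g. 0.5 means the error rate was halved."""
    return (wer_before - wer_after) / wer_before


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    before = word_error_rate(reference, "the cat sit on mat")
    after = word_error_rate(reference, "the cat sat on mat")
    print(f"WER before: {before:.1%}, after: {after:.1%}")
    print(f"relative reduction: {relative_reduction(before, after):.1%}")
```

Running the same comparison separately on the two development articles and on the two held-out articles shows whether the gains generalise beyond the text used for the improvements.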

Frequently asked questions

See also