Ideas for Google Summer of Code/Apertium Occitan French

Revision as of 19:48, 24 March 2020 by Popcorndude (talk | contribs) (categorize)

Improving Apertium Occitan-French

The Occitan-French language pair was recently published. It is of strategic importance for the Occitan language, because Apertium offers the only machine translation system for this pair. The goal is to make Occitan output easier to post-edit and French output easier to understand. This entails expanding the monolingual and bilingual dictionaries, improving disambiguation, and writing new structural transfer rules.

Coding challenge

  • Look for representative standard Occitan and French texts.
  • Search for frequent words that are not translated in either direction.
  • Modify the data packages so that the system translates those words correctly.
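
One possible way to find the frequent untranslated words is to run the collected texts through the translator and count the words it could not handle. The sketch below is only an illustration, not part of the challenge: it assumes the translator was run with its default behaviour of prefixing unknown words with an asterisk (for example `*gat`), and the function name `count_unknown_words` and the sample sentence are invented for this example.

```python
import re
import sys
from collections import Counter

# Assumption: unknown words appear in the output prefixed with "*".
# Inner apostrophes, hyphens and the Occitan interpunct are kept in the word.
UNKNOWN_WORD = re.compile(r"\*(\w+(?:['’·-]\w+)*)")


def count_unknown_words(translated_text):
    """Return a Counter of asterisk-marked words, compared case-insensitively."""
    return Counter(
        match.group(1).lower()
        for match in UNKNOWN_WORD.finditer(translated_text)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_unknown.py translated_output.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        counts = count_unknown_words(handle.read())
    # Print the 50 most frequent unknown words, one per line.
    for word, frequency in counts.most_common(50):
        print(f"{frequency}\t{word}")
```

Running this on the translated output in each direction gives a ranked list of candidates to add to the dictionaries. Words near the top of the list are the ones whose addition improves coverage the most.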

To convince us even more:

  • Search for a structure that is frequently mistranslated and that can be easily repaired with a structural transfer rule.
  • Modify the structural transfer rules so that the structure is translated correctly.

Finally:

  • Submit a pull request with your modifications.