Ideas for Google Summer of Code/Apertium Occitan French
Improving Apertium Occitan-French
The Occitan-French language pair was recently published. It is of strategic importance for the Occitan language, because Apertium offers the only machine translation system for this pair. The goal is to make the Occitan output easier to post-edit and the French output easier to understand. This entails expanding the monolingual and bilingual dictionaries, improving disambiguation, and writing new structural transfer rules.
- Install a GNU/Linux system. There is an Apertium virtual machine you can install using VirtualBox.
- If necessary, install Apertium, the Occitan language data, the French language data, and the Apertium Occitan-French package.
- Look for representative standard Occitan and French texts.
- Search for frequent words that are not translated in either direction.
- Modify the data packages so that the system now translates these words correctly.
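One way to find the frequent untranslated words is to translate a corpus and count the words that Apertium leaves marked as unknown. By default Apertium prefixes such words with an asterisk in its output. The sketch below assumes you have already saved translated text (for example from a command such as `apertium oci-fra < corpus.txt > translated.txt`, where the mode name is an assumption to check against your installation); the function names are hypothetical helpers, not part of Apertium.

```python
import re
from collections import Counter

# Apertium marks words missing from the dictionaries with a leading asterisk.
# The character class also allows apostrophes, the interpunct and hyphens
# inside a word, which occur in Occitan and French spelling.
UNKNOWN_WORD = re.compile(r"\*(\w[\w'’·-]*)")


def count_unknown_words(translated_text):
    """Count asterisk-marked (unknown) words, ignoring case."""
    return Counter(word.lower() for word in UNKNOWN_WORD.findall(translated_text))


def most_frequent_unknown(translated_text, n=20):
    """Return the n most frequent unknown words as (word, count) pairs."""
    return count_unknown_words(translated_text).most_common(n)
```

Calling `most_frequent_unknown(open("translated.txt", encoding="utf-8").read())` gives a ranked list of candidates to add to the dictionaries; run it once per translation direction.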
To convince us even more:
- Search for a structure that is frequently mistranslated and that can be easily repaired with a structural transfer rule.
- Modify the structural transfer rules so that the structure is now translated correctly.
- Submit a pull request with your modifications.
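Before submitting, it can help to confirm that a rule change fixes the target structure without altering other sentences. The following is a minimal sketch, assuming you have translated the same test sentences before and after your change and loaded each set as a list of lines; `changed_translations` is a hypothetical helper name, not an Apertium tool.

```python
def changed_translations(sources, before, after):
    """Return (source, old, new) for every sentence whose translation changed."""
    if not (len(sources) == len(before) == len(after)):
        raise ValueError("all three lists must have the same length")
    return [
        (src, old, new)
        for src, old, new in zip(sources, before, after)
        if old != new
    ]
```

Reviewing the returned triples shows whether the new rule affects only the sentences it was meant to repair.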