Difference between revisions of "Ideas for Google Summer of Code/Apertium Occitan French"

 

Revision as of 18:28, 4 February 2019

Improving Apertium Occitan-French

The Occitan-French language pair was recently published. It is of strategic importance for the Occitan language, as Apertium offers the only machine translation system for this pair. The idea is to make the Occitan output easier to post-edit and the French output easier to understand. This entails enlarging the monolingual and bilingual dictionaries, improving disambiguation, and writing new structural transfer rules.

Coding challenge

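  • Install a GNU/Linux system. There is an Apertium virtual machine (http://wiki.apertium.org/wiki/Apertium_VirtualBox) you can install using VirtualBox.
  • If necessary, install Apertium, the Occitan language data (https://github.com/apertium/apertium-oci), the French language data (https://github.com/apertium/apertium-fra), and the Apertium Occitan-French package (https://github.com/apertium/apertium-oci-fra).
  • Look for representative standard Occitan and French texts.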
  • Search for frequent words that are not translated in either direction.
  • Modify the data packages so that the system translates these words correctly.
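
One way to approach the search for frequent untranslated words is to count the tokens that Apertium marks as unknown. By default, Apertium prefixes words it could not analyse with an asterisk (unless it is run with -u). The following is a minimal sketch, not part of the Apertium tooling: the function name frequent_unknown_words is hypothetical, and it assumes the translated text has already been produced and is passed in as a string.

```python
import re
from collections import Counter

# Apertium prefixes words it could not analyse with '*' (unless run with -u).
# Assumption: a "word" here is a run of letters/digits; hyphenated or
# apostrophised forms would be split at the punctuation.
UNKNOWN = re.compile(r"\*(\w+)")


def frequent_unknown_words(translated_text, top=20):
    """Return the most frequent '*'-marked words as (word, count) pairs."""
    counts = Counter(w.lower() for w in UNKNOWN.findall(translated_text))
    return counts.most_common(top)


if __name__ == "__main__":
    import sys

    # Read translated output from standard input and print a frequency list.
    for word, count in frequent_unknown_words(sys.stdin.read()):
        print(f"{count}\t{word}")
```

Saved under a name of your choice (unknown_words.py is used here only for illustration), the script could be fed the output of the translator, for example `apertium oci-fra < corpus.txt | python3 unknown_words.py`, assuming the installed mode is named oci-fra. The words at the top of the list are the most useful candidates to add to the dictionaries.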

To convince us even more:

  • Search for a structure that is frequently mistranslated and that can easily be repaired with a structural transfer rule.
  • Modify the structural transfer rules so that the structure is translated correctly.

Finally:

  • Submit a pull request with your modifications.