Task ideas for Google Code-in/Add constraint-grammar rules

From Apertium
  1. Select a language pair that already uses constraint grammar for part-of-speech tagging, ideally one whose source language is a language you know (L₂) and whose target language is one you use every day (L₁).
  2. Install Apertium locally using a package manager, install the language pair from GitHub, and make sure that it works.
  3. Using a sufficiently large corpus of the source language (e.g. plain text taken from Wikipedia, newspapers or literature), detect part-of-speech tagging errors, that is, cases where the translation is inadequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had more than one.
  4. Write 10 constraint grammar rules that select the desired part of speech in the relevant context(s).
  5. Compile and test again, possibly after retraining the statistical part-of-speech tagger.
  6. Submit a pull request (PR) on GitHub, and give your mentor the URL of the PR.
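
When hunting for tagging errors in step 3, it can help to list first the words that received more than one morphological analysis, since only those can be mis-disambiguated. The Python sketch below is one possible way to do that, assuming the analyser output is in the Apertium stream format (`^surface/analysis1/analysis2$`). The function names and the sample line are hypothetical and hand-written for illustration, not real analyser output, and the parsing is deliberately simplified: it ignores escaped characters and formatting blanks.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: escaped characters and formatting blanks are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/^$]+)/([^$]+)\$")


def ambiguous_units(stream):
    """Yield (surface, analyses) for every word with more than one analysis."""
    for match in LEXICAL_UNIT.finditer(stream):
        surface = match.group(1)
        analyses = match.group(2).split("/")
        if len(analyses) > 1:
            yield surface, analyses


def most_common_ambiguities(stream, n=10):
    """Return the n most frequent ambiguous surface forms (lower-cased)."""
    counts = Counter(surface.lower() for surface, _ in ambiguous_units(stream))
    return counts.most_common(n)


# Hand-written illustrative input; the tags are assumptions, not real output.
sample = (
    "^The/the<det><def><sp>$ "
    "^books/book<n><pl>/book<vblex><pri><p3><sg>$ "
    "^are/be<vbser><pres>$"
)

for surface, analyses in ambiguous_units(sample):
    print(surface, analyses)
```

The most frequent ambiguous forms in a corpus are reasonable candidates to inspect first: checking which analysis the tagger chose for them in context shows where a new rule would have the most effect.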