Task ideas for Google Code-in/Setup constraint grammar for a pair
- Select a language pair that does not yet use constraint grammar for part-of-speech tagging, ideally one whose source language is a language you know (L₂) and whose target language is a language you use every day (L₁).
- Install Apertium locally from the Subversion repository, install the language pair, and make sure that it works; and/or get Apertium VirtualBox, update it, and check out and compile the language pair.
- Using a large enough corpus of the source language (e.g. plain text taken from Wikipedia, newspapers, literature, etc.), detect part-of-speech tagging errors, that is, cases where the translation is not adequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had more than one.
- Set up the language pair so that it uses constraint grammar (get inspiration from the constraint grammar files of other languages) and write 5 constraint grammar rules that select the desired part of speech in the relevant context(s).
- Compile and test again, possibly after retraining the statistical part-of-speech tagger.
- Submit a patch to your mentor (or commit it yourself if you have already gained developer access).
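
When looking for tagging errors in the corpus, it can help to first list the words that are ambiguous between parts of speech, since those are the only places where the tagger can pick the wrong analysis. The following is a minimal sketch in Python, assuming the corpus has already been run through the morphological analyser and is in Apertium stream format (`^surface/analysis1/analysis2$`). The function name `ambiguous_forms` and the sample sentence are hypothetical, and the sketch ignores escaped characters in the stream, so treat it as a starting point rather than a finished tool.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped characters such as \/ or \^ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)((?:/[^/$^]*)+)\$")
FIRST_TAG = re.compile(r"<([^>]+)>")


def ambiguous_forms(analysed_text):
    """Count surface forms whose analyses differ in their first tag.

    The first tag of an analysis is taken to be its part of speech.
    """
    counts = Counter()
    for match in LEXICAL_UNIT.finditer(analysed_text):
        surface = match.group(1)
        analyses = match.group(2).lstrip("/").split("/")
        pos_tags = set()
        for analysis in analyses:
            tag = FIRST_TAG.search(analysis)
            if tag:  # unknown words (e.g. *word) carry no tags and are skipped
                pos_tags.add(tag.group(1))
        if len(pos_tags) > 1:
            counts[surface.lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical analyser output for "The books fly"
    sample = (
        "^The/the<det><def><sp>$ "
        "^books/book<n><pl>/book<vblex><pri><p3><sg>$ "
        "^fly/fly<n><sg>/fly<vblex><inf>/fly<vblex><pres>$"
    )
    for form, count in ambiguous_forms(sample).most_common():
        print(form, count)
```

The most frequent ambiguous forms are good candidates for a first set of constraint grammar rules: check how the current tagger resolves each of them in context and write a rule where it gets it wrong.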