Task ideas for Google Code-in/Add constraint-grammar rules
Firespeaker (talk | contribs) |
Latest revision as of 02:19, 21 October 2018
- Select a language pair that already uses constraint grammar for part-of-speech tagging, ideally one whose source language is a language you know (L₂) and whose target language is one you use every day (L₁).
- Install Apertium locally using a package manager, install the language pair from GitHub, and make sure that it works.
- Using a sufficiently large corpus of the source language (e.g. plain text taken from Wikipedia, newspapers, literature, etc.), detect part-of-speech tagging errors: cases where the translation is inadequate because Apertium's part-of-speech tagger selected the wrong morphological analysis for a word that had more than one.
- Write 10 constraint grammar rules that select the desired part of speech in the relevant context(s).
- Compile and test again, possibly after retraining the statistical part-of-speech tagger.
- Submit a pull request (PR) on GitHub, and give your mentor the URL of the PR.
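
For the error-detection step, it can help to first list which words in the corpus are ambiguous at all, since only those can be tagged wrongly. The following is a minimal Python sketch, assuming you have the morphological analyser's output in the Apertium stream format, where each lexical unit is written as `^surface/analysis1/analysis2$`. The function names and the sample line are made up for illustration, and the sketch ignores escaped `/` and `$` characters, so treat it as a starting point rather than a finished tool.

```python
import re
from collections import Counter

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped "/" or "$" inside a unit (real corpora may contain them).
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def ambiguous_units(stream):
    """Yield (surface, analyses) for every unit with more than one analysis."""
    for match in LEXICAL_UNIT.finditer(stream):
        surface, *analyses = match.group(1).split("/")
        if len(analyses) > 1:
            yield surface, analyses


def count_ambiguous(stream):
    """Count how often each ambiguous surface form occurs (case-insensitive)."""
    return Counter(surface.lower() for surface, _ in ambiguous_units(stream))


# Hypothetical analyser output, invented for this example.
sample = (
    "^the/the<det><def><sp>$ "
    "^books/book<n><pl>/book<vblex><pri><p3><sg>$ "
    "^Books/book<n><pl>/book<vblex><pri><p3><sg>$ "
    "^xyz/*xyz$"
)

print(count_ambiguous(sample).most_common(10))  # [('books', 2)]
```

The most frequent ambiguous forms are usually the best candidates to check first: compare the analysis the tagger picked for them against the context, and write a rule for each recurring mistake.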