Difference between revisions of "Task ideas for Google Code-in/Add constraint-grammar rules"

From Apertium
(Created page with '# select a language pair that already uses constraint grammar for part-of-speech tagging, ideally such that the source language is a language you know (L₂) and the target langu…')
 
Line 1: Line 1:
# select a language pair that already uses constraint grammar for part-of-speech tagging, ideally such that the source language is a language you know (L₂) and the target language a language you use every day (L₁).
+
# '''select a language pair''' that already uses constraint grammar for part-of-speech tagging, ideally such that the source language is a language you know (L₂) and the target language a language you use every day (L₁).
# Install Apertium locally from the Subversion repository; install the language pair; make sure that it works and/or get [http://wiki.apertium.org/wiki/Apertium_VirtualBox Apertium VirtualBox] and update, check out & compile the language pair.
+
# '''Install Apertium''' locally from the Subversion repository; install the language pair; make sure that it works and/or get [http://wiki.apertium.org/wiki/Apertium_VirtualBox Apertium VirtualBox] and update, check out & compile the language pair.
# Using a large enough corpus of the source language (e.g. plain text taken from Wikipedia, newspapers, literature, etc.) detect part-of-speech tagging errors (the translation is not adequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had more than one);
+
# Using a large enough corpus of the source language (e.g. plain text taken from Wikipedia, newspapers, literature, etc.) '''detect part-of-speech tagging errors''' (the translation is not adequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had more than one);
# write 10 constraint grammar rules that select the desired part of speech in the relevant context(s);
+
# '''write 10 constraint grammar rules''' that select the desired part of speech in the relevant context(s);
# Compile and test again, possibly after retraining the statistical part-of-speech tagger
+
# '''Compile and test''' again, possibly after retraining the statistical part-of-speech tagger
# Submit a patch to your mentor (or commit it if you have already gained developer access)
+
# '''Submit a patch''' to your mentor (or commit it if you have already gained developer access)
   
 
[[Category:Tasks for Google Code-in|Add constraint-grammar rules]]
 

Revision as of 04:37, 23 December 2017

  1. Select a language pair that already uses constraint grammar for part-of-speech tagging, ideally one whose source language is a language you know (L₂) and whose target language is a language you use every day (L₁).
  2. Install Apertium locally from the Subversion repository, install the language pair, and make sure that it works; alternatively (or additionally), get Apertium VirtualBox, update it, and check out and compile the language pair.
  3. Using a sufficiently large corpus of the source language (e.g. plain text taken from Wikipedia, newspapers or literature), detect part-of-speech tagging errors: cases where the translation is inadequate because the part-of-speech tagger in Apertium has selected the wrong morphological analysis for a word that had more than one.
  4. Write 10 constraint grammar rules that select the desired part of speech in the relevant context(s).
  5. Compile and test again, possibly after retraining the statistical part-of-speech tagger.
  6. Submit a patch to your mentor (or commit it if you have already gained developer access).
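
When looking for tagging errors in the corpus, it can help to first list the words that received more than one morphological analysis, since only those can be mis-tagged in the sense described above. The following is a minimal sketch, not part of Apertium itself. It assumes you have saved the analyser's output (before disambiguation) as text in the Apertium stream format, where each lexical unit looks like `^surface/analysis1/analysis2$`. The function names and the sample sentence with its tags are hypothetical, and escaped characters in the stream are not handled.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplifying assumption: escaped characters (such as \/ or \$) do not occur.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def ambiguous_units(stream_text):
    """Return (surface, analyses) for every unit with more than one analysis."""
    result = []
    for match in LEXICAL_UNIT.finditer(stream_text):
        surface = match.group(1)
        analyses = match.group(2).split("/")
        if len(analyses) > 1:
            result.append((surface, analyses))
    return result


def most_frequent_ambiguities(stream_text, top=10):
    """Count ambiguous surface forms (lower-cased), most frequent first."""
    counts = Counter(surface.lower() for surface, _ in ambiguous_units(stream_text))
    return counts.most_common(top)


# Hypothetical analyser output; real tag names depend on the language pair.
sample = (
    "^the/the<det><def><sp>$ "
    "^run/run<n><sg>/run<vblex><inf>/run<vblex><pres>$ "
    "^ended/end<vblex><past>/end<vblex><pp>$^./.<sent>$"
)

for surface, analyses in ambiguous_units(sample):
    print(surface, analyses)
```

The most frequent ambiguous forms are usually good candidates to check first: compare the analysis the tagger finally chose for them against the context, and write a rule wherever it is consistently wrong.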