
Task ideas for Google Code-in/Morphologically disambiguating text


Latest revision as of 14:16, 26 September 2016

On this page we describe how to morphologically disambiguate (tag) text so that it can be used as input for training the Apertium part-of-speech tagger.

Why do we want to do this? Well, basically because the default (unsupervised) way of training the tagger (see tagger training) is not very accurate. For translation between closely related languages this is acceptable, but for translation between less closely related languages (e.g. any pair involving English) the tagging errors cause problems.

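Example of an error made by the English tagger on the sentence "Where do you come from?". The tagger has chosen the present-tense reading of "come" (come<vblex><pres>), where the infinitive (come<vblex><inf>) is correct: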

^Where/Where<adv><itg>$ ^do/do<vbdo><pres>$ ^you/you<prn><subj><p2><mf><sp>$ ^come/come<vblex><pres>$ ^from/from<pr>$ ^?/?<sent>$
                                                                             |______________________|
                                                                                       ERROR
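One way to spot errors like this one is to compare the tagger's output token by token against a hand-disambiguated version of the same text, such as the one produced by the procedure described on this page. The Python sketch below does that. The function names are hypothetical, and it assumes that both inputs use the same ^surface/analysis$ format, contain the same tokens in the same order, and are a simplified stream with no escaped characters or superblanks.

```python
import re

# One lexical unit: everything between ^ and $.
# Simplification: escaped characters (\^, \/, \$) and superblanks ([...]) are not handled.
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def analyses(text):
    """Return the list of analyses of each lexical unit, dropping the surface form."""
    return [body.split("/")[1:] for body in LU_PATTERN.findall(text)]


def find_errors(tagger_output, reference):
    """Yield (position, tagger_choice, reference_choice) wherever the two disagree."""
    for i, (got, want) in enumerate(zip(analyses(tagger_output), analyses(reference))):
        if got != want:
            yield i, got, want


tagger_output = ("^Where/Where<adv><itg>$ ^do/do<vbdo><pres>$ "
                 "^you/you<prn><subj><p2><mf><sp>$ ^come/come<vblex><pres>$ "
                 "^from/from<pr>$ ^?/?<sent>$")
reference = ("^Where/Where<adv><itg>$ ^do/do<vbdo><pres>$ "
             "^you/you<prn><subj><p2><mf><sp>$ ^come/come<vblex><inf>$ "
             "^from/from<pr>$ ^?/?<sent>$")

for i, got, want in find_errors(tagger_output, reference):
    print(f"token {i}: tagger chose {got}, reference has {want}")
# → token 3: tagger chose ['come<vblex><pres>'], reference has ['come<vblex><inf>']
```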

Input:

The input is the output of the morphological analyser (e.g. lt-proc), which lists every possible analysis of each word, separated by /:

^Where/Where<adv><itg>/Where<rel><adv>$
^do/do<vbdo><pres>/do<vblex><inf>/do<vblex><pres>$
^you/you<prn><subj><p2><mf><sp>/you<prn><obj><p2><mf><sp>$
^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$
^from/from<pr>$
^?/?<sent>$ 
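To make the format concrete, here is a minimal Python sketch that splits analyser output into lexical units. Each unit is delimited by ^ and $; the first /-separated field is the surface form and the remaining fields are the candidate analyses. The function name parse_stream is hypothetical, and the sketch assumes a simplified stream: real Apertium streams can also contain escaped characters (\^, \/, \$) and superblanks ([...]), which it does not handle.

```python
import re

# One lexical unit: everything between ^ and $.
# Simplification: escaped characters (\^, \/, \$) and superblanks ([...]) are not handled.
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def parse_stream(text):
    """Return a list of (surface_form, [analysis, ...]) pairs."""
    units = []
    for body in LU_PATTERN.findall(text):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


analyser_output = """
^Where/Where<adv><itg>/Where<rel><adv>$
^do/do<vbdo><pres>/do<vblex><inf>/do<vblex><pres>$
^you/you<prn><subj><p2><mf><sp>/you<prn><obj><p2><mf><sp>$
^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$
^from/from<pr>$
^?/?<sent>$
"""

# Words with more than one analysis are the ones that need disambiguating.
ambiguous = [surface for surface, analyses in parse_stream(analyser_output) if len(analyses) > 1]
print("Ambiguous words:", ", ".join(ambiguous))
# → Ambiguous words: Where, do, you, come
```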

Output:

You then edit that output to remove the analyses that are wrong in context, ideally leaving just one valid analysis per word (although this may not always be possible).

^Where/Where<adv><itg>$ 
^do/do<vbdo><pres>$ 
^you/you<prn><subj><p2><mf><sp>$ 
^come/come<vblex><inf>$ 
^from/from<pr>$ 
^?/?<sent>$ 
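Because the edit should only ever delete analyses, it can be worth checking the result mechanically before using it for training. The Python sketch below checks that the edited text has the same tokens as the analyser output, that every analysis kept was one the analyser offered, and that no word was left without an analysis; it also lists words that are still ambiguous. The function names are hypothetical, and it assumes a simplified stream with no escaped characters or superblanks.

```python
import re

# Simplified stream: no escaped characters and no superblanks.
LU_PATTERN = re.compile(r"\^([^$]*)\$")


def parse_stream(text):
    """Return (surface_form, [analyses]) for each lexical unit ^...$."""
    return [(body.split("/")[0], body.split("/")[1:]) for body in LU_PATTERN.findall(text)]


def check_disambiguation(analyser_output, edited_output):
    """Compare a hand-edited text with the analyser output it was made from.

    Returns (problems, still_ambiguous): a list of error messages and the
    surface forms that were left with more than one analysis.
    """
    original = parse_stream(analyser_output)
    edited = parse_stream(edited_output)
    problems, still_ambiguous = [], []
    if len(original) != len(edited):
        problems.append(f"token count differs: {len(original)} vs {len(edited)}")
    for i, ((surf_in, offered), (surf_out, kept)) in enumerate(zip(original, edited)):
        if surf_in != surf_out:
            problems.append(f"token {i}: surface form changed from {surf_in!r} to {surf_out!r}")
        elif not kept:
            problems.append(f"token {i} ({surf_out}): no analysis left")
        else:
            # Editing should only delete analyses, never add or alter them.
            for analysis in kept:
                if analysis not in offered:
                    problems.append(f"token {i} ({surf_out}): {analysis} was not offered by the analyser")
            if len(kept) > 1:
                still_ambiguous.append(surf_out)
    return problems, still_ambiguous


analyser_output = """
^Where/Where<adv><itg>/Where<rel><adv>$
^do/do<vbdo><pres>/do<vblex><inf>/do<vblex><pres>$
^you/you<prn><subj><p2><mf><sp>/you<prn><obj><p2><mf><sp>$
^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$
^from/from<pr>$
^?/?<sent>$
"""
edited_output = """
^Where/Where<adv><itg>$
^do/do<vbdo><pres>$
^you/you<prn><subj><p2><mf><sp>$
^come/come<vblex><inf>$
^from/from<pr>$
^?/?<sent>$
"""

problems, still_ambiguous = check_disambiguation(analyser_output, edited_output)
print(problems, still_ambiguous)
# → [] []
```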