Task ideas for Google Code-in/Morphologically disambiguating text
Revision as of 21:35, 31 October 2012 by Francis Tyers
This page describes how to morphologically disambiguate (tag) text so that it can be used as input for training the Apertium part-of-speech tagger.
Why do this? The default (unsupervised) way of training the tagger (see tagger training) is not very accurate. That is acceptable when translating between closely related languages, but it causes problems when translating between less related languages (e.g. English to anything).
Example of a tagger error:
^Where<adv><itg>$ ^do<vbdo><pres>$ ^you<prn><subj><p2><mf><sp>$ ^come<vblex><pres>$ ^from<pr>$ ^?<sent>$
                                                                |_________________|
                                                                       ERROR
Input:
The input is the output of the morphological analyser (e.g. lt-proc):
^Where/Where<adv><itg>/Where<rel><adv>$ ^do/do<vbdo><pres>/do<vblex><inf>/do<vblex><pres>$ ^you/you<prn><subj><p2><mf><sp>/you<prn><obj><p2><mf><sp>$ ^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$ ^from/from<pr>$ ^?/?<sent>$
Output:
You then edit the analyser output to remove the analyses that are impossible in context, ideally leaving exactly one valid analysis per word (although this may not always be possible).
^Where/Where<adv><itg>$ ^do/do<vbdo><pres>$ ^you/you<prn><subj><p2><mf><sp>$ ^come/come<vblex><inf>$ ^from/from<pr>$ ^?/?<sent>$
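When disambiguating by hand it is easy to miss a word or to mistype a tag. The following Python sketch is one way to check a line for this; it is not part of Apertium. It assumes each word has the form ^surface/analysis1/analysis2$ as in the examples above and that no ^, $ or / characters are escaped inside a word. The function names (parse_units, ambiguous_units, invalid_choices) are made up for this illustration.

```python
import re

# One lexical unit looks like ^surface/analysis1/analysis2$.
# Assumption: no escaped '^', '$' or '/' characters inside a unit.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def parse_units(line):
    """Return a list of (surface, [analyses]) pairs found in the line."""
    units = []
    for match in LEXICAL_UNIT.finditer(line):
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units


def ambiguous_units(line):
    """Return the units that still have more than one analysis."""
    return [(s, a) for s, a in parse_units(line) if len(a) > 1]


def invalid_choices(analysed_line, tagged_line):
    """Return surface forms whose kept analyses were not offered by the analyser."""
    problems = []
    pairs = zip(parse_units(analysed_line), parse_units(tagged_line))
    for (surface, offered), (_, kept) in pairs:
        if not set(kept) <= set(offered):
            problems.append(surface)
    return problems


# The analyser output and the hand-disambiguated line from this page.
analysed = (
    "^Where/Where<adv><itg>/Where<rel><adv>$ "
    "^do/do<vbdo><pres>/do<vblex><inf>/do<vblex><pres>$ "
    "^you/you<prn><subj><p2><mf><sp>/you<prn><obj><p2><mf><sp>$ "
    "^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$ "
    "^from/from<pr>$ ^?/?<sent>$"
)
tagged = (
    "^Where/Where<adv><itg>$ ^do/do<vbdo><pres>$ "
    "^you/you<prn><subj><p2><mf><sp>$ ^come/come<vblex><inf>$ "
    "^from/from<pr>$ ^?/?<sent>$"
)

print([s for s, _ in ambiguous_units(analysed)])  # ['Where', 'do', 'you', 'come']
print(ambiguous_units(tagged))                    # []
print(invalid_choices(analysed, tagged))          # []
```

An empty list from ambiguous_units means every word has been reduced to a single analysis. An empty list from invalid_choices means no analysis was introduced that the analyser did not offer in the first place.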