Difference between revisions of "Task ideas for Google Code-in/Morphologically disambiguating text"

Revision as of 20:36, 13 November 2013

This page describes how to morphologically disambiguate (tag) text so that it can be used as training data for the Apertium part-of-speech tagger.

Why do this? The default (unsupervised) way of training the tagger (see tagger training) is not very accurate. That is acceptable when translating between closely related languages, but it causes problems when translating between less closely related languages (e.g. English to any other language).

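Example of an error made by the English tagger on the sentence "Where do you come from?". The tagger chooses the present-tense analysis of "come", although after "do" the infinitive is the correct one: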

^Where/Where<adv><itg>$ ^do/do<vbdo><pres>$ ^you/you<prn><subj><p2><mf><sp>$ ^come/come<vblex><pres>$ ^from/from<pr>$ ^?/?<sent>$ 
                                                                             |______________________|
                                                                                       ERROR
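
Errors like this can be located automatically once a hand-disambiguated version of the same text exists. The following is a minimal sketch, not part of Apertium: the function name `tagging_errors` is hypothetical, and it assumes both texts contain the same units in the same order, each written as `^surface/analysis$` with no escaped `^`, `$` or `/` characters.

```python
import re

# Matches the content between ^ and $ of one unit.
# Simplifying assumption: no escaped ^ or $ inside a unit.
UNIT = re.compile(r"\^([^$]*)\$")

def tagging_errors(tagged, reference):
    """Return (position, tagger_unit, reference_unit) for every mismatch.

    Positions are 1-based. If the texts differ in length, zip() stops
    at the shorter one, so extra units are silently ignored.
    """
    pairs = zip(UNIT.findall(tagged), UNIT.findall(reference))
    return [(i, t, r) for i, (t, r) in enumerate(pairs, start=1) if t != r]

tagged = "^do/do<vbdo><pres>$ ^come/come<vblex><pres>$"
reference = "^do/do<vbdo><pres>$ ^come/come<vblex><inf>$"
for position, got, expected in tagging_errors(tagged, reference):
    print(f"unit {position}: tagger chose {got}, expected {expected}")
```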

Input:

The input is the output of the morphological analyser (e.g. lt-proc), which lists every possible analysis for each word:

^Where/Where<adv><itg>/Where<rel><adv>$
^do/do<vbdo><pres>/do<vblex><inf>/do<vblex><pres>$
^you/you<prn><subj><p2><mf><sp>/you<prn><obj><p2><mf><sp>$
^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$
^from/from<pr>$
^?/?<sent>$
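
To make the format above concrete: each unit starts with `^` and ends with `$`, and the surface form and its candidate analyses are separated by `/`. The sketch below reads such text and reports which words are still ambiguous. The names `parse_units` and `ambiguous` are hypothetical helpers, and the code assumes that no unit contains escaped `^`, `$` or `/` characters.

```python
import re

# One unit looks like ^surface/analysis1/analysis2$.
# Simplifying assumption: no escaped ^, $ or / inside a unit.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")

def parse_units(text):
    """Return a list of (surface, [analyses]) pairs found in text."""
    units = []
    for match in LEXICAL_UNIT.finditer(text):
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units

def ambiguous(units):
    """Return the units that have more than one candidate analysis."""
    return [(surface, analyses) for surface, analyses in units if len(analyses) > 1]

sample = "^do/do<vbdo><pres>/do<vblex><inf>/do<vblex><pres>$ ^from/from<pr>$"
for surface, analyses in ambiguous(parse_units(sample)):
    print(f"{surface}: {len(analyses)} analyses to choose from")
```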

Output:

Edit the analyser output to remove the analyses that are impossible in context, ideally leaving exactly one valid analysis per word (although this may not always be possible):

^Where/Where<adv><itg>$ 
^do/do<vbdo><pres>$ 
^you/you<prn><subj><p2><mf><sp>$ 
^come/come<vblex><inf>$ 
^from/from<pr>$ 
^?/?<sent>$
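
Because the editing is done by hand, it can help to check the result against the original analyser output. The following is a rough sketch under stated assumptions, not an Apertium tool: `check_edit` is a hypothetical name, both texts are assumed to contain the same units in the same order, and escaped `^`, `$` or `/` characters are not handled. It reports units where the surface form was changed, where no analysis is left, or where an analysis appears that the analyser never offered, and it lists separately the words that remain ambiguous.

```python
import re

# Simplifying assumption: no escaped ^, $ or / inside a unit.
UNIT = re.compile(r"\^([^$]*)\$")

def read_units(text):
    """Split each ^surface/analysis/...$ unit into (surface, [analyses])."""
    units = []
    for match in UNIT.finditer(text):
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units

def check_edit(analysed, edited):
    """Return (errors, still_ambiguous) for a hand-edited text."""
    before, after = read_units(analysed), read_units(edited)
    if len(before) != len(after):
        return [f"unit count changed: {len(before)} -> {len(after)}"], []
    errors, still_ambiguous = [], []
    for (surface, old), (new_surface, new) in zip(before, after):
        if new_surface != surface:
            errors.append(f"{surface}: surface form changed to {new_surface}")
        elif not new:
            errors.append(f"{surface}: no analysis left")
        elif any(analysis not in old for analysis in new):
            errors.append(f"{surface}: analysis not offered by the analyser")
        elif len(new) > 1:
            still_ambiguous.append(surface)
    return errors, still_ambiguous

analysed = "^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$ ^from/from<pr>$"
edited = "^come/come<vblex><inf>$ ^from/from<pr>$"
errors, still_ambiguous = check_edit(analysed, edited)
print("errors:", errors)
print("still ambiguous:", still_ambiguous)
```

Words that remain ambiguous are reported separately rather than as errors, since it is not always possible to narrow a word down to a single analysis.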


Revision history: revision as of 21:39, 31 October 2012 by Francis Tyers; revision as of 20:36, 13 November 2013 by Francis Tyers (minor edit: moved page "Morphologically disambiguating text" to "Task ideas for Google Code-in/Morphologically disambiguating text" without leaving a redirect). There is no difference in content between the two revisions.