Difference between revisions of "Task ideas for Google Code-in/Morphologically disambiguating text"

From Apertium

Latest revision as of 12:16, 26 September 2016

This page describes how to morphologically disambiguate (tag) text so that it can be used as input for training the Apertium part-of-speech tagger.

Why do we want to do this? The default (unsupervised) way of training (see tagger training) is not very accurate. This is acceptable when translating between closely related languages, but it causes problems when translating between less related languages (e.g. English--anything).

Example of an error made by the English tagger on the sentence "Where do you come from?", where "come" is tagged as present tense although it is an infinitive in this sentence:

^Where/Where<adv><itg>$ ^do/do<vbdo><pres>$ ^you/you<prn><subj><p2><mf><sp>$ ^come/come<vblex><pres>$ ^from/from<pr>$ ^?/?<sent>$
                                                                             |______________________|
                                                                                       ERROR

Input:

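The input is the output of the morphological analyser (e.g. lt-proc). Each token is enclosed in ^...$ and consists of the surface form followed by all of its possible analyses, separated by slashes: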

^Where/Where<adv><itg>/Where<rel><adv>$
^do/do<vbdo><pres>/do<vblex><inf>/do<vblex><pres>$
^you/you<prn><subj><p2><mf><sp>/you<prn><obj><p2><mf><sp>$
^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$
^from/from<pr>$
^?/?<sent>$ 
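
To make the input format concrete, here is a minimal Python sketch that splits analyser output into surface forms and candidate analyses and lists the tokens that still need disambiguating. The function names are hypothetical, and the sketch assumes the text contains no escaped `^`, `/` or `$` characters, which real analyser output may contain.

```python
import re

# Assumption: one lexical unit is everything between ^ and $,
# and no escaped ^, / or $ characters occur inside it.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def parse_lexical_units(text):
    """Return a list of (surface_form, [analyses]) pairs (hypothetical helper)."""
    units = []
    for match in LEXICAL_UNIT.finditer(text):
        # The first slash-separated field is the surface form,
        # the remaining fields are the candidate analyses.
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units


def ambiguous_units(units):
    """Keep only the units that have more than one candidate analysis."""
    return [unit for unit in units if len(unit[1]) > 1]


sample = "^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$ ^from/from<pr>$"
for surface, analyses in ambiguous_units(parse_lexical_units(sample)):
    print(surface, len(analyses))  # prints: come 3
```

Listing the ambiguous tokens first shows how much of the text actually needs a manual decision; tokens such as "from" above already have a single analysis.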

Output:

Edit the analyser output to remove the analyses that are impossible in context, ideally leaving exactly one valid analysis per token (although this may not always be possible).

^Where/Where<adv><itg>$ 
^do/do<vbdo><pres>$ 
^you/you<prn><subj><p2><mf><sp>$ 
^come/come<vblex><inf>$ 
^from/from<pr>$ 
^?/?<sent>$
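
Because the editing is done by hand, it can help to check the result against the original analyser output. The following Python sketch is one possible check, not part of Apertium itself: it reports tokens that went missing, analyses that the analyser never offered (usually a typo), and tokens where zero or several analyses remain. The function names are hypothetical, and escaped `^`, `/` and `$` characters are again assumed not to occur.

```python
import re

# Assumption: no escaped ^, / or $ characters inside a lexical unit.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def parse(text):
    """Return a list of (surface_form, [analyses]) pairs."""
    units = []
    for match in LEXICAL_UNIT.finditer(text):
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units


def check_disambiguation(analysed, disambiguated):
    """Compare analyser output with the hand-edited text; return a list of problems."""
    before = parse(analysed)
    after = parse(disambiguated)
    if len(before) != len(after):
        return [f"token count differs: {len(before)} before, {len(after)} after"]

    problems = []
    for (surface, candidates), (kept_surface, kept) in zip(before, after):
        if surface != kept_surface:
            problems.append(f"surface form changed: {surface!r} -> {kept_surface!r}")
            continue
        # Every analysis that was kept must have been offered by the analyser.
        invented = [a for a in kept if a not in candidates]
        if invented:
            problems.append(f"{surface}: not offered by the analyser: {invented}")
        # Ideally exactly one analysis is left per token.
        if len(kept) != 1:
            problems.append(f"{surface}: {len(kept)} analyses remain")
    return problems


analysed = "^come/come<vblex><inf>/come<vblex><pres>/come<vblex><pp>$ ^from/from<pr>$"
edited = "^come/come<vblex><inf>$ ^from/from<pr>$"
print(check_disambiguation(analysed, edited))  # prints: []
```

A token reported as still having several analyses is not necessarily a mistake, since a single valid analysis cannot always be chosen; the report only points to the places worth a second look.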