Vin-ivar/proposal ud apertium

From Apertium
Revision as of 22:16, 2 April 2017 by Vin-ivar (talk | contribs)

Work plan:

Week 1: morphological feature conversion

Whilst mapping Apertium POS tags to UD's UPOSTAGs is fairly simple, converting morphological features is a lot more annoying (and not completely doable).
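A minimal sketch of what the conversion could look like, assuming a small hand-written lookup table. The tables below are illustrative and deliberately incomplete (real mappings are language-specific), and `convert` is a hypothetical helper, not part of Apertium or UDPipe. Tags with no UD counterpart are returned separately, which is the "not completely doable" part.

```python
# Illustrative, incomplete tables; a real mapping would be per language.
UPOS = {
    "n": "NOUN", "np": "PROPN", "vblex": "VERB", "adj": "ADJ", "adv": "ADV",
    "pr": "ADP", "det": "DET", "prn": "PRON", "num": "NUM", "cnjsub": "SCONJ",
}
FEATS = {
    "sg": ("Number", "Sing"), "pl": ("Number", "Plur"),
    "m": ("Gender", "Masc"), "f": ("Gender", "Fem"), "nt": ("Gender", "Neut"),
    "nom": ("Case", "Nom"), "acc": ("Case", "Acc"),
}


def convert(tags):
    """Map a list of Apertium tags to (UPOS, FEATS string, unmapped tags)."""
    upos = "X"          # UD's fallback tag when the POS is unknown
    feats = {}
    unmapped = []
    for i, tag in enumerate(tags):
        if i == 0 and tag in UPOS:
            upos = UPOS[tag]            # first tag is treated as the POS tag
        elif tag in FEATS:
            name, value = FEATS[tag]
            feats[name] = value
        else:
            unmapped.append(tag)        # no UD equivalent in this table
    # CoNLL-U writes features as Name=Value pairs joined by "|", or "_" if empty
    feats_str = "|".join(f"{k}={v}" for k, v in sorted(feats.items())) or "_"
    return upos, feats_str, unmapped


print(convert(["n", "f", "pl"]))
print(convert(["vblex", "pri", "p3", "sg"]))
```

The second call leaves `pri` and `p3` unmapped here only because the toy table does not cover them; a fuller table would still have residue for tags that UD has no feature for.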

Week 2: soft constraints - 1:

If UDPipe assigns a lemma or a POS tag with a probability below a threshold value, use Apertium's solution instead. Annoyances: hacking UDPipe to figure out the softmax bit. Add this as a mode to a UDPipe fork, e.g. `udpipe --tag --threshold 0.8 --tagger ../apertium-swe/`. Also allow UDPipe to integrate other popular tokenisers (e.g. the Stanford word segmenter).
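The fallback logic itself is small; a sketch under the assumption that a patched UDPipe can hand back a confidence score per token. `Analysis`, `choose` and the `prob` field are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Analysis:
    lemma: str
    upos: str
    # Assumption: a confidence score obtainable from a patched UDPipe.
    prob: float = 1.0


def choose(udpipe: Analysis, apertium: Optional[Analysis], threshold: float = 0.8) -> Analysis:
    """Keep UDPipe's analysis unless its confidence falls below the threshold."""
    if udpipe.prob < threshold and apertium is not None:
        return apertium
    return udpipe


# Low-confidence UDPipe output is overridden by Apertium's analysis.
print(choose(Analysis("be", "AUX", 0.55), Analysis("be", "VERB")))
# High-confidence UDPipe output is kept.
print(choose(Analysis("dog", "NOUN", 0.97), Analysis("dog", "NOUN")))
```

If Apertium has no analysis for the token, the sketch keeps UDPipe's output regardless of confidence; whether that is the right default is an open design choice.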

Week 3: soft constraints - 2:

Continue week 2. GF soft constraints: if certain deprels are unlikely, use GF to parse the chunk. Use the relations GF returns (all of which are hard-coded for different rules).

Week 4: integrate dependencies within lexical selection

Lexical selection rules currently match words by their position in a sentence, which isn't perfect. Add support for writing rules with dependencies. Example:

   <rule> 
     <match lemma="spend" tags="vblex.*">
       <select lemma="pasar" tags="vblex.*"/>
     </match>
     <match/>
     <match/>
     <or>
       <match lemma="minute"/>
       <match lemma="hour"/>
       <match lemma="year"/>
     </or>
   </rule>

Becomes:

   <rule>
     <match lemma="spend" tags="vblex.*">
       <or>
         <dep lemma="minute" deprel="dobj"/>
         <dep lemma="hour" deprel="dobj"/>
         <dep lemma="year" deprel="dobj"/>
       </or>
       <select lemma="pasar" tags="vblex.*"/>
     </match>
   </rule>
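To show why the dependency version is more robust, here is a rough sketch of how such a rule could be evaluated against a parsed sentence. The `Token` class, the dictionary form of the rule and `select_translation` are all hypothetical; the sketch ignores the `tags` attribute and only checks lemmas and deprels.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    lemma: str
    head: int     # idx of the head token, 0 for the root
    deprel: str


# Hypothetical in-memory form of the dependency rule shown above.
RULE = {
    "lemma": "spend",
    # Alternatives inside <or>: any one matching dependent is enough.
    "deps": [("minute", "dobj"), ("hour", "dobj"), ("year", "dobj")],
    "select": "pasar",
}


def select_translation(tokens, rule):
    """Return {token idx: selected lemma} for every token the rule fires on."""
    selected = {}
    for tok in tokens:
        if tok.lemma != rule["lemma"]:
            continue
        # Collect (lemma, deprel) of all direct dependents of this token.
        dependents = {(t.lemma, t.deprel) for t in tokens if t.head == tok.idx}
        if any(dep in dependents for dep in rule["deps"]):
            selected[tok.idx] = rule["select"]
    return selected


# "we spend three very long hour": the object is four tokens away from the
# verb, so a positional rule with two wildcards would not match it.
sentence = [
    Token(1, "we", 2, "nsubj"),
    Token(2, "spend", 0, "root"),
    Token(3, "three", 6, "nummod"),
    Token(4, "very", 5, "advmod"),
    Token(5, "long", 6, "amod"),
    Token(6, "hour", 2, "dobj"),
]
print(select_translation(sentence, RULE))
```

The rule fires on token 2 regardless of how many modifiers sit between the verb and its object, which is the point of matching on dependencies rather than positions.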

Week 5: writing wrappers:

You should have the choice of what parser you want to use for stuff done in week 4 (and later); maybe you hate neural networks and are a MaltParser purist (or maybe you just don't have the time to train models with UDPipe). This involves writing a wrapper over popular parsers, to use whichever one you specify in Apertium pipelines. The underlying implementation should be invisible to the user; all they need to specify is "--parser udpipe" or "--parser maltparser". This is also a potential paper.
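One way this could be structured is a common interface plus a name-based registry, so that the value passed to `--parser` picks the backend. This is only a sketch: the class names, `register` and `get_parser` are hypothetical, and the backends are stubs that return a flat placeholder tree instead of calling the real parsers.

```python
from abc import ABC, abstractmethod


class Parser(ABC):
    """Common interface: every backend returns one (head, deprel) per token."""

    @abstractmethod
    def parse(self, tokens):
        ...


_REGISTRY = {}


def register(name):
    """Class decorator that makes a backend selectable by name."""
    def deco(cls):
        _REGISTRY[name] = cls
        return cls
    return deco


def _flat_tree(tokens):
    # Placeholder output: first token is the root, the rest attach to it.
    return [(0, "root") if i == 0 else (1, "dep") for i in range(len(tokens))]


@register("udpipe")
class UDPipeBackend(Parser):
    def parse(self, tokens):
        return _flat_tree(tokens)   # stub; a real backend would call UDPipe


@register("maltparser")
class MaltParserBackend(Parser):
    def parse(self, tokens):
        return _flat_tree(tokens)   # stub; a real backend would call MaltParser


def get_parser(name):
    """Look up a backend by the name given on the command line."""
    if name not in _REGISTRY:
        raise ValueError(f"unknown parser {name!r}; choose from {sorted(_REGISTRY)}")
    return _REGISTRY[name]()


parser = get_parser("maltparser")
print(parser.parse(["the", "dog", "barks"]))
```

The rest of the pipeline only ever sees `Parser.parse`, so swapping backends needs no changes elsewhere.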

Week 6: adding "apertium features" to wrapper:

Allow the relevant parser to use information generated by Apertium (largely word translations) as additional features. For MaltParser, for example, this should combine seamlessly with parsing configurations such as ArcEager or CovingtonProjective.

Week 7: implement transfer rules as constraints

Implement as part of an ensemble system. If you have xx-yy as a pair and yy has a rubbish treebank, translate yy → xx and parse xx. When transfer rules move stuff around, move the associated dependencies with them. This could be helpful with non-projective sentences that are projective when translated: since you're just reordering dependencies, you're not really parsing non-projective sentences (which is hard).
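Moving dependencies along with reordered words comes down to renumbering head indices. A minimal sketch, assuming a transfer rule can be summarised as a permutation of token positions; `reorder` is a hypothetical helper, and real transfer rules also insert and delete words, which this does not handle.

```python
def reorder(tokens, heads, new_order):
    """Apply a reordering to a sentence and renumber its dependency heads.

    tokens:    list of word forms
    heads:     1-based head index for each token (0 = root)
    new_order: old 1-based positions, listed in their new sequence
    """
    # Map each old position to its new position; the root index stays 0.
    old_to_new = {old: new for new, old in enumerate(new_order, start=1)}
    old_to_new[0] = 0
    new_tokens = [tokens[old - 1] for old in new_order]
    new_heads = [old_to_new[heads[old - 1]] for old in new_order]
    return new_tokens, new_heads


# "the red car" with both "the" and "red" attached to "car";
# an adjective-noun swap gives "the car red" with the same attachments.
print(reorder(["the", "red", "car"], [3, 3, 0], [1, 3, 2]))
```

The tree itself is unchanged; only the linear order and the indices that refer to it are updated, so no re-parsing is involved.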

Week 8: integrating wrappers within transfer rules:

More precision when reordering: for instance, you could refer to a chunk as the "object" chunk and move that around (todo: example)


Week 9: (ESSLLI?): bit more chill. Write plugins to make UD annotation simpler for the better text editors (read: vim).

Week 10: (ESSLLI?): Same as above