Ideas for Google Summer of Code/UD and Apertium integration


Tasks

This project would involve working on a number of tasks from the following list:

  • UD annotatrix
    • An HTML/JS interface for treebank annotation
  • lttoolbox relabelling (tagset conversion)
    • Seamless conversion between Apertium and UD tagsets
    • Tagset embeddings for cases where the conversion is ambiguous
  • UDPipe and lttoolbox integration
    • Use Apertium morphological analysers as soft constraints on lemmatisation and POS/MSD tagging.
  • UD mode for language modules:
    • Calling `apertium qtz-ud` should produce a CoNLL-U file with the LEMMA, POS, FEATS and MISC fields filled in
  • Apertium lexical selection using dependencies: integrate with UDPipe, parse the sentence, allow dependency relations as constraints (?)
  • Dependencies with CG (??)
  • Integrate legit tokenisers within UDPipe (????)
  • set up APERTIUM EMBEDDINGS in UDPipe
  • Use GF-generated dependency trees as constraints on UD parses (unclear whether this is relevant to GSoC)
  • transfer rules as constraints (dank idea):
    • es-ca will have a (structural) transfer rule X
    • X reorders dependencies
    • Parse the Spanish (es) side and use knowledge of the transfer rule to bias parses on the Catalan (ca) side
  • Your idea(s) here
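
As a rough illustration of the UD mode task above, the following Python sketch turns disambiguated Apertium stream-format output into CoNLL-U rows. It is only a sketch under stated assumptions: the `UPOS` and `FEATS` tables are hypothetical and far from complete, the first tag of each analysis is assumed to be the part of speech, each lexical unit is assumed to have exactly one analysis, and the `ApertiumTags` key in MISC is an invented name, not an agreed convention. HEAD and DEPREL are left as `_` because no parsing is done.

```python
import re

# Hypothetical, deliberately tiny mappings (assumptions for illustration only).
UPOS = {"n": "NOUN", "vblex": "VERB", "adj": "ADJ", "det": "DET"}
FEATS = {"sg": "Number=Sing", "pl": "Number=Plur", "def": "Definite=Def"}

# One lexical unit with exactly one analysis: ^surface/lemma<tag><tag>$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^<$/]*)((?:<[^>]+>)+)\$")


def to_conllu_rows(stream):
    """Return one tab-separated CoNLL-U row per unambiguous lexical unit."""
    rows = []
    for index, match in enumerate(LEXICAL_UNIT.finditer(stream), start=1):
        form, lemma, tag_string = match.groups()
        tags = re.findall(r"<([^>]+)>", tag_string)
        # Assumption: the first Apertium tag is the part of speech.
        upos = UPOS.get(tags[0], "X")
        # CoNLL-U expects features sorted by name and joined with "|".
        feats = sorted((FEATS[t] for t in tags[1:] if t in FEATS), key=str.lower)
        # Keep the original tags in MISC under a made-up key.
        misc = "ApertiumTags=" + ".".join(tags)
        columns = [str(index), form, lemma, upos, "_",
                   "|".join(feats) or "_", "_", "_", "_", misc]
        rows.append("\t".join(columns))
    return rows


if __name__ == "__main__":
    example = "^The/the<det><def><sp>$ ^houses/house<n><pl>$"
    print("\n".join(to_conllu_rows(example)))
```

Units that still carry several analyses, or unknown words, do not match the pattern and are silently skipped here; a real mode would have to decide how to represent them.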

Coding challenge

  • Fix one issue in UD annotatrix and send a pull request
  • Train UDPipe for a language that is also in Apertium
  • Write a tagset equivalence file for a language that is in both Apertium and UD.
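
To give an idea of what a tagset equivalence file could look like and how it might be applied, here is a minimal Python sketch. The file format is an assumption made for this example (three tab-separated columns: dot-separated Apertium tags, UD POS, UD features), as are the function names `load_equivalences` and `convert`; the actual format used in the project may differ.

```python
import io

# Hypothetical equivalence file: apertium tags <TAB> UD POS <TAB> UD features
EXAMPLE = (
    "# apertium\tupos\tfeats\n"
    "n\tNOUN\t_\n"
    "n.pl\tNOUN\tNumber=Plur\n"
    "vblex.past\tVERB\tTense=Past\n"
)


def load_equivalences(handle):
    """Read rules into a dict keyed by a tuple of Apertium tags."""
    table = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tags, upos, feats = line.split("\t")
        table[tuple(tags.split("."))] = (upos, feats)
    return table


def convert(tags, table):
    """Return (UPOS, FEATS) from the most specific rule contained in `tags`."""
    best = None
    for rule_tags, value in table.items():
        matches = set(rule_tags) <= set(tags)
        if matches and (best is None or len(rule_tags) > len(best[0])):
            best = (rule_tags, value)
    # Fall back to the UD "X" tag when no rule applies.
    return best[1] if best else ("X", "_")


if __name__ == "__main__":
    table = load_equivalences(io.StringIO(EXAMPLE))
    print(convert(["n", "pl"], table))
```

Preferring the rule with the most tags is one simple way to let specific entries override general ones; other tie-breaking schemes are equally possible.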