Difference between revisions of "Ideas for Google Summer of Code/UD and Apertium integration"
TommiPirinen (→Tasks: mode)
Revision as of 23:39, 29 March 2017
Tasks
This project would involve working on a number of tasks from the following list:
- UD annotatrix
  - An HTML/JS interface for treebank annotation
- lttoolbox relabelling (tagset conversion)
  - Seamless conversion between Apertium and UD tagsets
  - Tagset embeddings for ambiguous cases
- UDPipe and lttoolbox integration
  - Use Apertium morphological analysers as soft constraints on lemmatisation and POS/MSD tagging
- UD mode for language modules:
  - Calling `apertium qtz-ud` should produce a CoNLL-U file with the LEMMA, POS, FEATS and MISC fields filled in
- Apertium lexical selection (lexsel) using dependencies: integrate with UDPipe, parse the sentence, and allow dependency relations as constraints (?)
- Dependencies with CG (??)
- Integrate proper tokenisers within UDPipe (????)
- Set up Apertium embeddings in UDPipe
- Use GF-generated dependency trees as constraints on UD parses (relevance to GSoC unclear)
- Transfer rules as constraints:
  - es-ca will have a (structural) transfer rule X
  - X reorders dependencies
  - Parse (es) and use knowledge of the transfer rule to bias parses on ca
- Your idea(s) here
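As a rough illustration of the relabelling and UD-mode items above, the following Python sketch turns one lexical unit in Apertium stream format (for example `^houses/house<n><pl>$`) into a CoNLL-U token line. The table `TAG_MAP`, the function names and the `ApertiumTags` key in MISC are hypothetical, and the few tag correspondences are illustrative assumptions, not an agreed Apertium-UD equivalence. It only handles a single unambiguous analysis per token.

```python
import re

# Hypothetical, deliberately tiny mapping from Apertium tags to UD.
# Each tag maps to (UPOS or None, (feature, value) or None).
TAG_MAP = {
    "n": ("NOUN", None),
    "vblex": ("VERB", None),
    "adj": ("ADJ", None),
    "sg": (None, ("Number", "Sing")),
    "pl": (None, ("Number", "Plur")),
    "past": (None, ("Tense", "Past")),
}

# Matches '^surface/lemma<tag1><tag2>$' with exactly one analysis.
LU_RE = re.compile(r"^\^([^/]+)/([^<$]+)((?:<[^>]+>)*)\$$")


def parse_lexical_unit(lu):
    """Split a simple lexical unit into (surface, lemma, [tags])."""
    match = LU_RE.match(lu)
    if match is None:
        raise ValueError(f"not a simple lexical unit: {lu!r}")
    surface, lemma, tag_string = match.groups()
    return surface, lemma, re.findall(r"<([^>]+)>", tag_string)


def to_conllu_line(index, lu):
    """Build one tab-separated CoNLL-U line; HEAD, DEPREL and DEPS stay empty."""
    surface, lemma, tags = parse_lexical_unit(lu)
    upos = "X"  # UD tag for "other", used when no POS tag is mapped
    feats = {}
    unmapped = []
    for tag in tags:
        pos, feat = TAG_MAP.get(tag, (None, None))
        if pos is not None:
            upos = pos
        if feat is not None:
            feats[feat[0]] = feat[1]
        if pos is None and feat is None:
            unmapped.append(tag)
    feats_field = "|".join(f"{k}={v}" for k, v in sorted(feats.items())) or "_"
    # Assumption: tags without a mapping are preserved in MISC.
    misc_field = ("ApertiumTags=" + ",".join(unmapped)) if unmapped else "_"
    # Column order: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    columns = [str(index), surface, lemma, upos, "_", feats_field,
               "_", "_", "_", misc_field]
    return "\t".join(columns)
```

Calling `to_conllu_line(1, "^houses/house<n><pl>$")` yields a line with `NOUN` in the POS column and `Number=Plur` in FEATS. A real mode would also need to handle ambiguous analyses, multiword units and superblanks, which this sketch ignores.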
Coding challenge
- Fix one issue in UD annotatrix and send a pull request
- Train UDPipe for a language that is also in Apertium
- Write a tagset equivalence file for a language that is in both Apertium and UD
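The page does not say what a tagset equivalence file should look like. As one possible starting point for the last challenge, the sketch below assumes a hypothetical two-column, tab-separated format (Apertium tag, then a UD value such as `UPOS=NOUN` or `Number=Plur`) with `#` comments, loads it into a dictionary, and reports which observed Apertium tags are not yet covered. The format and the function names are assumptions made for illustration only.

```python
def load_equivalences(text):
    """Parse hypothetical 'apertium_tag<TAB>ud_value' lines into a dict.

    Blank lines and lines starting with '#' are skipped.
    """
    table = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected 2 tab-separated columns")
        apertium_tag, ud_value = parts
        if apertium_tag in table:
            raise ValueError(f"line {line_number}: duplicate tag {apertium_tag!r}")
        table[apertium_tag] = ud_value
    return table


def uncovered_tags(table, observed_tags):
    """Return the observed Apertium tags missing from the table, sorted."""
    return sorted(set(observed_tags) - set(table))


# Illustrative file contents; the entries are assumptions, not a vetted mapping.
EXAMPLE = "# apertium\tUD\nn\tUPOS=NOUN\npl\tNumber=Plur\n"
```

Running `uncovered_tags` over the tags produced by an existing analyser gives a quick checklist of what the equivalence file still has to cover.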