Difference between revisions of "User:Gor ar"
(Add to category) |
(Move the proposal to a subpage) |
||
Line 1: | Line 1: | ||
+ | [[User:Gor_ar/proposal_2017|My proposal for GSoC 2017]] |
||
− | GSoC 2017 Proposal: UD and Apertium Integration. |
||
− | |||
− | == Contact information == |
||
− | |||
− | '''Name:''' Gor Arakelyan |
||
− | |||
− | '''Email:''' gor19973010@gmail.com |
||
− | |||
− | '''Skype:''' gor.arakelyan4 |
||
− | |||
− | == About me == |
||
− | |||
− | I am a second-year student at Yerevan State University (YSU), in the Department of Informatics and Applied Mathematics. |
||
− | |||
− | I am interested in natural language processing, especially for low-resource languages such as my native language, Armenian. Apertium seems to be a perfect platform for that. |
||
− | |||
− | == Background == |
||
− | The most important problem for Armenian NLP (and possibly for many other languages) is the lack of a properly annotated treebank. To help linguists annotate large amounts of text quickly, an annotation tool with an easy-to-use interface is required. I believe UD Annotatrix is a very good tool to start with. |
||
− | |||
− | == Proposed solution == |
||
− | UD Annotatrix currently lacks a convenient UI to [https://github.com/jonorthwash/ud-annotatrix/issues/6 edit POS tags] or [https://github.com/jonorthwash/ud-annotatrix/issues/3 dependency relations]. I propose the following solutions: |
||
− | |||
− | * Replace the simple text box in UD Annotatrix with a more convenient rich code editor (such as [https://codemirror.net CodeMirror]) |
||
− | * The rich editor will provide autocompletion for UD-specific tags and features (VERB, ADV, NOUN, Definite, etc.) |
||
− | * In later stages, the autocompletion can be made more intelligent: |
||
− | * - e.g. display only verb-specific features (such as Tense) when the POS is set to VERB |
||
− | * - use previously annotated data to provide suggestions |
||
− | * Make UD Annotatrix portable so that it can be embedded in larger applications, e.g. in an app that uses a backend to save annotations |
||
− | * Support [https://github.com/jonorthwash/ud-annotatrix/issues/8 multiword tokens] in the visualisation |
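The autocompletion idea in the list above can be illustrated with a minimal sketch. It is not part of UD Annotatrix or CodeMirror: the function names `suggest_pos` and `suggest_features` and the POS-to-feature mapping `FEATURES_BY_UPOS` are hypothetical, chosen only to show prefix-based suggestions that narrow once the POS is known. The tag list is the UD v2 universal POS inventory; the feature names are real UD features, but which features are offered for which POS is an assumption made for this example.

```python
from typing import Dict, List, Optional

# The 17 universal POS tags of UD v2.
UPOS_TAGS: List[str] = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]

# Hypothetical mapping from POS to the features worth suggesting for it.
# A real tool would derive this from language-specific UD guidelines
# or from previously annotated data.
FEATURES_BY_UPOS: Dict[str, List[str]] = {
    "VERB": ["Aspect", "Mood", "Number", "Person", "Tense", "VerbForm"],
    "NOUN": ["Animacy", "Case", "Definite", "Number"],
}


def suggest_pos(prefix: str) -> List[str]:
    """Return the POS tags that start with the typed prefix (case-insensitive)."""
    wanted = prefix.upper()
    return [tag for tag in UPOS_TAGS if tag.startswith(wanted)]


def suggest_features(prefix: str, upos: Optional[str] = None) -> List[str]:
    """Return feature names matching the prefix.

    When the POS is known and present in the mapping, only its features
    are considered; otherwise every known feature is a candidate.
    """
    if upos in FEATURES_BY_UPOS:
        candidates = FEATURES_BY_UPOS[upos]
    else:
        candidates = sorted({f for fs in FEATURES_BY_UPOS.values() for f in fs})
    wanted = prefix.lower()
    return [f for f in candidates if f.lower().startswith(wanted)]


if __name__ == "__main__":
    print(suggest_pos("ad"))             # tags beginning with "AD"
    print(suggest_features("t", "VERB"))  # verb features beginning with "t"
    print(suggest_features("t", "NOUN"))  # no noun feature begins with "t" here
```

In an editor integration, functions like these would sit behind the editor's completion hook, so the same lookup logic can be tested independently of the UI.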
||
− | |||
− | == My experience == |
||
− | I have been doing web development for 3 years now. Recently I worked on [https://github.com/YerevaNN/armtreebank an open source tool for corpus management] (currently used for Armenian only). It involved coding in HTML, JS and Python. |
||
− | |||
− | [[Category:GSoC 2017 Student Proposals]] |