User:Gor ar/proposal 2017

From Apertium
Revision as of 15:27, 31 March 2017 by Gor ar (talk | contribs) (First draft)

GSoC 2017 Proposal: UD and Apertium Integration.

Contact information

Name: Gor Arakelyan

Email: gor19973010@gmail.com

Skype: gor.arakelyan4

About me

I am a second-year student at YSU (Yerevan State University), in the Department of Informatics and Applied Mathematics.

I am interested in natural language processing, especially for low-resource languages such as my native language, Armenian. Apertium seems to be a perfect platform for that.

Background

The most important problem for Armenian NLP (and possibly for many other languages) is the lack of a properly annotated treebank. To help linguists annotate large amounts of text quickly, an annotation tool with an easy-to-use interface is required. I believe UD annotatrix is a very good tool to start with.

Proposed solution

Currently, UD annotatrix lacks a convenient UI for editing POS tags and dependency relations. I propose the following improvements:

  • Replace the simple textbox in UD annotatrix with a more convenient rich code editor (such as CodeMirror)
  • Have the rich editor provide autocompletion for UD-specific tags (VERB, ADV, NOUN, Definite, etc.)
  • In later stages, make autocompletion more intelligent:
      - display only verb-specific features (e.g. Tense) when the POS is set to VERB
      - use previously annotated data to provide suggestions
  • Make UD annotatrix portable so that it can be included in larger applications, e.g. in an app that uses a backend to save annotations
  • Support multiword tokens in the visualisation
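
The POS-aware and history-based suggestions could work roughly as follows. This is only a minimal sketch of the filtering logic, written in Python for brevity; in UD annotatrix itself it would be written in JavaScript and plugged into the editor's completion mechanism. The tag lists are a small illustrative subset of the UD inventory, and the names `FEATURES_BY_POS`, `suggest` and `suggest_features` are hypothetical.

```python
from collections import Counter

# Illustrative subset of UD part-of-speech tags (not the full inventory).
UPOS_TAGS = ["ADJ", "ADV", "NOUN", "PROPN", "VERB"]

# Hypothetical mapping from a POS tag to the features worth suggesting for it.
FEATURES_BY_POS = {
    "VERB": ["Tense=Past", "Tense=Pres", "Mood=Ind", "VerbForm=Fin"],
    "NOUN": ["Number=Sing", "Number=Plur", "Definite=Def", "Definite=Ind"],
}


def suggest(prefix, candidates, history=None):
    """Return the candidates that start with prefix, most frequently used first."""
    counts = Counter(history or [])
    matches = [c for c in candidates if c.lower().startswith(prefix.lower())]
    # Sort by descending usage count, then alphabetically for a stable order.
    return sorted(matches, key=lambda c: (-counts[c], c))


def suggest_features(pos, prefix, history=None):
    """Suggest only the features registered for the given POS tag."""
    return suggest(prefix, FEATURES_BY_POS.get(pos, []), history)
```

For example, `suggest("a", UPOS_TAGS)` returns `["ADJ", "ADV"]`, while passing `history=["ADV", "ADV", "ADJ"]` moves `ADV` to the front. `suggest_features("VERB", "t")` returns the two `Tense` values, and the same prefix with `"NOUN"` returns an empty list.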

My experience

I have been doing web development for three years. Recently I worked on an open-source tool for corpus management (currently used only for Armenian), which involved coding in HTML, JS and Python.