User:Mary.szmary/proposal2017


Contact information

Name: Maria Sheyanova
E-mail: masha.shejanova@gmail.com
IRC: maryszmary
SourceForge: maryszmary
Phone number: +79169223114
Timezone: UTC+3

Why is it that you are interested in the Apertium project?

I participated in GSoC 2016 with Apertium, which got me involved in the project; this is one of the main reasons I am interested in contributing to Apertium. Another reason is that, as a linguist, I find it rewarding to develop linguistic tools, and Apertium gives me a good opportunity to do so.

Which of the published tasks are you interested in? What do you plan to do?

I am planning to work on UD-annotatrix. This will include building a user-friendly interface that enables linguists to produce syntactic annotation quickly and easily.

Reasons why Google and Apertium should sponsor it

Currently there is an interface for syntactic annotation called brat, which can be used both online and offline. However, the interface has a number of issues. First, it does not allow the user to edit the source. Second, it does not allow the user to edit tokenisation. More generally, it lacks many features that would be very useful for annotation. Finally, it requires a web server in order to be used by a team of annotators.

There is also a project that aims to build a toolkit for working with dependency trees in Apertium. At the moment, it allows users to visualise the trees. The aim of my project is to create an easy-to-use, fast and interactive interface for UD annotation based on this existing Apertium project. The tool should work both online and offline and allow the user to edit the annotation in both graphical and text modes.

A description of how and who it will benefit in society

Syntactic annotation is blah blah (why it's important). The result of this work is going to be useful for linguists who deal with dependency annotation.

Previous work

Apertium has a web interface for visualising syntactic trees, written in JavaScript and HTML. The interface works with three annotation formats, namely CoNLL-U, CG-3 and SD. It allows the user either to enter their trees in the text area or to upload a treebank from a file.
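To illustrate the kind of data the interface has to handle, here is a minimal sketch of reading one CoNLL-U sentence and listing its dependency arcs. It is written in Python only for brevity and is not the code of the existing visualiser; the names `Token`, `parse_conllu_sentence`, `dependency_arcs` and `SAMPLE` are hypothetical. The only thing assumed about the format is what the CoNLL-U specification states: one token per line with ten tab-separated fields, and comment lines starting with `#`.

```python
from dataclasses import dataclass

# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


@dataclass
class Token:
    """One CoNLL-U line; all fields are kept as raw strings (hypothetical helper)."""
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: str
    deprel: str
    deps: str
    misc: str


def parse_conllu_sentence(text):
    """Split one CoNLL-U sentence into its comment lines and its tokens."""
    comments, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate sentences; nothing to parse
        if line.startswith("#"):
            comments.append(line)
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        tokens.append(Token(*columns))
    return comments, tokens


def dependency_arcs(tokens):
    """Return (head form, relation, dependent form) triples for ordinary words."""
    forms = {token.id: token.form for token in tokens}
    forms["0"] = "ROOT"  # head 0 marks the root of the tree
    arcs = []
    for token in tokens:
        # Skip multiword ranges ("1-2"), empty nodes ("1.1") and unset heads ("_").
        if not token.id.isdigit() or not token.head.isdigit():
            continue
        arcs.append((forms.get(token.head, "?"), token.deprel, token.form))
    return arcs


# A two-word illustrative sentence, made up for this sketch.
SAMPLE = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    _, sample_tokens = parse_conllu_sentence(SAMPLE)
    for head, relation, dependent in dependency_arcs(sample_tokens):
        print(f"{head} --{relation}--> {dependent}")
```

Keeping every field as a raw string means a sentence can be written back out unchanged after an edit, which matters for a tool that is meant to support both graphical and text editing.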

Project plan

The main idea of this project is to

Here is a mockup of the interface.

The main page has ...

The main page


Work plan

Overview

post application period

  • Understanding the architecture of the existing project
  • Improving my knowledge of JavaScript

community bonding period

  • Closer examination and evaluation of the tools that can be used:
    • blah;
    • blah blah blah ;
  • Thinking more about the architecture of the app

work period

  • 1st month: developing the basic architecture of the interface
  • 2nd month: working on increasing the usability and efficiency of the tool
  • 3rd month: working on additional features, documentation

Schedule

week 1: thinking through the architecture, writing the skeleton of the project
weeks 2-3:
week 4:
Deliverable #1
weeks 5-6:
week 7:
week 8:
27 June: midterm evaluations deadline
Deliverable #2
weeks 9-10:
week 11: testing, fixing bugs
week 12: clean up the code, last fixes, writing documentation
Project completed: a user-friendly interactive annotation interface is ready

List your skills and give evidence of your qualifications

I am a fourth-year bachelor's student at the linguistics faculty of NRU HSE (Russia).
Programming skills: Python (Flask, Django, SQLite, Elasticsearch, nltk, sklearn), Bash, R, JavaScript (not very experienced, but ready to learn quickly).
Other computer skills: HTML, XML, CSS, JSON, deploying Python projects on an Ubuntu server.
Experience: I have already worked on a couple of projects that involved building a web interface. For example, I have written a web interface for a unified online dictionary of antonyms. The project is written in Flask (Python), HTML and CSS and uses an SQLite database. I also have another Flask project, which generates online linguistic exercises. Apart from these projects, I am currently working on a larger team project whose goal is to build a universal linguistic corpus platform. The frameworks used in this project include Elasticsearch (a NoSQL database) and Django. We are also going to use JavaScript in it, so by the end of spring I will be much more experienced with the language.
Natural languages: Russian (native), Polish, English, German, basic knowledge of Indonesian.

Coding challenge: I have fixed issue #18 on the project's GitHub.

List any non-Summer-of-Code plans you have for the Summer

At the beginning of June I will be finishing a project that aims to build a universal corpus platform, but, as most of the work on that project will be done earlier, I do not think that it will significantly affect my ability to work on the GSoC project. After that, I have no other non-GSoC plans.