Difference between revisions of "User:Mary.szmary/proposal2017"

From Apertium
Revision as of 14:47, 23 March 2017

Contact information

Name: Maria Sheyanova
E-mail: masha.shejanova@gmail.com
IRC: maryszmary
SourceForge: maryszmary
Phone number: +79169223114
Timezone: UTC+3

Why is it that you are interested in the Apertium project?

I participated in GSoC 2016 with Apertium, which got me involved in the project; this is one of the main reasons why I am interested in contributing to Apertium. Another reason is that, as a linguist, I find it rewarding to develop linguistic tools, and Apertium gives me a good opportunity to do so.

Which of the published tasks are you interested in? What do you plan to do?

I am planning to work on UD-annotatrix. This will include building a user-friendly interface that enables linguists to produce syntactic annotation quickly and easily.

Reasons why Google and Apertium should sponsor it

Syntactic annotation is blah blah (why it's important). Currently there is an interface for syntactic annotation called brat (http://brat.nlplab.org/), with both online and offline modes. However, the interface has a number of issues. Firstly, blah blah...

There is also a project within Apertium that aims to build a toolkit for working with dependency trees (https://github.com/jonorthwash/ud-annotatrix). At the moment, it can visualise the trees (the interface works with three annotation formats: CoNLL-U, CG3 and SD). The aim of my project is to create an easy-to-use, fast and interactive interface for UD annotation. The tool should work both online and offline and let the user edit the annotation in both graphical and text modes.
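To give a rough idea of what the text mode would operate on, here is a minimal sketch of reading one CoNLL-U sentence and listing its dependency arcs. It is only an illustration: the function names parse_conllu_sentence and dependency_arcs and the sample sentence are hypothetical and are not taken from the ud-annotatrix code base, which is written in JavaScript.

```python
# Hypothetical helpers for illustration only; not part of ud-annotatrix.
# The ten column names follow the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu_sentence(text):
    """Return a list of token dicts for a single CoNLL-U sentence."""
    tokens = []
    for line in text.splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(
                "expected 10 tab-separated columns, got %d" % len(columns))
        tokens.append(dict(zip(FIELDS, columns)))
    return tokens


def dependency_arcs(tokens):
    """List (head_form, deprel, dependent_form) triples; head 0 is the root."""
    forms = {tok["id"]: tok["form"] for tok in tokens}
    arcs = []
    for tok in tokens:
        # Multiword-token ranges ("1-2") and empty nodes ("1.1") are skipped.
        if not tok["id"].isdigit():
            continue
        head = "ROOT" if tok["head"] == "0" else forms.get(tok["head"], "?")
        arcs.append((head, tok["deprel"], tok["form"]))
    return arcs


# Made-up two-word sample sentence.
SAMPLE = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    for arc in dependency_arcs(parse_conllu_sentence(SAMPLE)):
        print(arc)
```

An editor with both graphical and text modes would need a structure of this kind, so that a change to an arc in the tree view can be written back to the HEAD and DEPREL columns of the text.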

A description of how and who it will benefit in society

The result of this work is going to be useful for linguists who deal with dependency annotation.


Field of work and available resources

Apertium has a web interface, written in JavaScript and HTML, for visualising syntactic trees. The main idea of this project is to


Work plan

Overview

post application period

  • Understanding the architecture of the existing project
  • Improving my knowledge of JavaScript

community bonding period

  • Closer examination and evaluation of the tools that can be used:
    • blah;
    • blah blah blah ;
  • Thinking more about the architecture of the app

work period

  • 1st month: qwerty
  • 2nd month:
  • 3rd month:

Schedule

week 1: write scripts to get missing words for the Polish dictionary (using mostly wikisłownik and PWN, but maybe also some downloadable dictionaries)
weeks 2-3: write scripts to get translations for the bilingual dictionary (using mostly wikisłownik and online websites)
week 4: check the completeness of the dictionaries (I think I can use Russian and Polish corpora for that)
Deliverable #1
weeks 5-6: write the lexical choice rules (consider generating them automatically using corpora I have access to)
week 7: estimate the validity of the rules
week 8: start writing the transfer rules
27 June: midterm evaluations deadline
Deliverable #2
weeks 9-10: write the transfer rules
week 11: evaluating, testing
week 12: clean up the code, last fixes, writing documentation
Project completed: a language pair of release quality or close to it

List your skills and give evidence of your qualifications

I am a 4th-year bachelor's student at the Linguistics Faculty of NRU HSE (Russia).
Programming skills: Python, Bash, R, JavaScript.
Other computer skills: HTML, XML, CSS.
Languages: Russian (native), Polish, English, German, basic knowledge of Indonesian.

As part of the coding challenge, I fixed issue #18 in the project's GitHub repository.

List any non-Summer-of-Code plans you have for the Summer

I am working on my bachelor's thesis until the end of May; after that I have no other plans.