User:Mary.szmary/proposal2017

Revision as of 16:54, 22 March 2017 by Mary.szmary

Contact information

Name: Maria Sheyanova
E-mail: masha.shejanova@gmail.com
IRC: mary-szmary
SourceForge: maryszmary
Phone number: +79169223114
Timezone: UTC+3

Why is it that you are interested in the Apertium project?

I participated in GSoC 2016 with Apertium, which is how I became involved in the project. ...

Which of the published tasks are you interested in? What do you plan to do?

UD-annotatrix

Reasons why Google and Apertium should sponsor it

Currently ...

A description of how and who it will benefit in society

The result of this work is going to be useful for linguists who deal with dependency annotation.


Field of work and available resources

Apertium has a web interface, written in JavaScript and HTML, for visualising syntactic trees. The main idea of this project is to


Work plan

Overview

post application period

  • Understanding the architecture of the existing project
  • Improving my knowledge of JavaScript

community bonding period

  • Closer examination and evaluation of the tools that can be used:
    • blah;
    • blah blah blah ;
  • Thinking more about the architecture of the app

work period

  • 1st month: qwerty
  • 2nd month:
  • 3rd month:

Schedule

week 1: write scripts to get missing words for the Polish dictionary (using mostly wikisłownik and PWN, but maybe also some downloadable dictionaries)
weeks 2-3: write scripts to get translations for the bilingual dictionary (using mostly wikisłownik and other websites)
week 4: check the completeness of the dictionaries (I think I can use Russian and Polish corpora for that)
Deliverable #1
weeks 5-6: write the lexical choice rules (consider generating them automatically from corpora I have access to)
week 7: estimate the validity of the rules
week 8: start writing the transfer rules
27 June: midterm evaluations deadline
Deliverable #2
weeks 9-10: write the transfer rules
week 11: evaluation and testing
week 12: clean up the code, make final fixes, write documentation
Project completed: a language pair of release quality or close to it
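
The completeness check planned for week 4 could be approximated by measuring naive coverage: run a corpus through the morphological analyser and count how many tokens come back unknown. The sketch below is only an illustration under stated assumptions. It assumes the analyser output is in the Apertium stream format, where each token looks like ^surface/analysis$ and an unknown token is marked with a leading asterisk (^surface/*surface$). The function name coverage, the sample sentence and its tags are hypothetical, and escaped characters in the stream are ignored for brevity.

```python
import re

# One lexical unit in the (assumed) Apertium stream format, e.g.
# ^ma/mieć<vblex><pri><p3><sg>$ for a known token, ^kotka/*kotka$ for an unknown one.
# Simplification: escaped ^, $ and / inside tokens are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/^$]*)/([^^$]*)\$")


def coverage(analysed_text):
    """Return (known, total, unknown_forms) for a string of analyser output."""
    known = 0
    total = 0
    unknown_forms = []
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        if analyses.startswith("*"):
            # The analyser found no entry for this surface form.
            unknown_forms.append(surface)
        else:
            known += 1
    return known, total, unknown_forms


# Hypothetical sample; the tags are illustrative, not taken from a real dictionary.
sample = (
    "^Ala/Ala<np><ant><f><sg><nom>$ "
    "^ma/mieć<vblex><pri><p3><sg>$ "
    "^kotka/*kotka$^./.<sent>$"
)

known, total, unknown_forms = coverage(sample)
print(f"coverage: {known}/{total} = {known / total:.0%}")
print("unknown forms:", unknown_forms)
```

Sorting the unknown forms by frequency over a large corpus would give a priority list of words to add to the dictionaries first.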

List your skills and give evidence of your qualifications

I'm a third-year bachelor's student at the Faculty of Linguistics of NRU HSE (Russia).
Languages: Russian (native), Polish, English, Toki Pona :), German, basic knowledge of Indonesian.
Programming skills: Python (both 2 and 3), R, basic knowledge of bash.
Other computer skills: HTML, XML, CSS.

As a part of the coding challenge, I’ve done the following:

  • added prepositions to the bidix using the Polish version of Wiktionary (30 entries)
  • added adverbs (about 1100 entries), adjectives (about 7500 entries), conjunctions (about 150 entries), numerals and nouns (about 12 000 entries) to the bidix by automated requests to an online dictionary
  • wrote a couple of lexical choice rules

All scripts and materials for the coding challenge are here.

List any non-Summer-of-Code plans you have for the Summer

I have exams until the third or fourth week of June, so I won't be able to work full-time during that period, but I can spend 20-25 hours per week on the task. After the exams end I'm going to visit my parents for 4-5 days, and during that time I will also only be able to spend 25-30 hours per week on the task. After that I'm ready to work full-time and spend up to 45-50 hours per week on the task.