User:Kiara/GSoC'16 Proposal

Revision as of 22:08, 14 March 2016 by Kiara

Name: Kira Droganova

E-mail address: kira.droganova@gmail.com

Other information that may be useful to contact you: #apertium IRC channel: Kira (Kiara)

Why is it you are interested in machine translation?

I'm getting my Master's degree in Computational Linguistics at the Higher School of Economics (Moscow). I think machine translation is one of the most complex areas of computational linguistics and, at the same time, one of the most practical. I like this combination. People really need MT tools in many areas of life, which means the tools have to be of high quality.

Why is it that you are interested in the Apertium project?

I like the idea of Apertium. It is great that anyone has a chance to take part in the project. At first, it seems impossible to start working in machine translation without any experience in the area. However, Apertium is well documented and the team always helps newcomers. Both things are very important to graduates and to people who have just started working in machine translation. One of the greatest features is how easy it is to add a new language pair. In my opinion, this is an extremely important feature of the project, and I also like the idea of general rules for closely related languages.

Which of the published tasks are you interested in? What do you plan to do?

I'm interested in the Apertium website improvements tasks. I think I can do all the tasks listed under "Apertium website improvements" on the Ideas for GSoC page. However, this partly depends on the readiness of the back-end functionality. I think I can do both the front-end and the back-end work. Please see the schedule details in my proposal.


Apertium website improvements

The new features benefit both Apertium users and the Apertium team.

Apertium website users will get an improved tool with a new dictionary lookup mode, which is the second most important feature after translation itself.

The feedback feature is important to the Apertium team. The team will be able to learn more about Apertium from its users, and the tool will get more testing from people who don't have a technical background.

Both the feedback page and the reliability visualisation make the site more user-friendly, which will help it grow into one of the coolest online translation tools.

I am cool and highly motivated. I can develop many useful features for Apertium. If you help me get started in MT, I will not miss my chance.


I propose this schedule:

Preparation (22nd of April - 22nd of May):

i. To ask mentors about 'must-know' information

ii. To learn how to use the Tornado framework

iii. To inspect the HTML, CSS, Bootstrap and JS

iv. To inspect the Python scripts

v. To try the language identification feature


Coding (25th of May - 23rd of August):

Week 1: Feedback feature (Discussion and development)

Week 2: "Dictionary lookup" mode (Discussion and back-end development, ranking algorithm development)

Week 3: "Dictionary lookup" mode (Discussion and front-end development, bug fixing and testing)

Week 4: Language detection feature (Discussion and development)

Deliverable #1: Feedback feature and Dictionary lookup feature

Week 5: Language detection: "did you mean" function

Week 6: Reliability visualisation: the color of a translation depends on how reliable it is (Discussion, algorithm and development)

Week 7: Reliability visualisation (bug fixing, testing and documenting)

Week 8: RBMT summer school

Deliverable #2: Language detection feature and Reliability visualisation feature

Week 9: RBMT summer school

Week 10: Webpage translation (some buttons/labels are written only in English: Translate a document, Instant translation)

Week 11: Bug fixing and documentation

Week 12: Bug fixing and documentation

Project completed
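
To illustrate the reliability visualisation idea from Week 6, here is a minimal sketch in Python 3. It assumes that each translated word already comes with a reliability score between 0 and 1; how that score is computed is not decided yet and is part of the planned discussion with mentors. The function names `reliability_to_color` and `colorize` are hypothetical and are not part of the existing apertium-html-tools code.

```python
from html import escape


def reliability_to_color(score):
    """Map a reliability score in [0, 1] to a hex color from red (0) to green (1).

    The score itself is an assumption: this sketch does not compute it.
    """
    # Clamp so that out-of-range scores still produce a valid color.
    score = max(0.0, min(1.0, score))
    red = round(255 * (1.0 - score))
    green = round(255 * score)
    return "#{:02x}{:02x}00".format(red, green)


def colorize(scored_words):
    """Wrap (word, score) pairs in HTML spans colored by reliability."""
    spans = []
    for word, score in scored_words:
        color = reliability_to_color(score)
        # Escape the word so that translated text cannot inject markup.
        spans.append('<span style="color: {}">{}</span>'.format(color, escape(word)))
    return " ".join(spans)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(colorize([("hello", 1.0), ("world", 0.0)]))
```

A linear red-to-green scale is only one possible choice; a small set of discrete colors might be easier to read and could be decided during the design discussion.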


List of technologies: Python 3, HTML, CSS, jQuery, Bootstrap

List of projects:

1. A service which suggests Zaliznyak's grammatical indexes for "new Russian words".

http://web-corpora.net/wsgi3/GDictionary/

I developed the back end, the front end and some of the Flask functions.

2. I trained a dependency parsing model for Russian with MaltParser and MyStem tagset.

My paper was published in Proceedings of the AINL-ISMW FRUCT:

Kira Droganova. Building a Dependency Parsing Model for Russian with MaltParser and MyStem Tagset. In Proceedings of the AINL-ISMW FRUCT, Saint-Petersburg, Russia, 9-14 November 2015, ITMO University, FRUCT, Finland. ISBN 978-5-7577-0493-7

3. A syntactic parser for Russian

http://web-corpora.net/wsgi3/ru-syntax/

I trained a new syntactic model and improved its quality, prepared and tested segmentation rules, and worked with quality metrics.

4. I am a member of the Russian UD team. I am currently working on conversion rules for morphological tag sets.

5. I also did the Apertium coding challenges. I sent a pull request and a diff to the Apertium website improvements mentors.

This is the link to my answer: https://github.com/Kira-D/apertium-html-tools/tree/GSoCChallenges