Difference between revisions of "User:Uliana/gsoc-propuesta"

From Apertium

Revision as of 21:48, 19 March 2016

Contacts

Uliana Sentsova

E-mail: uliana.sentsova@gmail.com

Number: +7 (916) 774-95-30

Skype: ulyanasidorova

IRC channel: uliana at #apertium

Education and achievements

Lomonosov Moscow State University

Qualification: Bachelor's degree in Linguistics (Romance and Germanic languages)

GPA: 10.0 / 10.0


National Research University „Higher School of Economics“

Qualification: Major in Natural Language Processing

Current GPA: 8.5 / 10.0


2015: Awardee of the graduates’ competition „Natural Language Processing” (a competition for students held by National Research University Higher School of Economics)

2014: Scholarship of the Academic Council of MSU for scientific activities (a special award for the top 10% of students, based on academic excellence and scientific activity)

2013: Enhanced State Academic Scholarship for scientific activities (awarded on the basis of academic excellence and scientific achievements)

Relevant Experience

Building Open Source Information Extraction System for Russian Language Project

Organisation: National Research University „Higher School of Economics”

Project roles: project manager, software developer (Python)

Description: Building a hybrid information extraction system that combines a rule-based approach with machine learning. The system extracts named entities (persons, locations and organizations) and will become part of the NLP technology stack developed by National Research University „Higher School of Economics”. At present the system achieves 93% precision (evaluated by the Dialogue Evaluation Conference on 37 000 annotated texts).


My interest in Machine Translation

Machine translation is far from being a solved problem. Despite the emergence of many statistical approaches to machine translation, they still fail to cover many aspects of language structure. First of all, they do not serve languages with small speaker communities, because too little data has been collected. Besides that, they do not fully take into account the structural differences between the two languages.

My interest in Apertium projects

I am interested in working on an unreleased language pair for Sicilian-Spanish translation.

My coding challenge

The general goals of my coding challenge were:

- to introduce myself to the community;

- to understand how the Apertium developer team works;

- to familiarize myself with the architecture of the platform;

- to lay the foundations of the project I could accomplish over the summer.

Regarding the prospective project, I have accomplished the following tasks:

- I created a monolingual package for the Sicilian language and a bilingual package for the Sicilian-Spanish language pair;

- I expanded the dictionary with basic paradigms and the most frequent words of the Sicilian language (frequent verbs, nouns and adjectives, pronouns, prepositions and some adverbs);

- I added the respective translations to the Sicilian-Spanish dictionary;

- I translated the story from Italian into Sicilian;

- I prepared some important resources on the structure of the Sicilian language. These resources include a list of Sicilian words parsed from the Sicilian Wiktionary, grammar books on Standard Sicilian, and research articles on the differences between Sicilian and other Romance languages. I have also got in touch with contributors to the Sicilian-language Wikipedia and, thanks to spectre, reached a computational linguist who studies in Munich and has explored HFST for Sicilian verbs.

During the coding challenge I committed to svn all the changes I made in the respective packages.

Proposal and work plan

Pre-work period

Review and improve the technical skills required for the project. Study the architecture of Apertium in detail. Extend my knowledge of Standard Sicilian.


First month

Main goal