Difference between revisions of "User:Uliana/gsoc-propuesta"

Revision as of 18:01, 17 March 2016

Contacts

Uliana Sentsova

E-mail: uliana.sentsova@gmail.com

Phone: +7 (916) 774-95-30

Skype: ulyanasidorova

IRC: uliana on #apertium

Education

Lomonosov Moscow State University

Qualification: Bachelor's degree in Linguistics (Romance and Germanic languages)

GPA: 10.0 / 10.0

National Research University „Higher School of Economics“

Qualification: Major in Natural Language Processing

Current GPA: 8.5 / 10.0

2015: Awardee of the graduates’ competition „Natural Language Processing” (a competition for students held by National Research University Higher School of Economics)

2014: Scholarship of the Academic Council of MSU for scientific activities (a special award for the top 10% of students, given for academic excellence and scientific activity)

2013: Enhanced State Academic Scholarship for scientific activities (awarded on the basis of academic excellence and scientific achievements)

Projects

„Building Open Source Information Extraction System for Russian Language”

Organisation: National Research University „Higher School of Economics”

Project roles: project manager, software developer (Python)

Description: Creating a hybrid information extraction system using a rule-based approach and machine learning. The system extracts named entities (persons, locations and organizations) and will become part of the NLP technology stack developed by National Research University „Higher School of Economics”.

As project coordinator, I am responsible for setting and allocating goals, setting deadlines, cooperating with other research groups (for example, the coreference resolution project), and maintaining the project’s documentation.

As software developer, I am responsible for developing the rule-based module, which is currently running in test mode. I built an ontology of named entities, their synonyms and abbreviations that takes the rich morphology of Russian into account. I developed a module that indexes and tokenizes input text, analyzes the features of each token, and extracts information about named entities and their attributives using high-precision rules. The module has 93% precision (evaluated by the Dialogue Evaluation Conference on 37 000 annotated texts).

My interest in Machine Translation

My interest in Apertium projects

I am interested in working on an unreleased language pair, Sicilian-Spanish. As my coding challenge I created a new language package, scn-spa, added basic vocabulary to the Sicilian dictionary and added translations to the Sicilian-Spanish dictionary. I have also started researching the structure of Sicilian: I have got in touch with contributors to the Sicilian Wikipedia and, thanks to spectei, I have also reached a computational linguist who studies in Munich and is a native speaker of Sicilian.

@@ Line 42: / Line 42: @@
 '''Description:''' Creating a hybrid information extraction system using rule-based approach and machine learning technologies. This system is able to extract named entities (persons, locations and organizations) and will become a part of stack technology for NLP developed by National Research University  „Higher School of Economics”.
+As a '''project coordinator''' I’m responsible for goals setting, their allocation and setting deadlines, cooperation with other research groups (for example, coreference resolution project), as well as project’s documentation maintenance.
+As a '''software developer''' I’m responsible for developing of rule-based module, that currently is test mode operating. I built an ontology of named entities, their synonyms and abbreviations with regard to rich morphology of Russian language. I developed a module that allows to index and tokenize input text, analyze features of each token and extract information about named entities and their attributives on a basis of high precision rules. The module has 93% precision (evaluated by Dialogue Evaluation Conference on 37 000 annotated texts).
 == My interest in Machine Translation ==
@@ Line 49: / Line 54: @@
 == My interest in Apertium projects ==
-I am interested in working an unreleased language pair for Sicilian - Spanish languages.
+I am interested in working on an unreleased language pair for Sicilian - Spanish languages.
 As my coding challenge I created a new language package scn-spa, added basic vocabulary to the dictionary of Sicilian and translations into Sicilian-Spanisch dictionary.
 I also started to conduct research in the structure of Sicilian language: I have got into touch with contributors of Wikipedia in Sicilian language and thanks to ''spectei'' I also have reached computational linguist who studies in Munich and is native speaker of Sicilian.
