User:Uliana/gsoc-propuesta

From Apertium

Revision as of 18:16, 17 March 2016

Contacts

Uliana Sentsova

E-mail: uliana.sentsova@gmail.com

Phone: +7 (916) 774-95-30

Skype: ulyanasidorova

IRC channel: uliana at #apertium

Education and achievements

Lomonosov Moscow State University

Qualification: Bachelor of Linguistics (Romance and Germanic languages)

GPA: 10.0 / 10.0


National Research University Higher School of Economics

Qualification: Major in Natural Language Processing

Current GPA: 8.5 / 10.0


2015: Awardee of the graduates’ competition “Natural Language Processing” (a competition for students held by the National Research University Higher School of Economics)

2014: Scholarship of the Academic Council of MSU for scientific activities (a special award for the top 10% of students with academic excellence and scientific activity)

2013: Enhanced State Academic Scholarship for scientific activities (awarded on the basis of academic excellence and scientific achievements)

Projects

“Building an Open Source Information Extraction System for the Russian Language”

Organisation: National Research University Higher School of Economics

Project roles: project manager, software developer (Python)

Description: Creating a hybrid information extraction system that combines a rule-based approach with machine learning. The system extracts named entities (persons, locations and organizations) and will become part of the NLP technology stack developed by the National Research University Higher School of Economics. The system currently achieves 93% precision (as evaluated at the Dialogue Evaluation conference on 37,000 annotated texts).


My interest in Machine Translation

My interest in Apertium projects

I am interested in working on an unreleased language pair for Sicilian-Spanish translation.

As my coding challenge, I created a new language-pair package, scn-spa, added basic vocabulary to the Sicilian dictionary, and added translations to the Sicilian-Spanish dictionary. I am also currently working on

I have also started researching the structure of the Sicilian language: I have got in touch with contributors to the Sicilian Wikipedia, and thanks to spectei I have also reached a computational linguist who studies in Munich and is a native speaker of Sicilian.
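The coding challenge above touches two layers of a rule-based language pair: a monolingual Sicilian dictionary, where stems are attached to inflection paradigms, and a Sicilian-Spanish bilingual dictionary, which maps lemmas from one language to the other. The Python sketch below is only a toy model of that division of labour, assuming nothing about the real tooling: it is not Apertium's XML dictionary (.dix) format, and the paradigm names, tags and the two entries (gattu "cat", casa "house") are hypothetical illustrations. Generating the Spanish surface form would be a separate step that is not shown.

```python
from typing import Dict, List, Tuple

Tags = Tuple[str, ...]
Analyser = Dict[str, List[Tuple[str, Tags]]]

# Hypothetical paradigms: (surface ending, lemma ending, tags).
# A toy model of paradigm-based dictionaries, NOT the real .dix format.
PARADIGMS: Dict[str, List[Tuple[str, str, Tags]]] = {
    "masc_u": [("u", "u", ("n", "m", "sg")), ("i", "u", ("n", "m", "pl"))],
    "fem_a": [("a", "a", ("n", "f", "sg")), ("i", "a", ("n", "f", "pl"))],
}

# Hypothetical Sicilian monolingual entries: (stem, paradigm name).
SCN_ENTRIES: List[Tuple[str, str]] = [("gatt", "masc_u"), ("cas", "fem_a")]

# Hypothetical Sicilian -> Spanish lemma pairs (the bilingual layer).
SCN_SPA: Dict[str, str] = {"gattu": "gato", "casa": "casa"}


def build_analyser(entries: List[Tuple[str, str]]) -> Analyser:
    """Expand each stem through its paradigm into a surface-form lookup table."""
    table: Analyser = {}
    for stem, paradigm in entries:
        for surface_end, lemma_end, tags in PARADIGMS[paradigm]:
            table.setdefault(stem + surface_end, []).append((stem + lemma_end, tags))
    return table


def translate_lemmas(word: str, analyser: Analyser) -> List[Tuple[str, Tags]]:
    """Analyse a Sicilian word, then swap each lemma for its Spanish equivalent.

    Tags are carried over unchanged; analyses whose lemma is missing from the
    bilingual dictionary are dropped.
    """
    return [
        (SCN_SPA[lemma], tags)
        for lemma, tags in analyser.get(word, [])
        if lemma in SCN_SPA
    ]


if __name__ == "__main__":
    analyser = build_analyser(SCN_ENTRIES)
    print(translate_lemmas("gatti", analyser))  # [('gato', ('n', 'm', 'pl'))]
```

Keeping the paradigms separate from the entries means that adding a new noun only requires a stem and a paradigm name, while the bilingual layer only ever sees lemmas and tags, never inflected forms.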

Proposal and work plan

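Period | Week | Description | Comments
Pre-work Period | 09:00–09:30 | 0: Overview | Getting to know each other
Pre-work Period | 09:30–10:30 | 0: Overview | General introduction: Machine translation
Pre-work Period | 10:30–11:00 | 0: Overview | Coffee break
Pre-work Period | 11:00–12:00 | 0: Overview | Introduction: The Apertium machine-translation platform
Pre-work Period | 12:00–13:00 | 0: Overview | Practical: Installing Apertium and creating a language pair
Pre-work Period | 13:00–14:00 | Lunch
Pre-work Period | 14:00–14:30 | 1: Basic dictionaries | Theory: Morphology and morphotactics
Pre-work Period | 14:30–15:00 | 1: Basic dictionaries | Coffee break
Pre-work Period | 15:00–17:00 | 1: Basic dictionaries | Practical: Paradigms and continuation lexica
First Month | 09:00–10:00 | 2: Advanced dictionaries | Practical: Creating dictionaries
First Month | 10:00–11:30 | 2: Advanced dictionaries | Theory: Morphophonology
First Month | 11:30–12:00 | 2: Advanced dictionaries | Coffee break
First Month | 12:00–13:00 | 2: Advanced dictionaries | Practical: Working on morphology
First Month | 13:00–14:00 | Lunch
First Month | 14:00–14:30 | 3: Morphological disambiguation | Theory: Morphological and syntactic disambiguation
First Month | 14:30–15:00 | 3: Morphological disambiguation | Coffee break
First Month | 15:00–17:00 | 3: Morphological disambiguation | Practical: Writing rules for morphological disambiguation
Second Month | 09:00–09:30 | 4: Lexical transfer | Practical: Dictionary work
Second Month | 09:30–10:00 | 4: Lexical transfer | Theory: Lexical transfer
Second Month | 10:00–11:00 | 4: Lexical transfer | Practical: Work on bilingual dictionaries
Second Month | 11:00–11:30 | 4: Lexical transfer | Coffee break
Second Month | 11:30–12:00 | 4: Lexical transfer | Theory: Lexical selection
Second Month | 12:00–13:00 | 4: Lexical transfer | Practical: Working on lexical selection
Second Month | 13:00–14:00 | Lunch
Second Month | 14:00–14:30 | 5: Structural transfer | Theory: Basic structural transfer
Second Month | 14:30–15:00 | 5: Structural transfer | Coffee break
Second Month | 15:00–17:00 | 5: Structural transfer | Practical: Writing rules for structural transfer
Third Month | 09:00–09:30 | 6: Multi-level structural transfer | Theory: Multi-level structural transfer
Third Month | 09:30–11:00 | 6: Multi-level structural transfer | Practical: Writing transfer rules
Third Month | 11:00–11:30 | 6: Multi-level structural transfer | Coffee break
Third Month | 11:30–13:00 | 6: Multi-level structural transfer | Practical: Writing transfer rules
Third Month | 13:00–14:00 | Lunch
Third Month | 14:00–15:00 | 6: Multi-level structural transfer | Discussion: Uralic comparative grammar
Third Month | 15:00–15:30 | 6: Multi-level structural transfer | Coffee break
Third Month | 15:30–17:00 | 6: Multi-level structural transfer | Practical: Writing transfer rules
17th May | 09:00–09:30 | 7: Data consistency, quality and evaluation | Theory: Data consistency, quality and evaluation
17th May | 09:30–11:00 | 7: Data consistency, quality and evaluation | Practical: Finding and fixing errors
17th May | 11:00–11:30 | 7: Data consistency, quality and evaluation | Coffee break
17th May | 11:30–13:00 | 7: Data consistency, quality and evaluation | Practical: Finding and fixing errors
17th May | 13:00–14:00 | Lunch
17th May | 14:00–14:30 | 8: Project planning, questions and answers | Theory: Project planning, questions and answers
17th May | 14:30–15:00 | 8: Project planning, questions and answers | Practical: Finding and fixing errors
17th May | 15:00–17:00 | 8: Project planning, questions and answers | Conclusion: Round table on making machine translation systems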