User:Deltamachine/proposal

1 Contact information
2 Skills and experience
3 Why is it you are interested in machine translation?
4 Why is it that you are interested in Apertium?
5 Which of the published tasks are you interested in? What do you plan to do?
6 Reasons why Google and Apertium should sponsor it
7 A description of how and who it will benefit in society
8 Work plan
9 Non-Summer-of-Code plans you have for the Summer
10 Coding challenge

Contact information

Name: Anna Kondratjeva

Location: Moscow, Russia

E-mail: an-an-kondratjeva@yandex.ru

Phone number: +79250374221

GitHub: http://github.com/deltamachine

IRC: deltamachine

Timezone: UTC+3

Skills and experience

Education: Bachelor's Degree in Fundamental and Computational Linguistics (2015 - expected 2019), National Research University «Higher School of Economics» (NRU HSE)

Main university courses:

Theory of Language (Phonetics, Morphology, Syntax, Semantics)
Programming (Python)
Computer Tools for Linguistic Research
Language Diversity and Typology
Introduction to Data Analysis
Math (Discrete Math, Linear Algebra and Calculus, Probability Theory and Mathematical Statistics)

Technical skills: Python (experienced, 1.5 years), HTML, CSS, Flask, Django, SQLite (familiar)

Projects and experience: http://github.com/deltamachine

Languages: Russian (native), English, German

Why is it you are interested in machine translation?

Why is it that you are interested in Apertium?

Which of the published tasks are you interested in? What do you plan to do?

I would like to implement a prototype shallow syntactic function labeller.

Reasons why Google and Apertium should sponsor it

A description of how and who it will benefit in society

Work plan

Post application period

Getting better acquainted with Apertium: reading the documentation and experimenting with its tools
Setting up Linux and getting used to it
Learning more about UD treebanks
Learning more about machine learning

Community bonding period

Choosing the language pairs that the shallow function labeller will work with
Choosing the most appropriate Python ML library (possibly TensorFlow, possibly another one)

Work period

Week 1:
Week 2:
Week 3:
Week 4:
Deliverable #1, June 26 - 30:
Week 5:
Week 6:
Week 7:
Week 8:
Deliverable #2, July 24 - 28:
Week 9:
Week 10:
Week 11:
Week 12:
Project completed

Non-Summer-of-Code plans you have for the Summer

I have university exams until the third week of June, so until then I will be able to work only 20-25 hours per week. I will try to pass as many exams as possible ahead of schedule, so this may change. After the exams I will be able to work full time and spend 45-50 hours per week on the task.

Coding challenge

https://github.com/deltamachine/wannabe_hackerman

flatten_conllu.py: A script that takes a dependency treebank in UD format and "flattens" it, that is, applies the following transformations:

Words with the @conj relation take the label of their head
Words with the @parataxis relation take the label of their head
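The two transformations above could be sketched roughly as follows. This is an illustrative sketch, not the code from the repository: the function name flatten_sentence is hypothetical, the input is assumed to be one sentence as a list of 10-column CoNLL-U rows, and chains such as a conj attached to another conj are assumed to resolve to the first head whose label is neither conj nor parataxis.

```python
# Hypothetical sketch: relabel conj/parataxis tokens with the label of their head.
FLATTEN = {"conj", "parataxis"}  # relations to flatten, as listed above


def flatten_sentence(rows):
    """Return new CoNLL-U rows where conj/parataxis tokens carry their head's label.

    rows: list of 10-column token rows (lists of str) for a single sentence.
    Column 0 is ID, column 6 is HEAD, column 7 is DEPREL.
    """
    by_id = {row[0]: row for row in rows}
    original = {row[0]: row[7] for row in rows}

    def resolve(tok_id, seen=()):
        label = original[tok_id]
        head = by_id[tok_id][6]
        # Follow the head while the label must be flattened; `seen` guards
        # against cycles in malformed input (assumption: keep the label then).
        if label in FLATTEN and head in by_id and tok_id not in seen:
            return resolve(head, seen + (tok_id,))
        return label

    # Only the DEPREL column changes; heads are left untouched.
    return [row[:7] + [resolve(row[0])] + row[8:] for row in rows]
```

For example, in "John sings and dances", the token "dances" (conj, head "sings") would receive the label root, because its head "sings" is the root of the sentence.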

calculate_accuracy_index.py: A script that does the following:

Takes a -train.conllu file and builds a table of surface form, label and frequency
Takes -dev.corpus and, for each token, assigns the most frequent label from the table
Calculates the accuracy index
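A minimal sketch of this most-frequent-label baseline is given below. The function names (read_tokens, build_table, accuracy) are hypothetical and not taken from the repository. It assumes that the label is the DEPREL column of the CoNLL-U file, that accuracy means the share of tokens whose predicted label equals the gold label, and that surface forms unseen in training receive a fallback label (None by default, which always counts as an error).

```python
from collections import Counter, defaultdict


def read_tokens(conllu_text):
    """Yield (surface_form, label) pairs from CoNLL-U text."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if cols[0].isdigit():  # skip multiword ranges (1-2) and empty nodes (1.1)
            yield cols[1], cols[7]


def build_table(pairs):
    """Build the table surface_form -> label -> frequency."""
    table = defaultdict(Counter)
    for form, label in pairs:
        table[form][label] += 1
    return table


def accuracy(table, pairs, fallback=None):
    """Share of tokens whose most frequent training label equals the gold label."""
    pairs = list(pairs)
    correct = 0
    for form, gold in pairs:
        # Assumption: unseen forms get `fallback` instead of a table lookup.
        predicted = table[form].most_common(1)[0][0] if form in table else fallback
        correct += predicted == gold
    return correct / len(pairs) if pairs else 0.0
```

A typical call would build the table from read_tokens over the training text and then pass read_tokens over the development text to accuracy.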

label_asf: A script that takes a sentence in Apertium stream format and for each surface form applies the most frequent label from the labelled corpus.
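The labelling step might look roughly like the sketch below. It is not the repository code, and several details are assumptions: the input is assumed to be disambiguated (one analysis per lexical unit, in the form ^surface/analysis$), the label is assumed to be appended to the analysis as a tag of the form <@label>, units whose surface form is not in the table are left unchanged, and escaped characters in the stream are not handled. The name label_stream and the most_frequent mapping (surface form to label) are hypothetical.

```python
import re

# One lexical unit: ^surface/analysis$ (simplified, no escape handling).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def label_stream(stream, most_frequent):
    """Append the most frequent label of each surface form as an <@label> tag.

    most_frequent: dict mapping surface form -> label (hypothetical input,
    e.g. derived from a labelled corpus).
    """
    def add_label(match):
        surface, analysis = match.group(1), match.group(2)
        label = most_frequent.get(surface)
        if label is None:
            return match.group(0)  # assumption: unknown forms stay unlabelled
        return "^{}/{}<@{}>$".format(surface, analysis, label)

    return LEXICAL_UNIT.sub(add_label, stream)
```

With the mapping {"the": "det", "dog": "nsubj"}, the unit ^dog/dog<n><sg>$ would become ^dog/dog<n><sg><@nsubj>$, while text between lexical units is passed through unchanged.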
