User:GD/proposal

From Apertium
Jump to navigation Jump to search

Contact information

Name: Evgenii Glazunov

Location: Moscow, Russia

University: NRU HSE, Moscow (National Research University Higher School of Economics), 3rd-year student

E-mail: glaz.dikobraz@gmail.com

IRC: G_D

Timezone: UTC+3

Github: https://github.com/dkbrz

Am I good enough?

Education: Bachelor's Degree in Fundamental and Computational Linguistics (2015-2019) at NRU HSE

Courses:

  • Programming (Python, R, Flask, HTML, xml, Machine Learning)
  • Morphology, Syntax, Semantics, Typology/Language Diversity
  • Mathematics (Discrete Mathemathics, Linear Algebra and Calculus, Probability Theory, Mathematical Statistics, Computability and Complexity, Logic, Graphs and Topology, Theory of Algorithms)
  • Latin, Latin in modern Linguistics, Ancient Literature

Languages: Russian (native), English (academic), French(A2-B1), Latin (a bit), German (A1)

Personal qualities: responsibility, punctuality, being hard-working, passion for programming, perseverance, resistance to stress

Why is it I am interested in machine translation? Why is it that I am interested in Apertium?

The speed of information circulation does not allow to spend time on human translation. I am truly interested in formal methods and models because they represent the way any language is constructed (as I see it). Despite some exceptions, in general language is very logical and the main problem is how to find proper systematic description. Apertium is a powerful platform that allows to build impressive rule-based engines. I think rule-based translation very promising if we provide enough data and an effective analysis

Which of the published tasks am I interested in? What do I plan to do?

I want to work on Graph dictionaries

Proposal

Why Google and Apertium should sponsor it? How and who it will benefit in society?

I think there is a lot of math in language and graph representation of dictionaries is an exciting idea, because it adds some kind of cross-validation and internal system source of information. This information help to fill some lacunae that appear while creating a dictionary. This will improve a quality of translation as we manage to expand bidix.

Graph representation is very promising because it represents a philosophical model of a metalanguage knowledge. Knowing several languages, I know that it could be hard to recall some rare word and it is easier to translate from French to English and only then to Russian - because I forgot the word-pair between Russian and French. This graph representation works just like my memory: we cannot recall what is this word from L1 in L2. Hmm, we know L1-L3 and L3-L2. Oh, that's the link we need. Now we know L1-L3 word-pair. So, as we work on natural language processing, let's use natural instruments and systems as well.

Coding Challenge

ipynb with current state of my coding challenge

Week by week work plan

Week 0: Preparation

First phase

Week 1:

Week 2:

Week 3:

Week 4:

Second phase

Week 5:

Week 6:

Week 7:

Week 8:

Third phase

Week 9:

Week 10:

Week 11:

Week 12:

Final evaluation

Non-Summer-of-Code plans you have for the Summer

GSoC is the only project I have this summer. I have some exams in the end of June.