Difference between revisions of "User:GD/proposal"

From Apertium
 
== Week by week work plan ==
 
'''Week 0: until 05/29:''' Preparation
Get familiar with the Apertium system in detail (wiki sources, installation, creating files, etc.)
 
Collect a corpus of texts for future testing and for a frequency list, using both Wikipedia and classical Latin texts by Caesar, Cicero, Vergilius and others.
 
Plan every step and write everything down as formally as possible (in natural language)
 
Discuss details with a mentor
 
Handle the U/V and I/J problem (editions spell the same Latin word with either u or v, i or j)
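
The corpus preparation and the U/V, I/J problem meet in the frequency list. If spelling variants are not merged, "divisa" and "diuisa" are counted as two different words. The sketch below shows one way to build the list. It assumes that folding v into u and j into i is the chosen normalisation (the opposite direction would work just as well) and that plain ASCII text without macrons is the input. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: merge spelling variants by folding v -> u and j -> i,
# so "divisa"/"diuisa" and "jam"/"iam" are counted as one form.
UV_IJ = str.maketrans("vj", "ui")

# ASCII letters only; text with macrons would need a wider pattern.
TOKEN = re.compile(r"[A-Za-z]+")


def normalise(token):
    """Lowercase a Latin token and merge the u/v and i/j variants."""
    return token.lower().translate(UV_IJ)


def frequency_list(texts):
    """Count normalised word forms over an iterable of raw texts."""
    counts = Counter()
    for text in texts:
        counts.update(normalise(tok) for tok in TOKEN.findall(text))
    return counts


if __name__ == "__main__":
    corpus = [
        "Gallia est omnis divisa in partes tres",  # Caesar, v-spelling
        "Gallia est omnis diuisa in partes tres",  # same line, u-spelling
        "Jam vita jucunda est",
    ]
    for form, count in frequency_list(corpus).most_common(5):
        print(form, count)
```

Folding loses the u/v distinction in the output, so the original spelling would have to be kept alongside the normalised form if it is needed later.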
 
   
 
=== First phase ===
 
=== First phase ===
   
'''Week 1: 05/30 – 06/05:'''
Dictionary: nouns & adjectives
 
(they share the same declension patterns)
 
Add nouns to the dictionaries (monodix and bidix); the Latin dictionary requires much more work
 
Describe the morphology (add missing categories and paradigms)
 
Add adjectives (they are closely related to nouns)
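
The point that nouns and adjectives share declension patterns can be made concrete: one ending table can generate paradigm entries for both. The sketch below assumes the usual monodix layout of pardef, e, p, l, r and s elements, with the surface ending on the left and the lemma ending plus tags on the right. The paradigm names (ros/a__n, bon/us__adj) and tag names are illustrative only. In practice paradigms would normally be written in the .dix file directly.

```python
import xml.etree.ElementTree as ET

# Latin first-declension endings without macrons, keyed by (number, case).
# Without macrons the ablative singular (-ā) looks like the nominative (-a).
FIRST_DECLENSION = {
    ("sg", "nom"): "a", ("sg", "gen"): "ae", ("sg", "dat"): "ae",
    ("sg", "acc"): "am", ("sg", "abl"): "a", ("sg", "voc"): "a",
    ("pl", "nom"): "ae", ("pl", "gen"): "arum", ("pl", "dat"): "is",
    ("pl", "acc"): "as", ("pl", "abl"): "is", ("pl", "voc"): "ae",
}


def make_pardef(name, lemma_ending, endings, leading_tags):
    """Build a monodix-style pardef element: the surface ending goes in l,
    the lemma ending followed by symbol tags goes in r."""
    pardef = ET.Element("pardef", n=name)
    for (number, case), ending in endings.items():
        pair = ET.SubElement(ET.SubElement(pardef, "e"), "p")
        ET.SubElement(pair, "l").text = ending
        right = ET.SubElement(pair, "r")
        right.text = lemma_ending
        for tag in (*leading_tags, number, case):
            ET.SubElement(right, "s", n=tag)
    return pardef


# The same ending table serves a noun (rosa) and the feminine forms of a
# first/second-declension adjective (bonus, bona, bonum; lemma "bonus").
noun = make_pardef("ros/a__n", "a", FIRST_DECLENSION, ("n", "f"))
adj_fem = make_pardef("bon/us__adj", "us", FIRST_DECLENSION, ("adj", "f"))

if __name__ == "__main__":
    print(ET.tostring(noun, encoding="unicode"))
```

The masculine and neuter adjective forms come from the second declension and would need a second ending table merged into the same adjective paradigm.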
 
   
'''Week 2: 06/07 – 06/12:'''
Dictionary: verbs
 
Add verbs to the dictionaries (monodix and bidix)
 
Plan how to convert the basic tenses
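
For the tense conversion, Latin has six indicative tenses, while Russian has three tenses plus grammatical aspect. One plausible starting point is therefore a default table from a Latin tense to a Russian (tense, aspect) pair. The sketch below uses hypothetical tag names. Each default would still need to be overridden by lexical or contextual rules, for example a Latin perfect that reads better as a Russian imperfective.

```python
# Hypothetical tag names. Latin tenses carry no grammatical aspect, so each
# Russian aspect below is only a default that later rules may override.
# Russian side: (tense, aspect), with aspect "ipfv" or "pfv".
LATIN_TO_RUSSIAN_TENSE = {
    "pres": ("pres", "ipfv"),    # present -> present, imperfective
    "imperf": ("past", "ipfv"),  # imperfect -> past, imperfective
    "fut": ("fut", "ipfv"),      # future -> future, imperfective
    "perf": ("past", "pfv"),     # perfect -> past, perfective
    "plup": ("past", "pfv"),     # pluperfect -> past (Russian has no pluperfect)
    "futperf": ("fut", "pfv"),   # future perfect -> future, perfective
}


def convert_tense(latin_tense):
    """Return the default (tense, aspect) pair for a Latin indicative tense."""
    try:
        return LATIN_TO_RUSSIAN_TENSE[latin_tense]
    except KeyError:
        raise ValueError(f"no mapping for Latin tense {latin_tense!r}") from None
```

Perfect and pluperfect collapse into the same Russian form. Any distinction between them has to come from lexical means such as adverbs.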
 
   
'''Week 3: 06/13 – 06/19:'''
Transfer rules
 
Start writing transfer rules (editing the bidix to add missing necessary categories)
 
Write basic transfer rules related to morphological transfers
 
Handle similar cases (the case systems of these languages have a lot in common)
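
The overlap between the two case systems suggests a simple default transfer table:

- nominative, genitive, dative and accusative map directly onto their Russian counterparts;
- the vocative falls back to the nominative;
- only the ablative needs context, typically its governing preposition.

The sketch below assumes hypothetical tag names (ins for instrumental, prp for prepositional) and a deliberately small preposition table. Real transfer rules would be written in Apertium's own rule format rather than in Python.

```python
# Latin cases with a direct Russian counterpart (hypothetical tags).
DIRECT_CASES = {"nom": "nom", "gen": "gen", "dat": "dat", "acc": "acc",
                "voc": "nom"}  # Russian has no productive vocative

# Ablative after a preposition: Latin preposition -> Russian case.
ABLATIVE_AFTER_PREP = {
    "in": "prp",    # in urbe   -> в городе (prepositional)
    "cum": "ins",   # cum amico -> с другом (instrumental)
    "ex": "gen",    # ex urbe   -> из города (genitive)
    "sine": "gen",  # sine spe  -> без надежды (genitive)
}


def transfer_case(latin_case, preposition=None):
    """Choose a Russian case for a Latin noun, given its governing preposition."""
    if latin_case == "abl":
        # The bare ablative (means, instrument) and any preposition missing
        # from the table fall back to the instrumental.
        return ABLATIVE_AFTER_PREP.get(preposition, "ins")
    return DIRECT_CASES[latin_case]
```

"in" with the accusative (motion, "in urbem" -> "в город") needs no special entry, because the accusative already maps directly.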
 
 
'''Week 4: 06/20 – 06/26:'''
 
Extend dictionary
 
Add words from other classes to the dictionary (especially check the closed classes)
 
Finish all work scheduled for this period
 
Prepare for the first evaluation
 
Prepare detailed theoretical basis for the next phase
 
 
<p>'''Comment: the first part is meant to be mostly technical and to consist of general, routine work.'''</p>
 
<p>'''Results: dictionary data, basic rules, morphological system, first testing'''</p>
 
   
 
 
 
=== Second phase ===
 
   
'''Week 5: 06/27 – 07/03:'''
Syntactic rules (word order)
 
Solve general word order and case problems
 
   
'''Week 6: 07/04 – 07/10:'''
Structures
 
Add basic constructions such as accusativus cum infinitivo, ablativus absolutus, etc.
 
Check and add common collocations such as 'res publica'
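
Fixed collocations such as 'res publica' are easiest to handle as units before word-by-word transfer. The sketch below shows greedy longest-match grouping over lemmas. It assumes the input is already lemmatised, so that "rem publicam" and "rei publicae" both arrive as res publicus. The multiword table is hypothetical. Unmatched lemmas pass through unchanged for ordinary bidix lookup. This sits outside Apertium's own dictionary formats and only illustrates the matching logic.

```python
# Hypothetical multiword table: sequence of Latin lemmas -> Russian lemma.
MULTIWORDS = {
    ("res", "publicus"): "государство",
    ("senatus", "consultum"): "постановление сената",
}


def group_multiwords(lemmas, table=MULTIWORDS):
    """Replace the longest matching lemma sequence at each position."""
    longest = max(len(key) for key in table)
    out, i = [], 0
    while i < len(lemmas):
        # Try the longest possible window first, down to two lemmas.
        for size in range(min(longest, len(lemmas) - i), 1, -1):
            key = tuple(lemmas[i:i + size])
            if key in table:
                out.append(table[key])
                i += size
                break
        else:  # no multiword starts here: keep the single lemma
            out.append(lemmas[i])
            i += 1
    return out
```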
 
   
'''Week 7: 07/11 – 07/17:'''
Extend dictionary
 
Add more words from open classes (monodix and bidix)
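
The frequency list from Week 0 could guide which open-class words to add next: the most frequent forms the dictionary does not yet cover give the largest gain in coverage. The sketch below assumes a word-form frequency table (a collections.Counter) and a set of forms the analyser already recognises. Both inputs and the toy numbers are hypothetical. In practice the unknown forms could be collected from the analyser's output, where Apertium marks unknown words with an asterisk.

```python
from collections import Counter


def coverage(freq, known):
    """Share of corpus tokens whose form the dictionary already covers."""
    total = sum(freq.values())
    covered = sum(n for form, n in freq.items() if form in known)
    return covered / total if total else 0.0


def next_to_add(freq, known, top=10):
    """Most frequent forms that are still missing from the dictionary."""
    missing = [(form, n) for form, n in freq.most_common() if form not in known]
    return missing[:top]


if __name__ == "__main__":
    freq = Counter({"est": 3, "gallia": 2, "omnis": 2, "partes": 1, "tres": 1})
    known = {"est", "omnis", "tres"}
    print(f"coverage: {coverage(freq, known):.2f}")
    print(next_to_add(freq, known, top=2))
```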
 
   
'''Week 8: 07/18 – 07/24:'''
Context based disambiguation
 
 
<p>'''Comment: the second part is meant to be the main part, focused on translation algorithms.'''</p>
 
<p>'''Results: extended dictionary data, syntactic rules, a beta version of the system ready for use, beta testing'''</p>
 
   
 
=== Third phase ===
 
   
'''Week 9: 07/25 – 07/31:'''
Syntactic rules 2
 
Extend the set of syntactic rules
 
Testing
 
 
'''Week 10: 08/01 – 08/07:'''
 
Testing
 
Fix any issues that appear

Extend data or rules (depending on previous results)
 
 
'''Week 11: 08/08 – 08/14:'''
 
Vacation
 
I will be able to do some work, as I will have a laptop, but I may have some trouble with internet access.
 
   
'''Week 12: 08/15 – 08/21:'''
Final work on details
 
Put everything in order
 
Write documentation
 
   
 
<p>'''Comment: improving the system as much as possible'''</p>
 
<p>'''Results: all rules written, final version of the system, testing, bugs fixed'''</p>
 
   
   
 
'''Final evaluation'''
 

Revision as of 17:20, 17 March 2018

Contact information

Name: Evgenii Glazunov

Location: Moscow, Russia

University: NRU HSE, Moscow (National Research University Higher School of Economics), 3rd-year student

E-mail: glaz.dikobraz@gmail.com

IRC: G_D

Timezone: UTC+3

Github: https://github.com/dkbrz

Am I good enough?

Education: Bachelor's Degree in Fundamental and Computational Linguistics (2015-2019) at NRU HSE

Courses:

  • Programming (Python, R, Flask, HTML, XML, Machine Learning)
  • Morphology, Syntax, Semantics, Typology/Language Diversity
  • Mathematics (Discrete Mathematics, Linear Algebra and Calculus, Probability Theory, Mathematical Statistics, Computability and Complexity, Logic, Graphs and Topology)
  • Latin, Latin in modern Linguistics, Ancient Literature

Languages: Russian (native), English (academic), French (A2-B1), Latin (a bit), German (A1)

Personal qualities: responsibility, punctuality, diligence, passion for programming, perseverance, resistance to stress

Why am I interested in machine translation? Why am I interested in Apertium?

The speed at which information circulates leaves little time for human translation. I am truly interested in formal methods and models because, as I see it, they reflect the way any language is constructed. Despite some exceptions, language is in general very logical, and the main problem is finding a proper systematic description. Apertium is a powerful platform for building impressive rule-based engines. Languages like Latin are well ordered, particularly in their morphology, which makes rule-based translation very promising.

Which of the published tasks am I interested in? What do I plan to do?

I would like to add the Latin-Russian language pair. I plan to do my best to achieve good results; more details are given in the Proposal section.

Proposal

I want to work on Graph dictionaries

Why should Google and Apertium sponsor it? How, and whom, will it benefit in society?

I think there is a lot of mathematics in language, and a graph representation of dictionaries is an exciting idea. It adds a kind of cross-validation and an internal source of information to the system: a pair missing from one bilingual dictionary can be inferred from the pairs present in others. This information helps to fill the lacunae that appear while a dictionary is being created. Expanding the bidix this way will improve the quality of translation.
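
The cross-validation idea can be illustrated with a toy sketch. Every bidix entry is treated as an edge between (language, lemma) nodes. A missing pair is proposed when two words are linked through a word in some third language, and the more independent pivots support a candidate, the more trustworthy it looks. The data, function names and scoring rule are assumptions for illustration, not an existing Apertium tool. A real approach may also need longer paths and a way to handle ambiguous pivot words.

```python
from collections import Counter, defaultdict


def build_graph(pairs):
    """Undirected translation graph; nodes are (language, lemma) tuples."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def suggest_via_pivots(graph, source, target_lang):
    """Propose target-language translations of source that are reachable
    through one pivot word but not yet linked to it directly. The score is
    the number of distinct pivots supporting each candidate."""
    direct = graph.get(source, set())
    scores = Counter()
    for pivot in direct:
        if pivot[0] == target_lang:
            continue  # an existing entry, not a pivot
        for candidate in graph[pivot]:
            if candidate[0] == target_lang and candidate not in direct:
                scores[candidate] += 1
    return scores.most_common()


# Toy edges standing in for entries from several bilingual dictionaries.
EDGES = [
    (("lat", "aqua"), ("eng", "water")),
    (("eng", "water"), ("rus", "вода")),
    (("lat", "aqua"), ("fra", "eau")),
    (("fra", "eau"), ("rus", "вода")),
    (("lat", "ignis"), ("eng", "fire")),
    (("eng", "fire"), ("rus", "огонь")),
]
graph = build_graph(EDGES)
```

Here ("lat", "aqua") receives ("rus", "вода") with a score of 2, because both the English and the French pivot support it.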

Coding Challenge

Week by week work plan

Week 0: Preparation

First phase

Week 1:

Week 2:

Week 3:

Week 4:

Second phase

Week 5:

Week 6:

Week 7:

Week 8:

Third phase

Week 9:

Week 10:

Week 11:

Week 12:

Final evaluation

Non-Summer-of-Code plans I have for the summer

GSoC is my only project this summer. I have a couple of exams in Week 4, so I planned a task for that week that I can manage around them. I also planned a vacation in Week 11 and scheduled more work for July and the beginning of August, when I will be able to work more.