User:GD/proposal
== Week by week work plan ==

'''Week 0:''' Preparation

Get familiar with the Apertium system in detail (wiki sources, installation, creating files, etc.)

Collect a corpus of texts for future testing and for a frequency list, using both Wikipedia and classical Latin texts by Caesar, Cicero, Vergilius and others

Plan every step and write everything down as formally as possible (in natural language)

Discuss details with the mentor

Solve the U/V and I/J spelling problem
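The U/V and I/J problem and the frequency list can be handled in one pass. Below is a minimal Python sketch. It assumes both variants are simply folded into u and i before counting. This is one common convention and not necessarily the one the project will adopt. The function names `normalize` and `frequency_list` and the sample sentences are illustrative only.

```python
import re
from collections import Counter

# Assumed convention: fold j into i and v into u so that
# "jam"/"iam" and "vinum"/"uinum" are counted as one word form.
_FOLD = str.maketrans("jv", "iu")


def normalize(text: str) -> str:
    """Lower-case the text and fold j -> i, v -> u."""
    return text.lower().translate(_FOLD)


def frequency_list(texts):
    """Count normalized word forms over an iterable of raw texts."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of Unicode letters (no digits or underscores).
        counts.update(re.findall(r"[^\W\d_]+", normalize(text)))
    return counts


if __name__ == "__main__":
    corpus = [
        "Gallia est omnis divisa in partes tres",
        "Iam vinum, jam uinum.",
    ]
    freq = frequency_list(corpus)
    print(freq["iam"], freq["uinum"])  # → 2 2
```

Folding before counting keeps the frequency list independent of each edition's spelling. The trade-off is that the original spelling is lost, so a real analyser would more likely accept both variants than rewrite the input.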
=== First phase ===

'''Week 1:'''

Dictionary: nouns & adjectives (they share declension patterns)

Add nouns to the dictionary (monodix and bidix); the Latin dictionary will require much more work

Describe the morphology (add missing categories and paradigms)

Add adjectives (they are closely related to nouns)
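Because nouns and adjectives share declension patterns, one paradigm table can serve both. The following is a minimal Python sketch in the spirit of an Apertium monodix paradigm. It assumes feminine first-declension endings and Apertium-style tags (`<n>`, `<adj>`, `<f>`, `<sg>`, `<gen>`, ...). The names `FIRST_DECLENSION_F`, `expand` and `build_analyser` are hypothetical, and this is not how lttoolbox actually compiles dictionaries.

```python
from collections import defaultdict

# Hypothetical first-declension paradigm (feminine endings) as (suffix, tags)
# pairs. Several suffixes are ambiguous (e.g. "ae"), so a dict keyed by suffix
# would lose readings. Macrons are not written, so ablative "-a" collides with
# nominative "-a". The vocative is omitted: it equals the nominative here.
FIRST_DECLENSION_F = [
    ("a", "<f><sg><nom>"),
    ("ae", "<f><sg><gen>"),
    ("ae", "<f><sg><dat>"),
    ("am", "<f><sg><acc>"),
    ("a", "<f><sg><abl>"),
    ("ae", "<f><pl><nom>"),
    ("arum", "<f><pl><gen>"),
    ("is", "<f><pl><dat>"),
    ("as", "<f><pl><acc>"),
    ("is", "<f><pl><abl>"),
]


def expand(stem, lemma, pos, paradigm):
    """Yield (surface form, analysis) pairs for one dictionary entry."""
    for suffix, tags in paradigm:
        yield stem + suffix, f"{lemma}<{pos}>{tags}"


def build_analyser(entries, paradigm):
    """Map each surface form to the list of all its analyses."""
    index = defaultdict(list)
    for stem, lemma, pos in entries:
        for form, analysis in expand(stem, lemma, pos, paradigm):
            index[form].append(analysis)
    return index


# A noun and the feminine forms of an adjective reuse the same paradigm.
analyser = build_analyser(
    [("ros", "rosa", "n"), ("bon", "bonus", "adj")],
    FIRST_DECLENSION_F,
)
print(analyser["bonam"])  # → ['bonus<adj><f><sg><acc>']
print(analyser["rosae"])  # → ['rosa<n><f><sg><gen>', 'rosa<n><f><sg><dat>', 'rosa<n><f><pl><nom>']
```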
'''Week 2:'''

Dictionary: verbs

Add verbs to the dictionary (monodix and bidix)

Plan how to convert the basic tenses
'''Week 3:'''

Transfer rules

Start writing transfer rules (editing the bidix to add missing categories)

Write basic transfer rules for morphological transfer

Transfer similar cases (the case systems of Latin and Russian have a lot in common)
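Since the case systems overlap, many Latin cases can be carried over directly, and only a few need real rules. Below is a hypothetical Python sketch of such a mapping. The tag names (including Russian `ins` and `prp`) and the tiny preposition table are assumptions for illustration. They are not rules from the actual transfer file.

```python
# Hypothetical default Latin -> Russian case mapping (Apertium-style tag names assumed).
DEFAULT_CASE = {
    "nom": "nom",
    "gen": "gen",
    "dat": "dat",
    "acc": "acc",
    "voc": "nom",  # Russian has no productive vocative; fall back to nominative
    "abl": "ins",  # bare ablative of means: gladio -> мечом
}

# Prepositional phrases override the default: the Latin preposition and its
# case together decide the Russian preposition and the case it governs.
PREPOSITIONS = {
    ("in", "abl"): ("в", "prp"),   # in urbe   -> в городе (location)
    ("in", "acc"): ("в", "acc"),   # in urbem  -> в город  (direction)
    ("cum", "abl"): ("с", "ins"),  # cum amico -> с другом
}


def transfer_case(la_case, la_prep=None):
    """Return (Russian preposition or None, Russian case) for one noun phrase."""
    if la_prep is not None:
        key = (la_prep, la_case)
        if key not in PREPOSITIONS:
            raise KeyError(f"no rule for preposition {la_prep!r} + {la_case}")
        return PREPOSITIONS[key]
    return None, DEFAULT_CASE[la_case]


if __name__ == "__main__":
    print(transfer_case("abl"))        # → (None, 'ins')
    print(transfer_case("abl", "in"))  # → ('в', 'prp')
```

The ablative shows why the overlap is only partial. It has no single Russian counterpart, so the bare-case default has to be overridden by context such as the governing preposition.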
'''Week 4:'''

Extend dictionary

Add words from other classes to the dictionary (especially checking closed classes)

Finish all work scheduled for this period

Prepare for the first evaluation

Prepare a detailed theoretical basis for the next phase

<p>'''Comment: the first phase is meant to be mostly technical and to consist of general, routine work.'''</p>

<p>'''Results: dictionary data, basic rules, morphological system, first testing'''</p>
=== Second phase ===

'''Week 5:'''

Syntactic rules (word order)

Solve general word order and case problems

'''Week 6:'''

Structures

Add basic structures such as accusativus cum infinitivo, ablativus absolutus, etc.

Check and add common collocations such as 'res publica'

'''Week 7:'''

Extend dictionary

Add more words from open classes (monodix and bidix)

'''Week 8:'''

Context-based disambiguation

<p>'''Comment: the second phase is meant to be the main part, focused on translation algorithms.'''</p>

<p>'''Results: extended dictionary data, syntactic rules, a beta version of the system ready for use, beta testing'''</p>
=== Third phase ===

'''Week 9:'''

Syntactic rules 2

Extend the set of syntactic rules

'''Week 10:'''

Testing

Fix issues that arise

Extend data or rules (depending on previous results)

'''Week 11:'''

Vacation

I will be able to do some work, as I will have a laptop, but I may have some trouble with internet access.

'''Week 12:'''

Final work on details

Put everything in order

Write documentation

<p>'''Comment: improve the system as much as possible'''</p>

<p>'''Results: all rules written, final version of the system, testing, bugs fixed'''</p>

'''Final evaluation'''
Revision as of 17:20, 17 March 2018
Contents
- 1 Contact information
- 2 Am I good enough?
- 3 Why is it that I am interested in machine translation? Why is it that I am interested in Apertium?
- 4 Which of the published tasks am I interested in? What do I plan to do?
- 5 Proposal
- 6 Why should Google and Apertium sponsor it? How, and whom, will it benefit in society?
- 7 Coding Challenge
- 8 Week by week work plan
- 9 Non-Summer-of-Code plans you have for the Summer
Contact information
Name: Evgenii Glazunov
Location: Moscow, Russia
University: NRU HSE, Moscow (National Research University Higher School of Economics), 3rd-year student
E-mail: glaz.dikobraz@gmail.com
IRC: G_D
Timezone: UTC+3
Github: https://github.com/dkbrz
Am I good enough?
Education: Bachelor's Degree in Fundamental and Computational Linguistics (2015-2019) at NRU HSE
Courses:
- Programming (Python, R, Flask, HTML, XML, Machine Learning)
- Morphology, Syntax, Semantics, Typology/Language Diversity
- Mathematics (Discrete Mathematics, Linear Algebra and Calculus, Probability Theory, Mathematical Statistics, Computability and Complexity, Logic, Graphs and Topology)
- Latin, Latin in modern Linguistics, Ancient Literature
Languages: Russian (native), English (academic), French (A2-B1), Latin (a bit), German (A1)
Personal qualities: responsibility, punctuality, diligence, passion for programming, perseverance, resistance to stress
Why is it that I am interested in machine translation? Why is it that I am interested in Apertium?
Information now circulates too fast to spend time on human translation. I am truly interested in formal methods and models because they represent the way any language is constructed (as I see it). Despite some exceptions, language is in general very logical, and the main problem is how to find a proper systematic description. Apertium is a powerful platform that allows one to build impressive rule-based engines. Languages like Latin are well-ordered, particularly in their morphology, which makes rule-based translation very promising.
Which of the published tasks am I interested in? What do I plan to do?
I would like to add a Latin-Russian language pair. I plan to do my best to achieve high-quality results; more details are given in the Proposal section.
Proposal
I want to work on Graph dictionaries
Why should Google and Apertium sponsor it? How, and whom, will it benefit in society?
I think there is a lot of mathematics in language, and a graph representation of dictionaries is an exciting idea because it adds a kind of cross-validation and an internal source of information within the system. This information helps to fill some of the lacunae that appear while a dictionary is being created. This will improve translation quality as we manage to expand the bidix.
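One simple reading of "graph representation" treats every bidix entry as an edge between (language, lemma) nodes. Candidate translations are then scored by counting the pivot paths that reach them. Below is a hypothetical Python sketch. The sample entries, the pivot languages and the path-counting score are assumptions, and this is not necessarily the method the project will use.

```python
from collections import defaultdict


def build_graph(pairs):
    """Undirected graph: nodes are (language, lemma) pairs, edges are bidix entries."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def translation_support(graph, source, target_lang, max_edges=3):
    """Count simple paths of at most `max_edges` edges from `source` to each
    `target_lang` node. Intermediate nodes must come from pivot languages,
    and a path stops as soon as it reaches a target-language node."""
    support = defaultdict(int)
    stack = [(source, (source,))]
    while stack:
        node, path = stack.pop()
        if len(path) - 1 >= max_edges:
            continue
        for nxt in graph[node]:
            if nxt in path or nxt[0] == source[0]:
                continue  # keep paths simple and avoid source-language detours
            if nxt[0] == target_lang:
                support[nxt] += 1
            else:
                stack.append((nxt, path + (nxt,)))
    return dict(support)


# Illustrative entries from hypothetical la-en, la-fr, en-fr, en-ru and fr-ru dictionaries.
SAMPLE_PAIRS = [
    (("la", "aqua"), ("en", "water")),
    (("la", "aqua"), ("fr", "eau")),
    (("en", "water"), ("fr", "eau")),
    (("en", "water"), ("ru", "вода")),
    (("en", "water"), ("ru", "поливать")),  # noisy sense: "to water" (verb)
    (("fr", "eau"), ("ru", "вода")),
]

if __name__ == "__main__":
    graph = build_graph(SAMPLE_PAIRS)
    # вода: 4 paths, поливать: 2 paths (dict order may vary)
    print(translation_support(graph, ("la", "aqua"), "ru"))
```

Here вода is reached by four paths and the noisy verb поливать by only two. A threshold on the count can therefore act as a rough cross-check before a new entry is added to the bidix.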
Coding Challenge
Week by week work plan
Week 0: Preparation
First phase
Week 1:
Week 2:
Week 3:
Week 4:
Second phase
Week 5:
Week 6:
Week 7:
Week 8:
Third phase
Week 9:
Week 10:
Week 11:
Week 12:
Final evaluation
Non-Summer-of-Code plans you have for the Summer
GSoC is the only project I have this summer. I have a couple of exams in Week 4, so I planned a task that is feasible at that time. I have also planned a vacation in Week 11 and scheduled more work for July and the beginning of August, when I will be able to work more.