Difference between revisions of "User:GD/proposal"


Revision as of 15:54, 18 March 2017

Contact information

Name: Irina Glazunova

Location: Moscow, Russia

University: NRU HSE, Moscow (National Research University Higher School of Economics)

E-mail: glaz.dikobraz@gmail.com

Timezone: UTC+3

Am I good enough?

Education: Bachelor's Degree in Fundamental and Computational Linguistics (2015-2019) at NRU HSE

Courses:

  • Programming (Python, Flask, HTML)
  • Morphology, Syntax, Semantics, Typology/Language Diversity
  • Mathematics (Discrete Mathematics, Linear Algebra and Calculus, Probability Theory, Mathematical Statistics, Computability and Complexity)
  • Latin, Latin in modern Linguistics, Ancient Literature

Languages: Russian (native), English (Academic), French, Latin

Personal qualities: responsibility, punctuality, diligence, passion for Latin and programming, perseverance, resistance to stress

Why is it that I am interested in machine translation? Why is it that I am interested in Apertium?

The speed at which information circulates leaves little time for human translation. I am truly interested in formal methods and models because, as I see them, they represent the way any language is constructed. Despite some exceptions, language is in general very logical, and the main problem is to find a proper systematic description. Apertium is a powerful platform that makes it possible to build impressive rule-based engines. Languages like Latin are well-ordered, particularly in their morphology, which makes rule-based translation very promising.

Which of the published tasks am I interested in? What do I plan to do?

I would like to add a Latin-Russian language pair. I plan to do my best to achieve good results; more details are given in the Proposal section.

Proposal

Latin-Russian language pair

Why should Google and Apertium sponsor it? How and whom will it benefit in society?

Latin is a language of great importance, and the study of Latin has a centuries-old history in Russia. Russian is spoken in many countries, so a much larger audience will benefit from this project. Many Russian universities teach Latin (in faculties of Linguistics, Philology, History, Law and Medicine). Consequently, there is a need for translation, not to mention the great heritage of ancient writers, poets and philosophers such as Cicero, Catullus and others. Today only a couple of platforms offer a Latin-Russian pair, and they still have a lot of work to do, so the prospect of creating this pair is very promising. It is also promising because the two languages have a lot in common (morphological system, syntactic role marking).

Week by week work plan

Week 0: until 05/29 : Preparation

          	Get familiar with the Apertium system in detail (wiki sources, installation, creating files, etc.)
          	Collect a corpus of texts for future testing and for a frequency list, using both the Latin Wikipedia and classical texts by Caesar, Cicero, Vergilius and others.
          	Plan every step and write everything down as formally as possible (in natural language)
          	Discuss details with a mentor
          	Address the U/V and I/J problem
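
The corpus, frequency-list and U/V, I/J items above could be prototyped along the following lines. This is only a minimal sketch, not part of Apertium: the function names `normalize` and `frequency_list` are hypothetical, the choice to fold everything to the "u"/"i" spelling is an assumption (the opposite convention would work as well if applied consistently), and the input is assumed to be plain text without macrons.

```python
import re
from collections import Counter

# Assumption: all texts are folded to the "u"/"i" spelling so that
# "Julius" and "Iulius", or "vidi" and "uidi", count as the same word.
_UV_IJ = str.maketrans({"v": "u", "V": "U", "j": "i", "J": "I"})


def normalize(text):
    """Fold V to U and J to I, then lowercase the text."""
    return text.translate(_UV_IJ).lower()


def frequency_list(texts):
    """Return (word, count) pairs, most frequent first.

    Assumes plain a-z text; accented or macron-marked letters
    would need a wider pattern.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", normalize(text)))
    return counts.most_common()


if __name__ == "__main__":
    sample = ["Gallia est omnis divisa", "Gallia est"]
    for word, count in frequency_list(sample):
        print(word, count)
```

A frequency list of this kind would mainly help to decide which lemmas to add to the dictionary first.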

Week 1: 05/30 – 06/05 : Dictionary: nouns & adjectives (they have the same declension patterns)

          	Add nouns to the dictionary
          	Describe morphology
          	Add prepositions (they are closely related to nouns)
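
To illustrate why nouns and adjectives can be handled together, here is a small sketch showing one ending table serving both a first-declension noun and a feminine adjective. It is an illustration only, not Apertium's dictionary format: the names `FIRST_DECLENSION` and `decline` are hypothetical, vowel length is ignored, and the vocative is left out.

```python
# First-declension endings as (singular, plural), keyed by case.
# Vowel length is ignored, so ablative singular is written "a".
FIRST_DECLENSION = {
    "nom": ("a", "ae"),
    "gen": ("ae", "arum"),
    "dat": ("ae", "is"),
    "acc": ("am", "as"),
    "abl": ("a", "is"),
}


def decline(stem, paradigm=FIRST_DECLENSION):
    """Return a {(case, number): form} table for the given stem."""
    return {
        (case, number): stem + ending
        for case, (sg, pl) in paradigm.items()
        for number, ending in (("sg", sg), ("pl", pl))
    }


if __name__ == "__main__":
    noun = decline("ros")  # rosa, "rose"
    adjective = decline("bon")  # bona, "good" (feminine forms)
    print(noun[("gen", "pl")], adjective[("gen", "pl")])
```

Because the same table produces both sets of forms, a paradigm written once for nouns can be reused for the matching adjective class.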

Week 2: 06/07 – 06/12 : Dictionary: verbs

          	Add verbs to the dictionary
          	Plan how to convert basic tenses

Week 3: 06/13 – 06/19 : Transfer rules

          	Start writing transfer rules
          	Write basic transfer rules related to morphological transfers
          	Similar cases (case systems of these languages have a lot in common)
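
The idea behind the "similar cases" item can be sketched as a simple tag mapping. This is a deliberately naive illustration, not an Apertium transfer rule: the tag names and the function `transfer_case` are hypothetical, and mapping the ablative to the instrumental (and the vocative to the nominative) is only an assumed default, since Russian has no ablative and the right choice depends on context.

```python
# Assumed default mapping from Latin case tags to Russian case tags.
LATIN_TO_RUSSIAN_CASE = {
    "nom": "nom",
    "gen": "gen",
    "dat": "dat",
    "acc": "acc",
    "abl": "ins",  # assumption: a default only; real rules need context
    "voc": "nom",  # assumption: vocative rendered as nominative
}


def transfer_case(tags):
    """Replace any Latin case tag in the list, leaving other tags as-is."""
    return [LATIN_TO_RUSSIAN_CASE.get(tag, tag) for tag in tags]


if __name__ == "__main__":
    print(transfer_case(["n", "f", "sg", "abl"]))
```

Four of the six cases map to themselves, which is why the basic morphological transfer is expected to be comparatively simple.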

Week 4: 06/20 – 06/26 : Extend dictionary

          	Add words from other classes to the dictionary (especially closed classes)
          	Finish all work scheduled for this period
          	Prepare for the first evaluation
          	Prepare detailed theoretical basis for the next phase

Comment: the first part is meant to be mostly technical and to consist of general and routine work.

Results: dictionary data, basic rules, morphological system, first testing

First evaluation

Week 5: 06/27 – 07/03 : Syntactic rules (word order)

          	Solve general word order problems

Week 6: 07/04 – 07/10 : Structures

          	Add basic constructions such as accusativus cum infinitivo, ablativus absolutus, etc.

Week 7: 07/11 – 07/17 : Extend dictionary

          	Add more words from open classes

Week 8: 07/18 – 07/24 : Context-based disambiguation

Comment: the second part is meant to be the main part and involves working on the translation algorithms.

Results: extended dictionary data, syntactic rules, a beta version of the system ready to be used, beta testing

Second evaluation

Week 9: 07/25 – 07/31 : Syntactic rules 2

          	Extend the number of syntactic rules
          	Testing

Week 10: 08/01 – 08/07 : Testing

          	Fixing issues that appear
          	Extending data or rules (depending on previous results)

Week 11: 08/08 – 08/14 : Vacation

          	I will be able to do some work: I will have a laptop, but may have trouble with internet access.

Week 12: 08/15 – 08/21 : Final work on details

          	Put everything in order

Comment: improving the system as much as possible

Results: all rules written, final version of the system, testing, bugs fixed

Final evaluation