Difference between revisions of "User:Ariessa/proposal"

Latest revision as of 22:01, 2 April 2017

Contact Info

Name: Nurul Ariessa Binti Norramli

E-mail: ariessa.norramli@gmail.com

IRC: ariessa

Location: Malaysia

Timezone: UTC+08:00


Why am I interested in machine translation?

Machine translation platforms like Apertium, Google Translate and Bing Translator help me a lot in writing. They are very useful when you need simple translations from one language to another. Beyond that, I find the concept of machine translation itself fascinating. Isn’t it amazing to see machines doing things that we thought only humans could do? Although many languages are still left out, people can translate from one language to another using machine translation systems. Machine translation also helps people learn multiple languages. Becoming a polyglot is a personal goal of mine, so I definitely need machine translation platforms along the way. Machine translation is also interesting because it bridges the communication gap between speakers of different languages. Frankly speaking, it’s very annoying not being able to communicate with people. It’s hard when you can’t express your opinions just because of a language barrier.


Why am I interested in Apertium?

I love free/open-source software, and Apertium is exactly that! I also want to contribute to open source projects, and I believe Apertium is the right platform for me to start and learn about contributing. Since it’s open source, people from all over the globe can participate in the project and build a better MT system. It’s well known that languages change as the years go by: a phrase that you use now could be dead or considered obsolete by the time your grandchildren grow up. As Apertium is rule-based, new words can easily be added to the existing dictionaries. On top of that, Apertium combines linguistics and computing, two of my favourite subjects in the world. Finally, I want to deepen my knowledge of computational linguistics.


Which of the published tasks am I interested in?

None. I plan to add a new language pair, which is English-Malay (eng-zlm).


What do I plan to do?

  • Expand the Malay and English monolingual dictionaries
  • Expand the English-Malay bilingual dictionary
  • Add lexical selection rules
  • Add transfer rules
  • Deliver an English-Malay translator with coverage of about 70% or more, a clean testvoc and a word error rate (WER) of about 20% or lower
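
The coverage target above can be checked with a small script. The following is a minimal sketch, assuming the analyser output is in the Apertium stream format, where each lexical unit looks like ^surface/analysis$ and an unknown word is marked with an asterisk (^word/*word$). The function name naive_coverage and the sample tags are hypothetical, used only for illustration.

```python
import re

# One lexical unit in the stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]+)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that received an analysis.

    Assumption: unknown words appear as ^word/*word$, so an analysis
    field starting with '*' counts as "not covered".
    """
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


if __name__ == "__main__":
    # Illustrative input only; the tags are made up for this sketch.
    sample = "^I/prpers<prn>$ ^like/like<vblex>$ ^rendang/*rendang$"
    print(f"coverage: {naive_coverage(sample):.0%}")
```

Running this over the analysed form of a large corpus, rather than a single sentence, would give a figure comparable to the 70% goal.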


Title

Add English-Malay language pair (eng-zlm)


Why should Google and Apertium sponsor it? How and whom will it benefit in society?

Apertium does not currently offer translation from English to Malay or vice versa. As a native Malay speaker, I feel that the English-Malay language pair should exist in Apertium. The Malay language, or Bahasa Melayu, is used in numerous countries, including Singapore, Indonesia, Malaysia, Brunei and some parts of Thailand. It is a vital language for Southeast Asians. By adding this pair, Apertium can reach a wider audience. Consequently, it could attract more developers, translators and linguists to contribute to Apertium, which could in turn lead to active and ongoing development. More language pairs are likely to be added too, since most Southeast Asians speak more than one language. In my case, I know English, Bahasa Melayu, basic Japanese and basic Arabic. Besides, the existing MT systems for this pair are not free/open source and use statistical MT (SMT), unlike Apertium, which is rule-based. All in all, Google and Apertium should sponsor this project because it can introduce Southeast Asians to this machine translation platform.


Work plan

Before GSoC starts

  • Set up Apertium on Ubuntu
  • Install id-ms and practice with this language pair
  • Read the Apertium wikis on how to add a new language pair
  • Improve my knowledge about linguistics related to Malay and English languages
  • Work on the coding challenge
  • Begin working on the morphological dictionary


Community Bonding Period

  • Finish reading the Apertium documentation
  • Get to know the Apertium community
  • Get initial WER from the story


Week 1 - Week 2:

  • Work on the morphological dictionaries by expanding them
    • Add 2000 entries of nouns and proper nouns to both dictionaries


Week 3:

  • Work on the bilingual dictionary
    • Add 2000 entries covering nouns, proper nouns, adverbs, prepositions, comparatives, negation words, pronouns, verbs and adjectives


Week 4:

  • Expand bilingual dictionaries with GIZA++
  • Write an article about the grammatical differences between English and Malay
  • Create pages about English-Malay or Malay on the Apertium wiki


Deliverable #1 Morphological dictionaries and bilingual dictionary


Week 5 - Week 6:

  • Write lexical selection rules for disambiguation
    • For example:
           The word ‘rendang’ has multiple meanings.
           Ayah duduk di bawah pohon yang rendang itu.
           Ibu memasak rendang ayam di dapur.
  • Write transfer rules for present, past and future tenses
    • For example:
           Ali sedang makan nasi.
           Ali telah makan nasi.
           Ali akan makan nasi.
  • Write transfer rules for modal auxiliaries
    • For example:
           Adik mesti kelaparan.
           Manusia memerlukan air dalam kehidupan. 
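
The idea behind the lexical selection step can be illustrated with a toy context-based chooser for ‘rendang’. This is only a sketch in Python: actual Apertium lexical selection rules are written in Apertium's own rule format, not in Python, and the cue words, sense labels and function name select_sense below are hypothetical, chosen just to cover the two example sentences.

```python
import re

# Hypothetical cue words for each sense of 'rendang'.
SENSE_CUES = {
    "shady": {"pohon", "pokok", "bawah"},           # describing a tree
    "rendang (dish)": {"memasak", "ayam", "dapur"},  # the meat dish
}
DEFAULT_SENSE = "rendang (dish)"  # arbitrary fallback for this sketch


def select_sense(sentence: str, target: str = "rendang") -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = re.findall(r"[\w-]+", sentence.lower())
    if target not in words:
        raise ValueError(f"'{target}' does not occur in the sentence")
    context = set(words) - {target}
    best = max(SENSE_CUES, key=lambda sense: len(SENSE_CUES[sense] & context))
    # Fall back to the default when no cue word is present at all.
    if not SENSE_CUES[best] & context:
        return DEFAULT_SENSE
    return best


if __name__ == "__main__":
    for sentence in (
        "Ayah duduk di bawah pohon yang rendang itu.",
        "Ibu memasak rendang ayam di dapur.",
    ):
        print(sentence, "->", select_sense(sentence))
```

A real rule set would condition on tagged neighbouring words rather than on raw strings, but the principle of choosing a translation from its context is the same.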


Week 7 - Week 8:

  • Write transfer rules for handling negation
    • For example:
           Saya tidak suka awak.
           Itu bukan beg Dahlia.
  • Write transfer rules for interrogative sentences
    • For instance:
           Siapakah nama kamu?
           Mengapakah kamu menangis?    
  • Write transfer rules for imperative sentences
    • For instance:
           Sila tinggalkan kasut di luar.
           Dilarang merokok.


Deliverable #2 Basic Transfer rules


Week 9 - Week 10:

  • Write transfer rules for reduplication
    • For instance:
           Beg-beg itu berwarna merah.
           Dedaun itu keguguran.
  • Write transfer rules for possessive and demonstrative pronouns
    • For example:
           Buku itu miliknya.
           Baju-baju ini adalah kepunyaan Alia.
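
For the eng->zlm direction, the reduplication rules have to generate forms such as ‘beg-beg’ and ‘dedaun’ from a base noun. The sketch below shows the two patterns in plain Python, not in Apertium's transfer rule format. The function names are hypothetical, and the partial pattern (first consonant + ‘e’ + base, as in daun -> dedaun) is a simplification that does not cover every Malay noun.

```python
VOWELS = set("aeiou")


def full_reduplication(noun: str) -> str:
    """Repeat the whole base with a hyphen, e.g. beg -> beg-beg."""
    return f"{noun}-{noun}"


def partial_reduplication(noun: str) -> str:
    """Prefix the first consonant plus 'e', e.g. daun -> dedaun.

    Assumption: the base is lowercase and starts with a consonant.
    """
    if not noun or noun[0] in VOWELS:
        raise ValueError("this sketch only handles consonant-initial bases")
    return f"{noun[0]}e{noun}"


if __name__ == "__main__":
    print(full_reduplication("beg"))
    print(full_reduplication("baju"))
    print(partial_reduplication("daun"))
```

Which nouns take which pattern would have to be recorded in the dictionaries; the functions above only show the shape of the output.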


Week 11:

  • Debug any problems
  • Run testvoc
  • Do regression testing
  • Evaluate WER on a 500-word text for eng->zlm
    • Objective: 20% or lower WER
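
WER is the word-level edit distance between the system output and a reference translation, divided by the number of words in the reference. The following is a minimal sketch of that calculation, assuming whitespace tokenisation and equal cost for substitutions, insertions and deletions. It is not the evaluation script the project would use, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j]: distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    reference = "Ali sedang makan nasi"
    hypothesis = "Ali telah makan nasi"
    print(f"WER: {word_error_rate(reference, hypothesis):.0%}")
```

With the 20% objective, a 500-word reference would allow about 100 word-level edits between the translator's output and the post-edited text.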


Week 12:

  • Documentation
  • Polish the project


Project completed


Skills and evidence of my qualifications

I know C, C++, Python, JavaScript, Bash and some HTML and CSS. I’m currently taking an introductory course on C at a local university; I taught myself the other languages. Aside from that, I’m a native Malay speaker who speaks English fluently, so I have a good command of both languages.


My current field of study

I’m currently taking a pre-degree program called Foundation of Engineering at Universiti Teknologi Mara Kampus Dengkil, Malaysia.


Let me convince you that I can do the work

I was a Google Code-in (GCI) 2015 finalist for the MetaBrainz Foundation, so I dare say that I can work to deadlines. Being a fast learner is one of my strong points. As a self-taught coder, I google anything that is beyond my knowledge before asking around on online forums. I’m also a passionate and hardworking person when it comes to writing and coding, and I always give my all when working. On top of that, I’m a freelance writer and am used to working to 2-day deadlines. I’m confident that I have what it takes.


Have I programmed in open source projects before?

Yes, but it was only a single pull-request task for GCI 2015.


Non-Summer-of-Code plans for the Summer

Aside from participating in Google Summer of Code, I’m going to work as a freelance writer. Even so, I only write for 3 hours a day.


Schedules and time commitments

I’m sitting my finals right now, and I’ll finish school by early April 2017. When GSoC starts, I’ll be on a 5-month break, so I’m confident that I’ll have at least 30 free hours a week.


Coding Challenge


Existing Language Resources


Other Resources