Difference between revisions of "User:Venkat1997/proposal"

 

Revision as of 21:36, 31 March 2017

Contact Information

Name: Venkat Parthasarathy

E-Mail: venkat.p1997@gmail.com

IRC: venkat

Why am I interested in machine translation?

I come from India, which has 22 scheduled languages and where almost every other state speaks a different language. From a young age, I was interested in translation between languages because it would make travelling between states in a country like India much easier. I was introduced to machine translation by an ML course at my college, and I was fascinated by how the complex process of translating between languages was neatly encapsulated in probabilistic models. Recently, I also read an article on how Google had vastly improved its translation quality by applying artificial neural networks to machine translation. After reading about Google's work, I became even more excited about this field and its far-reaching applications. I wanted to get involved in a project that would help me understand more about this subject. I believe that the project I have chosen for GSoC will help me achieve that goal, as I would be working directly on one of the most important parts of Apertium's machine translation pipeline: generating lexical rules. As a computer science student, I would also find it a good exercise in programming and software engineering practices.

Why is it that I am interested in Apertium?

Apertium is one of the few translation platforms that has both a helpful community and detailed documentation. It allows users to understand what is actually happening behind machine translation. For example, I spent some time on IRC discussing with Unhammer how to add more custom lexical rules and evaluate their performance. He promptly explained what was happening and also pointed me to some wiki pages for more details. I was able to understand, first-hand, how critical tasks are performed and how a language pair is actually created. Many experiences like these made Apertium a very likeable organization. Not only is it one of the most robust machine translation platforms, but its documentation and community also enable anyone to learn what machine translation is.

Which of the published tasks am I interested in? What do I plan to do?

I am interested in the task "User-friendly lexical selection training". I plan to extend Nikita Medyankin's work on the driver script by: refactoring his code and removing unnecessary scripts; adopting a more user-friendly YAML config file; making the installation of third-party tools easier (and possibly removing some of those currently used); providing regression tests for the driver script; and finally testing the work by running it on some language pairs that don't have many rules, adding the generated rules to a pair if they improve quality.
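
As a rough illustration of the config-file and regression-test parts of this plan, the sketch below validates a config that has already been parsed into a dictionary (for example, from a YAML file). Every key name here (language_pair, corpus, output_dir, max_rules) is a hypothetical placeholder; the proposal does not define a schema, and the actual driver script may use entirely different settings.

```python
from dataclasses import dataclass

# Hypothetical schema: these key names are assumptions made for
# illustration, not settings taken from the existing driver script.
REQUIRED_KEYS = {"language_pair", "corpus", "output_dir"}
OPTIONAL_KEYS = {"max_rules"}


@dataclass
class TrainingConfig:
    language_pair: str
    corpus: str
    output_dir: str
    max_rules: int = 1000  # hypothetical optional setting with a default


def validate_config(raw: dict) -> TrainingConfig:
    """Check an already-parsed config dict and return a typed config."""
    missing = REQUIRED_KEYS - raw.keys()
    if missing:
        raise ValueError("missing config keys: " + ", ".join(sorted(missing)))
    unknown = raw.keys() - REQUIRED_KEYS - OPTIONAL_KEYS
    if unknown:
        raise ValueError("unknown config keys: " + ", ".join(sorted(unknown)))
    return TrainingConfig(**raw)
```

A regression test for the driver script could then assert that a known-good config is accepted and that a config with a missing or misspelled key fails early with a readable message, before any third-party tool is invoked.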

Reasons why Google and Apertium should sponsor it

  • Make the process of generating lexical rules a lot more user-friendly.
  • The current script is difficult to understand and modify. Once the project is complete, further improvements to the generation of lexical selection rules will also be easier.
  • Improvements to current language pairs can be performed effectively.