User:Ramzz1

From Apertium
Revision as of 07:16, 9 April 2010 by 124.123.251.195 (talk)

Name

N. Kiran Kumar

Affiliation

Fourth-year BTech + MS by Research student, Department of Computer Science and Engineering, International Institute of Information Technology, Hyderabad, India.

Email Address

kirankumar.iiit@gmail.com

Contact Information

IRC: ramzz@irc.freenode.net
Phone: +91 9290447116

Why is it you are interested in machine translation?

As a Computer Science student, I have taken courses in Artificial Intelligence, Pattern Recognition and Natural Language Processing, and I have done several projects in Machine Learning and Machine Translation, such as "Inducing Transfer Grammar from Word-Aligned Corpus" and "Textual Alignment". I am from India, a country with many languages, where MT systems play a highly significant role. Since I come from a linguistic background, and given the practical use and inherent challenges of the task, I really love working on it. The field also gives me the chance to meet people from different places and to learn about the culture, history and traditions of various places and groups, which is exciting and interesting.

Why is it that you are interested in the Apertium project?

Working on language-related projects is my strength, since my background is in “Language Technologies”. Apertium works on machine translation for many of the world's languages for which resources are limited, which makes it a challenging project. Open source projects of this kind are very useful to society. I was introduced to the term "Open Source" through Linux in my first year of undergraduate study. I liked the idea of software being open (free) to all, but I never thought of contributing to an open source project. As I started using open source projects, I began to appreciate the role a good open source project plays in the development of other projects, and I became interested in them. That is why I have chosen to work on the Apertium project.

Which of the published tasks are you interested in?

I am interested in working on the “POST-EDITION-TOOL” task.

Project Description

The main intent of this project is to build a tool that supports editing the output of the Apertium MT system. The tool must support any language pair available in Apertium, and it has to deal with various translation errors such as “wrongly spelt words”, “unknown words”, “grammar mistakes”, “sentences with wrong word order” and “non-native phrases”. I am planning to work on only one or two language pairs, namely en-ca or ca-en, as the time may not be sufficient to cover more languages. A detailed description of the project can be found here. [1]

Reasons why Google and Apertium should sponsor it.

The project adds functionality to the existing Apertium translator. Since users collaboratively edit the translations and suggest better ones, it would help in understanding the limitations of the existing system and in improving it.

A description of how and who it will benefit in society.

By using an open source machine translation system such as Apertium (with better accuracy), people can translate text from one language into another free of cost, which will be of great help to them. Some European companies that spend billions of rupees on translation could also use such open source systems to reduce their costs. Consider an example: I am from India, where there are many states and each state has its own language. When we travel from one state to another during the summer vacation, we have difficulty interacting with the people of the other state. When we move to a different place for higher studies, we may also face difficulties caused by the language gap. For ordinary people in all such situations, this project will be of great help.

Work Plan

The official Google coding period starts on May 24th. If I am confident about the preparations before this date, I will start working on the project earlier. I am planning to work more or less as per the following schedule.

Timeline

Week1: Basic prototype of the UI and suggestions from the mentor.

Week2: Understand the various rules/features in “Language tool” and sketch a plan for incorporating useful new rules/features into the post-edition tool.

Week3: Integrate these new rules/features with the “Language tool”, which forms the baseline system of the post-edition tool.

Week4: Implement the “spell checker” and build the “unknown word” detector module, making use of the Apertium monolingual dictionary and the web (if needed).

Deliverable#1: A basic post-edition tool built on “Language tool” with the extra features/rules added on top of it, and integrated with the “spell checker” and “unknown word” detection modules.
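As a rough illustration of the “unknown word” detection planned for Week 4, here is a minimal sketch. It assumes the word forms of the target language have already been extracted from the monolingual dictionary into a set (that extraction step is not shown), and it also treats tokens that the translator prefixed with an asterisk as unknown. The class and method names are hypothetical and are not part of Apertium or any existing tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not an Apertium API.
class UnknownWordDetector {
    // Lower-cased word forms, assumed to come from the monolingual dictionary.
    private final Set<String> knownForms;

    UnknownWordDetector(Set<String> knownForms) {
        this.knownForms = knownForms;
    }

    // Returns the words that are starred in the MT output or missing from the dictionary.
    List<String> findUnknown(String translatedText) {
        List<String> unknown = new ArrayList<>();
        for (String token : translatedText.trim().split("\\s+")) {
            boolean starred = token.startsWith("*");
            // Strip leading and trailing non-letters (punctuation and the star marker).
            String word = token.replaceAll("^[^\\p{L}]+|[^\\p{L}]+$", "");
            if (word.isEmpty()) {
                continue; // token was pure punctuation or the input was empty
            }
            if (starred || !knownForms.contains(word.toLowerCase(Locale.ROOT))) {
                unknown.add(word);
            }
        }
        return unknown;
    }
}
```

The words returned by `findUnknown` could then be highlighted in the editing UI so that the post-editor sees them first.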

Week5: Implement a “grammar checker” using POS tags and an n-gram model (bigram or trigram, depending on which gives better results). Testing is done as part of development.

Week6: Implement the “spell suggestor” using the Apertium monolingual dictionary, the web and various edit distance algorithms.

Week7: Integrate the “spell suggestor” and the “grammar checker” into the baseline system.

Week8: Implement the other add-ons, such as “identifying non-native phrases”, using the Apertium monolingual dictionary, POS tags and the n-gram model; integrate them into the system and test the system.

Deliverable#2: A post-edition tool with the various modules integrated into it, including the “spell checker and suggestor”, the “non-native phrases” identifier and the “grammar checker”.
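The edit-distance part of the “spell suggestor” could look roughly like the sketch below. It uses plain Levenshtein distance, which is only one of the possible edit distance algorithms, and scans a word list linearly. The class name, the method names and the `maxDistance` parameter are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for illustration only.
class SpellSuggester {
    // Levenshtein distance computed with two rolling rows of the DP table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns dictionary entries within maxDistance, closest first, ties in alphabetical order.
    static List<String> suggest(String word, Collection<String> dictionary, int maxDistance) {
        List<String> candidates = new ArrayList<>();
        for (String entry : dictionary) {
            if (editDistance(word, entry) <= maxDistance) {
                candidates.add(entry);
            }
        }
        candidates.sort((x, y) -> {
            int dx = editDistance(word, x);
            int dy = editDistance(word, y);
            return dx != dy ? Integer.compare(dx, dy) : x.compareTo(y);
        });
        return candidates;
    }
}
```

A linear scan like this is fine for a prototype. For a full monolingual dictionary, an index over the word list would probably be needed to keep suggestions fast.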

Week9: Improve the front-end UI by adding a login facility, logging of all edits, etc.

Week10-11: Final testing, code documentation and user how-to documentation.

Week12: Backup week (in case any further improvements need to be made).

Project completed
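As an illustration of the POS-tag n-gram idea behind the “grammar checker” planned for Week 5, here is a deliberately simplified sketch. It only counts tag bigrams seen in correctly tagged sentences and flags pairs that were never seen, so it is a zero-count check and not a full probabilistic n-gram model. The class name, the method names and the tag strings used in the example are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
class PosBigramChecker {
    // Counts of adjacent tag pairs, keyed as "tag1 tag2".
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    // Records the adjacent tag pairs of one correctly tagged sentence.
    void train(List<String> tags) {
        for (int i = 0; i + 1 < tags.size(); i++) {
            bigramCounts.merge(tags.get(i) + " " + tags.get(i + 1), 1, Integer::sum);
        }
    }

    // Returns the index of the first tag of every pair that never occurred in training.
    List<Integer> suspiciousPositions(List<String> tags) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i + 1 < tags.size(); i++) {
            if (!bigramCounts.containsKey(tags.get(i) + " " + tags.get(i + 1))) {
                positions.add(i);
            }
        }
        return positions;
    }
}
```

For example, after training on the tag sequences `det adj n vblex` and `det n vblex`, the sequence `det n adj vblex` would be flagged at the `n adj` and `adj vblex` pairs. A real checker would need smoothing and a threshold on probabilities, since unseen pairs are not always errors.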

List your skills and give evidence of your qualifications.

Java/web experience: I find Java exciting and fun to work with. I learnt Java out of my own interest and have done most of my projects in it. I have been working with it for 2 years. I have also worked on web-related projects using JSP and MySQL as part of an academic course project.

Database experience: I have used MySQL and built a small database as part of my course. Currently, I am in the 4th year of the B.Tech + M.S by Research program in the Department of Computer Science and Engineering at the International Institute of Information Technology, Hyderabad, Andhra Pradesh, India.

I do not have experience of working on open source projects. I participated in the “Text Analysis Conference (TAC) 2009” and published a paper on “Recognizing Textual Entailment” there. I did that project in Java and made use of various tools such as WordNet, MontyLingua, VerbOcean, the Stanford Parser and the Stanford NER.

List any non-Summer-of-Code plans you have for the Summer.

I will finish my academic work and other commitments by May 10th. From then on, I will start learning more about the project. I have no other engagements this summer. I have three months of summer vacation and I am planning to spend most of this time on “Google Summer of Code” only. I am willing to spend a minimum of 5 hours a day, 6-7 days a week, so overall I will spend around 30-35 hours a week. On many days I may spend the whole day (around 10 hours) doing nothing except working on the project.