User:Oldtrafford.kedar

From Apertium
Revision as of 12:23, 9 April 2010

Name

KEDAR KULKARNI



Email Address

oldtrafford.kedar@gmail.com


Contact Information

PH-NO: +919160011165



Why is it that you are interested in machine translation?

Machine translation is a tool through which one can access texts in other languages. I find the study of languages fascinating. My mother tongue is Marathi, but I was brought up in an environment where I was more exposed to Hindi than to Marathi, so I mostly communicated in Hindi. However, my family is steeped in Marathi culture, and I am interested in reading Marathi literature. MT is a tool that can help me understand Marathi literature. I have also been keenly following advances in MT for a few years, and this has sparked my interest in developing MT systems.

Why is it that you are interested in the Apertium project?

Apertium is a machine translation platform: it provides an engine and a toolbox that let you build your own MT systems, and it is both open source and open content. Since I am interested in building an MT system, I looked at the available resources, and the two that interested me were Anusaaraka and Apertium. Anusaaraka gives language access only, not translation, and it is not very user-friendly, since using it requires proper training. Apertium, on the other hand, is user-friendly and can be used straight out of the box. This is an opportunity to test the usability of Apertium on closely related languages such as Marathi and Hindi. Apertium is small and efficient, so closely related Indic languages should work well on it. Since there has been no work on Indic languages on the Apertium platform (except for Urdu-Hindi), I see this as an opportunity to show the usefulness of Apertium for Indic languages.



Which of the published tasks are you interested in?

Apertium: Machine Translation from Marathi to Hindi



What do you plan to do?

STATE OF THE ART:

1. A Marathi morphological analyzer, in Anusaaraka format, with around 80% coverage on web text.
2. A Hindi morphological analyzer, in Anusaaraka format, with around 90% coverage on web text.
3. A Marathi-Hindi bilingual dictionary with around 15K headwords.
4. A working Marathi-Hindi Anusaaraka system producing core Anusaaraka output.

COMMUNITY BONDING PERIOD:

• Learning the Apertium framework in general.
• Learning to use the Apertium viewer.
• Getting an overview of what is available in Anusaaraka.

WEEK 1:

• Developing programs for converting the Anusaaraka morphological analyzers to Apertium format.
• Building an Apertium morphological dictionary for the 5,000 most frequent words of Marathi and Hindi.
• Converting WX-encoded resources to Unicode.

WEEK 2 & WEEK 3:

• Checking the completeness of the paradigms in Unicode format and adding any missing paradigms.
• Testing the morphological analyzers on various samples from Wikipedia to ensure that coverage is at least 80%.
• Adding enough high-frequency words to reach 80% coverage for Marathi.
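The WX-to-Unicode conversion planned for Week 1 comes down to a transliteration table plus a few rules:

- A consonant carries an inherent "a".
- It takes a vowel sign (matra) when another vowel follows.
- It takes a virama when no vowel follows.

The Python sketch below is a sketch under assumptions. It assumes the commonly described WX letter assignments for Devanagari and covers only the basic letters. It has no ळ, no nukta forms and none of the other extensions that Anusaaraka data may use, so the tables would need checking against the actual resources. The function name `wx_to_devanagari` is hypothetical.

```python
# Hypothetical, partial WX -> Devanagari converter (basic letters only).

# Independent vowels: used when a vowel is not attached to a consonant.
VOWELS = {
    "a": "\u0905", "A": "\u0906", "i": "\u0907", "I": "\u0908",
    "u": "\u0909", "U": "\u090a", "q": "\u090b", "e": "\u090f",
    "E": "\u0910", "o": "\u0913", "O": "\u0914",
}
# Dependent vowel signs (matras); "a" is the inherent vowel, so it adds nothing.
MATRAS = {
    "a": "", "A": "\u093e", "i": "\u093f", "I": "\u0940",
    "u": "\u0941", "U": "\u0942", "q": "\u0943", "e": "\u0947",
    "E": "\u0948", "o": "\u094b", "O": "\u094c",
}
CONSONANTS = {
    "k": "\u0915", "K": "\u0916", "g": "\u0917", "G": "\u0918", "f": "\u0919",
    "c": "\u091a", "C": "\u091b", "j": "\u091c", "J": "\u091d", "F": "\u091e",
    "t": "\u091f", "T": "\u0920", "d": "\u0921", "D": "\u0922", "N": "\u0923",
    "w": "\u0924", "W": "\u0925", "x": "\u0926", "X": "\u0927", "n": "\u0928",
    "p": "\u092a", "P": "\u092b", "b": "\u092c", "B": "\u092d", "m": "\u092e",
    "y": "\u092f", "r": "\u0930", "l": "\u0932", "v": "\u0935",
    "S": "\u0936", "R": "\u0937", "s": "\u0938", "h": "\u0939",
}
# Anusvara, visarga, candrabindu.
SIGNS = {"M": "\u0902", "H": "\u0903", "z": "\u0901"}
VIRAMA = "\u094d"


def wx_to_devanagari(text: str) -> str:
    """Transliterate WX-encoded text to Unicode Devanagari."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in MATRAS:
                out.append(MATRAS[nxt])  # vowel sign, or nothing for inherent "a"
                i += 2
                continue
            out.append(VIRAMA)  # no vowel follows: suppress the inherent "a"
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)  # spaces, digits and punctuation pass through
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(wx_to_devanagari("marATI"))  # मराठी
```

Keeping the mapping in tables means the Marathi-specific letters can be added later without touching the loop.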

WEEK 4:

• Writing Marathi-Hindi transfer rules. Since Marathi and Hindi are very similar, most of the work will be in the first (chunker) stage, t1x, with little in t2x and almost none in t3x.

DELIVERABLE AT THE END OF WEEK 4: Marathi and Hindi morphological analyzers with standardized tagsets.
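Two goals in the plan depend on how word frequencies are distributed: the 5,000-word target in Week 1 and the 80% coverage behind this deliverable. One quick way to estimate how many entries are needed:

1. Rank the word forms of a raw corpus by frequency.
2. Count how many of the top forms together cover 80% of running tokens.

The sketch below assumes whitespace tokenization, and the function name is hypothetical. It counts surface forms, so it overestimates the dictionary work: one paradigm entry in an Apertium dictionary covers many inflected forms.

```python
from __future__ import annotations

from collections import Counter


def words_for_coverage(tokens: list[str], target: float = 0.8) -> tuple[int, float]:
    """Return (n, coverage).

    n is how many of the most frequent word forms are needed to cover at
    least `target` of all running tokens; coverage is the share of tokens
    those n forms actually cover.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    covered = 0
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return rank, covered / total
    return len(counts), covered / total  # only reached if target > 1.0


if __name__ == "__main__":
    sample = "a a a a b b c d".split()
    print(words_for_coverage(sample))  # (3, 0.875)
```

On a real corpus, `tokens` would come from Wikipedia text. A large gap between the surface-form count and the number of lemmas suggests that paradigm work will pay off more than adding individual words.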

WEEK 5:

• Developing a program to convert the Anusaaraka Marathi-Hindi bilingual dictionary to Apertium format.
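The Week 5 converter could read each dictionary entry and emit an Apertium bilingual-dictionary `<e>` element, with Marathi on the left side and Hindi on the right.

- The input layout here is a hypothetical simplification, not the actual Anusaaraka format: one `marathi<TAB>hindi<TAB>tag` line per entry, with the third column already holding an Apertium tag name such as `n`.
- The output follows the `.dix` element structure: `<e>`, `<p>`, `<l>`, `<r>`, `<s n="..."/>`, and `<b/>` for spaces.
- The function names are illustrative.

```python
from typing import Iterable, Optional
from xml.sax.saxutils import escape


def side(lemma: str, tag: str) -> str:
    """Build one side of a bidix pair: the lemma followed by its tag."""
    # Spaces inside multiword lemmas are written as <b/> in .dix files.
    text = "<b/>".join(escape(part) for part in lemma.split())
    return f'{text}<s n="{tag}"/>'


def to_bidix_entry(line: str) -> Optional[str]:
    """Turn one hypothetical 'marathi<TAB>hindi<TAB>tag' line into an <e> element.

    Returns None for blank lines and '#' comments; raises ValueError if a
    line does not have exactly three tab-separated fields.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated fields: {line!r}")
    mr, hi, tag = fields
    return f"<e><p><l>{side(mr, tag)}</l><r>{side(hi, tag)}</r></p></e>"


def convert(lines: Iterable[str]) -> str:
    """Wrap all converted entries in a bidix <section>."""
    entries = [e for e in map(to_bidix_entry, lines) if e is not None]
    body = "\n".join(entries)
    return f'<section id="main" type="standard">\n{body}\n</section>'


if __name__ == "__main__":
    sample = ["घर\tघर\tn", "पाणी\tपानी\tn"]
    print(convert(sample))
```

The resulting section still has to go inside a complete `.dix` file whose `<sdefs>` declares every tag used.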

WEEK 6 & WEEK 7:

• Ensuring that the words covered by the morphological analyzers are present in the Marathi-Hindi bilingual dictionary, and adding any that are missing.

WEEK 8:

• Testing the bilingual dictionary on random Wikipedia pages to reach 80% coverage.

DELIVERABLE AT THE END OF WEEK 8: Marathi-to-Hindi bilingual dictionary with 80% coverage.
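The coverage checks in Weeks 2-3 and Week 8 could be done by running Wikipedia text through the pipeline and counting marked words. This sketch assumes Apertium's unknown-word marking is switched on in the translated output. As we understand the convention:

- `*` prefixes a word the analyzer does not know.
- `@` prefixes a word missing from the bilingual dictionary.
- `#` prefixes a word that could not be generated.

The function counts whitespace-separated output tokens, so its figures are approximate. The name `coverage_report` is hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Prefixes assumed to mark problem words in Apertium's translated output.
MARKS = {
    "*": "unknown to the morphological analyzer",
    "@": "missing from the bilingual dictionary",
    "#": "could not be generated",
}


def coverage_report(translated: str) -> dict[str, float]:
    """Count marked tokens in translated text and derive coverage figures."""
    tokens = translated.split()
    if not tokens:
        return {"tokens": 0}
    marked = Counter(tok[0] for tok in tokens if tok[0] in MARKS)
    n = len(tokens)
    report: dict[str, float] = {"tokens": n}
    report.update({mark: marked[mark] for mark in MARKS})
    # Analyzer coverage: share of tokens the analyzer recognised.
    report["analyzer_coverage"] = 1 - marked["*"] / n
    # Bilingual coverage: share of tokens analysed and also found in the bidix.
    report["bidix_coverage"] = 1 - (marked["*"] + marked["@"]) / n
    return report


if __name__ == "__main__":
    print(coverage_report("घर *foo @bar पानी #baz"))
```

Tracking the `*` and `@` figures separately shows whether a shortfall against the 80% target comes from the analyzers or from the bilingual dictionary.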

WEEK 9 & WEEK 10:

• Training a POS tagger for both Marathi and Hindi.
• Developing a mapping from ILMT tags to Apertium tags, and exploring the possibility of using ILMT POS data to train the Marathi and Hindi POS taggers.
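The ILMT-to-Apertium tag mapping could start as a lookup table applied to ILMT-tagged data. Collecting the unmapped tags makes gaps in the table visible early.

- The tag correspondences below are illustrative assumptions, not an agreed mapping.
- The one `word<TAB>TAG` per line input is a hypothetical simplification of the ILMT data format.
- The name `map_corpus` is also hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional

# Illustrative ILMT -> Apertium correspondences (assumed, not an agreed mapping).
ILMT_TO_APERTIUM = {
    "NN": "n",        # common noun
    "NNP": "np",      # proper noun
    "PRP": "prn",     # pronoun
    "JJ": "adj",      # adjective
    "RB": "adv",      # adverb
    "VM": "vblex",    # main verb
    "VAUX": "vaux",   # auxiliary verb
    "PSP": "post",    # postposition
    "CC": "cnjcoo",   # conjunction (simplified: not every ILMT CC is coordinating)
    "QC": "num",      # cardinal number
}


def map_corpus(lines: Iterable[str]) -> tuple[list[tuple[str, Optional[str]]], Counter]:
    """Map 'word<TAB>ILMT_TAG' lines to (word, Apertium tag or None) pairs,
    and count the ILMT tags that the table does not cover."""
    mapped: list[tuple[str, Optional[str]]] = []
    unmapped: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, tag = line.split("\t")
        apertium_tag = ILMT_TO_APERTIUM.get(tag)
        if apertium_tag is None:
            unmapped[tag] += 1
        mapped.append((word, apertium_tag))
    return mapped, unmapped


if __name__ == "__main__":
    pairs, missing = map_corpus(["घर\tNN", "में\tPSP", "बहुत\tINTF"])
    print(pairs)
    print(missing)
```

Real Apertium tags are usually finer-grained than these coarse categories (gender, number, case and so on), so a table like this would only be a first pass.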

WEEK 11:

• Testing and improving the quality and coverage of the translation.

WEEK 12:

• Testing the complete machine translation system on Wikipedia texts and evaluating it.
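For the Week 12 evaluation, one common metric for MT output is word error rate (WER). WER is the word-level edit distance between the system output and a human post-edited version of it, divided by the length of that reference. The sketch below assumes that setup and whitespace tokenization. It is one possible evaluation, not something the plan commits to, and the function name is hypothetical.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the hypothesis words seen so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # h is an extra word (deletion)
                cur[j - 1] + 1,          # r is missing from the output (insertion)
                prev[j - 1] + (h != r),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a x c", "a b c"))  # one substitution in three words
```

Because the reference is a post-edit of the system's own output, WER here approximates how much editing a reader would need to make the translation usable.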




Why should Google and Apertium sponsor it?

There are no open-source systems for Indic languages except Anusaaraka, and Anusaaraka does not have a machine translation component. Apertium currently does not have language pairs for Indic languages either. This pair would therefore be a good addition to Apertium and would expand its horizons; it would also act as a building block for other language pairs in the Indic group. Google is an organization that works for the benefit of society, and Google does not have a Hindi-Marathi language pair, so Google may find this project interesting and incorporate this work into its Translator Toolkit. The project is also beneficial to society, as explained in the next section.


Whom will it benefit in society, and how?

Marathi is the fourth most spoken language in India. Patanjali's Mahabhasya, for example, is available in Marathi but not in Hindi, so it is not accessible to the Hindi-speaking population of the country. The Mahabhasya is only one example; there are many such cases in Marathi literature, and MT would help in all of them. The Marathi-Hindi system can also serve as a case study for building Telugu-Hindi, Kannada-Hindi and Punjabi-Hindi systems, because Anusaaraka systems for these languages are already available under the GPL.



List your skills and give evidence of your qualifications.

I am currently in the first year of the Integrated Master's Program in Economics at the University of Hyderabad, Hyderabad. I am good at shell scripting and Perl programming. I believe I am a good manager and leader, so I can build a team to work on the project.




Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.


I am an 18-year-old student and have just finished school. I am very much fascinated by the open-source tools available on the web. With my limited knowledge of shell and Perl programming, I was able to convert the Marathi-Hindi bilingual dictionary from the Anusaaraka format to the Apertium format in a couple of days. This experience gives me confidence that this summer I can contribute substantially by developing a Marathi-Hindi Apertium pair using the resources from the Marathi-Hindi Anusaaraka, both of which are available under the GPL.




List any non-Summer-of-Code plans you have for the Summer

None. I have no other plans if my application is selected.




REFERENCES:

Anusaaraka. http://ltrc.iiit.ac.in/showfile.php?filename=downloads/anu/index.htm

Kulkarni, Amba. Anusaaraka: An Accessor cum Machine Translator. Talk at the First Workshop on Free Rule-Based MT, Alicante, Spain, 2 November 2009.

Bharati, Akshar, Amba P. Kulkarni, and Dipti Misra Sharma. Anusaaraka: A Better Approach to Machine Translation (A Case Study for English-Hindi/Telugu). Presented at Language Technology Tools: Implementation of Telugu, a three-day national conference, 8-10 October 2003, University of Hyderabad, Hyderabad.

Kulkarni, Amba P. Design and Architecture of anusAraka: An Approach to Machine Translation. Satyam Technical Review, vol. 3, October 2003.