Difference between revisions of "User:Sl33k/Application"

From Apertium

Revision as of 17:02, 6 April 2011

Name: Ahsan Bagwan

IRC: sl33k_

Email: ahsanbagwan@gmail.com

Skype: ahsanbagwan


Why is it you are interested in machine translation?

I am quite impressed by the work of machine translation tools and techniques on most of the Indo-European languages. Although the amount of work in this sub-field of computational linguistics on the languages of the Indian subcontinent is relatively low, recent trends suggest that this picture is likely to change [1]. As a student interested in natural language processing, I find that MT gives me a great platform to work closely with linguistics and with the interaction between natural languages and computer languages.


Why is it that you are interested in the Apertium project?

Apertium fascinated me when I first came across it, largely because of its appeal as a great open-source MT engine and because, even at first glance, its tools convert linguistic data in a fairly comprehensive way. It couples this with some of the easiest-to-follow documentation. As I was looking to gain detailed knowledge of how it works, the documentation walked me through some basic concepts and linked to extra resources on the wiki, which has given me a good base to build upon.

Which of the published tasks are you interested in?

Apertium-ur-hi: Adopting the Urdu-Hindi language pair.

Why should Google and Apertium sponsor it?

Some initial work has already been done on the Urdu and Hindi morphological analysers for the ur-hi language pair. Currently, there is no stable Indo-Aryan language pair in the repository. Making a stable language pair with release-quality results would trigger the development of pairs that match other related subcontinental languages with Hindi.


Community Bonding Period

- Lots of reading on machine translation.

- Set up the Apertium environment and get more familiar with the tools.

- Plan for conflicts that could arise when writing structural transfer rules for the two languages.

- Work out more detailed scripts for testing.

Work Plan:

Week 1: Make sure the tagsets are consistent between M. Humayoun, IIIT and Apertium.

Week 2: Convert M. Humayoun's Urdu Morphology to lttoolbox, probably using the speling tools and a full-form list.

Week 3:

Week 4:

Deliverable #1: Widely covered monolingual dictionaries and bilingual dictionary for ur-hi.

Week 5: Train part-of-speech taggers for both Urdu and Hindi.

Week 6:

Week 7:

Week 8:

Deliverable #2:

Week 9: Write transfer rules, if any are needed.

Week 10: Retrain the part-of-speech tagger with target language training.

Week 11:

Week 12: Update the ur-hi documentation to reflect the changes. Write up a brief summary about

Deliverable #3: Completed project. Ready for release.
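
As a rough illustration of the Week 2 conversion step, the sketch below turns one line of a full-form list into an lttoolbox dictionary entry. The four-field input layout (lemma; surface form; tags; part of speech) and the function name fullform_to_dix_entry are assumptions made for this example; the real export from M. Humayoun's Urdu Morphology may look different, and the tag names would need to follow the tagset agreed in Week 1.

```python
from xml.sax.saxutils import escape


def fullform_to_dix_entry(line):
    """Convert one 'lemma; surface; tags; pos' line to an lttoolbox <e> entry.

    The input layout is an assumption for illustration only.
    """
    lemma, surface, tags, pos = [field.strip() for field in line.split(";")]

    # Part-of-speech tag first, then the remaining dot-separated tags.
    symbols = [pos] + [t for t in tags.split(".") if t]
    tag_xml = "".join('<s n="%s"/>' % escape(s, {'"': "&quot;"}) for s in symbols)

    # lttoolbox represents a space inside <l>/<r> with the <b/> element.
    left = escape(surface).replace(" ", "<b/>")
    right = escape(lemma).replace(" ", "<b/>")

    return '<e lm="%s"><p><l>%s</l><r>%s%s</r></p></e>' % (
        escape(lemma, {'"': "&quot;"}), left, right, tag_xml)


if __name__ == "__main__":
    print(fullform_to_dix_entry("house; houses; pl; n"))
```

Entries produced this way are full-form entries without paradigms; grouping them into paradigms would be a separate step.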


Research Done:




Bio


I am a final-year undergraduate student of Information Technology at Sinhagad Academy of Engineering (India). We had a course that included finite-state transducers, as well as computer laboratory assignments on XML and C++.

Experience

Programming languages: Competency in Python, C, Java. Limited experience with C++.

Markup languages: HTML, XML, reST


Notes


http://wiki.apertium.org/wiki/Contributing_to_an_existing_pair

http://secure.wikimedia.org/wikipedia/en/wiki/Hindi-Urdu_grammar

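The Indo-Aryan Languages - Colin Masica http://books.google.com/books?id=J3RSHWePhXwC&printsec=frontcover&dq=indo-aryan+languages#v=onepage&q&f=false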


References:


[1] Use of Machine Translation in India: Current Status http://www.mt-archive.info/MTS-2005-Naskar-2.pdf