User:Invo98

CONTACT INFORMATION

Name: Vidyadheesha D N

Location: Kochi, India

University: Model Engineering College

E-mail: vidyadheeshadn.mec@gmail.com

IRC: invo

Timezone: IST (UTC+5:30)

GitHub: https://github.com/MissingBytes/

Am I good enough?

Education:

B.Tech. in Computer Science and Engineering (2015-2019), currently in 3rd year

Relevant courses:

Theory of Computation, Soft Computing, Natural Language Processing (ongoing), Compiler Design (ongoing)

Natural Languages:

Elementary proficiency: Kannada, Malayalam, Tamil.

Fluent, with formal education: English and Hindi.

Technical Skills:

Programming: C/C++, Python, Java, C#, Bash scripting

Database: MySQL

Web Technologies: CSS, HTML, JavaScript, XML

Qualities:

Punctual, focused, determined, quick to find and learn from my mistakes, friendly and open, innovative and creative.

What’s so cool about Machine Translation? Why is Apertium way cooler?

India is home to a diverse set of languages, 22 of which are officially recognised. More than once I have found myself unable to communicate in the language of my choice, and at those times tools like Google Translate have been a lifesaver. I have always been an avid fan of sci-fi shows like Rick and Morty, where Rick uses live translation to talk to aliens. I sincerely believe that live translation, once achieved, would benefit the world by letting us move past the language barrier and bringing us all closer together.

We are approaching a sci-fi future, and Apertium, a free and open source project dedicated to machine translation with an ever-growing library of about 47 stable languages and more, is sure to be a basis for language translation. Translation needs many contributions, big and small, from people all over the world. Most existing translation tools are commercial and use different methods for each language pair, whereas Apertium uses common defining semantics (LIS). I believe that contributing a new language pair would further the project. The mentors I have interacted with are super friendly and ready to answer any number of questions, for which I am grateful.

Which of the published tasks are you interested in? What do you plan to do?

I am interested in adding a new language pair. My choice: the Kannada-Marathi pair.

Why should Google and Apertium sponsor it? Whom will it benefit, and how?

As of 2017, there are 67 million native Kannada speakers (Kannadigas). As of 2001, Kannada stood among the 30 most widely spoken languages of the world. Experts vouch that in the next 20 years Kannada speakers will outnumber the speakers of many foreign languages, including German and French. Kannada has 2,000 years of history and a grand literature. Marathi ranks 19th in the list of most spoken languages in the world, with 73 million speakers as of 2007. Maharashtra (where people speak Marathi) and Karnataka (where people speak Kannada) are neighbouring states, yet the two languages are not closely related: Marathi is an Indo-Aryan language whereas Kannada is a Dravidian language. Cities such as Bangalore and Mumbai are densely populated, multilingual IT hubs with good internet penetration, so people there would be able to access the translator. And of course, other groups would benefit from this as well.

CODING CHALLENGE

1. Installed the Apertium tools.

2. Bootstrapped the language pair kan-mar. A Marathi monolingual dictionary (dix) already exists.

3. Added some words (nouns and adjectives) to both the bidix and the kan monodix. Also added paradigm definitions to the kan monodix.

4. Translated some words from the James and Mary story and pushed them to the main git repo.

5. Converted Kannada to HFST for morphological analysis.

6. Currently reading more of wiki.apertium for a better understanding.

7. Next, I will translate more of the James and Mary story.
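
For readers unfamiliar with the dictionary format, here is a minimal sketch of the kind of entries step 3 refers to, built with Python's standard XML library. The element names (`e`, `i`, `par`, `p`, `l`, `r`, `s`) follow the Apertium dix format; the example word, the paradigm name `ಮನೆ__n` and the helper function names are assumptions made for illustration and are not taken from the actual kan-mar repository.

```python
import xml.etree.ElementTree as ET


def monodix_entry(lemma, stem, paradigm):
    """Build a monolingual <e> entry: a stem plus a reference to a paradigm."""
    entry = ET.Element("e", {"lm": lemma})
    stem_node = ET.SubElement(entry, "i")
    stem_node.text = stem
    ET.SubElement(entry, "par", {"n": paradigm})
    return entry


def bidix_entry(left, right, tags):
    """Build a bilingual <e> entry pairing two lemmas that share the same tags."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side, lemma in (("l", left), ("r", right)):
        node = ET.SubElement(pair, side)
        node.text = lemma
        for tag in tags:
            ET.SubElement(node, "s", {"n": tag})
    return entry


def to_xml(element):
    """Serialise an entry to a string, keeping non-ASCII characters as-is."""
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example: Kannada "ಮನೆ" and Marathi "घर" (both "house").
    print(to_xml(monodix_entry("ಮನೆ", "ಮನೆ", "ಮನೆ__n")))
    print(to_xml(bidix_entry("ಮನೆ", "घर", ["n"])))
```

In a real dictionary the stem in `<i>` is usually shorter than the lemma, with the paradigm supplying the case endings; the sketch uses the full word only to stay simple.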

MILESTONES

Week 0: Community bonding

               Getting familiar with all the Apertium modules and how they work. Discussing with mentors and clearing doubts.
               Discussing ideas in detail with other selected students. Reading and gathering information about other
               machine translation tools.
               Reading and editing the wiki, making small changes so that it is easier for newcomers to join Apertium.



First phase

Week 1:

               Analysing Kannada morphology.
               Adding a large number of nouns and adjectives to the Kannada dictionary.
               Defining paradigms: extending the Kannada dictionary with noun declensions (cases).

Week 2:

          	Analysis and generation of those added words.
          	

Week 3:

          	Continue with the work.

Week 4:

          	Testvoc for closed-classes.
          	Preparing for evaluation.

Outcome: A working Kannada monolingual dictionary.



Second phase

Week 5:

          	Observing the sentence structure of both the languages.
          	Fixing minor issues in Marathi dictionary.

Week 6:

          	Adding words to the bilingual dictionary.

Week 7:

          	Including transfer rules.

Week 8:

               Continue with the work.
               Test translation and prepare for evaluation.

Outcome: Understandable translation between these two languages.



Third phase

Week 9:

          	Disambiguation rules and discourse structure for Kannada. 

Week 10:

          	Constraint grammar and tagger training.

Week 11:

          	Continue with the work. Testing using corpus or newspaper contents.

Week 12:

          	Writing documentation, complete testing and fixing bugs.

Outcome: Hopefully a testvoc-clean pair ready for release.
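
As a rough illustration of what "testvoc clean" means, the sketch below scans translated output for the markers Apertium prefixes to words it could not fully process: `*` for a word unknown to the analyser, `@` for a word missing from the bilingual dictionary, and `#` for a word that could not be generated. This is a simplified check under stated assumptions, not the project's actual testvoc script; the function names and sample text are hypothetical.

```python
import re
from collections import Counter

# Markers Apertium prefixes to words it could not fully process.
MARKERS = {
    "*": "unknown to the analyser",
    "@": "missing from the bilingual dictionary",
    "#": "could not be generated in the target language",
}

# A marker counts only at the start of a whitespace-delimited token.
TOKEN = re.compile(r"(?<!\S)([*@#])(\S+)")


def find_problems(translated_text):
    """Return (marker, word) pairs for every flagged word in the output."""
    return TOKEN.findall(translated_text)


def summarise(translated_text):
    """Count flagged words per marker."""
    return Counter(marker for marker, _ in find_problems(translated_text))


def is_testvoc_clean(translated_text):
    """True when the output contains no flagged words at all."""
    return not find_problems(translated_text)


if __name__ == "__main__":
    sample = "घर #मोठा @mane *xyz"  # hypothetical translator output
    for marker, word in find_problems(sample):
        print(f"{word}: {MARKERS[marker]}")
    print("clean" if is_testvoc_clean(sample) else "not clean")
```

A full testvoc run would first expand every form in the source dictionary and pass each through the whole pipeline; this check only covers the last step of inspecting the output.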


Final evaluation



Other commitments

I have semester exams until 8 May and won't be able to spend much time on the project before then; after that I will be fully committed to it. I have no other plans as of now, and I will make up the time if my plans change. I intend to spend 40+ hours a week on the project, possibly more depending on my schedule.