Difference between revisions of "User:Invo98"
Revision as of 15:59, 22 March 2018
Contents
- 1 Contact information
- 2 Am I good enough?
- 3 What’s so cool about Machine Translation? Why is Apertium way cooler?
- 4 Which of the published tasks are you interested in? What do you plan to do?
- 5 Why Google and Apertium should sponsor it? How and who it will benefit in society?
- 6 Coding Challenge
- 7 MILESTONES
- 8 Other commitments
Contact information
Name: Vidyadheeshad D N
Location: Kochi, India
University: Model Engineering College
E-mail: vidyadheeshadn.mec@gmail.com
IRC: invo
Timezone: IST (UTC+5:30)
GitHub: https://github.com/MissingBytes/
Am I good enough?
Education: B.Tech. in Computer Science and Engineering (2015-2019), 3rd year
Related courses:
Theory of Computation, Soft Computing, Natural Language Processing (ongoing), Compiler Design (ongoing)
Natural Languages:
Elementary proficiency: Kannada, Malayalam, Tamil.
Fluent, with formal education: English and Hindi.
Technical Skills:
Programming: C/C++, Python, Java, C#, Bash scripting
Database: MySQL
Web Technologies: CSS, HTML, JavaScript, XML
Qualities: Punctual, focused, determined, able to find and learn from my mistakes, friendly and open, innovative/creative.
What’s so cool about Machine Translation? Why is Apertium way cooler?
India is home to a diverse set of languages, 22 of which are officially recognised. More than once I have found myself unable to communicate in the language of my choice, and at those times tools like Google Translate have been a lifesaver. I have always been an avid fan of sci-fi shows like Rick and Morty, where Rick uses live translation to talk to aliens. I sincerely believe that live translation, once achieved, would benefit the world by letting us move past the language barrier and bringing us all closer together.
We are approaching a sci-fi future, and Apertium, a free and open source project dedicated to machine translation with an ever-growing library of about 47 stable languages, is well placed to become the basis for language translation. Translation needs many contributions, both big and small, from people all over the world. Most existing translation tools are commercial and use different methods for each language pair, whereas Apertium uses common defining semantics (LIS). I believe that adding a new language pair would further the project. The mentors I have interacted with are super friendly and ready to answer any number of questions, for which I am grateful.
Which of the published tasks are you interested in? What do you plan to do?
I am interested in adding a new language pair. My recipe: Kannada-Marathi pair.
Latin-Russian language pair
Why Google and Apertium should sponsor it? How and who it will benefit in society?
As of 2017, there are 67 million native Kannada speakers (Kannadigas). As of 2001, Kannada ranked among the 30 most widely spoken languages in the world. Experts claim that within the next 20 years Kannada speakers will outnumber the speakers of many foreign languages, including German and French. The language has 2,000 years of history and a grand literature. Marathi ranks 19th in the list of the most spoken languages in the world, with 73 million speakers as of 2007. Maharashtra (where Marathi is spoken) and Karnataka (where Kannada is spoken) are neighbouring states, yet the two languages are not closely related: Marathi is an Indo-Aryan language, whereas Kannada is a Dravidian language. Cities such as Bangalore and Mumbai are densely populated IT hubs and typically multilingual, and they have good internet penetration, so people there can readily access our translator. Other groups would, of course, benefit as well.
Coding Challenge
Installed Apertium
Bootstrapped the language pair: kan-mar. A Marathi monolingual dictionary (monodix) already exists.
Added some words (nouns and adjectives) to both the bidix and the kan monodix. Also added paradigm definitions to the kan monodix.
Translated some words from the James and Mary story. Pushed the changes to the main git repo.
Converted Kannada to HFST for morphological analysis.
Currently reading more of wiki.apertium for a better understanding.
Next, I will translate more of the James and Mary story.
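To illustrate what a paradigm definition buys in a monolingual dictionary, here is a minimal Python sketch of paradigm expansion: one stem plus a shared paradigm yields every surface form together with its lexical form. This is not Apertium's implementation or file format. The paradigm name `N__example`, the stem `stem`, and the suffixes `s1` and `s2` are hypothetical placeholders rather than real Kannada data; only the tag style (`<n>`, `<sg>`, `<pl>`, `<nom>`, `<acc>`) follows common Apertium conventions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ParadigmEntry:
    suffix: str              # text appended to the stem in the surface form
    tags: Tuple[str, ...]    # morphological tags attached to the lemma


# Hypothetical paradigm with placeholder suffixes (NOT real Kannada data).
PARADIGMS: Dict[str, List[ParadigmEntry]] = {
    "N__example": [
        ParadigmEntry("", ("n", "sg", "nom")),
        ParadigmEntry("s1", ("n", "sg", "acc")),
        ParadigmEntry("s2", ("n", "pl", "nom")),
    ],
}


def expand(stem: str, lemma: str, paradigm: str) -> List[Tuple[str, str]]:
    """Return (surface form, lexical form) pairs for one dictionary entry."""
    pairs = []
    for entry in PARADIGMS[paradigm]:
        surface = stem + entry.suffix
        lexical = lemma + "".join(f"<{tag}>" for tag in entry.tags)
        pairs.append((surface, lexical))
    return pairs


if __name__ == "__main__":
    for surface, lexical in expand("stem", "stem", "N__example"):
        print(f"{surface}:{lexical}")
```

Because the suffix list lives in the paradigm, adding another noun of the same class only requires a new stem entry that points at the existing paradigm.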
MILESTONES
Week 0: Community bonding
Getting familiar with all the Apertium modules and how they work. Discussing with mentors and clearing doubts. Also discussing ideas in detail with other selected members. Reading and gathering information about other machine translation tools. Reading the wiki and making small edits that make it easier for newbies to join Apertium.
First phase
Week 1:
Analysing Kannada morphology. Adding a large number of nouns and adjectives to the Kannada dictionary. Defining paradigms: extending the dictionary by adding declension (cases) to the Kannada dictionary.
Week 2:
Analysis and generation of those added words.
Week 3:
Continue with the work.
Week 4:
Testvoc for closed-classes. Preparing for evaluation.
Outcome: The Kannada monolingual dictionary takes shape.
Second phase
Week 5:
Observing the sentence structure of both languages. Fixing minor issues in the Marathi dictionary.
Week 6:
Adding words to the bilingual dictionary.
Week 7:
Including transfer rules.
Week 8:
Continue with the work. Test translation and prepare for evaluation.
Outcome: Understandable translation between these two languages.
Third phase
Week 9: Syntactic rules 2
Disambiguation rules and discourse structure for Kannada.
Week 10:
Constraint grammar and tagger training.
Week 11:
Continue with the work. Testing on a corpus or on newspaper text.
Week 12:
Writing documentation, complete testing and fixing bugs.
Outcome: Hopefully a testvoc clean pair for release.
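As a rough illustration of what "testvoc clean" means, the sketch below scans translated output for the error markers Apertium conventionally prefixes to problem words: `*` for a word the analyser does not know, `@` for a word missing from the bilingual dictionary, and `#` for a form the generator could not produce. It is a simplified stand-in, not the actual testvoc script; the function names and the sample lines are hypothetical.

```python
from typing import List, Tuple

# Conventional Apertium error prefixes and what they usually indicate.
MARKERS = {
    "*": "unknown word (analysis failed)",
    "@": "missing bilingual dictionary entry",
    "#": "generation failure",
}


def find_problems(translated_line: str) -> List[Tuple[str, str]]:
    """Return (token, reason) for every token that starts with an error marker."""
    problems = []
    for token in translated_line.split():
        reason = MARKERS.get(token[0])
        if reason is not None:
            problems.append((token, reason))
    return problems


def is_clean(lines: List[str]) -> bool:
    """A set of translated lines is clean when no token carries a marker."""
    return all(not find_problems(line) for line in lines)


if __name__ == "__main__":
    # Hypothetical translator output used only for demonstration.
    sample = ["word1 *word2 word3", "word4 @word5 #word6"]
    for line in sample:
        for token, reason in find_problems(line):
            print(f"{token}: {reason}")
    print("clean" if is_clean(sample) else "not clean")
```

A pair would count as clean in this sense only when every expanded dictionary form passes through the pipeline without any of these markers.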
Final evaluation
Other commitments
I have semester exams until the 8th of May, so I will not be able to spend much time on the project until then; thereafter I will be fully committed to it. I have no other plans as of now, and I will make up the time if my plans change. I intend to spend 40 or more hours a week on the project, depending on my schedule.