User:Iamas/GSoC13 Application: "Improved Bilingual Dictionary Induction"
Name
Arnav Sharma
Contact Information
E-mail : arnavsharma93@gmail.com
Facebook : arnavsharma93
IRC : iamas, arnavsharma93
SourceForge : iamas
Why am I interested in Machine Translation?
Machine Translation is an important technology for localization, and it is particularly relevant in a linguistically diverse country like India, where it can help reduce the language barrier. That motivated me to study Computational Linguistics at IIIT-H. I am currently working in the Machine Translation Department of IIIT-H.
Why am I interested in the Apertium Project?
I have been fascinated by free and open-source software (FOSS) ever since I first heard about it. As mentioned above, Machine Translation interests me a lot. Apertium combines both of these interests. Plus, I really like Begiak.
Which of the published tasks am I interested in? What do I plan to do?
I am interested in the project "Improved bilingual dictionary induction".
Proposal Title
Improved bilingual dictionary induction
Why should Apertium and Google sponsor it?
A description of who will benefit from the project and how it will benefit society
Work Plan
Coding Challenge
The coding challenge consisted of the following tasks:
- Install Apertium.
- Install GIZA++.
- Generate a word alignment model for a parallel corpus of your choice.
- Rewrite the script generate-bidix-templates.py to use Python 3 and ElementTree.
I have finished the coding challenge. My solution can be found on GitHub [[1]].
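To illustrate the kind of code the last task calls for, here is a minimal sketch of building a single bilingual dictionary entry with Python 3's ElementTree. It is not the actual generate-bidix-templates.py script: the helper name `make_entry` and the example words and tags are hypothetical, and the sketch only assumes the standard Apertium bidix entry layout (`<e><p><l>…</l><r>…</r></p></e>` with `<s n="…"/>` symbol tags).

```python
import xml.etree.ElementTree as ET


def make_entry(left, right, left_tags, right_tags):
    """Build one bidix <e> element (hypothetical helper, for illustration only)."""
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    for side, lemma, tags in (("l", left, left_tags), ("r", right, right_tags)):
        node = ET.SubElement(p, side)
        node.text = lemma  # the lemma comes before the symbol tags
        for tag in tags:
            ET.SubElement(node, "s", n=tag)  # e.g. <s n="n"/>
    return e


# Example values are assumptions chosen only to show the structure.
entry = make_entry("house", "casa", ["n"], ["n", "f"])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Building the tree with `SubElement` rather than concatenating strings leaves escaping and well-formedness to the library, which is the main advantage of moving such a script to ElementTree.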
Community bonding period
Week Plan
Week | Plan |
---|---|
Week 01 | |
Week 02 | |
Week 03 | |
Week 04 | |
Deliverable #1 | |
Week 05 | |
Week 06 | |
Week 07 | |
Week 08 | |
Deliverable #2 | |
Week 09 | |
Week 10 | |
Deliverable #3 | |
Week 11 | |
Week 12 | |
Deliverable #Final | |
Biography
I am currently pursuing a Bachelor of Technology in Computer Science and an MS by Research in Computational Linguistics at IIIT-H, and I have just finished my second year. I have been studying the various fields of Computational Linguistics for the past two years and I cannot wait to study more. I am proficient in Python, C/C++, Bash, SQL and HTML5. I have developed an Urdu-Hindi transliterator using NLP tools, which achieved an accuracy of 75%.
Non-Summer-of-Code plans for the summer
I might have to go on a social entrepreneurship trip for 3 days in July. Otherwise, I have nothing else planned for the summer. This project will be my main priority.