Difference between revisions of "User:Iamas/GSoC13 Application: "Improved Bilingual Dictionary Induction""

Revision as of 15:39, 2 May 2013

1 Name
2 Contact Information
3 Why am I interested in Machine Translation?
4 Why am I interested in the Apertium Project?
5 Which of the published tasks am I interested in? What do I plan to do?
6 Proposal Title
7 Why Apertium and Google should sponsor it?
8 Work Plan
- 8.1 Coding Challenge
9 Headline text
- 9.1 Interim period and community bonding period
- 9.2 Week Plan
10 Biography
11 Non-Summer-of-Code plans for the summer

Name

Arnav Sharma

Contact Information

E-mail : arnavsharma93@gmail.com
GitHub : arnavsharma93
IRC : iamas, arnavsharma93
SourceForge : iamas

Why am I interested in Machine Translation?

Machine Translation is an important technology for localization, and is particularly relevant in a linguistically diverse country like India, where it can help reduce the language barrier. That motivated me to study Computational Linguistics at IIIT-H. I am currently working in the Machine Translation Department of IIIT-H.

Why am I interested in the Apertium Project?

I have been fascinated by free and open-source software (FOSS) ever since I first heard about it. As mentioned above, Machine Translation interests me a lot. Apertium combines both of these interests. Plus, I really like Begiak.

Which of the published tasks am I interested in? What do I plan to do?

I am interested in the project Improved Bilingual Dictionary Induction.

Proposal Title

Improved bilingual dictionary induction

Why Apertium and Google should sponsor it?

The bilingual dictionary is one of the five main dictionaries used in Apertium. This project involves generating valid and consistent Apertium bilingual dictionary entries from a word-aligned parallel corpus. Tools for this already exist, but most of the entries they generate have to be checked, which can greatly increase the time it takes to make a new translation system. This project will benefit lexicographers and other contributors by reducing the effort and time needed to build a new translation system.
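As a rough illustration of what "entries from a word-aligned parallel corpus" can mean, the sketch below counts how often each source/target word pair is linked and keeps the frequent pairs as dictionary candidates. It assumes the alignments are available in the "i-j" (source index, target index) text format; the function names, the toy sentences and the `min_count` threshold are all hypothetical and are not taken from any existing Apertium tool.

```python
from collections import Counter


def count_aligned_pairs(src_sents, trg_sents, alignments):
    """Count how often each (source word, target word) pair is aligned.

    Assumption: each alignment line holds space-separated "i-j" links,
    where i indexes the source tokens and j the target tokens (0-based).
    """
    counts = Counter()
    for src, trg, align in zip(src_sents, trg_sents, alignments):
        src_tokens = src.split()
        trg_tokens = trg.split()
        for link in align.split():
            i, j = (int(x) for x in link.split("-"))
            counts[(src_tokens[i], trg_tokens[j])] += 1
    return counts


def candidate_entries(counts, min_count=2):
    """Keep pairs seen at least min_count times, most frequent first."""
    return [pair for pair, n in counts.most_common() if n >= min_count]


# Toy data (hypothetical): two sentence pairs with one-to-one alignments.
src = ["the house is big", "the house is red"]
trg = ["la casa es grande", "la casa es roja"]
align = ["0-0 1-1 2-2 3-3", "0-0 1-1 2-2 3-3"]

counts = count_aligned_pairs(src, trg, align)
candidates = candidate_entries(counts)
print(candidates)
```

A frequency threshold like this only filters out rare links; candidates would still need the checking described above before they become dictionary entries.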

Work Plan

Coding Challenge

The coding challenge involved:

Install Apertium
Install GIZA++
Generate a word alignment model for a parallel corpus of your choice.
Rewrite the script generate-bidix-templates.py to use python3/ElementTree.

I have finished the coding challenge.

The solution can be found on GitHub.
Please refer to the README (https://github.com/arnavsharma93/CodingChallengeApertium/blob/master/README.md) for further details.
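For readers unfamiliar with the last step of the challenge, here is a minimal sketch of building a single bilingual dictionary entry with Python 3's `xml.etree.ElementTree`. It is not the code of generate-bidix-templates.py: the helper names `make_side` and `make_entry`, the example lemmas and the tag names are hypothetical, and the only assumption taken from Apertium is the usual `<e><p><l>…</l><r>…</r></p></e>` entry layout with `<s n="…"/>` symbols.

```python
import xml.etree.ElementTree as ET


def make_side(tag, lemma, symbols):
    """Build one side of an entry: <l> or <r> with the lemma and its symbols."""
    side = ET.Element(tag)
    side.text = lemma
    for name in symbols:
        # Each morphological symbol becomes an empty <s n="..."/> element.
        ET.SubElement(side, "s", {"n": name})
    return side


def make_entry(src_lemma, src_tags, trg_lemma, trg_tags):
    """Build an <e><p><l>..</l><r>..</r></p></e> element for one word pair."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    pair.append(make_side("l", src_lemma, src_tags))
    pair.append(make_side("r", trg_lemma, trg_tags))
    return entry


# Hypothetical example pair; the tags "n" and "f" are illustrative only.
entry = make_entry("house", ["n"], "casa", ["n", "f"])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Building the tree through ElementTree rather than by string concatenation means the output is always well-formed XML, which is one part of producing "valid" entries.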

Headline text

Interim period and community bonding period

Get to know the community better
Familiarize myself with the Apertium platform and project
Prepare and gather the information I will need during the coding period
Contribute by fixing bugs, rewriting scripts and working on the language pairs Hindi-Punjabi and Hindi-Urdu

Week Plan

WEEK	DATE	PLANS
Week 01	06.17-06.23
Week 02	06.24-06.30
Week 03	07.01-07.07
Week 04	07.08-07.14
Deliverable #1
Week 05	07.15-07.21
Week 06	07.22-07.28
Week 07	07.29-08.04
Week 08	08.05-08.11
Deliverable #2
Week 09	08.12-08.18
Week 10	08.19-08.25
Week 11	08.26-09.01
Week 12	09.02-09.08
Deliverable #3

Biography

I am currently pursuing a Bachelor of Technology in Computer Science and an MS by Research in Computational Linguistics at IIIT-H, and I have just finished my second year. I have been studying various fields of Computational Linguistics for the past two years and I cannot wait to study more. I am proficient in Python, C/C++, Bash, SQL and HTML5. I have developed an Urdu-Hindi transliterator using NLP tools; it achieved an accuracy of 75%.

Non-Summer-of-Code plans for the summer

I might have to go on a three-day social entrepreneurship trip in July. I also plan to improve my programming skills by taking part in algorithmic coding competitions. Otherwise, I have nothing else planned for the summer, and this project will be my main priority.

@@ Line 36: / Line 36: @@
 * Please refer to the [https://github.com/arnavsharma93/CodingChallengeApertium/blob/master/README.md README] for further details.
-=== Interim Period and Community bonding period ===
+== Headline text ==
+=== Interim period and community bonding period ===
 *Get to know the community better
 *Habituate myself with the Apertium platform and project
