User:Ragib06/Application2011

From Apertium

Google Summer of Code Application 2011

Ragib Ahsan
Department of Computer Science and Engineering
Bangladesh University of Engineering and Technology

1 Name

Ragib Ahsan

2 Email Address

ragib.ahsan@gmail.com

3 Contact Information

IRC: ragib@irc.freenode.net

Cell Phone: +880 1717051469

4 Why is it you are interested in machine translation?

As a student of Computer Science and Engineering, I have taken some fundamental courses related to the field, such as Automata Theory and Compilers. I also worked on some implementations as course assignments, and that made me interested in it. Moreover, in my next semester I’m going to take advanced courses like Machine Learning and Pattern Recognition. All in all, I think working in this field would be a valuable experience for me.

5 Why is it you are interested in the Apertium Project?

Several things made me interested in the Apertium Project. First, and most obviously, I love the concept of Open Source Development. There have been many OSD movements in my country over the past few years. Though I have not been directly involved, I want to contribute to my country.

With increasing technological advancement, computational linguistics is becoming more and more challenging, and more in demand as well. I’m eager to take on the challenge and move forward with this project.

It was great to find a fellow Bangladeshi working on the project, which inspired me even more. Finally, after communicating with the team and meeting the friendly community, I really feel that I’m ready to get started with Apertium.

6 Which of the published tasks are you interested in? What do you plan to do?

I found an incomplete task on the Bengali-English language pair in Apertium, which I learned was a GSoC project in 2009. Of its three major parts, the morphological generation of the Bengali dictionary is nearly complete. However, the bilingual dictionary needs many more entries, and only a few transfer rules are implemented at the moment. So, in practice, a lot remains to be done. After talking to the fellow contributors on this project, Abu Zaher and Francis Tyers, I concluded that my primary goal should be to complete both the monolingual and bilingual dictionaries for bn-en so that they reach wide coverage of the Bengali Wikipedia. Then I’ll focus on the transfer rules and, finally, on testvoc.

Note that translation from English to Bengali is the only focus of this project; Bengali to English is out of scope for now.

7 Why should Google and Apertium Sponsor it?

This is not a new language pair started from scratch. As I mentioned earlier, there was a GSoC project back in 2009 on the Bengali-English language pair, but that project was not complete enough for Apertium to release bn-en. Now it is necessary to complete the task and release a fully fledged Bengali-English translation system. In the first place, I think a complete release would be a great help to Bengali Wikipedians.

8 How and who will it benefit in society?

I’m from Bangladesh, a developing country in South Asia. Overpopulation is still a big problem here. Technology is the most promising way to overcome these problems and hope for a better future. But language has always been a major barrier for people here in studying technology. In a country like this, the Open Source Philosophy is a blessing. Localization of open source software can play a great role in the technological advancement of society. I strongly believe that this project will accelerate the open source movement as well as IT in Bangladesh.

9 Work Plan

I have been in contact with mentor Francis Tyers and with Abu Zaher regarding this idea. The major aspects of the work are:

  • Expanding the morphological generator
  • Completing the bilingual dictionary for bn-en
  • Writing the transfer rules to complete the transfer system
  • Testvocing

From the 2009 report and other documentation on the previous project, I found that the Bengali monodix currently has 68% coverage of the 20K most frequent words in the Bengali Wikipedia. I plan to expand it to wide coverage (at least 80%). But keeping all these findings and GSoC’s tight schedule in mind, I have decided that my first priority will be to complete the bilingual dictionary and write the necessary transfer rules.
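To make the coverage figure above concrete, here is a minimal Python sketch of how such a number could be computed. It assumes that coverage is measured over a word-frequency list built from the Bengali Wikipedia, and that a word counts as covered if the monodix recognizes it; the 2009 report may have measured it differently. The function `coverage` and its inputs are hypothetical and not part of the Apertium toolchain.

```python
from collections import Counter


def coverage(freq: Counter, known: set[str], top_n: int = 20000) -> tuple[float, float]:
    """Return (type coverage, token coverage) over the top_n most frequent words.

    Hypothetical helper for illustration only.
    - type coverage: share of distinct words in the top_n list found in `known`
    - token coverage: the same share, weighted by each word's corpus frequency
    """
    top = freq.most_common(top_n)
    if not top:
        return 0.0, 0.0
    types_known = sum(1 for word, _ in top if word in known)
    tokens_total = sum(count for _, count in top)
    tokens_known = sum(count for word, count in top if word in known)
    return types_known / len(top), tokens_known / tokens_total
```

For example, with frequencies {"a": 5, "b": 3, "c": 2} and known words {"a", "c"}, type coverage is 2/3 while token coverage is 7/10, which is why it matters which of the two measures a coverage figure refers to.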

I intend to follow this time schedule:

Community Bonding Period (April 26 - May 22)

  • Familiarizing myself with Apertium’s tool-chain and its community
  • Thoroughly exploring Apertium
  • Creating a list of test cases to help with testvoc

Week 1 (May 23 – May 29)

  • Start working on the Bengali Morphological Generator
  • Adding new/necessary words to the monodix

Deliverable: Updated monodix

Week 2 (May 30 - June 5)

  • Continue working on the Bengali Morphological Generator
  • Getting to know the Apertium pipeline better

Deliverable: Updated monodix

Week 3 (June 6 - June 12)

  • Continue working on the Bengali Morphological Generator
  • Finish Bengali Morphological Generation

Deliverable: Complete Bengali Morphological Generator

Week 4 (June 13 - June 19)

  • Start working on the bn-en bilingual dictionary
  • Adding new/necessary mappings to the bidix

Deliverable: Updated Bidix

Week 5 (June 20 - June 26)

  • Continue working on the bn-en bilingual dictionary

Deliverable: Updated Bidix

Week 6 (June 27 - July 3)

  • Continue working on the bn-en bilingual dictionary

Deliverable: Updated Bidix

Week 7 (July 4 - July 10)

  • Manually checking the English-to-Bengali bidix
  • Checking for any missing but necessary mappings to add to the bidix
  • Finish bidix

Deliverable: Finished Bidix

Mid-term evaluation

Week 8 (July 11 - July 17)

  • Investigating the transfer system
  • Start writing the transfer rules

Deliverable: Refined transfer system

Week 9 (July 18 - July 24)

  • Continue working on the transfer system
  • Write new transfer rules

Deliverable: Refined transfer system

Week 10 (July 25 - July 31)

  • Continue working on the transfer system
  • Write new transfer rules

Deliverable: Refined transfer system

Week 11 (August 1 - August 7)

  • Finish working on the transfer system
  • Start TestVocing

Deliverable: Finished transfer system

Week 12 (August 8 - August 14)

  • TestVocing

Deliverable: Refined bn-en

Week 13 (August 15 - August 16)

  • Finish TestVocing
  • Evaluation / Cleanup

Deliverable: Complete Product

Final Submission (August 15 – August 26)

10 List your skills and give evidence of your qualifications

At the moment, I’m in the final year of my undergraduate studies in Computer Science and Engineering at Bangladesh University of Engineering and Technology. I have taken both theory and lab courses in Data Structures, Algorithms, Automata Theory and Compilers, and I’m eager to put the theoretical knowledge I’ve gained into practice.

During my studies I’ve developed several desktop and web applications, e.g. a social networking site for students and a third-party online reservation system. I also built a small-scale Pascal compiler with Lex and YACC, which I think will be helpful for this project.

Though I’m new to Open Source Development, I’m quite aware of the movement in my country, especially through the organization BDOSN. I was looking for an opportunity, and I think this would be a great platform for me to begin this journey.

My resume can be viewed here.

11 List any non-Summer-of-Code plans you have for the Summer

I do not have any other plans for the summer at the moment. My academic calendar may conflict with the GSoC timeline, but I’m confident that I can manage both. I will also have a vacation before the final submission deadline, which will be a great help.

12 Conclusion

I’d like to thank Google for organizing this great program for students all over the world. It is a great opportunity for us to learn as well as to prove our competence in the field. I’m really looking forward to making my last undergraduate summer a memorable one.