User:Dshgna/GSoC 2014 Proposal
Name: Dulshani Gunawardhana
E-mail address: dulshani[dot]gunawardhana89@gmail[dot]com
IRC: dshgna
Why is it you are interested in machine translation?
I am a computer science undergraduate with a strong interest in linguistics. Machine translation is the perfect application combining both of these interests! Additionally, living in a multilingual country has given me first-hand experience of the political, socio-economic and educational divide caused by the language barrier. This makes me appreciate the need for MT and the change it could bring.
Why is it that you are interested in the Apertium project?
The concept of free software, especially when applied to a domain as complex as MT, is extremely appealing to me. Apertium's emphasis on less-resourced languages is another point that interests me, as it opens the door to many MT projects that would otherwise never see the light of day due to a lack of funding and interest.
Which of the published tasks are you interested in? What do you plan to do?
Adopt an unreleased language pair: Sinhala-Tamil
As part of this task, I will implement bidirectional translation for the Sinhala-Tamil language pair on the Apertium platform. This will involve expanding the skeleton monolingual (monodix) and bilingual (bidix) dictionaries I have already created, and writing transfer rules for both translation directions.
Why should Google and Apertium sponsor it?
Apertium currently has no Sinhala-Tamil language pair. Both are low-resource languages with few open-source MT systems. The only related pair in Apertium is Sinhala-English, which is still in the nursery. A quick literature review showed that Sinhala-Tamil translation has only been attempted using statistical machine translation (SMT), which yielded poor results due to the lack of language resources.
Sponsoring my work on this language pair will enable me to develop resources for two less-resourced languages, which others can in turn build on in future work.
A description of how and who it will benefit in society
The biggest benefit would be helping to overcome the language barrier between the Sinhala and Tamil people of Sri Lanka (an issue that contributed to a long and bloody civil war). In addition, it would create valuable open-source resources that could be reused in many future projects, such as language-learning tools.