User:Dshgna/GSoC 2014 Proposal

Revision as of 18:02, 17 March 2014 by Dshgna

Name: Dulshani Gunawardhana
E-mail address: dulshani[dot]gunawardhana89@gmail[dot]com
IRC: dshgna

Why is it you are interested in machine translation?


I am very interested in both languages and computer science, and machine translation is the perfect application combining the two. Additionally, I live in a multilingual country, which has given me first-hand experience of the political, socio-economic and educational divides caused by the language barrier. This makes me appreciate the need for MT and the change it could make.

Why is it that you are interested in the Apertium project?


The concept of software freedom, especially when applied to a domain as complex as MT, is extremely appealing. Apertium's emphasis on less-resourced languages particularly interests me, as it opens the door to many MT projects that would otherwise never see the light of day due to lack of funding and interest.

Which of the published tasks are you interested in? What do you plan to do?


Adopt an unreleased language pair: Sinhala-Tamil

As per this task, I will work on implementing bidirectional translation for the Sinhala-Tamil language pair on the Apertium platform. This will involve developing the skeleton monodix and bidix dictionaries I have already created and implementing transfer rules for Sinhala and Tamil.

Why should Google and Apertium sponsor it?


Currently Apertium has no language pair for Sinhala-Tamil. Both are low-resource languages that lack open-source MT systems. The only related language pair in Apertium is Sinhala-English, which is in the nursery. (A quick literature review showed that Sinhala-Tamil translation has only been attempted using statistical machine translation (SMT), which yielded poor results due to the lack of language resources.)

Sponsoring my work on this language pair will enable me to develop resources for two minor languages, which in turn will enable others to use them in future work.

A description of how and who it will benefit in society


The biggest benefit is that it would help overcome the language barrier between the Sinhala and Tamil people of Sri Lanka (an issue that caused a long and bloody civil war). In addition, it would create valuable open-source resources that could be used in many future projects, such as language learning.