Draft GSoC 2019 proposal for the release of the Torwali-Urdu translation pair.
Name: Naeem Uddin
Email address: firstname.lastname@example.org
Time zone: UTC +5
GitHub Username: Namhadi
Project proposal: Torwali-Urdu MT pair
Development of the Torwali-Urdu language pair.
Why is it that you are interested in Apertium?
I am a Computer Engineering student with an interest in linguistics. Apertium is the best place for me to apply my computational skills to linguistics. Apertium is also mainly concerned with minority languages, so it is a good platform for building Torwali-Urdu MT, especially since I know both Urdu and Torwali.
Description of how and whom it will benefit in society
Torwali is considered one of the endangered languages of the world, with fewer than a hundred thousand speakers. It is rarely used for writing, even by native speakers. Since a language can survive and be revitalized only when it is actively used in as many functional domains as possible, this project will benefit native speakers and non-native (Urdu) speakers alike by helping them explore and use Torwali. Most importantly, by providing the first translation pair between Torwali and Urdu, the project will not only strengthen the language computationally but also help preserve it and reduce its vulnerability to endangerment.
Reasons why Google and Apertium should sponsor it
I am currently a student and I like to work during the summer to earn some money. Through this project I will also contribute to strengthening my mother tongue, which is a little-studied and endangered language. The project will add a new Torwali-Urdu language pair to Apertium. In addition, sponsoring it will build my expertise in computational linguistics and deepen my commitment to natural language processing.
Which of the published tasks are you interested in? What do you plan to do?
I will work on the release of the Torwali-Urdu pair.
Target: a coverage of more than 90%.
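Coverage here means the share of corpus tokens that the analyser recognises. The sketch below estimates it from analyser output in the Apertium stream format, assuming that unknown words are marked with a leading `*` in their analysis (as in `^word/*word$`). The function name `naive_coverage` is hypothetical, and the regular expression ignores backslash-escaped characters, so this is a rough check rather than a replacement for the project's own tooling.

```python
import re

# A lexical unit in the stream format looks like ^surface/analysis1/analysis2$.
# Assumption: an unknown word is emitted as ^surface/*surface$.
# Simplification: escaped characters such as \$ or \/ are not handled.
LU_PATTERN = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(stream: str) -> float:
    """Return the share of lexical units that received at least one analysis."""
    units = LU_PATTERN.findall(stream)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


if __name__ == "__main__":
    # Made-up tokens: one analysed word and one unknown word.
    print(naive_coverage("^aa/aa<n>$ ^bb/*bb$"))
```

Running the analyser over a large Torwali corpus and feeding its output to such a function would show how far the dictionaries are from the 90% target.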
Resources
Torwali-Urdu Dictionary: http://18.104.22.168:8081/otd/HomePage.aspx
Ullah, Inam (2004). "Lexical database of the Torwali Dictionary". Paper presented at the Asia Lexicography Conference.
Lunsford, Wayne A. (2001). "An overview of linguistic structures in Torwali, a language of Northern Pakistan" (PDF). M.A. thesis, University of Texas at Arlington, pp. 26–30.
Post Application Period
- Finish the coding challenge with a low word error rate (WER).
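The word error rate is the word-level edit distance (substitutions, deletions and insertions) between the system output and a reference translation, divided by the number of words in the reference. A minimal sketch, assuming simple whitespace tokenisation; the function name `word_error_rate` is hypothetical:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("a b c", "a x c"))  # one substitution out of three words
```

A lower value is better; 0.0 means the output matches the reference word for word.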
Community bonding period
- Learn about transfer rules for Torwali and implement them.
- Become more familiar with the Apertium community.
- Find the resources needed to build the Torwali-Urdu pair.
- Add nouns to bidix
- Add adjectives to bidix
- Write transfer rules for nouns and adjectives
- Add other POS to bidix
- Write transfer rules for them
- Even up nouns and adjectives
- Even up other POS
- Extend bidix
- Run tests
- Add multiwords to bidix
- Work on transfer rules
- Run final tests
- Fix gaps and bugs
- Make things ready for final evaluation
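Much of the plan above consists of adding entries to the bilingual dictionary (bidix). In Apertium's XML dictionary format, a translation pair is an `<e>` element whose `<p>` holds a left (`<l>`) and a right (`<r>`) side, with grammatical tags written as `<s n="..."/>` symbols. The sketch below generates such entries from word pairs, for example pairs taken from the Torwali-Urdu dictionary. The helper names, the placeholder lemmas and the tag names `n`, `adj` and `f` are hypothetical; real tags would have to match the symbols defined in the pair's dictionaries.

```python
import xml.etree.ElementTree as ET


def bidix_entry(left: str, right: str, tags: list[str]) -> ET.Element:
    """Build <e><p><l>left<s n="..."/></l><r>right<s n="..."/></r></p></e>.

    The same tag list is written on both sides, which suits simple noun and
    adjective pairs; entries whose tags differ would need per-side lists.
    """
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side_name, lemma in (("l", left), ("r", right)):
        side = ET.SubElement(pair, side_name)
        side.text = lemma  # the lemma comes before the tag symbols
        for tag in tags:
            ET.SubElement(side, "s", n=tag)
    return entry


def entries_as_xml(pairs: list[tuple[str, str, list[str]]]) -> list[str]:
    """Serialise (Torwali lemma, Urdu lemma, tags) triples into bidix lines."""
    return [
        ET.tostring(bidix_entry(left, right, tags), encoding="unicode")
        for left, right, tags in pairs
    ]


if __name__ == "__main__":
    # Placeholder lemmas: not real Torwali or Urdu words.
    sample = [
        ("TRW_NOUN", "URD_NOUN", ["n", "f"]),
        ("TRW_ADJ", "URD_ADJ", ["adj"]),
    ]
    for line in entries_as_xml(sample):
        print(line)
```

The printed lines could then be reviewed by hand and pasted into the main `<section>` of the bidix; generating them from a word list mainly saves typing when adding nouns and adjectives in bulk.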
Deliverables
1. Dictionary of Torwali-Urdu words
2. Improved bidix, transfer rules
Qualification and skills
I am a final-year Computer Systems Engineering student at the University of Engineering and Technology, Peshawar. I have worked on transliteration of the Torwali language and spent four months as an intern game developer at a game studio. I have a good knowledge of databases and I am comfortable with XML, C++ and C#.
Summer obligations and commitments
I will be an intern computer engineer at TRF (Torwali Research Forum), where I will work four hours a day. Of the remaining time, I will dedicate 5 hours a day to the proposed project, for a total of 30 hours per week, and more when needed.