
User:Naeem/Proposal

Revision as of 17:45, 12 April 2019

Draft GSoC 2019 proposal for the release of a Torwali-Urdu translation pair.

Personal Information



Name: Naeem Uddin

Email address: naeemuddinhadi@gmail.com

Time zone: UTC +5

IRC: Naeem

GitHub Username: Namhadi


Project proposal: Torwali-Urdu MT pair

Title

Development of the language pair for Torwali-Urdu.

Why is it that you are interested in Apertium?

I am a Computer Engineering student with an interest in linguistics. Apertium is the best place for me to apply my computational skills to linguistics. Apertium also focuses on minority languages, so it is a good platform for working on Torwali-Urdu MT, particularly as I know both Urdu and Torwali.

Description of how and who it will benefit in society

Torwali is considered one of the world's endangered languages, with fewer than a hundred thousand speakers. Even native speakers do not normally use it for writing. Since a language can be sustained and revitalized only when it is actively used in as many functional domains as possible, this project will benefit both native speakers and non-native (Urdu) users who want to explore and use it. Most importantly, the project will not only strengthen the language computationally but also help preserve it and reduce its vulnerability to endangerment by providing the first-ever translation pair between Torwali and Urdu.

Reasons why Google and Apertium should sponsor it?

I am currently a student, and I like to work during the summer to earn some money. By working on this project I will also help strengthen my mother tongue, which is a little-studied and endangered language. The project will add a new language pair, Torwali-Urdu, to the Apertium system. Sponsoring it will also build my expertise in computational linguistics and deepen my commitment to natural language processing.

Which of the published tasks are you interested in? What do you plan to do?

I will work on the release of a Torwali-Urdu language pair.

Major Goals

To get coverage of >90%
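
Coverage here is taken to mean the share of corpus tokens for which the morphological analyser returns at least one analysis. The sketch below is only an illustration of that calculation, not necessarily how coverage will be measured for this pair. It assumes analyser output in the Apertium stream format, where unknown words are marked with an asterisk. The function name and the sample tokens are made-up placeholders, not real Torwali words, and escaped characters in the stream are ignored for simplicity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: escaped characters inside units are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units with at least one known analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An analysis that starts with '*' marks a word the analyser does not know.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Placeholder tokens only: three known words and one unknown word.
sample = "^aaa/aaa<n><sg>$ ^bbb/*bbb$ ^ccc/ccc<adj>$ ^ddd/ddd<n><pl>$"
print(f"coverage: {coverage(sample):.0%}")
```

Under this definition, a goal of more than 90% means that fewer than one token in ten in the test corpus is left unanalysed.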


Resources

Wikipedia

Torwali-Urdu Dictionary http://202.142.159.36:8081/otd/HomePage.aspx

Ullah, Inam (2004). "Lexical database of the Torwali Dictionary", paper presented at the Asia Lexicography Conference,

Lunsford, Wayne A. (2001), "An overview of linguistic structures in Torwali, a language of Northern Pakistan" (PDF), M.A. Thesis, University of Texas at Arlington: 26–30

Work Plan

Post Application Period

  • Finish the coding challenge with a good (low) word error rate (WER).
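
For reference, WER is usually computed as the word-level edit distance between the system output and a reference translation, divided by the number of words in the reference. The following is a minimal sketch of that calculation; the function name and the placeholder sentences are assumptions for illustration and this is not the evaluation script used for the coding challenge.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Placeholder sentences: one substitution and one deletion over four words.
print(word_error_rate("a b c d", "a x c"))
```

A lower value is better; 0.0 means the output matches the reference word for word.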

Community bonding period

  • Learn about transfer rules for Torwali and implement them.
  • Become more familiar with the Apertium community.
  • Find the resources needed to build the Torwali-Urdu pair.

Week 1,2

  • Add nouns to bidix
  • Add adjectives to bidix
  • Write transfer rules for nouns and adjectives

Week 3,4

  • Add other parts of speech (POS) to bidix
  • Write transfer rules for them
  • Documentation

Week 5,6

  • Even up nouns and adjectives
  • Even up other POS

Week 7,8

  • Extend bidix
  • Run Tests
  • Documentation

Week 9,10

  • Add multiwords to bidix
  • Work on transfer rules

Week 11,12

  • Run final tests
  • Fix loopholes/remove bugs
  • Documentation
  • Prepare for the final evaluation

Deliverables

1. Dictionary of Torwali-Urdu words

2. Improved bidix, transfer rules


Qualification and skills

I am a final-year Computer Systems Engineering student at the University of Engineering and Technology, Peshawar. I have worked on transliteration of the Torwali language and spent four months as an intern game developer at a game studio. I have a good knowledge of databases and am comfortable with XML, C++ and C#.

Summer obligations and commitments

I will be an intern Computer Engineer at TRF (Torwali Research Forum), where I will work four hours a day. Of the remaining time, I will allocate 5 hours a day to the proposed project, for a total of 30 hours of work per week, and will put in more than 5 hours a day when needed.
