User:Memduh/GSoC 2017
Draft for GSoC 2017 project proposal.
Proposal to create and develop a Crimean Tatar-Turkish translation pair.
Personal Information
Name: Memduh Gökırmak
Email address: memduhg@gmail.com
Time zone: UTC+2
IRC: fotonzade
Why is it you are interested in machine translation?
The study of natural language processing is fascinating to me, and machine translation is a remarkably practical application of this field, readily usable by and appealing to most of the world.
Why is it that you are interested in Apertium?
Rule-based machine translation facilitates the automatic translation of languages that suffer from a scarcity of resources, and so makes it possible to work with interesting languages from Kalmyk to Zazaki. The open-source nature of Apertium and the energy and communication of the community are also particularly appealing to me.
Proposal: Crimean Tatar-Turkish MT
Why should Google and Apertium sponsor this proposal?
Which of the published tasks are you interested in? What do you plan to do?
Who will it benefit in society, and how?
Major Goals
- Around 95% coverage
- WER comparable to other inter-Turkic/Romance pairs.
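As a rough illustration of how the coverage goal could be tracked, the sketch below computes naive coverage: the share of tokens that receive at least one analysis from the morphological analyser. It assumes the corpus has already been run through the analyser and is in Apertium stream format, where an unknown word appears as `^surface/*surface$`. The function name, the sample string and the tags in it are illustrative assumptions, and escaped characters inside lexical units are ignored for simplicity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped characters such as \/ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that have at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # The analyser marks unknown words with a leading asterisk.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output: three known tokens and one unknown token.
sample = (
    "^ev/ev<n><nom>$ ^qwerty/*qwerty$ "
    "^bar/bar<v><tv><imp><p2><sg>/bar<n><nom>$ ^./.<sent>$"
)
print(f"Naive coverage: {naive_coverage(sample):.2%}")
```

Naive coverage only says that a token has some analysis, not that the analysis is correct, so it is a lower bar than translation quality.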
Obstacles
Resources
- Wikipedia
Work Plan
- Post-application period:
- Community-bonding period:
- testvoc; it goes without saying that I will keep the testvoc clean by running it fairly regularly throughout the development process.
- continue writing scripts
- Month 1:
- Writing scripts
- Adding words to monodix/bidix to bring naive coverage to around 95%
- Chunking
- Transfer rules
- Month 2:
- POS tagging/constraint grammar
- Transfer rules
- Month 3:
- Creation of an Annotated Corpus
Plan by Weeks
Deliverables
- WER comparable to other inter-Turkic/Romance pairs.
- Data for machine-learned disambiguation.
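For reference, the WER deliverable can be measured roughly as follows: word error rate is the word-level edit distance between the system output and a reference translation (for example a post-edited version), divided by the number of words in the reference. This is a minimal sketch with whitespace tokenisation; the function name and the placeholder sentences are assumptions for illustration, not part of any Apertium tool.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # reference word missing from output
            insertion = current[j - 1] + 1  # extra word in the output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Placeholder tokens: one substitution in a four-word reference.
print(word_error_rate("a b c d", "a x c d"))
```

Because the distance is normalised by the reference length, WER can exceed 1.0 when the output is much longer than the reference.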
Summer Obligations and Commitments
I will work as an intern for 20 days in a tech startup, and also take summer classes one day of the week for two months.
Qualification
I am a fourth-year computer engineering student at Istanbul Technical University, and part of the ITU NLP team. I worked on the conversion of the ITU Turkish Treebanks to Universal Dependencies format (UD Turkish) (Sulubacak et al., 2016), and have co-written a paper on MWEs in Turkish (Adalı et al., 2016).