User:Memduh/GSoC 2017

From Apertium
Revision as of 20:25, 30 March 2017 by Memduh (talk | contribs)

Draft for GSoC 2017 project proposal.

Proposal to create and develop a Crimean Tatar-Turkish translation pair.

Personal Information

Name: Memduh Gökırmak

Email address: memduhg@gmail.com

Time zone: UTC+2

IRC: fotonzade

Why is it that you are interested in machine translation?

The study of natural language processing is fascinating to me, and machine translation is a remarkably practical application of this field, readily usable by and appealing to most of the world.

Why is it that you are interested in Apertium?

Rule-based machine translation facilitates the automatic translation of languages that suffer from a scarcity of resources, and so makes it possible to work with interesting languages from Kalmyk to Zazaki. The open-source nature of Apertium and the energy and communication of the community are also particularly appealing to me.

Proposal: Crimean Tatar-Turkish MT

Why should Google and Apertium sponsor this proposal?

Which of the published tasks are you interested in? What do you plan to do?

Who will it benefit in society, and how?

Major Goals

  • Around 95% coverage
  • WER comparable to other inter-Turkic/Romance pairs.
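Both goals are measurable. The following is a minimal sketch, not part of the Apertium toolchain, of how the two figures could be computed. It assumes analyser output in the Apertium stream format, where each lexical unit is written as `^surface/analysis$` and an unknown word is marked with `*` (for example `^foo/*foo$`), and it ignores escaped `^` and `$` characters for simplicity. The function names `naive_coverage` and `wer` and the sample strings are illustrative only.

```python
import re


def naive_coverage(analysed: str) -> float:
    """Share of lexical units that received at least one analysis.

    Assumes Apertium stream format: ^surface/analysis1/analysis2$,
    with unknown words rendered as ^surface/*surface$.
    """
    units = re.findall(r"\^(.*?)\$", analysed)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return known / len(units)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder data, not real analyser or translation output.
    sample = "^ev/ev<n><nom>$ ^foo/*foo$"
    print(naive_coverage(sample))
    print(wer("a b c d", "a x c"))
```

Naive coverage here counts a token as covered if it has any analysis at all, whether or not that analysis is correct, which is why it is only a rough indicator of dictionary completeness.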

Obstacles

Resources

  • Wikipedia

Work Plan

  • Post-application period:
  • Community-bonding period:
    • Testvoc; it goes without saying that I will clean the testvoc fairly regularly throughout the development process.
    • continue writing scripts
  • Month 1:
    • Writing scripts
    • Adding words to monodix/bidix to bring naive coverage to around 95%
    • Chunking
    • Transfer rules
  • Month 2:
    • POS tagging/constraint grammar
    • Transfer rules
  • Month 3:
    • Creation of an Annotated Corpus

Plan by Weeks

Deliverables

  • WER comparable to other inter-Turkic/Romance pairs.
  • Data for machine-learned disambiguation.

Summer Obligations and Commitments

I will work as an intern for 20 days at a tech startup, and will also take a summer class one day a week for two months.

Qualification

I am a fourth-year computer engineering student at Istanbul Technical University and a member of the ITU NLP team. I worked on the conversion of the ITU Turkish Treebanks to the Universal Dependencies format (UD Turkish) (Sulubacak et al., 2016), and have co-written a paper on MWEs in Turkish (Adalı et al., 2016).