User:Firespeaker/GSoC2014/Application draft

  • Name:
    Jonathan North Washington
  • E-mail address:
    (fill in)
  • Other information that may be useful to contact you:
    cell phone: (fill in)
  • Why is it you are interested in machine translation? / Why is it that you are interested in the Apertium project?
    I got interested in MT between Turkic languages in 2011, when I mentored Mirlan's tur-kir project. I found that with my familiarity with linguistics, my knowledge of the languages, and my bravery with new formalisms, I was able to learn quickly and do useful work. I've been active in Apertium ever since. I'm a Turkic linguist specialising in phonology, phonetics, and socio-historical linguistics, but because of my work with Apertium, I have started to consider myself a computational linguist as well.
  • Which of the published tasks are you interested in? What do you plan to do?
    I plan to "Adopt an unreleased language pair", or in this case three: tur-kir, kaz-kir, tur-uzb. These three pairs were developed originally as GSoC projects, but none of them made it to release quality (they are all currently in the nursery). My goal is to bring tur-kir and kaz-kir to release quality (trunk), and bring tur-uzb to at least "working" quality (staging).
  • Include a proposal, including
    * a title,
    * reasons why Google and Apertium should sponsor it,
    * a description of how and who it will benefit in society,
    * and a detailed work plan (including, if possible, a brief schedule with milestones and deliverables).
    Include time needed to think, to program, to document and to disseminate.
  • List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.
    I have been involved with Apertium since 2011, and have extensive experience writing and improving morphological transducers. I also have a reasonable amount of experience with bidix, CG, and lrx, and can manage transfer. I will need to develop my skills in running testvoc, as that will be another large focus of the work. As far as the languages go, I'm proficient in Kyrgyz and Kazakh, and can read and get by in Uzbek and Turkish (often with the help of a dictionary). I also work on the linguistics of all of these languages (especially Kazakh and Kyrgyz), and have a "deep" understanding of the way they work. I have at my disposal dictionaries and grammars for all the languages. I also know potential consultants for most of them, which will be important for obtaining the post-edited texts needed to calculate WER numbers.
  • List any non-Summer-of-Code plans you have for the Summer, especially employment, if you are applying for internships, and class-taking. Be specific about schedules and time commitments. We would like to be sure you have at least 30 free hours a week to develop for our project.
    I will essentially be free for the entire period of GSoC. My semester (and current work) ends on May 2, and I will resume such activities on August 25. I will probably be doing minimal hourly tutoring-type work for a few weeks in May, and may lose a day here and there in June and July to attending a conference, domestic road travel, etc. I also plan to work on my dissertation project over the summer, but I do not expect it to interfere with GSoC, as I will not have any "crunch times" related to it during that period. During previous breaks from study/work, even with other looming deadlines, I have easily spent 30 hours a week on Apertium-related work.