User:Firespeaker/GSoC2014/Application draft
Revision as of 19:29, 12 March 2014 by Firespeaker
- Name:
- Jonathan North Washington
- E-mail address / gtalk:
- (fill in)
- Other information that may be useful to contact you:
- cell phone: (fill in)
- Why is it that you are interested in machine translation? / Why is it that you are interested in the Apertium project?
- I got interested in MT between Turkic languages in 2011 when I mentored Mirlan's tur-kir project. I found that with my familiarity with linguistics, my knowledge of the languages, and my bravery with new formalisms, I was able to learn quickly and do useful work. I've been active in Apertium ever since. I'm a Turkic linguist specialising in phonology, phonetics, and socio-historical linguistics, but because of my work with Apertium, I have started to consider myself a computational linguist as well.
- Which of the published tasks are you interested in? What do you plan to do?
- I plan to "Adopt an unreleased language pair", or in this case three: tur-kir, kaz-kir, tur-uzb. These three pairs were developed originally as GSoC projects, but none of them made it to release quality (they are all currently in the nursery). My goal is to bring tur-kir and kaz-kir to release quality (trunk), and bring tur-uzb to at least "working" quality (staging).
- Include a proposal, including
- a title,
- Bringing tur-kir, kaz-kir, and tur-uzb out of Nursery
- reasons why Google and Apertium should sponsor it,
- These are three pairs that could be brought to (or very near to) production quality without too much work. There are probably not many other people who know these languages all well enough and are familiar enough with the pairs to accomplish this in one summer. If successful, this project would add three more production-quality pairs to Apertium's inventory, quadrupling the number of Turkic pairs in production.
- a description of how and who it will benefit in society,
- Over 80 million people speak the four languages involved as a first language, and they all stand to benefit from this project.
- and a detailed work plan (including, if possible, a brief schedule with milestones and deliverables).
- Include time needed to think, to program, to document and to disseminate.
- List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work. In particular we would like to know whether you have programmed before in open-source projects.
- I have been involved with Apertium since 2011, and have extensive experience writing and improving morphological transducers. I also have a reasonable amount of experience with bidix, CG, and lrx, and can manage transfer. I will need to develop my skills in running testvoc, as that will be another large focus of the work. As far as the languages go, I'm proficient in Kyrgyz and Kazakh, and can read and get by in Uzbek and Turkish (often with the help of a dictionary). I also work on the linguistics of all of these languages (especially Kazakh and Kyrgyz), and have a "deep" understanding of the way they all work. I have dictionaries and grammars for all the languages at my disposal. I also know potential consultants for most of them, which will be important for obtaining post-edited texts to calculate WER numbers on.
- List any non-Summer-of-Code plans you have for the Summer, especially employment, if you are applying for internships, and class-taking. Be specific about schedules and time commitments. We would like to be sure you have at least 30 free hours a week to develop for our project.
- I will essentially be free for the entire period of GSoC. My semester (and current work) end on May 2, and I will resume such activities on August 25. I will probably be doing minimal hourly tutoring-type work for a few weeks in May, and may lose a day here and there in June and July to attending a conference, domestic road travel, etc. I also plan to work on my dissertation project over the summer, but I do not expect it to interfere with GSoC, as I will not have any "crunch times" related to it during that period. During previous breaks from study/work, even with other looming deadlines, I have easily spent 30 hours a week on Apertium-related work.