Difference between revisions of "User:Ilnar.salimzyan/GSoC2014/Application"

 
You can find my proposal for GSoC 2014 [http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/selimcan/5649050225344512 here].
Remember that this is only a preview :)
 
   
[[Category:GSoC_2014_Student_proposals|Ilnar.salimzyan]]
== GSoC application: Apertium-kaz-tat: machine translation between Kazakh and Tatar ==
 
'''Name:''' Ilnar Salimzyanov
 
 
'''E-mail address:''' ilnar.salimzyan@gmail.com
 
 
''Other information that may be useful to contact you:''
 
 
'''IRC:''' selimcan '''Sourceforge account:''' selimcan '''Cellphone:''' +79625617985 '''Timezone:''' UTC+04.00
 
 
=Why is it you are interested in machine translation?=
 
 
=Why is it that you are interested in the Apertium project?=
 
 
=Which of the published tasks are you interested in? What do you plan to do?=
 
'''Task:'''
 
''Adopting a language pair''
 
 
'''Title:'''
 
''Apertium-kaz-tat — machine translation between Kazakh and Tatar''
 
 
==Why should Google and Apertium sponsor it?==
 
 
==How and whom it will benefit in society?==
 
 
=Work plan=
 
 
=Work To do=
 
==Before the coding period:==
 
 
==The coding period:==
 
 
==Non-GSoC activities==
 
 
==List your skills and give evidence of your qualifications==
 
 
I am a first-year master’s student at Kazan Federal University, studying Applied Linguistics.<ref>A not-so-clear term, which has caused many debates. What we study is a mix of computational linguistics, lexicography and several other courses.</ref>
 
 
I first got to know about Apertium in 2009, while writing a short university paper comparing the available machine translation systems. Apertium fascinated me then because it was open source, was growing rapidly, and was a good potential starting point for Tatar and other Turkic languages (yes, I have thought about them too). I played around with an lttoolbox dictionary for Tatar (a bad idea, I know, but I didn’t know about "X/S/HFST"s then, and there weren’t any other Turkic languages involved). I even managed to model noun morphotactics with it!
 
 
Back in 2009 I translated part of the Official Documentation into Russian<ref>See /apertium/trunk/apertium-documentation/apertium-2.0/ru/apertium_docu.odt</ref> (up to chapter 3.2.3; the translation still needs someone willing to finish it, as well as a good editor). Also in 2009, I translated the Apertium New language pair Howto into Russian.
 
 
I was one of the participants of the Šupaškar Apertium Workshop, held in January this year, where Francis Tyers, Hector Alos-i-Font, Jonathan Washington and Trond Trosterud were instructors.
 
 
I was very fortunate to see Jonathan and Francis work on the Tatar-Bashkir pair as an example pair for the Šupaškar Workshop and move it to the nursery. Having a transducer for my native language (and for the language closest to it) is very useful for learning the semantics and structure of lexc and twol files, along with reading the famous FSMBook. I wasn’t really familiar with these files, since using HFST with Apertium is relatively new and is not mentioned in the Official Documentation.
 
 
I have been involved in the work on the Tatar-Bashkir pair as, let’s say, a “language consultant” and “tester”<ref>See the accepted, but not-yet-published paper here: https://www.softconf.com/lrec2012/TurkicLanguage2012/cgi-bin/scmd.cgi?scmd=getFinal&passcode=18X-P9A6A3D6H8&_lDoc=Paper</ref>. Together with a colleague from Ufa, I have been translating the top-5000 wordlist of the Russian National Corpus into Tatar and Bashkir. These translations were then added to the translator files. I have also been analyzing errors in the translations to find out where Apertium-tt-ba performed poorly, describing them on the wiki<ref>The Morphology of Tatar Language</ref> and committing to svn from time to time.
 
 
==References==
 
<references/>
 
 
[[Category:GSoC 2012 Student Proposals]]
 

Latest revision as of 13:17, 14 May 2014

You can find my proposal for GSoC 2014 here.