Difference between revisions of "User:Ilnar.salimzyan/GSoC2014/Application"

From Apertium
 
(32 intermediate revisions by the same user not shown)
You can find my proposal for GSoC 2014 [http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/selimcan/5649050225344512 here].
Remember that this is only a preview :)


[[Category:GSoC_2014_Student_proposals|Ilnar.salimzyan]]
== GSoC application: Apertium-kaz-tat: machine translation between Kazakh and Tatar ==
'''Name:''' Ilnar Salimzyanov

'''E-mail address:''' ilnar.salimzyan@gmail.com

''Other information that may be useful to contact you:''

'''IRC:''' selimcan '''Sourceforge account:''' selimcan '''Cellphone:''' +79625617985 '''Timezone:''' UTC+04.00

=Why is it that you are interested in machine translation?=

=Why is it that you are interested in the Apertium project?=

=Which of the published tasks are you interested in? What do you plan to do?=
'''Task:'''
''Adopting a language pair''

'''Title:'''
''Apertium-kaz-tat — machine translation between Kazakh and Tatar''

==Why should Google and Apertium sponsor it?==

==How and whom will it benefit in society?==

=Work plan=

=Work To do=
==Before the coding period:==

==The coding period:==

==Non-GSoC activities==

==List your skills and give evidence of your qualifications==

I am a first-year master’s student at Kazan Federal University, studying Applied Linguistics.<ref>A not-so-clear term that has caused much debate. What we study is a mix of computational linguistics, lexicography and several other courses.</ref>

I first learned about Apertium in 2009, while writing a short university paper comparing available machine translation systems. Apertium fascinated me then because it was open source, was growing rapidly, and looked like a good starting point for Tatar and other Turkic languages (yes, I had thought about them too). I played around with an lttoolbox dictionary for Tatar (a bad idea, I know, but I didn’t know about "X/S/HFST"s then, and no other Turkic languages were involved yet). I even managed to model noun morphotactics with it!

Back in 2009, I translated part of the Official Documentation into Russian<ref>See /apertium/trunk/apertium-documentation/apertium-2.0/ru/apertium_docu.odt</ref> (up to chapter 3.2.3; besides someone willing to finish it, the translation needs a good editor). Also in 2009, I translated the Apertium New Language Pair HOWTO into Russian.

I was one of the participants in the Šupaškar Apertium Workshop, held in January this year, where Francis Tyers, Hector Alos-i-Font,
Jonathan Washington and Trond Trosterud were the instructors.

I was very fortunate to watch Jonathan and Francis work on the Tatar-Bashkir pair as the example pair for the Šupaškar Workshop and move it to nursery. Having a transducer for my native language (and the language closest to it) is very useful for learning the semantics and structure of lexc and twol files, alongside reading the famous FSMBook. I wasn’t really familiar with these formats before, since using HFST with Apertium is relatively new and is not mentioned in the Official Documentation.

I have been involved in the work on the Tatar-Bashkir pair as, let’s say, a “language consultant” and “tester”.<ref>See the accepted but not-yet-published paper here: https://www.softconf.com/lrec2012/TurkicLanguage2012/cgi-bin/scmd.cgi?scmd=getFinal&passcode=18X-P9A6A3D6H8&_lDoc=Paper</ref> Together with a colleague from Ufa, I have been translating the top-5000 wordlist of the Russian National Corpus into Tatar and Bashkir. These translations were then added to the translator files. I have also been analyzing errors in the translations to find out where Apertium-tt-ba performed poorly, describing them on the wiki<ref>The Morphology of Tatar Language</ref> and committing to SVN from time to time.

==References==
<references/>

[[Category:GSoC 2012 Student Proposals]]

Latest revision as of 13:17, 14 May 2014
