Difference between revisions of "User:Oresta/GSoC Proposal"

From Apertium
Revision as of 22:33, 8 April 2010

Name

Oresta Tymchyshyn

Contact information

e-mail: oresta.tymchyshyn@gmail.com
IRC: Oresta in #apertium at irc.freenode.net
Cell phone: +38 067 37 97 836

Why is it you are interested in machine translation?

As a student of Computer Science, I have always tried to get NLP-related tasks for course projects. At first this was subconscious, but for the last four years I have been strongly interested in computational linguistics.
Machine translation is a part of NLP, but that is not the only reason for my interest. I often use machine translation to find the meaning of unknown words, for example while translating part of Adam Przepiórkowski's book “The IPI PAN Corpus: Preliminary version” (http://nlp.ipipan.waw.pl/~adamp/Papers/2004-corpus/) or the site of the European Summer School Culture & Technology (http://www.culingtec.uni-leipzig.de/ESU/, Ukrainian coming soon). So my own experience shows that MT is really useful.

Why is it that you are interested in the Apertium project?

Why should Google and Apertium sponsor it?

Polish and Ukrainian are closely related, since both belong to the Slavic group of languages. Ukrainian is my native language. I am fluent in Polish, because I spent one year in Poland, where I studied at the University of Warsaw. I also completed a course of Polish for foreign students. So I am well prepared to work on this language pair.
New language pairs are one of Apertium's main priorities. Slavic languages are not strongly represented in Apertium yet, so a new language pair consisting of two Slavic languages should be welcomed by the Apertium community.
Ukrainian is a low-resourced language, so this is a great chance to support it.

How and whom will it benefit in society?

Ukraine and Poland are neighboring countries with close cultural and economic relations. In Western Ukraine most Ukrainians, especially older people, understand Polish, but in the central and eastern parts of Ukraine people do not know Polish at all.
Each language has more than 40 million native speakers. Polish is a West Slavic language and the official language of Poland. Its written standard is based on the Latin alphabet with a few additions. Ukrainian is an East Slavic language and the official language of Ukraine. It is written using a modified version of the Cyrillic alphabet. Both languages are highly inflected.
There are very few computer applications that translate between Polish and Ukrainian.
The oldest and best known is Pragma, rule-based translation software developed in 2000 by Trident Software (Kyiv, Ukraine). Pragma extends popular office and Internet applications by adding a translation function to them. Pragma is a closed-source, Windows-only application. It is used by government institutions in Ukraine, large companies and small businesses.
Google Translate is a free multilingual online translation service, known worldwide, for translating a section of text or a whole webpage. Unlike other translation services that use SYSTRAN rule-based MT technology, Google uses its own translation software based on a statistical approach. Polish and Ukrainian were launched in May and September 2008 respectively.
But there are still no open source solutions :(.
Polish-Ukrainian machine translation is especially relevant in the context of the European Football Championship, which will take place in Ukraine and Poland in summer 2012. Web resources are being created to meet the language needs of the championship, for example http://www.eurolang2012.com/. EUROLANG is mostly a language textbook, and it seems that sites like it could be potential users of Polish-Ukrainian MT.


Which of the published tasks are you interested in? What do you plan to do?

I am going to work on the project Apertium-pl-uk: Machine translation between Polish and Ukrainian.

Work plan

Community Bonding Period

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12

List your skills and give evidence of your qualifications

List any non-Summer-of-Code plans you have for the Summer