Difference between revisions of "User:RomanZegarski"

From Apertium
(Created page with 'Apertium Summer of Code application: Dictionary induction from wikis == Name == Roman Zegarski == Contact == Email: Roman.Zegarski@gmail.com Skype: roman.zegarski IRC: Roma…')
 
(Replaced content with 'My name is Roman Zegarski and I am final year student on the Gdańsk University of Technology in Poland (Informatics). Here is my pr…')
Line 1: Line 1:
  +
My name is Roman Zegarski and I am a final-year student at the Gdańsk University of Technology in Poland (Informatics).
Apertium Summer of Code application:
 
Dictionary induction from wikis
 
   
  +
[[User:RomanZegarski/GSoC2011_proposal|Here]] is my proposal for "Dictionary induction from wikis".
 
 
 
== Name ==
 
Roman Zegarski
 
 
== Contact ==
 
Email: Roman.Zegarski@gmail.com
 
Skype: roman.zegarski
 
IRC: RomanZegarski (irc.freenode.net)
 
Phone number: +48 692827146
 
 
 
 
 
== Why is it you are interested in machine translation? ==
 
 
 
Working on such an essential part of communication as language is very interesting to me. Finding similarities between languages and creating rules that make it possible to translate from one language to another is an intriguing process.

I am also fascinated by the possibilities of sharing knowledge, and for me machine translation is a kind of bridge that allows information to be passed on regardless of the language in which it was created and the language known by the person retrieving it. Even if a translation isn't perfect, it gives access to knowledge which would otherwise be really hard to retrieve.
 
 
 
== Why is it that you are interested in the Apertium project? ==
 
 
 
It's important to me that Apertium makes it possible to translate less popular languages. There are still not enough resources for them, and it is good to know that someone is taking care of them. Moreover, it is an open-source project, which allows people to contribute easily and share their knowledge and interests with others. Also, I am impressed by the community. It's very dynamic and I feel like I can always count on a fast response.
 
 
 
== Which of the published tasks are you interested in? ==
 
 
 
I am interested in the project “Dictionary induction from wikis”.
 
 
 
== What do you plan to do? ==
 
 
 
The idea is to generate new dictionaries with data obtained from DBPedia and OmegaWiki. To achieve this I will use and, if possible, improve the existing OmegaWiki data retriever, and amend the DBPedia extraction framework so that it can retrieve more data from Wiktionary. Then, with this data source, I would like to create a dixtools module able to retrieve the data and create dictionaries for Apertium.
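
To make the target of this step concrete, here is a minimal sketch of turning translation pairs into entries of an Apertium bilingual dictionary (`<e><p><l>…</l><r>…</r></p></e>`). It is only an illustration in Python, not the planned dixtools module: the function names `make_entry` and `build_bidix`, the sample word pairs, and the restriction to single-word lemmas with one part-of-speech symbol are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def make_entry(left, right, pos):
    """Build one <e> element pairing a source lemma with a target lemma."""
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    for tag, lemma in (("l", left), ("r", right)):
        side = ET.SubElement(p, tag)
        side.text = lemma
        # Each side carries the part-of-speech symbol, e.g. <s n="n"/> for nouns.
        ET.SubElement(side, "s", {"n": pos})
    return e


def build_bidix(pairs):
    """Wrap (left, right, pos) triples in a <dictionary> skeleton."""
    dictionary = ET.Element("dictionary")
    ET.SubElement(dictionary, "alphabet")
    sdefs = ET.SubElement(dictionary, "sdefs")
    # Declare every symbol that the entries use.
    for pos in sorted({pos for _, _, pos in pairs}):
        ET.SubElement(sdefs, "sdef", {"n": pos})
    section = ET.SubElement(dictionary, "section", {"id": "main", "type": "standard"})
    for left, right, pos in pairs:
        section.append(make_entry(left, right, pos))
    return dictionary


# Made-up English-Polish sample data standing in for extracted pairs.
pairs = [("dog", "pies", "n"), ("house", "dom", "n")]
bidix = build_bidix(pairs)
print(ET.tostring(bidix, encoding="unicode"))
```

In the real module the pairs would come from the DBPedia and OmegaWiki retrievers, and multi-word lemmas and richer morphological symbols would need extra handling.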
 
 
== Why should Google and Apertium sponsor it? ==
 
 
A new source of data will give the Apertium project the possibility to constantly improve its dictionaries and will make it easier to create new ones.

The new linguistic data would be published as Linked Data, so it would be accessible to a wider audience.

Also, I will be able to compare the data gathered by OmegaWiki with the data harvested from Wiktionary using DBPedia.
 
 
 
 
 
== Work plan ==
 
 
 
Community Bonding Period:
 
get more familiar with the Apertium community
 
retrieve more information about DBPedia
 
DBPedia mappings
 
DBPedia ontology
 
get to know the GOLD ontology
 
get to know the Scala language
 
read documentation related to the project
 
 
=== Week 1 - 3 ===
 
Improving the DBPedia extraction framework

creating code in Scala which could handle more languages

creating basic templates for the English Wiktionary
 
=== Week 4 ===
 
expansion of templates for en.wiktionary
 
create templates for pl.wiktionary
 
 
'''Deliverable #1''' ← improved DBPedia extraction framework
 
 
=== Week 5 - 7 ===
 
create a dixtools module that retrieves data from DBPedia
 
=== Week 8 ===
 
create dictionaries in Apertium format
 
 
'''Deliverable #2''' ← apertium-dixtools module creating dictionaries from data extracted from DBPedia
 
 
=== Week 9 ===
 
improve the existing OmegaWiki data retriever implemented in apertium-dixtools

retrieve dictionary data from OmegaWiki using dixtools
 
=== Week 10 - 11 ===
 
find out whether data from OmegaWiki and DBPedia are complementary
 
merge complementary data retrieved from OmegaWiki and DBPedia
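
The comparison and merge planned for these weeks could look roughly like the following sketch. It assumes, purely for illustration, that each source has already been reduced to a mapping from a source lemma to a set of translations; the function name `merge_sources` and the sample data are hypothetical, and the real data from OmegaWiki and DBPedia will certainly need more normalization first.

```python
def merge_sources(omegawiki, dbpedia):
    """Merge two {lemma: set_of_translations} mappings.

    Returns the merged mapping plus the lemmas found in only one source,
    which shows where the two sources complement each other.
    """
    merged = {}
    for lemma in sorted(set(omegawiki) | set(dbpedia)):
        # Union of translations; a lemma missing from one source adds nothing.
        merged[lemma] = omegawiki.get(lemma, set()) | dbpedia.get(lemma, set())
    only_omegawiki = sorted(set(omegawiki) - set(dbpedia))
    only_dbpedia = sorted(set(dbpedia) - set(omegawiki))
    return merged, only_omegawiki, only_dbpedia


# Made-up English-Polish sample data.
omegawiki = {"dog": {"pies"}, "house": {"dom"}}
dbpedia = {"house": {"dom", "budynek"}, "cat": {"kot"}}

merged, only_omegawiki, only_dbpedia = merge_sources(omegawiki, dbpedia)
print(merged)
print("only in OmegaWiki:", only_omegawiki)
print("only in DBPedia:", only_dbpedia)
```

Lemmas that appear in only one source are the complementary part; lemmas present in both can be used to cross-check the two sources against each other.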
 
=== Week 12 ===
 
final amendments
 
creation of documentation for the project
 
 
Project completed ← dictionaries created, new features in dixtools, improved DBPedia extraction framework
 
 
 
== Skills and qualifications ==
 
 
I am a final-year student at the Gdańsk University of Technology in Poland (Informatics, specialization: Distributed Applications and Internet Services). I have spent some time on topics related to computational linguistics. In the past year I worked on a student project whose goal was to build a virtual student assistant (more precisely, a chatterbot that generates its knowledge base from the university Moodle server). It still needs some work, but most of the application's functionality is working fine. Currently I am working on the development and implementation of a word sense disambiguation algorithm using WordNet.
 
About my experience: I have done some part-time work in C++ and C# on commercial projects, and I have experience in Java from university (both projects mentioned earlier are written in Java).
 
 
 
== Summer plans ==
 
During the summer I could spend 30 hours or more on developing the project. I have spent the last few months dividing my time between my student responsibilities and work, so if I participate in the Apertium project it won't be a problem for me to spend the required time on coding.

This semester I won't have any exams during the Summer of Code period, and I plan to stay in Gdańsk over the summer, so I would be available the whole time.
 

Revision as of 07:30, 7 April 2011

My name is Roman Zegarski and I am a final-year student at the Gdańsk University of Technology in Poland (Informatics).

Here is my proposal for "Dictionary induction from wikis".