Difference between revisions of "User:RomanZegarski"

From Apertium
(Created page with 'Apertium Summer of Code application: Dictionary induction from wikis == Name == Roman Zegarski == Contact == Email: Roman.Zegarski@gmail.com Skype: roman.zegarski IRC: Roma…')
 
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
My name is Roman Zegarski and I am a final-year student at the Gdańsk University of Technology in Poland (Informatics).
Apertium Summer of Code application:
Dictionary induction from wikis


[[User:RomanZegarski/GSoC2011_proposal|Here]] is my proposal for "Dictionary induction from wikis".




== Weekly reports ==


== Name ==
=== Week 1 ===
- read documentation related to RDF and OWL
Roman Zegarski


- got used to the Scala language (simple coding and reading documentation)
== Contact ==
Email: Roman.Zegarski@gmail.com
Skype: roman.zegarski
IRC: RomanZegarski (irc.freenode.net)
Phone number: +48 692827146


- configured environment to work with DBPedia extraction framework


- created fork of DBPedia extraction framework on bitbucket


- got in contact with Jonas and Sebastian from DBPedia (briefly but always)


== Project timeline (Google calendar) ==
== Why is it you are interested in machine translation? ==
A Google calendar with the project timeline is available [https://www.google.com/calendar/embed?src=s5guk16bebj6uj7tcjivhp2li4%40group.calendar.google.com&ctz=Europe/Warsaw here].


== Fork of DBPedia extraction framework on bitbucket ==


[https://bitbucket.org/RomanZegarski/dbpedia_extraction_framework_gsoc Fork] (so far no changes in code)
Working on language, such an essential part of communication, is very interesting to me. Finding similarities between languages and creating rules that make it possible to translate from one language to another is an intriguing process.
I am also fascinated by the possibilities of sharing knowledge, and for me machine translation is a kind of bridge that allows information to pass regardless of the language in which it was created and the language known by the person retrieving it. Even if a translation isn't perfect, it gives access to knowledge which in other ways could be really hard to retrieve.


== Why is it that you are interested in the Apertium project? ==

It's important to me that Apertium makes it possible to translate less popular languages. There are still not enough resources for them, and it is good to know that someone is taking care of them. Moreover, it is an open-source project, which allows people to contribute easily and share their knowledge and interests with others. I am also impressed by the community: it's very dynamic, and I feel I can always count on a fast response.


== Which of the published tasks are you interested in? ==


I am interested in the project “Dictionary induction from wikis”.


== What do you plan to do? ==

The idea is to generate new dictionaries from data obtained from DBPedia and OmegaWiki. To achieve this, I will use and (if possible) improve the existing OmegaWiki data retriever, and extend the DBPedia extraction framework so that it can retrieve more data from Wiktionary. With these data sources, I would then like to create a dixtools module that retrieves the data and creates dictionaries for Apertium.
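
As a rough illustration of the last step only, the sketch below turns a list of translation pairs into entries in the Apertium bilingual dictionary (.dix) XML format. It is not the dixtools module itself: the function names <code>bidix_entry</code> and <code>build_section</code>, the part-of-speech tag and the sample English–Polish pairs are all assumptions made for this example, and real pairs would come from the DBPedia and OmegaWiki extraction.

```python
from xml.sax.saxutils import escape


def bidix_entry(left, right, pos="n"):
    """Build one bilingual-dictionary <e> element for a word pair.

    `pos` is assumed to be a plain symbol name such as "n" or "vblex".
    Spaces in multiword entries are written as <b/>, as in .dix files.
    """
    l_text = escape(left).replace(" ", "<b/>")
    r_text = escape(right).replace(" ", "<b/>")
    tag = '<s n="%s"/>' % pos
    return "<e><p><l>%s%s</l><r>%s%s</r></p></e>" % (l_text, tag, r_text, tag)


def build_section(pairs):
    """Wrap (left, right, pos) triples in a <section> element."""
    lines = ['<section id="main" type="standard">']
    lines += ["  " + bidix_entry(l, r, pos) for l, r, pos in pairs]
    lines.append("</section>")
    return "\n".join(lines)


# Hypothetical sample data; not extracted from any real source.
pairs = [("house", "dom", "n"), ("ice cream", "lody", "n")]
print(build_section(pairs))
```

A complete dictionary file would also need the surrounding <code>dictionary</code>, <code>alphabet</code> and <code>sdefs</code> elements, which this sketch leaves out.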

== Why should Google and Apertium sponsor it? ==

A new source of data will give the Apertium project the possibility to constantly improve its dictionaries and make it easier to create new ones.
The new linguistic data would be published as Linked Data, so it would be accessible to a wider audience.
Also, I will be able to compare the data gathered by OmegaWiki with the data harvested from Wiktionary using DBPedia.




== Work plan ==


Community Bonding Period:
get more familiar with Apertium community
retrieve more information about DBPedia
DBPedia mappings
DBPedia ontology
get to know GOLD ontology
get to know the Scala language
read documentation related to the project

=== Week 1 - 3 ===
Improving DBPedia extraction framework
creating code in Scala, which could handle more languages
creating basic templates to English Wiktionary
=== Week 4 ===
expansion of templates for en.wiktionary
create templates for pl.wiktionary

'''Deliverable #1''' ← improved DBPedia extraction framework

=== Week 5 - 7 ===
create module for dixtools retrieving data from DBPedia
=== Week 8 ===
create dictionaries in Apertium format

'''Deliverable #2''' ← apertium-dixtools module creating dictionaries from data extracted from DBPedia

=== Week 9 ===
improve the existing OmegaWiki data retriever implemented in apertium-dixtools
retrieve dictionary data from OmegaWiki using dixtools
=== Week 10 - 11 ===
determine whether some data from OmegaWiki and DBPedia are complementary
merge complementary data retrieved from OmegaWiki and DBPedia
=== Week 12 ===
final amendments
creation of documentation for the project

Project completed ← dictionaries created, new features in dixtools, improved DBPedia extraction framework


== Skills and qualifications ==

I am a final-year student at the Gdańsk University of Technology in Poland (Informatics, specialization: Distributed Applications and Internet Services). I have spent some time on topics related to computational linguistics. In the past year I worked on a student project whose goal was to build a virtual student assistant (more precisely, a chatterbot that generates its knowledge base from the university Moodle server; it still needs some work, but most of the application's functionality works fine). Currently I am working on the development and implementation of a word sense disambiguation algorithm using WordNet.
About my experience: I have done some part-time work in C++ and C# on commercial projects, and I am experienced in Java from university (both projects mentioned earlier are written in Java).


== Summer plans ==
During the summer I could spend 30 hours or more on developing the project. I have spent the last few months dividing my time between my student responsibilities and work, so if I participate in the Apertium project, it won't be a problem for me to spend the required time on coding.
This semester I won't have any exams during the Summer of Code period, and I plan to stay in Gdańsk for the summer, so I would be available the whole time.

Latest revision as of 21:38, 2 May 2011

My name is Roman Zegarski and I am a final-year student at the Gdańsk University of Technology in Poland (Informatics).

Here is my proposal for "Dictionary induction from wikis".


Weekly reports

Week 1

- read documentation related to RDF and OWL

- got used to the Scala language (simple coding and reading documentation)

- configured environment to work with DBPedia extraction framework

- created fork of DBPedia extraction framework on bitbucket

- got in contact with Jonas and Sebastian from DBPedia (briefly but always)

Project timeline (Google calendar)

A Google calendar with the project timeline is available here.

Fork of DBPedia extraction framework on bitbucket

Fork (so far no changes in code)