First name: Filip
Last name: Petkovski
Email: firstname.lastname@example.org
IRC: fpetkovski on #apertium
- 1 Why are you interested in machine translation?
- 2 Why is it that you are interested in the Apertium project?
- 3 Which of the published tasks are you interested in? What do you plan to do?
- 4 Work already done
- 5 Work to do
- 6 Skills, qualifications and field of study
- 7 Non-GSoC activities
Why are you interested in machine translation?
Machine translation is one of the greatest challenges in natural language processing. It is arguably the most useful application of NLP, and building a good MT system requires blending techniques from both computer science and linguistics.
Why is it that you are interested in the Apertium project?
Apertium is a great project. It is clear that a great deal of work has gone into both developing the platform and creating the language resources. However, there is always more work to be done, and being part of this project is a perfect opportunity to make a real contribution to society.
Which of the published tasks are you interested in? What do you plan to do?
I'm interested in building a corpus-based lexicalised feature transfer module that sets tags based on a corpus-generated model.
Work already done
Started the apertium-sh-en language pair in the incubator.
Created a stream processor for the output of apertium-transfer that reads it character by character (branches/gsoc2012/fpetkovski/stream-rocessor).
Created a stream processor for the output of apertium-transfer that removes stop words specified in a dictionary (branches/gsoc2012/fpetkovski/stopwords-filter).
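As an illustration of the two stream processors above, here is a minimal Python sketch, not the code in the branches. It assumes the simplest form of the Apertium stream, where each lexical unit is written as `^lemma<tag><tag>$` with no chunks, and a backslash escapes the next character. The function names `read_stream`, `lemma_of` and `remove_stop_words` are hypothetical, and the stop-word "dictionary" is assumed to be a plain set of lemmas.

```python
from typing import Iterator, Set, Tuple


def read_stream(text: str) -> Iterator[Tuple[str, str]]:
    """Scan the stream one character at a time.

    Yields ("lu", content) for each ^...$ lexical unit and
    ("blank", text) for whatever appears between units.
    """
    buf = []
    in_lu = False
    escaped = False
    for ch in text:
        if escaped:
            # Character after a backslash is always literal.
            buf.append(ch)
            escaped = False
        elif ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == "^" and not in_lu:
            if buf:
                yield ("blank", "".join(buf))
            buf = []
            in_lu = True
        elif ch == "$" and in_lu:
            yield ("lu", "".join(buf))
            buf = []
            in_lu = False
        else:
            buf.append(ch)
    if in_lu:
        # Unterminated unit: pass it through unchanged as blank text.
        yield ("blank", "^" + "".join(buf))
    elif buf:
        yield ("blank", "".join(buf))


def lemma_of(lu_content: str) -> str:
    """Assumption: the lemma is everything before the first tag."""
    return lu_content.split("<", 1)[0]


def remove_stop_words(text: str, stop_words: Set[str]) -> str:
    """Drop lexical units whose lemma is a stop word; keep blanks as-is."""
    out = []
    for kind, content in read_stream(text):
        if kind == "blank":
            out.append(content)
        elif lemma_of(content).lower() not in stop_words:
            out.append("^" + content + "$")
    return "".join(out)


if __name__ == "__main__":
    sample = "^the<det><def>$ ^dog<n><sg>$ ^of<pr>$ ^house<n><sg>$"
    print(remove_stop_words(sample, {"the", "of"}))
```

The blanks around a removed unit are kept deliberately, so that any formatting carried between units is not lost. A real filter might want to collapse them.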
Work to do
Prior to May 21
Finish the stream processor so that it parses lemmas and tokens into a data structure. This will be useful later for extracting features from lexical units in the stream.
Set baselines for definiteness, pronoun insertion, preposition choice, aspect, etc.
Create a simple n-gram model and see how it performs.
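To make the first and last items above concrete, the following Python sketch parses lexical units into a small data structure and trains a bigram model over their lemmas. It is an illustration under stated assumptions, not the planned implementation: the stream is assumed to contain only simple `^lemma<tag>...$` units, the names `LexicalUnit`, `parse_units` and `BigramModel` are hypothetical, and add-one (Laplace) smoothing is chosen here only because it is the simplest option.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class LexicalUnit:
    lemma: str
    tags: Tuple[str, ...]


# Assumed unit shape: ^lemma<tag1><tag2>$ (no chunks, no escapes).
LU_RE = re.compile(r"\^([^<$]*)((?:<[^>]+>)*)\$")
TAG_RE = re.compile(r"<([^>]+)>")


def parse_units(stream: str) -> List[LexicalUnit]:
    """Collect every lexical unit in the stream as a LexicalUnit."""
    return [
        LexicalUnit(m.group(1), tuple(TAG_RE.findall(m.group(2))))
        for m in LU_RE.finditer(stream)
    ]


class BigramModel:
    """Bigram model with add-one smoothing over token sequences."""

    START = "<s>"

    def __init__(self, sentences: Iterable[Sequence[str]]) -> None:
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        self.vocab = set()
        for sentence in sentences:
            tokens = [self.START] + list(sentence)
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev: str, cur: str) -> float:
        """P(cur | prev) with add-one smoothing over the vocabulary."""
        numerator = self.bigram_counts[(prev, cur)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return numerator / denominator


if __name__ == "__main__":
    corpus = [
        "^the<det><def>$ ^dog<n><sg>$",
        "^the<det><def>$ ^cat<n><sg>$",
        "^a<det><ind>$ ^dog<n><sg>$",
    ]
    sentences = [[u.lemma for u in parse_units(s)] for s in corpus]
    model = BigramModel(sentences)
    print(model.prob("<s>", "the"), model.prob("<s>", "a"))
```

The same model could be trained on tag sequences instead of lemmas by swapping the list comprehension that builds `sentences`, which is closer to what a feature such as definiteness would need.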
Skills, qualifications and field of study
I am a graduate student of Computer Science, holding a Bachelor's degree in Computing. I have an excellent knowledge of Java and C#, and I'm fairly comfortable with C/C++ and scripting languages.
Machine learning is one of my strongest skills. I have worked on quite a few ML projects involving named entity relation extraction, news article classification, image-based gender classification and real-time vehicle detection. I have experience with building and optimizing models, and with feature selection and feature extraction for classification.
I did my bachelor's thesis in the field of computer vision, and my master's thesis is in the field of natural language processing.
I have final exams at the beginning of June, but I will be able to work more than 30 hours per week.