User:Gang Chen/GSoC 2013 Application: "Sliding Window PoS Tagger"

List your skills and give evidence of your qualifications.

I am currently a second-year postgraduate student majoring in Natural Language Processing.

For the past two years, I have been interning with the statistical machine translation team at Youdao Inc.

There, I brought two language pairs (Spanish-Chinese and Russian-Chinese) into the online service [1]. This gave me the chance to explore the whole pipeline of a statistical machine translation system, including bilingual data crawling and filtering, word alignment, model extraction, and parameter tuning.

I also used the MapReduce framework to process large amounts of text in order to improve the n-gram language models, which yielded a significant +0.5 BLEU improvement for Chinese-English translation. Along the way, I inspected and analysed a lot of raw text and n-gram data, read many papers on data selection and domain adaptation, and conducted various experiments on text filtering and cleaning. That project gave me the habit of analysing bad cases and data instead of blindly tuning parameters as a black box.

In addition, I took part in a project to speed up the translation decoder, which achieved a significant 2x efficiency improvement at very little cost in quality. We examined many bad cases, analysed them linguistically, and finally struck a good balance between translation quality and speed.

During my undergraduate studies, I took courses in linguistics, mathematics and computer science, where I acquired basic knowledge of linguistic analysis and programming. I also took part in the development of several NLP systems: the construction of the Chinese Concept Dictionary, a WordNet-like Chinese ontology; a lexical similarity computation program written in Java; and, in particular, a supervised HMM PoS tagger written in C++, which I think will help with the implementation of the sliding-window PoS tagger.

As for coding skills, I have three years' experience programming in Java and C++, and I am also familiar with basic Python and shell scripting for lightweight tasks.

With my knowledge of and experience in natural language processing, I am confident that I can accomplish the task well.

This is the first time I have taken part in an open source project, and I'm excited about it! I have long been using open source toolkits, for example the Moses machine translation toolkit, the SRILM language modelling toolkit, and the OpenNLP toolkit, and they have all been a great help to me. I'd be happy to contribute to the open source community, and I will do my best to adapt to the open source development environment.