User:Gang Chen/GSoC 2013 Application: "Sliding Window PoS Tagger"

== Name ==

Name: Gang Chen

== Contact Information ==

Email: pkuchengang@gmail.com

IRC: Gang

GitHub Repo: https://github.com/elephantgcc

== Why is it that you are interested in machine translation? ==

I majored in Natural Language Processing, and after taking a course on Machine Translation, I became interested in it. It sits at the center of many NLP techniques and has very promising applications in real life.

Two years ago, I began an internship with the statistical machine translation team of an Internet company. Since then, I have been in deeper touch with the classic IBM models, n-gram language models, various decoding algorithms, and many excellent open-source toolkits. I was greatly attracted by the ideas and applications of machine translation. Reading the papers and getting the toolkits to work properly brought me so much fun that I still remember the sleepless night I spent thinking about the neural network language model.

A language is such a complicated system that translating a sentence from one language to another has always been a challenging task. On the other hand, with the need for translation growing drastically, developing better machine translation systems can also benefit society.


== Why is it that you are interested in the Apertium project? ==

I came across the Apertium project a year ago, when I learned about last year's GSoC. Its rule-based methodology attracted me most. A language system is by nature organized structurally, so studying its underlying rules should be the most direct way to unveil the mysteries of the system. Linguistic knowledge plays a central role in language processing problems, whether through empirical statistics or rational analysis. So I think it is important to play with the "colorless green ideas" within a language.

Besides that, Apertium is an open-source project, so ordinary people can join in and try out ideas to improve it, providing more and better machine translation services. What is also important is that those services can be used by people all around the world.


== Which of the published tasks are you interested in? What do you plan to do? ==

I have a great interest in the project "Sliding-window part-of-speech tagger". The PoS tagger currently used in Apertium is an HMM tagger, while this project aims to implement a sliding-window PoS tagger, for better quality and efficiency.
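To make the core idea concrete, here is a minimal sketch in Python of supervised sliding-window training and tagging: each word is mapped to its ambiguity class (the set of tags it can take), and a word is disambiguated by choosing the tag most often seen, during training, inside the same window of neighbouring ambiguity classes. The toy lexicon, tag names, window size, and function names are all assumptions for illustration, not Apertium's actual data structures or code.

<pre>
from collections import defaultdict

# Toy lexicon mapping each word to its ambiguity class (set of possible
# tags). The words and tag names are invented for illustration.
LEXICON = {
    "the": {"DET"},
    "can": {"NOUN", "VERB", "AUX"},
    "fish": {"NOUN", "VERB"},
    "swim": {"VERB", "NOUN"},
}

def ambiguity_class(word):
    # frozenset so a class can be used as part of a dictionary key
    return frozenset(LEXICON.get(word, {"UNK"}))

def contexts(words, left=1, right=1):
    """Yield, for each position, the tuple of ambiguity classes of the
    `left` words before it and the `right` words after it."""
    pad = [frozenset({"#"})]  # sentence-boundary marker
    classes = pad * left + [ambiguity_class(w) for w in words] + pad * right
    for i in range(len(words)):
        yield (tuple(classes[i:i + left])
               + tuple(classes[i + left + 1:i + left + 1 + right]))

def train(tagged_sentences, left=1, right=1):
    """Supervised training: count how often each tag was the correct one
    inside each context of ambiguity classes."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for ctx, (_, gold_tag) in zip(contexts(words, left, right), sent):
            counts[ctx][gold_tag] += 1
    return counts

def tag(words, counts, left=1, right=1):
    """Disambiguate each word with the tag from its ambiguity class seen
    most often in the same context during training (ties are arbitrary)."""
    result = []
    for word, ctx in zip(words, contexts(words, left, right)):
        best = max(ambiguity_class(word), key=lambda t: counts[ctx][t])
        result.append((word, best))
    return result

if __name__ == "__main__":
    corpus = [[("the", "DET"), ("fish", "NOUN"),
               ("can", "AUX"), ("swim", "VERB")]]
    model = train(corpus)
    print(tag(["the", "fish", "can", "swim"], model))
</pre>

The real tagger compiles the learned decisions into a finite-state transducer rather than looking counts up at run time; this sketch only shows the windowed counting that the training steps below build on.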

My plans are basically as follows:

1) Read the paper to gain a full understanding of the algorithm, and familiarize myself with the Apertium pipeline.

2) Implement a supervised training version of the algorithm, realized as an FST without minimization.

3) Implement an unsupervised training version of the algorithm, realized as an FST with minimization.

4) Integrate FORBID rules into the implementation (a sketch of this step follows the list).

5) Optionally, re-study the algorithm to explore further possible improvements mentioned in the paper.
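For step 4, FORBID rules (as found in the FORBID section of Apertium's .tsx tagger definition files) disallow certain tag bigrams. One simple way to respect them, continuing the sketch above (and not necessarily how the paper integrates them into the FST), is a greedy left-to-right pass that filters each word's candidates against the previously chosen tag; the concrete pairs below are invented for illustration.

<pre>
# Hypothetical FORBID pairs: the second tag may not immediately follow
# the first. Reuses ambiguity_class() and contexts() from the sketch above.
FORBID = {("DET", "VERB"), ("DET", "AUX")}

def tag_with_forbid(words, counts, left=1, right=1):
    """Greedy left-to-right variant of tag(): a candidate tag is skipped
    when (previously chosen tag, candidate) is a FORBID pair."""
    result, prev = [], None
    for word, ctx in zip(words, contexts(words, left, right)):
        candidates = [t for t in ambiguity_class(word)
                      if (prev, t) not in FORBID]
        if not candidates:
            # Everything forbidden: fall back to the full ambiguity class.
            candidates = list(ambiguity_class(word))
        best = max(candidates, key=lambda t: counts[ctx][t])
        result.append((word, best))
        prev = best
    return result
</pre>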