User:Davidho/Application

Contact information

Name: Junhao He

Email: davidho7066@gmail.com

IRC: Davidho

Why is it you are interested in machine translation?

I am Chinese and have studied English for more than 10 years and Spanish for 2 years. However, when I encounter English or Spanish sentences or phrases that I cannot understand, none of the translation systems that can translate Chinese into other languages satisfies me. The longer I study foreign languages, the better I understand how they differ from Chinese. I have always wanted to create something that can handle Chinese translation properly, but it was not until I took a compilers course last year that I learned how a translator works. That was when I became interested in machine translation.


Why is it that you are interested in the Apertium project?

I first came across Apertium while reading the list of accepted projects for GSoC 2013, and it was the Chinese-Spanish Apertium system that caught my attention. Before knowing Apertium, I had no idea how to start. Then I began reading the Apertium documentation and joined the IRC channel #apertium. After doing some research on Apertium, I found three characteristics of the system that impressed me. The first and most important is that Apertium is an open-source machine translation engine that has been expanded to handle more divergent language pairs. It is well designed and allows everyone to contribute. This ensures its continued growth and convinces me that it has great prospects. Second, the linguistic data files are encoded in XML-based formats. XML files are easy to understand, which enables people with little linguistic knowledge to expand the dictionaries. This helps improve the quality of existing pairs and add new ones.


Which of the published tasks are you interested in?

Prototype recursive transfer implementation


What do you plan to do?

Reasons why Google and Apertium should sponsor it

Apertium was designed to translate between closely related languages, a task that does not involve much constituent reordering. However, as the system develops, expanding it to handle more divergent language pairs is both inevitable and highly beneficial, and for these pairs reordering is a key concern. This project aims to develop a prototype of a new module that can handle long-distance reordering. If it succeeds, it will be a major step forward for the whole system. That is why it should be sponsored.


A description of how and who it will benefit in society

Languages are a critical tool for people to communicate.

List your skills and give evidence of your qualifications

I am a third-year undergraduate majoring in Software Engineering at South China University of Technology. I am proficient in C/C++, C# and Java, having completed projects in each of these languages. I can also use Python for small tasks. Last year I took courses on Principles of Compilers and Formal Languages, and they were what made me interested in Natural Language Processing. I believe that knowledge of parsers, syntax analyzers, finite automata and finite-state transducers will help me understand the Apertium system more deeply. I speak three languages: Chinese (mother tongue), English (fluent) and Spanish (refreshing). These three languages come from three different language groups, and I am sure that knowing the differences among them will be of great help in proposing a new formalism for transfer rules. I am currently working on implementing part of the functionality of a columnar database. It involves parallel programming techniques such as OpenMP, MPI and pthreads. It is a large project, and I have to collaborate with other people over the Internet, so I am confident that I can complete the programming work remotely.