User:Chy/Gsoc 2010 Application/Java port of Apertium

From Apertium
Revision as of 07:49, 20 April 2010 by Chy (talk | contribs)

Moffassal Hossain's proposal for Google Summer of Code. (Feel free to comment.)


Name: Moffassal Hossain

Email address: moffassal@gmail.com

Education: IT and Communication Technology


Other information that may be useful for contact:

IRC : chy@irc.freenode.com.

Why is it you are interested in machine translation?:

Advances in information technology have made globalization a reality. Information now travels across continents and people communicate all over the world, yet our access to knowledge is still limited by language barriers. We use different languages and scripts to express ourselves in daily life. Machine translation can act as an interface between languages. A fully functional machine translator that can process natural language can improve productivity, especially in software localization, and reduce linguistic barriers. This makes me enthusiastic about building tools and resources for machine translation systems, which are greatly needed today. Because it is open source, I find Apertium a great way to take part in machine translation development.

Why is it that you are interested in the Apertium project?:

First, Apertium is an open source project. From a student's point of view it is a good way to start working on professional software, and I can easily get support from the community. Second, I am in close contact with the mentors, especially Jacob Nordfalk, a mentor for the proposed project who teaches at my university, so it would be easy for me to communicate and discuss the project during development. Third, I am an experienced Java programmer. Last but not least, I am interested in learning linguistic data processing, and Apertium is a great application to start with.


Which of the published tasks are you interested in?:

Java port of the Apertium runtime: I am interested in working on the Java port of the Apertium runtime, since Java is platform independent and makes applications portable. Most of the C++ code has already been ported, but some parts remain, for example interchunk.cc, postchunk.cc, piping, formatting, and completing the tagger. After porting those files, I would like to bring Apertium to a mobile platform: smartphones are becoming more and more powerful, and their increased processing capacity makes them a good candidate platform for Apertium.

What do you plan to do?:

Fix the tagger in the Java port.

Port interchunk, postchunk, and pretransfer from C++ to Java.

Formatting.

Piping.

Port Apertium to a mobile platform.
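
As a rough sketch of the piping item, the stages could be chained inside one JVM instead of through shell pipes. The code below is only an illustration under assumptions: the class name InProcessPipeline is hypothetical, each stage is modelled as a plain String-to-String function, and the two placeholder stages do not implement real Apertium behaviour. A real port would more likely stream text between stages.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: each stage is a String -> String function standing in
// for a ported module. This is not an existing lttoolbox-java class.
class InProcessPipeline {

    /** Feeds the output of each stage into the next, all within one JVM. */
    static String run(List<UnaryOperator<String>> stages, String input) {
        String data = input;
        for (UnaryOperator<String> stage : stages) {
            data = stage.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        // Placeholder stages; they do NOT implement real Apertium behaviour.
        UnaryOperator<String> wrap = s -> "^" + s + "$";
        UnaryOperator<String> unwrap = s -> s.replace("^", "").replace("$", "");
        System.out.println(run(Arrays.asList(wrap, unwrap), "hello"));
    }
}
```

The point of this design is that no subprocess is spawned between stages, which is what would allow the whole pipeline to ship as a single executable on a mobile device.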

Why should Google and Apertium sponsor you? and how and who will it benefit in society?:

A fully functional Java version of Apertium will reduce platform dependency and make installation easier. In addition, having Apertium on a mobile platform will let it reach a much wider population. This project will therefore play an important role for the Apertium project.

Work plan:

Community bonding period: Read the existing documentation on apertium, lttoolbox, and lttoolbox-java. Study hidden Markov models, which will help me understand their implementation in tagger.cc and hmm.cc. Study the Apertium viewer parser, which implements parsing of .mode files in Java. Communicate with the mentors via IRC and post questions.

Week Tasks
1 Tagger: Debug the C++ code with a C++ debugger and the Java code with a Java debugger to make sure the Java code does the same thing. Fix errors in the tagger.java file.
2 Continuation of the previous week.
3 Parse .mode files in Java instead of using a bash script. Write the documentation.
4 Understand each of Apertium's pipeline processes and what the data flowing between them means (deformatter, analyzer, tagger, pretransfer, transfer, interchunk, postchunk, generator, reformatter).
5 Port interchunk.cc and postchunk.cc.
6 Continue the previous week's work. Port pretransfer.cc to Java.
7 Regression testing/QA of pretransfer, interchunk, and postchunk, making sure they are 100% compatible (testing the Java and C++ versions on a range of inputs). Fix bugs and document the tests.
8 Formatting: De/reformatter for text. Deformatter for HTML.
9 Find parts of the pipeline that can run in-process in Java, and run them in-process instead of spawning a subprocess. This will help to get Apertium into a single executable file for mobile devices.
10 Optimization: Find or decide on data structures with less overhead than the standard collection classes, and use them.
11 Generate a custom .jar package per mode, so that a language pair can be shipped as a single JAR file with everything included.
12 Optimization: Profile hot spots and remedy CPU-intensive tasks. Complete the documentation.
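
To make the week 3 task more concrete, here is a minimal sketch of how one line of a .mode file might be split into pipeline stages in Java. It rests on assumptions: a mode line is taken to be a sequence of commands separated by '|', the class name ModeLineParser is hypothetical, the file names in the example are only illustrative, and shell quoting and variables such as $1 are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of lttoolbox-java: splits one mode line into stages.
class ModeLineParser {

    /** One pipeline stage: a program name plus its arguments. */
    static final class Stage {
        final String program;
        final List<String> args;

        Stage(String program, List<String> args) {
            this.program = program;
            this.args = args;
        }

        @Override
        public String toString() {
            return program + " " + args;
        }
    }

    /** Splits a mode line on '|' and each stage on whitespace. Quoting is NOT handled. */
    static List<Stage> parse(String modeLine) {
        List<Stage> stages = new ArrayList<>();
        for (String part : modeLine.split("\\|")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty segments, e.g. around a stray pipe
            }
            String[] tokens = trimmed.split("\\s+");
            List<String> args =
                    new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
            stages.add(new Stage(tokens[0], args));
        }
        return stages;
    }

    public static void main(String[] args) {
        // Illustrative mode line; the file names are assumptions.
        String line = "lt-proc en-es.automorf.bin | apertium-tagger -g en-es.prob | apertium-pretransfer";
        for (Stage s : parse(line)) {
            System.out.println(s);
        }
    }
}
```

Each parsed stage could then be mapped either to a subprocess or to a ported Java class, which is the decision week 9 is about.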


I have prepared the above work plan from my present understanding; it may be updated over time.


List of qualifications and experience:

I completed a two-year Academic Professional degree in IT and Communication Technology at TEC Copenhagen. I am currently in the fifth semester of IT and Communication Technology at Copenhagen University College of Engineering.

During my studies I have covered several programming languages (Java, C, C++) as well as databases, data mining, data structures and algorithms, distributed systems, embedded systems (Linux), web applications, network protocols, network security, and operating systems (Linux).

I have worked on several internal projects using Java and C, which makes me confident that I can work on this project.

Although this is my first time applying to an open source project, I believe my determination, knowledge, and passion for open source will help me accomplish the proposed project. It is also possible to get help from the community, which makes completing this project realistic.

Any other activity than Summer-of-Code:

Because of credit from my previous education, I am taking only two optional courses, where I would normally have four courses and a big project. Needing less time for school means I can work on this project without interruption.