Difference between revisions of "User:Chy/Gsoc 2010 Application/Java port of Apertium"


Revision as of 23:28, 7 April 2010

Moffassal Hossain: proposal for Google Summer of Code (feel free to comment).


Name: Moffassal Hossain

Email address: moffassal@gmail.com

Education: IT and Communication Technology


Other information that may be useful for contact:

Phone - Mobile: +4550140762.

IRC : chy@irc.freenode.com.

Why is it you are interested in machine translation?:

Advances in information technology have made globalization a reality. Today information travels across continents and people communicate all over the world, yet language barriers still limit what we can learn. We use different languages and scripts to express ourselves in daily life. Machine translation can act as an interface between languages. A fully functional machine translation system that can process natural language can improve productivity, especially in software localization, and reduce linguistic barriers. This makes me enthusiastic about building tools and resources for machine translation, which are greatly needed today. Because it is open source, I find Apertium a great project through which to take part in machine translation development.

Why is it that you are interested in the Apertium project?:

First of all, Apertium is an open source project. From a student's point of view it is a good way to start working with professional software, and I could easily get support from the community. Secondly, I am in close contact with the mentors for the proposed project, especially Jacob Nordfalk, who is a teacher at my university, so it would be easy for me to communicate and discuss the project during development. Thirdly, I am an experienced Java programmer. Last but not least, I am interested in learning linguistic data processing, for which Apertium is a great application to start with.


Which of the published tasks are you interested in?:

Java port of the Apertium runtime: I am interested in completing the Java port of the Apertium runtime, since Java is a platform-independent language and makes applications portable. Most of the C++ code has already been ported, but some parts remain: for example interchunk.cc, postchunk.cc, piping and formatting, and the tagger still needs to be completed. After porting those files, I would like to port Apertium to a mobile platform. Smartphones are becoming more and more powerful, and their increased processing capacity makes them a good candidate platform for Apertium.

What do you plan to do?:

Fix the tagger in the Java port.

Port interchunk, postchunk and pretransfer from C++ to Java.

Formatting.

Piping.

Port Apertium to a mobile platform.

Why should Google and Apertium sponsor you? and how and who will it benefit in society?:

A fully functional Java version of Apertium will reduce platform dependency and make installation easier. In addition, running on mobile platforms will let Apertium reach a much wider population. This project will therefore play an important role for the Apertium project.

Work plan:

Week 1: Read the existing documentation on apertium, lttoolbox and lttoolbox-java; communicate with the mentors via IRC and post questions.

Week 2: Study the Apertium viewer parser, which implements the parsing of .modes files in Java, so that the parsing is done in Java instead of a bash script. Prepare some documentation.

Week 3: Study Hidden Markov Models, which will help me understand their implementation in the tagger.cc and hmm.cc files.

Week 4: Tagger: debug the C++ code with a C++ debugger and the Java code with a Java debugger, making sure the Java code does the same thing. Fix errors in the tagger.java file.

Week 5: Understand each of Apertium's pipeline processes and what the data flowing between processes means (deformatter, analyzer, tagger, pretransfer, transfer, interchunk, postchunk, generator, reformatter).

Week 6: Port interchunk.cc, postchunk.cc and pretransfer.cc to Java. Prepare documentation.

Week 7: Formatting: de/reformatter for text. Deformatter for HTML.

Week 8: Regression testing/QA of pretransfer, interchunk and postchunk, making sure they are 100% compatible (testing the Java version against the C++ version on a range of inputs). Fix bugs and document the tests.

Week 9: Find the parts of the pipeline that can be done in-process in Java, and do them in-process instead of spawning a subprocess. This will help to get Apertium into a single executable file for mobile devices.

Week 10: Optimization: find/decide on data structures that cause less overhead than the standard collection classes and use them. Regression testing of the changes. Document the optimization issues.

Week 11: Generate a custom-prepared .jar package per mode, so that a language pair can be shipped as a single JAR file with everything included.

Week 12: Optimization: profile hot spots and remedy CPU-intensive tasks. Complete the documentation.

I have prepared the above work plan from my present understanding of the project; it may be updated over time.
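
To illustrate the Week 9 idea of running pipeline stages in-process rather than as separate subprocesses, here is a minimal sketch in Java. It is only an illustration under simplifying assumptions: each stage is modelled as a plain String-to-String function, the class name InProcessPipeline and the method run are hypothetical, and the three example stages are trivial stand-ins, not the real Apertium deformatter, tagger or generator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch: chain pipeline stages inside one JVM instead of
// connecting separate processes with pipes.
final class InProcessPipeline {

    // Feeds the output of each stage into the next one, in order.
    // Assumption: a stage can be modelled as a String -> String function.
    static String run(List<UnaryOperator<String>> stages, String input) {
        String data = input;
        for (UnaryOperator<String> stage : stages) {
            data = stage.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        // Stand-in stages; real stages would be the ported Java modules.
        List<UnaryOperator<String>> stages = Arrays.asList(
            s -> s.trim(),                      // placeholder for a first stage
            s -> s.toLowerCase(Locale.ROOT),    // placeholder for a middle stage
            s -> "[" + s + "]"                  // placeholder for a last stage
        );
        System.out.println(run(stages, "  Hello World  ")); // prints "[hello world]"
    }
}
```

The same run method could also support the Week 8 regression tests: feeding identical inputs to the Java stages and to the C++ programs and comparing the outputs would show where the two versions differ.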


List of qualifications and experience:

I completed a two-year Academic Professional degree in IT and Communication Technology at TEC Copenhagen. I am currently in the fifth semester of IT and Communication Technology at Copenhagen University College of Engineering.

During my studies, I have learned programming languages such as Java, C and C++, and taken courses in databases, data mining, data structures and algorithms, distributed systems, embedded systems (Linux), web applications, network protocols, network security and operating systems (Linux).

I have worked on several internal projects using the Java and C programming languages, which makes me confident that I can work on this project.

Though this is the first time I am applying for an open source project, I believe my determination, knowledge and passion for open source will help me accomplish the proposed project. In addition, help is available from the community, which makes completing this project realistic.

Any other activity than Summer-of-Code:

Due to credit from my previous education, I am taking only two optional courses, where I would otherwise have four courses and a big project. Needing less time for school means I can work on this project without interruption.