User:Chy/Gsoc 2010 Application/Java port of Apertium
Latest revision as of 07:49, 20 April 2010
Moffassal Hossain: Proposal for Google Summer of Code (feel free to comment)
Name: Moffassal Hossain
Email address: moffassal@gmail.com
Education: IT and Communication Technology
Other information that may be useful for contact:
IRC: chy@irc.freenode.com
Why is it you are interested in machine translation?:
Advances in information technology have made globalization a reality. Today, information travels across continents and people communicate all over the world, yet localization issues still limit what we can learn: we use different languages and scripts to express ourselves in day-to-day life. Machine translation can act as an interface between languages to tackle these issues. A fully functional machine translator that processes natural language can improve productivity, especially in software localization, and reduce linguistic barriers. This context makes me enthusiastic about building tools and resources for machine translation systems, which are greatly needed nowadays. Because it is open source, I found Apertium to be a great way to take part in machine translation development.
Why is it that you are interested in the Apertium project?:
First of all, Apertium is an open source project. From a student's point of view, it is a good way to start working with professional software, and I can easily get support from the community. Secondly, I have close contact with the mentors for the proposed project, especially Jacob Nordfalk, who is a teacher at my university, so it will be easy for me to communicate and discuss the project during development. Thirdly, I am an experienced Java programmer. Finally, and most importantly, I am interested in learning linguistic data processing, and Apertium is a great application to start with.
Which of the published tasks are you interested in?:
Java port of the Apertium runtime: I am interested in porting the Apertium runtime to Java, since Java is a platform-independent language that makes applications portable. Most of the C++ code has already been ported, but the remaining files still need to be finished, for example interchunk.cc and postchunk.cc, as well as piping, formatting, and completing the tagger. After porting those files, I would like to port Apertium to mobile platforms: smartphones are getting more and more powerful, and their increased processing capacity makes them a good candidate platform for Apertium.
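In the C++ runtime, the stages of a translation run as separate processes connected by Unix pipes. A Java port can instead chain stages inside one JVM, which is also what would make a single executable for mobile devices feasible. The following is a minimal sketch of that idea, assuming a hypothetical `Stage` interface (not an existing lttoolbox-java API) and two toy stages standing in for real ones such as the analyser or tagger:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

// Sketch only: Stage is a hypothetical interface, not part of lttoolbox-java.
public class InProcessPipeline {

    /** One pipeline step that reads its input and writes its output. */
    public interface Stage {
        void process(Reader in, Writer out) throws IOException;
    }

    /** Runs the stages in order, feeding each stage's output into the next. */
    public static String run(List<Stage> stages, String input) throws IOException {
        String data = input;
        for (Stage stage : stages) {
            StringWriter out = new StringWriter();
            stage.process(new StringReader(data), out);
            data = out.toString();
        }
        return data;
    }

    /** Toy stage: upper-cases every character. */
    public static final Stage UPPER = (in, out) -> {
        int c;
        while ((c = in.read()) != -1) {
            out.write(Character.toUpperCase(c));
        }
    };

    /** Toy stage: reverses the whole input. */
    public static final Stage REVERSE = (in, out) -> {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        out.write(sb.reverse().toString());
    };

    public static void main(String[] args) throws IOException {
        // In a real port the stages could be, e.g., the analyser, tagger and transfer.
        System.out.println(run(List.of(UPPER, REVERSE), "hello")); // OLLEH
    }
}
```

Buffering each stage's full output in a string keeps the sketch simple; a streaming version could connect stages with `java.io.PipedReader`/`PipedWriter` and one thread per stage, at the cost of more complexity.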
What do you plan to do?:
Fix the tagger part of the Java port.
Port interchunk, postchunk and pretransfer from C++ to Java.
Formatting.
Piping.
Port Apertium to mobile platforms.
Why should Google and Apertium sponsor you? and how and who will it benefit in society?:
A fully functional Java version of Apertium will reduce platform dependency and make installation easier. Moreover, with Apertium on mobile platforms, it will reach a mass audience. This project will play an important role in the Apertium project.
Work plan:
Community bonding period: Read the existing documentation on apertium, lttoolbox and lttoolbox-java. Study Hidden Markov Models, which will help me understand their implementation in tagger.cc and hmm.cc. Study the Apertium Viewer parser, which implements parsing of .mode files in Java. Communicate with the mentors via IRC and post questions.
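As a study aid for the HMM part of the community bonding period, here is textbook Viterbi decoding for a first-order (bigram) HMM in Java. It is a generic sketch, not a transcription of Apertium's hmm.cc, which may differ in details such as how observations are represented; the class name `ViterbiSketch` and the toy probabilities are made up for illustration.

```java
import java.util.Arrays;

/**
 * Textbook Viterbi decoding for a first-order HMM, written as a study aid.
 * Generic sketch only: it is not the code in Apertium's hmm.cc.
 * All probabilities are passed as natural logarithms to avoid underflow.
 */
public class ViterbiSketch {

    /**
     * Returns the most probable tag index for each observation.
     * logInit[t]     = log P(first tag is t)
     * logTrans[s][t] = log P(tag t | previous tag s)
     * logEmit[t][o]  = log P(observation o | tag t)
     */
    public static int[] decode(double[] logInit, double[][] logTrans,
                               double[][] logEmit, int[] obs) {
        int n = obs.length;
        int k = logInit.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log-prob of a path ending in tag t at i
        int[][] back = new int[n][k];        // previous tag on that best path

        for (int t = 0; t < k; t++) {
            score[0][t] = logInit[t] + logEmit[t][obs[0]];
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int s = 0; s < k; s++) {
                    double cand = score[i - 1][s] + logTrans[s][t];
                    if (cand > best) {
                        best = cand;
                        arg = s;
                    }
                }
                score[i][t] = best + logEmit[t][obs[i]];
                back[i][t] = arg;
            }
        }
        // Choose the best final tag, then follow the back-pointers.
        int[] path = new int[n];
        for (int t = 1; t < k; t++) {
            if (score[n - 1][t] > score[n - 1][path[n - 1]]) {
                path[n - 1] = t;
            }
        }
        for (int i = n - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }

    /** Converts plain probabilities to natural logs. */
    public static double[] logOf(double... p) {
        double[] r = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            r[i] = Math.log(p[i]);
        }
        return r;
    }

    public static void main(String[] args) {
        // Toy model (made-up numbers): tag 0 = DET, tag 1 = NOUN;
        // observation 0 = "the", observation 1 = "dog".
        double[] init = logOf(0.8, 0.2);
        double[][] trans = {logOf(0.1, 0.9), logOf(0.5, 0.5)};
        double[][] emit = {logOf(0.9, 0.1), logOf(0.1, 0.9)};
        int[] tags = decode(init, trans, emit, new int[] {0, 1});
        System.out.println(Arrays.toString(tags)); // [0, 1], i.e. DET NOUN
    }
}
```

Working in log space turns products of probabilities into sums, which keeps long sentences from underflowing to zero.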
Week | Tasks |
---|---|
1 | Tagger: debug the C++ code with a C++ debugger and the Java code with a Java debugger, making sure the Java code does the same thing; fix errors in tagger.java. |
2 | Continuation of the previous week. |
3 | Parse .mode files using Java instead of a bash script. Write the documentation. |
4 | Understand each of Apertium's pipeline processes and what the data flowing between the processes means (deformatter, analyzer, tagger, pretransfer, transfer, interchunk, postchunk, generator, reformatter). |
5 | Port interchunk.cc and postchunk.cc. |
6 | Continue the previous week's work. Port pretransfer.cc to Java. |
7 | Regression testing/QA of pretransfer, interchunk and postchunk, making sure they are 100% compatible (testing the Java and C++ versions on a range of inputs). Fix bugs and document the tests. |
8 | Formatting: de/reformatter for text; deformatter for HTML. |
9 | Find the parts of the pipeline that can run in-process in Java, and run them in-process instead of spawning a subprocess. This will help to ship Apertium as a single executable file for mobile devices. |
10 | Optimization: find/decide on data structures with less overhead than the standard collection classes and use them. |
11 | Generate a custom .jar package per mode, so that a language pair can be shipped as a single JAR file with everything included. |
12 | Optimization: profile hot spots and remedy CPU-intensive tasks. Complete the documentation. |
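For week 3, a first step toward running a .mode file from Java is splitting the pipeline it stores into commands and arguments. The sketch below assumes the mode file holds a single shell pipeline with stages separated by `|` and quoted paths; the class name `ModeParser` and the sample line are hypothetical, and real mode files may use shell features this parser deliberately ignores.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: split a one-line .mode pipeline into stages and arguments.
 * Assumes a simplified shell subset: stages separated by '|', arguments
 * separated by whitespace, optionally wrapped in single or double quotes.
 * Escapes, redirections and variable expansion are NOT handled; a token
 * such as $2 is kept verbatim for the caller to substitute.
 */
public class ModeParser {

    public static List<List<String>> parse(String line) {
        List<List<String>> stages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        boolean inToken = false; // true once the current argument has started
        char quote = 0;          // 0 outside quotes, otherwise the open quote char

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;               // closing quote
                } else {
                    token.append(c);         // quoted text is taken literally
                }
            } else if (c == '\'' || c == '"') {
                quote = c;
                inToken = true;              // '' still yields an (empty) argument
            } else if (Character.isWhitespace(c) || c == '|') {
                if (inToken) {
                    current.add(token.toString());
                    token.setLength(0);
                    inToken = false;
                }
                if (c == '|' && !current.isEmpty()) {
                    stages.add(current);     // end of one stage; empty stages are skipped
                    current = new ArrayList<>();
                }
            } else {
                token.append(c);
                inToken = true;
            }
        }
        if (quote != 0) {
            throw new IllegalArgumentException("Unterminated quote in: " + line);
        }
        if (inToken) {
            current.add(token.toString());
        }
        if (!current.isEmpty()) {
            stages.add(current);
        }
        return stages;
    }

    public static void main(String[] args) {
        // Hypothetical mode line; real paths and stages depend on the language pair.
        String mode = "lt-proc '/usr/share/apertium/apertium-en-es/en-es.automorf.bin'"
                + " | apertium-tagger -g $2 '/usr/share/apertium/apertium-en-es/en-es.prob'"
                + " | apertium-pretransfer";
        for (List<String> stage : parse(mode)) {
            System.out.println(stage);
        }
    }
}
```

Each parsed stage could then be mapped either to an in-process Java class or, as a fallback, to a subprocess.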
I have prepared the above work plan from my present point of view; it may be updated over time.
List of qualifications and experience:
I have completed a two-year Academic Professional degree in IT and Communication Technology at TEC Copenhagen. I am currently in the fifth semester of IT and Communication Technology at Copenhagen University College of Engineering.
During my studies, I have learned programming languages such as Java, C and C++, and studied databases, data mining, data structures and algorithms, distributed systems, embedded systems (Linux), web applications, network protocols, network security and operating systems (Linux).
I have worked on several internal projects using Java and C, which makes me confident enough to work on this project.
Although this is the first time I am applying to an open source project, I believe my determination, knowledge and passion for open source work will help me accomplish the proposed project. I can also get help from the community, which makes completing this project realistic.
Any other activity than Summer-of-Code:
Because of credit transferred from my previous education, I have only two optional courses, whereas the normal load is four courses and a big project. Needing less time for school ensures that I can work on this project without interruption.