Difference between revisions of "User:Commial/GSoCApplication2011"

Revision as of 09:26, 8 April 2011

Email: camille.mougey@ensimag.fr

MOUGEY Camille

First Year ENSIMAG (Grenoble)

Address : mougeyc@ensimag.fr

IRC : commial/ajax

Website : [1]

Blog : [2]




Application for : APERTIUM : Improvements to postedition interface




Why is it you are interested in machine translation?

I am currently at a school with many exchange programmes, so I constantly see a mix of societies, manners and cultures, but the most “visible” mix is that of languages. It is necessary to understand everyone, and of course we don't have the time, or the inclination, to learn a language just for one job or one e-mail reply … Machine translation becomes necessary, and it has to work as well as possible and be as simple as possible to use.

In addition, machine translation has an aspect that most applications lack: a strong link with humans. Indeed, we ask the computer to mimic humans in their own domain, language.

But this is just my point of view; I think the people taking part in the project all have different and varied reasons :) .


Why is it that you are interested in the Apertium project?

I'm really enthusiastic about helping the community advance, because although I use open source software, I have never had the opportunity to take part in the adventure.

I chose this project for two main reasons:


-> I think that if we want a project to grow, it has to “touch” many people. Most people aren't accustomed to downloading, installing, configuring and using a shell to obtain a result. Currently, with the development of the cloud, most people want online services, accessible with just a click. That is why, for me, the Web Interface is very important and has to use all the power of the tool behind it, namely Apertium.

-> My skills are in web development, particularly PHP and Javascript (and their combination: AJAX). I really like developing with these languages, and for this reason I have accumulated experience in this kind of development.


Which of the published tasks are you interested in? What do you plan to do?

I want to apply for the Google Summer of Code project named “Improvements to the Advanced Web Interface”. Below is my plan for the summer.

Because of my school's end-of-year project, I would start on June 10.


Date Plan to do
Week 1 More work hours are planned to make up for lost weeks.


Port the code to all recent PHP versions; along the way, finish getting familiar with the code

For example:


               if($_FILES["in_doc"] AND !($_FILES["in_doc"]["error"] > 0))
               file_put_contents

becomes:

               if(isset($_FILES["in_doc"]) and !empty($_FILES["in_doc"]) ...
               fopen, fwrite, fclose


Rewrite the Javascript as separate modules so that it is easy to decide which tools to enable or disable in the interface

-> Build a dependency tree of the functions in the current libraries

-> Write more generic functions and procedures to provide a foundation for the modules

-> Build a user interface to enable/disable modules (list of modules, recommended modules with description, usage example)
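
As a rough sketch of how such a module system could work, the registry below tracks which modules are enabled and follows their dependencies. The module names (`core`, `spellcheck`, `translate`) and the registry API are assumptions made for illustration; they are not taken from the existing code.

```javascript
// Hypothetical module registry: names, fields and modules are illustrative.
function createRegistry() {
  const modules = new Map();

  return {
    // Declare a module, its description and the modules it depends on.
    register(name, { description = "", requires = [] } = {}) {
      modules.set(name, { description, requires, enabled: false });
    },

    // Enable a module together with everything it depends on.
    enable(name, visiting = new Set()) {
      const mod = modules.get(name);
      if (!mod) throw new Error("Unknown module: " + name);
      if (visiting.has(name)) throw new Error("Circular dependency at: " + name);
      visiting.add(name);
      mod.requires.forEach((dep) => this.enable(dep, visiting));
      visiting.delete(name);
      mod.enabled = true;
    },

    // Disable a module and every enabled module that depends on it.
    disable(name) {
      const mod = modules.get(name);
      if (!mod) throw new Error("Unknown module: " + name);
      mod.enabled = false;
      for (const [other, info] of modules) {
        if (info.enabled && info.requires.includes(name)) this.disable(other);
      }
    },

    // Sorted list of enabled module names, e.g. to build the settings page.
    enabled() {
      return [...modules].filter(([, m]) => m.enabled).map(([n]) => n).sort();
    },
  };
}

const registry = createRegistry();
registry.register("core", { description: "Shared helper functions" });
registry.register("spellcheck", { description: "Spell checking", requires: ["core"] });
registry.register("translate", { description: "Translation", requires: ["core"] });
registry.enable("spellcheck"); // also enables "core"
```

The list returned by `enabled()` could drive the settings page, while the `description` field would hold the text shown next to each checkbox.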


Week 2 More work hours are planned to make up for lost weeks.


Rewrite the language.php file as an abstract script, with interface modules for Apertium, Aspell and LanguageTool.

-> Separate the translation system and the environment management system

-> Make the translation system a PHP object, which is initialised with language pairs

-> Extend the environment management system to allow writing interface modules for Aspell and LanguageTool

-> Write these modules


Provide more formatting modules; currently only ODF, OOXML, html and text are supported. Mediawiki (using apertium-mediawiki) and others are wanted.

-> Add a module for Rich Text Format formatting (using Apertium's existing modules)

Week 3 More work hours are planned to make up for lost weeks.

(continued) -> Add a module for Mediawiki formatting (using Apertium's existing modules)

-> Test PDF formatting with pdf2html on a set of PDF files, and test the reconstruction step

-> If the tests are inconclusive, write the PDF module

-> Provide a module using cuneiform (which is able to recognize multiple languages and maintain a basic layout) for pictures. Its HTML export functionality can be used (jointly with the html module).


Localisation: make it possible to translate the interface into different languages.

-> The locale is determined by the browser, by the IP address, or set by the user

-> The choice is saved (cookies, ...)

-> The interface texts are loaded from files, which contain the text for every button, checkbox, ... in a specific order (to make parsing faster than with an XML format)

As has been said, people who write a language file should share it with the Apertium community.
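
The localisation steps above could be sketched as follows. This is only an illustration under stated assumptions: the key list `KEYS`, the function names and the one-text-per-line file layout are hypothetical, and detection by IP address is left out.

```javascript
// Pick the interface language: an explicit user choice (e.g. read back from
// a cookie) wins, then the browser's preferred languages, then a default.
function pickLocale(userChoice, browserLanguages, available, fallback) {
  const candidates = [userChoice, ...browserLanguages].filter(Boolean);
  for (const tag of candidates) {
    const base = tag.toLowerCase().split("-")[0]; // "fr-FR" -> "fr"
    if (available.includes(base)) return base;
  }
  return fallback;
}

// Hypothetical fixed order of interface strings: line i of a language file
// holds the text for KEYS[i], so no XML parsing is needed.
const KEYS = ["translateButton", "spellcheckCheckbox", "uploadLabel"];

function parseLanguageFile(content) {
  const lines = content.split(/\r?\n/);
  const texts = {};
  KEYS.forEach((key, i) => {
    // Fall back to the key itself when a translation line is missing.
    texts[key] = lines[i] !== undefined && lines[i] !== "" ? lines[i] : key;
  });
  return texts;
}

const locale = pickLocale(null, ["fr-FR", "en"], ["en", "fr"], "en");
const texts = parseLanguageFile("Traduire\nVérifier l'orthographe");
```

In a browser, the second argument would typically come from `navigator.languages`. Falling back to the key name keeps the interface usable when a contributed language file is incomplete.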


Improve overall design

-> The current design is very basic (or non-existent)

-> The Apertium website will move to WordPress, so its overall design can be reused

-> If the website isn't on WordPress yet, make a design similar to the current Apertium website (in colour, simplicity, ...)

Fix possibly remaining bugs

-> This is a long-term task, but at this stage I want to run a complete test of the system

-> It includes:

- Fix remaining bugs

Week 4 More work hours are planned to make up for lost weeks.

(continued)

- Make some test sets

- Optimize code

- Fix security issues


Make it possible to input a TMX to help with a translation (either with Apertium's TMX input system, or an external tool like OmegaT)

-> Ask Sergio Ortiz about the integration of TMX to identify and translate segments from a translation memory

-> Implement it

Use the existing server-side TMX database, so that the memory generated after a translation is stored and reused automatically for subsequent translations in this language pair.

(It might be wise to add some kind of validation too, to make sure that people don't mess with the whole system by submitting wrong translations...)


The main idea is to allow the user to provide or alter current translations and export them to TMX format. At the same time, these modifications are saved in a server-side TMX database, which contains two kinds of dictionaries per language:

- Translations submitted and awaiting approval

- Approved translations

-> Add it to the user interface simply as a checkbox "Reuse old translations to improve translation" and another one "Share translation results" (this choice is important because of confidentiality issues with content submitted to the engine)

-> Perhaps this is a way to use the logging system, possibly editing it so that it can detect what the user changes in the translation (as Igor Chtivelband said).

It seems necessary to save the context too.
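
The two kinds of dictionaries could behave roughly as in the sketch below. It is an in-memory stand-in, not the real TMX database: the function names, the `"en-es"` pair key and the sample segment are assumptions, and the stored context mentioned above is omitted for brevity.

```javascript
// Hypothetical in-memory stand-in for the server-side TMX database.
function createMemory() {
  const store = new Map(); // "en-es" -> { pending: Map, approved: Map }

  function dicts(pair) {
    if (!store.has(pair)) {
      store.set(pair, { pending: new Map(), approved: new Map() });
    }
    return store.get(pair);
  }

  return {
    // A user correction first lands in the "awaiting approval" dictionary.
    submit(pair, source, target) {
      dicts(pair).pending.set(source, target);
    },
    // Validation step: move a submitted segment to the approved dictionary.
    approve(pair, source) {
      const { pending, approved } = dicts(pair);
      if (!pending.has(source)) return false;
      approved.set(source, pending.get(source));
      pending.delete(source);
      return true;
    },
    // Only approved segments are reused for later translations.
    lookup(pair, source) {
      const hit = dicts(pair).approved.get(source);
      return hit === undefined ? null : hit;
    },
  };
}

const memory = createMemory();
memory.submit("en-es", "Good morning", "Buenos días");
const beforeApproval = memory.lookup("en-es", "Good morning"); // not reused yet
memory.approve("en-es", "Good morning");
const afterApproval = memory.lookup("en-es", "Good morning");
```

Keeping submitted segments out of `lookup` until they are approved is one way to stop wrong submissions from affecting other users' translations.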

Week 5 (continue and finish)

According to Jimmy O'Regan, this seems to be a difficult task. Time is needed.

Again, fix any remaining bugs to have a good foundation for what follows.

July 15 Mid Evaluation
Week 6

Provide a module to use the apertium.org web service instead of a local Apertium installation

-> Build a bridge between the apertium.org website and the local Apertium installation:

- perhaps by parsing the translation result page with regular expressions (simulating the user's input on the website and analysing the result), but this is expensive

- another option is to use an API for the apertium.org web service, which would avoid the step of analysing the returned page

To make my explanation easier to understand, here is an example:

To translate "Test" on the website, you have to build an HTTP POST request

The server returns an HTML file, so you have to search for the text between > and </textarea><br/><label for="mark"> (perhaps with a more precise search).

But if an API is written, it allows the script to simply read the server response, because it will contain only the translation, without HTML formatting.

-> Add the possibility for the user to use this service, in the interface
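
The page-parsing option could look like the sketch below. The sample page is invented for illustration and is not the real apertium.org markup; only the `</textarea><br/><label for="mark">` marker comes from the description above.

```javascript
// Extract the text of the <textarea> that is directly followed by the
// <br/><label for="mark"> marker. HTML entities are not decoded here.
function extractTranslation(html) {
  const pattern =
    /<textarea[^>]*>([^<]*)<\/textarea>\s*<br\s*\/?>\s*<label for="mark">/;
  const match = html.match(pattern);
  return match ? match[1] : null;
}

// Invented sample response: an input box, then the result box.
const samplePage =
  '<form><textarea name="in">Test</textarea><p>Result:</p>' +
  '<textarea name="out">Prueba</textarea><br/>' +
  '<label for="mark">Mark unknown words</label></form>';

const translation = extractTranslation(samplePage);
```

Any change to the page layout would break this pattern, which is one more argument for the API option, where the response body is already the translation.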


Make it faster and cross-browser compatible

-> Provide libraries, and add them to the existing JS modules, to adapt the system to recent browsers. We can go down to IE 6.

-> Make it faster by analysing the time-critical path

-> Make the libraries download faster by reducing their size (as Google does for jQuery)

Week 7

(continue and finish)

This is a long task and needs time.

Week 8

Improve integration with Wikipedia: it should be able to fetch pages, translate them, allow them to be revised and then publish them.

-> Integrate a module to recognize the Wikipedia format

-> Use WikiBhasha to fetch pages, and post them


Provide modules to integrate alternative tools for spell and grammar checking (AfterTheDeadline, etc.)

-> Produce outputs during the translation process, which can be redirected to other tools

-> Define the possible interactions with the software

-> Implement these interactions

Week 9

(continue and finish)

Fix new or remaining bugs

-> Stand back

Week 10

(continue and finish)

August 16

Suggested 'pencils down' date. Take time to scrub code, write tests, improve documentation, etc.

August 26

End



List your skills and give evidence of your qualifications

Currently, I'm in the first year at ENSIMAG ( [3] ), a school of computer engineering and applied mathematics.

I have already developed some website projects (like [4], [5], [6], [7]) and carried out some web security audits (like the CMS kwsphp: [8]).

I have development skills in PHP, HTML/CSS, Javascript/AJAX, C, Python and Ada, and a grounding in algorithmics (such as cost evaluation, data structures, ...).


List any non-Summer-of-Code plans you have for the Summer

Apart from the end-of-school project (mentioned above) and a 4-day trip with friends in August, nothing is planned for this summer.