User:Commial/AWI

From Apertium

Apertium: Improvements to the post-edition interface (Google Summer of Code 2011)



Jobs already done

Port the code to all recent PHP versions

I have replaced all calls to file_put_contents with fopen, fwrite and fclose, which are also supported by older PHP versions. (Commit 30690)
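A minimal sketch of what such a replacement can look like, assuming a helper named write_file() (hypothetical, not the AWI's actual function) that reproduces the basic behaviour of file_put_contents() using only fopen, fwrite and fclose:

```php
<?php
// Hypothetical helper: a stand-in for file_put_contents() built only on
// fopen/fwrite/fclose, which older PHP versions also provide.
function write_file($path, $data)
{
    $handle = fopen($path, 'wb');   // 'b': no newline translation on Windows
    if ($handle === false) {
        return false;               // the file could not be opened for writing
    }
    $written = fwrite($handle, $data);
    fclose($handle);
    // fwrite() can fail or write fewer bytes than requested
    return $written === strlen($data);
}

$path = sys_get_temp_dir() . '/awi_write_test.txt';
write_file($path, "Hello AWI\n");
```

Checking the byte count, rather than only `!== false`, also catches partial writes (for example on a full disk).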

Rewrite the PHP & JavaScript as separate modules so that it is easy to decide which tools to enable or disable in the interface

The file "modules.php" is designed to manage modules. It includes:

Configuration of modules:

Module structure:
* name: the module's name
* description: a short description of the module
* default: whether the module is recommended
* javascript: an array of JavaScript dependencies (located in javascript/)
* php: an array of PHP dependencies (located in includes/)
* button_in: HTML code added to the interface, under the input, when the module is activated
* button_out: HTML code added to the interface, under the output, when the module is activated

For example, the 'SpellGrammarChecking' module:

```php
'SpellGrammarChecking' => array(
    'name' => 'Spell and Grammar checking',
    'description' => 'Integrate the ability to check both input and output texts for mistakes, with a button “Check for mistakes”.
When pressed, it runs spell checking and grammar checking on the text and underlines mistakes in different colours (red for spelling, blue for grammar).',
    'default' => TRUE,
    'javascript' => array('textEditor.js', 'main.js'),
    'php' => array('gramproof.php', 'strings.php', 'system.php'),
    'button_in' => '<input type="submit" name="check_input" value="Check for mistakes" />',
    'button_out' => '<input type="submit" name="check_output" value="Check for mistakes" />'
),
```

It also includes functions for module management:

* Check the dependencies of a module
* Load the PHP dependencies of a module
* Write the interface of a module
* Load/unload a module
* List the loaded modules and the recommended modules
* Write the list of JavaScript libraries needed by the loaded modules
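Two of these helpers can be sketched as follows. The function names, the $base_dir parameter and the 'Logs' entry are hypothetical illustrations; only the module layout ('javascript' files under javascript/, 'php' files under includes/) comes from modules.php as described above.

```php
<?php
// Hypothetical module table following the modules.php layout; the 'Logs'
// entry is made up for illustration.
$modules = array(
    'SpellGrammarChecking' => array(
        'javascript' => array('textEditor.js', 'main.js'),
        'php'        => array('gramproof.php', 'strings.php', 'system.php'),
    ),
    'Logs' => array(
        'javascript' => array('logs.js', 'main.js'),
        'php'        => array(),
    ),
);

// Return the dependency files of one module that do not exist on disk.
function missing_dependencies(array $module, $base_dir)
{
    $missing = array();
    foreach ($module['javascript'] as $file) {
        if (!is_file($base_dir . '/javascript/' . $file)) {
            $missing[] = 'javascript/' . $file;
        }
    }
    foreach ($module['php'] as $file) {
        if (!is_file($base_dir . '/includes/' . $file)) {
            $missing[] = 'includes/' . $file;
        }
    }
    return $missing;
}

// Emit one <script> tag per JavaScript file needed by the loaded modules,
// listing a file shared by several modules only once.
function script_tags(array $modules, array $loaded)
{
    $files = array();
    foreach ($loaded as $name) {
        foreach ($modules[$name]['javascript'] as $file) {
            $files[$file] = true;   // array keys remove duplicates, keep order
        }
    }
    $html = '';
    foreach (array_keys($files) as $file) {
        $html .= '<script type="text/javascript" src="javascript/' . $file . '"></script>' . "\n";
    }
    return $html;
}

echo script_tags($modules, array('SpellGrammarChecking', 'Logs'));
```

Using the file names as array keys is a cheap way to avoid loading a shared library such as main.js twice.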

How I see the current separation of modules (PHP files): Apertium-AWI Organisation 06.15.jpg

The current interface for loading/unloading modules (Firefox 4, Ubuntu): Apertium-AWI ModulesInterface 06.15.jpg

(Commit 30719) includes:

* Creation of the module structure
* Creation of the content and form to easily load/unload modules
* Split existing files into independent modules (the 'Logs' module isn't independent yet)
* Modification of the interface display to allow buttons to be added/removed
* A few changes to make the code more readable (removing repetitive code)

(Commit 30724) includes:

* Creation of gramproof.js, which contains the JavaScript functions used by the Spell and Grammar checking module
* Creation of logs.js, which contains the JavaScript functions used by the Logs module
* Some functions from textEditor.js are now in gramproof.js and logs.js
* The functions of nodes.js, paste_event.js, logging_lowlevel.js and logging.js are now in logs.js (fewer browser requests)
* The Logs module is now independent

How I see the current separation of modules in JavaScript: Apertium-AWI ModulesInterface 06.16.jpg

(Commit 30738) includes:

* Correction of an error in gramproof.js: the function displaySuggestionsList was defined twice.

(Commit 30792) includes:

* Creation of main.tmpl and textEditor.tmpl, which are template files for main.js and textEditor.js
* Update index.php to regenerate main.js and textEditor.js when modules are loaded/unloaded
* Add the function gen_templateJS(source, target), which rewrites a file from a template, in modules.php
* Move the "is_installed" function, which belongs to the environment, not the translation core

An example of how the gen_templateJS function works: Apertium-AWI ModulesInterface 06.19.jpg
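The template format itself is not described here, so the following is only a hypothetical sketch of the idea: it assumes module-specific sections of a template are wrapped in /*@module Name*/ ... /*@end*/ markers and that gen_templateJS receives the list of loaded modules as an extra parameter. Neither the marker syntax nor the render_js_template() helper comes from the AWI.

```php
<?php
// Hypothetical marker syntax: /*@module Name*/ ... /*@end*/.
// Keep the sections of loaded modules (without markers), drop the others.
function render_js_template($template, array $loaded)
{
    return preg_replace_callback(
        '#/\*@module (\w+)\*/(.*?)/\*@end\*/#s',
        function ($m) use ($loaded) {
            return in_array($m[1], $loaded, true) ? $m[2] : '';
        },
        $template
    );
}

// Read the template (e.g. main.tmpl) and write the generated file (e.g. main.js).
// The $loaded parameter is an assumption of this sketch.
function gen_templateJS($source, $target, array $loaded)
{
    $template = file_get_contents($source);
    if ($template === false) {
        return false;
    }
    $handle = fopen($target, 'wb');
    if ($handle === false) {
        return false;
    }
    $js = render_js_template($template, $loaded);
    $ok = fwrite($handle, $js) === strlen($js);
    fclose($handle);
    return $ok;
}
```

Separating the pure string transformation from the file I/O keeps the template logic easy to test without touching the javascript/ directory.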

(Commit 30819) includes:

* Modules to load are now set by the webmaster in config.php
* The home page shows which modules are loaded, with their descriptions
* The generation of main.js and textEditor.js has to be launched by the webmaster, by uncommenting a region in modules.php
* Loaded modules are kept in the global variable $modules_load instead of in sessions

(Commit 30925) includes:

* translate.php now uses a global variable for the PHP translate object ($trans) instead of sessions

All modules are now separate, and it's easy to add, load/unload, alter and remove modules.

Rewrite the language.php file as an abstract script, and write interface modules for Apertium, Aspell and LanguageTool

"language.php" has been split into 2 parts: environment management and the translation system.

The translation system is managed by a PHP object named "translate", defined in includes/translation.php. It is instantiated in language.php.

How it works: Apertium-AWI translate Object 06.20.jpg

The link with external tools such as LanguageTool, Aspell and After the Deadline is now made through an object named "externtool", defined in includes/externaltool.php.

In includes/config.php, the webmaster can choose which tool to use for spell checking and for grammar proofing. No other change is needed: everything is managed by the object, which provides the generic functions SpellChecking and GrammarProofing.

It is instantiated in gramproof.php (to preserve modularity).

How it works: Apertium-AWI externtool Object.jpg
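The dispatch idea behind externtool can be sketched like this. The configuration keys ('spelling', 'grammar') and the stub backends are hypothetical stand-ins for the real settings in includes/config.php and the real calls to Aspell, LanguageTool or After the Deadline; only the method names SpellChecking and GrammarProofing come from the description above.

```php
<?php
// Sketch of the externtool idea: callers only use SpellChecking() and
// GrammarProofing(); the tool behind each one is picked from configuration.
// Config keys and backends are hypothetical.
class externtool
{
    private $spell;
    private $grammar;

    public function __construct(array $config, array $backends)
    {
        if (!isset($backends[$config['spelling']], $backends[$config['grammar']])) {
            throw new InvalidArgumentException('Unknown external tool in configuration');
        }
        $this->spell   = $backends[$config['spelling']];
        $this->grammar = $backends[$config['grammar']];
    }

    // Generic entry points: the caller never names the underlying tool.
    public function SpellChecking($text, $lang)
    {
        return call_user_func($this->spell, $text, $lang);
    }

    public function GrammarProofing($text, $lang)
    {
        return call_user_func($this->grammar, $text, $lang);
    }
}

// Stub backends that only report which tool was called.
$backends = array(
    'aspell'       => function ($text, $lang) { return array('aspell', $lang); },
    'LanguageTool' => function ($text, $lang) { return array('LanguageTool', $lang); },
);
$tool = new externtool(array('spelling' => 'aspell', 'grammar' => 'LanguageTool'), $backends);
$result = $tool->GrammarProofing('I has probl', 'en');
```

Failing early in the constructor on an unknown tool name surfaces configuration mistakes at setup time instead of in the middle of a check.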

(Commit 30827) includes:

* Split language.php into 2 parts: environment management and translation system
* Create a PHP object named translate to manage the translation system
* Create a PHP object named externtool to manage the link with external tools, and provide the generic functions SpellChecking and GrammarProofing
* Modify translate.php and ajax.php to use the translate object
* Modify gramproof.php to use the externtool object
* Modify config.php to include the choice of external tool

(Commit 30903) includes:

* After the Deadline is now supported through functions in the externtool object

The functions getATDresult, analyseATDresult and getATDCorrection allow the externtool object to communicate with After the Deadline.

The steps are:

- Get spelling or grammar errors from a local ATD server or from the online server (service.afterthedeadline.com, set in config.php: $ATD_link), in one of the languages en, fr, pt, de, es (defined in externaltool.php: $ATDsupport_lang), or en by default.
- Analyse the result (as an XML object) to build an object correctly formed for the gramproof module functions. This means detecting the suggestions, the text to replace, the description, and the start and end of the string in the text (this last part could probably be done more rigorously, but I don't know how).
- Pass the result to the object output, and thus to the gramproof module.
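The analysis step can be sketched with SimpleXML. The function name and the output keys are hypothetical, and the element names (&lt;error&gt;, &lt;string&gt;, &lt;description&gt;, &lt;suggestions&gt;&lt;option&gt;) are assumed from the After the Deadline XML response format; check them against what the server actually returns. Like the approach described above, it locates the error string with a plain search, so a word that also appears earlier in the text will be mapped to its first occurrence.

```php
<?php
// Hypothetical analyse step: turn an ATD XML response into an array for the
// gramproof module. Element names are assumed from the ATD XML format.
function analyse_atd_xml($xml, $text)
{
    $results = simplexml_load_string($xml);
    if ($results === false) {
        return array();   // not valid XML
    }
    $errors = array();
    foreach ($results->error as $error) {
        $string = (string) $error->string;
        $suggestions = array();
        if (isset($error->suggestions)) {
            foreach ($error->suggestions->option as $option) {
                $suggestions[] = (string) $option;
            }
        }
        // ATD gives no offsets, so search for the string (first occurrence only).
        $start = strpos($text, $string);
        if ($start === false) {
            continue;     // string not found in the text: skip it
        }
        $errors[] = array(
            'text'        => $string,
            'description' => (string) $error->description,
            'suggestions' => $suggestions,
            'start'       => $start,
            'end'         => $start + strlen($string),
        );
    }
    return $errors;
}

$sample = '<results><error><string>probl</string><description>Spelling</description>'
        . '<precontext>has</precontext><suggestions><option>problem</option>'
        . '<option>probe</option></suggestions><type>spelling</type></error></results>';
$errors = analyse_atd_xml($sample, 'I has probl');
```

The &lt;precontext&gt; element could be used to disambiguate repeated words, which is one possible way to make the start/end detection more rigorous.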

(Commit 30990) includes:

* Split spell checking and grammar proofing into spelling.php and grammar.php
* Create 2 different abstract PHP classes, named Spelling and Grammar
* Write the classes using LanguageTool (grammar_LT.php), Aspell (spelling_aspell.php) and After the Deadline (gramproof_ATD.php)
* Remove the externtool object, and use these new objects instead in gramproof.php

How it works now: Apertium-AWI grammar-spelling Object 06 23.jpg
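A sketch of how abstract Spelling and Grammar classes can fix the interface the gramproof module relies on, with one subclass per tool. The check() method, the shape of the error arrays, the toy subclass grammar_Toy and the make_checker() factory are all hypothetical; the real subclasses live in grammar_LT.php, spelling_aspell.php and gramproof_ATD.php.

```php
<?php
// Hypothetical interfaces: each returns a list of errors with
// 'start', 'end', 'description' and 'suggestions'.
abstract class Grammar
{
    abstract public function check($text, $lang);
}

abstract class Spelling
{
    abstract public function check($text, $lang);
}

// Toy grammar checker standing in for a real tool such as LanguageTool.
class grammar_Toy extends Grammar
{
    public function check($text, $lang)
    {
        $errors = array();
        $pos = strpos($text, 'I has');
        if ($pos !== false) {
            $errors[] = array(
                'start'       => $pos,
                'end'         => $pos + strlen('I has'),
                'description' => 'Subject-verb agreement',
                'suggestions' => array('I have'),
            );
        }
        return $errors;
    }
}

// Instantiate the class named in the configuration, checking that it really
// extends the expected abstract class.
function make_checker($class, $base)
{
    if (!class_exists($class) || !is_subclass_of($class, $base)) {
        throw new InvalidArgumentException("$class does not extend $base");
    }
    return new $class();
}

$grammar = make_checker('grammar_Toy', 'Grammar');
$found = $grammar->check('I has probl', 'en');
```

Adding support for another program then only means writing one more subclass, which matches the "easily done" claim below.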

If you want to try it, a good example is "I has probl", which should produce one grammar mistake and one spelling mistake. You can try aspell-LanguageTool, ATD-LanguageTool, ATD-ATD or aspell-ATD to compare the programs' results.

This part can be considered finished, because the AWI is now able to use different external tools, such as Aspell, LanguageTool and After the Deadline. In addition, an interface with other programs can be added "easily" by writing the necessary functions in new classes. The same goes for translation: we could easily use a program other than Apertium by modifying the translate object... but that is not the purpose here.

Provide more formatting modules; currently only ODF, OOXML, HTML and text are supported. MediaWiki (using apertium-mediawiki) and others are wanted.

(Commit 30984) includes:

* Fix the end of some PHP files (?>\n => ?>)
* Fix the markers '.[]' in format.php
* Fix the markers '[\n]'

Support for RTF files has been added.

Line breaks in files are now handled correctly.

Current file processing: Apertium-AWI file treatment 06 32.jpg

(Commit 30998) includes:

* Add support for the MediaWiki format (with extension .mediawiki)
* Display the supported extensions on the home page (index.php)
* The list of supported extensions is set in config.php

(Commit 31002) includes:

* Add support for the PDF format, using pdftohtml and wkhtmltopdf

PDF handling:

Apertium-AWI pdf management 06 24.png

What it looks like in the AWI (Firefox 4 on Ubuntu): Apertium-AWI AWIforPDF 06 24.png
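A sketch of the two command lines involved in the PDF round trip (PDF to HTML with pdftohtml, translation of the HTML, then HTML back to PDF with wkhtmltopdf). The function names are hypothetical, and the pdftohtml flags are an assumption (-noframes for a single HTML file, -stdout to print it); check `pdftohtml -h` on the server. Every path goes through escapeshellarg() before reaching the shell.

```php
<?php
// Hypothetical command builders for the PDF round trip.
function pdf_to_html_command($pdf)
{
    // -noframes: one HTML file; -stdout: write it to standard output (assumed flags)
    return 'pdftohtml -noframes -stdout ' . escapeshellarg($pdf);
}

function html_to_pdf_command($html, $pdf)
{
    // wkhtmltopdf takes the input HTML and the output PDF as arguments
    return 'wkhtmltopdf ' . escapeshellarg($html) . ' ' . escapeshellarg($pdf);
}

$to_html = pdf_to_html_command('/tmp/report 1.pdf');
// e.g. $html = shell_exec($to_html); then translate $html and convert it back
```

Quoting matters here because uploaded file names can contain spaces or shell metacharacters.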

(Commit 31010) includes:

* The "temp\/APRE..." string, which came from the HTML document title (<TITLE></TITLE>) set automatically by pdftohtml, has been removed.

(Commit 31107) includes:

* Add handling of image files, using Tesseract for OCR and 'convert' for picture conversion (set in includes/config.php).

(Commit 31109) includes:

* Update the list of supported extensions (add jpeg, jpg, tif, tiff, png).

Picture handling:

Apertium-AWI pictures management 06 27.png
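A sketch of the two commands for picture handling, assuming ImageMagick's `convert` turns the upload into TIFF (older Tesseract builds read TIFF most reliably) and that Tesseract is called as `tesseract image outputbase -l lang`, which writes outputbase.txt. The function name, file names and default language are hypothetical.

```php
<?php
// Hypothetical builder for the OCR pipeline: convert the picture to TIFF,
// then run Tesseract on it. Tesseract appends ".txt" to the output base name.
function ocr_commands($image, $workdir, $lang = 'eng')
{
    $tif  = $workdir . '/ocr_input.tif';
    $base = $workdir . '/ocr_output';   // text ends up in ocr_output.txt
    return array(
        'convert ' . escapeshellarg($image) . ' ' . escapeshellarg($tif),
        'tesseract ' . escapeshellarg($tif) . ' ' . escapeshellarg($base)
            . ' -l ' . escapeshellarg($lang),
    );
}

$commands = ocr_commands('/tmp/scan.png', '/tmp');
// each command would then be run in order, e.g. with exec()
```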

Next steps: try to use the HTML export functionality (together with the HTML module)? Integrate file processing for users with JavaScript disabled?

Localisation: make it possible to translate the interface into different languages

(Commit 31121) includes:

* Allow the user to change the interface language
* Set up the system to manage this change
* Set up the system to integrate these changes with modules
* Change the index page so that it uses translations
* Write the English and French translations for the index page
* Write the English and French translations for the modules FormattedDocumentHandling and SpellGrammarChecking
* Set up the system to detect and remember the user's language choice
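Language detection can be sketched as: use the user's explicit choice if there is one, otherwise the best match from the browser's Accept-Language header, otherwise English. The function name and parameters are hypothetical, $available would come from templates/avalaible_languages, only two-letter primary tags are considered, and remembering the choice (for example in a session or cookie) is left out.

```php
<?php
// Hypothetical language chooser: explicit choice, then Accept-Language, then default.
function choose_language($user_choice, $accept_language, array $available, $default = 'en')
{
    if ($user_choice !== null && in_array($user_choice, $available, true)) {
        return $user_choice;
    }
    if ($accept_language === null || $accept_language === '') {
        return $default;   // the header can be missing entirely
    }
    // "fr-FR,fr;q=0.9,en;q=0.8" -> array('fr' => 1.0, 'en' => 0.8)
    $ranked = array();
    foreach (explode(',', $accept_language) as $entry) {
        $pieces = explode(';', trim($entry));
        $lang = strtolower(substr(trim($pieces[0]), 0, 2));
        $q = 1.0;
        if (isset($pieces[1]) && preg_match('/q=([0-9.]+)/', $pieces[1], $m)) {
            $q = (float) $m[1];
        }
        if (!isset($ranked[$lang]) || $ranked[$lang] < $q) {
            $ranked[$lang] = $q;   // keep the best quality seen for a language
        }
    }
    arsort($ranked);   // highest quality first
    foreach (array_keys($ranked) as $lang) {
        if (in_array($lang, $available, true)) {
            return $lang;
        }
    }
    return $default;
}

$header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : null;
$interface_lang = choose_language(null, $header, array('en', 'fr'));
```

Guarding the header with isset() avoids the "HTTP_ACCEPT_LANGUAGE can be undefined" problem fixed later in includes/template.php.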

(Commit 31163) includes:

* Add interface translation support for all current modules
* Add interface translation support for translate.php
* Write the French translation of all current modules
* Write the French translation of translate.php

How it is organised: Apertium-AWI Interface multi-languages 06 28.png


README content:

Apertium - AWI: multi-language interface

Contributed by Mougey Camille <commial@gmail.com>

for Google Summer of Code 2011

Mentors: Arnaud Vié, Luis Villarejo


Summary

I/ Directory Organisation

II/ How to manage interface translation



I/ Directory Organisation



In the root (./templates/), there is this file (README) and the file "avalaible_languages".

The README explains how the directory is organised and how to manage interface translation.

The "avalaible_languages" file lists the available languages, i.e. the languages for which a translation exists.


For each available language, there is a directory named after the language. This directory contains the following files:

index - Contains the translation for the home page, in a specific order

translate - Contains the translation for the translate page, in a specific order

module_Name - Contains the translation for the module Name, in a specific order; e.g. the file for the Logs module is module_Logs


The correspondence between the order in the file and the meaning of each line is written in the file ../includes/formats.php: function retrieve_info

For example, a module file is organised as:

name

description

text for button_in

text for button_out



II/ How to manage interface translation



The text is displayed in a page by calling the function get_text(page, information requested) or write_text(page, information requested). For example, the title of the home page is obtained with

get_text('index', 'title');

If you want to add text to a page, or add a page, you have to write the correspondence in the file format.php.

In this example, the correspondence is

index->title : first line of the file index
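A sketch of how get_text() and write_text() can work on these one-string-per-line files. Only "index->title is the first line" and the module order (name, description, button_in, button_out) come from this README; the rest of the correspondence table, the text_correspondence() helper and the extra $lang and $dir parameters are hypothetical, and escaping in write_text() is a choice made for the sketch.

```php
<?php
// Hypothetical correspondence table (formats.php plays this role in the AWI):
// information name -> line number (0-based) in the translation file.
function text_correspondence()
{
    return array(
        'index'  => array('title' => 0),
        'module' => array('name' => 0, 'description' => 1,
                          'button_in' => 2, 'button_out' => 3),
    );
}

function get_text($page, $info, $lang = 'en', $dir = 'templates')
{
    $table = text_correspondence();
    // Every module_<Name> file shares the same 'module' layout.
    $layout = (strpos($page, 'module_') === 0) ? 'module' : $page;
    $file = $dir . '/' . $lang . '/' . $page;
    if (!isset($table[$layout][$info]) || !is_readable($file)) {
        return '';
    }
    $lines = file($file, FILE_IGNORE_NEW_LINES);
    $n = $table[$layout][$info];
    return isset($lines[$n]) ? $lines[$n] : '';
}

// Print the text directly (escaped for HTML in this sketch).
function write_text($page, $info, $lang = 'en', $dir = 'templates')
{
    echo htmlspecialchars(get_text($page, $info, $lang, $dir), ENT_QUOTES, 'UTF-8');
}
```

With this layout, changing the description of the Logs module means editing line 2 of en/module_Logs, exactly as described below.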


If you want to add a language, for example Spanish, you have to write all the corresponding files containing the translation in a directory es/, i.e.

es/index, es/translate, es/module_Logs, ...

and add the name of the language to ./avalaible_languages.


To modify a language, just open the appropriate file, write your own translation, and save.

If you want to modify the description of the Logs module, open en/module_Logs and change the second line.


Do not forget to commit your changes, so that the Apertium community can use your work.

For any problem or question, you can contact me, or go to the Apertium IRC channel (#apertium).

(Commit 31164) includes:

* The README file explains how the directory is organised, and how to set and change the current translations.

New languages should be added, but my level in languages other than French isn't good enough.

Improve overall design

(Commit 31204) includes:

* Modify the current design of the index.php and translate.php pages

The Apertium website will move to WordPress, so the overall design will be taken from it. For now, I have made a design similar to the current Apertium website (colours, simplicity, ...).

Index page: Apertium-AWI Index 06 29.png

Translate page: Apertium-AWI Translate 06 29.png

(Older version)


Current Jobs

Fix any remaining bugs and optimize

(Commit 31206) includes:

* In system.php:75, out1 and out2 were undefined
* In translate.php:260,261,262, input_doc, input_doc_type and input_doc_name were undefined
* Resize images/yes.png and images/no.png from 800x800 to 40x40 to speed up page loading (saves 2 x 44.7 KB; Google Page Speed score 58/100 -> 87/100)
* Reorder the scripts so that CSS and JavaScript can be loaded in parallel (CSS has to come before JavaScript)
* Correctly set the label of the <select> in includes/template.php, so that Chrome shows 'fr', 'en' rather than 'new_lang'
* In includes/template.php, handle the case where HTTP_ACCEPT_LANGUAGE is undefined
* The UTF-8 charset was not correctly declared (bad meta tags)
* The onChange attribute has to be on the <select>, not the <form>, and has to be written onchange (lowercase)
* The style 'language_select' is used twice, so 'class' is used instead of 'id'
* The <center> tag is deprecated and not allowed in XHTML 1.0 Strict, so style='text-align: center;' is used instead

(Commit 31212) includes:

* Fix an infinite loop in logs.js:1150

(Commit 31234) includes:

* Add a script to set up the AWI (compress/uncompress CSS/JS files, generate JS files from templates, delete itself)
* Saves around 30 KB of data to download and makes parsing faster (translate.php: 99/100 on Google Page Speed)

Description of publish.php : Apertium-AWI Publish script 06 30.png

(Commit 31239) includes:

* Make index.php W3C valid
* Losslessly compress images/yes.png and images/no.png

(Commit 31242) includes:

* If the server has register_globals enabled, a malicious user could get shell access by injecting source_language or target_language, which flow through checkforMistakes, then GetGrammarCorrection or GetSpellCorrection, and finally executeCommand. Add some validation in includes/gramproof.php and includes/translation.php
* Sending a POST request to translate.php could also give shell access. Fix this issue.
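A sketch of the kind of validation that closes this hole: read the values explicitly from $_POST (never from globals), accept only values that look like language codes and belong to a known list, and still quote them with escapeshellarg() before building a command. The function name, the pattern and the $allowed list are hypothetical; `apertium <pair>` is the usual Apertium command-line form.

```php
<?php
// Hypothetical validator: returns the code if it is acceptable, null otherwise.
function safe_language($value, array $allowed)
{
    if (!is_string($value) || !preg_match('/^[a-z]{2,3}(_[A-Za-z]{2,4})?$/', $value)) {
        return null;   // not shaped like a language code: reject
    }
    return in_array($value, $allowed, true) ? $value : null;
}

$allowed = array('en', 'es', 'fr', 'ca');
// Read request data explicitly instead of relying on register_globals.
$source = isset($_POST['source_language']) ? $_POST['source_language'] : '';
$target = isset($_POST['target_language']) ? $_POST['target_language'] : '';
$src = safe_language($source, $allowed);
$dst = safe_language($target, $allowed);
if ($src !== null && $dst !== null) {
    // Quote anyway, as a second line of defence.
    $command = 'apertium ' . escapeshellarg($src . '-' . $dst);
}
```

A whitelist is safer than trying to strip dangerous characters, because anything not explicitly expected is refused.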

(Commit 31271) includes:

* Fix a bad filter in checkforMistakes (includes/gramproof.php)

(Commit 31274) includes:

* Fix a bad dependency for textEditor.js

(Commit 31280) includes:

* Fix the XSS issue in ajax.php
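The usual fix for this kind of reflected XSS is to escape any user-supplied text before ajax.php sends it back to be inserted into the page. A minimal sketch, with a hypothetical helper name around the standard htmlspecialchars() function:

```php
<?php
// Hypothetical helper: escape text for safe insertion into HTML.
// ENT_QUOTES also escapes single and double quotes (useful inside attributes).
function escape_html($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo escape_html('<script>alert("x")</script>');
// prints &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```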

(Commit 31377) includes:

* Fix the checkForMistakes error with hr tags. The input_text is now split into text1(hr tag)text2 ... and text1, text2 are processed separately.
* Modify the security test in ajax.php (one case was handled twice)
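Splitting the input on hr tags can be sketched with preg_split(), keeping the offset of each piece so that error positions found in a piece can be mapped back to the full text. The function name is hypothetical, and the pattern accepts any case, attributes and the self-closing form.

```php
<?php
// Hypothetical splitter: returns the non-empty pieces between <hr> tags,
// each with its byte offset in the original input.
function split_on_hr($input)
{
    $parts = preg_split('#<hr\b[^>]*>#i', $input, -1, PREG_SPLIT_OFFSET_CAPTURE);
    $segments = array();
    foreach ($parts as $part) {
        list($text, $offset) = $part;
        if (trim($text) !== '') {
            $segments[] = array('text' => $text, 'offset' => $offset);
        }
    }
    return $segments;
}

$segments = split_on_hr('I has probl<hr/>Second text<HR>');
// each segment would then be checked separately, adding 'offset' to error positions
```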

(Commit 31378) includes:

* Fix checkForMistakes: plain text strings were not handled correctly.

(Commit 31379) includes:

* Fix the TMX generation