User:Commial/AWI

From Apertium

Apertium: Improvements to the post-edition interface, Google Summer of Code 2011



Jobs already done

Port the code to all recent PHP versions

I have replaced all calls to file_put_contents with fopen, fwrite and fclose, which are supported by older PHP versions. (Commit 30690)
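A minimal sketch of such a replacement is shown below. The helper name write_file is hypothetical; the AWI code may simply inline the three calls wherever file_put_contents used to be.

```php
<?php
// Hypothetical helper: writes $data to $path with fopen/fwrite/fclose only,
// since file_put_contents does not exist in PHP 4.
// Returns the number of bytes written, or false on failure.
function write_file($path, $data)
{
    $handle = fopen($path, 'wb');
    if ($handle === false) {
        return false;
    }
    $written = fwrite($handle, $data);
    fclose($handle);
    return $written;
}

// Usage: write a small file in the system temporary directory.
$tmp = tempnam(sys_get_temp_dir(), 'awi');
write_file($tmp, "Hello, Apertium\n");
```

Unlike file_put_contents, this helper takes no flags; opening the file with mode 'ab' instead of 'wb' would give the FILE_APPEND behaviour.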

Rewrite the PHP & Javascript as separate modules so that it is easy to decide which tools to enable or disable in the interface

The file "modules.php" is designed to manage modules. It includes :

Configuration of modules :

* Module structure : 
* name : Module's name
* description : Short description of the module
* default : Recommended module
* javascript : an array of javascript dependencies (with path javascript/)
* php : an array of php dependencies (with path includes/)
* button_in : HTML code for the interface, under the input, when the module is activated
* button_out : HTML code for the interface, under the output, when the module is activated

For example, the entry for 'SpellGrammarChecking':

'SpellGrammarChecking' => array(
    'name' => 'Spell and Grammar checking',
    'description' => 'Integrate the ability to check both input and output texts for mistakes, with a button “Check for mistakes”.
When pressed, it runs spell checking and grammar checking on the text and underlines mistakes in different colours (red for spelling, blue for grammar).',
    'default' => TRUE,
    'javascript' => array('textEditor.js', 'main.js'),
    'php' => array('gramproof.php', 'strings.php', 'system.php'),
    'button_in' => '<input type="submit" name="check_input" value="Check for mistakes" />',
    'button_out' => '<input type="submit" name="check_output" value="Check for mistakes" />'
),

It also includes:

Functions for modules management :

* Check the dependencies of a module
* Load the PHP dependencies of a module
* Write the interface of a module
* Load/Unload a module
* List the loaded modules and the recommended modules
* Write the list of Javascript libraries needed by the loaded modules
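The following sketch shows how some of these functions could work on the module array described above. The function names and the 'Logs' entry are hypothetical (only the keys used here are shown); the real modules.php may be organised differently.

```php
<?php
// Hypothetical module table following the structure above (subset of keys).
$modules = array(
    'SpellGrammarChecking' => array(
        'default' => true,
        'javascript' => array('textEditor.js', 'main.js'),
        'php' => array('gramproof.php', 'strings.php', 'system.php'),
    ),
    'Logs' => array(
        'default' => false,
        'javascript' => array('logs.js'),
        'php' => array(),
    ),
);

// Returns the PHP dependencies of module $name that are missing from $dir.
function missing_dependencies($modules, $name, $dir = 'includes/')
{
    $missing = array();
    foreach ($modules[$name]['php'] as $file) {
        if (!file_exists($dir . $file)) {
            $missing[] = $file;
        }
    }
    return $missing;
}

// Names of the recommended modules ('default' => TRUE).
function recommended_modules($modules)
{
    $names = array();
    foreach ($modules as $name => $module) {
        if ($module['default']) {
            $names[] = $name;
        }
    }
    return $names;
}

// Javascript files needed by the loaded modules, each listed once,
// in the order in which they are first required.
function javascript_list($modules, $loaded)
{
    $files = array();
    foreach ($loaded as $name) {
        foreach ($modules[$name]['javascript'] as $file) {
            if (!in_array($file, $files)) {
                $files[] = $file;
            }
        }
    }
    return $files;
}
```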

How I see the current separation of modules: Apertium-AWI Organisation 06.15.jpg (concerning PHP files)

The current interface for Load/Unload Modules (in Firefox 4, Ubuntu) : Apertium-AWI ModulesInterface 06.15.jpg

(Commit 30719) includes :

* Creation of module structure
* Creation of the content and form to easily Load/Unload modules
* Split existing files to make independent modules (the module 'Logs' isn't independent yet)
* Modify the interface display to allow adding/removing buttons
* A few changes to make the code more readable (repetitive code)

(Commit 30724) includes :

* Creation of gramproof.js, which contains javascript functions used by the module Spell checking and Grammar checking
* Creation of logs.js, which contains javascript functions used by the module Logs
* Some functions in textEditor.js are now in gramproof.js and logs.js
* Functions of nodes.js, paste_event.js, logging_lowlevel.js and logging.js are now in logs.js (fewer browser requests)
* The module Logs is now independent

How I see the current separation of modules in Javascript: Apertium-AWI ModulesInterface 06.16.jpg

(Commit 30738) includes :

* Correction of an error in gramproof.js: the function displaySuggestionsList was defined twice.

(Commit 30792) includes :

* Creation of main.tmpl, textEditor.tmpl, which are template files for main.js and textEditor.js
* Update index.php to regenerate main.js and textEditor.js when modules are loaded/unloaded
* Add the function which rewrites a file from a template (named gen_templateJS(source, target)) in modules.php
* Move the "is_installed" function, which belongs to the environment, not to the translation core

An example of how the function gen_templateJS works : Apertium-AWI ModulesInterface 06.19.jpg
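The idea of generating main.js and textEditor.js from templates can be sketched as follows. The marker syntax (//@module Name ... //@end) and the function name render_js_template are assumptions made for illustration; the real .tmpl files and gen_templateJS(source, target) may use a different convention. A gen_templateJS-like wrapper would read source, pass its content and the list of loaded modules to such a function, and write the result to target.

```php
<?php
// Hypothetical template syntax: a block between "//@module Name" and
// "//@end" is kept only when module Name is loaded, otherwise removed.
function render_js_template($template, $loaded)
{
    return preg_replace_callback(
        '~^//@module (\w+)\n(.*?)^//@end\n~ms',
        function ($m) use ($loaded) {
            return in_array($m[1], $loaded) ? $m[2] : '';
        },
        $template
    );
}

$tmpl = "var a = 1;\n"
    . "//@module Logs\nstartLogging();\n//@end\n"
    . "//@module SpellGrammarChecking\ncheckMistakes();\n//@end\n";
$js = render_js_template($tmpl, array('Logs'));
```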

(Commit 30819) includes :

* Modules to load are now set by the web-master, in config.php
* The home page shows which modules are loaded, and their description
* The generation of main.js and textEditor.js has to be launched by the web-master, by uncommenting a region in modules.php
* Loaded modules are kept in a global variable $modules_load, not in sessions anymore

(Commit 30925) includes :

* translate.php now uses a global variable for the PHP translate object ($trans), not sessions

All modules are now separate, and it is easy to add, load/unload, alter and remove modules.

Rewrite the language.php file as an abstract script, and interface modules for Apertium, Aspell and LanguageTool

"language.php" has been split into two parts: environment management and translation system.

The translation system is managed by a PHP object named "translate", defined in includes/translation.php. It is instantiated in language.php.

How it works: Apertium-AWI translate Object 06.20.jpg

The link with external tools such as LanguageTool, Aspell and After the DeadLine is now handled by an object named "externtool", defined in includes/externaltool.php.

The webmaster can choose, in includes/config.php, which tool to use for spell checking and grammar proofing. No other change is needed: everything is handled by the object, which provides the generic functions SpellChecking and GrammarProofing.

It is instantiated in gramproof.php (to preserve modularity).

How it works: Apertium-AWI externtool Object.jpg

(Commit 30827) includes:

* Split language.php into two parts: environment management and translation system
* Create a PHP object named translate to manage the translation system
* Create a PHP object named externtool to manage the link with external tools, and provide the generic functions SpellChecking and GrammarProofing
* Modify translate.php and ajax.php to use the translate object
* Modify gramproof.php to use the externtool object

* Modify config.php to include the choice of external tool

(Commit 30903) includes:

* After the DeadLine is now supported, thanks to functions in the externtool object

Functions getATDresult, analyseATDresult and getATDCorrection allow the externtool Object to communicate with After the DeadLine.

The steps are :

- Get spelling or grammar errors from a local ATD server or from the online server (service.afterthedeadline.com, set in config.php: $ATD_link), in language en, fr, pt, de or es (defined in externaltool.php: $ATDsupport_lang), or en by default
- Analyse the result (as an XML object) to create an object correctly formed for the gramproof module functions. This means detecting the suggestions, the text to replace, the description, and the start and end of the string in the text (this last part could perhaps be done more rigorously, but I don't know how).
- Pass the result to the object output, and so to the gramproof module.
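As an illustration of the analysis step, here is a sketch that turns an ATD XML response into error records with start and end offsets. It assumes the usual layout of ATD results (results/error with string, description, precontext, suggestions/option and type); the function name and the sample response are made up. Offsets are byte offsets found by searching for the error string after its precontext, which is only a heuristic: a repeated word may be located at the wrong place, which is the imprecision mentioned in the second step.

```php
<?php
// Sketch: converts an After the Deadline XML response into error records
// with byte offsets into $text. Element names follow the usual ATD layout.
function analyse_atd_result($xml, $text)
{
    $results = array();
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return $results;
    }
    foreach ($doc->error as $error) {
        $string = (string) $error->string;
        if ($string === '') {
            continue;
        }
        // Heuristic: look for the error just after its precontext, if any.
        $pre = (string) $error->precontext;
        $from = 0;
        if ($pre !== '') {
            $p = strpos($text, $pre);
            if ($p !== false) {
                $from = $p + strlen($pre);
            }
        }
        $start = strpos($text, $string, $from);
        if ($start === false) {
            continue; // cannot locate this error in the text
        }
        $suggestions = array();
        if (isset($error->suggestions)) {
            foreach ($error->suggestions->option as $option) {
                $suggestions[] = (string) $option;
            }
        }
        $results[] = array(
            'type' => (string) $error->type,
            'text' => $string,
            'description' => (string) $error->description,
            'suggestions' => $suggestions,
            'start' => $start,
            'end' => $start + strlen($string),
        );
    }
    return $results;
}

// Made-up response for the text "I has probl".
$sample = '<results>'
    . '<error><string>has</string><description>Subject-verb agreement</description>'
    . '<precontext>I</precontext><suggestions><option>have</option></suggestions>'
    . '<type>grammar</type></error>'
    . '<error><string>probl</string><description>Spelling</description>'
    . '<precontext>has</precontext><suggestions><option>problem</option>'
    . '<option>probe</option></suggestions><type>spelling</type></error>'
    . '</results>';
$errors = analyse_atd_result($sample, 'I has probl');
```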

(Commit 30990) includes:

* Split spell checking and grammar proofing into spelling.php and grammar.php
* Create two different abstract PHP objects, named Spelling and Grammar
* Write classes for LanguageTool (grammar_LT.php), Aspell (spelling_aspell.php) and AfterTheDeadLine (gramproof_ATD.php), based on these abstract classes
* Remove the externtool object, and use these new objects instead in gramproof.php

How it works now: Apertium-AWI grammar-spelling Object 06 23.jpg
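A minimal sketch of this design: gramproof only depends on the abstract classes, and each tool is a subclass. The method name check and the toy word-list backend are hypothetical; the backend stands in for Aspell or LanguageTool so that the example runs without any external tool.

```php
<?php
// gramproof talks only to these abstract classes; each external tool
// (Aspell, LanguageTool, ATD, ...) would be a subclass.
abstract class Spelling
{
    // Returns a list of array('text' => ..., 'suggestions' => array(...)).
    abstract public function check($text, $lang);
}

abstract class Grammar
{
    abstract public function check($text, $lang);
}

// Toy backend, standing in for a real tool: flags every word that is
// not in a small word list.
class Spelling_WordList extends Spelling
{
    private $words;

    public function __construct(array $words)
    {
        $this->words = $words;
    }

    public function check($text, $lang)
    {
        $errors = array();
        $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if (!in_array(strtolower($word), $this->words)) {
                $errors[] = array('text' => $word, 'suggestions' => array());
            }
        }
        return $errors;
    }
}

// The choice made in includes/config.php would decide which class to build.
$spell = new Spelling_WordList(array('i', 'has', 'a', 'problem'));
$errors = $spell->check('I has probl', 'en');
```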

If you want to try, a good example is "I has probl", which normally contains a grammar mistake and a spelling mistake. You can try aspell-LanguageTool, ATD-LanguageTool, ATD-ATD or aspell-ATD if you want to compare the programs' results.

This part can be considered finished, because the AWI is now able to use different external tools, like aspell, LanguageTool and After the DeadLine. In addition, interfacing with other programs can be done "easily", by adding the necessary functions in new abstract objects. The same goes for translation: we could easily use a program other than Apertium by modifying the translate object... but that is not the purpose here.

Provide more formatting modules; currently only ODF, OOXML, html and text are supported. Mediawiki (using apertium-mediawiki) and others are wanted.

(Commit 30984) includes:

* Correction of some PHP end of files (?>\n => ?>)
* Correction of markers '.[]' in format.php
* Correction of markers '[\n]'

The support of RTF files has been added.

Line returns in files are now handled correctly.

Current file handling: Apertium-AWI file treatment 06 32.jpg

(Commit 30998) includes:

* Add support for the Mediawiki format (with extension .mediawiki)
* Display the supported extensions on the home page (index.php)
* The list of these extensions is set in config.php

(Commit 31002) includes:

* Add support for the PDF format, using pdftohtml and wkhtmltopdf

Pdf management :

Apertium-AWI pdf management 06 24.png

What is seen in AWI (Firefox 4 on Ubuntu): Apertium-AWI AWIforPDF 06 24.png

(Commit 31010) includes:

* Remove the temp\/APRE... string, which came from the title of the HTML document (<TITLE></TITLE>) set automatically by pdftohtml.
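The PDF round trip can be sketched as three shell commands plus a clean-up of the <title> that pdftohtml fills in. This is a sketch under assumptions: the options (-noframes, -i), the temporary file names and the use of apertium -f html are illustrative, and the exact output file naming of pdftohtml should be checked against the installed version. Every path goes through escapeshellarg.

```php
<?php
// Builds (but does not run) the commands for translating a PDF:
// PDF -> HTML (pdftohtml), translation (apertium), HTML -> PDF (wkhtmltopdf).
function pdf_commands($pdf, $pair, $tmpdir)
{
    $html = $tmpdir . '/input.html';
    $translated = $tmpdir . '/translated.html';
    $out = $tmpdir . '/translated.pdf';
    return array(
        'pdftohtml -noframes -i ' . escapeshellarg($pdf) . ' ' . escapeshellarg($html),
        'apertium -f html ' . escapeshellarg($pair) . ' '
            . escapeshellarg($html) . ' ' . escapeshellarg($translated),
        'wkhtmltopdf ' . escapeshellarg($translated) . ' ' . escapeshellarg($out),
    );
}

// Empties the <title> set automatically by pdftohtml, so that the temporary
// path it contains does not show up with the translated document.
function clear_title($html)
{
    return preg_replace('~<title>.*?</title>~is', '<title></title>', $html);
}

$commands = pdf_commands('/tmp/report.pdf', 'en-es', '/tmp/awi');
```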

(Commit 31107) includes:

* Add handling of image files, using Tesseract for OCR and 'convert' for picture conversion (set in includes/config.php).

(Commit 31109) includes:

* Update the list of supported extensions (add jpeg, jpg, tif, tiff, png).

Pictures management :

Apertium-AWI pictures management 06 27.png
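Similarly, the picture path can be sketched as: check the extension against the supported list, convert the picture to TIFF with ImageMagick's convert, then run tesseract, which writes its text to <outputbase>.txt. The function names and temporary file names are hypothetical.

```php
<?php
// Extensions listed as supported for pictures (Commit 31109).
function is_picture($filename)
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return in_array($ext, array('jpeg', 'jpg', 'tif', 'tiff', 'png'));
}

// Builds (but does not run) the OCR commands; tesseract appends ".txt"
// to the output base name it is given.
function ocr_commands($image, $tmpdir)
{
    $tif = $tmpdir . '/page.tif';
    return array(
        'convert ' . escapeshellarg($image) . ' ' . escapeshellarg($tif),
        'tesseract ' . escapeshellarg($tif) . ' ' . escapeshellarg($tmpdir . '/page'),
    );
}
```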

Next step: try to use the HTML export functionality (jointly with the html module)? Integrate file handling for users with Javascript disabled?

Localisation, make it possible to translate the interface into different languages

(Commit 31121) includes:

* Allow the user to change the interface language
* Put in place the system to manage this change
* Put in place the system to integrate these changes with modules
* Change the index page so that it uses translation
* Write the English and French translations for the index page
* Write the English and French translations for the modules FormattedDocumentHandling and SpellGrammarChecking
* Put in place the system to detect and retain the user's language choice
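Detecting the language can be sketched as follows: an explicit, valid choice wins, then the browser's Accept-Language header (which may be absent), then a default. The function name and this order of preference are assumptions; q-values are ignored and the header order is taken as the preference order. Remembering the choice between requests (for example in a cookie) is left out.

```php
<?php
// Returns the interface language: a valid explicit choice first, then the
// first supported language in the Accept-Language header, then $default.
function detect_language($available, $choice = null, $default = 'en')
{
    if ($choice !== null && in_array($choice, $available)) {
        return $choice;
    }
    if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
        foreach (explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']) as $part) {
            // "fr-FR;q=0.8" -> "fr"
            $code = strtolower(substr(trim($part), 0, 2));
            if (in_array($code, $available)) {
                return $code;
            }
        }
    }
    return $default;
}
```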

(Commit 31163) includes:

* Add support for interface translation in all current modules
* Add support for interface translation in translate.php
* Write the French translation of all current modules
* Write the French translation of translate.php

How it is organised : Apertium-AWI Interface multi-languages 06 28.png


README content:

#### Apertium - AWI : multi-language interface ####

Contributed by Mougey Camille <commial@gmail.com>

for Google Summer of Code 2011

Mentors : Arnaud Vié, Luis Villarejo


--- Summary ---

I/ Directory Organisation

II/ How to manage interface translation


I/ Directory Organisation

In the root (./templates/), there is this file (README) and the file "avalaible_languages".

The README explains how the directory is organised, and how to manage interface translation.

The "avalaible_languages" file lists the available languages, i.e. the languages for which a translation exists.

For each available language, there is a directory named after the language. This directory contains the following files:

index - contains the translation for the home page, in a specific order

translate - contains the translation for the translate page, in a specific order

module_name - contains the translation for the module name, i.e. the file for the module Logs is module_Logs, in a specific order

The correspondence between the order in the file and its meaning is written in the file ../includes/formats.php: function retrieve_info

For example, a module file is organised as:

name
description
text for button_in
text for button_out


II/ How to manage interface translation

The text is displayed in a page by calling the function get_text(page, information requested) or write_text(page, information requested). For example, the title of the home page is obtained by

get_text('index', 'title');

If you want to add text to a page, or add a page, you have to write the correspondence in the file format.php. In this example, the correspondence is

index->title : first line of the file index

If you want to add a language, for example Spanish, you have to write all the corresponding files containing the translation in a directory es/, i.e. es/index, es/translate, es/module_Logs, ... and add the name of the language to ./avalaible_languages.

To modify a language, you just have to open the appropriate file, write your own translation, and save. If you want to modify the description of the Logs module, open en/module_Logs and change the second line.

Do not forget to commit your changes, to allow the Apertium community to use your work.

For any problem/question, you can contact me, or go to the Apertium IRC channel (#apertium).
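The line-based lookup described in this README can be sketched as below. The mapping array, the key 'subtitle' and the function name get_text_sketch are hypothetical; in the AWI the correspondence lives in retrieve_info. Lines are numbered from 0 here, so 'title' => 0 is the first line of the file index.

```php
<?php
// Hypothetical correspondence table: page => key => line number (from 0).
// Module files (module_Logs, ...) all share the 'module' layout.
$text_lines = array(
    'index' => array('title' => 0, 'subtitle' => 1),
    'module' => array('name' => 0, 'description' => 1, 'button_in' => 2, 'button_out' => 3),
);

// Returns the requested text for $lang, or '' if it is missing.
function get_text_sketch($dir, $lang, $page, $key, $map)
{
    $layout = strpos($page, 'module_') === 0 ? 'module' : $page;
    $file = $dir . '/' . $lang . '/' . $page;
    if (!isset($map[$layout][$key]) || !is_readable($file)) {
        return '';
    }
    $lines = file($file, FILE_IGNORE_NEW_LINES);
    $i = $map[$layout][$key];
    return isset($lines[$i]) ? $lines[$i] : '';
}

// Usage with a throw-away templates directory.
$dir = sys_get_temp_dir() . '/awi_templates_' . getmypid();
@mkdir($dir . '/en', 0777, true);
file_put_contents($dir . '/en/index', "Apertium post-edition interface\nWelcome\n");
file_put_contents($dir . '/en/module_Logs', "Logs\nLogs description\n");
```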

(Commit 31164) includes:

* The README file explains how the directory is organised, and how you can set or change the current translations.

New languages should be added, but my level in languages other than French is not good enough.

Improve overall design

(Commit 31204) includes:

* Modify the current design of the pages index.php and translate.php

The Apertium website will move to WordPress, so its overall design will be adopted. For now, I have made a design similar to the current Apertium website (in colour, simplicity, ...).

Index page: Apertium-AWI Index 06 29.png

Translate page: Apertium-AWI Translate 06 29.png

(Older version)

Make it possible to input a TMX to help for a translation

(Commit 31413) includes:

* Add the management of external TMX files in translate object
* Add to the interface the possibility to specify which external TMX files to use
* Modify Javascript files to allow the modification of TMX files on the fly

(Commit 31414) includes:

* Manage the case of pretrans (manual replacement) combined with an external TMX file
* Better filter on external TMX file name (local file, URL)

TMX integration : Apertium-AWI TMX integration 07 05.png
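Reading an external TMX file into source/target pairs, which the translate object needs before it can reuse the memory, can be sketched with SimpleXML. The function name is hypothetical, only TMX's xml:lang attribute is handled (not the older lang attribute), and language codes are compared on their primary subtag, so fr-FR counts as fr.

```php
<?php
// Returns source segment => target segment for one language pair of a TMX
// document. Only xml:lang is read; "fr-FR" is treated as "fr".
function tmx_pairs($xml, $src, $tgt)
{
    $xmlns = 'http://www.w3.org/XML/1998/namespace';
    $pairs = array();
    $tmx = simplexml_load_string($xml);
    if ($tmx === false) {
        return $pairs;
    }
    foreach ($tmx->body->tu as $tu) {
        $segs = array();
        foreach ($tu->tuv as $tuv) {
            $attrs = $tuv->attributes($xmlns);
            $lang = strtolower((string) strtok((string) $attrs['lang'], '-_'));
            $segs[$lang] = (string) $tuv->seg;
        }
        if (isset($segs[$src], $segs[$tgt])) {
            $pairs[$segs[$src]] = $segs[$tgt];
        }
    }
    return $pairs;
}

$sample_tmx = '<tmx version="1.4"><header srclang="en"/><body>'
    . '<tu><tuv xml:lang="en"><seg>Hello world</seg></tuv>'
    . '<tuv xml:lang="fr-FR"><seg>Bonjour le monde</seg></tuv></tu>'
    . '<tu><tuv xml:lang="en"><seg>Thank you</seg></tuv>'
    . '<tuv xml:lang="es"><seg>Gracias</seg></tuv></tu>'
    . '</body></tmx>';
$pairs = tmx_pairs($sample_tmx, 'en', 'fr');
```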

Provide a module to use the apertium.org web service instead of a local Apertium installation

(Commit 31490) includes:

* Add the possibility to use Apertium.org translation system instead of the local server
* Modify the translate object to fetch the list of available languages and display it in the interface
* Show a checkbox to easily switch between local translation and translation by Apertium.org
* Modify the translate object to generate a POST request to the Apertium.org server
* Modify the Javascript file main.js (and its template main.tmpl) to keep the current interface
* Allow users without Javascript to use this option too
* Add the config variable apertiumorg_homeurl to set the URL of Apertium.org

(Commit 31491) includes:

* Small reorganisation of includes/language.php's code

How it works : Apertium-AWI Apertium.ORG integration 07 07 11.png

The user can now easily switch between the local translation system and the Apertium.org translation system, just by checking or unchecking a box.

The list of available languages is updated automatically, and nothing appears to change for the user in the interface. Clicking the Translate text button works with AJAX, and asks the server defined in includes/config.php (apertiumorg_homeurl) for the translation.

All the other buttons of the interface still work, such as Check for mistakes.

Only the Input TMX option becomes useless.

(Commit 31496) includes:

* The banner on index.php was not displayed correctly

(Commit 31502) includes:

* Add an image shown while an AJAX request is loading
* Display it below the language pair selection
* Display/hide it in ajax.js

(Commit 31519) includes:

* Add file handling to index.php (using the Apertium.org website)
* Add apertiumorg_docurl and apertiumorg_traddoc in includes/config.php
* Add checkbox in index.php to easily activate/deactivate the use of Apertium.org

The file management via Apertium.org is now integrated.

(Commit 31560) includes:

* Replace the button to switch between the local installation and Apertium.org with a line in includes/config.php: use_apertiumorg
* Remove the form from apertiumORG in index.php (file management)

(Commit 31578) includes:

* Divide the translate object into two objects and an abstract class
* The first PHP Object is Translate_ApertiumORG, includes/translation_apertiumorg.php
* The second is Translate_Apertium, include/translation_apertium.php
* Remove the javascript management of 'check/uncheck' for the use of Apertium local server or Apertium.org
* Adapt includes/language.php, translate.php to use the new PHP Abstract Object 'Translate'

How it's defined : Apertium-AWI translate Object 07 08 11.png
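A minimal sketch of this structure follows. Method names and the fixed pair list are hypothetical; translate() pipes the text through a local apertium command, so it only works where Apertium is installed. A Translate_ApertiumORG subclass would implement the same methods with an HTTP request instead, and the choice between the two would follow use_apertiumorg in includes/config.php.

```php
<?php
abstract class Translate
{
    // Language pairs offered by the backend, e.g. array('en-es').
    abstract public function language_pairs();

    // Returns the translation of $text for $pair, or null on failure.
    abstract public function translate($text, $pair);

    // Shared logic can live in the abstract class.
    public function supports($pair)
    {
        return in_array($pair, $this->language_pairs());
    }
}

class Translate_Apertium extends Translate
{
    public function language_pairs()
    {
        // Hypothetical fixed list; a real object would query the installation.
        return array('en-es', 'es-en');
    }

    public function command($pair)
    {
        return 'apertium -f html ' . escapeshellarg($pair);
    }

    public function translate($text, $pair)
    {
        if (!$this->supports($pair)) {
            return null;
        }
        $out = shell_exec('printf %s ' . escapeshellarg($text) . ' | ' . $this->command($pair));
        return is_string($out) ? $out : null;
    }
}

$trans = new Translate_Apertium();
```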

(Commit 31988) includes:

* Fix a few bugs in apertiumorg translate PHP Object
* Remove useless config lines

(Commit 32007) includes:

* The TMServer boundary is now random
* Translate_apertium.ORG PHP Object: Send the input as an HTML formatted file
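Sending the input as an HTML file means building a multipart/form-data request body, and a random boundary makes a clash with the document content very unlikely. The sketch below shows the body construction only; the field name "file", the file name and the function name are assumptions, not the Apertium.org API.

```php
<?php
// Builds a multipart/form-data body carrying $html as an uploaded file.
// Returns array(boundary, body).
function multipart_html_body($html, $field = 'file', $filename = 'input.html')
{
    // Random boundary, regenerated in the unlikely case it occurs in $html.
    do {
        $boundary = '----AWI' . md5(uniqid('', true));
    } while (strpos($html, $boundary) !== false);

    $body = "--$boundary\r\n"
        . "Content-Disposition: form-data; name=\"$field\"; filename=\"$filename\"\r\n"
        . "Content-Type: text/html; charset=UTF-8\r\n\r\n"
        . $html . "\r\n"
        . "--$boundary--\r\n";
    return array($boundary, $body);
}

list($boundary, $body) = multipart_html_body('<p>Hello</p>');
// Stream context that file_get_contents($url, false, $context) could use.
$context = stream_context_create(array('http' => array(
    'method' => 'POST',
    'header' => "Content-Type: multipart/form-data; boundary=$boundary\r\n",
    'content' => $body,
)));
```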

Create a server-side TM database

I created a server named TMServer, which aims to host translation units. It is designed to support several Translation Memory standards, but only TMX is implemented for now.

(Commit 31757) includes:

* Create the directories needed by TMServer
* Add the base of TMServer
* Add the design and design index.php
* Add the TMmanage abstract object
* Add the TMmanage_TMX php object
* Write functions to return available TMX files
* Write functions to return available TMX language pairs
* Write the management of language pairs in index.php
* Add an example, db/sample.tmx, to test the parsing
* Add functions for link with the system (includes/system.php)
* Add a config file (includes/config.php)

(Commit 31793) includes:

* Add the abstract add_TM function in TMmanage PHP Object
* Add the add_TM function in TMmanage_TMX PHP Object
* Add the management of multi-TMX (using TMXMerge)
* Add a function to generate a TMX file
* Add a function to manage the current files in the database (db/): which file to write, which files to merge
* Add a parser, able to detect the source language, the Translation Memory segment, the different target language
* Fix some security issues
* Modify the home page to allow the user to add a TMX file
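Generating a TMX file can be sketched with PHP's XMLWriter. The header carries the attributes that TMX 1.4 requires; the creationtool values and the function name are placeholders, not what TMServer actually writes.

```php
<?php
// Writes $units (source segment => target segment) as a TMX 1.4 document.
function generate_tmx($srclang, $tgtlang, $units)
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('tmx');
    $w->writeAttribute('version', '1.4');

    $w->startElement('header');
    $header = array(
        'creationtool' => 'AWI-sketch', 'creationtoolversion' => '0.1',
        'segtype' => 'sentence', 'o-tmf' => 'none', 'adminlang' => 'en',
        'srclang' => $srclang, 'datatype' => 'plaintext',
    );
    foreach ($header as $name => $value) {
        $w->writeAttribute($name, $value);
    }
    $w->endElement(); // header

    $w->startElement('body');
    foreach ($units as $source => $target) {
        $w->startElement('tu');
        foreach (array(array($srclang, $source), array($tgtlang, $target)) as $tuv) {
            $w->startElement('tuv');
            $w->writeAttribute('xml:lang', $tuv[0]);
            $w->writeElement('seg', (string) $tuv[1]); // text is escaped
            $w->endElement(); // tuv
        }
        $w->endElement(); // tu
    }
    $w->endElement(); // body
    $w->endElement(); // tmx
    $w->endDocument();
    return $w->outputMemory();
}

$tmx = generate_tmx('en', 'fr', array('Hello' => 'Bonjour', 'a < b' => 'a < b'));
```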

How it works: Apertium-AWI TMServer 07 12 11.png

(Commit 31817) includes:

* Add functions for Web API

(Commit 31820) includes:

* Add functions for Web API

There is now a server able to manage TMX files. A developer can easily add another format by adding a PHP object.

In addition, there are a few methods to easily retrieve the language pair list (to avoid a preg_match search) and the link to the TMX file (for the same reason).

Make a link with existing server-side TMX database

So that the memory generated after a translation is stored and reused automatically for later translations in this language pair.

(Commit 31827) includes:

* Create a PHP object TMServer (includes/TMServer.php) which handles the link with a TMServer server
* Add in config.php the choice of the type of Translation Memory server, and the URL of the server
* Add in translate.php the possibility to easily get the list of available language pairs, load a TMX file from the TMServer and add a translation memory unit to the TMServer
* Write the design part

The interface for loading a TMX file from the TMServer looks like: ApertiumAwi Input TMServer 07 13 11.png

And for adding a TMX file to the TMServer (here with the success message): ApertiumAwi Output TMServer 07 13 11.png

In includes/config.php, the webmaster can choose the type of server (currently just TMServer is supported), and the server's URL.

The TMServer is now correctly integrated with the PostEditing-tool interface: the user can easily load TMX files from the server, and add their own to it.

(Commit 31888) includes:

* Fix a bug: if TMServer was loaded, the user could not use any URL other than the server
* Add the possibility for the user to use TMServer, a URL or a local file (upload)

(Commit 32028) includes:

* Move the interface for loading a TMX file from translate.php to index.php
* Add a correct detection of the method used to add a TMX file (URL, TMServer, local file), so radio buttons are no longer needed
* Transfer the content of the TMX file instead of its link (there were problems with temporary files)
* Pass the TMX file content as an argument of the AJAX translate request

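The detection of the TMX source can be sketched as follows. The form field names tmx and tmx_file, the return format and the name filter for TMServer files are all assumptions; the filter only illustrates the kind of check that Commit 31414 mentions (a file name must not give access to arbitrary files on the server).

```php
<?php
// Returns array(kind, value) where kind is 'file', 'url', 'tmserver',
// 'none' or 'invalid'. Field names 'tmx' and 'tmx_file' are hypothetical.
function tmx_source($post, $files)
{
    if (isset($files['tmx_file']) && $files['tmx_file']['error'] === UPLOAD_ERR_OK) {
        return array('file', $files['tmx_file']['tmp_name']);
    }
    $value = isset($post['tmx']) ? trim($post['tmx']) : '';
    if ($value === '') {
        return array('none', null);
    }
    if (preg_match('~^https?://~i', $value)) {
        return array('url', $value);
    }
    // A TMServer file name: no directories, ".tmx" extension only.
    if (preg_match('~^[\w.-]+\.tmx$~D', $value)) {
        return array('tmserver', $value);
    }
    return array('invalid', null);
}
```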
Make it faster and cross-browser compatible

(Commit 32014) includes:

* Reorganizing translate.php (indentation, functions,..)

(Commit 32118) includes:

* Add french/english translation for index.php
* Add french/english translation for translate.php
* Fix a bug with the display of post manual replacement
* Fix XHTML bugs (option: checked => selected; 'ul' issues)

1/ I ran the W3C CSS and W3C XHTML validators on index.php and translate.php. The remaining errors do not affect cross-browser compatibility, apart from the 'contenteditable' attribute. This attribute was only standardised with HTML 5, so older browsers may not support it. Old browsers cannot use the Check for mistakes function, but they can still translate texts and documents. This is the main issue.


2/ To quickly see how the interface looks in many browsers, I used Browsershots. The interface seems to work in every browser, even in lynx (a console browser)!


3/ I've tested all functions in popular browsers, and the website seems to work perfectly. I've tested:

On Ubuntu 10.10:

Firefox (up-to-date)
Chrome browser (up-to-date)
Opera (up-to-date)

On Windows XP:

Firefox (up-to-date)
IE 6.0: "Check for mistakes" doesn't work.

4/ The YUI Compressor, run by the publish.php script, speeds up page rendering (see the part on bug fixing below).

(Commit 32143) includes:

* Fix a few bugs for non-Javascript users: there was a problem with the scope of the variables $spell and $grammar
* ajax.php always included gramproof.php instead of modules.php. This is now corrected.
* The site now works for non-Javascript browsers and old browsers.

Create a script to easily configure config.php and download external packages

(Commit 32167) includes:

* Create the install script.
* Add some automatic tests and recommendations.

(Commit 32168) includes:

* Add a better system for recommendations: only the recommendations that still need action are displayed.

(Commit 32177) includes:

* Add new tests
* Change the display of recommendations and tests (add colour to easily distinguish the changes to make)
* Add the structure to manage available configuration
* Display the current configuration with editable menu
* Function to get new value
* Change the way of testing the existence of a command (use whereis)
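Testing for a command with whereis comes down to parsing its output line, cmd: path1 path2 ..., where nothing after the colon means "not found". A sketch with hypothetical function names; whereis -b can also report matching directories (for example a library directory with the same name), so this remains a heuristic.

```php
<?php
// Parses one line of `whereis -b <cmd>` output ("cmd: /path1 /path2").
// An empty list after the colon means the command was not found.
function parse_whereis($output)
{
    $parts = explode(':', trim($output), 2);
    if (count($parts) < 2) {
        return array();
    }
    return preg_split('/\s+/', trim($parts[1]), -1, PREG_SPLIT_NO_EMPTY);
}

// Runs whereis for a plain command name (anything else is rejected).
function command_exists_sketch($cmd)
{
    if (!preg_match('/^[\w.+-]+$/D', $cmd)) {
        return false;
    }
    $out = shell_exec('whereis -b ' . escapeshellarg($cmd));
    return is_string($out) && count(parse_whereis($out)) > 0;
}
```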

(Commit 32222) includes:

* Add new tests
* Add functions to change includes/config.php and write the new config
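Rewriting includes/config.php can be done by regenerating the file from the current values; var_export produces valid PHP literals, so strings are quoted and escaped correctly. The function name is hypothetical, and the sketch assumes the configuration consists of plain $variables (as $ATD_link, set in config.php, suggests) rather than constants.

```php
<?php
// Returns the text of a config file defining one $variable per entry.
// var_export() yields valid PHP literals (quoted, escaped strings).
function render_config($values)
{
    $php = "<?php\n";
    foreach ($values as $name => $value) {
        // Only accept valid PHP variable names.
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/D', $name)) {
            continue;
        }
        $php .= '$' . $name . ' = ' . var_export($value, true) . ";\n";
    }
    return $php;
}

$config = render_config(array(
    'use_apertiumorg' => false,
    'ATD_link' => 'http://service.afterthedeadline.com',
));
```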

(Commit 32229) includes:

* Add a security measure (a password to set) in install.php to prevent misuse of the script

Install script structure: Apertium-AWI install script 07 28 11.png

I've tried to keep a clear structure in this script, so that anybody can easily add new tests, recommendations, modules and config variables.

Next step: Add the possibility to download external packages

Fix possibly remaining bugs and optimize

(Commit 31206) includes:

* In system.php:75, out1 and out2 were undefined
* In translate.php:260,261,262 , input_doc, input_doc_type, input_doc_name were undefined
* Resize images/yes.png and images/no.png from 800x800 to 40x40 to speed up page loading (saving 2 * 44.7 KB; Google Page Speed score 58/100 -> 87/100)
* Reorder scripts so that CSS and Javascript can be loaded in parallel (CSS has to come before Javascript)
* Set the label of <select> correctly in includes/template.php, so that Chrome shows 'fr', 'en' and not 'new_lang'
* In includes/template.php, HTTP_ACCEPT_LANGUAGE can be undefined
* The charset UTF-8 was not correctly declared (bad meta tags)
* The onChange attribute has to be on the <select>, not the <form>, and has to be written onchange (lowercase)
* The style 'language_select' is used twice, so 'class' is used instead of 'id'
* The <center> tag is not allowed in XHTML 1.0 Strict (and is deprecated in Transitional); style="text-align: center;" is used instead

(Commit 31212) includes:

* Fix an infinite loop in logs.js:1150

(Commit 31234) includes:

* Add a script to put the AWI in place (compress/uncompress CSS/JS files, generate JS files from templates, delete itself)
* Save around 30KB of data to download, and get faster analysis (translate.php: 99/100 on Google Page Speed)

Description of publish.php : Apertium-AWI Publish script 06 30.png

(Commit 31239) includes:

* Make index.php W3C valid
* Losslessly compress images/yes.png and images/no.png

(Commit 31242) includes:

* If the server allowed register_globals, a malicious user could get shell access by using source_language and target_language, through checkforMistakes, then GetGrammarCorrection or GetSpellCorrection, and finally executeCommand. Some checks have been added in includes/gramproof.php and includes/translation.php
* A POST request on the file translate.php could give shell access. This issue is fixed.
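The general remedy for this kind of hole is to whitelist values such as language codes before they reach a shell, escape them anyway, and read request data explicitly from $_POST or $_GET instead of relying on register_globals. A sketch with hypothetical function names and a deliberately strict pattern (it rejects longer codes such as variant names, which a real filter would need to allow); aspell -a is Aspell's pipe mode.

```php
<?php
// Accepts codes like "en", "pt_BR" and pairs like "en-es" only.
// The D modifier stops "$" from accepting a trailing newline.
function valid_language_code($code)
{
    return is_string($code)
        && preg_match('/^[a-z]{2,3}(_[A-Z]{2})?(-[a-z]{2,3}(_[A-Z]{2})?)?$/D', $code) === 1;
}

// Builds (but does not run) an Aspell pipe-mode command, or returns null
// when the language code is rejected.
function spell_command($lang)
{
    if (!valid_language_code($lang)) {
        return null;
    }
    // escapeshellarg as a second line of defence.
    return 'aspell -a --lang=' . escapeshellarg($lang);
}

// Read the value explicitly instead of relying on register_globals.
$lang = isset($_POST['source_language']) ? $_POST['source_language'] : 'en';
```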

(Commit 31271) includes:

* Fix includes/gramproof.php: checkforMistakes, bad filter

(Commit 31274) includes:

* Fix a bad dependency for textEditor.js

(Commit 31280) includes:

* Fix the XSS issue in ajax.php

(Commit 31377) includes:

* Fix the checkForMistakes error with hr tags. Now, the input_text is divided into text1(hr tag)text2 ... and text1, text2 are treated separately. 
* Modify the security test in ajax.php (one case was handled twice)
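Treating the parts separately can be sketched with preg_split and PREG_SPLIT_DELIM_CAPTURE, which keeps the <hr> separators in the result so that the text can be rebuilt afterwards. The function names are hypothetical and $check stands for whatever runs the spell and grammar checks.

```php
<?php
// Splits on <hr>, <hr/>, <hr /> (any case), keeping the separators:
// even indices are text, odd indices are the <hr> tags.
function split_on_hr($html)
{
    return preg_split('~(<hr\s*/?>)~i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
}

// Applies $check to each text part separately, then rebuilds the string.
function check_parts($html, $check)
{
    $parts = split_on_hr($html);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = call_user_func($check, $part);
        }
    }
    return implode('', $parts);
}
```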

(Commit 31378) includes:

* Fix checkForMistakes: plain text strings were not handled correctly.

(Commit 31379) includes:

* Fix the TMX generation

(Commit 31747) includes:

* Remove line returns at the end of PHP files
* Fix a security issue in includes/translation.php

(Commit 32373) includes:

* Add link to download external packages in install.php
* Add an image (extern.png) for these links
* Fix two warnings (file_get_contents, undefined variable)
* Remove the button for non-Javascript users in index.php and translate.php
* Change the interface of index.php to make it easier to use, and add some Javascript
* Change the way translate.php handles the input from index.php (none, file, wiki)
* Change some translations to fix language mistakes, and remove unused translations

(Commit 32431) includes:

* Change the interface of index.php
* Add a few javascript effect (javascript/index.js)
* Modify some translation (templates/en/index, fr/index)
* Modify CSS style of translation button (CSS/style.css)

(Commit 32432) includes:

* Fix a few bug on index.php

(Commit 32525) includes:

* Fix a problem of display in index.php
* Add Javascript to install.php to disable inappropriate fields
* Add test to detect the presence of LanguageTool
* Add function to generate the list of supported format
* Add more links to download external packages (with apt links for Debian/Ubuntu users)

(Commit 32596) includes:

* Change README
* Change INSTALL
* Add the possibility to automatically download external package with install.php

(Commit 33476) includes:

* Update INSTALL File

Current Jobs