User:Commial/AWI
Revision as of 12:53, 24 June 2011
Apertium: Improvements to the post-edition interface, Google Summer of Code 2011
Jobs already done
Port the code to all recent PHP versions
I have replaced all calls to file_put_contents with fopen, fwrite and fclose, which are also supported by older PHP versions. (Commit 30690)
Rewrite the PHP and JavaScript code as separate modules, so that it is easy to decide which tools to enable or disable in the interface
The file "modules.php" manages modules. It includes:
Configuration of modules:
* Module structure:
** name: the module's name
** description: a short description of the module
** default: whether the module is recommended
** javascript: an array of JavaScript dependencies (under javascript/)
** php: an array of PHP dependencies (under includes/)
** button_in: HTML code added to the interface, under the input, when the module is activated
** button_out: HTML code added to the interface, under the output, when the module is activated
For example, 'SpellGrammarChecking':
    'name' => 'Spell and Grammar checking',
    'description' => 'Integrate the ability to check both input and output texts for mistakes, with a button “Check for mistakes”. When pressed, it runs spell checking and grammar checking on the text and underlines mistakes in different colours (red for spelling, blue for grammar).',
    'default' => TRUE,
    'javascript' => array('textEditor.js', 'main.js'),
    'php' => array('gramproof.php', 'strings.php', 'system.php'),
    'button_in' => '<input type="submit" name="check_input" value="Check for mistakes" />',
    'button_out' => '<input type="submit" name="check_output" value="Check for mistakes" />'
It also includes:
Functions for module management:
* Check the dependencies of a module
* Load the PHP dependencies of a module
* Write the interface of a module
* Load/unload a module
* List the loaded modules and the recommended modules
* Write the list of JavaScript libraries needed by the loaded modules
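The management functions above can be sketched as follows. This is a minimal JavaScript illustration, not the PHP code of modules.php: the table mirrors the module structure described above, but the "Logs" entry and the function names missingDependencies, recommendedModules and javascriptFor are hypothetical.

```javascript
// Hypothetical module table following the structure above
// (name, default, javascript, php). The "Logs" entry is an assumption.
const MODULES = {
  SpellGrammarChecking: {
    name: "Spell and Grammar checking",
    default: true,
    javascript: ["textEditor.js", "main.js"],
    php: ["gramproof.php", "strings.php", "system.php"],
  },
  Logs: {
    name: "Logs",
    default: false,
    javascript: ["logs.js"],
    php: [],
  },
};

// Return the dependency files of a module that are missing.
// `existing` is a Set of paths such as "javascript/main.js".
function missingDependencies(id, existing) {
  const mod = MODULES[id];
  if (!mod) throw new Error("Unknown module: " + id);
  const paths = [
    ...mod.javascript.map((f) => "javascript/" + f),
    ...mod.php.map((f) => "includes/" + f),
  ];
  return paths.filter((p) => !existing.has(p));
}

// Recommended modules are those whose `default` flag is true.
function recommendedModules() {
  return Object.keys(MODULES).filter((id) => MODULES[id].default);
}

// JavaScript files needed by the loaded modules, without duplicates,
// in first-seen order (so shared libraries are included only once).
function javascriptFor(loaded) {
  const files = [];
  for (const id of loaded) {
    for (const f of MODULES[id].javascript) {
      if (!files.includes(f)) files.push(f);
    }
  }
  return files;
}
```

For instance, javascriptFor(["SpellGrammarChecking", "Logs"]) yields the list of files for which the page would then emit one script tag each.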
How I see the current separation of modules (PHP files):
The current interface for loading/unloading modules (in Firefox 4, Ubuntu):
(Commit 30719) includes:
* Creation of the module structure
* Creation of the content and form to easily load/unload modules
* Splitting of existing files into independent modules (the 'Logs' module isn't independent yet)
* Modification of the interface display to allow buttons to be added/removed
* A few changes to make the code more readable (removal of repetitive code)
(Commit 30724) includes:
* Creation of gramproof.js, which contains the JavaScript functions used by the Spell checking and Grammar checking module
* Creation of logs.js, which contains the JavaScript functions used by the Logs module
* Some functions of textEditor.js are now in gramproof.js and logs.js
* The functions of nodes.js, paste_event.js, logging_lowlevel.js and logging.js are now in logs.js (fewer browser requests)
* The Logs module is now independent
How I see the current separation of modules in JavaScript:
(Commit 30738) includes:
* Correction of an error in gramproof.js: the function displaySuggestionsList was defined twice.
(Commit 30792) includes:
* Creation of main.tmpl and textEditor.tmpl, which are template files for main.js and textEditor.js
* Update of index.php to regenerate main.js and textEditor.js when modules are loaded/unloaded
* Addition, in modules.php, of a function that rewrites a file from a template, gen_templateJS(source, target)
* Relocation of the "is_installed" function, which belongs to the environment, not to the translation core
An example of how the function gen_templateJS works:
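As a complement, the idea of generating main.js from main.tmpl according to the loaded modules can be sketched in JavaScript. The marker syntax (`//@module Name` ... `//@end`) and the function name renderTemplate are hypothetical; they are not the actual format of main.tmpl or textEditor.tmpl, and nested blocks are not supported.

```javascript
// Hypothetical template format: lines between "//@module Name" and "//@end"
// are kept only when module Name is loaded; other lines are always kept.
function renderTemplate(template, loaded) {
  const keep = new Set(loaded);
  const out = [];
  let current = null; // name of the module block we are in, or null
  for (const line of template.split("\n")) {
    const open = line.match(/^\s*\/\/@module\s+(\w+)\s*$/);
    if (open) {
      current = open[1];
      continue; // marker lines never reach the output
    }
    if (/^\s*\/\/@end\s*$/.test(line)) {
      current = null;
      continue;
    }
    if (current === null || keep.has(current)) out.push(line);
  }
  return out.join("\n");
}
```

gen_templateJS(source, target) also writes the result to the target file; in Node the equivalent step would be fs.writeFileSync(target, renderTemplate(text, loaded)).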
(Commit 30819) includes:
* The modules to load are now set by the webmaster, in config.php
* The home page shows which modules are loaded, with their descriptions
* The generation of main.js and textEditor.js has to be launched by the webmaster, by uncommenting a region in modules.php
* Loaded modules are kept in a global variable, $modules_load, instead of in sessions
(Commit 30925) includes:
* translate.php now uses a global variable for the PHP translate object ($trans) instead of sessions
All modules are now separate, and it is easy to add, load/unload, modify and remove modules.
Rewrite the language.php file as an abstract script, with interface modules for Apertium, Aspell and LanguageTool
"language.php" has been split into two parts: environment management and the translation system.
The translation system is managed by a PHP object named "translate", defined in includes/translation.php. It is instantiated in language.php.
The link with external tools such as LanguageTool, Aspell and After the Deadline is now handled by an object named "externtool", defined in includes/externaltool.php.
The webmaster chooses, in includes/config.php, which tool to use for spell checking and for grammar proofing. No other change is needed: everything is managed by the object, which provides the generic functions SpellChecking and GrammarProofing.
It is instantiated in gramproof.php (to preserve modularity).
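The facade idea behind externtool (one configuration choice per task, two generic entry points) can be sketched as follows. This is a JavaScript illustration only: the configuration keys `spelling` and `grammar`, the tool names, and the stand-in backends are hypothetical, not the variables of includes/config.php.

```javascript
// Facade in the spirit of "externtool": callers only use SpellChecking and
// GrammarProofing; which tool answers is decided once, from the configuration.
class ExternTool {
  constructor(config, backends) {
    this.spell = backends.spelling[config.spelling];
    this.grammar = backends.grammar[config.grammar];
    if (!this.spell) throw new Error("Unknown spelling tool: " + config.spelling);
    if (!this.grammar) throw new Error("Unknown grammar tool: " + config.grammar);
  }
  SpellChecking(text, lang) {
    return this.spell(text, lang);
  }
  GrammarProofing(text, lang) {
    return this.grammar(text, lang);
  }
}

// Hypothetical stand-in backends; real ones would call aspell,
// LanguageTool or After the Deadline and return their errors.
const backends = {
  spelling: {
    aspell: (text, lang) => ({ tool: "aspell", lang, text }),
    atd: (text, lang) => ({ tool: "atd", lang, text }),
  },
  grammar: {
    languagetool: (text, lang) => ({ tool: "languagetool", lang, text }),
    atd: (text, lang) => ({ tool: "atd", lang, text }),
  },
};
```

Switching from Aspell to After the Deadline then only means changing `spelling: "aspell"` to `spelling: "atd"` in the configuration object; the calling code is unchanged.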
(Commit 30827) includes:
* Split language.php into two parts: environment management and the translation system
* Create a PHP object named translate to manage the translation system
* Create a PHP object named externtool to manage the link with external tools, and provide the generic functions SpellChecking and GrammarProofing
* Modify translate.php and ajax.php to use the translate object
* Modify gramproof.php to use the externtool object
* Modify config.php to include the choice of external tool
(Commit 30903) includes:
* After the Deadline is now supported, through functions in the externtool object
The functions getATDresult, analyseATDresult and getATDCorrection allow the externtool object to communicate with After the Deadline.
The steps are:
- Get spelling or grammar errors from a local ATD server or from the online server (service.afterthedeadline.com, set in config.php as $ATD_link), in one of the languages en, fr, pt, de, es (defined in externaltool.php as $ATDsupport_lang), or en by default.
- Analyse the result (as an XML object) to build an object in the form expected by the gramproof module functions. This means detecting the suggestions, the text to replace, the description, and the start and end of the string in the text (this last part could perhaps be done more rigorously, but I don't know how).
- Pass the result to the output object, and thus to the gramproof module.
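Two of these steps can be sketched in JavaScript: the language fallback, and one simple heuristic for the start/end positions, which anchors each error string on the word that precedes it. The sketch assumes the XML has already been parsed into plain objects whose `string` and `precontext` fields mirror ATD's `<string>` and `<precontext>` elements; the function names are hypothetical and this is not the code of analyseATDresult.

```javascript
// Languages listed above ($ATDsupport_lang); anything else falls back to en.
const ATD_LANGUAGES = ["en", "fr", "pt", "de", "es"];
function atdLanguage(lang) {
  return ATD_LANGUAGES.includes(lang) ? lang : "en";
}

// Escape a string so it matches literally inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Heuristic: find "precontext, whitespace, error string" in the text and
// return the offsets of the error string only. Without a precontext, fall
// back to the first occurrence of the string. Returns null if not found.
// Limitation: repeated "precontext string" pairs always map to the first one.
function locateError(text, err) {
  const word = escapeRegExp(err.string);
  const pattern = err.precontext
    ? new RegExp(escapeRegExp(err.precontext) + "\\s+(" + word + ")")
    : new RegExp("(" + word + ")");
  const m = pattern.exec(text);
  if (!m) return null;
  // The captured error string ends where the whole match ends.
  const start = m.index + m[0].length - m[1].length;
  return { start, end: start + m[1].length };
}
```

With the test sentence "I has probl", the spelling error { string: "probl", precontext: "has" } is located at start 6, end 11.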
(Commit 30990) includes:
* Split spell checking and grammar proofing into spelling.php and grammar.php
* Create two different abstract PHP objects, named Spelling and Grammar
* Write the classes for using LanguageTool (grammar_LT.php), Aspell (spelling_aspell.php) and AfterTheDeadLine (gramproof_ATD.php)
* Remove the externtool object, and use these new objects in gramproof.php instead
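The abstract-class design can be illustrated with a JavaScript sketch: the abstract class fixes the shape of the result that gramproof consumes, and each tool only implements the part that talks to the external program. Apart from the name Spelling, everything here (findErrors, check, the toy DictionarySpelling subclass standing in for an Aspell wrapper) is hypothetical.

```javascript
// Abstract base: cannot be instantiated directly; subclasses implement findErrors().
class Spelling {
  constructor() {
    if (new.target === Spelling) {
      throw new TypeError("Spelling is abstract; use a tool-specific subclass");
    }
  }
  // Subclasses return [{ word, start, suggestions }].
  findErrors(text, lang) {
    throw new Error("findErrors() must be implemented by " + this.constructor.name);
  }
  // Shared logic: every tool's errors get the same shape (end and type added).
  check(text, lang) {
    return this.findErrors(text, lang).map((e) => ({
      ...e,
      end: e.start + e.word.length,
      type: "spelling",
    }));
  }
}

// Toy subclass: flags words missing from a small word list.
// A real subclass would call aspell (or ATD) instead.
class DictionarySpelling extends Spelling {
  constructor(words) {
    super();
    this.words = new Set(words);
  }
  findErrors(text) {
    const errors = [];
    for (const m of text.matchAll(/\p{L}+/gu)) {
      if (!this.words.has(m[0].toLowerCase())) {
        errors.push({ word: m[0], start: m.index, suggestions: [] });
      }
    }
    return errors;
  }
}
```

Adding support for another program then means writing one more subclass; the shared check() and the gramproof module stay untouched.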
If you want to try it, a good example is "I has probl", which contains one grammar mistake and one spelling mistake. To compare the programs' results, you can try the combinations aspell-LanguageTool, ATD-LanguageTool, ATD-ATD and aspell-ATD.
This part can be considered finished, because AWI is now able to use different external tools, such as aspell, LanguageTool and After the Deadline. In addition, interfaces with other programs can be added "easily" by implementing the necessary functions in new classes based on the abstract objects. The same holds for translation: another program than Apertium could easily be used by modifying the translate object... but that is not the purpose here.
Current Jobs
Provide more formatting modules; currently only ODF, OOXML, HTML and text are supported. MediaWiki (using apertium-mediawiki) and other formats are wanted.
(Commit 30984) includes:
* Correction of the end of some PHP files (?>\n => ?>)
* Correction of the '.[]' markers in format.php
* Correction of the '[\n]' markers
Support for RTF files has been added.
Line breaks in files are now handled correctly.
(Commit 30998) includes:
* Adding support for the Mediawiki format (with the extension .mediawiki)
* Adding the display of supported extensions on the home page (index.php)
* The list of supported extensions is set in config.php
(Commit 31002) includes:
* Adding support for the PDF format, using pdftohtml and wkhtmltopdf
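The PDF round trip (PDF to HTML, translation of the HTML, HTML back to PDF) can be sketched as a list of commands. The exact flags and file names AWI uses are not given here, so everything below is an assumption: pdftohtml with `-noframes` (single HTML file) and `-q` (quiet), Apertium's HTML format via `apertium -f html <pair> <in> <out>`, and the plain `wkhtmltopdf <in> <out>` form. The function name pdfPipeline is hypothetical.

```javascript
// Build the three steps of the PDF round trip as argument arrays.
// Flags and naming are assumptions, not AWI's actual invocation.
function pdfPipeline(inputPdf, pair, workBase) {
  const html = workBase + ".html"; // assumed output name of pdftohtml -noframes
  const translated = workBase + ".translated.html";
  const output = workBase + ".translated.pdf";
  return {
    output,
    steps: [
      ["pdftohtml", "-noframes", "-q", inputPdf, workBase],
      ["apertium", "-f", "html", pair, html, translated],
      ["wkhtmltopdf", translated, output],
    ],
  };
}
```

Each step could be run with Node's child_process.execFileSync(step[0], step.slice(1)); passing arguments as an array, rather than building one shell string, avoids quoting problems with uploaded file names (in PHP, escapeshellarg plays that role).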
PDF management:
What is seen in AWI (Firefox 4 on Ubuntu):
The "temp\/APRE..." text comes from the title of the HTML document (<TITLE></TITLE>), which pdftohtml sets automatically.
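One possible fix is to overwrite that title before the HTML is displayed or converted back to PDF, for example with the name of the uploaded file. A minimal JavaScript sketch, with a hypothetical helper name, setHtmlTitle:

```javascript
// Replace the first <title>...</title> (any case, since pdftohtml writes
// <TITLE>) with an escaped, human-readable title. HTML without a title
// element is returned unchanged.
function setHtmlTitle(html, title) {
  const safe = title
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  // A replacement function is used so that "$" sequences in the title
  // are not interpreted as special replacement patterns.
  return html.replace(/<title>[\s\S]*?<\/title>/i, () => "<title>" + safe + "</title>");
}
```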
Next step: provide a module using Tesseract (which is able to recognize multiple languages and maintain a basic layout) for pictures. Its HTML export functionality can be used (jointly with the html module). Integrate file processing for users with JavaScript disabled?
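Such a module would mostly need to build the Tesseract command and hand the resulting hOCR (HTML with layout information) to the html module. A JavaScript sketch, assuming the classic `tesseract <image> <outputbase> -l <lang> hocr` form; the mapping from two-letter codes to Tesseract's three-letter language data names and the function name tesseractCommand are hypothetical.

```javascript
// Hypothetical mapping from two-letter codes to Tesseract language data names.
const TESSERACT_LANGS = { en: "eng", fr: "fra", es: "spa", de: "deu", pt: "por" };

function tesseractCommand(image, outputBase, lang) {
  const tessLang = Object.prototype.hasOwnProperty.call(TESSERACT_LANGS, lang)
    ? TESSERACT_LANGS[lang]
    : undefined;
  if (!tessLang) throw new Error("No Tesseract language data mapped for: " + lang);
  // "hocr" is the config that makes Tesseract write hOCR (HTML) output.
  return ["tesseract", image, outputBase, "-l", tessLang, "hocr"];
}
```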