Difference between revisions of "ScaleMT"

  +
{{Github-unmigrated-tool}}
  +
 
=Introduction=
 
=Introduction=
   
This is the wiki page of '''ScaleMT''', a scalable architecture to provide translation web services based on Apertium and other machine translation engines.
+
This is the wiki page of '''ScaleMT''', a scalable architecture to provide translation web services based on Apertium and other machine translation engines. It builds on previous work to develop an Apertium web service: [[Apertium_scalable_service]]. The web service has two different APIs: XML-RPC, a lightweight remote procedure call method using XML and HTTP, and JSON REST, which allows you to easily use the service from any website.
   
  +
=Architecture=
=API=
 
   
  +
ScaleMT makes the translation engines more efficient by turning them into ''[[daemon]]s'' (that is, processes running in the background rather than under the interaction of a user). Besides that, it is able to run on multiple servers thanks to an algorithm which decides which daemons should run on each server and a load balancing method that decides which server should process each request. ScaleMT consists of two main Java applications:
==API Key==
 
  +
*'''ScaleMTSlave''' runs on a machine with the translation engine installed and manages a set of running translation engine instances (''daemons''); it performs the requested translations by sending them to the right daemon.
  +
*'''ScaleMTRouter''' (request router) runs on a web server; it processes the translation requests and sends them to the right ScaleMTSlave instance.
   
  +
[[Image:Architecture.png|Architecture]]
Usage of the machine translation web service is limited to a certain number of requests per IP address. To raise this limit, please register and get an API key; this also lets us track where our traffic comes from. Registered users enjoy a more generous traffic limit.
 
   
  +
=Downloading=
To register, please visit:
 
   
  +
The source code can be downloaded from our subversion repository:
http://api.apertium.org/register.jsp
 
   
Once the registration process is finished, you will receive an API key that must be included in all the requests you perform.
 
 
==JSONP AJAX API==
 
 
This API closely mirrors the Google AJAX Language API, so that switching to the Apertium JSON API is as easy as possible.
 
For more information about Google AJAX Language API, see http://code.google.com/intl/en/apis/ajaxlanguage/documentation/reference.html#_intro_fonje .
 
 
There are two resources:
 
<pre>
 
http://api.apertium.org/json/translate
 
</pre>
 
and
 
 
<pre>
 
<pre>
  +
svn co https://svn.code.sf.net/p/apertium/svn/trunk/scaleMT
http://api.apertium.org/json/listPairs
 
 
</pre>
 
</pre>
   
  +
=Fast set up=
The first one translates pieces of plain text or html code, and the second one lists the available language pairs.
 
   
  +
==Compiling==
Both resources admit GET and POST http methods. The value of arguments must be properly escaped (e.g., via the functional equivalent of Javascript's encodeURIComponent() method).
 
  +
To compile the source code you'll need:
  +
* A Java Development Kit compatible with Java version 6. It can be Sun/Oracle's implementation or any other implementation that follows the specification (see [http://en.wikipedia.org/wiki/Java_Development_Kit#Other_JDKs]).
  +
* Maven 2. If you don't have Maven installed, simply [http://maven.apache.org/download.html download] it, unzip it, and be sure that the ''bin'' directory is in your PATH.
   
  +
Once you are sure you have Java JDK and Maven, you can compile the three projects you have downloaded:
===Common arguments and response format===
 
   
These arguments are all optional and common to both resources:
 
* '''key''' : User's personal API key. Requests from registered users have higher priority.
 
* '''callback''' : Alters the response format, adding a call to a Javascript function. See description below.
 
* '''context''' : If callback parameter is supplied too, adds additional arguments to the function call. See description below.
 
 
If neither the callback nor the context argument is supplied, this is the JSON object returned by both resources:
 
 
<pre>
 
<pre>
  +
cd ScaleMTRMIInterfaces
{ "responseData" : JSON Object with the requested data , "responseStatus" : Response numeric code , "responseDetails" : Error description }
 
  +
mvn install
 
</pre>
 
</pre>
 
If callback argument is supplied, a call to a function named by the callback value is returned. For instance, if callback argument's value is ''foo'',
 
this is the JavaScript code returned:
 
 
<pre>
 
<pre>
  +
cd ScaleMTSlave
foo({ "responseData" : JSON Object with the requested data , "responseStatus" : Response numeric code , "responseDetails" : Error description })
 
  +
mvn package
 
</pre>
 
</pre>
 
If both callback and context arguments are supplied, the returned function call takes additional arguments. For instance, if callback's value is ''foo'' and context's value is ''bar'':
 
 
<pre>
 
<pre>
  +
cd ScaleMTRouter
foo('bar',JSON Object with the requested data , Response numeric code , Error description )
 
  +
mvn package
 
</pre>
 
</pre>
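As a minimal sketch of how a non-browser client might handle the callback form described above, the following Python function strips the function-call wrapper and parses the JSON object inside it. The name unwrap_jsonp is hypothetical, and the sketch only covers the callback-only form; the callback-plus-context form is not plain JSON (the context is single-quoted) and is not handled here.

```python
import json


def unwrap_jsonp(payload, callback):
    """Return the JSON object wrapped in a `callback(...)` call.

    Only the callback-only response form is supported.
    """
    payload = payload.strip()
    prefix = callback + "("
    if not (payload.startswith(prefix) and payload.endswith(")")):
        raise ValueError("payload is not a call to " + callback)
    # Drop the leading 'callback(' and the trailing ')'.
    return json.loads(payload[len(prefix):-1])
```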
   
  +
==Configuring==
===listPairs resource===
 
   
  +
===Router===
This resource only accepts the common arguments.
 
  +
Then, configure ''ScaleMTRouter''. To do so, unzip the file ''ScaleMTRouter.war'', present in ''ScaleMTRouter/target'' to whichever directory you want:
 
The response data returned is an array of language pairs, following this format:
 
 
<pre>
 
<pre>
  +
cd ScaleMTRouter/target
[{"sourceLanguage": source language code ,"targetLanguage": target language code }, ... ]
 
  +
unzip ScaleMTRouter.war -d /tmp/mywar/
 
</pre>
 
</pre>
   
  +
Go to the directory where you unzipped the file and open the file WEB-INF/classes/configuration.properties with your favourite editor. Change the value of the property ''requestrouter_rmi_host'' to the public name of the computer where you are going to run the request router. If you are going to run the router with only a slave on the same machine, you don't need to change this property.
responseStatus is always 200, which means the request was processed successfully, and responseDetails is null.
 
   
  +
Now, zip the file again:
So if we call this resource with no arguments:
 
 
<pre>
 
<pre>
  +
cd /tmp/mywar
curl 'http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/listPairs'
 
  +
zip -r ScaleMTRouter.war *
</pre>
 
we get, for example:
 
<pre>
 
{"responseData":[{"sourceLanguage":"ca","targetLanguage":"oc"},{"sourceLanguage":"en","targetLanguage":"es"}],"responseDetails":null,"responseStatus":200}
 
 
</pre>
 
</pre>
   
===translate resource===
+
===Slave===
  +
To install the ''ScaleMTSlave'' instances, you have to repeat these steps for each machine you want to act as slave:
  +
  +
Unzip the file ''ScaleMTSlave-1.0-assembled.zip'', found in ''ScaleMTSlave/target'', into the installation directory you have chosen.
   
  +
Run the script that installs Apertium. Be sure that the machine has Internet connection, because the installation script will download Apertium from its SVN repository.
This resource accepts the common arguments mentioned above, plus the following specific arguments:
 
* '''q''' : Source text or HTML code to be translated. Compulsory argument.
 
* '''langpair''' : Source language code and target language code, separated by '|' character, which is escaped as '%7C'. Compulsory argument.
 
* '''format''' : Source format. ''txt'' for plain text and ''html'' for HTML code. This argument is optional. If this argument is missing it is assumed that source is plain text.
 
* '''markUnknown''' : ''yes'' for placing an asterisk next to each unknown word, ''no'' for not placing it. This argument is optional. If this argument is missing, an asterisk is placed next to each unknown word.
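The arguments listed above can be assembled into a request URL as in the following Python sketch. It is only an illustration: build_translate_url and its parameter names are hypothetical, and the base URL is the one quoted in this page, which may no longer be served. Passing quote_via=quote makes urlencode escape spaces as %20 and the '|' separator as %7C.

```python
from urllib.parse import quote, urlencode

# Endpoint as quoted in this page; it may no longer be live.
BASE_URL = "http://api.apertium.org/json/translate"


def build_translate_url(text, source, target, fmt="txt", mark_unknown=True, key=None):
    """Build a GET URL for the translate resource with escaped arguments."""
    params = [
        ("q", text),
        ("langpair", source + "|" + target),  # '|' is escaped as %7C below
        ("format", fmt),
        ("markUnknown", "yes" if mark_unknown else "no"),
    ]
    if key:
        params.append(("key", key))
    return BASE_URL + "?" + urlencode(params, quote_via=quote)
```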
 
   
  +
Then run the script ''installApertiumAndPairs.sh'' with:
The response data is a JSON object following this format:
 
 
<pre>
 
<pre>
  +
./installApertiumAndPairs.sh
{ "translatedText" : translated text }
 
 
</pre>
 
</pre>
   
  +
or
Many different response status codes can be returned. This is the list with all the codes and their meaning:
 
* '''200''' : Text has been translated successfully, ''responseDetails'' field is null.
 
* '''400''' : Bad parameters. A compulsory argument is missing, or there is an argument with wrong format. A more accurate description can be found in ''responseDetails'' field.
 
* '''451''' : Not supported pair. The translation engine can't translate with the requested language pair.
 
* '''452''' : Not supported format. The translation engine doesn't recognize the requested format.
 
* '''500''' : Unexpected error. An unexpected error happened. Depending on the error, a more accurate description can be found in ''responseDetails'' field.
 
* '''552''' : The traffic limit for your IP or your user has been reached.
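A client could turn these status codes into errors along the following lines. This is a sketch under assumptions: TranslationError and extract_translation are hypothetical names, and the messages simply paraphrase the list above.

```python
# Status codes and meanings, paraphrased from the list above.
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad parameters",
    451: "Not supported pair",
    452: "Not supported format",
    500: "Unexpected error",
    552: "Traffic limit reached",
}


class TranslationError(Exception):
    """Raised when responseStatus is not 200."""

    def __init__(self, status, details):
        super().__init__("%d: %s" % (status, details))
        self.status = status
        self.details = details


def extract_translation(response):
    """Return translatedText from a parsed response, or raise on error."""
    status = response["responseStatus"]
    if status != 200:
        details = response.get("responseDetails") or STATUS_MEANINGS.get(status, "Unknown status")
        raise TranslationError(status, details)
    return response["responseData"]["translatedText"]
```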
 
   
Here is a simple example. Requesting a translation with:
 
 
<pre>
 
<pre>
  +
bash installApertiumAndPairs.sh
curl 'http://api.apertium.org/json/translate?q=hello%20world&langpair=en%7Ces&callback=foo'
 
</pre>
 
the result is:
 
<pre>
 
foo({"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200})
 
</pre>
 
And if we add the context parameter:
 
<pre>
 
curl 'http://api.apertium.org/json/translate?q=hello%20world&langpair=en%7Ces&callback=foo&context=a'
 
</pre>
 
we get
 
<pre>
 
foo('a',{"translatedText":"hola Mundo"},200,null)
 
 
</pre>
 
</pre>
   
  +
By default it will download Apertium and all the stable pairs and install them under /home/youruser/local. You can change these options with the following parameters:
===Batch interface===
 
  +
* '''-p''' Installation_prefix : Changes the installation prefix. If you run the script with the options ''-p /foo/bar'' it will install executables under /foo/bar/bin, libraries under /foo/bar/lib, etc.
  +
* '''-l''' pair1,pair2,pair3... : Installs only the specified language pairs. The list must be a subset of the stable pairs listed on the [http://wiki.apertium.org/wiki/Main_Page Apertium wiki main page]. The language order must match the order used on the main page, and translators for both directions are installed: for example, ''-l en-es'' installs translators from Spanish to English and from English to Spanish, but ''-l es-en'' installs no translator. Some pairs only provide a translator in one direction; see the arrows on the main page.
   
  +
When installation is complete, you can safely remove ''apertium'' directory. ''ScaleMTSlave'' can't work with an existing Apertium installation, because it modifies Apertium modes files.
More than one translation can be performed in the same request if we use more than one ''q'' argument or more than one ''langpair''. If there is only one
 
''q'' argument and more than one ''langpair'' arguments, the same input string is translated with different language pairs. If there is only one ''langpair'' argument and more than one ''q'' arguments, the different input strings are translated with the same language pair. And if both arguments are supplied more than one time, and they are repeated exactly the same times, the first ''q'' is translated with the first ''langpair'', the second ''q'' with the second ''langpair'', etc.
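The pairing rules above can be written down as a short Python sketch. The function name pair_requests is hypothetical, and raising an error when the two arguments are repeated a different number of times is an assumption; the text does not say how the service reacts in that case.

```python
def pair_requests(texts, langpairs):
    """Pair each input text with a language pair following the batch rules."""
    if not texts or not langpairs:
        raise ValueError("at least one q and one langpair are required")
    if len(langpairs) == 1:
        # One langpair: every text is translated with the same pair.
        return [(text, langpairs[0]) for text in texts]
    if len(texts) == 1:
        # One text: it is translated with every language pair.
        return [(texts[0], pair) for pair in langpairs]
    if len(texts) == len(langpairs):
        # Same count: first q with first langpair, second with second, etc.
        return list(zip(texts, langpairs))
    # Assumption: mismatched counts are treated as an error.
    raise ValueError("q and langpair must be repeated the same number of times")
```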
 
   
  +
Once Apertium is installed, it's time to change the last configuration files. Edit ''ScaleMTSlave-1.0/conf/configuration.properties'' and change the value of the property ''requestrouter_host'' to the public host name of the machine where ScaleMTRouter will run. Additionally, you also have to edit ''ScaleMTSlave-1.0/conf/translation-engines.xml''. If you didn't install all the stable language pairs with the installation script, remove from the ''<pairs>'' section all the language pairs that you didn't install. Finally, in the ''<pipeline>'' replace ''/usr/local'' with the Apertium installation prefix you have chosen.
The returned JSON changes slightly when using the batch interface. The field ''responseData'' now contains an array of JSON objects, each with the usual fields: ''responseData'', ''responseStatus'' and ''responseDetails''. Each translation therefore has its own ''responseStatus'' and ''responseDetails'' values, and there are global values too. If all the translations succeed, these values match; if any translation fails, the global fields take the values of the failed translation. If more than one translation fails, the global fields take the values of one of the failed translations.
 
   
  +
==Running==
These examples show the described behaviour:
 
<pre>
 
curl 'http://api.apertium.org/json/translate?q=hello%20world&q=bye&langpair=en%7Ces'
 
</pre>
 
<pre>
 
{"responseData":[{"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200},
 
{"responseData":{"translatedText":"adiós"},"responseDetails":null,"responseStatus":200}],"responseDetails":null,"responseStatus":200}
 
</pre>
 
   
  +
Firstly, run <pre>rmiregistry 1098</pre> on the machine where you are going to run ''ScaleMTRouter'' to start ''rmiregistry'' . Then run ''ScaleMTRouter'' by deploying your re-zipped ''ScaleMTRouter.war'' in your Java web server. For example, in Apache Tomcat, put that file in the directory called ''webapps''.
   
  +
'''Give link to how to set up Tomcat(?)'''
<pre>
 
curl 'http://api.apertium.org/json/translate?q=hello%20world&langpair=en%7Ces&langpair=en%7Cca&callback=foo'
 
</pre>
 
<pre>
 
foo({"responseData":[{"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200},
 
{"responseData":{"translatedText":"Món d'hola"},"responseDetails":null,"responseStatus":200}],"responseDetails":null,"responseStatus":200})
 
</pre>
 
 
   
  +
Then, run ''ScaleMTSlave'' on each of the servers you want to use to perform translations. Use the script ''run-scaleMT.sh'' and pass as a parameter the name of the host where ''ScaleMTSlave'' runs:
 
<pre>
 
<pre>
  +
bash run-scaleMT.sh hostname
curl 'http://api.apertium.org/json/translate?q=hello%20world&q=goodbye&langpair=en%7Ces&langpair=en%7Cca&callback=foo&context=bar'
 
 
</pre>
 
</pre>
  +
The first time you run it, it will calculate the server's capacity by performing a series of translations and store it in ''conf/capacity.properties''. If you want the system to calculate the capacity each time it starts, use the argument -''reCalculateCapacity'':
 
<pre>
 
<pre>
  +
bash run-scaleMT.sh hostname -reCalculateCapacity
foo('bar',[{"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200},
 
{"responseData":{"translatedText":"adéu"},"responseDetails":null,"responseStatus":200}],200,null)
 
 
</pre>
 
</pre>
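A batch response like the ones shown above could be unpacked with a small Python helper. This is a sketch only; summarize_batch is a hypothetical name, and it assumes the response has already been parsed from JSON (and, when a callback is used, stripped of its function-call wrapper).

```python
def summarize_batch(response):
    """Return one (status, translated_text, details) tuple per translation."""
    results = []
    for item in response["responseData"]:
        # A failed translation may carry no responseData object.
        data = item.get("responseData") or {}
        results.append(
            (item["responseStatus"], data.get("translatedText"), item.get("responseDetails"))
        )
    return results
```

Checking each per-translation status, rather than only the global one, shows which of the inputs failed when the global status is not 200.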
  +
After reading or calculating capacity, it contacts ''ScaleMTRouter'' and starts to receive translation requests.
  +
Of course, servers can be stopped (with Ctrl+C) or started at any time.
   
  +
The JSON services will be available at:
==Javascript library==
 
 
If you plan to use the JSON AJAX API from a Javascript client (that's the use the API is intended for), we have a Javascript library that will make that task easier. It is very similar to the Google AJAX Language API one:
 
 
http://code.google.com/intl/en/apis/ajaxlanguage/documentation/reference.html#GlobalMethods
 
 
To use it, firstly include the library code with:
 
 
<pre>
 
<pre>
  +
http://router_machine_host:web_server_port/ScaleMTRouter/json/listPairs
<script type="text/javascript" src="http://api.apertium.org/JSLibrary.jsp?key=YOURAPIKEY"></script>
 
  +
http://router_machine_host:web_server_port/ScaleMTRouter/json/translate
 
</pre>
 
</pre>
   
  +
And the XML-RPC one at:
Replace YOURAPIKEY with the actual value of your API key. If you don't have an API key, it is not necessary to include the query parameter ''key''.
 
 
It will import an object called '''apertium'''. These are the operations supported by this object:
 
 
{| class="wikitable" border="1"
 
|-
 
! Method
 
! Return type
 
! Description
 
|-
 
|apertium.translate(content,sourceLang,targetLang,callback)
 
| None
 
| Translates the given content. It is an asynchronous function: instead of directly returning the translation, the callback function is called when the translation is available.
 
* content: Can be a simple string, which means that the source is treated as plain text, or an object with two fields:
 
** type: the format of the text to be translated. Its values are defined in the object apertium.ContentType. Currently only plain text and HTML are supported.
 
** text: simple string containing the txt or html code to be translated.
 
* sourceLang: Source language code, e.g. "es", "en", ...
 
* targetLang: Target language code, e.g. "es", "en", ...
 
* callback: Function to be called when the translation is available. It receives one argument: the translation result. It is an object with the following fields:
 
** error. Optional field, only present if there has been any problem with the translation. It has two fields:
 
*** code: Error code. The value described previously: [[ScaleMT#translate_resource]].
 
*** message: Description of the error in English.
 
|-
 
| apertium.isTranslatablePair(sourceLang,targetLang)
 
| Boolean
 
| Returns true if the translation engine can translate the given language pair. Returns false otherwise. Arguments:
 
* sourceLang: Source language code, e.g. "es", "en", ...
 
* targetLang: Target language code, e.g. "es", "en", ...
 
|-
 
| apertium.isTranslatable(sourceLang)
 
| Boolean
 
| Returns true if the translation engine can translate from the given source language to any target language. Returns false otherwise. Arguments:
 
* sourceLang: Source language code, e.g. "es", "en", ...
 
|-
 
| apertium.getSourceLanguages()
 
| Array
 
| Returns an array containing the language codes of all the supported source languages.
 
|-
 
| apertium.getTargetLanguages(sourceLang)
 
| Array
 
| Returns an array containing the language codes of all the target languages supported for the given source language.
 
* sourceLang: Source language code, e.g. "es", "en", ...
 
|-
 
| apertium.getSupportedLanguagePairs()
 
| Array
 
| Returns an array containing the codes of all the supported language pairs. The array is made of objects. Each object contains the following fields:
 
* sourceLanguage: Source language code, e.g. "es", "en", ...
 
* targetLanguage: Target language code, e.g. "es", "en", ...
 
|}
 
 
Object apertium.ContentType:
 
<pre>
 
apertium.ContentType = {
 
'TEXT' : 'txt',
 
'HTML' : 'html'
 
};
 
</pre>
 
 
==XML-RPC API==
 
 
XML-RPC is a remote procedure call protocol which uses XML to encode its calls and HTTP as a transport mechanism. The (quite simple) specification can be browsed at http://www.xmlrpc.com/spec.
 
 
There are plenty of libraries for different programming languages that you can use to call the service:
 
 
http://en.wikipedia.org/wiki/XML-RPC#Implementations
 
 
To perform a remote call to the service, you first need to tell the library the proxy URL (also known as the endpoint). It is:
 
 
 
<pre>
 
<pre>
http://api.apertium.org/xmlrpc
+
http://router_machine_host:web_server_port/ScaleMTRouter/xmlrpc
 
</pre>
 
</pre>
   
  +
If you are using Apache Tomcat and wish to test the service locally, you can probably load this URL to get the list of available language pairs:
Then you can call methods if you know the name of the object that provides them and the method names. The object name is:
 
 
<pre>
 
<pre>
  +
http://localhost:8080/ScaleMTRouter/json/listPairs
service
 
 
</pre>
 
</pre>
  +
For more information about the API, see http://wiki.apertium.org/wiki/Apertium_scalable_service (the API documentation used to live at http://api.apertium.org, but that host is for [[Apy]] now).
   
  +
=Wish-list=
And this is the list of supported methods:
 
  +
* Fix apertium is-en
  +
* Estimate the capacity of a server with any language pair; not only es-ca.
  +
* Moses support.
  +
* Optimize the processing carried out by a daemon: avoid regular expressions, creation of unused directories, etc.
  +
* Explicitly define duplicated daemons. In some situations it is useful to have multiple daemons for the same language pairs to be able to perform multiple translations of big texts simultaneously.
  +
* Currently ScaleMT always sends the text to the translation engine encoded in UTF-8. Add a configuration parameter to choose the text encoding.
   
  +
=References=
{| class="wikitable" border="1"
 
  +
* "ScaleMT: a free/open-source framework for building scalable machine translation web services". Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz. Open Source Tools for Machine Translation, MT Marathon 2010, Dublin, Ireland, 2010. The Prague Bulletin of Mathematical Linguistics 93, p. 97-106. [[http://www.dlsi.ua.es/~japerez/pub/pdf/mtmarathon2010-scalemt.pdf pdf]]
|-
 
  +
* "An open-source highly scalable web service architecture for the Apertium machine translation engine". Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz. First International Workshop on Free/Open-Source Rule-Based Machine Translation, Alicante, Spain, 2009, p. 51-58. [[http://rua.ua.es/dspace/bitstream/10045/12030/1/paper7.pdf pdf]]
! Method
 
! Return type
 
! Description
 
|-
 
| service.translate(string text,string format,string sourceLang,string targetLang,string key)
 
| string
 
| Returns the translation of the given text.
 
* text: source text to be translated.
 
* format: the format of the text to be translated. Use the value "txt" for plain text and "html" for HTML pages or fragments.
 
* sourceLang: Source language code, e.g. "es", "en", ...
 
* targetLang: Target language code, e.g. "es", "en", ...
 
* key: your API key. If you don't have an API key, use an empty string.
 
If there is any error, the standard XML-RPC error response is returned. See section ''Fault example'' from http://www.xmlrpc.com/spec . The error codes returned have the same meaning as the ones defined in [[ScaleMT#translate_resource]].
 
|-
 
| service.translateDocument(base64 sourceDocument,string format,string sourceLang,string targetLang,string key)
 
| base64
 
| Returns the translation of the given document. This method is very similar to the previous one but intended to translate rtf and odt documents.
 
* sourceDocument: document to be translated, treated as a binary file.
 
* format: the format of the document to be translated. Use the value "odt" for ODT documents and "rtf" for RTF documents.
 
* sourceLang: Source language code, e.g. "es", "en", ...
 
* targetLang: Target language code, e.g. "es", "en", ...
 
* key: your API key. If you don't have an API key, use an empty string.
 
|-
 
| service.getSupportedLanguagePairs()
 
| array
 
| Returns an array containing the codes of all the supported language pairs. The array is made of struct elements. Each struct contains the following fields:
 
* sourceLanguage: Source language code, e.g. "es", "en", ...
 
* targetLanguage: Target language code, e.g. "es", "en", ...
 
|}
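As an illustration of the method table above, the following Python sketch serializes a service.translate call with the standard xmlrpc.client module. The helper name build_translate_call is hypothetical, and the endpoint is the one quoted in this page, which may no longer be reachable, so the sketch only builds the request body and does not contact any server.

```python
import xmlrpc.client

# Endpoint as quoted in this page; it may no longer be live.
ENDPOINT = "http://api.apertium.org/xmlrpc"


def build_translate_call(text, source, target, fmt="txt", key=""):
    """Serialize a service.translate call into an XML-RPC request body."""
    # Argument order follows the table: text, format, sourceLang, targetLang, key.
    params = (text, fmt, source, target, key)
    return xmlrpc.client.dumps(params, methodname="service.translate")


# With a reachable server, the same call could be made directly (not run here):
#   proxy = xmlrpc.client.ServerProxy(ENDPOINT)
#   proxy.service.translate("hello world", "txt", "en", "es", "")
```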
 
   
  +
[[Category:Tools]]
Important:
 
At the moment, not all XML-RPC clients work with this service. We return an HTTP redirect response code and some clients are not able to deal with it. We are working to solve this problem.
 

Latest revision as of 02:20, 9 March 2018

Note: After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see Migrating tools to GitHub.

Introduction

This is the wiki page of ScaleMT, a scalable architecture to provide translation web services based on Apertium and other machine translation engines. It builds on previous work to develop an Apertium web service: Apertium_scalable_service. The web service has two different APIs: XML-RPC, a lightweight remote procedure call method using XML and HTTP, and JSON REST, which allows you to easily use the service from any website.

Architecture

ScaleMT makes the translation engines more efficient by turning them into daemons (that is, processes running in the background rather than under the interaction of a user). Besides that, it is able to run on multiple servers thanks to an algorithm which decides which daemons should run on each server and a load balancing method that decides which server should process each request. ScaleMT consists of two main Java applications:

  • ScaleMTSlave runs on a machine with the translation engine installed and manages a set of running translation engine instances (daemons); it performs the requested translations by sending them to the right daemon.
  • ScaleMTRouter (request router) runs on a web server; it processes the translation requests and sends them to the right ScaleMTSlave instance.
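As a rough illustration of the load-balancing idea described above, the following Python sketch picks, among the slaves that run a daemon for the requested language pair, the one with the lowest load relative to its capacity. This is not ScaleMT's actual algorithm, which is not detailed on this page; the Slave class, its fields and choose_slave are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical view of one ScaleMTSlave instance as seen by the router."""
    name: str
    capacity: float   # assumed: work the server can handle per unit of time
    pairs: set        # language pairs with a running daemon, e.g. {"en-es"}
    load: float = 0.0  # assumed: work currently assigned to the server


def choose_slave(slaves, pair):
    """Pick the least loaded slave (relative to capacity) serving `pair`."""
    candidates = [s for s in slaves if pair in s.pairs]
    if not candidates:
        raise LookupError("no slave runs a daemon for " + pair)
    return min(candidates, key=lambda s: s.load / s.capacity)
```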

[Image: Architecture.png (Architecture)]

Downloading

The source code can be downloaded from our subversion repository:

svn co https://svn.code.sf.net/p/apertium/svn/trunk/scaleMT

Fast set up

Compiling

To compile the source code you'll need:

  • A Java Development Kit compatible with Java version 6. It can be Sun/Oracle's implementation or any other implementation that follows the specification (see http://en.wikipedia.org/wiki/Java_Development_Kit#Other_JDKs).
  • Maven 2. If you don't have Maven installed, simply download it from http://maven.apache.org/download.html, unzip it, and make sure that the bin directory is in your PATH.

Once you are sure you have Java JDK and Maven, you can compile the three projects you have downloaded:

cd ScaleMTRMIInterfaces
mvn install
cd ScaleMTSlave
mvn package
cd ScaleMTRouter
mvn package

Configuring

Router

Then, configure ScaleMTRouter. To do so, unzip the file ScaleMTRouter.war, present in ScaleMTRouter/target to whichever directory you want:

cd ScaleMTRouter/target
unzip ScaleMTRouter.war -d /tmp/mywar/

Go to the directory where you unzipped the file and open the file WEB-INF/classes/configuration.properties with your favourite editor. Change the value of the property requestrouter_rmi_host to the public name of the computer where you are going to run the request router. If you are going to run the router with only a slave on the same machine, you don't need to change this property.
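If you prefer to script this change instead of using an editor, a minimal Python sketch could look like the following. The function set_property is hypothetical, and it assumes the file uses simple key=value lines (Java properties files also allow ':' and whitespace as separators, which this sketch does not handle).

```python
def set_property(properties_text, key, value):
    """Return the properties text with `key` set to `value`.

    Assumes simple 'key=value' lines; comment lines are left untouched.
    """
    lines = properties_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")):
            continue  # comment line
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = key + "=" + value
            replaced = True
    if not replaced:
        lines.append(key + "=" + value)
    return "\n".join(lines) + "\n"
```

The function works on text, so reading WEB-INF/classes/configuration.properties and writing the result back is left to the caller.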

Now, zip the file again:

cd /tmp/mywar
zip -r ScaleMTRouter.war *

Slave

To install the ScaleMTSlave instances, you have to repeat these steps for each machine you want to act as slave:

Unzip the file ScaleMTSlave-1.0-assembled.zip, found in ScaleMTSlave/target, into the installation directory you have chosen.

Next, install Apertium. Make sure the machine has an Internet connection, because the installation script downloads Apertium from its SVN repository.

Run the installation script installApertiumAndPairs.sh with:

./installApertiumAndPairs.sh

or

bash installApertiumAndPairs.sh

By default it will download Apertium and all the stable pairs and install them under /home/youruser/local. You can change these options with the following parameters:

  • -p Installation_prefix : Changes the installation prefix. If you run the script with the options -p /foo/bar it will install executables under /foo/bar/bin, libraries under /foo/bar/lib, etc.
  • -l pair1,pair2,pair3... : Installs only the specified language pairs. The list must be a subset of the stable pairs listed on the Apertium wiki main page. The language order must match the order used on the main page, and translators for both directions are installed: for example, -l en-es installs translators from Spanish to English and from English to Spanish, but -l es-en installs no translator. Some pairs only provide a translator in one direction; see the arrows on the main page.

When the installation is complete, you can safely remove the apertium directory. ScaleMTSlave can't work with an existing Apertium installation, because it modifies Apertium's modes files.

Once Apertium is installed, it's time to change the last configuration files. Edit ScaleMTSlave-1.0/conf/configuration.properties and change the value of the property requestrouter_host to the public host name of the machine where ScaleMTRouter will run. Additionally, you also have to edit ScaleMTSlave-1.0/conf/translation-engines.xml. If you didn't install all the stable language pairs with the installation script, remove from the <pairs> section all the language pairs that you didn't install. Finally, in the <pipeline> replace /usr/local with the Apertium installation prefix you have chosen.

Running

Firstly, run

rmiregistry 1098

on the machine where you are going to run ScaleMTRouter to start rmiregistry. Then run ScaleMTRouter by deploying your re-zipped ScaleMTRouter.war in your Java web server. For example, in Apache Tomcat, put that file in the directory called webapps.

Give link to how to set up Tomcat(?)

Then, run ScaleMTSlave on each of the servers you want to use to perform translations. Use the script run-scaleMT.sh and pass as a parameter the name of the host where ScaleMTSlave runs:

bash run-scaleMT.sh hostname

The first time you run it, it will calculate the server's capacity by performing a series of translations and store it in conf/capacity.properties. If you want the system to calculate the capacity each time it starts, use the argument -reCalculateCapacity:

bash run-scaleMT.sh hostname -reCalculateCapacity

After reading or calculating capacity, it contacts ScaleMTRouter and starts to receive translation requests. Of course, servers can be stopped (with Ctrl+C) or started at any time.

The JSON services will be available at:

http://router_machine_host:web_server_port/ScaleMTRouter/json/listPairs
http://router_machine_host:web_server_port/ScaleMTRouter/json/translate

And the XML-RPC one at:

http://router_machine_host:web_server_port/ScaleMTRouter/xmlrpc

If you are using Apache Tomcat and wish to test the service locally, you can probably load this URL to get the list of available language pairs:

http://localhost:8080/ScaleMTRouter/json/listPairs
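The URL patterns above can be assembled for a given installation as in the following Python sketch, which also parses a listPairs reply. The helper names service_urls and parse_pairs are hypothetical, and the reply layout (a responseData array of objects with sourceLanguage and targetLanguage fields) is assumed from the API documentation referenced on this page.

```python
import json


def service_urls(host, port, context="ScaleMTRouter"):
    """Return the JSON and XML-RPC URLs for a deployed router."""
    base = "http://%s:%s/%s" % (host, port, context)
    return {
        "listPairs": base + "/json/listPairs",
        "translate": base + "/json/translate",
        "xmlrpc": base + "/xmlrpc",
    }


def parse_pairs(body):
    """Extract (source, target) tuples from a listPairs JSON reply."""
    response = json.loads(body)
    return [(p["sourceLanguage"], p["targetLanguage"]) for p in response["responseData"]]
```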

For more information about the API, see http://wiki.apertium.org/wiki/Apertium_scalable_service (the API documentation used to live at http://api.apertium.org, but that host is for Apy now).

Wish-list

  • Fix apertium is-en
  • Estimate the capacity of a server with any language pair; not only es-ca.
  • Moses support.
  • Optimize the processing carried out by a daemon: avoid regular expressions, creation of unused directories, etc.
  • Explicitly define duplicated daemons. In some situations it is useful to have multiple daemons for the same language pairs to be able to perform multiple translations of big texts simultaneously.
  • Currently ScaleMT always sends the text to the translation engine encoded in UTF-8. Add a configuration parameter to choose the text encoding.

References

  • "ScaleMT: a free/open-source framework for building scalable machine translation web services". Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz. Open Source Tools for Machine Translation, MT Marathon 2010, Dublin, Ireland, 2010. The Prague Bulletin of Mathematical Linguistics 93, p. 97-106. [pdf: http://www.dlsi.ua.es/~japerez/pub/pdf/mtmarathon2010-scalemt.pdf]
  • "An open-source highly scalable web service architecture for the Apertium machine translation engine". Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz. First International Workshop on Free/Open-Source Rule-Based Machine Translation, Alicante, Spain, 2009, p. 51-58. [pdf: http://rua.ua.es/dspace/bitstream/10045/12030/1/paper7.pdf]