ScaleMT

Latest revision as of 02:20, 9 March 2018

Note: After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see Migrating tools to GitHub.

Introduction

This is the wiki page of ScaleMT, a scalable architecture for providing translation web services based on Apertium and other machine translation engines. It builds on earlier work on an Apertium web service: Apertium_scalable_service. The web service exposes two APIs: XML-RPC, a lightweight remote procedure call method using XML over HTTP, and a JSON REST API, which makes the service easy to use from any website.

Architecture

ScaleMT makes the translation engines more efficient by turning them into daemons (that is, processes that run in the background rather than under a user's direct control). It can also run on multiple servers: an algorithm decides which daemons should run on each server, and a load balancing method decides which server should process each request. ScaleMT consists of two main Java applications:

  • ScaleMTSlave runs on a machine with the translation engine installed and manages a set of running translation engine instances (daemons); it performs the requested translations by sending them to the right daemon.
  • ScaleMTRouter (request router) runs on a web server; it processes the translation requests and sends them to the right ScaleMTSlave instance.

Downloading

The source code can be downloaded from our Subversion repository:

svn co https://svn.code.sf.net/p/apertium/svn/trunk/scaleMT

Fast set up

Compiling

To compile the source code you'll need:

  • A Java Development Kit compatible with Java version 6. It can be Sun/Oracle's implementation or any other implementation that follows the specification (see http://en.wikipedia.org/wiki/Java_Development_Kit#Other_JDKs).
  • Maven 2. If you don't have Maven installed, download it from http://maven.apache.org/download.html, unzip it, and make sure that its bin directory is in your PATH.
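
Before compiling, it can help to confirm that both tools are actually reachable from your shell. The following is a minimal sketch, not part of ScaleMT: the helper name check_tool is made up for this example, and it only checks that a command is on the PATH, not which version it is.

```shell
# check_tool NAME
# Hypothetical helper (not shipped with ScaleMT): reports whether NAME
# is on the PATH. Returns 0 if it is found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

# Typical use before compiling (javac is the JDK compiler, mvn is Maven):
#   check_tool javac && check_tool mvn && mvn -version
```

If either check fails, fix your PATH before running the Maven commands.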

Once you are sure you have a JDK and Maven, you can compile the three projects you have downloaded:

cd ScaleMTRMIInterfaces
mvn install
cd ../ScaleMTSlave
mvn package
cd ../ScaleMTRouter
mvn package
cd ..

Configuring

Router

Next, configure ScaleMTRouter. Unzip the file ScaleMTRouter.war, found in ScaleMTRouter/target, into any directory you like:

cd ScaleMTRouter/target
unzip ScaleMTRouter.war -d /tmp/mywar/

Go to the directory where you unzipped the file and open WEB-INF/classes/configuration.properties with your favourite editor. Change the value of the property requestrouter_rmi_host to the public host name of the computer where you are going to run the request router. If you are going to run the router and a single slave on the same machine, you don't need to change this property.
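
If you prefer to script this edit, the following sketch shows one way to do it. It assumes the file uses the usual Java properties form key=value; the helper name set_property and the host name mt.example.org are placeholders invented for this example.

```shell
# set_property FILE KEY VALUE
# Hypothetical helper: rewrites the line "KEY=..." in a .properties file,
# or appends "KEY=VALUE" if the key is not present yet.
# Assumes VALUE contains no "|" or "&" characters (true for host names).
set_property() {
    if grep -q "^[[:space:]]*${2}[[:space:]]*=" "$1"; then
        # Write to a temporary file first, then replace the original.
        sed "s|^[[:space:]]*${2}[[:space:]]*=.*|${2}=${3}|" "$1" > "$1.tmp" &&
            mv "$1.tmp" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# Example (mt.example.org is a placeholder host name):
#   set_property /tmp/mywar/WEB-INF/classes/configuration.properties \
#       requestrouter_rmi_host mt.example.org
```

Check the result in an editor afterwards, since the real file may contain comments or formatting that this sketch does not anticipate.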

Now, zip the file again:

cd /tmp/mywar
zip -r ScaleMTRouter.war *

Slave

To install the ScaleMTSlave instances, you have to repeat these steps for each machine you want to act as slave:

Unzip the file ScaleMTSlave-1.0-assembled.zip, found in ScaleMTSlave/target, into the installation directory you have chosen.

Next, install Apertium with the script installApertiumAndPairs.sh. Make sure that the machine has an Internet connection, because the script downloads Apertium from its SVN repository. Run it with:

./installApertiumAndPairs.sh

or

bash installApertiumAndPairs.sh

By default it downloads Apertium and all the stable pairs and installs them under /home/youruser/local. You can change these defaults with the following parameters:

  • -p Installation_prefix : Changes the installation prefix. If you run the script with the option -p /foo/bar, it will install executables under /foo/bar/bin, libraries under /foo/bar/lib, etc.
  • -l pair1,pair2,pair3... : Installs only the specified language pairs. The list must be a subset of the stable pairs listed on the Apertium wiki main page. The language order must match the one on the main page, although translators in both directions will be installed; e.g. -l en-es will install translators from Spanish to English and from English to Spanish, but -l es-en won't install any translator. Some pairs install a translator in only one direction; see the arrows on the main page.

When the installation is complete, you can safely remove the apertium directory. ScaleMTSlave can't work with an existing Apertium installation, because it modifies Apertium's modes files.

Once Apertium is installed, edit the remaining configuration files. In ScaleMTSlave-1.0/conf/configuration.properties, change the value of the property requestrouter_host to the public host name of the machine where ScaleMTRouter will run. Then edit ScaleMTSlave-1.0/conf/translation-engines.xml: if you didn't install all the stable language pairs with the installation script, remove from the <pairs> section all the language pairs that you didn't install. Finally, in the <pipeline> section, replace /usr/local with the Apertium installation prefix you have chosen.
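
The prefix replacement can also be done from the command line. The sketch below is only an illustration: the helper name replace_prefix and the prefix /opt/apertium are invented here, and the helper replaces /usr/local everywhere in the file rather than only inside the <pipeline> section, so review the result before starting the slave.

```shell
# replace_prefix FILE NEW_PREFIX
# Hypothetical helper: replaces every occurrence of /usr/local in FILE
# with NEW_PREFIX (whole file, not limited to the <pipeline> section).
# Assumes NEW_PREFIX contains no "|" or "&" characters.
replace_prefix() {
    # Write to a temporary file first, then replace the original.
    sed "s|/usr/local|${2}|g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example (/opt/apertium is a placeholder for the prefix given with -p):
#   replace_prefix ScaleMTSlave-1.0/conf/translation-engines.xml /opt/apertium
```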

Running

First, run

rmiregistry 1098

on the machine where you are going to run ScaleMTRouter to start rmiregistry. Then run ScaleMTRouter by deploying your re-zipped ScaleMTRouter.war in your Java web server. For example, in Apache Tomcat, put that file in the directory called webapps.

TODO: add a link to instructions for setting up Tomcat.

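Then, run ScaleMTSlave on each of the servers you want to use to perform translations. Use the startup script, passing the name of the host where ScaleMTSlave runs as a parameter. (This page refers to the script both as run-apertium-server.sh and as run-scaleMT.sh; check which one your unzipped distribution contains.)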

bash run-scaleMT.sh hostname

The first time you run it, it will calculate the server's capacity by performing a series of translations and store it in conf/capacity.properties. If you want the system to calculate the capacity each time it starts, use the argument -reCalculateCapacity:

bash run-scaleMT.sh hostname -reCalculateCapacity

After reading or calculating capacity, it contacts ScaleMTRouter and starts to receive translation requests. Of course, servers can be stopped (with Ctrl+C) or started at any time.

The JSON services will be available at:

http://router_machine_host:web_server_port/ScaleMTRouter/json/listPairs
http://router_machine_host:web_server_port/ScaleMTRouter/json/translate

And the XML-RPC one at:

http://router_machine_host:web_server_port/ScaleMTRouter/xmlrpc

If you are using Apache Tomcat and wish to test the service locally, you can probably load this URL to get the list of available language pairs:

http://localhost:8080/ScaleMTRouter/json/listPairs
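
The same check can be run from a terminal. This is a minimal sketch assuming curl is installed and the router is already running; the helper names scalemt_url and list_pairs are invented for this example, and only the listPairs endpoint documented above is used.

```shell
# scalemt_url HOST PORT ENDPOINT
# Hypothetical helper: prints the URL of a ScaleMTRouter endpoint.
scalemt_url() {
    printf 'http://%s:%s/ScaleMTRouter/%s\n' "$1" "$2" "$3"
}

# list_pairs HOST PORT
# Fetches the JSON list of language pairs from a running router.
# -s hides the progress meter, -f makes curl fail on HTTP errors.
list_pairs() {
    curl -sf "$(scalemt_url "$1" "$2" json/listPairs)"
}

# Example against a local Tomcat on its default port:
#   list_pairs localhost 8080
```

A non-zero exit status from list_pairs suggests that the router is not deployed or not reachable at that host and port.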

For more information about the API, see http://wiki.apertium.org/wiki/Apertium_scalable_service (it used to live at http://api.apertium.org, but that address now serves Apy).

Wish-list

  • Fix apertium is-en
  • Estimate the capacity of a server with any language pair; not only es-ca.
  • Moses support.
  • Optimize the processing carried out by a daemon: avoid regular expressions, creation of unused directories, etc.
  • Explicitly define duplicated daemons. In some situations it is useful to have multiple daemons for the same language pairs to be able to perform multiple translations of big texts simultaneously.
  • Currently ScaleMT always sends the text to the translation engine encoded in UTF-8. Add a configuration parameter to choose the text encoding.

References

  • "ScaleMT: a free/open-source framework for building scalable machine translation web services". Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz. Open Source Tools for Machine Translation, MT Marathon 2010, Dublin, Ireland, 2010. The Prague Bulletin of Mathematical Linguistics 93, p. 97-106. PDF: http://www.dlsi.ua.es/~japerez/pub/pdf/mtmarathon2010-scalemt.pdf
  • "An open-source highly scalable web service architecture for the Apertium machine translation engine". Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz. First International Workshop on Free/Open-Source Rule-Based Machine Translation, Alicante, Spain, 2009, p. 51-58. PDF: http://rua.ua.es/dspace/bitstream/10045/12030/1/paper7.pdf