Difference between revisions of "Apertium scalable service"

#redirect[[apertium-apy]]
=Introduction=
 
 
This is the wiki page of '''ApertiumScalableServer''', a scalable architecture to provide translation web services based on Apertium.
 
 
=User manual=
 
 
==System architecture==
 
 
There are two main applications that make the web service work:
 
 
* '''ApertiumServerRouter''': Runs in a Java EE web container (such as [http://tomcat.apache.org/ Apache Tomcat]), processes the HTTP translation requests, and distributes them among the translation servers (which have Apertium installed). It also manages the Apertium daemons running on the translation servers and, under certain circumstances, can start and stop translation servers.
 
 
* '''ApertiumServerSlave''': A simple Java application that runs on the translation servers, which must have Apertium installed. It receives translation requests from ''ApertiumServerRouter'' and sends them to the running Apertium instances. Note that the system is designed to run many ''ApertiumServerSlave'' instances (one per server) and only one ''ApertiumServerRouter'' instance.
 
 
==Getting it==
 
 
At the moment, the only way to get the applications is to download their source code and compile it. You'll need to download the source code of three projects from the Apertium svn repository. Before executing the following command, be sure you have [http://subversion.tigris.org/ Subversion] installed.
 
<pre>
 
svn co http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-scalable-service
 
</pre>
 
 
To compile the source code you'll need:
 
* A Java Development Kit compatible with Java version 6. It can be Sun's implementation or any other implementation that follows the specification (see [http://en.wikipedia.org/wiki/Java_Development_Kit#Other_JDKs]).
 
* Maven. If you don't have Maven installed, simply [http://maven.apache.org/download.html download] it, unzip it, and be sure that the ''bin'' directory is in your PATH.
 
 
Once you are sure you have Java JDK and Maven, you can build the applications.
 
* Build ''ApertiumServerRMIInterfaces''. This project contains the common classes of ''ApertiumServerSlave'' and ''ApertiumServerRouter'':
 
<pre>
 
cd ApertiumServerRMIInterfaces
 
mvn install
 
</pre>
 
* Build ''ApertiumServerSlave'':
 
<pre>
 
cd ApertiumServerSlave
 
mvn package
 
</pre>
 
The compiled project can be found in target/ApertiumServerSlave-1.0-assembled.zip
 
* Build ''ApertiumServerRouter''
 
<pre>
 
cd ApertiumServerRouter
 
mvn package
 
</pre>
 
The compiled project can be found in target/ApertiumServerRouter.war
 
 
* If you need the javadoc of any of the projects, from its root directory execute:
 
<pre>
 
mvn javadoc:javadoc
 
</pre>
 
The javadoc website will be generated in target/site/apidocs
 
 
==Installing==
 
 
===ApertiumServerSlave===
 
 
Unzip ''ApertiumServerSlave-1.0-assembled.zip'' to the directory where you want to install it. Be sure that the machine has an Internet connection, because the installation script will download Apertium from its SVN repository.
 
 
Then run the script ''installApertiumAndPairs.sh'' with:
 
<pre>
 
./installApertiumAndPairs.sh
 
</pre>
 
 
or
 
 
<pre>
 
bash installApertiumAndPairs.sh
 
</pre>
 
 
By default it will download Apertium and all the stable pairs, and install them under /home/youruser/local. You can change these options with the following parameters:
 
* '''-p''' Installation_prefix : Changes the installation prefix. If you run the script with the options ''-p /foo/bar'' it will install executables under /foo/bar/bin, libraries under /foo/bar/lib, etc.
 
* '''-l''' pair1,pair2,pair3... : Installs only the specified language pairs. The list of pairs must be a subset of the list of stable pairs that can be found on the [http://wiki.apertium.org/wiki/Main_Page Apertium wiki main page]. Note that the language order must be the same as on the main page, although translators in both directions will be installed, e.g. ''-l en-es'' will install translators from Spanish to English and from English to Spanish, but ''-l es-en'' won't install any translator. Some pairs only install a translator in one direction; see the arrows on the main page.
 
 
When installation is complete, you can safely remove the ''apertium'' directory. ''ApertiumServerSlave'' can't work with an existing Apertium installation, because it modifies Apertium's mode files to make it run as a daemon.
 
 
===ApertiumServerRouter===
 
 
As this application is packaged as a ready-to-deploy war file, no installation is needed. To run it, simply follow the instructions of your Java web container. Before running it, however, you'll probably need to configure it.
 
 
==Configuring==
 
 
===ApertiumServerSlave===
 
 
Application options can be changed by editing ''INSTALLATION_DIRECTORY/conf/configuration.properties''. These are the options that can be changed and their meaning:
 
* '''requestrouter_host''': Name of the host where ''ApertiumServerRouter'' is running. When this application starts, it contacts ''ApertiumServerRouter'' to tell that the server is ready to perform translations. '''This is the only property you'll need to change to make the system work'''.
 
* '''requestrouter_port''': Port of ''requestrouter_host'' on which rmiregistry is listening. Default value is 1098.
 
* '''requestrouter_objectname''': Name of the RMI object exported by ''ApertiumServerRouter''. If you don't modify it in the ''ApertiumServerRouter'' configuration, the default value is OK.
 
* '''memoryrate_64bit''': Programs generally need more memory on 64-bit operating systems than on 32-bit ones. If the application is running on a 64-bit operating system, its free memory is multiplied by the value of this property. The default value is 0.6087. It is not recommended to change it. See the calibration section to learn how this value is calculated.
 
* '''daemon_frozen_time''': If an Apertium instance that has received input doesn't emit any output during this time (in milliseconds), we assume it is frozen. The default value, 20 seconds, should be OK. Change it only if the system wrongly reports frozen daemons.
 
* '''daemon_check_status_period''': Daemon status checking period, in milliseconds. A very low period can overload the system, so it is best not to change this value.
 
* '''apertium_timeout''': Maximum time, in milliseconds, Apertium can take to perform a translation. If this time is exceeded, an error is returned to ''ApertiumServerRouter''. Its default value is very high, so timeouts are only reported when there are unexpected errors.
 
* '''apertium_max_deformat''': Maximum number of simultaneously running Apertium deformatters. To translate a text, it is first deformatted by launching an instance of the corresponding Apertium deformatter (text deformatter or HTML deformatter), then it is sent to the right daemon, and finally the daemon's result is reformatted by launching an instance of the corresponding Apertium reformatter. The system's bottleneck is in the daemons, so the default value for this property is 1.
 
* '''apertium_max_reformat''': Maximum number of simultaneously running Apertium reformatters. The default value is 1.
 
* '''apertium_null_mode_suffix''': Suffix that all the modes that allow Apertium running as a daemon share. Don't change it.
 
* '''apertium_supported_pairs''': Comma-separated list of language pairs the system can translate with (because they can work as daemons). In this case the first code is the source language and the second code, the target language. So, we'll have both ''en-es'' and ''es-en''. Don't modify this property. Its value is set by the installation script described above.
 
* '''apertium_path''': Prefix of the directories where Apertium is installed. Don't modify this property. Its value is set by the installation script described above. If you change this value to point to an existing Apertium installation, it won't work, because the Apertium installation needs to be made with the provided installation script, which creates new mode files.
 
 
===ApertiumServerRouter===
 
 
Editing ''ApertiumServerRouter'' properties is a bit more difficult. You'll need to unzip ''ApertiumServerRouter.war'', change the desired configuration properties and zip its content again. The main configuration options are located in the file ''WEB-INF/classes/configuration.properties''. These are the options present in this file:
 
* '''requestrouter_rmi_host''': Name of the host where ''ApertiumServerRouter'' will run. '''This is the only property you'll need to change to make the system work'''.
 
* '''rmi_registry_port''': Port on which ''rmiregistry'' is listening. Default value is 1098, so you'll need to manually start ''rmiregistry'' on port 1098. Remember that ''rmiregistry'' must run on the machine where ''ApertiumServerRouter'' runs, as well as on machines running ''ApertiumServerSlave''. The difference is that ''ApertiumServerSlave'' starts RMIRegistry automatically, but ''ApertiumServerRouter'' doesn't, because of the restrictions of running in a Java web container.
 
* '''requestrouter_rmi_name''': Name of the RMI remote object exported by ''ApertiumServerRouter''. The default value is OK if you don't modify the ''requestrouter_objectname'' property of ''ApertiumServerSlave''.
 
* '''requestrouter_rmi_port''': Port on which RMI remote object exported by ''ApertiumServerRouter'' will listen. There is no need to modify it, unless you get an exception saying "port not available".
 
* '''admissioncontrol_interval''': Period, in milliseconds, at which Admission control is updated. Admission control is the subsystem that decides whether a request should be accepted or not, depending on the system's load. Don't change this value unless you really know what you are doing.
 
* '''admissioncontrol_treshold''': If the system's "calculated load" is over this threshold, requests won't be accepted. The default value has been tested and should work OK, but if requests are rejected while the system is not overloaded, try increasing this value.
 
* '''admissioncontrol_k''': We get the "calculated load" by combining the real load and the "calculated load" of the previous instant: calculated_load = real_load*k+previous_load*(1-k). The default value has been tested and it is not recommended to change it.
 
* '''placement_controller_execution_period''': Period, in milliseconds, of Placement controller execution. The Placement controller decides which language pairs run on each translation server. This is a critical value. Changing it could make the system crash, so it is better to leave the default value.
 
* '''server_status_updater_execution_period''': Period, in milliseconds, of server status checking. It is recommended to leave the default value.
 
* '''scheduler_maxcharacters_in_daemon_queue''': If the number of characters of a language pair being translated by a server is lower than this value, a translation request for that language pair is sent to the server. It is recommended to leave the default value.
 
* '''scheduler_maxelements_in_daemon_queue''': If the number of requests of a language pair being translated by a server is lower than this value, a translation request for that language pair is sent to the server. It is recommended to leave the default value.
 
* '''scheduler_not_registered_priority_increment''': The higher this value, the lower the priority of unregistered users.
 
* '''scheduler_timeout''': Maximum time, in milliseconds, a server can take to perform a translation. If this time is exceeded, an error is returned. Its default value is very high, so timeouts are only reported when there are unexpected errors.
 
* '''load_prediction_alpha''': It is very similar to ''admissioncontrol_k''. The predicted load of the different language pairs is calculated by combining the amount (and size) of requests received during a period of time with the predicted load before this period, so predicted_load = measured_load*alpha+previous_predicted_load*(1-alpha). The default value has been tested and it is not recommended to change it.
 
* '''request_k''': Constant CPU cost of processing a request. The CPU cost of a translation request is calculated by adding this value to the number of characters of the request. Don't change it.
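
The formulas given for ''admissioncontrol_k'' and ''load_prediction_alpha'' have the same shape: a weighted blend of the newest measurement and the previous smoothed value. The following Python sketch only illustrates that arithmetic and the threshold test of Admission control. It is not the router's code; the function names and the numeric values are hypothetical.

```python
def smooth(measured: float, previous: float, weight: float) -> float:
    """Blend the newest measurement with the previous smoothed value.

    Same shape as calculated_load = real_load*k + previous_load*(1-k).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return measured * weight + previous * (1.0 - weight)


def accepts_requests(calculated_load: float, threshold: float) -> bool:
    """Requests are rejected once the calculated load is over the threshold."""
    return calculated_load <= threshold


# Hypothetical values; the tested defaults live in configuration.properties.
k = 0.5
threshold = 0.7
load = 0.0
for real_load in (1.0, 1.0, 0.0):
    load = smooth(real_load, load, k)
    print(load, accepts_requests(load, threshold))
```

A weight close to 1 makes the calculated load follow the real load almost immediately, while a weight close to 0 makes it react slowly to bursts.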
 
 
To keep track of registered users and give them higher priority, their data are stored in a MySQL database. Database connection properties are configured in the [http://java.sun.com/javaee/technologies/persistence.jsp?intcmp=3282 JPA] configuration file: ''WEB-INF/classes/META-INF/persistence.xml''. By default, it connects to a database called '''ApertiumWSUsers''' on localhost, with username '''apertium''' and password '''apertium'''.
 
 
===Port summary===
 
 
Be sure these ports are reachable, since they are needed by the system to work.
 
 
Machine running ''ApertiumServerRouter'':
 
* RMIRegistry port. By default, it is 1098. If you want to use another port for running RMIRegistry, change the property ''rmi_registry_port''.
 
* RMI remote object port. The port where the object that communicates with ''ApertiumServerSlave'' instances listens. By default it is 1432, but can be changed by editing the property ''requestrouter_rmi_port''.
 
* HTTP port. The port on which the web server listens.
 
 
Machine running ''ApertiumServerSlave'':
 
* RMIRegistry port: 1099.
 
* RMI remote object port. The port where the object that communicates with ''ApertiumServerRouter'' listens. By default it is 1331, but it can be changed with the option ''-RMIPort <port-number>'' when running ''ApertiumServerSlave''.
 
 
===Logging===
 
 
Both applications use [http://logging.apache.org/log4j/1.2/index.html Apache log4j] to manage application messages. Log messages from ''ApertiumServerRouter'' are stored in ''/tmp/ApertiumServerRouter.log'' and those from ''ApertiumServerSlave'' in ''/tmp/ApertiumServerSlave.log''. The names of these files, along with many other logging properties, can be changed by editing the configuration file ''log4j.properties''.
 
 
==Running==
 
First, start ''rmiregistry'' on the machine where you are going to run ''ApertiumServerRouter'': <pre>rmiregistry 1098</pre> Then run ''ApertiumServerRouter'' by deploying your re-zipped ''ApertiumServerRouter.war'' in your Java web server. For example, in Apache Tomcat, put that file in the directory called ''webapps''.
 
 
 
Then, run ''ApertiumServerSlave'' on each of the servers you want to use to perform translations. Use the script ''run-apertium-server.sh'' and add a parameter with the name of the host where ''ApertiumServerSlave'' runs:
 
<pre>
 
bash run-apertium-server.sh hostname
 
</pre>
 
It will calculate the server's capacity by performing a series of translations and store it in ''conf/capacity.properties''. If you have already run ''ApertiumServerSlave'' previously and you don't want to wait for the capacity calculation, add the argument ''-capacityFromConfigFile''. With this argument, the capacity is read from ''conf/capacity.properties'' and the startup time decreases.
 
<pre>
 
bash run-apertium-server.sh hostname -capacityFromConfigFile
 
</pre>
 
After reading or calculating capacity, it contacts ''ApertiumServerRouter'' and starts to receive translation requests.
 
You can tune RMI ports and remote object name by editing ''run-apertium-server.sh''. See javadoc of class ''com.gsoc.apertium.translationengines.main.Main'' for more information.
 
 
Of course, servers can be stopped (with Ctrl+C) or started at any time.
 
 
==Dynamic server management: local networks==
 
If you don't want to manually start and stop translation servers, ''ApertiumServerRouter'' can do it for you. It will decide to start or stop servers depending on the translation capacity needed by the incoming requests. You'll only have to change some configuration properties, and ''ApertiumServerRouter'' will connect via SSH to the computers of your network where ''ApertiumServerSlave'' is installed, and run or stop it when needed. This is called '''On Demand Server Management''' mode.
 
 
To make ''ApertiumServerRouter'' work in On Demand Server Management mode, you'll have to follow a couple of additional configuration steps. After unzipping ''ApertiumServerRouter.war'' and editing WEB-INF/classes/configuration.properties, and before zipping it again, edit the following files located at ''WEB-INF/classes/'':
 
* '''OnDemandServerInterface.properties''': Contains general options about dynamic server management:
 
** '''class''': Class that contacts servers to start and stop ''ApertiumServerSlave'' instances. Use the default value: ''com.gsoc.apertium.translationengines.router.ondemandservers.LocalNetworkOnDemandServer''.
 
** '''maxServers''': Maximum number of servers started by ''ApertiumServerRouter''. Must be equal to or lower than the number of elements in the list of servers in ''LocalNetworkOnDemandServer.properties''.
 
** ''maxInactivityTime'': Maximum time, in milliseconds, a server can run without receiving any load. After this time, the server is stopped.
 
** ''startUpTimeout'': Maximum time, in milliseconds, the system waits for a newly started server to contact the request router. The default value should be fine.
 
* '''LocalNetworkOnDemandServer.properties''': Contains options about how to contact new servers when running in On Demand Server Management mode using class ''com.gsoc.apertium.translationengines.router.ondemandservers.LocalNetworkOnDemandServer''.
 
** ''hosts'': Comma-separated list of servers with ''ApertiumServerSlave'' installed. Each element of the list follows this format: ''username:password@hostname:path''. ''Username'' and ''password'' must belong to an existing user on the remote machine. ''Hostname'' is the host name of the remote machine, and ''path'' is the path where ''ApertiumServerSlave'' is installed. ''username'', ''password'' and ''path'' are optional. If they are not specified, their values are read from the properties ''defaultUser'', ''defaultPassword'' and ''defaultPath'' respectively.
 
** ''defaultUser'': Default user name.
 
** ''defaultPassword'': Default password.
 
** ''defaultPath'': Default ''ApertiumServerSlave'' installation path.
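
To make the ''hosts'' format concrete, here is a minimal Python sketch of how one entry could be split into its parts, with the ''default*'' values filling in whatever is missing. It is an illustration only, not the parser used by ''LocalNetworkOnDemandServer''. It assumes that credentials, when present, always come together as ''username:password'', and that passwords contain no comma. The host names and paths are hypothetical.

```python
from typing import List, NamedTuple


class Host(NamedTuple):
    user: str
    password: str
    hostname: str
    path: str


def parse_host(entry: str, default_user: str, default_password: str,
               default_path: str) -> Host:
    """Parse one 'username:password@hostname:path' element of the hosts list."""
    credentials, at, location = entry.strip().rpartition("@")
    user, password = default_user, default_password
    if at:
        # Assumption: credentials are always written as 'username:password'.
        user, _, password = credentials.partition(":")
    hostname, colon, path = location.partition(":")
    return Host(user, password, hostname, path if colon else default_path)


def parse_hosts(value: str, **defaults: str) -> List[Host]:
    """Split the comma-separated property value and parse every element."""
    return [parse_host(e, **defaults) for e in value.split(",") if e.strip()]


# Hypothetical property values.
hosts = parse_hosts(
    "alice:secret@node1:/opt/slave, node2",
    default_user="apertium",
    default_password="changeme",
    default_path="/home/apertium/ApertiumServerSlave",
)
for host in hosts:
    print(host)
```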
 
 
==Dynamic server management: Amazon EC2==
 
If you plan to deploy this Apertium web service implementation with dynamic server management on Amazon EC2, it is recommended to change the configuration explained in the previous section. With this new configuration, new server instances will be started and stopped, instead of starting and stopping the application on existing servers.
 
 
After unzipping ''ApertiumServerRouter.war'' and editing WEB-INF/classes/configuration.properties, and before zipping it again, edit the following files located at ''WEB-INF/classes/'':
 
 
* '''OnDemandServerInterface.properties''': Contains general options about dynamic server management:
 
** '''class''': Class that contacts servers to start and stop ''ApertiumServerSlave'' instances. Use: ''com.gsoc.apertium.translationengines.router.ondemandservers.AmazonOnDemandServer''.
 
** '''maxServers''': Maximum number of EC2 server instances started by ''ApertiumServerRouter''.
 
** ''maxInactivityTime'': Maximum time, in milliseconds, a server can run without receiving any load. After this time, the server is stopped.
 
** ''startUpTimeout'': Maximum time, in milliseconds, the system waits for a newly started server to contact the request router. The default value should be fine.
 
 
* '''AmazonOnDemandServer.properties''': Contains options about how to start new servers when running in On Demand Server Management mode using class ''com.gsoc.apertium.translationengines.router.ondemandservers.AmazonOnDemandServer''.
 
**'''amazon_id''': Your Amazon Web Service Access Key ID. Compulsory property.
 
**'''amazon_key''': Your Amazon Web Service Secret Access Key. Compulsory property.
 
**'''amazon_image_id''': ID of an AMI that must run ''ApertiumServerSlave'' when started. Compulsory property.
 
**''amazon_security_groups'': comma-separated list of security groups associated with the server instances that will be started. These groups must allow connections to the following ports:
 
***Port on which RMI Registry runs: 1099.
 
***Port on which the ''ApertiumServerSlave'' RMI remote object is exported: 1331.
 
**''amazon_key_name'': Key pair associated with the server instances that will be started. Necessary if you want to manually connect via SSH to the instances and check that everything works as expected.
 
**''amazon_region_url'': URL of the region where the new instances will be launched and the AMI will be looked for. If this property is not present, region EU-West is used.
 
**''amazon_avzone'': availability zone where the new instances will be launched. It is a good idea to launch the request router and the apertium instances in the same availability zone. If you include the scripts explained below in your AMIs, you won't need to edit this property.
 
 
===Building AMIs for Amazon EC2===
 
 
====Bootstrapping====
 
 
To avoid creating new AMIs when a new version of Apertium Web Service is released, it is recommended to use a mechanism called bootstrapping. When the AMI starts, it downloads a package from Amazon S3, unzips it, and executes the script inside the package. The package also contains the latest version of ''ApertiumServerSlave'' or ''ApertiumServerRouter'', so the script installs it and changes the necessary configuration properties.
 
 
====ApertiumServerRouter AMI====
 
 
We followed these steps to create an AMI that runs ''ApertiumServerRouter'':
 
* Start a clean installation of Ubuntu 9.04 Base.
 
* Install JRE 6.
 
* Install Apache Tomcat. Download it from [http://tomcat.apache.org/download-60.cgi here] and unzip to ''/root/apache-tomcat''.
 
* Install MySQL. Create the database and user specified in JPA configuration file. Give the user the right permissions.
 
* Install s3cmd:<pre>apt-get install s3cmd</pre> As root user, configure it with your Amazon WS ID and secret key:<pre>s3cmd --configure</pre>
 
* Install unzip:<pre>apt-get install unzip</pre>
 
* Prepare bootstrap:
 
** Put the file ''bootstrap'' that can be found in ''ApertiumServerRouter source code root/misc/ec2'' in ''/etc/init.d'', and create a symbolic link from ''/etc/rc2.d/S99bootstrap'' to ''/etc/init.d/bootstrap'': <pre>ln -s /etc/init.d/bootstrap /etc/rc2.d/S99bootstrap</pre>
 
** Put a file called ''bootstrap.tar.gz'' in an S3 bucket called ''org.apertium.server.router.bootstrap''. This file must contain a folder called ''bootstrap'' containing a version of ApertiumServerRouter.war configured to run on Amazon EC2 and the script ''bootstrap.sh'' that can be found in ''ApertiumServerRouter source code root/misc/ec2''.
 
* Create AMI using EC2 commands.
 
 
====ApertiumServerSlave AMI====
 
We followed these steps to create an AMI that runs ''ApertiumServerSlave'':
 
* Start a clean installation of Ubuntu 9.04 Base.
 
* Install JRE 6.
 
* Install s3cmd:<pre>apt-get install s3cmd</pre> As root user, configure it with your Amazon WS ID and secret key:<pre>s3cmd --configure</pre>
 
* Install libraries needed to run Apertium:<pre>sudo apt-get install subversion build-essential g++ pkg-config libxml2 libxml2-dev libxml2-utils xsltproc flex automake autoconf libtool libpcre3-dev </pre>
 
* Install ICU library:<pre>apt-get install libicu-dev</pre>
 
* Install Apertium. To do so, compile ''ApertiumServerSlave'' and copy ''ApertiumServerSlave-1.0-assembled.zip'' to the running AMI. Unzip it and run ''installApertiumAndPairs.sh''. Remember the values of the properties ''apertium_path'' and ''apertium_supported_pairs'' from ''conf/configuration.properties'', after installing Apertium.
 
* Run ''run-apertium-server.sh''. Wait for the server capacity to be calculated and then kill it. Keep the file ''capacity.properties'' that has been created in the ''conf'' directory.
 
* Prepare bootstrap:
 
** Put the file ''bootstrap'' that can be found in ''ApertiumServerSlave source code root/misc/ec2'' in ''/etc/init.d'', and create a symbolic link from ''/etc/rc2.d/S99bootstrap'' to ''/etc/init.d/bootstrap'': <pre>ln -s /etc/init.d/bootstrap /etc/rc2.d/S99bootstrap</pre>
 
** Put a file called ''bootstrap.tar.gz'' in an S3 bucket called ''org.apertium.server.slave.bootstrap''. This file must contain a folder called ''bootstrap'' containing ''ApertiumServerSlave-1.0-assembled.zip'', the script ''bootstrap.sh'' that can be found in ''ApertiumServerSlave source code root/misc/ec2'' and the file ''capacity.properties'' created in the previous step. Note that this file should only be used on an EC2 instance of the same size as the instance where the file was created. Before packing ''bootstrap.tar.gz'', edit ''bootstrap.sh'' and write the values of the properties ''apertium_path'' and ''apertium_supported_pairs'' noted earlier at the beginning of the script.
 
* You can remove the directory ''ApertiumServerSlave-1.0'' to decrease the size of the AMI.
 
* Create AMI using EC2 commands.
 
 
==Advanced configuration: calibration==
 
 
There are some advanced configuration properties we didn't explain in previous sections. This Apertium Web Service implementation estimates the CPU and memory capacity of each server, and the amount of load (for each language pair) the system will have to process (based on previous requests). It then starts and stops daemons on the different servers to meet the load requirements. The CPU capacity is measured as the number of characters of a Spanish plain text that the server can translate into Catalan in one second.
 
 
===Load Converter===
 
 
The amount of load predicted for each language pair is based on the number of requests received for that pair during a past period of time and the number of characters of each request.
 
As the CPU capacity needed to translate a fixed amount of characters depends on the language pair, it is necessary to convert the amount of characters of each request to the equivalent Spanish-Catalan amount of characters, i.e., the amount of characters that needs the same CPU capacity to be translated from Spanish to Catalan as the original amount of characters needs to be translated with the original language pair.
 
Something similar happens with the format. The CPU capacity needed to translate a fixed amount of characters depends on the format. Usually the same amount of characters needs less CPU capacity when it is in HTML format, because HTML tags are not translated. So, the amount of characters of each HTML request is converted to the equivalent amount of plain text characters, i.e. that needs the same CPU capacity to be translated.
 
After applying these two conversions, the number of characters of a request can be compared with the server's capacity.
 
To convert load between language pairs and formats, the conversion rates stored in ''LoadConverter.properties'' are used. This file is located in the root of the ''ApertiumServerRouter'' classpath. There are different types of properties in this file:
 
* ''source_language_code''-''target_language_code'': Contains the rate to convert from an amount of characters of the pair named by this property key to the equivalent Spanish-Catalan amount of characters.
 
* ''format_html'': Rate to convert from an amount of characters with HTML format to the equivalent plain text amount of characters.
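
As an illustration of how the two kinds of rate combine, the Python sketch below converts a request's character count into its Spanish-Catalan plain text equivalent and then adds the constant cost ''request_k'' described in the configuration section. It assumes that a rate is applied by multiplication, which this page does not state explicitly, and all names and numbers in it are hypothetical.

```python
from typing import Dict


def equivalent_characters(chars: int, pair: str, is_html: bool,
                          rates: Dict[str, float]) -> float:
    """Convert a character count to its es-ca plain text equivalent.

    Assumption: rates are multiplicative factors keyed like the entries of
    LoadConverter.properties ('en-es', 'format_html', ...).
    """
    converted = chars * rates[pair]
    if is_html:
        converted *= rates["format_html"]
    return converted


def request_cost(chars: int, pair: str, is_html: bool,
                 rates: Dict[str, float], request_k: float) -> float:
    """CPU cost of a request: constant part plus converted characters."""
    return request_k + equivalent_characters(chars, pair, is_html, rates)


# Hypothetical rates, not the values shipped with ApertiumServerRouter.
rates = {"es-ca": 1.0, "en-es": 2.0, "format_html": 0.5}
print(equivalent_characters(1000, "en-es", True, rates))
print(request_cost(1000, "en-es", True, rates, request_k=300.0))
```

The resulting cost is expressed in the same unit as the server's capacity (Spanish-Catalan plain text characters), which is what makes the comparison possible.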
 
 
These values have already been calculated, but they will become less accurate over time because the rules of the different language pairs will change and so will their speed. If you want to calculate them again, execute this command from an existing installation of ''ApertiumServerSlave'': <pre>java -jar ApertiumServerSlave-1.0.jar -pairsInformation -comparationPair es-ca -speedFile LoadConverter.properties -memoryFile MemoryRequirements.properties</pre>
 
A new version of ''LoadConverter.properties'' will appear in ''ApertiumServerSlave'' installation directory.
 
 
===Memory requirements===
 
 
To place each daemon on the right server, the system needs to know how much memory is needed by each language pair. This information is stored in ''MemoryRequirements.properties''. This file is located in the root of the ''ApertiumServerRouter'' classpath. For each property, the key is a language pair and the value is the number of megabytes of memory it requires.
 
These values have already been calculated, but they will become less accurate over time because the rules of the different language pairs will change and so will their memory requirements. If you want to calculate them again, execute this command from an existing installation of ''ApertiumServerSlave'': <pre>java -jar ApertiumServerSlave-1.0.jar -pairsInformation -comparationPair es-ca -speedFile LoadConverter.properties -memoryFile MemoryRequirements.properties</pre>
 
A new version of ''MemoryRequirements.properties'' will appear in ''ApertiumServerSlave'' installation directory.
 
 
===64-bit operating systems===
 
 
Programs generally need more memory on 64-bit operating systems than on 32-bit ones. If ''ApertiumServerSlave'' is running on a 64-bit operating system, its free memory is multiplied by the value of the configuration property ''memoryrate_64bit''. The value of this property is calculated as the average, over all language pairs, of the memory needed on 32-bit operating systems divided by the memory needed on 64-bit operating systems.
 
 
Given a version of ''MemoryRequirements.properties'' created on a 32-bit operating system and another version created on a 64-bit one, the value of ''memoryrate_64bit'' can be calculated by running the following command from the ''ApertiumServerSlave'' installation directory:
 
<pre>
 
java -jar ApertiumServerSlave-1.0.jar -compareMemory -file32 absolute_path_of_32_bit_memory_requirements_file -file64 absolute_path_of_64_bit_memory_requirements_file
 
</pre>
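
The calculation performed by that command can be pictured with the following Python sketch. It is not the tool's implementation: it only averages the per-pair ratio of 32-bit to 64-bit memory, as described above, over the pairs present in both files. The function name and the memory figures are hypothetical.

```python
from typing import Dict


def memory_rate_64bit(mem32: Dict[str, float], mem64: Dict[str, float]) -> float:
    """Average, over the shared language pairs, of 32-bit MB / 64-bit MB."""
    pairs = sorted(set(mem32) & set(mem64))
    if not pairs:
        raise ValueError("no language pair is present in both files")
    return sum(mem32[p] / mem64[p] for p in pairs) / len(pairs)


# Hypothetical megabyte values for two language pairs.
mem32 = {"es-ca": 60.0, "en-es": 90.0}
mem64 = {"es-ca": 100.0, "en-es": 150.0}
rate = memory_rate_64bit(mem32, mem64)

# On a 64-bit server the free memory is scaled by the rate before it is
# compared with the 32-bit requirements.
free_mb = 2048.0
print(rate, free_mb * rate)
```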
 
 
===Constant CPU cost of a request===
 
 
The CPU cost of a translation request is calculated by adding the value of the property ''request_k'' to the number of characters of the request (previously converted to the equivalent amount of Spanish-Catalan characters). ''request_k'' represents the computational cost of all the operations needed to process a translation request on ''ApertiumServerSlave'' and not performed by an Apertium daemon, like unmarshalling the RMI request, invoking deformatter and reformatter, managing the queue, etc.
 
Taking into account this CPU cost is very important to manage daemons accurately, because this cost can be higher than the CPU cost of translating the text using an Apertium daemon.
 
 
To estimate the value of ''request_k'' we need to measure some parameters in the system while it is loaded enough to consume all the CPU capacity of the server. It is recommended to use only one server and one language pair, Spanish-Catalan, because the arithmetic will be much simpler.
 
During a time period ''t'' we measure the amount of characters processed by the system, ''nc'', and the number of served requests, ''np''. If the server's capacity is ''C'', theoretically the system can process ''C*t'' characters. If we subtract ''nc'' from ''C*t'', we get the number of characters equivalent to the computational cost of the constant part of all the requests. If we divide this value by ''np'', we have the constant CPU cost of each request.
 
 
To summarize:
 
 
k= ( C*t - nc ) / np
 
 
C = server's capacity
 
 
t = test time
 
 
nc = number of characters processed during the test time
 
 
np = number of requests processed during test time
 
 
If you want to estimate this value on your own, the server's capacity is shown when it starts, and the test time and the number of characters and requests processed during that time are written in the ''ApertiumServerRouter'' log file. Look for a line starting with "LoadPredictor -Requests received during".
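
As a worked example of the formula k = ( C*t - nc ) / np, the Python sketch below plugs in made-up measurements. The function name and all the numbers are hypothetical; real values have to be read from the server's startup output and the router's log.

```python
def estimate_request_k(capacity: float, test_seconds: float,
                       chars_processed: float, requests_served: int) -> float:
    """k = (C*t - nc) / np, in equivalent Spanish-Catalan characters."""
    if requests_served <= 0:
        raise ValueError("at least one request must have been served")
    return (capacity * test_seconds - chars_processed) / requests_served


# Hypothetical measurements from a fully loaded single-server test:
# C = 10000 characters per second, t = 60 s, nc = 450000, np = 500.
k = estimate_request_k(10000, 60, 450000, 500)
print(k)
```

With these figures the server could theoretically translate 600000 characters, but only 450000 were processed, so the remaining 150000 characters' worth of CPU is attributed to the 500 requests, i.e. 300 per request.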
 
 
 
=API Specification=
 
 
==Introduction==
 
 
This API closely follows the Google AJAX Language API, so that switching to the Apertium JSON API is as easy as possible.

For more information about the Google AJAX Language API, see http://code.google.com/intl/en/apis/ajaxlanguage/documentation/reference.html#_intro_fonje .
 
 
There are two resources:
 
<pre>
 
http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/translate
 
</pre>
 
and
 
<pre>
 
http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/listPairs
 
</pre>
 
 
The first one translates pieces of plain text or HTML code, and the second one lists the available language pairs.
 
 
Both resources accept the GET and POST HTTP methods. Argument values must be properly escaped (e.g., via the functional equivalent of JavaScript's encodeURIComponent() method).
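A minimal sketch of the escaping step in Python, using only the standard library. The helper name ''build_url'' is an assumption, the host is the placeholder used in the resource URLs above, and the ''q'' and ''langpair'' arguments are the ones described for the translate resource. Note that ''urllib.parse.quote'' is only roughly equivalent to encodeURIComponent(): it escapes a few more characters, which is harmless here.

```python
from urllib.parse import quote, urlencode

# Placeholder host taken from the resource URLs above.
BASE_URL = "http://ApertiumServerInstallationHost/ApertiumServerRouter/resources"


def build_url(resource: str, params: dict) -> str:
    """Return a GET URL for `resource` with percent-encoded arguments."""
    # quote_via=quote encodes a space as %20 instead of '+'.
    query = urlencode(params, quote_via=quote)
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


print(build_url("translate", {"q": "hello world", "langpair": "en|es"}))
```

The space in ''hello world'' becomes ''%20'' and the ''|'' separator becomes ''%7C''.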
 
 
==Common arguments and response format==
 
 
These arguments are all optional and common to both resources:
 
* '''key''' : User's personal API key. Requests from registered users have higher priority.
 
* '''callback''' : Alters the response format, adding a call to a JavaScript function. See description below.

* '''context''' : If the callback argument is also supplied, adds additional arguments to the function call. See description below.
 
 
If neither the callback nor the context argument is supplied, this is the JSON object returned by both resources:
 
<pre>
 
{ "responseData" : JSON Object with the requested data , "responseStatus" : Response numeric code , "responseDetails" : Error description }
 
</pre>
 
 
If the callback argument is supplied, the response is a call to the function named by its value. For instance, if the callback argument's value is ''foo'', this is the JavaScript code returned:
 
<pre>
 
foo({ "responseData" : JSON Object with the requested data , "responseStatus" : Response numeric code , "responseDetails" : Error description })
 
</pre>
 
 
If both the callback and context arguments are supplied, the returned function call has more arguments. If callback's value is ''foo'' and context's value is ''bar'':
 
<pre>
 
foo('bar',JSON Object with the requested data , Response numeric code , Error description )
 
</pre>
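A client that is not a browser can still consume the callback form by stripping the function call before parsing the JSON. The following Python sketch shows one way to do it; the function name ''unwrap_callback'' is an assumption, and the sketch only handles the callback-only form, because the form with a context argument passes several separate arguments instead of a single JSON object.

```python
import json


def unwrap_callback(body: str, callback: str) -> dict:
    """Parse a response of the form callback({...}) into a dict."""
    prefix = callback + "("
    text = body.strip()
    if not (text.startswith(prefix) and text.endswith(")")):
        raise ValueError("response is not wrapped in a call to " + callback)
    # Drop the leading "callback(" and the trailing ")" and parse what is left.
    return json.loads(text[len(prefix):-1])


body = 'foo({"responseData":null,"responseDetails":null,"responseStatus":200})'
print(unwrap_callback(body, "foo")["responseStatus"])
```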
 
 
==listPairs resource==
 
 
This resource only accepts the common arguments.
 
 
The response data returned is an array of language pairs, following this format:
 
<pre>
 
[{"sourceLanguage": source language code ,"targetLanguage": target language code }, ... ]
 
</pre>
 
 
''responseStatus'' is always 200, which means the request was processed successfully, and ''responseDetails'' is null.
 
 
So if we call this resource with no arguments:
 
<pre>
 
curl 'http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/listPairs'
 
</pre>
 
we get, for example:
 
<pre>
 
{"responseData":[{"sourceLanguage":"ca","targetLanguage":"oc"},{"sourceLanguage":"en","targetLanguage":"es"}],"responseDetails":null,"responseStatus":200}
 
</pre>
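On the client side, a response like the one above can be turned into a list of language pairs with a few lines of Python. This is a sketch; the function name ''parse_pairs'' is an assumption.

```python
import json


def parse_pairs(body: str) -> list:
    """Return the (source, target) language code pairs of a listPairs response."""
    response = json.loads(body)
    return [(pair["sourceLanguage"], pair["targetLanguage"])
            for pair in response["responseData"]]


body = ('{"responseData":[{"sourceLanguage":"ca","targetLanguage":"oc"},'
        '{"sourceLanguage":"en","targetLanguage":"es"}],'
        '"responseDetails":null,"responseStatus":200}')
print(parse_pairs(body))
```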
 
 
==translate resource==
 
 
This resource accepts the common arguments mentioned above, plus the following specific arguments:
 
* '''q''' : Source text or HTML code to be translated. Compulsory argument.
 
* '''langpair''' : Source language code and target language code, separated by '|' character, which is escaped as '%7C'. Compulsory argument.
 
* '''format''' : Source format. ''text'' for plain text and ''html'' for HTML code. This argument is optional. If it is missing, the source is assumed to be plain text.
 
 
The response data is a JSON object following this format:
 
<pre>
 
{ "translatedText" : translated text }
 
</pre>
 
 
Several different response status codes can be returned. This is the full list of codes and their meanings:
 
* '''200''' : Text has been translated successfully, ''responseDetails'' field is null.
 
* '''400''' : Bad parameters. A compulsory argument is missing, or there is an argument with wrong format. A more accurate description can be found in ''responseDetails'' field.
 
* '''451''' : Not supported pair. Apertium can't translate with the requested language pair.
 
* '''452''' : Not supported format. The translation engine doesn't recognize the requested format.
 
* '''500''' : Unexpected error. An unexpected error happened. Depending on the error, a more accurate description can be found in ''responseDetails'' field.
 
* '''552''' : Overloaded system. The system is overloaded and can't process the request.
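A client will usually want to check ''responseStatus'' before reading the translation. The following Python sketch shows one possible way to do it for an already parsed response; the names ''STATUS_MEANINGS'', ''TranslationError'' and ''extract_translation'' are assumptions, and the sketch does not rely on the content of ''responseData'' when the status is not 200.

```python
# Status codes and short meanings, copied from the list above.
STATUS_MEANINGS = {
    200: "OK",
    400: "Bad parameters",
    451: "Not supported pair",
    452: "Not supported format",
    500: "Unexpected error",
    552: "Overloaded system",
}


class TranslationError(Exception):
    """Raised when responseStatus is not 200 (hypothetical helper class)."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status


def extract_translation(response: dict) -> str:
    """Return translatedText, or raise TranslationError on a non-200 status."""
    status = response["responseStatus"]
    if status != 200:
        meaning = STATUS_MEANINGS.get(status, "Unknown status")
        details = response.get("responseDetails")
        raise TranslationError(status, f"{meaning}: {details}" if details else meaning)
    return response["responseData"]["translatedText"]
```

Keeping the status code on the exception lets the caller decide, for example, to retry later after a 552 but not after a 451.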
 
 
Here is a simple example. Requesting a translation with:
 
<pre>
 
curl 'http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/translate?q=hello%20world&langpair=en%7Ces&callback=foo'
 
</pre>
 
the result is:
 
<pre>
 
foo({"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200})
 
</pre>
 
And if we add the context parameter:
 
<pre>
 
curl 'http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/translate?q=hello%20world&langpair=en%7Ces&callback=foo&context=a'
 
</pre>
 
we get
 
<pre>
 
foo('a',{"translatedText":"hola Mundo"},200,null)
 
</pre>
 
 
==Batch interface==
 
 
More than one translation can be performed in the same request by supplying more than one ''q'' argument or more than one ''langpair'' argument. If there is only one ''q'' argument and more than one ''langpair'' argument, the same input string is translated with each language pair. If there is only one ''langpair'' argument and more than one ''q'' argument, each input string is translated with the same language pair. If both arguments are supplied more than once, and the same number of times, the first ''q'' is translated with the first ''langpair'', the second ''q'' with the second ''langpair'', and so on.
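The pairing rules above can be written down as a short Python function. This is a sketch of the rules as described, not the server's code: the function name ''pair_batch'' is an assumption, and so is the choice to raise an error when both arguments are repeated a different number of times, a case the text above does not cover.

```python
def pair_batch(texts: list, langpairs: list) -> list:
    """Return the (text, langpair) combinations a batch request would translate."""
    if not texts or not langpairs:
        raise ValueError("q and langpair are compulsory")
    if len(langpairs) == 1:
        # Every text is translated with the single language pair.
        return [(text, langpairs[0]) for text in texts]
    if len(texts) == 1:
        # The single text is translated with every language pair.
        return [(texts[0], langpair) for langpair in langpairs]
    if len(texts) == len(langpairs):
        # First q with first langpair, second q with second langpair, etc.
        return list(zip(texts, langpairs))
    # Assumption: mismatched counts are treated as a client error.
    raise ValueError("q and langpair are repeated a different number of times")


print(pair_batch(["hello world", "bye"], ["en|es"]))
```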
 
 
The returned JSON changes slightly when using the batch interface. The field ''responseData'' now contains an array of JSON objects, each one with the usual fields: ''responseData'', ''responseStatus'' and ''responseDetails''. Note that there are particular values of ''responseStatus'' and ''responseDetails'' for each translation, but global values too. If all the translations are OK, these values match, but if there is an error in any translation, the global values of these fields take the values of the erroneous translation. If there is more than one erroneous translation, the global fields take the values of one of the erroneous translations.
 
 
These examples show the described behaviour:
 
<pre>
 
curl 'http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/translate?q=hello%20world&q=bye&langpair=en%7Ces'
 
</pre>
 
<pre>
 
{"responseData":[{"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200},
 
{"responseData":{"translatedText":"adiós"},"responseDetails":null,"responseStatus":200}],"responseDetails":null,"responseStatus":200}
 
</pre>
 
 
 
<pre>
 
curl 'http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/translate?q=hello%20world&langpair=en%7Ces&langpair=en%7Cca&callback=foo'
 
</pre>
 
<pre>
 
foo({"responseData":[{"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200},
 
{"responseData":{"translatedText":"Món d'hola"},"responseDetails":null,"responseStatus":200}],"responseDetails":null,"responseStatus":200})
 
</pre>
 
 
 
<pre>
 
curl 'http://ApertiumServerInstallationHost/ApertiumServerRouter/resources/translate?q=hello%20world&q=goodbye&langpair=en%7Ces&langpair=en%7Cca&callback=foo&context=bar'
 
</pre>
 
<pre>
 
foo('bar',[{"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200},
 
{"responseData":{"translatedText":"adéu"},"responseDetails":null,"responseStatus":200}],200,null)
 
</pre>
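A batch response without callback, like the first example above, can be unpacked item by item. The following Python sketch assumes the response has already been parsed into a dict; the function name ''unpack_batch'' is an assumption, and items whose status is not 200 are reported with no text because the content of their ''responseData'' field is not described here.

```python
import json


def unpack_batch(response: dict) -> list:
    """Return one (translated_text_or_None, status, details) tuple per item."""
    results = []
    for item in response["responseData"]:
        data = item.get("responseData")
        ok = item["responseStatus"] == 200 and data is not None
        text = data["translatedText"] if ok else None
        results.append((text, item["responseStatus"], item.get("responseDetails")))
    return results


body = ('{"responseData":['
        '{"responseData":{"translatedText":"hola Mundo"},"responseDetails":null,"responseStatus":200},'
        '{"responseData":{"translatedText":"adiós"},"responseDetails":null,"responseStatus":200}],'
        '"responseDetails":null,"responseStatus":200}')
for text, status, details in unpack_batch(json.loads(body)):
    print(status, text)
```

Checking each item's own ''responseStatus'' is what reveals which translation failed, since the global fields only report one of the errors.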
 
 
[[Category:Development]]
 
[[Category:Services]]
 

Latest revision as of 08:39, 7 March 2018
