Apertium-service

Latest revision as of 03:29, 6 November 2019

Note: After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see Migrating tools to GitHub.

Introduction

Apertium-service runs apertium translation pairs as a service and provides translate and detect (language recognition) capabilities over an XML-RPC interface, as well as REST and SOAP wrappers.

The service is implemented as a multi-threaded C++ program which uses libapertium and liblttoolbox to run translation modes. It works by redirecting the C FILE streams within the libraries, instead of starting separate processes and NUL flushing (see Daemon for discussion). It also manages a resource pool of objects such as language pair data; these are either pre-allocated eagerly or allocated lazily when needed, up to a high water mark.
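The pooling idea described above can be illustrated with a minimal sketch. This is not the service's actual C++ code: the class name ResourcePool, its parameters and the factory callback are all hypothetical, chosen only to show eager pre-allocation, lazy allocation at need, and a high water mark that caps the total.

```python
import threading
from collections import deque


class ResourcePool:
    """Hypothetical pool: eager pre-allocation plus lazy growth up to a cap."""

    def __init__(self, factory, high_water_mark, preallocate=0):
        if preallocate > high_water_mark:
            raise ValueError("cannot pre-allocate past the high water mark")
        self._factory = factory
        self._high_water_mark = high_water_mark
        # Eager part: these objects exist before the first request arrives.
        self._idle = deque(factory() for _ in range(preallocate))
        self._allocated = preallocate
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            while True:
                if self._idle:
                    # Reuse an object that was released earlier.
                    return self._idle.popleft()
                if self._allocated < self._high_water_mark:
                    # Lazy part: allocate at need, below the high water mark.
                    self._allocated += 1
                    return self._factory()
                # Pool exhausted: block until another thread releases.
                self._cond.wait()

    def release(self, resource):
        with self._cond:
            self._idle.append(resource)
            self._cond.notify()

    @property
    def allocated(self):
        return self._allocated
```

A worker thread would call acquire() before translating and release() afterwards, so that at most high_water_mark objects of each kind ever exist.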

A paper describing the service, its interfaces and internal architecture can be found here: http://rua.ua.es/dspace/handle/10045/12031 . The development was also documented on the wiki page Apertium going SOA.

Compiling and Installing

This document covers compilation and installation of apertium-service on Unix and Unix-like systems only, but it can also be compiled on other systems that meet the requirements.

apertium-service, like many other Open Source projects, uses the GNU build tools (such as autoconf and automake) to create a build environment.


Requirements

You need the following software installed:

  • liblttoolbox3 - library for lttoolbox, a toolbox for lexical processing, morphological analysis and generation of words.
  • libapertium3 - library for apertium, a Free / Open-Source machine translation system.
  • libtextcat0 - a library implementing the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization". (optional)
  • libapertiumcombine1 - a library implementing a Multi-Engine Translation Synthesiser (optional)
  • libxmlrpc-c3 - a lightweight RPC library based on XML and HTTP for C and C++. (>= 1.16.07-1)
  • libxml++2 - a C++ interface to libxml2, the GNOME XML library.
  • libboost - Boost C++ libraries are a collection of peer-reviewed, Open Source libraries that extend the functionality of C++. (>= 1.41.0)


In particular, the following components from Boost C++ libraries are required:

  • libboost-thread - for portable C++ multi-threading.
  • libboost-filesystem - for portable filesystem operations in C++.
  • libboost-system - for dealing with system-specific error code values in C++.
  • libboost-date-time - for portable date/time operations in C++.
  • libboost-regex - regular expression library for C++.
  • libboost-program-options - program options library for C++.

Ubuntu

To install the XML and Boost components on Ubuntu, use:

sudo apt-get install libxml++2.6-dev libxmlrpc-c3-dev libboost-thread-dev libboost-filesystem-dev \
libboost-system-dev libboost-date-time-dev libboost-regex-dev libboost-program-options-dev libcurl4-openssl-dev

Arch Linux

To install the XML, Boost and other components on Arch Linux, first do:

sudo pacman -S autoconf automake libtextcat libxml2 libxml++ boost 

Note: libtextcat is optional.

The other requirements are in AUR. If you have yaourt, you should be able to do:

sudo yaourt -S lttoolbox apertium xmlrpc-c-abyss

You might first have to run sudo pacman -Rd xmlrpc-c, since that (outdated) package conflicts with xmlrpc-c-abyss. AMD64 users might also need the patch at http://aur.archlinux.org/packages.php?ID=32354.

To make sure apertium-service finds liblttoolbox, do

$ sudo vi /etc/ld.so.conf

and append

/usr/lib

to the file, then run

sudo ldconfig

Checkout from SVN

Note: After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see Migrating tools to GitHub.

apertium-service can be downloaded from the Apertium SVN repository with the following command:

$ svn co http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-service

Immediately after an SVN checkout, you can generate the files required for building apertium-service with GNU autotools by running the following command:

$ ./autogen.sh

Configuring the source tree

The next step is to configure the apertium-service source tree for your particular platform and personal requirements. This is done using the script configure included in the root directory of the distribution. (Developers downloading an unreleased version of the apertium-service source tree will need to have autoconf and libtool installed and will need to run the script autogen.sh before proceeding with the next steps. This is not necessary for official releases.)

To configure the source tree using all the default options, simply type ./configure. To change the default options, configure accepts a variety of variables and command line options.

The most important option is the location --prefix where the apertium-service is to be installed later, because apertium-service has to be configured for this location to work correctly. More fine-tuned control of the location of files is possible with additional configure options.

In addition, it is sometimes necessary to provide the configure script with extra information about the location of your compiler, libraries, or header files. This is done by passing either environment variables or command line options to configure. For more information, see the configure manual page.

To give an idea of the possibilities, here is a typical example which compiles apertium-service for the installation tree /sw/pkg/apertium-service with a particular compiler and flags:

$ CC="pgcc" CFLAGS="-O2" \
./configure --prefix=/sw/pkg/apertium-service

When configure is run it will take a few seconds to test for the availability of features on your system and build Makefiles which will later be used to compile the server.

Details on all the different configure options are available on the configure manual page.


Build

Now you can build the various parts which form the apertium-service package by simply running the command:

$ make

A base configuration takes a few minutes to compile and the time will vary widely depending on your hardware.


Install

Now it's time to install the package under the configured installation PREFIX (see --prefix option above) by running:

$ make install


Customise

Next, you can customise your apertium-service by editing the configuration files under PREFIX/etc/apertium-service/.

$ vi PREFIX/etc/apertium-service/configuration.xml

The users.xml file is needed only if you want access control. The configuration.xml file is fairly straightforward:

<ApertiumServiceConfiguration>
       <ServerPort>6173</ServerPort>
       <ApertiumBase>/usr/local/share/apertium/modes</ApertiumBase>

The supported fields of the configuration file are the following:

  • ServerPort sets the port on which the XML-RPC service listens.
  • ApertiumBase sets the directory where the service looks for the modes files.
  • HighWaterMark sets the high water mark (the maximum number of objects that can be allocated for each resource pool).
  • MultiEngineMachineTranslation is needed only if you want to enable the MEMT module (not yet stable). Within that,
    • <MonolingualDictionary srcLang="br" destLang="fr">/usr/local/share/apertium/apertium-br-fr/br-fr.automorf.bin</MonolingualDictionary> gives the path to an analyser for a given language. This analyser is used to lemmatise all the input sentences to the MEMT module to improve alignment.
    • <LanguageModel lang="de">/home/pasquale/gsoc/lm/europarl.de.blm</LanguageModel> gives an IRSTLM language model used to score the final hypotheses from the MEMT module.
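Before starting the service it can be handy to check what a configuration file actually says. The following is a rough sketch rather than a tool shipped with apertium-service: the helper read_settings is hypothetical, the sample document is invented, and it assumes that HighWaterMark sits directly under the root element like ServerPort and ApertiumBase (the value 8 is made up for illustration).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the element names come from the list above.
SAMPLE = """\
<ApertiumServiceConfiguration>
    <ServerPort>6173</ServerPort>
    <ApertiumBase>/usr/local/share/apertium/modes</ApertiumBase>
    <HighWaterMark>8</HighWaterMark>
</ApertiumServiceConfiguration>
"""


def read_settings(xml_text):
    """Return the basic settings as a dict, with None for missing fields."""
    root = ET.fromstring(xml_text)
    port = root.findtext("ServerPort")
    mark = root.findtext("HighWaterMark")
    return {
        "port": int(port) if port is not None else None,
        "modes_dir": root.findtext("ApertiumBase"),
        "high_water_mark": int(mark) if mark is not None else None,
    }


settings = read_settings(SAMPLE)
# Show the endpoint that clients would use with this configuration.
print("http://localhost:%d/RPC2" % settings["port"])
```

To check a real installation, pass the contents of PREFIX/etc/apertium-service/configuration.xml to read_settings instead of SAMPLE.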

Test

You can now start apertium-service by running:

$ PREFIX/bin/apertium-service

You should then be able to make your first XML-RPC query at the URL http://localhost:port/RPC2, where port is the ServerPort value from configuration.xml.
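To see what such a query looks like on the wire, the sketch below builds the XML-RPC request body for a translate call without sending it. It uses only Python 3's standard xmlrpc.client module; the argument order (text, source language, target language) is taken from the client samples in this page, and the variable names are illustrative.

```python
import xmlrpc.client

# Build (but do not send) the request for translate(text, src, dest).
params = ("Això no és una prova.", "ca", "en")
body = xmlrpc.client.dumps(params, methodname="translate", encoding="utf-8")
print(body)

# Round trip: parsing the body gives back the same method name and arguments.
decoded_params, method = xmlrpc.client.loads(body)
```

A client library posts this body over HTTP to the /RPC2 endpoint; the samples in the next section let their XML-RPC libraries do that step.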

Consuming the service

The following samples assume that the service you want to consume is located at the address http://www.neuralnoise.com:6173/RPC2

Python

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Python 3: the Python 2 module xmlrpclib is now called xmlrpc.client.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://www.neuralnoise.com:6173/RPC2")
res = proxy.translate("Això no és una prova.", "ca", "en")
print(res["translation"])

This should give the output:

$ python3 test.py
This is not a test.

This works provided that you have the ca-en pair installed. You can find out which language pairs the service has detected with the method languagePairs(), for example:

import sys
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://www.neuralnoise.com:6173/RPC2")

# Print every available pair as "src-dest", separated by spaces.
for pair in proxy.languagePairs():
    sys.stdout.write(pair["srcLang"] + "-" + pair["destLang"] + " ")
print("")
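A client will usually want to cope with a pair that is not installed or a server that is unreachable. The sketch below is written for Python 3's xmlrpc.client and wraps the translate call in a hypothetical helper, safe_translate. It assumes that the service reports problems as standard XML-RPC faults; this page does not document how errors are actually signalled, so treat that as a guess.

```python
import xmlrpc.client


def safe_translate(proxy, text, src, dest):
    """Return the translated text, or None if the call fails."""
    try:
        return proxy.translate(text, src, dest)["translation"]
    except xmlrpc.client.Fault as fault:
        # The server answered, but with an XML-RPC fault (assumed behaviour).
        print("fault %d: %s" % (fault.faultCode, fault.faultString))
    except (xmlrpc.client.ProtocolError, OSError) as err:
        # HTTP-level error, refused connection, DNS failure and so on.
        print("transport problem: %s" % err)
    return None
```

It would be called as safe_translate(xmlrpc.client.ServerProxy(url), "Això no és una prova.", "ca", "en"), with url pointing at the service endpoint.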

Ruby

#!/usr/bin/ruby

require 'xmlrpc/client'

server = XMLRPC::Client.new("www.neuralnoise.com", "/RPC2", 6173)
puts server.call("translate", "This is a test for the machine translation program.", "en", "es")["translation"]

Perl

#!/usr/bin/perl

require RPC::XML;
require RPC::XML::Client;

my $client = RPC::XML::Client->new("http://www.neuralnoise.com:6173/RPC2");
my $result = $client->send_request("translate", "This is a test for the machine translation service.", "en", "es");

binmode(STDOUT, ":utf8");

foreach my $key ( sort keys %{$result} ) {
    print $key . " = " . $result->value->{$key} . "\n";
}

Java

import java.net.*;
import java.util.*;

import org.apache.xmlrpc.*;
import org.apache.xmlrpc.client.*;

public class TestCase {
	public static void main(String[] args) throws MalformedURLException, XmlRpcException {
		XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
		config.setServerURL(new URL("http://www.neuralnoise.com:6173/RPC2"));
		config.setBasicEncoding("UTF-8");
		
		XmlRpcClient client = new XmlRpcClient();
		client.setTransportFactory(new XmlRpcSunHttpTransportFactory(client));
		client.setConfig(config);
		
		Object[] params = { 
				"This is a test for the machine translation service",
				"en", "es"};
		
		Map<String, String> ret = (Map<String, String>) client.execute("translate", params);
		System.out.println(ret.get("translation"));
	}
}

C++

/*
 * g++ test.cc -o test -lxmlrpc_client++ -lxmlrpc++ -lxmlrpc_client -lxmlrpc_cpp -lxmlrpc_xmlparse -lxmlrpc_xmltok -lxmlrpc_server
 */

#include <cstdlib>
#include <map>
#include <string>
#include <iostream>
#include <xmlrpc-c/girerr.hpp>
#include <xmlrpc-c/base.hpp>
#include <xmlrpc-c/client_simple.hpp>

using namespace std;

int
main(int argc, char **) {
    try {
        string const serverUrl("http://www.neuralnoise.com:6173/RPC2");
        string const methodName("translate");

        xmlrpc_c::clientSimple myClient;
        xmlrpc_c::value result;

        myClient.call(serverUrl, methodName, "sss", &result, "test", "en", "es");

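        map<string, xmlrpc_c::value> const resultStruct = xmlrpc_c::value_struct(result);
        map<string, xmlrpc_c::value>::const_iterator iter = resultStruct.find("translation");

        // Guard against a response without a "translation" member.
        if (iter == resultStruct.end()) {
            cerr << "Response has no \"translation\" member." << endl;
            return 1;
        }

        string ret = (string)xmlrpc_c::value_string(iter->second);

        cout << "Translation: " << ret << endl;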
    } catch (exception const& e) {
        cerr << "Client threw error: " << e.what() << endl;
    } catch (...) {
        cerr << "Client threw unexpected error." << endl;
    }
    return 0;
}

Haskell

import Network.XmlRpc.Client

server = "http://www.neuralnoise.com:6173/RPC2"

translate :: String -> String -> String -> String -> IO [(String,String)]
translate url = remote url "translate"

main = do
       let x = "This is a test for the machine translation service."
           y = "en"
           z = "es"
       ret <- translate server x y z
       print ret

Emacs-Lisp

; put http://www.emacswiki.org/emacs/xml-rpc.el into a member of load-path and
(require 'xml-rpc)

(defvar apertium-server "http://www.neuralnoise.com:6173/RPC2")

(xml-rpc-method-call apertium-server 'translate "Això no és una prova." "ca" "en")


(defun clean-apertium-pairs (pairs)          ; optionally
  "Make a list of pairs (\"from\" . \"to\")."
  (setq apertium-pairs
        (mapcar (lambda (pair)
                  (cons (cdr (assoc "srcLang" pair))
                        (cdr (assoc "destLang" pair))))
                pairs)))

(clean-apertium-pairs (xml-rpc-method-call
		       apertium-server 'languagePairs))
;; or async:
(xml-rpc-method-call-async 'clean-apertium-pairs
			   apertium-server 'languagePairs)

Benchmarks

See also

  • Apertium going SOA - documentation of the development of apertium-service
  • Apertium services