Apertium-service

Revision as of 05:49, 8 March 2018 by Shardulc (talk | contribs) (GitHub migration)

Introduction

Apertium-service runs apertium translation pairs as a service and provides translate and detect (language recognition) capabilities over an XML-RPC interface, as well as REST and SOAP wrappers.

The service is implemented as a multi-threaded C++ program that uses libapertium and liblttoolbox to run translation modes. It works by redirecting the C FILE streams within the libraries, instead of starting separate processes and NUL flushing (see Daemon for discussion). It also manages a resource pool of objects such as language pair data, some pre-allocated eagerly and others allocated lazily on demand, up to a high water mark.
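The pooling strategy described above can be illustrated with a minimal Python sketch. This is not the service's C++ implementation: the class name ResourcePool, its parameters and the blocking behaviour when the pool is exhausted are all assumptions made for illustration.

```python
import threading


class ResourcePool:
    """Hypothetical pool: some objects are built eagerly, the rest lazily,
    and the total never exceeds the high water mark."""

    def __init__(self, factory, preallocate, high_water_mark):
        if preallocate > high_water_mark:
            raise ValueError("preallocate exceeds high_water_mark")
        self._factory = factory
        self._high_water_mark = high_water_mark
        self._cond = threading.Condition()
        # Eager part: build these objects up front.
        self._idle = [factory() for _ in range(preallocate)]
        self._allocated = preallocate

    def acquire(self):
        with self._cond:
            while not self._idle:
                if self._allocated < self._high_water_mark:
                    # Lazy part: allocate only when a caller needs it.
                    self._allocated += 1
                    return self._factory()
                # Pool exhausted: wait until another thread releases.
                self._cond.wait()
            return self._idle.pop()

    def release(self, resource):
        with self._cond:
            self._idle.append(resource)
            self._cond.notify()
```

A caller would wrap each translation in acquire() and release(), so that expensive objects are reused across requests rather than rebuilt.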

A paper describing the service, its interfaces and internal architecture can be found here: http://rua.ua.es/dspace/handle/10045/12031 . The development was also documented on the wiki page Apertium going SOA.

Compiling and Installing

This document covers compilation and installation of apertium-service on Unix and Unix-like systems only, although it can also be compiled on other systems that meet the requirements.

apertium-service, like many other open-source projects, uses the GNU build tools (such as autoconf and automake) to create a build environment.


Requirements

You need the following software installed:

  • liblttoolbox3 - library for lttoolbox, a toolbox for lexical processing, morphological analysis and generation of words.
  • libapertium3 - library for apertium, a Free / Open-Source machine translation system.
  • libtextcat0 - a library implementing the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization". (optional)
  • libapertiumcombine1 - a library implementing a Multi-Engine Translation Synthesiser (optional)
  • libxmlrpc-c3 - a lightweight RPC library based on XML and HTTP for C and C++. (>= 1.16.07-1)
  • libxml++2 - a C++ interface to libxml2, the GNOME XML library.
  • libboost - Boost C++ libraries are a collection of peer-reviewed, Open Source libraries that extend the functionality of C++. (>= 1.41.0)


In particular, the following components from Boost C++ libraries are required:

  • libboost-thread - for portable C++ multi-threading.
  • libboost-filesystem - for portable filesystem operations in C++.
  • libboost-system - for dealing with system-specific error code values in C++.
  • libboost-date-time - for portable date/time operations in C++.
  • libboost-regex - regular expression library for C++.
  • libboost-program-options - program options library for C++.

Ubuntu

To install the XML and Boost components on Ubuntu, run:

sudo apt-get install libxml++2.6-dev libxmlrpc-c3-dev libboost-thread-dev libboost-filesystem-dev \
libboost-system-dev libboost-date-time-dev libboost-regex-dev libboost-program-options-dev libcurl4-openssl-dev

Arch Linux

To install the XML, Boost and other components on Arch Linux, first run:

sudo pacman -S autoconf automake libtextcat libxml2 libxml++ boost 

Note: libtextcat is optional.

The other requirements are in AUR. If you have yaourt, you should be able to do:

sudo yaourt -S lttoolbox apertium xmlrpc-c-abyss

You might first have to run sudo pacman -Rd xmlrpc-c, since that outdated package conflicts with xmlrpc-c-abyss (AMD64 users might also have to use this patch).

To make sure apertium-service finds liblttoolbox, do

$ sudo vi /etc/ld.so.conf

and append

/usr/lib

to the file, then run

sudo ldconfig

Checkout from SVN

Note: After Apertium's migration to GitHub, this tool is read-only on the SourceForge repository and does not exist on GitHub. If you are interested in migrating this tool to GitHub, see Migrating tools to GitHub.

apertium-service can be downloaded from the Apertium SVN repository with the following command:

$ svn co http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-service

Immediately after an SVN checkout, you can generate the files required for building apertium-service with GNU autotools by running:

$ ./autogen.sh

Configuring the source tree

The next step is to configure the apertium-service source tree for your particular platform and personal requirements. This is done using the script configure included in the root directory of the distribution. (Developers downloading an unreleased version of the apertium-service source tree will need to have autoconf and libtool installed and will need to run the script autogen.sh before proceeding with the next steps. This is not necessary for official releases.)

To configure the source tree using all the default options, simply type ./configure. To change the default options, configure accepts a variety of variables and command line options.

The most important option is the location --prefix where the apertium-service is to be installed later, because apertium-service has to be configured for this location to work correctly. More fine-tuned control of the location of files is possible with additional configure options.

In addition, it is sometimes necessary to provide the configure script with extra information about the location of your compiler, libraries, or header files. This is done by passing either environment variables or command line options to configure. For more information, see the configure manual page.

To give a brief idea of the possibilities, here is a typical example which compiles apertium-service for the installation tree /sw/pkg/apertium-service with a particular compiler and flags:

$ CC="pgcc" CFLAGS="-O2" \
./configure --prefix=/sw/pkg/apertium-service

When configure is run it will take a few seconds to test for the availability of features on your system and build Makefiles which will later be used to compile the server.

Details on all the different configure options are available on the configure manual page.


Build

Now you can build the various parts which form the apertium-service package by simply running the command:

$ make

A base configuration takes a few minutes to compile and the time will vary widely depending on your hardware.


Install

Now it's time to install the package under the configured installation PREFIX (see --prefix option above) by running:

$ make install


Customise

Next, you can customise your apertium-service by editing the configuration files under PREFIX/etc/apertium-service/.

$ vi PREFIX/etc/apertium-service/configuration.xml

The users.xml file is only needed if you want access control. The configuration.xml file is fairly straightforward:

<ApertiumServiceConfiguration>
       <ServerPort>6173</ServerPort>
       <ApertiumBase>/usr/local/share/apertium/modes</ApertiumBase>
</ApertiumServiceConfiguration>

The supported fields of the configuration file are the following:

  • ServerPort sets the port on which the XML-RPC service listens.
  • ApertiumBase sets the directory where the service looks for the modes files.
  • HighWaterMark sets the high water mark (the maximum number of objects that can be allocated for each resource pool).
  • MultiEngineMachineTranslation is only needed if you want to enable the MEMT module (not yet stable). Within it:
    • <MonolingualDictionary srcLang="br" destLang="fr">/usr/local/share/apertium/apertium-br-fr/br-fr.automorf.bin</MonolingualDictionary> gives the path to an analyser for a given language. This analyser is used to lemmatise all the input sentences to the MEMT module to improve alignment.
    • <LanguageModel lang="de">/home/pasquale/gsoc/lm/europarl.de.blm</LanguageModel> gives an IRSTLM language model used to score the final hypotheses from the MEMT module.
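To check a configuration before starting the service, the fields above can be read back with a short script. The following Python sketch assumes that every field is a direct child element of ApertiumServiceConfiguration, as ServerPort and ApertiumBase are in the excerpt above. The HighWaterMark value of 4 and the helper name read_settings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample configuration; the HighWaterMark value is an invented example.
SAMPLE = """\
<ApertiumServiceConfiguration>
  <ServerPort>6173</ServerPort>
  <ApertiumBase>/usr/local/share/apertium/modes</ApertiumBase>
  <HighWaterMark>4</HighWaterMark>
</ApertiumServiceConfiguration>
"""


def read_settings(xml_text):
    """Return the simple top-level fields that are present, as strings."""
    root = ET.fromstring(xml_text)
    settings = {}
    for field in ("ServerPort", "ApertiumBase", "HighWaterMark"):
        value = root.findtext(field)
        if value is not None:
            settings[field] = value.strip()
    return settings


settings = read_settings(SAMPLE)
for name, value in settings.items():
    print(name, "=", value)
```

To inspect a real installation, pass the contents of PREFIX/etc/apertium-service/configuration.xml to read_settings instead of SAMPLE.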

Test

Now you can start your apertium-service by immediately running:

$ PREFIX/bin/apertium-service

You should then be able to make your first XML-RPC query via the URL http://localhost:port/RPC2, where port is the ServerPort value from the configuration file.
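To see what such a query looks like on the wire, the request body can be built without contacting any server. This Python sketch assumes the port 6173 from the sample configuration and the translate argument order (text, source language, target language) used in the client samples on this page.

```python
import xmlrpc.client

port = 6173  # assumed: the ServerPort value from the sample configuration
url = "http://localhost:%d/RPC2" % port

# Serialise the call without sending it, to show the XML-RPC payload
# that a client would POST to the URL above.
body = xmlrpc.client.dumps(
    ("Això no és una prova.", "ca", "en"), methodname="translate"
)

print(url)
print(body)
```

The printed body is a methodCall document naming the translate method, with the three string parameters in order.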

Consuming the service

The following samples assume that the service you want to consume is located at the address http://www.neuralnoise.com:6173/RPC2

Python

#!/usr/bin/env python3

import xmlrpc.client

# Connect to the service and translate one sentence from Catalan to English.
proxy = xmlrpc.client.ServerProxy("http://www.neuralnoise.com:6173/RPC2")
res = proxy.translate("Això no és una prova.", "ca", "en")
print(res["translation"])

This should give the output:

$ python3 test.py
This is not a test.

This assumes that the ca-en pair is installed. You can list the language pairs that the service has detected with the method languagePairs(), for example:

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://www.neuralnoise.com:6173/RPC2")

# Print every detected pair as "src-dest", separated by spaces.
pairs = proxy.languagePairs()
print(" ".join(pair["srcLang"] + "-" + pair["destLang"] for pair in pairs))
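The two calls can be combined so that a client checks for a pair before asking for a translation. The following Python 3 sketch uses only the languagePairs and translate methods and the struct keys shown above. The helper names and the StubProxy class are hypothetical; the stub stands in for a real ServerProxy so that the sketch runs without a server.

```python
def has_pair(pairs, src, dest):
    """Return True if the src-dest pair appears in a languagePairs() result."""
    return any(p["srcLang"] == src and p["destLang"] == dest for p in pairs)


def translate_if_available(proxy, text, src, dest):
    """Translate text, or return None when the pair is not installed."""
    if not has_pair(proxy.languagePairs(), src, dest):
        return None
    return proxy.translate(text, src, dest)["translation"]


class StubProxy:
    """Hypothetical stand-in for xmlrpc.client.ServerProxy, for offline runs."""

    def languagePairs(self):
        return [{"srcLang": "ca", "destLang": "en"}]

    def translate(self, text, src, dest):
        # Not a real translation: it only tags the text with the pair used.
        return {"translation": "[%s-%s] %s" % (src, dest, text)}


proxy = StubProxy()
print(translate_if_available(proxy, "Això no és una prova.", "ca", "en"))
print(translate_if_available(proxy, "This is a test.", "en", "ca"))
```

Against a real service, replace StubProxy() with a ServerProxy pointing at the service URL.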

Ruby

#!/usr/bin/ruby

require 'xmlrpc/client'

server = XMLRPC::Client.new("www.neuralnoise.com", "/RPC2", 6173)
puts server.call("translate", "This is a test for the machine translation program.", "en", "es")["translation"]

Perl

#!/usr/bin/perl

require RPC::XML;
require RPC::XML::Client;

my $client = RPC::XML::Client->new("http://www.neuralnoise.com:6173/RPC2");
my $result = $client->send_request("translate", "This is a test for the machine translation service.", "en", "es");

binmode(STDOUT, ":utf8");

foreach my $key ( sort keys %{$result} ) {
    print $key . " = " . $result->value->{$key} . "\n";
}

Java

import java.net.*;
import java.util.*;

import org.apache.xmlrpc.*;
import org.apache.xmlrpc.client.*;

public class TestCase {
	public static void main(String[] args) throws MalformedURLException, XmlRpcException {
		XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
		config.setServerURL(new URL("http://www.neuralnoise.com:6173/RPC2"));
		config.setBasicEncoding("UTF-8");
		
		XmlRpcClient client = new XmlRpcClient();
		client.setTransportFactory(new XmlRpcSunHttpTransportFactory(client));
		client.setConfig(config);
		
		Object[] params = { 
				"This is a test for the machine translation service",
				"en", "es"};
		
		Map<String, String> ret = (Map<String, String>) client.execute("translate", params);
		System.out.println(ret.get("translation"));
	}
}

C++

/*
 * g++ test.cc -o test -lxmlrpc_client++ -lxmlrpc++ -lxmlrpc_client -lxmlrpc_cpp -lxmlrpc_xmlparse -lxmlrpc_xmltok -lxmlrpc_server
 */

#include <cstdlib>
#include <map>
#include <string>
#include <iostream>
#include <xmlrpc-c/girerr.hpp>
#include <xmlrpc-c/base.hpp>
#include <xmlrpc-c/client_simple.hpp>

using namespace std;

int
main(int argc, char **) {
    try {
        string const serverUrl("http://www.neuralnoise.com:6173/RPC2");
        string const methodName("translate");

        xmlrpc_c::clientSimple myClient;
        xmlrpc_c::value result;

        myClient.call(serverUrl, methodName, "sss", &result, "test", "en", "es");

        map<string, xmlrpc_c::value> const resultStruct = xmlrpc_c::value_struct(result);
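        map<string, xmlrpc_c::value>::const_iterator iter = resultStruct.find("translation");

        // Guard against a response that has no "translation" member.
        if (iter == resultStruct.end()) {
            cerr << "Response has no \"translation\" member." << endl;
            return EXIT_FAILURE;
        }

        string ret = (string)xmlrpc_c::value_string(iter->second);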

        cout << "Translation: " << ret << endl;
    } catch (exception const& e) {
        cerr << "Client threw error: " << e.what() << endl;
    } catch (...) {
        cerr << "Client threw unexpected error." << endl;
    }
    return 0;
}

Haskell

import Network.XmlRpc.Client

server = "http://www.neuralnoise.com:6173/RPC2"

translate :: String -> String -> String -> String -> IO [(String,String)]
translate url = remote url "translate"

main = do
       let x = "This is a test for the machine translation service."
           y = "en"
           z = "es"
       ret <- translate server x y z
       print ret

Emacs-Lisp

; put http://www.emacswiki.org/emacs/xml-rpc.el into a member of load-path and
(require 'xml-rpc)

(defvar apertium-server "http://www.neuralnoise.com:6173/RPC2")

(xml-rpc-method-call apertium-server 'translate "Això no és una prova." "ca" "en")


(defun clean-apertium-pairs (pairs)          ; optionally
  "Make a list of pairs (\"from\" . \"to\")."
  (setq apertium-pairs
	(mapcar (lambda (pair)
		  (cons (cdr (assoc "srcLang" pair))
			(cdr (assoc "destLang" pair))))
		pairs)))

(clean-apertium-pairs (xml-rpc-method-call
		       apertium-server 'languagePairs))
;; or async:
(xml-rpc-method-call-async 'clean-apertium-pairs
			   apertium-server 'languagePairs)

Benchmarks

See also