Apertium going SOA

Revision as of 03:59, 22 April 2009 by Deadbeef (talk | contribs)
Component Diagram of the project: an Apertium Server implements an XML Stream interface (and eventually others) that can be accessed using an Apertium Client or a Web Service (the Web Service itself can be accessed through a REST or a SOAP interface). This makes it easy for external applications written in various languages to access Apertium's capabilities through the Web Service: for example, to integrate translation in an Ajax application, in an IM client, or in a Translation Service of a large, service-oriented and geographically distributed IT infrastructure (for instance, to collaborate more easily with engineers in an offshore country).
A possible high-level view of Apertium Server's internals.
Sequence Diagram showing how an IM Client can use Apertium Server's capabilities (accessed through a Web Service interface) to implement real-time translation, both in input and output, of instant messages.

The aim of this project is to design and implement a Client-Server architecture for Apertium. Currently, translating many documents means creating many Apertium processes, each of which loads transducers, grammars, etc. from scratch; this wastes resources and therefore reduces scalability. One solution is to implement an Apertium Server (or Daemon) that doesn't need to reload all the resources for every translation task. In addition, this kind of service would be able to handle multiple requests at the same time (useful, for example, in a Web 2.0-oriented environment), would improve scalability, and could be included in the business processes of an existing IT infrastructure with minimal effort.
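
The "load once, reuse for every request" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ResourceCache` and `fake_loader` are hypothetical names, and the loader only stands in for the real work of reading transducers and grammars from disk; it is not how Apertium itself loads its data.

```python
import threading
from typing import Callable, Dict


class ResourceCache:
    """Loads the resources for each mode once and reuses them afterwards."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader
        self._cache: Dict[str, object] = {}
        self._lock = threading.Lock()  # requests may arrive concurrently
        self.load_count = 0            # how many times the loader really ran

    def get(self, mode_name: str) -> object:
        with self._lock:
            if mode_name not in self._cache:
                self._cache[mode_name] = self._loader(mode_name)
                self.load_count += 1
            return self._cache[mode_name]


def fake_loader(mode_name: str) -> dict:
    # Hypothetical stand-in for loading transducers and grammars from disk.
    return {"mode": mode_name}


cache = ResourceCache(fake_loader)
for _ in range(3):
    cache.get("en-es")  # only the first call triggers a load
cache.get("es-ca")
```

A long-running server process would keep one such cache alive across requests, whereas a new process per translation starts with an empty one every time.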

In addition, this project also aims to implement a Web Service acting as a gateway between the Apertium Server and external applications. Loading Apertium inside the Web Service itself would make no sense, since a Web Service is stateless and this wouldn't solve the scalability problem. The Web Service will offer both a SOAP and a REST interface, to make it easier for external applications and services (for example IM clients, web sites, large IT business processes) to include translation capabilities without importing the entire Apertium application.

Server-Client communication protocol

A possible communication protocol for invoking Apertium Server's functionality is XML-RPC, a remote procedure call protocol which uses XML to encode its calls and HTTP as a transport mechanism. The list of methods the Apertium Server will offer will probably be similar to the following:

  • array<string> GetAvailableModes();
  • string Translate(string Message, string modeName);
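
As a rough illustration of how these two methods could be exposed over XML-RPC, here is a minimal sketch using Python's standard `xmlrpc` modules. It is not the project's implementation: the `MODES` table, the word-by-word lookup that stands in for a real translation, the loopback address and the automatically chosen port are all assumptions made for the example.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical placeholder data: a real server would keep the loaded
# transducers and grammars for each mode here.
MODES = {"en-es": {"hello": "hola"}}


def get_available_modes():
    """Return the names of the modes the server can translate with."""
    return sorted(MODES)


def translate(message, mode_name):
    """Toy translation: look each word up in the mode's table."""
    if mode_name not in MODES:
        raise ValueError(f"unknown mode: {mode_name}")
    table = MODES[mode_name]
    return " ".join(table.get(word, word) for word in message.split())


def start_server(host="127.0.0.1", port=0):
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    # Register the functions under the method names listed above.
    server.register_function(get_available_modes, "GetAvailableModes")
    server.register_function(translate, "Translate")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_server()
host, port = server.server_address
client = ServerProxy(f"http://{host}:{port}/")
modes = client.GetAvailableModes()
translated = client.Translate("hello world", "en-es")
server.shutdown()
server.server_close()
```

The client side needs nothing beyond the method names and the server URL, which is the property that makes XML-RPC convenient for clients written in other languages.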

TODO: add methods to get the Server's current capabilities and load; this would be useful for implementing some kind of load balancing across a cluster of Apertium Servers

A possible Use Case: a Healthcare organization

This sample shows how Apertium Server's capabilities can be exposed by a Translation Service inside an existing service-oriented IT infrastructure. Here the Actor (a non-English-speaking Medic) interacts with a manager of Clinical Documents by storing some Health Records. Those records often include natural-language text that needs external tools like MetaMap to be mapped to ontologies or concepts. In this case, the non-English natural-language text is first translated into English by a Translation Service that uses Apertium Server, and the result is then given to MetaMap, which maps the now-English text to concepts (then stored in a Knowledge Base for further analysis).

In Healthcare Information Systems (HIS), there is a general trend towards implementing IT infrastructure based on a SOA (Service-Oriented Architecture) model, in order to improve access to, and integration of, external services. In this use case, I show how Healthcare Organizations in non-English-speaking countries can greatly benefit from integrating a Translation Service implemented using Apertium into their IT infrastructure.

MetaMap is an online application that maps text to UMLS Metathesaurus concepts, which is very useful for interoperability among different languages and systems within the biomedical domain. MetaMap Transfer (MMTx) is a Java program that makes MetaMap available to biomedical researchers. Currently MetaMap only works effectively on text written in English, which makes it difficult to use the UMLS Metathesaurus to extract concepts from non-English biomedical texts.

A possible solution to this problem is to translate the non-English biomedical texts into English, so that MetaMap (and similar Text Mining tools) can work effectively on them.
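
The translate-then-map flow described above can be sketched as a two-step pipeline. This is only an illustration under stated assumptions: `extract_concepts`, `stub_translate` and `stub_map_concepts` are hypothetical names, the glossary and the concept list are made-up sample data, and the mapper stub does not reflect MetaMap's or MMTx's real interface.

```python
from typing import Callable, List


def extract_concepts(
    text: str,
    mode_name: str,
    translate: Callable[[str, str], str],
    map_concepts: Callable[[str], List[str]],
) -> List[str]:
    """Translate non-English text into English, then map it to concepts."""
    english_text = translate(text, mode_name)
    return map_concepts(english_text)


def stub_translate(message: str, mode_name: str) -> str:
    # Hypothetical stand-in for a call to the Translation Service.
    glossary = {"es-en": {"fiebre": "fever", "y": "and", "tos": "cough"}}
    table = glossary[mode_name]
    return " ".join(table.get(word, word) for word in message.split())


def stub_map_concepts(english_text: str) -> List[str]:
    # Hypothetical stand-in for a concept mapper that only handles English.
    known_concepts = {"fever", "cough"}
    return [word for word in english_text.split() if word in known_concepts]


concepts = extract_concepts(
    "fiebre y tos", "es-en", stub_translate, stub_map_concepts
)
```

Passing the translator and the mapper in as functions keeps the two services independent, which matches the service-oriented setup of the use case: either one could be replaced by a remote call without changing the pipeline.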