Apertium-apy

Apertium-APy stands for "Apertium API in Python". It is a simple Apertium API server written in Python, meant as a drop-in replacement for ScaleMT. It currently lives in SVN under trunk/apertium-tools/apertium-apy, where servlet.py contains essentially all of the code. It is meant to serve front ends such as the simple one in trunk/apertium-tools/simple-html (where index.html is the main file).

Installation

First, compile and install apertium, lttoolbox and apertium-lex-tools, and compile your language pairs. See Minimal_installation_from_SVN for how to do this. APY uses Tornado as its web framework; install it with pip install tornado, or the equivalent for your environment. Then check out APY from SVN and run it:

svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-apy
cd apertium-apy
export APERTIUMPATH="/path/to/apertium/svn/trunk"
./servlet.py "$APERTIUMPATH"

Optional arguments include:

-l --langNames: path to database of localized language names
-p --port: port to run server on (2737 by default)
-c --sslCert: path to SSL certificate
-k --sslKey: path to SSL key file

Usage

APY accepts three types of requests: GET, POST and JSONP. Because of the browser's same-origin policy, GET and POST requests from a web page work only if APY is running on the same server as the client; JSONP requests are permitted in any context. APY can easily be tested with curl:

curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord

It can also be tested from a browser or with any other HTTP client.
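As an illustration of the "other HTTP client" route, here is a minimal Python sketch that builds the same kind of request as the curl example. The helper names build_url and call_apy are hypothetical, and the base URL assumes a local server on the default port.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

APY_URL = "http://localhost:2737"  # assumption: APY runs locally on the default port


def build_url(endpoint, **params):
    """Return the request URL for an APY endpoint with URL-encoded parameters."""
    query = urlencode(params)
    return f"{APY_URL}/{endpoint}" + (f"?{query}" if query else "")


def call_apy(endpoint, **params):
    """Send a GET request to APY and decode the JSON response (needs a running server)."""
    with urlopen(build_url(endpoint, **params)) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call, equivalent to the curl command above:
# call_apy("perWord", lang="kaz-tat", modes="morph", q="алдым")
```

Note that urlencode sends a space as "+", so a combined mode written morph+tagger in a curl URL would be passed here as "morph tagger".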

/listPairs
   Function: list available language pairs
   Parameters: none
   Example:
      $ curl http://localhost:2737/listPairs
      {"responseStatus": 200, "responseData": [ {"sourceLanguage": "kaz", "targetLanguage": "tat"}, {"sourceLanguage": "tat", "targetLanguage": "kaz"}, {"sourceLanguage": "mk", "targetLanguage": "en"} ], "responseDetails": null}

/list
   Function: list available mode information
   Parameters:
      q: type of information to list
         pairs (alias for /listPairs)
         analyzers/analysers
         generators
         taggers/disambiguators
   Example:
      $ curl 'http://localhost:2737/list?q=analyzers'
      {"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph", "tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}
      $ curl 'http://localhost:2737/list?q=generators'
      {"en-es": "en-es-generador", "fin": "fin-gener", "es-en": "es-en-generador"}
      $ curl 'http://localhost:2737/list?q=taggers'
      {"es-en": "es-en-tagger", "en-es": "en-es-tagger", "mk-en": "mk-en-tagger", "tat-kaz": "tat-kaz-tagger", "kaz-tat": "kaz-tat-tagger", "kaz": "kaz-tagger"}

/translate
   Function: translate text
   Parameters:
      langpair: language pair to use for translation
      q: text to translate
   Example:
      $ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?'
      output

/analyze
   Function: morphologically analyze text
   Parameters:
      mode: language to use for analysis
      q: text to analyze
   Example:
      $ curl -G --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
      [["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"], ["?/?<sent>","?"],["./.<sent>",".\n"]]

/generate
   Function: generate surface forms from text
   Parameters:
      mode: language to use for generation
      q: text to generate
   Example:
      $ curl -G --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate
      [["сен","^сен<v><tv><imp><p2><sg>$ "]]

/perWord
   Function: perform morphological tasks per word
   Parameters:
      lang: language to use for tasks
      modes: morphological tasks to perform on text
         tagger/disambig
         biltrans
         translate
         biltrans+morph (in any order)
         translate+tagger (in any order)
         morph+tagger/morph+disambig (in any order)
      q: text to perform tasks on
   Example:
      $ curl "http://localhost:2737/perWord?lang=en-es&modes=morph&q=light"
      [{"analyses": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "input": "light"}]
      $ curl "http://localhost:2737/perWord?lang=en-es&modes=tagger&q=light"
      [{"analyses": ["light<adj><sint>"], "input": "light"}]
      $ curl "http://localhost:2737/perWord?lang=en-es&modes=biltrans&q=light"
      [{"input": "light", "translations": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]
      $ curl "http://localhost:2737/perWord?lang=en-es&modes=translate&q=light"
      [{"input": "light", "translations": ["ligero<adj>"]}]
      $ curl "http://localhost:2737/perWord?lang=en-es&modes=biltrans+morph&q=light"
      [{"analyses": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "input": "light", "translations": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]
      $ curl "http://localhost:2737/perWord?lang=en-es&modes=translate+tagger&q=light"
      [{"analyses": ["light<adj><sint>"], "input": "light", "translations": ["ligero<adj>"]}]
      $ curl "http://localhost:2737/perWord?lang=en-es&modes=morph+tagger&q=light"
      [{"ambiguousAnalyses": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "input": "light", "disambiguatedAnalyses": ["light<adj><sint>"]}]

/listLanguageNames
   Function: get localized language names
   Parameters:
      locale: language to get localized language names in
      languages: list of '+' delimited language codes to retrieve localized names for (optional)
   Example:
      $ curl 'http://localhost:2737/listLanguageNames?locale=fr&languages=ca+en+mk+tat+kk'
      {"ca": "catalan", "en": "anglais", "kk": "kazakh", "mk": "macédonien", "tat": "tatar"}
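The /listPairs response wraps its data in responseStatus, responseData and responseDetails fields. A minimal sketch of unpacking it on the client side, using the sample response shown above (the function name pairs_from_response is hypothetical):

```python
import json

# Sample /listPairs response copied from the example above.
SAMPLE = """{"responseStatus": 200, "responseData": [
 {"sourceLanguage": "kaz", "targetLanguage": "tat"},
 {"sourceLanguage": "tat", "targetLanguage": "kaz"},
 {"sourceLanguage": "mk", "targetLanguage": "en"}], "responseDetails": null}"""


def pairs_from_response(text):
    """Return (source, target) tuples from a /listPairs JSON response."""
    data = json.loads(text)
    if data.get("responseStatus") != 200:
        raise ValueError(f"APY reported an error: {data.get('responseDetails')}")
    return [(p["sourceLanguage"], p["targetLanguage"]) for p in data["responseData"]]


pairs = pairs_from_response(SAMPLE)
print(pairs)
```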

SSL

APY supports HTTPS out of the box. To test with a self-signed certificate, create a certificate and key by running:

openssl req -new -x509 -keyout server.key -out server.crt -days 365 -nodes

Then run APY with --sslKey server.key --sslCert server.crt, and test over HTTPS with the -k argument to curl (-k makes curl skip certificate verification, so it accepts self-signed certificates and certificates that do not match the host name):

curl -k -G --data "mode=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze

If you have a certificate signed by a real certificate authority, you should be able to use curl without -k for the domain the certificate was issued for:

curl -G --data "mode=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze

Remember to open port 2737 on your server.
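For clients written in Python rather than curl, the same distinction between verified and unverified HTTPS can be sketched with the standard ssl module. The helper names make_context and fetch are hypothetical; verify=False plays the role of curl's -k and should only be used for testing.

```python
import ssl
from urllib.request import urlopen


def make_context(verify=True):
    """Build an SSL context; verify=False mirrors curl -k for self-signed certificates."""
    context = ssl.create_default_context()
    if not verify:
        # Hostname checking must be disabled before verify_mode can be relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url, verify=True):
    """Fetch a URL over HTTPS and return the response body as text."""
    with urlopen(url, context=make_context(verify)) as response:
        return response.read().decode("utf-8")


# Example call against a local server with a self-signed certificate:
# fetch("https://localhost:2737/listPairs", verify=False)
```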

Threading

Currently, the server uses a TCPServer subclass that inherits from ThreadingMixIn. A lock around translateNULFlush (which must be run by at most one thread per pipeline at a time) keeps that part single-threaded, so that Alice never receives Bob's text.
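The general pattern can be sketched with the standard socketserver module. This is not the code from servlet.py: translate_nul_flush below is a hypothetical stand-in that merely uppercases its input, and the handler speaks a plain one-line-in, one-line-out protocol rather than HTTP.

```python
import socketserver
import threading

pipeline_lock = threading.Lock()  # a single global lock, as described above


def translate_nul_flush(text):
    """Hypothetical stand-in for the real pipeline call; just uppercases the text."""
    return text.upper()


def translate(text):
    """Serialise access to the pipeline so concurrent requests cannot interleave."""
    with pipeline_lock:
        return translate_nul_flush(text)


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read one line of input, translate it, and write the result back.
        line = self.rfile.readline().decode("utf-8").rstrip("\n")
        self.wfile.write((translate(line) + "\n").encode("utf-8"))


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """TCP server that handles each connection in its own thread."""
    daemon_threads = True
    allow_reuse_address = True
```

Such a server would be started with ThreadedServer(("localhost", 2737), Handler).serve_forever(). Every handler thread funnels through translate(), so only one thread at a time touches the pipeline.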

Try it out

To see concurrent requests in action, try for example:

   export APERTIUMPATH="/path/to/svn/trunk"
   python3 servlet.py "$APERTIUMPATH" -p 2737 &
   
   curl -s --data-urlencode 'langpair=nb|nn' --data-urlencode \
   'q@/tmp/reallybigfile' 'http://localhost:2737/translate' >/tmp/output &
   
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'

After a slight wait, the last three requests should start producing output before the first request has finished.

Morphological Analysis and Generation

To analyze text, send a POST or GET request to /analyze with parameters mode and q set. For example:

   $ curl --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
   [["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],["?/?<sent>","?"],["./.<sent>",".\n"]]

The JSON response consists of a list of lists, each of the form [analysis followed by any non-analyzed text*, original input token]. To get a list of valid analyzer modes, send a request to /listAnalyzers.
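As a rough illustration of consuming this format, the sketch below splits each analysis string into its surface form and candidate analyses, using the sample response above. The function names are hypothetical, and the simple split on "/" assumes the text contains no escaped slashes.

```python
import json

# Sample /analyze response copied from the example above (raw string keeps \n as JSON).
SAMPLE = r'''[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],["?/?<sent>","?"],["./.<sent>",".\n"]]'''


def split_unit(unit):
    """Split 'surface/analysis1/analysis2' into (surface, [analyses])."""
    surface, *analyses = unit.split("/")
    return surface, analyses


def parse_analysis(text):
    """Return a list of (surface form, analyses, original input token) tuples."""
    return [(*split_unit(unit), token) for unit, token in json.loads(text)]


parsed = parse_analysis(SAMPLE)
```

Each entry of parsed can then be unpacked as surface, analyses, token.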

To generate surface forms from an analysis, send a POST or GET request to /generate with parameters mode and q set. For example:

   $ curl --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$+^сен<v><tv><imp><p2><pl>$" http://localhost:2737/generate
   [["сен ","^сен<v><tv><imp><p2><sg>$ "],["сеніңдер","^сен<v><tv><imp><p2><pl>$"]]

The JSON response consists of a list of lists, each of the form [generated form followed by any non-analyzed text*, original lexical unit input]. To get a list of valid generator modes, send a request to /listGenerators.

* e.g. whitespace, superblanks
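The lexical units sent to /generate follow the ^lemma<tag><tag>$ shape seen in the examples above. A small sketch of assembling them and the request body in Python (the helper names lexical_unit and generate_query are hypothetical):

```python
from urllib.parse import urlencode


def lexical_unit(lemma, tags):
    """Format a lemma and its tags as a lexical unit of the form ^lemma<tag>...$."""
    return "^" + lemma + "".join(f"<{tag}>" for tag in tags) + "$"


def generate_query(mode, units):
    """Build the URL-encoded body for a /generate request from several lexical units."""
    # Units are joined with a space, which urlencode sends as '+', as in the curl example.
    return urlencode({"mode": mode, "q": " ".join(units)})


unit = lexical_unit("сен", ["v", "tv", "imp", "p2", "sg"])
```

Unlike the curl example, urlencode also percent-encodes characters such as ^, <, > and $; the server receives the same text after decoding.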

TODO

It should be possible to set a time-out for translation threads, so if a translation is taking too long, it gets killed and the queue moves along.
It should use one lock per pipeline, since we don't need to wait for mk-en just because sme-nob is running.
http://stackoverflow.com/a/487281/69663 recommends select/polling over threading (http://docs.python.org/3.3/library/socketserver.html for more on the differences) but requires either lots of manually written dispatching code (http://pymotw.com/2/select/) or a framework like Twisted.
Some language pairs still don't work (sme-nob?).
hfst-proc -g doesn't work with null-flushing (or?)
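The "one lock per pipeline" item could look roughly like the following sketch, which is only one possible design and not code from servlet.py. All names are hypothetical. The timeout here only bounds how long a request waits for a busy pipeline; it does not kill a translation that is already running, which the first item above would still require.

```python
import threading
from collections import defaultdict

# One lock per language pair, created on first use (hypothetical layout).
_registry_guard = threading.Lock()
_pipeline_locks = defaultdict(threading.Lock)


def lock_for(pair):
    """Return the lock that belongs to a given pipeline, e.g. 'mk-en'."""
    with _registry_guard:  # guard lock creation against concurrent first use
        return _pipeline_locks[pair]


def run_exclusively(pair, func, text, timeout=5.0):
    """Run func(text) holding only that pair's lock; give up waiting after timeout seconds."""
    lock = lock_for(pair)
    if not lock.acquire(timeout=timeout):
        raise TimeoutError(f"pipeline {pair} is busy")
    try:
        return func(text)
    finally:
        lock.release()
```

With this layout a long sme-nob translation holds only the sme-nob lock, so an mk-en request can proceed immediately.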