Difference between revisions of "Apertium-apy"
Revision as of 19:19, 21 December 2013
Apertium-APy stands for "Apertium API in Python". It is a simple Apertium API server written in Python, meant as a drop-in replacement for ScaleMT. It currently lives in SVN under trunk/apertium-tools/apertium-apy, where servlet.py makes up nearly all of the code. It is meant for front ends like the one in trunk/apertium-tools/html-tools (which can easily be served using python3 -m http.server).
Installation
First, compile and install apertium, lttoolbox and apertium-lex-tools, and compile your language pairs. See Minimal_installation_from_SVN for how to do this. APY uses Tornado as its web framework; install it via pip install tornado or a variant suited to your environment. Then check out APY from SVN and run it:
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-apy
cd apertium-apy
export APERTIUMPATH="/path/to/apertium/svn/trunk"
./servlet.py "$APERTIUMPATH"
Optional arguments include:
- -l --langNames: path to database of localized language names
- -p --port: port to run server on (2737 by default)
- -c --sslCert: path to SSL certificate
- -k --sslKey: path to SSL key file
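The options above could be declared with argparse roughly as follows. This is only a sketch of how such a command line might be parsed, not the actual code in servlet.py; the function names, the positional argument name pairs_path and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (not servlet.py itself)."""
    parser = argparse.ArgumentParser(description="Start an APY server (sketch)")
    # Assumption: the Apertium SVN trunk path is the single positional argument.
    parser.add_argument("pairs_path", help="path to the Apertium SVN trunk")
    parser.add_argument("-l", "--langNames", default=None,
                        help="path to database of localized language names")
    parser.add_argument("-p", "--port", type=int, default=2737,
                        help="port to run server on")
    parser.add_argument("-c", "--sslCert", default=None,
                        help="path to SSL certificate")
    parser.add_argument("-k", "--sslKey", default=None,
                        help="path to SSL key file")
    return parser


def ssl_enabled(args):
    """HTTPS only makes sense when both the certificate and the key are given."""
    return args.sslCert is not None and args.sslKey is not None


default_args = build_parser().parse_args(["/path/to/apertium/svn/trunk"])
custom_args = build_parser().parse_args(
    ["/path/to/apertium/svn/trunk", "-p", "8080", "-c", "server.crt", "-k", "server.key"]
)
```

Here default_args keeps the documented default port, while custom_args overrides it and supplies both SSL files.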
Usage
APY supports three types of requests: GET, POST, and JSONP. GET and POST requests from a browser front end are possible only if APY is running on the same server as the client, because of cross-site (same-origin) restrictions; JSONP requests are permitted in any context and are therefore useful for front ends hosted elsewhere. APY can easily be tested using curl:
curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord
It can also be tested through your browser or through other HTTP calls. curl does not decode JSON output by default, so to make testing easier an APY Sandbox is provided in the SVN with Apertium HTML-Tools at /trunk/apertium-tools/apertium-html-tools.
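Since curl prints the raw JSON, a few lines of Python can be used instead to send a GET request and decode the response. The following is a minimal sketch using only the standard library; the helper names build_url and fetch_json and the APY_BASE constant are made up for this example, and a server on the default port is assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: APY is running locally on its default port.
APY_BASE = "http://localhost:2737"


def build_url(endpoint, params, base=APY_BASE):
    """Build a GET URL; urlencode percent-encodes non-ASCII text and turns spaces into '+'."""
    url = f"{base}/{endpoint.lstrip('/')}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def fetch_json(endpoint, params):
    """Send the GET request and decode the JSON body (needs a running server)."""
    with urlopen(build_url(endpoint, params)) as response:
        return json.loads(response.read().decode("utf-8"))


example_url = build_url("perWord", {"lang": "en-es", "modes": "morph", "q": "let there be light"})
# With a server running, the decoded result could be obtained with:
#   fetch_json("perWord", {"lang": "kaz-tat", "modes": "morph", "q": "алдым"})
```

Only build_url runs without a server; fetch_json is left uncalled so the sketch can be imported safely.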
URL | Function | Parameters | Output |
---|---|---|---|
/listPairs | List available language pairs | None | To be consistent with ScaleMT, the returned JS Object contains a responseData key with an Array of language pair objects with keys sourceLanguage and targetLanguage.
$ curl http://localhost:2737/listPairs
{"responseStatus": 200, "responseData": [
 {"sourceLanguage": "kaz", "targetLanguage": "tat"},
 {"sourceLanguage": "tat", "targetLanguage": "kaz"},
 {"sourceLanguage": "mk", "targetLanguage": "en"}
], "responseDetails": null} |
/list | List available mode information |
- q: type of information to list
  - analyzers
  - generators
  - taggers/disambiguators
|
The returned JS Object contains a mapping from language pairs to mode names (used internally by Apertium).
$ curl 'http://localhost:2737/list?q=analyzers'
{"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph", "tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}
$ curl 'http://localhost:2737/list?q=generators'
{"en-es": "en-es-generador", "fin": "fin-gener", "es-en": "es-en-generador"}
$ curl 'http://localhost:2737/list?q=taggers'
{"es-en": "es-en-tagger", "en-es": "en-es-tagger", "mk-en": "mk-en-tagger", "tat-kaz": "tat-kaz-tagger", "kaz-tat": "kaz-tat-tagger", "kaz": "kaz-tagger"} |
/translate | Translate text |
- langpair: language pair to use for translation
- q: text to translate
|
Returned ________ ??
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?' output |
/analyze | Morphologically analyze text |
- mode: language to use for analysis
- q: text to analyze
|
The returned JS Array contains JS Arrays in the format [analysis, input-text].
$ curl -G --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"], ["?/?<sent>","?"]] |
/generate | Generate surface forms from text |
- mode: language to use for generation
- q: text to generate
|
The returned JS Array contains JS Arrays in the format [generated, input-text].
$ curl -G --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate
[["сен","^сен<v><tv><imp><p2><sg>$ "]] |
/perWord | Perform morphological tasks per word |
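- lang: language pair to use (e.g. en-es)
- modes: '+' delimited list of tasks to perform
  - tagger
  - morph
  - biltrans
  - translate
- q: text to perform tasks on
|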
The returned JS Array contains JS Objects, each containing the key input and up to 4 other keys corresponding to the requested modes (tagger, morph, biltrans and translate).
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=morph&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger&q=let+there+be+light'
[{"input": "let", "tagger": "let<vblex><pp>"}, {"input": "there", "tagger": "there<adv>"}, {"input": "be", "tagger": "be<vbser><inf>"}, {"input": "light", "tagger": "light<adj><sint>"}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+tagger&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=translate&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"]}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=biltrans&q=let+there+be+light'
[{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+biltrans&q=let+there+be+light'
[{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+translate&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "tagger": "light<adj><sint>"}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans+tagger&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans+tagger&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+tagger&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]
$ curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans+tagger&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] |
/listLocalizedLanguageNames | Get localized language names |
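- locale: language to get localized language names in
- languages: list of '+' delimited language codes to retrieve localized names for (optional; if not specified, all available codes will be returned)
|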
The returned JS Object contains a mapping of requested language codes to localized language names.
$ curl 'http://localhost:2737/listLanguageNames?locale=fr&languages=ca+en+mk+tat+kk'
{"ca": "catalan", "en": "anglais", "kk": "kazakh", "mk": "macédonien", "tat": "tatar"} |
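As an illustration of the ScaleMT-style envelope returned by /listPairs, the sketch below decodes such a response into a list of (source, target) tuples. The sample body is the one shown in the table; the function name parse_pairs and the decision to raise on a non-200 responseStatus are assumptions of this example, not documented APY behaviour.

```python
import json

# Sample body copied from the /listPairs example in the table above.
SAMPLE_BODY = """
{"responseStatus": 200, "responseData": [
 {"sourceLanguage": "kaz", "targetLanguage": "tat"},
 {"sourceLanguage": "tat", "targetLanguage": "kaz"},
 {"sourceLanguage": "mk", "targetLanguage": "en"}
], "responseDetails": null}
"""


def parse_pairs(body):
    """Return [(source, target), ...] from a /listPairs response body."""
    payload = json.loads(body)
    # Assumption: anything other than status 200 is treated as an error here.
    if payload.get("responseStatus") != 200:
        raise ValueError(f"unexpected response: {payload.get('responseDetails')}")
    return [(pair["sourceLanguage"], pair["targetLanguage"])
            for pair in payload["responseData"]]


pairs = parse_pairs(SAMPLE_BODY)
```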
SSL
APY supports HTTPS out of the box. To test with a self-signed certificate, create a certificate and key by running:
openssl req -new -x509 -keyout server.key -out server.crt -days 365 -nodes
Then run APY with --sslKey server.key --sslCert server.crt, and test with HTTPS and the -k argument to curl (-k makes curl accept self-signed or even slightly "lying" certificates):
curl -k -G --data "mode=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze
If you have a real signed certificate, you should be able to use curl without -k for the domain which the certificate is signed for:
curl -G --data "mode=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze
Remember to open port 2737 to your server.
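The same distinction between curl with and without -k can be made from Python. The sketch below, using only the standard library, builds an SSL context that either verifies the certificate (the default) or skips verification for self-signed test certificates; the helper names make_context and open_apy are invented for this example, and skipping verification should be limited to local testing.

```python
import ssl
from urllib.request import urlopen


def make_context(allow_self_signed=False):
    """Return an SSL context; with allow_self_signed it behaves like curl -k."""
    context = ssl.create_default_context()
    if allow_self_signed:
        # check_hostname has to be switched off before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def open_apy(url, allow_self_signed=False):
    """Open an HTTPS URL on an APY server (needs a running server)."""
    return urlopen(url, context=make_context(allow_self_signed))


strict_context = make_context()
testing_context = make_context(allow_self_signed=True)
```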
Threading
APY currently uses a TCPServer that inherits from ThreadingMixIn, so each request is handled in its own thread. A lock on translateNULFlush (which must have at most one thread per pipeline) ensures that this part stays single-threaded, so that Alice never gets Bob's text.
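The idea of serializing access to a pipeline can be sketched as follows. This is not the code from servlet.py: the class SerializedPipeline and the stand-in fake_pipeline function are hypothetical, and the real translateNULFlush talks to an Apertium process rather than upper-casing text.

```python
import threading
import time


class SerializedPipeline:
    """Wrap a pipeline function so that only one thread uses it at a time."""

    def __init__(self, run):
        self._run = run
        self._lock = threading.Lock()

    def translate(self, text):
        with self._lock:  # other request threads wait here
            return self._run(text)


active = 0
max_active = 0
counter_lock = threading.Lock()


def fake_pipeline(text):
    """Stand-in for a real pipeline; records how many threads are inside it."""
    global active, max_active
    with counter_lock:
        active += 1
        max_active = max(max_active, active)
    time.sleep(0.01)  # pretend the translation takes a while
    with counter_lock:
        active -= 1
    return text.upper()


pipeline = SerializedPipeline(fake_pipeline)
results = {}


def handle_request(user, text):
    results[user] = pipeline.translate(text)


threads = [threading.Thread(target=handle_request, args=(user, text))
           for user, text in [("alice", "hello"), ("bob", "world")]]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Both request threads run concurrently, but the lock keeps them from being inside the pipeline at the same time, so each user gets back only their own text.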
Try it out
To see the threading in action, send one large request in the background followed by several small ones, e.g.:
export APERTIUMPATH="/path/to/svn/trunk"
python3 servlet.py "$APERTIUMPATH" 2737 &
curl -s --data-urlencode 'langpair=nb|nn' --data-urlencode \
  'q@/tmp/reallybigfile' 'http://localhost:2737/translate' >/tmp/output &
curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
The last three requests should (after a slight wait) start outputting before the first request is done.
Morphological Analysis and Generation
To analyze text, send a POST or GET request to /analyze with parameters mode and q set. For example:
$ curl --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],["?/?<sent>","?"],["./.<sent>",".\n"]]
The JSON response will consist of a list of lists, each of the form [analysis with following non-analyzed text*, original input token]. To receive a list of valid analyzer modes, send a request to /listAnalyzers.
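To make the response format concrete, the sketch below splits each analysis string from the example above into its surface form and its list of possible analyses. It assumes that the parts are separated by unescaped "/" characters, as in the sample; the function name split_analysis is invented here, and escaped slashes or superblanks are not handled.

```python
# Decoded sample taken from the /analyze example above.
sample_response = [
    ["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>", "Сен "],
    ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>", "бардың ба"],
    ["?/?<sent>", "?"],
    ["./.<sent>", ".\n"],
]


def split_analysis(analysis):
    """Split 'surface/analysis1/analysis2' into (surface, [analysis1, analysis2]).

    Assumption: '/' only ever appears as the separator (no escaping handled).
    """
    surface, *readings = analysis.split("/")
    return surface, readings


parsed = [split_analysis(analysis) for analysis, _input_text in sample_response]
# Tokens with more than one reading are morphologically ambiguous.
ambiguous = [surface for surface, readings in parsed if len(readings) > 1]
```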
To generate surface forms from an analysis, send a POST or GET request to /generate with parameters mode and q set. For example:
$ curl --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$+^сен<v><tv><imp><p2><pl>$" http://localhost:2737/generate
[["сен ","^сен<v><tv><imp><p2><sg>$ "],["сеніңдер","^сен<v><tv><imp><p2><pl>$"]]
The JSON response will consist of a list of lists, each of the form [generated form with following non-analyzed text*, original lexical unit input]. To receive a list of valid generator modes, send a request to /listGenerators.
* e.g. whitespace, superblanks
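A request body for /generate can be assembled from a list of analyses by wrapping each one in ^...$ and joining them with spaces, as in the curl example above. The following is a small sketch using the standard library; the function names to_lexical_unit and generate_query are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode


def to_lexical_unit(analysis):
    """Wrap an analysis such as 'сен<v><tv><imp><p2><sg>' in ^...$ delimiters."""
    return f"^{analysis}$"


def generate_query(mode, analyses):
    """Build the urlencoded 'mode' and 'q' parameters for a /generate request."""
    text = " ".join(to_lexical_unit(analysis) for analysis in analyses)
    # urlencode percent-encodes the special characters and turns spaces into '+'.
    return urlencode({"mode": mode, "q": text})


query = generate_query("kaz", ["сен<v><tv><imp><p2><sg>", "сен<v><tv><imp><p2><pl>"])
decoded = parse_qs(query)  # what a server would see after decoding
```

The resulting string can be sent as the POST body, or appended to the URL after a "?" for a GET request.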
TODO
- It should be possible to set a time-out for translation threads, so if a translation is taking too long, it gets killed and the queue moves along.
- It should use one lock per pipeline, since we don't need to wait for mk-en just because sme-nob is running.
- http://stackoverflow.com/a/487281/69663 recommends select/polling over threading (http://docs.python.org/3.3/library/socketserver.html for more on the differences) but requires either lots of manually written dispatching code (http://pymotw.com/2/select/) or a framework like Twisted.
- some language pairs still don't work (sme-nob?)
- hfst-proc -g doesn't work with null-flushing (or?)
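The "one lock per pipeline" item could look roughly like the sketch below, where each language pair gets its own lock so that a request for mk-en does not wait for sme-nob. This is a possible design under stated assumptions, not existing APY code; the class PipelineLocks and its method lock_for are hypothetical names.

```python
import threading
from collections import defaultdict


class PipelineLocks:
    """Hand out one lock per language pair, created on first use."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the dictionary itself

    def lock_for(self, pair):
        with self._guard:
            return self._locks[pair]


locks = PipelineLocks()

# While sme-nob is busy, mk-en can still be acquired, but sme-nob cannot.
with locks.lock_for("sme-nob"):
    mk_en_free = locks.lock_for("mk-en").acquire(blocking=False)
    sme_nob_free = locks.lock_for("sme-nob").acquire(blocking=False)

if mk_en_free:
    locks.lock_for("mk-en").release()
```

Lock.acquire also accepts a timeout argument, which would let a waiting request give up, although that alone does not kill a translation that is already running.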