Apertium-APy stands for "Apertium API in Python". It is a simple Apertium API server written in Python, meant as a drop-in replacement for ScaleMT. It currently lives in SVN under trunk/apertium-tools/apertium-apy, where servlet.py is essentially the whole program. It is meant to be used by front ends like the simple one in trunk/apertium-tools/simple-html (where index.html is the main file).
== Installation ==
First, compile and install apertium/lttoolbox/apertium-lex-tools, and compile your language pairs. See [[Minimal installation from SVN]] for how to do this. Then:
<pre>
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-apy
cd apertium-apy
export APERTIUMPATH="/path/to/apertium/svn/trunk"
./servlet.py "$APERTIUMPATH"
</pre>
Optional arguments include:
* '''--langNamesDB''': path to database of localized language names
* '''--port''': port to run the server on (2737 by default)
* '''--ssl''': path to SSL certificate
== Usage ==
APY supports three types of requests: GET, POST, and JSONP. GET and POST requests are only possible if APY is running on the same server as the client, due to the browser's same-origin restrictions; JSONP requests, however, are permitted in any context, which makes them useful for cross-domain front ends. Using curl, APY can easily be tested:
<pre>
curl --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord
</pre>
Note that this sends a POST request; using curl or your browser to send a GET request is also possible.
{| class="wikitable" border="1"
|-
! URL
! Function
! Parameters
! Example
|-
| '''/listPairs'''
| List available language pairs
| None
| <pre>
$ curl http://localhost:2737/listPairs
{"responseStatus": 200, "responseData": [
{"sourceLanguage": "kaz", "targetLanguage": "tat"},
{"sourceLanguage": "tat", "targetLanguage": "kaz"},
{"sourceLanguage": "mk", "targetLanguage": "en"}
], "responseDetails": null}
</pre>
|-
| '''/list'''
| List available mode information
|
*'''q''': type of information to list
** analyzers
** generators
** taggers/disambiguators
| <pre>
$ curl http://localhost:2737/list?q=analyzers
{"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph",
"tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}
</pre>
<pre>
$ curl http://localhost:2737/list?q=generators
{"en-es": "en-es-generador", "fin": "fin-gener", "es-en": "es-en-generador"}
</pre>
<pre>
$ curl http://localhost:2737/list?q=taggers
{"es-en": "es-en-tagger", "en-es": "en-es-tagger", "mk-en": "mk-en-tagger",
"tat-kaz": "tat-kaz-tagger", "kaz-tat": "kaz-tat-tagger", "kaz": "kaz-tagger"}
</pre>
|-
| '''/translate'''
| Translate text
|
*'''langpair''': language pair to use for translation
*'''q''': text to translate
| <pre>
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?'
</pre>
|-
| '''/analyze'''
| Morphologically analyze text
|
*'''mode''': language to use for analysis
*'''q''': text to analyze
| <pre>
$ curl --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],
["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],
["?/?<sent>","?"],["./.<sent>",".\n"]]
</pre>
|-
| '''/generate'''
| Generate surface forms from text
|
*'''mode''': language to use for generation
*'''q''': text to generate
| <pre>
$ curl --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate
[["сен","^сен<v><tv><imp><p2><sg>$ "]]
</pre>
|-
| '''/perWord'''
| Perform morphological tasks per word
|
*'''lang''': language to use for tasks
*'''modes''': tasks to perform
** morph/tagger/biltrans/translate
** morph+tagger/morph+disambig (in any order)
*'''q''': text to perform tasks on
| <pre>
$ curl "http://localhost:2737/perWord?lang=en-es&modes=morph&q=light"
[{"analyses": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "input": "light"}]
$ curl "http://localhost:2737/perWord?lang=en-es&modes=tagger&q=light"
[{"analyses": ["light<adj><sint>"], "input": "light"}]
$ curl "http://localhost:2737/perWord?lang=en-es&modes=biltrans&q=light"
[{"input": "light", "translations": ["luz<n><f><sg>",
"ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]
$ curl "http://localhost:2737/perWord?lang=en-es&modes=translate&q=light"
[{"input": "light", "translations": ["ligero<adj>"]}]
$ curl "http://localhost:2737/perWord?lang=en-es&modes=biltrans+morph&q=light"
[{"analyses": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"],
"input": "light", "translations": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>",
"encender<vblex><pres>"]}]
$ curl "http://localhost:2737/perWord?lang=en-es&modes=translate+tagger&q=light"
[{"analyses": ["light<adj><sint>"], "input": "light", "translations": ["ligero<adj>"]}]
$ curl "http://localhost:2737/perWord?lang=en-es&modes=morph+tagger&q=light"
[{"ambiguousAnalyses": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"],
"input": "light", "disambiguatedAnalyses": ["light<adj><sint>"]}]
</pre>
|}
== Threading ==
Currently the server uses a TCPServer that inherits from ThreadingMixIn, so each request is handled in its own thread. A lock on translateNULFlush (which must have at most one thread per pipeline) ensures that that part stays single-threaded, so Alice never gets Bob's text.
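The pattern looks roughly like the sketch below. This is not APY's actual code, just an illustration of a ThreadingMixIn TCP server where a lock serializes access to a single (hypothetical) pipeline function.

<pre>
# Sketch only: a threaded TCP server where a lock keeps the (hypothetical)
# null-flushing pipeline call single-threaded, as described above.
import http.server
import socketserver
import threading

pipeline_lock = threading.Lock()

def translate_nul_flush(text):
    # Stand-in for the real call that writes to and reads from the translation pipeline.
    return text

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        with pipeline_lock:  # only one request at a time may use the pipeline
            result = translate_nul_flush("men ikke den")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(result.encode("utf-8"))

class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True

if __name__ == "__main__":
    # Arbitrary port for the sketch; the real server defaults to 2737.
    ThreadedServer(("localhost", 2738), Handler).serve_forever()
</pre>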
== Try it out ==
Try testing with e.g.
<pre>
export APERTIUMPATH="/path/to/svn/trunk"
python3 servlet.py "$APERTIUMPATH" 2737 &
curl -s --data-urlencode 'langpair=nb|nn' --data-urlencode \
  'q@/tmp/reallybigfile' 'http://localhost:2737/translate' >/tmp/output &
curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
</pre>
and see how the last three requests (after a slight wait) start producing output before the first, large request is done.
== Morphological Analysis and Generation ==
To analyze text, send a POST or GET request to <code>/analyze</code> with the parameters <code>mode</code> and <code>q</code> set. For example:
$ curl --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze [["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],["?/?<sent>","?"],["./.<sent>",".\n"]]
The JSON response will consist of a list of lists, each of the form <code>[analysis with following non-analyzed text*, original input token]</code>. To receive a list of valid analyzer modes, send a request to <code>/listAnalyzers</code>.
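For instance, the response can be unpacked from Python like this (a sketch assuming a local APY on port 2737 with the Kazakh analyser installed):

<pre>
# Sketch: call /analyze and unpack the [analyses, original token] pairs described above.
import json
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({"mode": "kaz", "q": "Сен бардың ба?"})
with urllib.request.urlopen("http://localhost:2737/analyze?" + query) as response:
    units = json.load(response)

for analyses, token in units:
    # The first element is "surface/reading1/reading2/..." in Apertium stream format.
    readings = analyses.split("/")[1:]
    print(token.strip(), "->", readings)
</pre>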
To generate surface forms from an analysis, send a POST or GET request to <code>/generate</code> with the parameters <code>mode</code> and <code>q</code> set. For example:
$ curl --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$+^сен<v><tv><imp><p2><pl>$" http://localhost:2737/generate [["сен ","^сен<v><tv><imp><p2><sg>$ "],["сеніңдер","^сен<v><tv><imp><p2><pl>$"]]
The JSON response will consist of a list of lists, each of the form <code>[generated form with following non-analyzed text*, original lexical unit input]</code>. To receive a list of valid generator modes, send a request to <code>/listGenerators</code>.
* e.g. whitespace, superblanks
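Correspondingly, a generation request can be built up from a list of analyses. The sketch below (again assuming a local APY on port 2737 with the Kazakh generator installed) wraps each analysis in <code>^...$</code> and posts it to <code>/generate</code>:

<pre>
# Sketch: wrap analyses as ^...$ lexical units, POST them to /generate and print the surface forms.
import json
import urllib.parse
import urllib.request

analyses = ["сен<v><tv><imp><p2><sg>", "сен<v><tv><imp><p2><pl>"]
q = " ".join("^%s$" % a for a in analyses)

data = urllib.parse.urlencode({"mode": "kaz", "q": q}).encode("utf-8")
with urllib.request.urlopen("http://localhost:2737/generate", data=data) as response:
    for form, unit in json.load(response):
        print(unit.strip(), "->", form.strip())
</pre>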
== SSL ==
To test with a self-signed certificate:
<pre>
openssl req -new -x509 -keyout server.pem -out server.pem -days 365 -nodes
</pre>
Then run with <code>--ssl server.pem</code>, and test with https and the -k argument to curl (-k makes curl accept self-signed or otherwise untrusted certificates):
<pre>
curl -k --data "mode=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze
</pre>
If you have a CA-signed certificate set up for e.g. Apache, it is likely split into two files, one .key and one .crt. You can cat them together into one file to use with servlet.py:
<pre>
cat server.key server.crt > server.keycrt
</pre>
Now you should be able to use curl without -k for the domain the certificate was issued for:
<pre>
curl --data "mode=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze
</pre>
Remember to open port 2737 on your server.
== TODO ==
* It should be possible to set a time-out for translation threads, so that if a translation is taking too long, it gets killed and the queue moves along.
* It should use one lock per pipeline, since we don't need to wait for mk-en just because sme-nob is running.
* http://stackoverflow.com/a/487281/69663 recommends select/polling over threading (see http://docs.python.org/3.3/library/socketserver.html for more on the differences), but that requires either a lot of hand-written dispatching code (http://pymotw.com/2/select/) or a framework like Twisted.
* Some language pairs still don't work (sme-nob?)
* hfst-proc -g doesn't work with null-flushing (or?)