Difference between revisions of "Apertium-apy"
Revision as of 19:00, 18 December 2013
Apertium-APy stands for "Apertium API in Python". It is a simple Apertium API server written in Python, meant as a drop-in replacement for ScaleMT. It currently lives in SVN under trunk/apertium-tools/apertium-apy, where servlet.py makes up essentially the whole server. It is meant to serve front ends such as the simple one in trunk/apertium-tools/simple-html (where index.html is the main page).
== Installation ==
First, compile and install apertium/lttoolbox/apertium-lex-tools, and compile your language pairs. See Minimal_installation_from_SVN for how to do this. Then check out APy and start the server:
<pre>
svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-apy
cd apertium-apy
export APERTIUMPATH="/path/to/apertium/svn/trunk"
./servlet.py "$APERTIUMPATH"
</pre>
Optional arguments include:
- --langNamesDB: path to database of localized language names
- -p, --port: port to run server on (2737 by default)
- --ssl: path to SSL certificate
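== Usage ==
{| class="wikitable" border="1"
|-
! URL
! Parameters
! Example
|-
| '''/listPairs''' - List available pairs
| None
| <pre>
$ curl 'http://localhost:2737/listPairs'
output
</pre>
|-
| '''/list''' - List available mode information
| 
*'''q''': type of information to list
** pairs (alias for /listPairs)
** analyzers/analysers
** generators
** taggers/disambiguators
| <pre>
$ curl 'http://localhost:2737/list?q=analyzers'
output
</pre>
<pre>
$ curl 'http://localhost:2737/list?q=generators'
output
</pre>
<pre>
$ curl 'http://localhost:2737/list?q=taggers'
output
</pre>
|-
| '''/translate''' - Translate text
| 
*'''langpair''': language pair to use for translation
*'''q''': text to translate
| <pre>
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?'
output
</pre>
|-
| '''/analyze''' - Morphologically analyze text
| 
*'''mode''': language to use for analysis
*'''q''': text to analyze
| <pre>
$ curl --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],
["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],
["?/?<sent>","?"],["./.<sent>",".\n"]]
</pre>
|-
| '''/generate''' - Generate surface forms from text
| 
*'''mode''': language to use for generation
*'''q''': text to generate
| <pre>
$ curl --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$+^сен<v><tv><imp><p2><pl>$" http://localhost:2737/generate
[["сен ","^сен<v><tv><imp><p2><sg>$ "],["сеніңдер","^сен<v><tv><imp><p2><pl>$"]]
</pre>
|-
| '''/perWord''' - Perform morphological tasks per word
| 
*'''language''': language to use for tasks
*'''modes''': morphological tasks to perform on text
** tagger/disambig
** biltrans
** translate
** biltrans+morph (in any order)
** translate+tagger (in any order)
** morph+tagger/morph+disambig (in any order)
*'''q''': text to perform tasks on
| 
|}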
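Request URLs for the endpoints in the table can also be built programmatically. The following is a minimal Python sketch, assuming a server on localhost at the default port 2737; <code>BASE_URL</code> and <code>build_url</code> are hypothetical names used only for illustration and are not part of APy. It shows how characters such as <code>|</code> and non-ASCII text end up percent-encoded in the query string.

```python
from urllib.parse import urlencode, parse_qs

# Assumption: a local APy server listening on the default port.
BASE_URL = "http://localhost:2737"

def build_url(endpoint, **params):
    """Return BASE_URL/endpoint, with params percent-encoded as a query string."""
    query = urlencode(params)
    url = f"{BASE_URL}/{endpoint}"
    return f"{url}?{query}" if query else url

translate_url = build_url("translate", langpair="kaz|tat", q="Сен бардың ба?")
list_url = build_url("list", q="analyzers")

print(list_url)
print(translate_url)
# Decoding the query string gives back the original parameter values.
print(parse_qs(translate_url.split("?", 1)[1]))
```

The resulting URLs can be passed to curl or to any HTTP client.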
== Threading ==
Currently the server is a TCPServer subclass that also inherits from ThreadingMixIn, so each request is handled in its own thread. A lock around translateNULFlush (which must be used by at most one thread per pipeline at a time) keeps that part single-threaded, so that Alice does not get Bob's text.
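The pattern described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code in servlet.py: <code>FakePipeline</code>, <code>translate_nul_flush</code>, <code>Handler</code> and <code>ThreadedServer</code> are hypothetical names, and the "pipeline" merely upper-cases its input. The point is that a write followed by a read on a shared pipeline is only safe while holding the lock.

```python
import socketserver
import threading
import time

class FakePipeline:
    """Hypothetical stand-in for one long-running translation pipeline."""
    def __init__(self):
        self.pending = None

    def write(self, text):
        self.pending = text

    def read(self):
        time.sleep(0.01)  # pretend the pipeline needs a moment to answer
        return self.pending.upper()

pipeline = FakePipeline()
pipeline_lock = threading.Lock()  # at most one thread may use the pipeline at a time

def translate_nul_flush(text):
    """Simplified analogue of translateNULFlush: write, then read, under the lock."""
    with pipeline_lock:
        pipeline.write(text)
        return pipeline.read()

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One line in, one "translated" line out; each request gets its own thread.
        line = self.rfile.readline().decode("utf-8").rstrip("\n")
        self.wfile.write((translate_nul_flush(line) + "\n").encode("utf-8"))

class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True

# Call the locked function from several threads, as concurrent requests would.
texts = ["alice's text", "bob's text", "carol's text"]
results = {}

def worker(text):
    results[text] = translate_nul_flush(text)

threads = [threading.Thread(target=worker, args=(t,)) for t in texts]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

To actually serve requests with this sketch, one would call <code>ThreadedServer(("localhost", 2737), Handler).serve_forever()</code>. Without the lock, a second thread could overwrite the pending text between another thread's write and read.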
===Try it out===
Try testing with e.g.
    
    export APERTIUMPATH="/path/to/svn/trunk"
    python3 servlet.py "$APERTIUMPATH" --port 2737 &
    
    curl -s --data-urlencode 'langpair=nb|nn' --data-urlencode \
    'q@/tmp/reallybigfile' 'http://localhost:2737/translate' >/tmp/output &
    
    curl 'http://localhost:2737/translate?langpair=nb|nn&q=men+ikke+den'
    curl 'http://localhost:2737/translate?langpair=nb|nn&q=men+ikke+den'
    curl 'http://localhost:2737/translate?langpair=nb|nn&q=men+ikke+den'
    
The last three requests should (after a slight wait) start producing output before the first request is done.
===Morphological Analysis and Generation===
To analyze text, send a POST or GET request to <code>/analyze</code> with parameters <code>mode</code> and <code>q</code> set. For example: 
    $ curl --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
    [["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],["?/?<sent>","?"],["./.<sent>",".\n"]]
The JSON response consists of a list of lists, each of the form <code>[analysis with following non-analyzed text*, original input token]</code>. To receive a list of valid analyzer modes, send a request to <code>/listAnalyzers</code>.
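A response in this format can be unpacked with a few lines of Python. The sketch below works on the sample response shown above rather than on a live server; <code>parse_analyses</code> is a hypothetical helper, and it naively splits on <code>/</code>, so it assumes that no escaped slashes occur in the analyses.

```python
import json

# Sample /analyze response copied from the example above (raw string keeps \n as JSON escape).
sample = r'[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],["?/?<sent>","?"],["./.<sent>",".\n"]]'

def parse_analyses(response_text):
    """Turn the list of [analysis, input] pairs into dicts with separate readings."""
    parsed = []
    for analysis, original in json.loads(response_text):
        # Naive split: the first field is the surface form, the rest are readings.
        surface, *readings = analysis.split("/")
        parsed.append({"input": original, "surface": surface, "readings": readings})
    return parsed

parsed = parse_analyses(sample)
for entry in parsed:
    print(entry["surface"], entry["readings"])
```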
To generate surface forms from an analysis, send a POST or GET request to <code>/generate</code> with parameters <code>mode</code> and <code>q</code> set. For example: 
    $ curl --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$+^сен<v><tv><imp><p2><pl>$" http://localhost:2737/generate
    [["сен ","^сен<v><tv><imp><p2><sg>$ "],["сеніңдер","^сен<v><tv><imp><p2><pl>$"]]
The JSON response consists of a list of lists, each of the form <code>[generated form with following non-analyzed text*, original lexical unit input]</code>. To receive a list of valid generator modes, send a request to <code>/listGenerators</code>.
* e.g. whitespace, superblanks
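The input to <code>/generate</code> in the example above is a sequence of lexical units of the form <code>^lemma<tag>...$</code>. As a rough sketch, such a request body could be assembled in Python as shown below; <code>lexical_unit</code> is a hypothetical helper, not something APy provides, and the body is only built here, not sent.

```python
from urllib.parse import urlencode, parse_qs

def lexical_unit(lemma, tags):
    """Format a lemma and its tags as ^lemma<tag1><tag2>...$"""
    return "^" + lemma + "".join(f"<{tag}>" for tag in tags) + "$"

units = [
    lexical_unit("сен", ["v", "tv", "imp", "p2", "sg"]),
    lexical_unit("сен", ["v", "tv", "imp", "p2", "pl"]),
]

# Form-encode the POST body; the space between units becomes "+", as in the curl example.
body = urlencode({"mode": "kaz", "q": " ".join(units)})
print(body)
print(parse_qs(body)["q"][0])
```

The encoded body can then be sent as POST data to <code>/generate</code> with any HTTP client.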
===SSL===
To test with a self-signed certificate:
<pre>
openssl req -new -x509 -keyout server.pem -out server.pem -days 365 -nodes
</pre>
Then run with <code>--ssl server.pem</code>, and test with https and the <code>-k</code> argument to curl (<code>-k</code> means curl accepts self-signed or even slightly "lying" certificates):
<pre>
curl -k --data "mode=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze
</pre>
If you have a properly signed certificate, concatenate the key and the certificate into a single file:
<pre>
cat server.key server.crt > server.keycrt
</pre>
Now you should be able to use curl without <code>-k</code> for the domain which the certificate is signed for:
<pre>
curl --data "mode=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze
</pre>
Remember to open port 2737 to your server.

TODO

