Difference between revisions of "Apertium-apy"

Revision as of 20:29, 1 December 2013

Apertium-APy stands for "Apertium API in Python". It is a simple Apertium API server written in Python, meant as a drop-in replacement for ScaleMT. It currently lives in SVN under trunk/apertium-tools/apertium-apy, where servlet.py contains essentially the whole server. It is intended for front ends such as the simple one in trunk/apertium-tools/simple-html (where index.html is the main file).

Installation

First, compile and install apertium, lttoolbox and apertium-lex-tools, then compile your language pairs. See Minimal_installation_from_SVN for how to do this. Then run:

svn co https://svn.code.sf.net/p/apertium/svn/trunk/apertium-tools/apertium-apy
cd apertium-apy
export APERTIUMPATH="/path/to/apertium/svn/trunk"
./servlet.py "$APERTIUMPATH" 2737

Threading

Currently the server is a TCPServer that also inherits ThreadingMixIn, so each request is handled in its own thread. A lock around translateNULFlush, which must have at most one thread per pipeline, keeps that part single-threaded (so that Alice never gets Bob's text).
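The pattern can be sketched with the standard library's socketserver module. This is a minimal illustration, not the code in servlet.py: it speaks a plain line-based TCP protocol instead of HTTP, and translate_nul_flush is a hypothetical stand-in that upper-cases its input instead of talking to a real Apertium pipeline. The point is that connections run concurrently while the pipeline call is serialized by a single lock.

```python
import socketserver
import threading

# One lock guarding the (hypothetical) pipeline: at most one thread may
# write to it and read its output at a time.
pipeline_lock = threading.Lock()

def translate_nul_flush(text):
    # Hypothetical stand-in for translateNULFlush. The real function sends
    # text to a long-running pipeline and reads back up to a NUL flush marker;
    # here we just upper-case the input.
    with pipeline_lock:
        return text.upper()

class LineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Each connection runs in its own thread (thanks to ThreadingMixIn);
        # only the call into the pipeline is serialized by the lock.
        line = self.rfile.readline().decode("utf-8")
        self.wfile.write(translate_nul_flush(line).encode("utf-8"))

class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    allow_reuse_address = True
    daemon_threads = True  # don't keep the interpreter alive for open requests
```

To try it, start `ThreadedTCPServer(("127.0.0.1", 2737), LineHandler).serve_forever()` and connect with e.g. `nc localhost 2737`.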

Try it out

Try testing with, for example:

   export APERTIUMPATH="/path/to/svn/trunk"
   python3 servlet.py "$APERTIUMPATH" 2737 &
   
   curl -s --data-urlencode 'langpair=nb|nn' --data-urlencode \
   'q@/tmp/reallybigfile' 'http://localhost:2737/translate' >/tmp/output &
   
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'

You should see the last three requests (after a slight wait) start returning output before the first request has finished.
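The same experiment can be driven from Python. This sketch assumes servlet.py is listening on localhost port 2737; translate_url, fetch and translate_concurrently are hypothetical helper names. Since the response format of /translate is not described here, fetch returns the raw body without parsing it.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:2737"  # assumes servlet.py runs on port 2737

def translate_url(langpair, text, base=BASE_URL):
    # urlencode escapes "|" as %7C and turns spaces into "+", giving the
    # same query string as the curl examples above.
    return base + "/translate?" + urlencode({"langpair": langpair, "q": text})

def fetch(url):
    # Return the raw response body; no assumptions about its structure.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")

def translate_concurrently(langpair, texts, base=BASE_URL):
    # Send all requests at once so they overlap on the server.
    urls = [translate_url(langpair, t, base) for t in texts]
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(fetch, urls))
```

For example, `translate_concurrently("nb|nn", ["men ikke den"] * 3)` issues three overlapping requests, much like the three curl calls.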

Morphological Analysis and Generation

To analyze text, send a POST or GET request to /analyze with parameters mode and q set. For example:

   $ curl --data "mode=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
   [["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"],["?/?<sent>","?"],["./.<sent>",".\n"]]

The JSON response is a list of lists, each of the form [analysis with following non-analyzed text*, original input token]. To get a list of valid analyzer modes, send a request to /listAnalyzers.
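A small client for /analyze might look like the following sketch. It posts mode and q as form data, like curl --data, and assumes the server runs on localhost:2737. split_analyses is a hypothetical helper that breaks each analysis string on "/" into the surface form and its readings; it assumes surface forms contain no "/" and only strips trailing whitespace, so superblanks after an analysis would stay attached to the last reading.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def analyze(text, mode, base="http://localhost:2737"):
    # POST mode and q as application/x-www-form-urlencoded data.
    data = urlencode({"mode": mode, "q": text}).encode("utf-8")
    with urlopen(base + "/analyze", data=data) as resp:
        return json.loads(resp.read().decode("utf-8"))

def split_analyses(pairs):
    # Each pair is [analysis with following non-analyzed text, original token].
    # Returns (original token, surface form, list of readings) tuples.
    result = []
    for analysis, original in pairs:
        surface, *readings = analysis.rstrip().split("/")
        result.append((original, surface, readings))
    return result
```

With the Kazakh example above, `split_analyses(analyze("Сен бардың ба?", "kaz"))[0]` would give the token "Сен " with its two readings.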

To generate surface forms from an analysis, send a POST or GET request to /generate with parameters mode and q set. For example:

   $ curl --data "mode=kaz&q=^сен<v><tv><imp><p2><sg>$+^сен<v><tv><imp><p2><pl>$" http://localhost:2737/generate
   [["сен ","^сен<v><tv><imp><p2><sg>$ "],["сеніңдер","^сен<v><tv><imp><p2><pl>$"]]

The JSON response is a list of lists, each of the form [generated form with following non-analyzed text*, original lexical unit input]. To get a list of valid generator modes, send a request to /listGenerators.
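The generation request can be sketched the same way, again assuming a server on localhost:2737. The "+" in the curl example is a form-encoded space, so q is a space-separated sequence of ^...$ lexical units. lexical_units_to_query and surface_text are hypothetical helpers: the first builds q from bare analyses, the second concatenates the generated forms (which already carry their following blanks) back into text.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def lexical_units_to_query(units):
    # Wrap each analysis in ^...$ and separate the units with a space.
    return " ".join("^" + unit + "$" for unit in units)

def generate(units, mode, base="http://localhost:2737"):
    # POST mode and q as form data, like curl --data.
    q = lexical_units_to_query(units)
    data = urlencode({"mode": mode, "q": q}).encode("utf-8")
    with urlopen(base + "/generate", data=data) as resp:
        return json.loads(resp.read().decode("utf-8"))

def surface_text(pairs):
    # Each pair is [generated form plus following blank, original lexical unit];
    # joining the first elements rebuilds the generated text.
    return "".join(form for form, _original in pairs)
```

For the example above, `surface_text(generate(["сен<v><tv><imp><p2><sg>", "сен<v><tv><imp><p2><pl>"], "kaz"))` would yield "сен сеніңдер".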

* e.g. whitespace, superblanks

TODO

It should be possible to set a time-out for translation threads, so if a translation is taking too long, it gets killed and the queue moves along.
It should use one lock per pipeline, since we don't need to wait for mk-en just because sme-nob is running.
http://stackoverflow.com/a/487281/69663 recommends select/polling over threading (see http://docs.python.org/3.3/library/socketserver.html for more on the differences), but that approach requires either a lot of hand-written dispatching code (http://pymotw.com/2/select/) or a framework like Twisted.
Some language pairs still don't work (sme-nob?).
hfst-proc -g doesn't seem to work with null-flushing (needs checking).
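The "one lock per pipeline" item could be approached with a small lock registry keyed by language pair. This is a hypothetical sketch, not existing APy code: PipelineLocks is an illustrative name, and a guard lock makes lazy creation race-free so two threads asking for a new pair at the same moment receive the same lock.

```python
import threading

class PipelineLocks:
    """Hypothetical helper: one lock per language pair, so requests for
    different pairs (e.g. mk-en and sme-nob) never wait on each other,
    while requests for the same pair stay serialized."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dict itself
        self._locks = {}

    def get(self, pair):
        # Create each pair's lock lazily, under the guard, so concurrent
        # first requests for a pair all receive the same lock object.
        with self._guard:
            lock = self._locks.get(pair)
            if lock is None:
                lock = threading.Lock()
                self._locks[pair] = lock
            return lock
```

A request handler would then wrap its pipeline call in `with locks.get(pair): ...` instead of taking one global lock.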

