Difference between revisions of "Apertium-apy"

From Apertium

Revision as of 13:07, 10 October 2013

Apertium-APy stands for "Apertium API in Python". It's a simple Apertium API server written in Python, meant as a drop-in replacement for ScaleMT. It currently lives in svn under trunk/apertium-tools/apertium-apy, where servlet.py is basically the whole thing. It is meant for front ends like the simple one in trunk/apertium-tools/simple-html (where index.html is the main file).


Threading

Currently the server is a TCPServer inheriting ThreadingMixIn, so each request is handled in its own thread. A lock on translateNULFlush (which must have at most one thread per pipeline) keeps that part single-threaded, so that Alice doesn't get Bob's text.
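
A minimal sketch of the setup described above, for illustration only. It is not the code in servlet.py: fake_translate, locked_translate, EchoTranslateHandler and ThreadedTCPServer are hypothetical names, and the real translateNULFlush talks to an Apertium pipeline rather than upper-casing text. It only shows the shape of a threaded TCPServer whose translation step is guarded by a single lock.

```python
import socketserver
import threading


def fake_translate(text):
    # Hypothetical stand-in for the real pipeline call (assumption:
    # the real function writes to and reads from an Apertium process).
    return text.upper()


# One global lock: at most one thread may use the pipeline at a time.
pipeline_lock = threading.Lock()


def locked_translate(text):
    # Requests are accepted concurrently, but this part is serialised
    # so that two requests cannot interleave on the same pipeline.
    with pipeline_lock:
        return fake_translate(text)


class EchoTranslateHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read one line, "translate" it, and send the result back.
        line = self.rfile.readline().decode("utf-8").rstrip("\n")
        self.wfile.write((locked_translate(line) + "\n").encode("utf-8"))


class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    # ThreadingMixIn makes the server spawn one thread per request.
    daemon_threads = True
    allow_reuse_address = True
```

With a single global lock like this, a long translation in one language pair holds up every other pair as well.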

http://stackoverflow.com/a/487281/69663 recommends select/polling over threading (see http://docs.python.org/3.3/library/socketserver.html for the differences), but that approach requires either a lot of manually written dispatching code (http://pymotw.com/2/select/) or a framework like Twisted.
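
To give an idea of what "manually written dispatching code" means, here is a rough sketch using the standard selectors module. All names (fake_translate, make_listener, serve_once) are made up for this example, and it makes simplifying assumptions: each request fits in a single recv(), and each reply is small enough to send in one go on a non-blocking socket.

```python
import selectors
import socket


def fake_translate(data):
    # Hypothetical stand-in for a real translation; works on bytes.
    return data.upper()


def make_listener(host="127.0.0.1", port=0):
    # Non-blocking listening socket; port 0 lets the OS pick a free port.
    sock = socket.socket()
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    sock.setblocking(False)
    return sock


def serve_once(sel, listener, timeout=1.0):
    """Run one pass of the event loop, dispatching every ready socket."""
    for key, _events in sel.select(timeout):
        if key.fileobj is listener:
            # New client: accept it and start watching it for input.
            conn, _addr = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            # Existing client has data: answer it and hang up.
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(fake_translate(data))
            sel.unregister(conn)
            conn.close()
```

Everything runs in one thread here, so no lock is needed, but a slow translation inside serve_once would stall all other clients unless the translation itself is also made non-blocking.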

Try testing with e.g.

   python3 servlet.py "$APERTIUMPATH" 2737 &
   
   curl -s --data-urlencode 'langpair=nb|nn' --data-urlencode \
   'q@/tmp/reallybigfile' 'http://localhost:2737/translate' >/tmp/output &
   
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
   curl 'http://localhost:2737/translate?langpair=nb%7Cnn&q=men+ikke+den'
   

The last three requests should (after a slight wait) start outputting before the first, large request is done.

TODO

  • It should be possible to set a time-out for translation threads, so if a translation is taking too long, it gets killed and the queue moves along.
  • It should use one lock per pipeline, since we don't need to wait for mk-en just because sme-nob is running.
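
A possible starting point for the two items above, as a sketch only. The names lock_for, PipelineBusy and translate_with_timeout are hypothetical, and the translate argument stands in for whatever function actually drives the pipeline. Note the limitation: the time-out here only bounds how long a request waits for a busy pipeline. It does not kill a translation that is already running, since Python threads cannot be forcibly killed; that would probably need the pipeline process itself to be restarted.

```python
import threading

# Registry of locks, one per language pair, created on first use.
_locks_guard = threading.Lock()
_pipeline_locks = {}


def lock_for(langpair):
    # The guard makes sure two threads never create two different
    # locks for the same language pair.
    with _locks_guard:
        if langpair not in _pipeline_locks:
            _pipeline_locks[langpair] = threading.Lock()
        return _pipeline_locks[langpair]


class PipelineBusy(Exception):
    """Raised when a pipeline could not be acquired within the time-out."""


def translate_with_timeout(langpair, text, translate, timeout=5.0):
    # Wait at most `timeout` seconds for this pair's pipeline only;
    # other language pairs are not affected.
    lock = lock_for(langpair)
    if not lock.acquire(timeout=timeout):
        raise PipelineBusy(langpair)
    try:
        return translate(langpair, text)
    finally:
        lock.release()
```

With this, a request for mk-en only waits on other mk-en requests, and a stuck sme-nob pipeline makes later sme-nob requests fail with PipelineBusy instead of hanging indefinitely.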