Apertium-apy/Translation
How translation currently works in APY:
translate.py
The function translate() is the main entry point to translation. It expects a text string, a lock (threading.RLock) and a pipeline (a pair of input/output file descriptors). The lock makes sure that only one caller can translate on a given pipeline at any one time. Each pipeline should have its own RLock.
The function translateNULFlush() first deformats the text, then sends it into the input fd, then reads the output fd until it sees a NUL byte. It then reformats the result and returns it. The whole thing is wrapped in "with translock" to make sure we don't read translations based on other people's input …
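The NUL-flush round trip described above can be sketched roughly as follows. This is not APY's actual code: translate_nul_flush() and start_fake_pipeline() are hypothetical names, deformatting and reformatting are left out, and the "pipeline" is a stand-in thread that just upper-cases whatever it receives.

```python
import os
import threading


def translate_nul_flush(text, translock, pipeline):
    """Write text plus a NUL byte, then read until the NUL comes back."""
    fd_in, fd_out = pipeline
    with translock:  # nobody else may use this pipeline until we are done
        # (deformatting would happen here; omitted in this sketch)
        os.write(fd_in, text.encode("utf-8") + b"\0")
        output = bytearray()
        while True:
            byte = os.read(fd_out, 1)
            if not byte or byte == b"\0":  # EOF or end-of-translation marker
                break
            output += byte
        # (reformatting would happen here; omitted in this sketch)
        return bytes(output).decode("utf-8")


def start_fake_pipeline():
    """Hypothetical stand-in for a real pipeline: upper-cases each request."""
    to_worker_r, to_worker_w = os.pipe()
    from_worker_r, from_worker_w = os.pipe()

    def worker():
        buf = bytearray()
        while True:
            byte = os.read(to_worker_r, 1)
            if not byte:  # input side closed
                return
            if byte == b"\0":  # request complete: answer and flush with NUL
                os.write(from_worker_w, bytes(buf).upper() + b"\0")
                buf.clear()
            else:
                buf += byte

    threading.Thread(target=worker, daemon=True).start()
    return to_worker_w, from_worker_r
```

With a lock from threading.RLock() and a pipeline from start_fake_pipeline(), each call to translate_nul_flush() returns only the output belonging to its own input, because the lock is held from the write until the NUL byte has been read.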
We could have defined translate() as just return translateNULFlush(toTranslate, translock, pipeline), but then what if someone sends in a huge text? We'd lock up that pipeline for too long, and everyone else would have to wait. <smaller>Also, we'd fill up the FIFO buffers: since we don't read the output of translation until we've sent in all the input, we would be trying to push more data into the input file descriptor, but the buffer would be full and the program would hang until someone read from the output file descriptor.</smaller> To solve that, we split the input text up into chunks, and send one chunk at a time into translateNULFlush(). So translate() calls translateSplitting() once, which calls translateNULFlush() a bunch of times (or only once for very short texts).
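The chunking idea can be illustrated with a minimal sketch. The names split_into_chunks() and translate_splitting(), the max_len parameter and the whitespace-based splitting rule are all assumptions made for illustration; the real translateSplitting() may choose its split points differently.

```python
def split_into_chunks(text, max_len=1000):
    """Cut text into pieces of at most max_len characters,
    breaking just after whitespace where possible."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_len, len(text))
        if end < len(text):
            # Look for the last space or newline inside the current window.
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks


def translate_splitting(text, translate_chunk, max_len=1000):
    """Translate chunk by chunk and glue the results back together.

    translate_chunk is any callable taking and returning a string; in APY's
    terms it would take and release the pipeline lock once per chunk, so
    other requests can get a turn between chunks.
    """
    return "".join(translate_chunk(chunk)
                   for chunk in split_into_chunks(text, max_len))
```

Because each chunk is written and read back before the next one is sent, no single request holds the lock for long, and the amount of unread output stays bounded by the chunk size.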
servlet.py
When you start the servlet, you choose the number of processes with the -j switch. Each process has its own dict of pipelines and pipeline_locks. So in one servlet process, there might be a TranslateHandler.pipelines[("eng","kaz")]=(fd_in, fd_out) and a corresponding TranslateHandler.pipeline_locks[("eng","kaz")]=threading.RLock() (these are the pipeline and translock, respectively, sent to translate.py's translate() function).
When TranslateHandler.get is called, it first ensures that e.g. eng-kaz is started (the function runPipeline()). When the pipeline is started, its first and last process are assigned to self.pipelines[("eng","kaz")].
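The per-process bookkeeping can be pictured with the following sketch. PipelineRegistry, ensure_started() and the start_pipeline callable are hypothetical names, not part of servlet.py; the sketch only mirrors the two dicts keyed by language pair and the start-on-first-use behaviour.

```python
import threading


class PipelineRegistry:
    """Hypothetical per-process registry of pipelines and their locks."""

    def __init__(self, start_pipeline):
        # start_pipeline: callable taking a language pair and returning
        # the pipeline handle, e.g. (fd_in, fd_out)
        self.start_pipeline = start_pipeline
        self.pipelines = {}
        self.pipeline_locks = {}

    def ensure_started(self, pair):
        """Start the pipeline for pair on first use; reuse it afterwards."""
        if pair not in self.pipelines:
            self.pipelines[pair] = self.start_pipeline(pair)
            self.pipeline_locks[pair] = threading.RLock()  # one lock per pipeline
        return self.pipelines[pair], self.pipeline_locks[pair]
```

A request handler would call ensure_started(("eng", "kaz")) and pass the returned pipeline and lock on to the translation function. Since each servlet process holds its own registry, a lock only needs to serialise access within that one process.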