One of our GSoC project ideas is a daemon mode for Apertium; this page collects ideas and suggestions for its potential implementor.
Apertium is implemented as a set of separate programs, each performing a single task and passing its output to the next in the usual Unix pipeline manner.
Each linguistic package contains a 'modes' XML file (modes.xml), which specifies which programs are invoked, in what order, and with which parameters and data files for that language pair. A language pair can define several modes; most of them exist for debugging individual stages of the pipeline. At the moment, each mode is converted to a shell script, which is called by the apertium script.
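To make the current mechanism concrete, here is a minimal Python sketch of turning a mode into a shell pipeline. The `Step` class and `mode_to_shell` function are hypothetical; the mode is given as an already-parsed list of steps rather than read from modes.xml, and the file names in the usage example are placeholders. The real conversion is done by Apertium's own tooling.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Step:
    """One program invocation in a mode (hypothetical representation)."""
    program: str
    args: list[str] = field(default_factory=list)


def mode_to_shell(steps: list[Step]) -> str:
    """Join the steps of a mode into a single Unix shell pipeline."""
    commands = []
    for step in steps:
        words = [step.program, *step.args]
        # Quote each word so file names with spaces survive the shell.
        commands.append(" ".join(shlex.quote(w) for w in words))
    return " | ".join(commands)
```

For example, `mode_to_shell([Step("lt-proc", ["xx-yy.automorf.bin"]), Step("apertium-tagger", ["-g", "xx-yy.prob"])])` yields `lt-proc xx-yy.automorf.bin | apertium-tagger -g xx-yy.prob`. Every call of such a script starts each program afresh; a daemon would instead keep the step list in memory and start (or link) each stage once.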
Each program is effectively implemented as a library; the main() function sets up the environment, parses arguments, and calls a function which performs the task at hand.
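This split between a thin main() and a task function is what would let a daemon call each stage in-process. A minimal Python sketch of the pattern, using a toy word-replacement task; `translate_words`, `main`, `run_in_process` and the `toy-transfer` program name are illustrative, not Apertium APIs.

```python
import argparse
import io
from typing import TextIO


def translate_words(infile: TextIO, outfile: TextIO, dictionary: dict[str, str]) -> None:
    """The task itself: replace each known word, line by line.

    It needs only two streams and a dictionary, so it can be called
    directly as a library function.
    """
    for line in infile:
        words = line.split()
        outfile.write(" ".join(dictionary.get(w, w) for w in words) + "\n")


def main(argv: list[str], infile: TextIO, outfile: TextIO) -> int:
    """Thin command-line wrapper: parse arguments, then delegate to the task."""
    parser = argparse.ArgumentParser(prog="toy-transfer")
    parser.add_argument("pairs", nargs="*", help="source=target word pairs")
    args = parser.parse_args(argv)
    dictionary = dict(pair.split("=", 1) for pair in args.pairs)
    translate_words(infile, outfile, dictionary)
    return 0


def run_in_process(argv: list[str], text: str) -> str:
    """Run the same code on a string without spawning a new process."""
    out = io.StringIO()
    main(argv, io.StringIO(text), out)
    return out.getvalue()
```

A standalone program would call `main(sys.argv[1:], sys.stdin, sys.stdout)`; a daemon could call the task function directly for each request and skip process start-up entirely.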
Apertium's pipeline approach is extremely flexible, allowing new modules to be added to the system easily, but this implementation can be quite resource intensive: every translation request starts a fresh set of processes, each of which has to load its data files again. This cost matters most when Apertium is used as the translation backend on a server.
Work to date
Wynand, who developed apertium-dbus, started work towards daemon-like operation. lt-proc has a 'null flush' feature, which lets it keep running between inputs: whenever it receives a null character, it flushes its buffers. A similar feature would need to be added to the rest of the programs in the pipeline. In addition, transfer, interchunk and postchunk would need to reread the variables section at each flush, so that every input starts from the default variable values (ideally caching the location of that section in the XML file on the first read).
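The null-flush protocol can be sketched as a loop that treats every NUL-terminated chunk of input as a separate job. This is a minimal Python sketch, not lt-proc's code: `null_flush_loop`, `run_jobs` and the `process` callback are hypothetical, and forwarding the `\0` marker to the output, so that the next stage in the pipeline can flush too, is an assumption about how the stages would chain.

```python
import io
from typing import Callable, TextIO


def null_flush_loop(infile: TextIO, outfile: TextIO, process: Callable[[str], str]) -> None:
    """Stay running and treat each NUL-terminated chunk as one job."""
    buffer: list[str] = []
    while True:
        ch = infile.read(1)
        if ch == "":
            # End of input: process any leftover text, then stop.
            if buffer:
                outfile.write(process("".join(buffer)))
                outfile.flush()
            return
        if ch == "\0":
            # Null character: finish this job and flush immediately.
            outfile.write(process("".join(buffer)))
            # Assumed: forward the marker so the next stage can flush too.
            outfile.write("\0")
            outfile.flush()
            buffer.clear()
        else:
            buffer.append(ch)


def run_jobs(jobs: list[str], process: Callable[[str], str]) -> list[str]:
    """Simulate a client sending several jobs to one long-running process."""
    infile = io.StringIO("".join(job + "\0" for job in jobs))
    outfile = io.StringIO()
    null_flush_loop(infile, outfile, process)
    return outfile.getvalue().split("\0")[:-1]
```

For instance, `run_jobs(["hello", "world"], str.upper)` returns `["HELLO", "WORLD"]` from a single invocation of the loop. A stateful stage would also restore its initial state in the `\0` branch; for transfer, interchunk and postchunk that means the default values from the variables section.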
Ideas
- Reuse thread.c from memcached to handle worker threads
- Read the modes.xml file directly, and generate the pipeline from it
- Where possible, link to the apertium functions directly rather than spawning separate processes (some language modes will still require spawning)
- Add a sentence splitter, preferably with SRX support, so that translations can be cached per sentence
- Make the deformatters work as libraries
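Several of these ideas fit together: stages linked as functions instead of separate processes, a sentence splitter, and a translation cache keyed by sentence. A minimal Python sketch, assuming each stage is a `str -> str` function; `Pipeline` and `split_sentences` are hypothetical, the regex splitter is a naive stand-in for SRX rules, and worker threads are left out.

```python
import re
from typing import Callable, Iterable

Stage = Callable[[str], str]

# Naive splitter: break after ".", "!" or "?" followed by whitespace.
# SRX rules would handle cases such as abbreviations that this regex gets wrong.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


class Pipeline:
    """Stages linked in-process as plain functions, with a per-sentence cache."""

    def __init__(self, stages: Iterable[Stage]):
        self.stages = list(stages)
        self.cache: dict[str, str] = {}  # unbounded, for simplicity
        self.misses = 0

    def _run(self, sentence: str) -> str:
        # Each stage consumes the previous stage's output, like a Unix pipe.
        for stage in self.stages:
            sentence = stage(sentence)
        return sentence

    def translate(self, text: str) -> str:
        out = []
        for sentence in split_sentences(text):
            if sentence not in self.cache:
                self.misses += 1
                self.cache[sentence] = self._run(sentence)
            out.append(self.cache[sentence])
        return " ".join(out)
```

With `Pipeline([str.lower, lambda s: s.replace("cat", "gato")])`, translating `"The cat sleeps. The cat sleeps."` runs the stages only once, because the second sentence is served from the cache. Caching per sentence rather than per request is what makes the splitter useful: a sentence repeated across different requests is still translated only once. The naive regex would, however, split after an abbreviation like "e.g.", which is one reason SRX rules are preferable.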