Difference between revisions of "Apertium-apy"
Harikrishna (talk | contribs) |
{{TOCD}}

'''Apertium-APy''' stands for "'''Apertium''' '''A'''PI in '''Py'''thon". It's a simple Apertium API server written in Python 3, meant as a drop-in replacement for [[ScaleMT]]. Its primary purpose is serving requests from web applications, though it's fairly versatile. It is hosted on [https://github.com/apertium/apertium-apy GitHub], where [https://github.com/apertium/apertium-apy/blob/master/servlet.py servlet.py] contains the relevant web server bits. The server is used by front ends like [[apertium-html-tools]] (on apertium.org) and [https://www.mediawiki.org/wiki/Content_translation MediaWiki Content Translation].

The https://apertium.org page uses an installation which currently only runs ''released'' language pairs (also available from https://apertium.org/apy if you prefer). However, APY is very easy to set up on your own server, where you can run all the development pairs and even analysers and taggers (as http://turkic.apertium.org does). Read on for how to do that.
||
== Test it! == |
|||
<pre> |
|||
$ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse |
|||
[["алдым/алд<n><px1sg><nom>/алд<n><px1sg><nom>+э<cop><aor><p3><pl>/алд<n><px1sg><nom>+э<cop><aor><p3><sg>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]] |
|||
</pre> |
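The response above is a JSON list of [analysis, surface form] pairs, where the analysis string repeats the surface form and then lists its readings, separated by slashes. A minimal Python sketch of unpacking it, using the response above as sample data. The helper name <code>split_readings</code> is made up for illustration and is not part of APY, and the sketch assumes that no reading contains a literal slash:

```python
import json

# Sample /analyse response copied from the curl call above.
SAMPLE = '[["алдым/алд<n><px1sg><nom>/алд<n><px1sg><nom>+э<cop><aor><p3><pl>/алд<n><px1sg><nom>+э<cop><aor><p3><sg>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]]'


def split_readings(response_text):
    """Map each surface form to the list of its readings (hypothetical helper)."""
    result = {}
    for analysis, surface in json.loads(response_text):
        # The first slash-separated field repeats the surface form;
        # the remaining fields are the individual readings.
        result[surface] = analysis.split("/")[1:]
    return result


readings = split_readings(SAMPLE)
for reading in readings["алдым"]:
    print(reading)
```

Each printed line is one reading, i.e. a lemma followed by its tags.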
|||
== Installation ==

<span style="color: #f00;">''See [[/Debian]] for a complete quickstart installation guide for Debian, Ubuntu, Linux Mint, etc., which uses the prebuilt binaries.''</span>

First, install apertium, lttoolbox and apertium-lex-tools, along with your language pairs. See [[Installation]] for how to do this.

You should have Python '''3.4''' or newer (though 3.2 has been reported to work as of 324a185).

APY uses [http://www.tornadoweb.org/en/stable/ Tornado 3.1 or newer] as its web framework. Ensure that you install the Python 3.4 (or newer) versions of any dependencies. On Debian/Ubuntu, you can do
|||
<pre>
sudo apt-get install build-essential python3-dev python3-pip zlib1g-dev subversion
sudo pip3 install --upgrade tornado
</pre>
||
Then clone APY from GitHub and run it:
||
<pre>
git clone https://github.com/apertium/apertium-apy.git
cd apertium-apy
./servlet.py /usr/share/apertium # the server uses all .mode files under this directory; use /usr/local/share/apertium for pairs installed with "make install"
</pre>
||
See '''./servlet.py --help''' for documentation on how to start APY. Here are some popular optional arguments:

*'''-l --lang-names''': path to sqlite3 database of localized language names (see [[#List localised language names]]; you should include this if you're using [[apertium-html-tools]])
*'''-p --port''': port to run server on (2737 by default)
*'''-c --ssl-cert''': path to SSL certificate
*'''-k --ssl-key''': path to SSL key file
*'''-j --num-processes''': number of HTTP processes to run (default = 1; use 0 to run one HTTP server per core, where each HTTP server runs all available language pairs)
*'''-s --nonpairs-path''': include .mode files from this directory, as with the main argument, but skip translator (pair) modes and include only the other modes, such as analysers and generators (handy for use with an apertium checkout)
*'''-f --missing-freqs''': path to sqlite3 database of words that were unknown (requires <code>sudo apt-get install sqlite3</code>)
*'''-i --max-pipes-per-pair''': how many pipelines we can have per language pair (per HTTP server), default = 1
*'''-u --max-users-per-pipe''': if there are this many concurrent users in the least-used pipeline of a pair (and we haven't reached max-pipes-per-pair), start a new pipeline (default = 5)
*'''-m --max-idle-secs''': after each translation request, go through the list of language pairs and shut down any pair that hasn't been used in the last MAX_IDLE_SECS seconds (to save on RAM)
*'''-n --min-pipes-per-pair''': when shutting down idle pairs, keep at least this many open (default = 0)
*'''-r --restart-pipe-after''': once a pipeline has been used for this many requests, shut it down after it has handled its current requests (to avoid possible memory creep if a pair has bugs)
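If you start APY from a script, the options listed above can be assembled into an argument list. The following is only a sketch: the helper name <code>build_servlet_command</code> and the chosen values are illustrative assumptions, and only long options that appear in the list above are used.

```python
import shlex


def build_servlet_command(modes_dir, port=2737, num_processes=1,
                          max_pipes_per_pair=1, max_idle_secs=None):
    """Assemble an argument list for servlet.py (hypothetical helper)."""
    cmd = [
        "./servlet.py",
        "--port", str(port),
        "--num-processes", str(num_processes),
        "--max-pipes-per-pair", str(max_pipes_per_pair),
    ]
    if max_idle_secs is not None:
        # Only pass the idle timeout when one was asked for.
        cmd += ["--max-idle-secs", str(max_idle_secs)]
    cmd.append(modes_dir)  # positional argument: directory with .mode files
    return cmd


cmd = build_servlet_command("/usr/share/apertium", num_processes=0, max_idle_secs=600)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to <code>subprocess.run(cmd)</code> from inside the apertium-apy directory.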
|||
===Installing dependencies without root=== |
|||
If you don't have root, you can still install the python dependencies with |
|||
<pre>
$ pip3 install --user --upgrade tornado
</pre>
|||
(But your server still needs <code>build-essential python3-dev python3-pip zlib1g-dev</code> installed.) |
|||
Then you just need to run <pre>PYTHONPATH="/usr/local/lib/python3.3/site-packages:${PYTHONPATH}"; export PYTHONPATH</pre> before starting APY. |
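Before starting APY it can help to confirm that the interpreter actually finds Tornado. A small check, assuming nothing beyond the standard library (the helper name <code>module_available</code> is made up for this example); packages installed with <code>pip3 install --user</code> normally end up in the per-user site-packages directory that it prints:

```python
import importlib.util
import site
import sys


def module_available(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


print("Python version:", ".".join(map(str, sys.version_info[:3])))
print("Per-user site-packages:", site.getusersitepackages())
print("tornado importable:", module_available("tornado"))
```

If the last line prints False, adjust <code>PYTHONPATH</code> to point at the directory where tornado was installed and run the check again.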
|||
===Installing dependencies without root nor pip3=== |
|||
Your server still needs python3 (and probably <code>build-essential python3-dev zlib1g-dev</code>), but this is simpler if you don't want to mess with pip.
|||
Just go to https://pypi.python.org/pypi/tornado/#downloads and get the newest version .tar.gz source release; say it got stored as ~/Nedlastingar/tornado-4.3.tar.gz, then do |
|||
<pre>
cd apertium-apy
tar xf ~/Nedlastingar/tornado-4.3.tar.gz
( cd tornado-4.3 && python3 setup.py build )
ln -s tornado-4.3/build/lib*/tornado tornado
</pre>
|||
===Optional features===

====List localised language names====

If you use [[apertium-html-tools]], you probably want localised language names instead of three-letter codes. To get this, first install sqlite3 (on Debian/Ubuntu that's <code>sudo apt-get install sqlite3</code>), then do

<pre>
make
</pre>

to create the langNames.db used for the <code>/listLanguageNames</code> function.

====Language identification====
||
Line 57: | Line 95: | ||
<code>
<pre>curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord</pre>
</code> It can also be tested through your browser or through HTTP calls. Unfortunately, curl does '''not''' decode JSON output by default, so to make testing easier, an APY Sandbox is provided with [[Apertium-html-tools]].
||
{| class="wikitable" border="1" |
{| class="wikitable" border="1" |
||
Line 110: | Line 148: | ||
*'''langpair''': language pair to use for translation |
*'''langpair''': language pair to use for translation |
||
*'''q''': text to translate |
*'''q''': text to translate |
||
*'''markUnknown=no''' (optional): include this to remove "*" in front of unknown words |
|||
*'''deformat''': deformatter to be used: one of html (default), txt, rtf |
|||
*'''reformat''': reformatter to be used: one of html, html-noent (default), txt, rtf
|||
*'''format''': if deformatter and reformatter are the same, they can be specified here |
|||
For more about formatting, please see [http://wiki.apertium.org/wiki/Format_handling Format Handling]. |
|||
| To be consistent with ScaleMT, the returned JS object contains a <code>responseData</code> key whose value is a JS object with the key <code>translatedText</code>, which holds the translated text.
||
<pre> |
<pre> |
||
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?' |
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?' |
||
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null} |
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null} |
||
$ echo Сен бардың ба? > myfile |
|||
$ curl --data-urlencode 'q@myfile' 'http://localhost:2737/translate?langpair=kaz|tat' |
|||
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null} |
|||
</pre> |
|||
The following two queries contain nonstandard whitespace characters and are equivalent: |
|||
<pre> |
|||
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&deformat=txt&reformat=txt' |
|||
{"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null} |
|||
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&format=txt' |
|||
{"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null} |
|||
</pre> |
|||
The following two queries illustrate the difference between the <code>html</code> and <code>html-noent</code> reformatter: |
|||
<pre> |
|||
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html' |
|||
{"responseData": {"translatedText": "Qu&eacute; hace este trabajo?"}, "responseDetails": null, "responseStatus": 200} |
|||
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html-noent' |
|||
{"responseData": {"translatedText": "Qu\u00e9 hace este trabajo?"}, "responseDetails": null, "responseStatus": 200} |
|||
</pre> |
|||
|- |
|||
| '''/translateDoc''' |
|||
| Translate a document (.odt, .txt, .rtf, .html, .docx, .pptx, .xlsx, .tex) |
|||
| |
|||
*'''langpair''': language pair to use for translation |
|||
*'''file''': document to translate |
|||
*'''markUnknown=no''' (optional): include this to remove "*" in front of unknown words |
|||
| Returns the translated document. |
|||
<pre> |
|||
$ curl --form 'file=@/path/to/kaz.odt' 'http://localhost:2737/translateDoc?langpair=kaz|tat' > tat.odt |
|||
</pre> |
</pre> |
||
|- |
|- |
||
Line 241: | Line 314: | ||
</pre> |
</pre> |
||
|- |
|- |
||
| '''/stats''' |
|||
| Return some statistics about pair usage, uptime, portion of time spent actively translating |
|||
| |
|||
*'''requests=N''' (optional): limit period-based stats to last N requests |
|||
| Note that period-based stats are limited to 3600 seconds by default (see -T argument to servlet.py) |
|||
<pre> |
|||
$ curl -Ss localhost:2737/stats|jq .responseData |
|||
{ |
|||
"holdingPipes": 0, |
|||
"periodStats": { |
|||
"totTimeSpent": 10.760803, |
|||
"ageFirstRequest": 19.609394, |
|||
"totChars": 2718, |
|||
"requests": 8, |
|||
"charsPerSec": 252.58 |
|||
}, |
|||
"runningPipes": { |
|||
"eng-spa": 1 |
|||
}, |
|||
"useCount": { |
|||
"eng-spa": 8 |
|||
}, |
|||
"uptime": 26 |
|||
} |
|||
</pre> |
|||
|- |
|||
| '''/spellCheck''' |
|||
| '''Note: This endpoint is not yet available in the main branch.''' Handles spell-checking requests using Voikko or Divvun spell checkers. |
|||
| |
|||
*'''q''': The text to be spell-checked (String, Required, e.g., <code>қазақша билмеймін</code>)
*'''lang''': The language of the text (String, Required, e.g., <code>kaz</code>)
*'''spellchecker''': The spell checker to use (String, Optional, Defaults to <code>voikko</code>, e.g., <code>divvun</code>)
| The output is a JSON array where each element represents a token from the input text, with the keys <code>token</code> (the token itself), <code>known</code> (whether the spell checker recognises it) and <code>sugg</code> (a list of suggested corrections):
|||
<pre> |
|||
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz' |
|||
[ |
|||
{"token": "қазақша", "known": true, "sugg": []}, |
|||
{"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейміз"]} |
|||
] |
|||
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz&spellchecker=divvun' |
|||
[ |
|||
{"token": "қазақша", "known": true, "sugg": []}, |
|||
{"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейтін", "білмейін", "білмейміз", "иілмеймін", "тілмеймін", "ілмеймін"]} |
|||
] |
|||
</pre> |
|||
|} |
|} |
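The <code>/translate</code> call from the table above can also be made from Python. This is a minimal sketch using only the standard library; the function names are made up for this example, and treating any <code>responseStatus</code> other than 200 as an error is an assumption about how a client might want to behave, not documented APY behaviour.

```python
import json
import urllib.parse
import urllib.request


def build_translate_url(base_url, langpair, text, **extra):
    """Build a /translate URL; urlencode escapes '|' and non-ASCII text."""
    params = {"langpair": langpair, "q": text}
    params.update(extra)  # e.g. markUnknown="no" or format="txt"
    return base_url.rstrip("/") + "/translate?" + urllib.parse.urlencode(params)


def extract_translation(response_text):
    """Pull translatedText out of a /translate response body."""
    data = json.loads(response_text)
    if data.get("responseStatus") != 200:
        raise RuntimeError("APY returned status %r" % data.get("responseStatus"))
    return data["responseData"]["translatedText"]


def translate(base_url, langpair, text, **extra):
    """Send the request to a running APY instance (needs a reachable server)."""
    url = build_translate_url(base_url, langpair, text, **extra)
    with urllib.request.urlopen(url) as response:
        return extract_translation(response.read().decode("utf-8"))


# Offline check against the sample response shown in the table above.
sample = '{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}'
print(extract_translation(sample))
url = build_translate_url("http://localhost:2737", "kaz|tat", "Сен бардың ба?", markUnknown="no")
print(url)
```

With a server running locally, <code>translate("http://localhost:2737", "kaz|tat", "Сен бардың ба?")</code> performs the same request as the first curl example.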
||
Line 249: | Line 369: | ||
</pre> |
</pre> |
||
Then run APY with <code>--ssl-key server.key --ssl-cert server.crt</code>, and test with HTTPS and the -k argument to curl (-k makes curl accept self-signed or otherwise unverifiable certificates):
||
<pre> |
<pre> |
||
curl -k -G --data "mode=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze |
curl -k -G --data "mode=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze |
||
Line 261: | Line 381: | ||
== Gateway == |
== Gateway == |
||
A gateway for APY is located in the [https://github.com/apertium/apertium-apy same directory] and provides functionality such as silently intercepting and forwarding requests, and aggregating APY instance capabilities to override the response to <code>/list</code> requests. For example, a gateway in front of two servers that offer different language pairs will report the combined capabilities to the client, hiding the existence of two servers.

A list of APY servers is a required positional argument; an example server list is [https://github.com/apertium/apertium-apy/blob/master/serverlist-example provided] in the same directory. If the gateway is asked to run on an already occupied port, it will try successive ports until it can bind to a free one.

The gateway currently uses a "fastest" load-balancing strategy that continuously adapts to changing circumstances by basing its routing on the clients' requests. On initialization, all servers are assigned a weight of 0, so each server will eventually be used as the gateway determines the server speeds. The gateway stores a moving average of the last x requests for each <code>(mode, language)</code> and forwards requests to the fastest server, as measured in response time per unit of response length.
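The routing idea described above can be modelled in a few lines of Python. This is a toy sketch, not the gateway's actual code: the class name, the server URLs and the window size (the "x" in the text) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class FastestBalancer:
    """Toy model of 'fastest' routing with a moving average per (mode, language)."""

    def __init__(self, servers, window=5):
        self.servers = list(servers)
        # One bounded history of seconds-per-character for each
        # (server, mode, language) combination.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def weight(self, server, mode, language):
        samples = self.history[(server, mode, language)]
        # Servers without measurements weigh 0, so each one gets tried.
        return sum(samples) / len(samples) if samples else 0.0

    def pick(self, mode, language):
        # Lowest weight means fastest; ties go to the first server listed.
        return min(self.servers, key=lambda s: self.weight(s, mode, language))

    def record(self, server, mode, language, seconds, response_length):
        per_char = seconds / max(response_length, 1)
        self.history[(server, mode, language)].append(per_char)


balancer = FastestBalancer(["http://a:2737", "http://b:2737"])
first = balancer.pick("translate", "kaz-tat")
balancer.record(first, "translate", "kaz-tat", seconds=2.0, response_length=100)
second = balancer.pick("translate", "kaz-tat")
balancer.record(second, "translate", "kaz-tat", seconds=0.5, response_length=100)
third = balancer.pick("translate", "kaz-tat")
print(first, second, third)
```

Because unmeasured servers start at weight 0, the second request goes to the server that has not been tried yet; after that, requests go to whichever server has the lower average time per character.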
||
==Running on init== |
||
===Systemd=== |
|||
See [[Apy/Debian]] for the quickstart. |
|||
====Running as a --user unit==== |
|||
If you want to be able to start and stop apy as a non-root user, your administrator first has to run some commands. If your user is named "tussenvoegsel", the admin will have to do:
|||
<pre> |
|||
sudo apt-get install dbus libpam-systemd # or dnf on Fedora etc. |
|||
sudo loginctl enable-linger tussenvoegsel |
|||
</pre> |
|||
To read the logs without sudo, admin will also have to enable persistent logs (see [[Apertium-apy#Persistent_logs|below]]). |
|||
Then as your "tussenvoegsel" user, do |
|||
<pre> |
|||
mkdir -p ~/.config/systemd/user/ |
|||
git clone https://github.com/apertium/apertium-apy |
|||
cp ~/apertium-apy/tools/systemd/apy.service ~/.config/systemd/user/ |
|||
</pre> |
|||
Now edit .config/systemd/user/apy.service and remove PrivateTmp, set the User to "tussenvoegsel" (or whatever it is) and WorkingDirectory/ExecStart paths to /home/tussenvoegsel/apertium-apy. |
|||
Here's a full example apy.service file: |
|||
<pre> |
|||
$ cat ~/.config/systemd/user/apy.service |
|||
[Unit] |
|||
Description=Translation server and API for Apertium |
|||
Documentation=http://wiki.apertium.org/wiki/Apertium-apy |
|||
After=network.target |
|||
[Service] |
|||
WorkingDirectory=/home/tussenvoegsel/apertium-apy |
|||
ExecStart=/usr/bin/python3 /home/tussenvoegsel/apertium-apy/servlet.py /usr/share/apertium/modes |
|||
Restart=always |
|||
WatchdogSec=10s |
|||
[Install] |
|||
WantedBy=default.target
|||
</pre> |
|||
You should now be able to do: |
|||
<pre> |
|||
systemctl --user daemon-reload # re-read the edited apy.service file |
|||
systemctl --user start apy # start apy immediately |
|||
systemctl --user stop apy # stop apy immediately |
|||
systemctl --user enable apy # make apy start after next reboot |
|||
systemctl --user status apy # check if apy is running |
|||
journalctl -f --user-unit apy # follow the apy logs |
|||
journalctl -n100 --user-unit apy # show last 100 lines of apy logs |
|||
curl 'localhost:2737/listPairs' # show installed pairs |
|||
curl 'localhost:2737/translate?q=ja+nu&langpair=sme|nob' # translate some words |
|||
</pre> |
|||
====Persistent logs==== |
|||
By default, logs are not persistent across reboots nor readable without sudo. The below commands fix this: |
|||
<pre> |
|||
sudo mkdir /var/log/journal |
|||
sudo systemctl restart systemd-journald |
|||
</pre> |
|||
===Upstart=== |
|||
You can use upstart scripts to automatically run the apy and html-tools on startup and respawn the processes when they get killed. If you don't have upstart installed: <code>sudo apt-get install upstart</code> |
You can use upstart scripts to automatically run the apy and html-tools on startup and respawn the processes when they get killed. If you don't have upstart installed: <code>sudo apt-get install upstart</code> |
||
Line 353: | Line 531: | ||
View the status and PID: <code>sudo status JOB</code> |
View the status and PID: <code>sudo status JOB</code> |
||
====Logging==== |
||
The log files of the processes can be found in the <code>/var/log/upstart/</code> folder. |
The log files of the processes can be found in the <code>/var/log/upstart/</code> folder. |
||
Line 372: | Line 550: | ||
==TODO== |
==TODO== |
||
* hfst-proc -g and lrx-proc don't work with null-flushing, see https://sourceforge.net/p/hfst/bugs/240/ and https://sourceforge.net/p/apertium/tickets/45/ |
|||
* It should be possible to set a time-out for translation threads, so if a translation is taking too long, it gets killed and the queue moves along. |
|||
* It should use one lock per pipeline, since we don't need to wait for mk-en just because sme-nob is running. |
|||
* http://stackoverflow.com/a/487281/69663 recommends select/polling over threading (http://docs.python.org/3.3/library/socketserver.html for more on the differences) but requires either lots of manually written dispatching code (http://pymotw.com/2/select/) or a framework like Twisted. |
|||
* some language pairs still don't work (sme-nob?) |
|||
* hfst-proc -g doesn't work with null-flushing (or?) |
|||
* translation cache |
* translation cache |
||
* variants like ca_valencia, oc_aran and pt_BR look odd on the web page? |
||
* gateway: we need a way to have a second server running only the most popular language pairs, and a gateway that sends requests to whichever server has the requested pair. Simply doing -j2 is not a good solution, since we'd waste a lot of RAM on keeping open pipelines that are rarely used. (Or we could turn off pipelines after not being used for a while …) |
|||
* http://apy.projectjj.com/ currently shows a 404, / should show some sort of general info about the server and a link to this wiki page |
|||
==Troubleshooting== |
==Troubleshooting== |
||
=== CRITICAL:root:apy.py APy needs a UTF-8 locale, please set … === |
|||
Do <pre> export LC_ALL=C.UTF-8</pre> |
|||
and put that line in your ~/.bashrc |
|||
See also [[Installation_troubleshooting#Warning:_unsupported_locale.2C_fallback_to_.22C.22.22]]. |
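To see what encoding Python picks up from your environment, you can run a short check like the one below. It is only a sketch: the helper name <code>is_utf8</code> is made up here, and APY's own locale check may work differently.

```python
import codecs
import locale


def is_utf8(encoding_name):
    """True if `encoding_name` is some spelling of UTF-8 (e.g. 'UTF-8', 'utf8')."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are certainly not UTF-8.
        return False


current = locale.getpreferredencoding(False)
print("Preferred encoding:", current, "(UTF-8)" if is_utf8(current) else "(not UTF-8)")
```

Run it in the same shell (or service environment) that starts APY, since the result depends on variables such as <code>LC_ALL</code>.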
|||
=== listen tcp 0.0.0.0:2737: bind: address already in use === |
|||
Probably apy is already running, or some other program is holding the port open. |
|||
See what programs are using port 2737 with |
|||
<pre> |
|||
lsof -i :2737 |
|||
</pre> |
|||
or |
|||
<pre> |
|||
netstat -pna | grep 2737 |
|||
</pre> |
|||
If you're using docker, you may have to <code>sudo</code> those commands (lsof and netstat don't write anything, so that Should Be Safe™) |
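If neither lsof nor netstat is at hand, a quick check from Python shows whether anything is accepting connections on the port. A minimal sketch (the function name <code>port_in_use</code> is made up for this example); unlike lsof, it cannot tell you which program holds the port:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


print("Port 2737 in use:", port_in_use(2737))
```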
|||
===forking problems on systemd 228 === |
|||
If you get errors like |
|||
<pre> |
|||
HTTPServerRequest(protocol='http', host='apy.projectjj.com', method='GET', uri='/translate?langpair=nob%7Cnno&q=ikke%0A%0A&callback=_jqjsp&_146183949405=', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'Keep-Alive', 'Cookie': '_pk_ref.1.9697=%5B%22%2C%22%22%2C146183942%2C%22https%3A%2F%2Fwww.google.no%2F%22%5D; _pk_id.1.9697=96baa844663e946.1441366937.7.146839495.1461839482.; _pk_ses.1.9697=*', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, sdch', 'X-Forwarded-Server': 'www.apertium.org, apy.projectjj.com', 'X-Forwarded-For': '152.93.00.00, 193.145.00.00', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.0 Safari/537.36', 'Accept-Language': 'nb-NO,nb;q=0.8,no;q=0.6,nn;q=0.4,en-US;q=0.2,en;q=0.2', 'Host': 'apy.projectjj.com', 'Referer': 'https://www.apertium.org/index.nob.html?dir=nob-nno', 'X-Forwarded-Host': 'www.apertium.org, apy.projectjj.com'}) |
|||
Traceback (most recent call last): |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/web.py", line 1415, in _execute |
|||
result = yield result |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run |
|||
value = future.result() |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run |
|||
yielded = self.gen.throw(*exc_info) |
|||
File "/home/apertium/apertium-apy/servlet.py", line 389, in get |
|||
self.get_argument('markUnknown', default='yes')) |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run |
|||
value = future.result() |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run |
|||
yielded = self.gen.throw(*exc_info) |
|||
File "/home/apertium/apertium-apy/servlet.py", line 369, in translateAndRespond |
|||
translated = yield pipeline.translate(toTranslate, nosplit) |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run |
|||
value = future.result() |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run |
|||
yielded = self.gen.throw(*exc_info) |
|||
File "/home/apertium/apertium-apy/translation.py", line 69, in translate |
|||
parts = yield [translateNULFlush(part, self) for part in all_split] |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run |
|||
value = future.result() |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 656, in callback |
|||
result_list.append(f.result()) |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 879, in run |
|||
yielded = self.gen.send(value) |
|||
File "/home/apertium/apertium-apy/translation.py", line 214, in translateNULFlush |
|||
proc_deformat = Popen("apertium-deshtml", stdin=PIPE, stdout=PIPE) |
|||
File "/usr/lib/python3.5/subprocess.py", line 947, in __init__ |
|||
restore_signals, start_new_session) |
|||
File "/usr/lib/python3.5/subprocess.py", line 1480, in _execute_child |
|||
restore_signals, start_new_session, preexec_fn) |
|||
BlockingIOError: [Errno 11] Resource temporarily unavailable |
|||
</pre> |
|||
on systems with systemd>=228 and linux>=4.3, then it's likely you're hitting the TasksMax systemd limit, which caps the number of tasks at 512 per cgroup(?) or 4096 per user (similar to ulimit task limits). See http://unix.stackexchange.com/questions/253903/creating-threads-fails-with-resource-temporarily-unavailable-with-4-3-kernel/255603#255603 for info; basically you want to raise the DefaultTasksMax or UserTasksMax settings.
|||
===logging errors=== |
|||
If you encounter errors involving <code>enable_pretty_logging()</code> while starting APY, comment out the line with a leading <code>#</code> to solve the issue. |
: What was the error? This should be possible to fix / work around.
||
===High IO usage=== |
|||
If you are logging unknowns (-f / --missing-freqs), you should probably also give some value to -M (e.g. -M1000), otherwise you might get a lot of disk usage on that sqlite file. |
|||
==='return' with argument inside generator on python 3.2 or older=== |
|||
<pre>
Traceback (most recent call last):
  File "./servlet.py", line 25, in <module>
    import translation
  File "translation.py", line 132
    return proc_reformat.communicate()[0].decode('utf-8')
SyntaxError: 'return' with argument inside generator
</pre>
|||
Solution: upgrade to Python 3.3 or newer. |
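The reason is that returning a value from a generator only became valid syntax in Python 3.3; older interpreters reject the file at compile time, before any of it runs. A small standalone illustration (the generator below is made up and unrelated to APY's code):

```python
def numbered(lines):
    """Generator that yields each line and finally returns a value."""
    for line in lines:
        yield line
    # Python 3.3+ accepts this; older versions raise the SyntaxError shown above.
    return "done"


gen = numbered(["a", "b"])
yielded = []
try:
    while True:
        yielded.append(next(gen))
except StopIteration as stop:
    # The generator's return value travels on the StopIteration exception.
    returned = stop.value
print(yielded, returned)
```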
|||
==Docs==

* [[/Translation]]
* [[/Debian]] – quickstart installation guide for running your very own APY server on Debian, Ubuntu etc.
* [[/Threading]]
* [[/Fedora]] – quickstart installation guide for running your very own APY server on Fedora

== Please cite ==

* https://www.aclweb.org/anthology/W18-2207/

[[Category:Tools]]
[[Category:Services]]
[[Category:Documentation]]
Latest revision as of 19:54, 1 August 2024
Apertium-APy stands for "Apertium API in Python". It's a simple Apertium API server written in Python 3, meant as a drop-in replacement for ScaleMT. Its primary purpose is serving requests from web applications, though it's fairly versatile. It is hosted on GitHub, where servlet.py contains the relevant web server bits. The server is used by front ends like apertium-html-tools (on apertium.org) and MediaWiki Content Translation.
The https://apertium.org page uses an installation which currently only runs released language pairs (also available from https://apertium.org/apy if you prefer). However, APY is very easy to set up on your own server, where you can run all the development pairs and even analysers and taggers (as http://turkic.apertium.org does). Read on for how to do that.
Test it!
$ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse
[["алдым/алд<n><px1sg><nom>/алд<n><px1sg><nom>+э<cop><aor><p3><pl>/алд<n><px1sg><nom>+э<cop><aor><p3><sg>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]]
Installation
See /Debian for a complete quickstart installation guide for Debian, Ubuntu, Linux Mint, etc that uses the prebuilt binaries.
First, install apertium/lttoolbox/apertium-lex-tools, and your language pairs. See Installation for how to do this.
You should have Python 3.4 or newer (though 3.2 has been reported to work as of 324a185).
APY uses Tornado 3.1 or newer as its web framework. Ensure that you install the Python 3.4 (or newer) versions of any dependencies. On Debian/Ubuntu, you can do
sudo apt-get install build-essential python3-dev python3-pip zlib1g-dev subversion
sudo pip3 install --upgrade tornado
Then clone APY from GitHub and run it:
git clone https://github.com/apertium/apertium-apy.git
cd apertium-apy
./servlet.py /usr/share/apertium # the server uses all .mode files under this directory; use /usr/local/share/apertium for pairs installed with "make install"
See ./servlet.py --help for documentation on how to start APY. Here are some popular optional arguments:
- -l --lang-names: path to sqlite3 database of localized language names (see #List localised language names; you should include this if you're using apertium-html-tools)
- -p --port: port to run server on (2737 by default)
- -c --ssl-cert: path to SSL certificate
- -k --ssl-key: path to SSL key file
- -j --num-processes: number of http processes to run (default = 1; use 0 to run one http server per core, where each http server runs all available language pairs)
- -s --nonpairs-path: include .mode files from this directory, as with the main argument, but skip translator (pair) modes and include only the other modes, such as analysers and generators (handy for use with an apertium checkout)
- -f --missing-freqs: path to sqlite3 database of words that were unknown (requires sudo apt-get install sqlite3)
- -i --max-pipes-per-pair: how many pipelines we can have per language pair (per HTTP server), default = 1
- -u --max-users-per-pipe: if there are this many concurrent users in the least-used pipeline of a pair (and we haven't reached max-pipes-per-pair), start a new pipeline (default = 5)
- -m --max-idle-secs: after each translation request, go through the list of language pairs and shut down any pair that hasn't been used in the last MAX_IDLE_SECS seconds (to save on RAM)
- -n --min-pipes-per-pair: when shutting down idle pairs, keep at least this many open (default = 0)
- -r --restart-pipe-after: if a pipeline has been used for this many requests, shut it down (to avoid possible memory creep if a pair has bugs) after it has handled its current requests
Installing dependencies without root[edit]
If you don't have root, you can still install the python dependencies with
$ pip3 install --user --upgrade tornado
(But your server still needs build-essential python3-dev python3-pip zlib1g-dev installed.)
Then you just need to run
PYTHONPATH="/usr/local/lib/python3.3/site-packages:${PYTHONPATH}"; export PYTHONPATH
before starting APY.
Installing dependencies without root nor pip3
Your server still needs python3 (and probably build-essential python3-dev zlib1g-dev), but this is simpler if you don't want to mess with pip.
Just go to https://pypi.python.org/pypi/tornado/#downloads and get the newest version .tar.gz source release; say it got stored as ~/Nedlastingar/tornado-4.3.tar.gz, then do
cd apertium-apy
tar xf ~/Nedlastingar/tornado-4.3.tar.gz
( cd tornado-4.3 && python3 setup.py build )
ln -s tornado-4.3/build/lib*/tornado tornado
Optional features
List localised language names
If you use apertium-html-tools, you probably want localised language names instead of three-letter codes. To get this, first install sqlite3 (on Debian/Ubuntu that's sudo apt-get install sqlite3), then do
make
to create the langNames.db used for the /listLanguageNames function.
Language identification
The /identifyLang function can provide language identification.
If you install Compact Language Detection 2 (CLD2), you get fast and fairly accurate language detection. Installation can be a bit tricky though.
- Ubuntu: see http://blog.xanda.org/2014/04/02/installing-compact-language-detection-2-cld2-on-ubuntu/
- Arch Linux: install python-cld2-hg from AUR.
Alternatively, you can start servlet.py with the -s argument pointing to a directory of language pairs with analyser modes, in which case APY will try to do language detection by analysing the text and finding which analyser had the least unknowns. This is a bit slow though :-)
APY will prefer using CLD2 if it's available, otherwise fall back to analyser coverage.
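The analyser-coverage fallback can be pictured roughly as follows. This is a minimal sketch, not APY's implementation: it assumes the analyser output is in the usual Apertium stream format, where an unknown word is written as ^word/*word$, and it ignores escaped characters for brevity. The function names and the sample analyses are made up for the example.

```python
import re

# One lexical unit: ^surface/reading1/reading2$ (escapes are ignored here)
LEXICAL_UNIT = re.compile(r'\^([^/$]*)/([^$]*)\$')


def coverage(analysed):
    """Fraction of lexical units whose first reading is not an unknown (*)."""
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    known = sum(1 for _surface, readings in units if not readings.startswith('*'))
    return known / len(units)


def guess_language(analyses):
    """analyses: mapping from language code to that analyser's output
    for the same input text. Returns the code with the fewest unknowns."""
    return max(analyses, key=lambda lang: coverage(analyses[lang]))


sample = {
    'eng': '^This/this<prn><dem><sg>$ ^works/work<vblex><pri><p3><sg>$',
    'spa': '^This/*This$ ^works/*works$',
}
best = guess_language(sample)
```

Running every installed analyser over the text is what makes this approach slow compared to CLD2.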
Usage
APY supports three types of requests: GET, POST, and JSONP. From a browser, plain GET/POST requests are possible only if APY is running on the same server as the client, due to cross-origin request restrictions; JSONP requests, however, are permitted in any context. APY can easily be tested with curl:
curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord
It can also be tested through your browser or through other HTTP clients. Since curl does not decode JSON output by default, an APY Sandbox is provided with Apertium-html-tools to make testing easier.
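If you would rather script your tests than type curl commands, a small Python helper is enough. The sketch below assumes an APY instance on localhost:2737 (the default port mentioned above); apy_url and apy_get are hypothetical helper names, not part of APY.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def apy_url(endpoint, base='http://localhost:2737', **params):
    """Build a GET URL for an APY endpoint, percent-encoding the parameters."""
    url = base.rstrip('/') + '/' + endpoint.lstrip('/')
    query = urlencode(params)
    return url + '?' + query if query else url


def apy_get(endpoint, **params):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urlopen(apy_url(endpoint, **params)) as response:
        return json.loads(response.read().decode('utf-8'))


# Same request as the curl example, built but not sent:
example = apy_url('perWord', lang='kaz-tat', modes='morph', q='алдым')
```

Calling apy_get('perWord', lang='kaz-tat', modes='morph', q='алдым') then returns the decoded JSON as Python lists and dicts.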
URL | Function | Parameters | Output |
---|---|---|---|
/listPairs | List available language pairs |
|
To be consistent with ScaleMT, the returned JS Object contains a responseData key with an Array of language pair objects with keys sourceLanguage and targetLanguage .
$ curl 'http://localhost:2737/listPairs'
{"responseStatus": 200, "responseData": [ {"sourceLanguage": "kaz", "targetLanguage": "tat"}, {"sourceLanguage": "tat", "targetLanguage": "kaz"}, {"sourceLanguage": "mk", "targetLanguage": "en"} ], "responseDetails": null} |
/list | List available mode information |
|
The returned JS Object contains a mapping from language pairs to mode names (used internally by Apertium).
$ curl 'http://localhost:2737/list?q=analyzers'
{"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph", "tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}
$ curl 'http://localhost:2737/list?q=generators'
{"en-es": "en-es-generador", "fin": "fin-gener", "es-en": "es-en-generador"}
$ curl 'http://localhost:2737/list?q=taggers'
{"es-en": "es-en-tagger", "en-es": "en-es-tagger", "mk-en": "mk-en-tagger", "tat-kaz": "tat-kaz-tagger", "kaz-tat": "kaz-tat-tagger", "kaz": "kaz-tagger"} |
/translate | Translate text |
For more about formatting, please see Format Handling. |
To be consistent with ScaleMT, the returned JS Object contains a responseData key with a JS Object that has a key translatedText containing the translated text.
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}
$ echo 'Сен бардың ба?' > myfile
$ curl --data-urlencode 'q@myfile' 'http://localhost:2737/translate?langpair=kaz|tat'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}
The following two queries contain nonstandard whitespace characters and are equivalent:
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&deformat=txt&reformat=txt'
{"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null}
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&format=txt'
{"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null}
The following two queries illustrate the difference between the html and html-noent reformatters:
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html'
{"responseData": {"translatedText": "Qué hace este trabajo?"}, "responseDetails": null, "responseStatus": 200}
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html-noent'
{"responseData": {"translatedText": "Qu\u00e9 hace este trabajo?"}, "responseDetails": null, "responseStatus": 200} |
/translateDoc | Translate a document (.odt, .txt, .rtf, .html, .docx, .pptx, .xlsx, .tex) |
|
Returns the translated document.
$ curl --form 'file=@/path/to/kaz.odt' 'http://localhost:2737/translateDoc?langpair=kaz|tat' > tat.odt |
/analyze or /analyse | Morphologically analyze text |
|
The returned JS Array contains JS Arrays in the format [analysis, input-text] .
$ curl -G --data "lang=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"], ["?/?<sent>","?"]] |
/generate | Generate surface forms from text |
|
The returned JS Array contains JS Arrays in the format [generated, input-text] .
$ curl -G --data "lang=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate
[["сен","^сен<v><tv><imp><p2><sg>$ "]] |
/perWord | Perform morphological tasks per word |
|
The returned JS Array contains JS Objects each containing the key input and up to 4 other keys corresponding to the requested modes (tagger , morph , biltrans and translate ).
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph&q=let+there+be+light' [{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger&q=let+there+be+light' [{"input": "let", "tagger": "let<vblex><pp>"}, {"input": "there", "tagger": "there<adv>"}, {"input": "be", "tagger": "be<vbser><inf>"}, {"input": "light", "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+tagger&q=let+there+be+light' [{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=translate&q=let+there+be+light' [{"input": "let", "translate": ["dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=biltrans&q=let+there+be+light' [{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans&q=let+there+be+light' [{"input": "let", "translate": 
["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans&q=let+there+be+light' [{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+biltrans&q=let+there+be+light' [{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+translate&q=let+there+be+light' [{"input": "let", "translate": ["dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": 
["ligero<adj>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate&q=let+there+be+light' [{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans+tagger&q=let+there+be+light' [{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans+tagger&q=let+there+be+light' [{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], 
"tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+tagger&q=let+there+be+light' [{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans&q=let+there+be+light' [{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans+tagger&q=let+there+be+light' [{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, 
{"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] |
/listLanguageNames | Get localized language names |
|
The returned JS Object contains a mapping of requested language codes to localized language names.
$ curl 'http://localhost:2737/listLanguageNames?locale=fr&languages=ca+en+mk+tat+kk'
{"ca": "catalan", "en": "anglais", "kk": "kazakh", "mk": "macédonien", "tat": "tatar"} |
/calcCoverage | Get coverage of a language on a text |
|
The returned JS Array contains a single floating point value ≤ 1 that indicates the coverage.
$ curl 'http://localhost:2737/calcCoverage?lang=en-es&q=Whereas disregard and contempt for which have outraged the conscience of mankind'
[0.9230769230769231] |
/identifyLang | Return a list of languages with probabilities of the text being in that language. Uses CLD2 if that's installed, otherwise will try any analyser modes. |
|
The returned JS Object contains a mapping from language codes to probabilities.
$ curl 'http://localhost:2737/identifyLang?q=This+is+a+piece+of+text.'
{"ca": 0.19384234, "en": 0.98792465234, "kk": 0.293442432, "zh": 0.002931001} |
/stats | Return some statistics about pair usage, uptime, portion of time spent actively translating |
|
Note that period-based stats are limited to 3600 seconds by default (see the -T argument to servlet.py).
$ curl -Ss localhost:2737/stats|jq .responseData
{
  "holdingPipes": 0,
  "periodStats": {
    "totTimeSpent": 10.760803,
    "ageFirstRequest": 19.609394,
    "totChars": 2718,
    "requests": 8,
    "charsPerSec": 252.58
  },
  "runningPipes": {
    "eng-spa": 1
  },
  "useCount": {
    "eng-spa": 8
  },
  "uptime": 26
} |
/spellCheck | Note: This endpoint is not yet available in the main branch. Handles spell-checking requests using Voikko or Divvun spell checkers. |
|
The output is a JSON array where each element represents a token from the input text. Each element has the keys token, known and sugg (a list of suggestions):
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz'
[ {"token": "қазақша", "known": true, "sugg": []}, {"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейміз"]} ]
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz&spellchecker=divvun'
[ {"token": "қазақша", "known": true, "sugg": []}, {"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейтін", "білмейін", "білмейміз", "иілмеймін", "тілмеймін", "ілмеймін"]} ] |
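The response shapes in the table are easy to unpack on the client side. The following Python sketch works on response bodies like the ones shown above; the helper names are invented for this example, the non-200 check is a defensive assumption rather than documented behaviour, and splitting analyses on "/" ignores escaped slashes.

```python
import json


def translated_text(body):
    """Extract the translation from a /translate reply."""
    reply = json.loads(body)
    if reply.get('responseStatus') != 200:
        raise RuntimeError('APY returned status {}'.format(reply.get('responseStatus')))
    return reply['responseData']['translatedText']


def pair_set(body):
    """Turn a /listPairs reply into a set of (source, target) tuples."""
    reply = json.loads(body)
    return {(p['sourceLanguage'], p['targetLanguage']) for p in reply['responseData']}


def readings(body):
    """Turn an /analyze reply into (surface form, [readings]) tuples."""
    result = []
    for analysis, _input_text in json.loads(body):
        surface, *rest = analysis.split('/')
        result.append((surface, rest))
    return result


translate_reply = ('{"responseStatus": 200, "responseData": '
                   '{"translatedText": "Син барныңмы?"}, "responseDetails": null}')
pairs_reply = ('{"responseStatus": 200, "responseData": ['
               '{"sourceLanguage": "kaz", "targetLanguage": "tat"}, '
               '{"sourceLanguage": "tat", "targetLanguage": "kaz"}], '
               '"responseDetails": null}')
analyze_reply = ('[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], '
                 '["?/?<sent>","?"]]')
```

A client can, for instance, check ('kaz', 'tat') in pair_set(...) before sending a translation request.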
SSL
APY supports HTTPS out of the box. To test with a self-signed certificate, create a certificate and key by running:
openssl req -new -x509 -keyout server.key -out server.crt -days 365 -nodes
Then run APY with --ssl-key server.key --ssl-cert server.crt, and test with HTTPS and the -k argument to curl (-k means curl accepts self-signed or even slightly "lying" signatures):
curl -k -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze
If you have a real signed certificate, you should be able to use curl without -k for the domain which the certificate is signed for:
curl -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze
Remember to open port 2737 to your server.
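A Python client needs the equivalent of curl's -k when talking to a server with a self-signed certificate. The sketch below uses only the standard library; insecure_context and fetch are hypothetical helper names, and disabling verification like this is only reasonable for local testing.

```python
import ssl
from urllib.request import urlopen


def insecure_context():
    """TLS context that accepts any certificate, like curl -k."""
    context = ssl.create_default_context()
    # check_hostname has to be switched off before verify_mode can be relaxed
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch(url, verify=True):
    """GET a URL over HTTPS; pass verify=False for self-signed test servers."""
    context = ssl.create_default_context() if verify else insecure_context()
    with urlopen(url, context=context) as response:
        return response.read().decode('utf-8')
```

With a locally running test server, fetch('https://localhost:2737/listPairs', verify=False) corresponds to the curl -k call above; with a properly signed certificate, leave verify at its default.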
Gateway
A gateway for APY is located in the same directory. It silently intercepts and forwards requests, and aggregates the capabilities of several APY instances in order to override /list requests. For example, a gateway given access to two servers with different sets of language pairs will report the aggregated capabilities to the client, hiding the existence of two servers.
A list of APY servers is a required positional argument; an example server list is provided in the same directory. If the gateway is requested to run on an already occupied port, it will attempt to traverse the available ports until it can bind to a free one.
The gateway currently uses a "Fastest" load-balancing strategy that continuously adapts to changing circumstances by basing its routing on the client's requests. On initialization, all servers are assigned a weight of 0, so each server will eventually be used as the gateway determines the server speeds. The gateway stores a moving average of the last x requests for each (mode, language) and forwards requests to the fastest server, as measured in units of response time per response length.
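The "fastest server" routing described above can be sketched as follows. This is an illustration of the idea only, not the gateway's actual code: the class name, the window parameter (the "x" above) and the server names are assumptions.

```python
from collections import defaultdict, deque


class FastestBalancer:
    """Route each (mode, language) to the server with the lowest
    moving average of response time per response character."""

    def __init__(self, servers, window=10):
        self.servers = list(servers)
        # (server, mode, language) -> the last `window` samples
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def weight(self, server, mode, language):
        history = self.samples[(server, mode, language)]
        # Unmeasured servers weigh 0, so they get tried first
        return sum(history) / len(history) if history else 0.0

    def pick(self, mode, language):
        return min(self.servers, key=lambda s: self.weight(s, mode, language))

    def record(self, server, mode, language, seconds, response_length):
        per_char = seconds / max(response_length, 1)
        self.samples[(server, mode, language)].append(per_char)


balancer = FastestBalancer(['server-a', 'server-b'], window=2)
first = balancer.pick('translate', 'kaz-tat')
balancer.record(first, 'translate', 'kaz-tat', seconds=1.0, response_length=10)
second = balancer.pick('translate', 'kaz-tat')
```

Because every server starts at weight 0, the first requests are spread over all servers until each has at least one measurement; after that the deque keeps only the most recent samples, so a server that slows down loses traffic again.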
Running on init
Systemd
See Apy/Debian for the quickstart.
Running as a --user unit
If you want to be able to start and stop apy as a non-root user, you'll first have to get your administrator to run some commands. If your user is named "tussenvoegsel", the admin will have to do:
sudo apt-get install dbus libpam-systemd # or dnf on Fedora etc.
sudo loginctl enable-linger tussenvoegsel
To read the logs without sudo, admin will also have to enable persistent logs (see below).
Then as your "tussenvoegsel" user, do
mkdir -p ~/.config/systemd/user/
git clone https://github.com/apertium/apertium-apy
cp ~/apertium-apy/tools/systemd/apy.service ~/.config/systemd/user/
Now edit .config/systemd/user/apy.service and remove PrivateTmp, set the User to "tussenvoegsel" (or whatever it is) and WorkingDirectory/ExecStart paths to /home/tussenvoegsel/apertium-apy.
Here's a full example apy.service file:
$ cat ~/.config/systemd/user/apy.service
[Unit]
Description=Translation server and API for Apertium
Documentation=http://wiki.apertium.org/wiki/Apertium-apy
After=network.target

[Service]
WorkingDirectory=/home/tussenvoegsel/apertium-apy
ExecStart=/usr/bin/python3 /home/tussenvoegsel/apertium-apy/servlet.py /usr/share/apertium/modes
Restart=always
WatchdogSec=10s

[Install]
WantedBy=multi-user.target
You should now be able to do:
systemctl --user daemon-reload      # re-read the edited apy.service file
systemctl --user start apy          # start apy immediately
systemctl --user stop apy           # stop apy immediately
systemctl --user enable apy         # make apy start after next reboot
systemctl --user status apy         # check if apy is running
journalctl -f --user-unit apy       # follow the apy logs
journalctl -n100 --user-unit apy    # show last 100 lines of apy logs
curl 'localhost:2737/listPairs'     # show installed pairs
curl 'localhost:2737/translate?q=ja+nu&langpair=sme|nob' # translate some words
Persistent logs
By default, logs are not persistent across reboots nor readable without sudo. The commands below fix this:
sudo mkdir /var/log/journal
sudo systemctl restart systemd-journald
Upstart
You can use upstart scripts to automatically run the apy and html-tools on startup and respawn the processes when they get killed. If you don't have upstart installed: sudo apt-get install upstart
The apertiumconfig file contains paths of some apertium directories and the serverlist file. It can be saved anywhere. Make sure the paths are correct!
/home/user/apertiumconfig
APERTIUMPATH=/home/user
APYPATH=/home/user/apertium-apy
SERVERLIST=/home/user/serverlist
HTMLTOOLSPATH=/home/user/apertium-html-tools
#optional, see 'Logging':
LOGFILE=/home/user/apertiumlog
The following upstart scripts have to be saved in /etc/init.
apertium-all.conf
description "start/stop all apertium services"
start on startup
apertium-apy.conf
description "apertium-apy init script"
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
env CONFIG=/etc/default/apertium
script
    . $CONFIG
    python3 $APYPATH/servlet.py $APERTIUMPATH
end script
apertium-apy-gateway.conf
description "apertium-apy gateway init script"
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
env CONFIG=/home/user/apertiumconfig
script
    . $CONFIG
    python3 $APYPATH/gateway.py $SERVERLIST
end script
apertium-html-tools.conf
description "apertium-html-tools init script"
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
env CONFIG=/etc/default/apertium
script
    . $CONFIG
    cd $HTMLTOOLSPATH
    python3 -m http.server 8888
end script
Use sudo start apertium-all to start all services. Just like the filenames, the jobs are called apertium-apy, apertium-apy-gateway and apertium-html-tools.
The jobs can be independently started by: sudo start JOB
You can stop them by using sudo stop JOB
Restart: sudo restart JOB
View the status and PID: sudo status JOB
Logging
The log files of the processes can be found in the /var/log/upstart/ folder.
The starting/stopping of the jobs can be logged by appending this to the end of the apertium-apy.conf, apertium-apy-gateway.conf and apertium-html-tools.conf files.
pre-start script
    . $CONFIG
    touch $LOGFILE
    echo "`date` $UPSTART_JOB started" >> $LOGFILE
end script
post-stop script
    . $CONFIG
    touch $LOGFILE
    echo "`date` $UPSTART_JOB stopped" >> $LOGFILE
end script
TODO
- hfst-proc -g and lrx-proc don't work with null-flushing, see https://sourceforge.net/p/hfst/bugs/240/ and https://sourceforge.net/p/apertium/tickets/45/
- translation cache
- variants like ca_valencia, oc_aran and pt_BR look odd on the web page?
- gateway: we need a way to have a second server running only the most popular language pairs, and a gateway that sends requests to whichever server has the requested pair. Simply doing -j2 is not a good solution, since we'd waste a lot of RAM on keeping open pipelines that are rarely used. (Or we could turn off pipelines after not being used for a while …)
Troubleshooting
CRITICAL:root:apy.py APy needs a UTF-8 locale, please set …
Do
export LC_ALL=C.UTF-8
and put that line in your ~/.bashrc.
See also the section "Warning: unsupported locale, fallback to "C"" in Installation troubleshooting.
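To see in advance whether your environment would count as a UTF-8 locale, you can check the same variables the shell uses. The sketch below follows the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG); it is not the check APY itself performs, and both function names are made up for the example.

```python
import os


def effective_ctype(env):
    """Return the locale value that wins for character handling."""
    for name in ('LC_ALL', 'LC_CTYPE', 'LANG'):
        value = env.get(name)
        if value:
            return value
    return 'C'  # nothing set: the plain C locale applies


def is_utf8_locale(env):
    """True if the winning locale names a UTF-8 codeset, e.g. C.UTF-8."""
    # A locale looks like language_TERRITORY.codeset@modifier
    codeset = effective_ctype(env).partition('.')[2].partition('@')[0]
    return codeset.lower().replace('-', '') == 'utf8'


current_ok = is_utf8_locale(os.environ)
```

Note that a non-UTF-8 LC_ALL overrides a UTF-8 LANG, which is why exporting LC_ALL=C.UTF-8 is the most direct fix.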
listen tcp 0.0.0.0:2737: bind: address already in use
Probably apy is already running, or some other program is holding the port open.
See what programs are using port 2737 with
lsof -i :2737
or
netstat -pna | grep 2737
If you're using docker, you may have to sudo those commands (lsof and netstat don't write anything, so that Should Be Safe™)
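If lsof and netstat are not at hand, a few lines of Python can at least tell you whether something is listening on the port. This is a minimal sketch (port_in_use is a hypothetical helper); unlike lsof it cannot tell you which program holds the port.

```python
import socket


def port_in_use(port, host='127.0.0.1'):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise
        return probe.connect_ex((host, port)) == 0
```

For APY's default port you would call port_in_use(2737) before starting the server.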
forking problems on systemd 228
If you get errors like
HTTPServerRequest(protocol='http', host='apy.projectjj.com', method='GET', uri='/translate?langpair=nob%7Cnno&q=ikke%0A%0A&callback=_jqjsp&_146183949405=', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'Keep-Alive', 'Cookie': '_pk_ref.1.9697=%5B%22%2C%22%22%2C146183942%2C%22https%3A%2F%2Fwww.google.no%2F%22%5D; _pk_id.1.9697=96baa844663e946.1441366937.7.146839495.1461839482.; _pk_ses.1.9697=*', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, sdch', 'X-Forwarded-Server': 'www.apertium.org, apy.projectjj.com', 'X-Forwarded-For': '152.93.00.00, 193.145.00.00', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.0 Safari/537.36', 'Accept-Language': 'nb-NO,nb;q=0.8,no;q=0.6,nn;q=0.4,en-US;q=0.2,en;q=0.2', 'Host': 'apy.projectjj.com', 'Referer': 'https://www.apertium.org/index.nob.html?dir=nob-nno', 'X-Forwarded-Host': 'www.apertium.org, apy.projectjj.com'}) Traceback (most recent call last): File "/home/apertium/.local/lib/python3.4/site-packages/tornado/web.py", line 1415, in _execute result = yield result File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run value = future.result() File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run yielded = self.gen.throw(*exc_info) File "/home/apertium/apertium-apy/servlet.py", line 389, in get self.get_argument('markUnknown', default='yes')) File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run value = future.result() File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", 
line 876, in run yielded = self.gen.throw(*exc_info) File "/home/apertium/apertium-apy/servlet.py", line 369, in translateAndRespond translated = yield pipeline.translate(toTranslate, nosplit) File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run value = future.result() File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run yielded = self.gen.throw(*exc_info) File "/home/apertium/apertium-apy/translation.py", line 69, in translate parts = yield [translateNULFlush(part, self) for part in all_split] File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run value = future.result() File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 656, in callback result_list.append(f.result()) File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 879, in run yielded = self.gen.send(value) File "/home/apertium/apertium-apy/translation.py", line 214, in translateNULFlush proc_deformat = Popen("apertium-deshtml", stdin=PIPE, stdout=PIPE) File "/usr/lib/python3.5/subprocess.py", line 947, in __init__ restore_signals, start_new_session) File "/usr/lib/python3.5/subprocess.py", line 1480, in _execute_child restore_signals, start_new_session, preexec_fn) BlockingIOError: [Errno 11] Resource temporarily unavailable
on systems with systemd>=228 and linux>=4.3, then it's likely you're hitting the TasksMax systemd setting, which puts a limit of 512 tasks per cgroup(?) or 4096 per user (similar to ulimit task limits). See http://unix.stackexchange.com/questions/253903/creating-threads-fails-with-resource-temporarily-unavailable-with-4-3-kernel/255603#255603 for info; basically you want to change the DefaultTasksMax or UserTasksMax settings.
logging errors
If you encounter errors involving enable_pretty_logging() while starting APY, comment out the line with a leading # to solve the issue.
- What was the error? This should be possible to fix / work around.
High IO usage
If you are logging unknowns (-f / --missing-freqs), you should probably also give some value to -M (e.g. -M1000), otherwise you might get a lot of disk usage on that sqlite file.
'return' with argument inside generator on python 3.2 or older
Traceback (most recent call last):
  File "./servlet.py", line 25, in <module>
    import translation
  File "translation.py", line 132
    return proc_reformat.communicate()[0].decode('utf-8')
SyntaxError: 'return' with argument inside generator
Solution: upgrade to Python 3.3 or newer.
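The reason is a language change: since Python 3.3 a generator may return a value, which is delivered to the caller on the StopIteration exception; older versions reject this at compile time. The small example below is independent of APY (read_chunks and drain are made-up names) and shows the construct that line 132 of translation.py relies on.

```python
def read_chunks(chunks):
    """Yield each chunk, then return the total length."""
    total = 0
    for chunk in chunks:
        total += len(chunk)
        yield chunk
    # SyntaxError on Python 3.2 and older; from 3.3 on the value
    # ends up in StopIteration.value
    return total


def drain(generator):
    """Run a generator to the end and hand back its return value."""
    while True:
        try:
            next(generator)
        except StopIteration as stop:
            return stop.value


total_length = drain(read_chunks(['ab', 'cde']))
```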
Docs
- /Translation
- /Debian – quickstart installation guide for running your very own APY server on Debian, Ubuntu etc.
- /Fedora – quickstart installation guide for running your very own APY server on Fedora