{{TOCD}}


'''Apertium-APy''' stands for "'''Apertium''' '''A'''PI in '''Py'''thon". It's a simple Apertium API server written in Python 3, meant as a drop-in replacement for [[ScaleMT]]. Its primary intended purpose is serving requests from web applications, though it's fairly versatile. It is currently found on [https://github.com/apertium/apertium-apy GitHub], where [https://github.com/apertium/apertium-apy/blob/master/servlet.py servlet.py] contains the relevant web server bits. The server is used by front ends like [[apertium-html-tools]] (on apertium.org) and [https://www.mediawiki.org/wiki/Content_translation Mediawiki Content Translation].




The https://apertium.org page uses an installation which currently only runs ''released'' language pairs (also available from https://apertium.org/apy if you prefer). However, APY is very easy to set up on your own server, where you can run all the development pairs and even analysers and taggers (like what http://turkic.apertium.org does); read on for how to do that.

== Test it! ==

<pre>
$ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse

[["алдым/алд<n><px1sg><nom>/алд<n><px1sg><nom>+э<cop><aor><p3><pl>/алд<n><px1sg><nom>+э<cop><aor><p3><sg>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]]
</pre>


== Installation ==
<span style="color: #f00;">''See [[/Debian]] for a complete quickstart installation guide for Debian, Ubuntu, Linux Mint, etc that uses the prebuilt binaries.''</span>


First, install apertium/lttoolbox/apertium-lex-tools, and your language pairs. See [[Installation]] for how to do this.

You should have Python '''3.4''' or newer (though 3.2 has been reported to work as of 324a185).

APY uses [http://www.tornadoweb.org/en/stable/ Tornado 3.1 or newer] as its web framework. Ensure that you install the Python 3.4 (or newer) versions of any dependencies. On Debian/Ubuntu, you can do
<pre>
sudo apt-get install build-essential python3-dev python3-pip zlib1g-dev subversion
sudo pip3 install --upgrade tornado
</pre>
Or you can install it via <code>pip install tornado</code> or other variants depending on your environment.


Then clone APY from GitHub and run it:


<pre>
git clone git@github.com:apertium/apertium-apy.git
cd apertium-apy
./servlet.py /usr/share/apertium # the server will use all .mode files from under this directory, use /usr/local/share/apertium for "make install"ed pairs
</pre>


See '''./servlet.py --help''' for documentation on how to start APY. Here are some popular optional arguments:


*'''-l --lang-names''': path to sqlite3 database of localized language names (see [[#List localised language names]]; you should include this if you're using [[apertium-html-tools]])
*'''-p --port''': port to run server on (2737 by default)
*'''-c --ssl-cert''': path to SSL certificate
*'''-k --ssl-key''': path to SSL key file
*'''-j --num-processes:''' number of http processes to run (default = 1; use 0 to run one http server per core, where each http server runs all available language pairs)
*'''-s --nonpairs-path''': include .mode files from this directory as with the main argument, but skip translator (pair) modes; only non-pair modes such as analysers, taggers and generators are included (handy for use with an apertium source checkout)
*'''-f --missing-freqs:''' path to sqlite3 database of words that were unknown (requires <code>sudo apt-get install sqlite3</code>)
*'''-i --max-pipes-per-pair:''' how many pipelines we can have per language pair (per http server), default = 1
*'''-u --max-users-per-pipe:''' if there are this many concurrent users in the least-used pipeline of a pair (and we haven't reached max-pipes-per-pair), start a new pipeline (default = 5)
*'''-m --max-idle-secs:''' after each translation request, go through the list of language pairs and shut down any pair that hasn't been used in the last MAX_IDLE_SECS seconds (to save on RAM)
*'''-n --min-pipes-per-pair:''' when shutting down idle pairs, keep at least this many open (default = 0)
*'''-r --restart-pipe-after:''' if a pipeline has been used for this many requests, shut it down (to avoid possible memory creep if a pair has bugs) after it has handled its current requests
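To make the pipeline-pool options concrete, here is a much-simplified Python sketch of the idea behind '''-i''' and '''-u''' (an illustration only, not APY's actual scheduling code; all names are ours):

```python
# Simplified sketch of a per-pair pipeline pool (not APY's real implementation):
# a request goes to the least-busy pipeline for its pair; a new pipeline is
# spawned only when the least-busy one already has max_users_per_pipe
# concurrent users and the pair has fewer than max_pipes_per_pair pipelines.

class PipelinePool:
    def __init__(self, max_pipes_per_pair=1, max_users_per_pipe=5):
        self.max_pipes = max_pipes_per_pair
        self.max_users = max_users_per_pipe
        self.pipes = {}  # pair -> list of current user counts, one per pipeline

    def acquire(self, pair):
        """Return the index of the pipeline that should serve this request."""
        users = self.pipes.setdefault(pair, [0])
        least = min(range(len(users)), key=users.__getitem__)
        if users[least] >= self.max_users and len(users) < self.max_pipes:
            users.append(0)          # start a fresh pipeline for this pair
            least = len(users) - 1
        users[least] += 1
        return least

    def release(self, pair, idx):
        """A request finished; free its slot in that pipeline."""
        self.pipes[pair][idx] -= 1

pool = PipelinePool(max_pipes_per_pair=2, max_users_per_pipe=5)
first = [pool.acquire("kaz-tat") for _ in range(5)]   # all served by pipeline 0
sixth = pool.acquire("kaz-tat")                       # pipeline 0 saturated, spawn pipeline 1
print(first, sixth)
```

With the defaults (one pipe per pair), no second pipeline is ever spawned; raising '''-i''' trades RAM for concurrency on busy pairs.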

===Installing dependencies without root===
If you don't have root, you can still install the python dependencies with
<pre>
$ pip3 install --user --upgrade tornado
</pre>
(But your server still needs <code>build-essential python3-dev python3-pip zlib1g-dev</code> installed.)

Then you just need to run <pre>PYTHONPATH="/usr/local/lib/python3.3/site-packages:${PYTHONPATH}"; export PYTHONPATH</pre> before starting APY.

===Installing dependencies without root nor pip3===
Your server still needs python3 (and probably <code>build-essential python3-dev zlib1g-dev</code>), but this is simpler if you don't want to mess with pip.

Just go to https://pypi.python.org/pypi/tornado/#downloads and get the newest .tar.gz source release; say it was saved as ~/Nedlastingar/tornado-4.3.tar.gz, then do
<pre>
cd apertium-apy
tar xf ~/Nedlastingar/tornado-4.3.tar.gz
( cd tornado-4.3 && python3 setup.py build )
ln -s tornado-4.3/build/lib*/tornado tornado
</pre>


===Optional features===
====List localised language names====
If you use [[apertium-html-tools]], you probably want localised language names instead of three-letter codes. To get this, first install sqlite3 (on Debian/Ubuntu that's <code>sudo apt-get install sqlite3</code>), then do
<pre>
make
</pre>


to create the langNames.db used for the <code>/listLanguageNames</code> function.


====Language identification====
<code>
<pre>curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord</pre>
</code> It can also be tested through your browser or through HTTP calls. Unfortunately, curl does '''not''' decode JSON output by default; to make testing easier, an APY Sandbox is provided with [[Apertium-html-tools]].


{| class="wikitable" border="1"
| '''/listPairs'''
| List available language pairs
|
*'''include_deprecated_codes''': give this parameter to include old ISO-639-1 codes in output
| To be consistent with ScaleMT, the returned JS Object contains a <code>responseData</code> key with an Array of language pair objects with keys <code>sourceLanguage</code> and <code>targetLanguage</code>.
*'''langpair''': language pair to use for translation
*'''q''': text to translate
*'''markUnknown=no''' (optional): include this to remove "*" in front of unknown words
*'''deformat''': deformatter to be used: one of html (default), txt, rtf
*'''reformat''': reformatter to be used: one of html, html-noent (default), txt, rtf
*'''format''': if deformatter and reformatter are the same, they can be specified here
For more about formatting, please see [http://wiki.apertium.org/wiki/Format_handling Format Handling].
| To be consistent with ScaleMT, the returned JS Object contains a <code>responseData</code> key with a JS Object whose key <code>translatedText</code> contains the translated text.
<pre>
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}
$ echo Сен бардың ба? > myfile
$ curl --data-urlencode 'q@myfile' 'http://localhost:2737/translate?langpair=kaz|tat'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}
</pre>

The following two queries contain nonstandard whitespace characters and are equivalent:
<pre>
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&deformat=txt&reformat=txt'
{"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null}
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&format=txt'
{"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null}
</pre>

The following two queries illustrate the difference between the <code>html</code> and <code>html-noent</code> reformatter:
<pre>
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html'
{"responseData": {"translatedText": "Qu&amp;eacute; hace este trabajo?"}, "responseDetails": null, "responseStatus": 200}
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html-noent'
{"responseData": {"translatedText": "Qu\u00e9 hace este trabajo?"}, "responseDetails": null, "responseStatus": 200}
</pre>
|-
| '''/translateDoc'''
| Translate a document (.odt, .txt, .rtf, .html, .docx, .pptx, .xlsx, .tex)
|
*'''langpair''': language pair to use for translation
*'''file''': document to translate
*'''markUnknown=no''' (optional): include this to remove "*" in front of unknown words
| Returns the translated document.
<pre>
$ curl --form 'file=@/path/to/kaz.odt' 'http://localhost:2737/translateDoc?langpair=kaz|tat' > tat.odt
</pre>
|-
| Morphologically analyze text
|
*'''lang''': language to use for analysis
*'''q''': text to analyze
| The returned JS Array contains JS Arrays in the format <code>[analysis, input-text]</code>.
<pre style="white-space: pre-wrap; word-wrap: break-word;">
$ curl -G --data "lang=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"], ["?/?<sent>","?"]]
</pre>
| Generate surface forms from text
|
*'''lang''': language to use for generation
*'''q''': text to generate
| The returned JS Array contains JS Arrays in the format <code>[generated, input-text]</code>.
<pre>
$ curl -G --data "lang=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate
[["сен","^сен<v><tv><imp><p2><sg>$ "]]
</pre>
|-
| '''/calcCoverage'''
| Get coverage of a language on a text
|
*'''lang''': language to analyze with
*'''q''': text to analyze for coverage
| The returned JS Array contains a single floating point value ≤ 1 that indicates the coverage.
<pre style="white-space: pre-wrap; word-wrap: break-word;">
$ curl 'http://localhost:2737/calcCoverage?lang=en-es&q=Whereas disregard and contempt for which have outraged the conscience of mankind'
[0.9230769230769231]
</pre>
|-
| '''/stats'''
| Return some statistics about pair usage, uptime, and the portion of time spent actively translating
|
*'''requests=N''' (optional): limit period-based stats to last N requests
| Note that period-based stats are limited to 3600 seconds by default (see -T argument to servlet.py)
<pre>
$ curl -Ss localhost:2737/stats|jq .responseData
{
"holdingPipes": 0,
"periodStats": {
"totTimeSpent": 10.760803,
"ageFirstRequest": 19.609394,
"totChars": 2718,
"requests": 8,
"charsPerSec": 252.58
},
"runningPipes": {
"eng-spa": 1
},
"useCount": {
"eng-spa": 8
},
"uptime": 26
}
</pre>
|-
| '''/spellCheck'''
| '''Note: This endpoint is not yet available in the main branch.''' Handles spell-checking requests using Voikko or Divvun spell checkers.
|
*'''q''': the text to be spell-checked (string, required, e.g. <code>қазақша билмеймін</code>)
*'''lang''': the language of the text (string, required, e.g. <code>kaz</code>)
*'''spellchecker''': the spell checker to use (string, optional, defaults to <code>voikko</code>, e.g. <code>divvun</code>)
| The output is a JSON array with one element per token of the input text; each element gives the token, whether it is known, and a list of suggestions:
<pre>
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz'
[
{"token": "қазақша", "known": true, "sugg": []},
{"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейміз"]}
]

$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz&spellchecker=divvun'
[
{"token": "қазақша", "known": true, "sugg": []},
{"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейтін", "білмейін", "білмейміз", "иілмеймін", "тілмеймін", "ілмеймін"]}
]
</pre>

|}
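Consuming these responses from client code only takes a JSON parser. Below is a minimal Python sketch; the helper names and the <code>APY</code> base URL are our own inventions, and the sample payloads are copied from the documented examples above.

```python
import json
from urllib.parse import urlencode

APY = "http://localhost:2737"  # assumed local APY instance

def translate_url(langpair, q, **extra):
    """Build a /translate request URL (langpair uses 'src|trg')."""
    return APY + "/translate?" + urlencode({"langpair": langpair, "q": q, **extra})

def translated_text(response_body):
    """Extract translatedText from a /translate JSON response."""
    return json.loads(response_body)["responseData"]["translatedText"]

def readings(analyse_body):
    """Split each /analyze unit into (surface form, list of analyses)."""
    out = []
    for analysis, surface in json.loads(analyse_body):
        lemma_part = analysis.split("/", 1)[1]   # drop the echoed surface form
        out.append((surface.strip(), lemma_part.split("/")))
    return out

# Sample payloads copied from the documented examples:
sample_translate = '{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}'
sample_analyse = '[["?/?<sent>", "?"]]'

print(translate_url("kaz|tat", "Сен бардың ба?"))
print(translated_text(sample_translate))
print(readings(sample_analyse))
```

Real code would fetch the URL (e.g. with urllib or requests) before parsing; the parsing helpers above work the same either way.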




Then run APY with <code>--ssl-key server.key --ssl-cert server.crt</code>, and test with HTTPS and the -k argument to curl (-k means curl accepts self-signed or even slightly "lying" signatures):
<pre>
curl -k -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze
</pre>


== Gateway ==
A gateway for APY is located in the [https://github.com/apertium/apertium-apy same repository] and provides functionality such as silently intercepting and forwarding requests, and aggregating APY instance capabilities for overriding <code>/list</code> requests. For example, a gateway providing access to two servers with different language-pair capabilities will report the aggregated capabilities to the client, hiding the existence of the two servers.


A list of APY servers is a required positional argument; an example server list is [https://github.com/apertium/apertium-apy/blob/master/serverlist-example provided] in the same repository. If the gateway is asked to run on an already occupied port, it will traverse the available ports until it can bind to a free one.


The gateway currently uses a "Fastest" load-balancing strategy that continuously adapts to changing circumstances based on the clients' requests. On initialization, all servers are assigned a weight of 0, so every server will eventually be tried as the gateway determines the server speeds. The gateway stores a moving average of the last ''x'' requests for each <code>(mode, language)</code> pair and forwards requests to the fastest server as measured in response time per response length.
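As an illustration only (not the gateway's actual code), the routing idea can be sketched in Python like this:

```python
from collections import defaultdict, deque

class FastestBalancer:
    """Toy sketch of a 'Fastest' strategy: per (mode, language) key, keep a
    moving average of response time per response length for each server and
    route to the server with the lowest average (untried servers score 0,
    so every server gets tried eventually)."""

    def __init__(self, servers, window=5):
        self.servers = servers
        # (key, server) -> recent time-per-length samples, bounded window
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def average(self, key, server):
        s = self.samples[(key, server)]
        return sum(s) / len(s) if s else 0.0   # weight 0 until first sample

    def pick(self, key):
        return min(self.servers, key=lambda srv: self.average(key, srv))

    def record(self, key, server, seconds, response_length):
        self.samples[(key, server)].append(seconds / response_length)

lb = FastestBalancer(["apy1", "apy2"])
key = ("translate", "kaz-tat")
lb.record(key, "apy1", 0.4, 100)   # 0.004 s per char
lb.record(key, "apy2", 0.1, 100)   # 0.001 s per char: faster
chosen = lb.pick(key)
print(chosen)
```

Because untried servers score 0, a fresh balancer prefers servers it has no data for, which matches the "all servers start at weight 0" behaviour described above.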
==Running on init==
===Systemd===
See [[Apy/Debian]] for the quickstart.

====Running as a --user unit====
If you want to be able to start and stop apy as a non-root user, you'll first have to get your administrator to run some commands. Say your user is named "tussenvoegsel"; the admin will have to run:
<pre>
sudo apt-get install dbus libpam-systemd # or dnf on Fedora etc.
sudo loginctl enable-linger tussenvoegsel
</pre>
To read the logs without sudo, the admin will also have to enable persistent logs (see [[Apertium-apy#Persistent_logs|below]]).


Then as your "tussenvoegsel" user, do
<pre>
mkdir -p ~/.config/systemd/user/
git clone https://github.com/apertium/apertium-apy
cp ~/apertium-apy/tools/systemd/apy.service ~/.config/systemd/user/
</pre>
Now edit ~/.config/systemd/user/apy.service: remove PrivateTmp, set the User to "tussenvoegsel" (or whatever it is), and point the WorkingDirectory/ExecStart paths at /home/tussenvoegsel/apertium-apy.

Here's a full example apy.service file:
<pre>
$ cat ~/.config/systemd/user/apy.service
[Unit]
Description=Translation server and API for Apertium
Documentation=http://wiki.apertium.org/wiki/Apertium-apy
After=network.target
[Service]
WorkingDirectory=/home/tussenvoegsel/apertium-apy
ExecStart=/usr/bin/python3 /home/tussenvoegsel/apertium-apy/servlet.py /usr/share/apertium/modes
Restart=always
WatchdogSec=10s
[Install]
WantedBy=multi-user.target
</pre>


You should now be able to do:
<pre>
systemctl --user daemon-reload # re-read the edited apy.service file
systemctl --user start apy # start apy immediately
systemctl --user stop apy # stop apy immediately
systemctl --user enable apy # make apy start after next reboot
systemctl --user status apy # check if apy is running
journalctl -f --user-unit apy # follow the apy logs
journalctl -n100 --user-unit apy # show last 100 lines of apy logs
curl 'localhost:2737/listPairs' # show installed pairs
curl 'localhost:2737/translate?q=ja+nu&langpair=sme|nob' # translate some words
</pre>

====Persistent logs====
By default, logs are not persistent across reboots nor readable without sudo. The below commands fix this:
<pre>
sudo mkdir /var/log/journal
sudo systemctl restart systemd-journald
</pre>

===Upstart===
You can use upstart scripts to automatically run the apy and html-tools on startup and respawn the processes when they get killed. If you don't have upstart installed: <code>sudo apt-get install upstart</code>


View the status and PID: <code>sudo status JOB</code>


====Logging====
The log files of the processes can be found in the <code>/var/log/upstart/</code> folder.




==TODO==
* hfst-proc -g and lrx-proc don't work with null-flushing, see https://sourceforge.net/p/hfst/bugs/240/ and https://sourceforge.net/p/apertium/tickets/45/
* It should be possible to set a time-out for translation threads, so if a translation is taking too long, it gets killed and the queue moves along.
* It should use one lock per pipeline, since we don't need to wait for mk-en just because sme-nob is running.
* http://stackoverflow.com/a/487281/69663 recommends select/polling over threading (http://docs.python.org/3.3/library/socketserver.html for more on the differences) but requires either lots of manually written dispatching code (http://pymotw.com/2/select/) or a framework like Twisted.
* some language pairs still don't work (sme-nob?)
* translation cache
* variants like ca_valencia, oc_aran and pt_BR look odd on the web page?
* gateway: we need a way to have a second server running only the most popular language pairs, and a gateway that sends requests to whichever server has the requested pair. Simply doing -j2 is not a good solution, since we'd waste a lot of RAM on keeping open pipelines that are rarely used. (Or we could turn off pipelines after not being used for a while …)
* http://apy.projectjj.com/ currently shows a 404, / should show some sort of general info about the server and a link to this wiki page


==Troubleshooting==

=== CRITICAL:root:apy.py APy needs a UTF-8 locale, please set … ===
Do <pre>export LC_ALL=C.UTF-8</pre> and put that line in your ~/.bashrc.
and put that line in your ~/.bashrc

See also [[Installation_troubleshooting#Warning:_unsupported_locale.2C_fallback_to_.22C.22.22]].

=== listen tcp 0.0.0.0:2737: bind: address already in use ===
Probably apy is already running, or some other program is holding the port open.

See what programs are using port 2737 with
<pre>
lsof -i :2737
</pre>
or
<pre>
netstat -pna | grep 2737
</pre>

If you're using docker, you may have to <code>sudo</code> those commands (lsof and netstat don't write anything, so that Should Be Safe™)

===forking problems on systemd 228 ===
If you get errors like
<pre>
HTTPServerRequest(protocol='http', host='apy.projectjj.com', method='GET', uri='/translate?langpair=nob%7Cnno&q=ikke%0A%0A&callback=_jqjsp&_146183949405=', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'Keep-Alive', 'Cookie': '_pk_ref.1.9697=%5B%22%2C%22%22%2C146183942%2C%22https%3A%2F%2Fwww.google.no%2F%22%5D; _pk_id.1.9697=96baa844663e946.1441366937.7.146839495.1461839482.; _pk_ses.1.9697=*', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, sdch', 'X-Forwarded-Server': 'www.apertium.org, apy.projectjj.com', 'X-Forwarded-For': '152.93.00.00, 193.145.00.00', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.0 Safari/537.36', 'Accept-Language': 'nb-NO,nb;q=0.8,no;q=0.6,nn;q=0.4,en-US;q=0.2,en;q=0.2', 'Host': 'apy.projectjj.com', 'Referer': 'https://www.apertium.org/index.nob.html?dir=nob-nno', 'X-Forwarded-Host': 'www.apertium.org, apy.projectjj.com'})
Traceback (most recent call last):
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/web.py", line 1415, in _execute
result = yield result
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/home/apertium/apertium-apy/servlet.py", line 389, in get
self.get_argument('markUnknown', default='yes'))
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/home/apertium/apertium-apy/servlet.py", line 369, in translateAndRespond
translated = yield pipeline.translate(toTranslate, nosplit)
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
yielded = self.gen.throw(*exc_info)
File "/home/apertium/apertium-apy/translation.py", line 69, in translate
parts = yield [translateNULFlush(part, self) for part in all_split]
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
value = future.result()
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 656, in callback
result_list.append(f.result())
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 879, in run
yielded = self.gen.send(value)
File "/home/apertium/apertium-apy/translation.py", line 214, in translateNULFlush
proc_deformat = Popen("apertium-deshtml", stdin=PIPE, stdout=PIPE)
File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.5/subprocess.py", line 1480, in _execute_child
restore_signals, start_new_session, preexec_fn)
BlockingIOError: [Errno 11] Resource temporarily unavailable
</pre>
on systems with systemd>=228 and linux>=4.3, then you're likely hitting systemd's TasksMax attribute, which limits tasks to 512 per cgroup(?) or 4096 per user (similar to ulimit task limits). See http://unix.stackexchange.com/questions/253903/creating-threads-fails-with-resource-temporarily-unavailable-with-4-3-kernel/255603#255603 for details; basically you want to raise the DefaultTasksMax or UserTasksMax settings.

===logging errors===
If you encounter errors involving <code>enable_pretty_logging()</code> while starting APY, comment out the line with a leading <code>#</code> to solve the issue.
: What was the error? This should be possible to fix / work around.

===High IO usage===
If you are logging unknowns (-f / --missing-freqs), you should probably also give some value to -M (e.g. -M1000), otherwise you might get a lot of disk usage on that sqlite file.
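The idea behind such an unknown-word log can be sketched with Python's sqlite3 module; note that the schema below is hypothetical, not necessarily what APY actually writes:

```python
import sqlite3

# Hypothetical sketch of logging unknown-word frequencies to sqlite
# (the real --missing-freqs schema may differ). Each unknown word bumps a
# per-pair counter; without batching (cf. -M), every bump is a disk write.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE missing (pair TEXT, word TEXT, freq INTEGER, "
           "PRIMARY KEY (pair, word))")

def log_unknown(pair, word):
    # Upsert: insert with freq 1, or increment the existing counter.
    db.execute("INSERT INTO missing VALUES (?, ?, 1) "
               "ON CONFLICT(pair, word) DO UPDATE SET freq = freq + 1",
               (pair, word))

for w in ["foo", "foo", "bar"]:
    log_unknown("kaz-tat", w)

counts = dict(db.execute("SELECT word, freq FROM missing"))
print(counts)
```

This is why -M (batching writes) matters: one sqlite write per unknown token adds up quickly on a busy server.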

==='return' with argument inside generator on python 3.2 or older===
<pre>
Traceback (most recent call last):
File "./servlet.py", line 25, in <module> import translation
File "translation.py", line 132
return proc_reformat.communicate()[0].decode('utf-8')
SyntaxError: 'return' with argument inside generator
</pre>
Solution: upgrade to Python 3.3 or newer.
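Background: before Python 3.3, <code>return</code> with a value inside a generator was a SyntaxError; from 3.3 onwards it is allowed, and the value is delivered via <code>StopIteration.value</code>, which coroutine-style code such as APY's (via Tornado) builds on. A quick illustration:

```python
# In Python >= 3.3 a generator may return a value; callers receive it via
# StopIteration.value (this is what coroutine frameworks like Tornado build on).
def gen():
    yield 1
    return "done"   # SyntaxError on Python <= 3.2

g = gen()
first = next(g)     # the yielded value
try:
    next(g)
    result = None
except StopIteration as e:
    result = e.value
print(first, result)
```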


==Docs==
* [[/Translation]]
* [[/Debian]] – quickstart installation guide for running your very own APY server on Debian, Ubuntu etc.
* [[/Threading]]
* [[/Fedora]] – quickstart installation guide for running your very own APY server on Fedora

== Please cite ==
* https://www.aclweb.org/anthology/W18-2207/


[[Category:Tools]]
[[Category:Services]]
[[Category:Documentation]]

Latest revision as of 19:54, 1 August 2024

Apertium-APy stands for "Apertium API in Python". It's a simple Apertium API server written in Python 3, meant as a drop-in replacement for ScaleMT. Its primary/intended purpose is requests from web applications, though it's fairly versatile. It is currently found in GitHub, where servlet.py contains the relevant web server bits. The server is used by front ends like apertium-html-tools (on apertium.org) and Mediawiki Content Translation.


The https://apertium.org page uses an installation which currently only runs released language pairs (also available from https://apertium.org/apy if you prefer). However, APY is very easy to set up on your own server, where you can run all the development pairs and even analysers and taggers (like what http://turkic.apertium.org does), read on for how to do that.

Test it!

$ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse

[["алдым/алд<n><px1sg><nom>/алд<n><px1sg><nom>+э<cop><aor><p3><pl>/алд<n><px1sg><nom>+э<cop><aor><p3><sg>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]]

Installation

See /Debian for a complete quickstart installation guide for Debian, Ubuntu, Linux Mint, etc. that uses the prebuilt binaries.

First, install apertium/lttoolbox/apertium-lex-tools, and your language pairs. See Installation for how to do this.

You should have Python 3.4 or newer (though 3.2 has been reported to work as of 324a185).

APY uses Tornado 3.1 or newer as its web framework. Ensure that you install the Python 3.4 (or newer) versions of any dependencies. On Debian/Ubuntu, you can do

sudo apt-get install build-essential python3-dev python3-pip zlib1g-dev subversion
sudo pip3 install --upgrade tornado

Then clone APY from github and run it:

git clone git@github.com:apertium/apertium-apy.git
cd apertium-apy
./servlet.py /usr/share/apertium   # the server will use all .mode files from under this directory, use /usr/local/share/apertium for "make install"ed pairs

See ./servlet.py --help for documentation on how to start APY. Here are some popular optional arguments:

  • -l --lang-names: path to sqlite3 database of localized language names (see #List localised language names; you should include this if you're using apertium-html-tools)
  • -p --port: port to run server on (2737 by default)
  • -c --ssl-cert: path to SSL certificate
  • -k --ssl-key: path to SSL key file
  • -j --num-processes: number of http processes to run (default = 1; use 0 to run one http server per core, where each http server runs all available language pairs)
  • -s --nonpairs-path: include .mode files from this directory, like with the main argument, but skip translator (pair) modes; only analyser/generator/tagger modes from this directory are included (handy for use with an apertium checkout)
  • -f --missing-freqs: path to sqlite3 database of words that were unknown (requires sudo apt-get install sqlite3)
  • -i --max-pipes-per-pair: how many pipelines we can have per language pair (per http server), default = 1
  • -u --max-users-per-pipe: if there are this many concurrent users in the least-used pipeline of a pair (and we haven't reached max-pipes-per-pair), start a new pipeline (default = 5)
  • -m --max-idle-secs: after each translation request, go through the list of language pairs and shut down any pair that hasn't been used in the last MAX_IDLE_SECS seconds (to save on RAM)
  • -n --min-pipes-per-pair: when shutting down idle pairs, keep at least this many open (default = 0)
  • -r --restart-pipe-after: if a pipeline has been used for this many requests, shut it down (to avoid possible memory creep if a pair has bugs) after it has handled its current requests

Installing dependencies without root

If you don't have root, you can still install the python dependencies with

$ pip3 install --user --upgrade tornado

(But your server still needs build-essential python3-dev python3-pip zlib1g-dev installed.)

Then you just need to run

PYTHONPATH="/usr/local/lib/python3.3/site-packages:${PYTHONPATH}"; export PYTHONPATH

before starting APY.

Installing dependencies without root or pip3

Your server still needs python3 (and probably build-essential python3-dev zlib1g-dev), but this approach is simpler if you don't want to mess with pip.

Just go to https://pypi.python.org/pypi/tornado/#downloads and get the newest .tar.gz source release; say it was stored as ~/Nedlastingar/tornado-4.3.tar.gz, then do

cd apertium-apy
tar xf ~/Nedlastingar/tornado-4.3.tar.gz 
( cd tornado-4.3 && python3 setup.py build )
ln -s tornado-4.3/build/lib*/tornado tornado

Optional features

List localised language names

If you use apertium-html-tools, you probably want localised language names instead of three-letter codes. To get this, first install sqlite3 (on Debian/Ubuntu that's sudo apt-get install sqlite3), then do

make

to create the langNames.db used for the /listLanguageNames function.
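The lookup that langNames.db serves can be pictured with a toy in-memory database. Note that the table and column names below (lg = locale, inLg = the code being named) are assumptions for illustration, not necessarily the real schema:

```python
import sqlite3

# Toy in-memory stand-in for langNames.db (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languageNames (lg TEXT, inLg TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO languageNames VALUES (?, ?, ?)",
    [("fr", "en", "anglais"), ("fr", "kk", "kazakh"), ("fr", "tat", "tatar")],
)

def list_language_names(locale, languages):
    """Rough equivalent of /listLanguageNames?locale=...&languages=..."""
    placeholders = ",".join("?" * len(languages))
    rows = conn.execute(
        "SELECT inLg, name FROM languageNames"
        " WHERE lg = ? AND inLg IN (%s)" % placeholders,
        [locale] + list(languages),
    )
    return dict(rows)

print(list_language_names("fr", ["en", "kk"]))
```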

Language identification

The /identifyLang function can provide language identification.

If you install Compact Language Detection 2 (CLD2), you get fast and fairly accurate language detection. Installation can be a bit tricky though.


Alternatively, you can start servlet.py with the -s argument pointing to a directory of language pairs with analyser modes, in which case APY will try to do language detection by analysing the text and finding which analyser had the least unknowns. This is a bit slow though :-)

APY will prefer using CLD2 if it's available, otherwise fall back to analyser coverage.
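The analyser-coverage fallback can be sketched roughly as follows (hypothetical code, not APY's actual implementation): analyse the text with each available analyser and pick the language whose output has the fewest unknown tokens, which Apertium analysers conventionally mark with a leading "*":

```python
def coverage(analysed_tokens):
    """Fraction of tokens the analyser recognised ("*" marks unknowns)."""
    if not analysed_tokens:
        return 0.0
    known = sum(1 for t in analysed_tokens if not t.startswith("*"))
    return known / len(analysed_tokens)

def identify(analyses_by_lang):
    """analyses_by_lang: {lang: [analysed tokens]} -> best-covered language."""
    return max(analyses_by_lang, key=lambda lang: coverage(analyses_by_lang[lang]))

# Toy output from two hypothetical analysers over the same text:
analyses = {
    "eng": ["this<prn>", "works<vblex>", "*bien"],
    "spa": ["*this", "*works", "bien<adv>"],
}
print(identify(analyses))  # eng
```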

Usage

APY supports three types of requests: GET, POST, and JSONP. GET and POST requests from a web page are only possible if APY is running on the same origin as the client, due to the browser's same-origin policy; JSONP requests, however, work from any page. Using curl, APY can easily be tested:

curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord

It can also be tested through your browser or any other HTTP client. Since curl does not pretty-print JSON output by default, an APY sandbox is provided with apertium-html-tools to make testing easier.
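If you script against APY rather than using curl, the main thing to get right is percent-encoding of the q parameter (which is usually non-ASCII). A minimal sketch using only the Python standard library, assuming an APY at the default port:

```python
from urllib.parse import urlencode

# Build a correctly percent-encoded APY request URL by hand
# (equivalent to what curl -G --data does above).
base = "http://localhost:2737"

def apy_url(endpoint, **params):
    return "%s/%s?%s" % (base, endpoint, urlencode(params))

url = apy_url("perWord", lang="kaz-tat", modes="morph", q="алдым")
print(url)
```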

Each endpoint below is listed with its function, its parameters, and example output.
/listPairs List available language pairs
  • include_deprecated_codes: give this parameter to include old ISO-639-1 codes in output
To be consistent with ScaleMT, the returned JS Object contains a responseData key with an Array of language pair objects with keys sourceLanguage and targetLanguage.
$ curl 'http://localhost:2737/listPairs'

{"responseStatus": 200, "responseData": [
 {"sourceLanguage": "kaz", "targetLanguage": "tat"}, 
 {"sourceLanguage": "tat", "targetLanguage": "kaz"}, 
 {"sourceLanguage": "mk", "targetLanguage": "en"}
], "responseDetails": null}
/list List available mode information
  • q: type of information to list
    • pairs (alias for /listPairs)
    • analyzers/analysers
    • generators
    • taggers/disambiguators
The returned JS Object contains a mapping from language pairs to mode names (used internally by Apertium).
$ curl 'http://localhost:2737/list?q=analyzers'
{"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph", 
 "tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}
$ curl 'http://localhost:2737/list?q=generators'
{"en-es": "en-es-generador", "fin": "fin-gener", "es-en": "es-en-generador"}
$ curl 'http://localhost:2737/list?q=taggers'
{"es-en": "es-en-tagger", "en-es": "en-es-tagger", "mk-en": "mk-en-tagger",
 "tat-kaz": "tat-kaz-tagger", "kaz-tat": "kaz-tat-tagger", "kaz": "kaz-tagger"}
/translate Translate text
  • langpair: language pair to use for translation
  • q: text to translate
  • markUnknown=no (optional): include this to remove "*" in front of unknown words
  • deformat: deformatter to be used: one of html (default), txt, rtf
  • reformat: reformatter to be used: one of html, html-noent (default), txt, rtf
  • format: if deformatter and reformatter are the same, they can be specified here

For more about formatting, please see Format Handling.

To be consistent with ScaleMT, the returned JS Object contains a responseData key whose value is a JS Object with a translatedText key containing the translated text.
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}
$ echo Сен бардың ба? > myfile
$ curl --data-urlencode 'q@myfile' 'http://localhost:2737/translate?langpair=kaz|tat'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}

The following two queries contain nonstandard whitespace characters and are equivalent:

$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This    works well&deformat=txt&reformat=txt'
{"responseStatus": 200, "responseData": {"translatedText": "Esto    trabaja\u2001bien"}, "responseDetails": null}
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This    works well&format=txt'
{"responseStatus": 200, "responseData": {"translatedText": "Esto    trabaja\u2001bien"}, "responseDetails": null}

The following two queries illustrate the difference between the html and html-noent reformatter:

$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html'
{"responseData": {"translatedText": "Qu&eacute; hace este trabajo?"}, "responseDetails": null, "responseStatus": 200}
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html-noent'
{"responseData": {"translatedText": "Qu\u00e9 hace este trabajo?"}, "responseDetails": null, "responseStatus": 200}
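A client consuming the ScaleMT-style /translate response only needs to unwrap two keys, for example:

```python
import json

# Unpacking a /translate response of the shape shown above.
raw = ('{"responseStatus": 200, '
       '"responseData": {"translatedText": "Син барныңмы?"}, '
       '"responseDetails": null}')
resp = json.loads(raw)
if resp["responseStatus"] == 200:
    print(resp["responseData"]["translatedText"])  # Син барныңмы?
```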
/translateDoc Translate a document (.odt, .txt, .rtf, .html, .docx, .pptx, .xlsx, .tex)
  • langpair: language pair to use for translation
  • file: document to translate
  • markUnknown=no (optional): include this to remove "*" in front of unknown words
Returns the translated document.
$ curl --form 'file=@/path/to/kaz.odt' 'http://localhost:2737/translateDoc?langpair=kaz|tat' > tat.odt
/analyze or /analyse Morphologically analyze text
  • lang: language to use for analysis
  • q: text to analyze
The returned JS Array contains JS Arrays in the format [analysis, input-text].
$ curl -G --data "lang=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"], ["?/?<sent>","?"]]
/generate Generate surface forms from text
  • lang: language to use for generation
  • q: text to generate
The returned JS Array contains JS Arrays in the format [generated, input-text].
$ curl -G --data "lang=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate
[["сен","^сен<v><tv><imp><p2><sg>$ "]]
/perWord Perform morphological tasks per word
  • lang: language to use for tasks
  • modes: morphological tasks to perform on text (15 combinations possible - delimit using '+')
    • tagger/disambig
    • biltrans
    • translate
    • morph
  • q: text to perform tasks on
The returned JS Array contains JS Objects each containing the key input and up to 4 other keys corresponding to the requested modes (tagger, morph, biltrans and translate).
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger&q=let+there+be+light'
[{"input": "let", "tagger": "let<vblex><pp>"}, {"input": "there", "tagger": "there<adv>"}, {"input": "be", "tagger": "be<vbser><inf>"}, {"input": "light", "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+tagger&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=translate&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=biltrans&q=let+there+be+light'
[{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+biltrans&q=let+there+be+light'
[{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+translate&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans+tagger&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans+tagger&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+tagger&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans+tagger&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]
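The "15 combinations" mentioned for the modes parameter above are simply the non-empty subsets of the four modes, as a quick check confirms:

```python
from itertools import combinations

modes = ["tagger", "biltrans", "translate", "morph"]
# Every non-empty subset of the four perWord modes:
subsets = [list(c) for r in range(1, len(modes) + 1)
           for c in combinations(modes, r)]
print(len(subsets))  # 15
```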

/listLanguageNames Get localized language names
  • locale: language to get localized language names in
  • languages: list of '+' delimited language codes to retrieve localized names for (optional - if not specified, all available codes will be returned)
The returned JS Object contains a mapping of requested language codes to localized language names
$ curl 'http://localhost:2737/listLanguageNames?locale=fr&languages=ca+en+mk+tat+kk'
{"ca": "catalan", "en": "anglais", "kk": "kazakh", "mk": "macédonien", "tat": "tatar"}
/calcCoverage Get coverage of a language on a text
  • lang: language to analyze with
  • q: text to analyze for coverage
The returned JS Array contains a single floating point value ≤ 1 that indicates the coverage.
$ curl 'http://localhost:2737/calcCoverage?lang=en-es&q=Whereas disregard and contempt for which have outraged the conscience of mankind'
[0.9230769230769231]
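The coverage value is just the fraction of analysed tokens that were known. Assuming 12 known tokens out of 13 (a guess about how the example text was tokenised, not taken from APY's source), you get exactly the figure above:

```python
def calc_coverage(tokens_known, tokens_total):
    """Coverage = known tokens / total tokens."""
    return tokens_known / tokens_total

print(calc_coverage(12, 13))  # 0.9230769230769231
```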
/identifyLang Return a list of languages with probabilities of the text being in that language. Uses CLD2 if that's installed, otherwise will try any analyser modes.
  • q: text which you would like to compute probabilities for
The returned JS Object contains a mapping from language codes to probabilities.
$ curl 'http://localhost:2737/identifyLang?q=This+is+a+piece+of+text.'
{"ca": 0.19384234, "en": 0.98792465234, "kk": 0.293442432, "zh": 0.002931001}
/stats Return some statistics about pair usage, uptime, portion of time spent actively translating
  • requests=N (optional): limit period-based stats to last N requests
Note that period-based stats are limited to 3600 seconds by default (see -T argument to servlet.py)
$ curl -Ss localhost:2737/stats|jq .responseData
{
  "holdingPipes": 0,
  "periodStats": {
    "totTimeSpent": 10.760803,
    "ageFirstRequest": 19.609394,
    "totChars": 2718,
    "requests": 8,
    "charsPerSec": 252.58
  },
  "runningPipes": {
    "eng-spa": 1
  },
  "useCount": {
    "eng-spa": 8
  },
  "uptime": 26
}
/spellCheck Note: This endpoint is not yet available in the main branch. Handles spell-checking requests using Voikko or Divvun spell checkers.
  • q: The text to be spell-checked (String, Required, e.g., `қазақша билмеймін`)
  • lang: The language of the text (String, Required, e.g., `kaz`)
  • spellchecker: The spell checker to use (String, Optional, Defaults to `voikko`, e.g., `divvun`)
The output is a JSON array where each element represents a token from the input text. Each token includes the following information:
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz'
[
  {"token": "қазақша", "known": true, "sugg": []},
  {"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейміз"]}
]

$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz&spellchecker=divvun'
[
  {"token": "қазақша", "known": true, "sugg": []},
  {"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейтін", "білмейін", "білмейміз", "иілмеймін", "тілмеймін", "ілмеймін"]}
]
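Consuming the /spellCheck output, e.g. to pick a correction for each unknown token, is a small JSON-walking exercise (taking the first suggestion is our arbitrary choice here, not something APY prescribes):

```python
import json

# A /spellCheck response of the shape shown above.
raw = ('[{"token": "қазақша", "known": true, "sugg": []}, '
       '{"token": "билмеймін", "known": false, '
       '"sugg": ["білмеймін", "билеймін"]}]')

# Map each unknown token to its first suggestion.
corrections = {t["token"]: t["sugg"][0]
               for t in json.loads(raw)
               if not t["known"] and t["sugg"]}
print(corrections)  # {'билмеймін': 'білмеймін'}
```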

SSL

APY supports HTTPS out of the box. To test with a self-signed certificate, create a certificate and key by running:

openssl req -new -x509 -keyout server.key -out server.crt -days 365 -nodes

Then run APY with --ssl-key server.key --ssl-cert server.crt, and test over HTTPS with curl's -k argument (-k makes curl accept self-signed or otherwise invalid certificates):

curl -k -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze

If you have a real signed certificate, you should be able to use curl without -k for the domain which the certificate is signed for:

curl -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze

Remember to open port 2737 in your server's firewall.

Gateway

A gateway for APY is located in the same directory and provides functionality such as silently intercepting and forwarding requests, and aggregating APY instance capabilities to answer /list requests. For example, a gateway providing access to two servers with different language pairs will report their aggregated capabilities to the client, hiding the fact that there are two servers.

A list of APY servers is a required positional argument; an example server list is provided in the same directory. If the gateway is asked to run on an already occupied port, it will traverse the available ports until it can bind to a free one.

The gateway currently uses a "fastest" load-balancing paradigm that continuously adapts to changing circumstances based on clients' requests. On initialization, all servers are assigned a weight of 0, so each server will eventually be utilized as the gateway determines the server speeds. The gateway stores a moving average over the last x requests for each (mode, language) pair and forwards each request to the fastest server, measured in units of response time per response length.
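The "fastest" routing idea can be sketched like this (hypothetical code, not the gateway's actual implementation, and simplified to track one moving average per server rather than per (mode, language) pair):

```python
from collections import deque

WINDOW = 5  # the "last x requests"

class FastestBalancer:
    def __init__(self, servers):
        # Per-server moving window of (response time / response length).
        self.history = {s: deque(maxlen=WINDOW) for s in servers}

    def record(self, server, seconds, response_length):
        self.history[server].append(seconds / max(response_length, 1))

    def pick(self):
        # Unmeasured servers score 0, so every server gets tried eventually.
        def score(server):
            h = self.history[server]
            return sum(h) / len(h) if h else 0.0
        return min(self.history, key=score)

b = FastestBalancer(["a", "b"])
b.record("a", 0.2, 100)   # 0.002 s per char
b.record("b", 0.1, 1000)  # 0.0001 s per char: faster per unit of output
print(b.pick())  # b
```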

Running on init

Systemd

See Apy/Debian for the quickstart.

Running as a --user unit

If you want to be able to start and stop APY as a non-root user, you'll first have to get your administrator to run some commands. Say your user is named "tussenvoegsel"; the admin will have to do:

sudo apt-get install dbus libpam-systemd  # or dnf on Fedora etc.
sudo loginctl enable-linger tussenvoegsel

To read the logs without sudo, the admin will also have to enable persistent logs (see below).


Then as your "tussenvoegsel" user, do

mkdir -p ~/.config/systemd/user/
git clone https://github.com/apertium/apertium-apy
cp ~/apertium-apy/tools/systemd/apy.service ~/.config/systemd/user/

Now edit .config/systemd/user/apy.service: remove PrivateTmp, set the User to "tussenvoegsel" (or whatever it is), and point the WorkingDirectory/ExecStart paths at /home/tussenvoegsel/apertium-apy.

Here's a full example apy.service file:

$ cat ~/.config/systemd/user/apy.service 
[Unit]
Description=Translation server and API for Apertium
Documentation=http://wiki.apertium.org/wiki/Apertium-apy
After=network.target
[Service]
WorkingDirectory=/home/tussenvoegsel/apertium-apy
ExecStart=/usr/bin/python3 /home/tussenvoegsel/apertium-apy/servlet.py /usr/share/apertium/modes
Restart=always
WatchdogSec=10s
[Install]
WantedBy=multi-user.target


You should now be able to do:

systemctl --user daemon-reload   # re-read the edited apy.service file
systemctl --user start apy       # start apy immediately
systemctl --user stop apy        # stop apy immediately
systemctl --user enable apy      # make apy start after next reboot
systemctl --user status apy      # check if apy is running
journalctl -f --user-unit apy    # follow the apy logs
journalctl -n100 --user-unit apy # show last 100 lines of apy logs
curl 'localhost:2737/listPairs'  # show installed pairs
curl 'localhost:2737/translate?q=ja+nu&langpair=sme|nob' # translate some words

Persistent logs

By default, logs are neither persistent across reboots nor readable without sudo. The commands below fix this:

sudo mkdir /var/log/journal
sudo systemctl restart systemd-journald

Upstart

You can use upstart scripts to automatically run APY and html-tools on startup and respawn the processes when they get killed. If you don't have upstart installed: sudo apt-get install upstart

The apertiumconfig file contains paths of some apertium directories and the serverlist file. It can be saved anywhere. Make sure the paths are correct!

/home/user/apertiumconfig

APERTIUMPATH=/home/user
APYPATH=/home/user/apertium-apy
SERVERLIST=/home/user/serverlist
HTMLTOOLSPATH=/home/user/apertium-html-tools
#optional, see 'Logging':
LOGFILE=/home/user/apertiumlog  

The following upstart scripts have to be saved in /etc/init.

apertium-all.conf

description "start/stop all apertium services"
     
start on startup

apertium-apy.conf

description "apertium-apy init script"

start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300

env CONFIG=/etc/default/apertium

script
    . $CONFIG
    python3 $APYPATH/servlet.py $APERTIUMPATH
end script

apertium-apy-gateway.conf

description "apertium-apy gateway init script"
     
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
     
env CONFIG=/home/user/apertiumconfig

script
    . $CONFIG
    python3 $APYPATH/gateway.py $SERVERLIST
end script 

apertium-html-tools.conf

description "apertium-html-tools init script"
           
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
     
env CONFIG=/etc/default/apertium

script
    . $CONFIG
    cd $HTMLTOOLSPATH
    python3 -m http.server 8888
end script

Use sudo start apertium-all to start all services. Just like the filenames, the jobs are called apertium-apy, apertium-apy-gateway and apertium-html-tools.

The jobs can be independently started by: sudo start JOB

You can stop them by using sudo stop JOB

Restart: sudo restart JOB

View the status and PID: sudo status JOB

Logging

The log files of the processes can be found in the /var/log/upstart/ folder.

The starting/stopping of the jobs can be logged by appending this to the end of apertium-apy.conf, apertium-apy-gateway.conf and apertium-html-tools.conf files.

pre-start script
	. $CONFIG
	touch $LOGFILE
	echo "`date` $UPSTART_JOB started" >> $LOGFILE	
end script

post-stop script
	. $CONFIG
	touch $LOGFILE
	echo "`date` $UPSTART_JOB stopped" >> $LOGFILE	
end script

TODO

  • hfst-proc -g and lrx-proc don't work with null-flushing, see https://sourceforge.net/p/hfst/bugs/240/ and https://sourceforge.net/p/apertium/tickets/45/
  • translation cache
  • variants like ca_valencia, oc_aran and pt_BR look odd on the web page?
  • gateway: we need a way to have a second server running only the most popular language pairs, and a gateway that sends requests to whichever server has the requested pair. Simply doing -j2 is not a good solution, since we'd waste a lot of RAM on keeping open pipelines that are rarely used. (Or we could turn off pipelines after not being used for a while …)

Troubleshooting

CRITICAL:root:apy.py APy needs a UTF-8 locale, please set …

Do

 export LC_ALL=C.UTF-8

and put that line in your ~/.bashrc.

See also Installation_troubleshooting#Warning:_unsupported_locale.2C_fallback_to_.22C.22.22.

listen tcp 0.0.0.0:2737: bind: address already in use

Probably apy is already running, or some other program is holding the port open.

See what programs are using port 2737 with

lsof -i :2737

or

netstat -pna | grep 2737

If you're using docker, you may have to sudo those commands (lsof and netstat don't write anything, so that Should Be Safe™)

forking problems on systemd 228

If you get errors like

   HTTPServerRequest(protocol='http', host='apy.projectjj.com', method='GET', uri='/translate?langpair=nob%7Cnno&q=ikke%0A%0A&callback=_jqjsp&_146183949405=', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'Keep-Alive', 'Cookie': '_pk_ref.1.9697=%5B%22%2C%22%22%2C146183942%2C%22https%3A%2F%2Fwww.google.no%2F%22%5D; _pk_id.1.9697=96baa844663e946.1441366937.7.146839495.1461839482.; _pk_ses.1.9697=*', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, sdch', 'X-Forwarded-Server': 'www.apertium.org, apy.projectjj.com', 'X-Forwarded-For': '152.93.00.00, 193.145.00.00', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.0 Safari/537.36', 'Accept-Language': 'nb-NO,nb;q=0.8,no;q=0.6,nn;q=0.4,en-US;q=0.2,en;q=0.2', 'Host': 'apy.projectjj.com', 'Referer': 'https://www.apertium.org/index.nob.html?dir=nob-nno', 'X-Forwarded-Host': 'www.apertium.org, apy.projectjj.com'})
    Traceback (most recent call last):
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/web.py", line 1415, in _execute
        result = yield result
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
        value = future.result()
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
        yielded = self.gen.throw(*exc_info)
      File "/home/apertium/apertium-apy/servlet.py", line 389, in get
        self.get_argument('markUnknown', default='yes'))
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
        value = future.result()
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
        yielded = self.gen.throw(*exc_info)
      File "/home/apertium/apertium-apy/servlet.py", line 369, in translateAndRespond
        translated = yield pipeline.translate(toTranslate, nosplit)
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
        value = future.result()
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
        yielded = self.gen.throw(*exc_info)
      File "/home/apertium/apertium-apy/translation.py", line 69, in translate
        parts = yield [translateNULFlush(part, self) for part in all_split]
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
        value = future.result()
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 656, in callback
        result_list.append(f.result())
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 879, in run
        yielded = self.gen.send(value)
      File "/home/apertium/apertium-apy/translation.py", line 214, in translateNULFlush
        proc_deformat = Popen("apertium-deshtml", stdin=PIPE, stdout=PIPE)
      File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
        restore_signals, start_new_session)
      File "/usr/lib/python3.5/subprocess.py", line 1480, in _execute_child
        restore_signals, start_new_session, preexec_fn)
    BlockingIOError: [Errno 11] Resource temporarily unavailable

If you get this error on systems with systemd >= 228 and Linux >= 4.3, you are likely hitting systemd's TasksMax attribute, which by default limits each cgroup to 512 tasks and each user to 4096 (similar to ulimit task limits). See http://unix.stackexchange.com/questions/253903/creating-threads-fails-with-resource-temporarily-unavailable-with-4-3-kernel/255603#255603 for details; basically you want to raise the DefaultTasksMax or UserTasksMax settings.
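For example, on a systemd-managed host you could raise the limits like this (a sketch; the unit name <code>apertium-apy.service</code> and the limit values are placeholders for your own setup):

```shell
# Raise the per-service task limit for the unit running APY
# (unit name and limit value are examples, adjust to your install)
sudo systemctl set-property apertium-apy.service TasksMax=8192

# Or raise the per-user limit for all sessions via a logind drop-in:
sudo mkdir -p /etc/systemd/logind.conf.d
printf '[Login]\nUserTasksMax=16384\n' | \
    sudo tee /etc/systemd/logind.conf.d/tasksmax.conf
sudo systemctl restart systemd-logind   # beware: may affect active sessions
```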

===logging errors===

If you encounter errors involving <code>enable_pretty_logging()</code> when starting APY, comment out that line in servlet.py (prefix it with <code>#</code>) to work around the issue.
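A more defensive variant is to guard the call so that startup survives whether or not Tornado's pretty logging is available (a sketch; the stock servlet.py calls it unconditionally):

```python
# Enable Tornado's coloured log output when possible, but never let a
# logging-setup failure prevent the server from starting.
try:
    from tornado.log import enable_pretty_logging
    enable_pretty_logging()
except Exception:
    pass  # fall back to plain logging

print('logging configured')
```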

What was the error? This should be possible to fix / work around.

===High IO usage===

If you are logging unknown words (<code>-f</code> / <code>--missing-freqs</code>), you should probably also give some value to <code>-M</code> (e.g. <code>-M 1000</code>); otherwise the sqlite file can cause a lot of disk IO and grow quickly.
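For example (a sketch; the modes path and database filename are placeholders, and the exact semantics of <code>-M</code> may vary between APY versions, so check <code>--help</code>):

```shell
# Log unknown words to an sqlite file, with -M set so writes are
# batched instead of hitting the disk on every request
python3 servlet.py /usr/share/apertium/modes \
    -f missing_words.db -M 1000
```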

==='return' with argument inside generator on python 3.2 or older===

 Traceback (most recent call last):
   File "./servlet.py", line 25, in <module>
     import translation
   File "translation.py", line 132
     return proc_reformat.communicate()[0].decode('utf-8')
 SyntaxError: 'return' with argument inside generator

Solution: upgrade to Python 3.3 or newer.
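The background: only since Python 3.3 (PEP 380) may a generator <code>return</code> a value, which is what APY's coroutines rely on. A minimal illustration (the function name is hypothetical):

```python
# Since Python 3.3 (PEP 380), a generator may 'return' a value; the value
# rides along on the StopIteration raised when the generator finishes.
# On Python <= 3.2 the 'return' line below is a SyntaxError, which is
# exactly the error shown above.
def translate_part():
    yield 'working'
    return 'translated text'

g = translate_part()
print(next(g))              # 'working'
try:
    next(g)
except StopIteration as stop:
    print(stop.value)       # 'translated text'
```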

==Docs==

* [[/Translation]]
* [[/Debian]] – quickstart installation guide for running your very own APY server on Debian, Ubuntu etc.
* [[/Fedora]] – quickstart installation guide for running your very own APY server on Fedora

==Please cite==