Difference between revisions of "Apertium-apy"
{{TOCD}}

'''Apertium-APy''' stands for "'''Apertium''' '''A'''PI in '''Py'''thon". It's a simple Apertium API server written in Python 3, meant as a drop-in replacement for [[ScaleMT]]. Its primary purpose is serving requests from web applications, though it's fairly versatile. The code is hosted on [https://github.com/apertium/apertium-apy GitHub], where [https://github.com/apertium/apertium-apy/blob/master/servlet.py servlet.py] contains the relevant web server bits. The server is used by front ends like [[apertium-html-tools]] (on apertium.org) and [https://www.mediawiki.org/wiki/Content_translation MediaWiki Content Translation].

The https://apertium.org page uses an installation which currently only runs ''released'' language pairs (also available from https://apertium.org/apy if you prefer). However, APY is very easy to set up on your own server, where you can run all the development pairs and even analysers and taggers (like http://turkic.apertium.org does). Read on for how to do that.
|||
== Test it! ==

<pre>
$ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse
[["алдым/алд<n><px1sg><nom>/алд<n><px1sg><nom>+э<cop><aor><p3><pl>/алд<n><px1sg><nom>+э<cop><aor><p3><sg>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]]
</pre>
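Each element of the response above pairs an analysis string with the original input, and the analysis string separates the surface form from its readings with slashes. A minimal Python sketch of taking such a string apart follows; the helper name <code>split_analysis</code> is made up for illustration, the sample is a shortened copy of the output above, and the sketch assumes the text contains no escaped slashes.

```python
def split_analysis(unit):
    """Split 'surface/reading1/reading2' into (surface, [readings]).

    Assumption: no escaped '/' characters occur inside the unit.
    """
    surface, *readings = unit.split("/")
    return surface, readings


# Shortened copy of the response shown above (three of the five readings).
response = [["алдым/алд<n><px1sg><nom>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]]

for unit, original in response:
    surface, readings = split_analysis(unit)
    print(original, "has", len(readings), "readings:", readings)
```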
|||
== Installation ==

<span style="color: #f00;">''See [[/Debian]] for a complete quickstart installation guide for Debian, Ubuntu, Linux Mint, etc. that uses the prebuilt binaries.''</span>

First, install apertium/lttoolbox/apertium-lex-tools, and your language pairs. See [[Installation]] for how to do this.
|||
You should have Python '''3.4''' or newer (though 3.2 has been reported to work as of 324a185).

APY uses [http://www.tornadoweb.org/en/stable/ Tornado 3.1 or newer] as its web framework. Ensure that you install the Python 3.4 (or newer) versions of any dependencies. On Debian/Ubuntu, you can do:

<pre>
sudo apt-get install build-essential python3-dev python3-pip zlib1g-dev subversion
sudo pip3 install --upgrade tornado
</pre>
|||
Then clone APY from GitHub and run it:

<pre>
git clone https://github.com/apertium/apertium-apy.git
cd apertium-apy
./servlet.py /usr/share/apertium # the server will use all .mode files under this directory; use /usr/local/share/apertium for "make install"ed pairs
</pre>
||
See '''./servlet.py --help''' for documentation on how to start APY. Here are some popular optional arguments:

*'''-l --lang-names''': path to sqlite3 database of localized language names (see [[#List localised language names]]; you should include this if you're using [[apertium-html-tools]])
*'''-p --port''': port to run server on (2737 by default)
*'''-c --ssl-cert''': path to SSL certificate
*'''-k --ssl-key''': path to SSL key file
*'''-j --num-processes''': number of HTTP processes to run (default = 1; use 0 to run one HTTP server per core, where each HTTP server runs all available language pairs)
*'''-s --nonpairs-path''': include .mode files from this directory, like with the main argument, but skip translator (pair) modes and only include analyser/tagger/generator modes from this directory (handy for use with an apertium checkout)
*'''-f --missing-freqs''': path to sqlite3 database of words that were unknown (requires <code>sudo apt-get install sqlite3</code>)
*'''-i --max-pipes-per-pair''': how many pipelines we can have per language pair (per HTTP server); default = 1
*'''-u --max-users-per-pipe''': if there are this many concurrent users in the least-used pipeline of a pair (and we haven't reached max-pipes-per-pair), start a new pipeline (default = 5)
*'''-m --max-idle-secs''': after each translation request, go through the list of language pairs and shut down any pair that hasn't been used in the last MAX_IDLE_SECS seconds (to save on RAM)
*'''-n --min-pipes-per-pair''': when shutting down idle pairs, keep at least this many open (default = 0)
*'''-r --restart-pipe-after''': if a pipeline has been used for this many requests, shut it down after it has handled its current requests (to avoid possible memory creep if a pair has bugs)
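The interplay of <code>--max-pipes-per-pair</code> and <code>--max-users-per-pipe</code> can be easier to follow as code. The following Python sketch only restates the rule described in the list above; it is not taken from servlet.py, and the function name and the "start one when nothing is running" case are assumptions made for illustration.

```python
def should_start_pipeline(users_per_pipe, max_pipes_per_pair=1, max_users_per_pipe=5):
    """Decide whether a pair needs another pipeline.

    users_per_pipe: concurrent users on each running pipeline of the pair.
    """
    if not users_per_pipe:
        return True  # assumption: nothing is running yet, so start one
    if len(users_per_pipe) >= max_pipes_per_pair:
        return False  # the per-pair limit has been reached
    # Only grow when even the least-used pipeline is "full".
    return min(users_per_pipe) >= max_users_per_pipe


print(should_start_pipeline([5], max_pipes_per_pair=2))     # least-used pipe is full
print(should_start_pipeline([5, 2], max_pipes_per_pair=3))  # one pipe still has room
```

With the defaults (one pipeline per pair), the function never asks for a second pipeline, which matches the default behaviour described above.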
|||
===Installing dependencies without root===

If you don't have root, you can still install the Python dependencies with

<pre>
$ pip3 install --user --upgrade tornado
</pre>

(But your server still needs <code>build-essential python3-dev python3-pip zlib1g-dev</code> installed.)

Then you just need to run

<pre>
PYTHONPATH="/usr/local/lib/python3.3/site-packages:${PYTHONPATH}"; export PYTHONPATH
</pre>

before starting APY.
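The site-packages directory differs between Python versions and systems, so the path in the command above may need adjusting. As a small sketch using only the standard library, the following prints the per-user directory that <code>pip3 install --user</code> targets for the interpreter running it, and whether that directory is already on the module search path:

```python
import site
import sys

# Per-user site-packages directory for this interpreter.
user_site = site.getusersitepackages()

print("user site-packages:", user_site)
print("already on sys.path:", user_site in sys.path)
```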
|||
===Installing dependencies without root or pip3===

Your server still needs python3 (and probably <code>build-essential python3-dev zlib1g-dev</code>), but this is simpler if you don't want to mess with pip.

Just go to https://pypi.python.org/pypi/tornado/#downloads and get the newest .tar.gz source release. Say it got stored as ~/Nedlastingar/tornado-4.3.tar.gz; then do

<pre>
cd apertium-apy
tar xf ~/Nedlastingar/tornado-4.3.tar.gz
( cd tornado-4.3 && python3 setup.py build )
ln -s tornado-4.3/build/lib*/tornado tornado
</pre>
|||
===Optional features===

====List localised language names====

If you use [[apertium-html-tools]], you probably want localised language names instead of three-letter codes. To get this, first install sqlite3 (on Debian/Ubuntu that's <code>sudo apt-get install sqlite3</code>), then do

<pre>
make
</pre>

to create the langNames.db used for the <code>/listLanguageNames</code> function.

====Language identification====

The <code>/identifyLang</code> function can provide language identification.

If you install ''Compact Language Detection 2'' (CLD2), you get fast and fairly accurate language detection. Installation can be a bit tricky though.

* Ubuntu: see http://blog.xanda.org/2014/04/02/installing-compact-language-detection-2-cld2-on-ubuntu/
* Arch Linux: install python-cld2-hg from AUR.

Alternatively, you can start servlet.py with the -s argument pointing to a directory of language pairs with analyser modes, in which case APY will try to do language detection by analysing the text and finding which analyser had the least unknowns. This is a bit slow though :-)

APY will prefer using CLD2 if it's available, otherwise fall back to analyser coverage.
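The analyser-coverage fallback can be pictured with a small Python sketch. It assumes each analysed word arrives as a <code>surface/reading</code> string and that unknown words carry a reading starting with <code>*</code>, as in Apertium's stream format; the function names and the sample tags are invented for illustration and are not APY's own code.

```python
def coverage(analysed_units):
    """Fraction of units that received at least one known reading."""
    if not analysed_units:
        return 0.0
    known = sum(1 for unit in analysed_units if "/*" not in unit)
    return known / len(analysed_units)


def best_language(analyses_by_lang):
    """Pick the language whose analyser left the fewest unknowns."""
    return max(analyses_by_lang, key=lambda lang: coverage(analyses_by_lang[lang]))


# Hypothetical analyser output for the same two-word text.
sample = {
    "eng": ["this/this<prn>", "works/work<vblex>"],
    "spa": ["this/*this", "works/*works"],
}
print(best_language(sample))
```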
|||
== Usage ==

APY supports three types of requests: GET, POST, and JSONP. GET and POST requests are possible only if APY is running on the same server as the client, due to cross-site scripting restrictions; JSONP requests, however, are permitted in any context. Using curl, APY can easily be tested:

<code>
<pre>curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord</pre>
</code> It can also be tested through your browser or through other HTTP calls. Unfortunately, curl does '''not''' decode JSON output by default, so to make testing easier, an APY Sandbox is provided with [[Apertium-html-tools]].
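For scripted testing, the same kind of request can be prepared in Python. The sketch below builds a GET URL with the standard library and strips the padding from a JSONP-style reply; the helper names are made up, and how the callback name is passed to the server is not covered here, so the wrapped body is a hand-written example rather than real APY output.

```python
import json
from urllib.parse import urlencode


def build_url(base, endpoint, **params):
    """Build a GET URL with properly percent-encoded parameters."""
    return f"{base}/{endpoint}?{urlencode(params)}"


def unwrap_jsonp(body, callback):
    """Turn 'callback(<json>)' into the decoded JSON value."""
    prefix = callback + "("
    body = body.strip().rstrip(";")
    if not (body.startswith(prefix) and body.endswith(")")):
        raise ValueError("not a JSONP response for " + callback)
    return json.loads(body[len(prefix):-1])


url = build_url("http://localhost:2737", "perWord", lang="kaz-tat", modes="morph", q="алдым")
print(url)
print(unwrap_jsonp('cb([{"input": "алдым"}])', "cb"))
```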
|||
{| class="wikitable" border="1" |
{| class="wikitable" border="1" |
||
! Function
! Parameters
! Output
|-
| '''/listPairs'''
| List available language pairs
| 
*'''include_deprecated_codes''': give this parameter to include old ISO-639-1 codes in output
| To be consistent with ScaleMT, the returned JS Object contains a <code>responseData</code> key with an Array of language pair objects with keys <code>sourceLanguage</code> and <code>targetLanguage</code>.
<pre>
$ curl 'http://localhost:2737/listPairs'
{"responseStatus": 200, "responseData": [
** generators
** taggers/disambiguators
| The returned JS Object contains a mapping from language pairs to mode names (used internally by Apertium).
<pre>
$ curl 'http://localhost:2737/list?q=analyzers'
{"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph",
"tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}
</pre>
<pre>
$ curl 'http://localhost:2737/list?q=generators'
{"en-es": "en-es-generador", "fin": "fin-gener", "es-en": "es-en-generador"}
</pre>
<pre>
$ curl 'http://localhost:2737/list?q=taggers'
{"es-en": "es-en-tagger", "en-es": "en-es-tagger", "mk-en": "mk-en-tagger",
"tat-kaz": "tat-kaz-tagger", "kaz-tat": "kaz-tat-tagger", "kaz": "kaz-tagger"}
*'''langpair''': language pair to use for translation
*'''q''': text to translate
*'''markUnknown=no''' (optional): include this to remove "*" in front of unknown words
*'''deformat''': deformatter to be used: one of html (default), txt, rtf
*'''reformat''': reformatter to be used: one of html, html-noent (default), txt, rtf
*'''format''': if deformatter and reformatter are the same, they can be specified here
For more about formatting, please see [http://wiki.apertium.org/wiki/Format_handling Format Handling].
| To be consistent with ScaleMT, the returned JS Object contains a <code>responseData</code> key with a JS Object that has the key <code>translatedText</code>, which contains the translated text.
<pre>
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}
$ echo Сен бардың ба? > myfile
$ curl --data-urlencode 'q@myfile' 'http://localhost:2737/translate?langpair=kaz|tat'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}
</pre>
|||
The following two queries contain nonstandard whitespace characters and are equivalent: |
|||
<pre> |
|||
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&deformat=txt&reformat=txt' |
|||
{"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null} |
|||
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&format=txt' |
|||
{"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null} |
|||
</pre> |
|||
The following two queries illustrate the difference between the <code>html</code> and <code>html-noent</code> reformatter: |
|||
<pre> |
|||
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html' |
|||
{"responseData": {"translatedText": "Qu&eacute; hace este trabajo?"}, "responseDetails": null, "responseStatus": 200} |
|||
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html-noent' |
|||
{"responseData": {"translatedText": "Qu\u00e9 hace este trabajo?"}, "responseDetails": null, "responseStatus": 200} |
|||
</pre> |
|||
|- |
|||
| '''/translateDoc''' |
|||
| Translate a document (.odt, .txt, .rtf, .html, .docx, .pptx, .xlsx, .tex) |
|||
| |
|||
*'''langpair''': language pair to use for translation |
|||
*'''file''': document to translate |
|||
*'''markUnknown=no''' (optional): include this to remove "*" in front of unknown words |
|||
| Returns the translated document. |
|||
<pre> |
|||
$ curl --form 'file=@/path/to/kaz.odt' 'http://localhost:2737/translateDoc?langpair=kaz|tat' > tat.odt |
|||
</pre> |
|-
| '''/analyze''' or '''/analyse'''
| Morphologically analyze text
| 
*'''lang''': language to use for analysis
*'''q''': text to analyze
| The returned JS Array contains JS Arrays in the format <code>[analysis, input-text]</code>.
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">
$ curl -G --data "lang=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"], ["?/?<sent>","?"]]
</pre>
|-
| Generate surface forms from text
| 
*'''lang''': language to use for generation
*'''q''': text to generate
| The returned JS Array contains JS Arrays in the format <code>[generated, input-text]</code>.
<pre>
$ curl -G --data "lang=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate
[["сен","^сен<v><tv><imp><p2><sg>$ "]]
</pre>
| Perform morphological tasks per word
| 
*'''lang''': language to use for tasks
*'''modes''': morphological tasks to perform on text (15 combinations possible; delimit using '+')
** tagger/disambig
** biltrans
** translate
** morph
*'''q''': text to perform tasks on
| The returned JS Array contains JS Objects each containing the key <code>input</code> and up to 4 other keys corresponding to the requested modes (<code>tagger</code>, <code>morph</code>, <code>biltrans</code> and <code>translate</code>).
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph&q=let+there+be+light' |
|||
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger&q=let+there+be+light' |
|||
[{"input": "let", "tagger": "let<vblex><pp>"}, {"input": "there", "tagger": "there<adv>"}, {"input": "be", "tagger": "be<vbser><inf>"}, {"input": "light", "tagger": "light<adj><sint>"}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+tagger&q=let+there+be+light' |
|||
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=translate&q=let+there+be+light' |
|||
[{"input": "let", "translate": ["dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"]}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=biltrans&q=let+there+be+light' |
|||
[{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans&q=let+there+be+light' |
|||
[{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans&q=let+there+be+light' |
|||
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+biltrans&q=let+there+be+light' |
|||
[{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+translate&q=let+there+be+light' |
|||
[{"input": "let", "translate": ["dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "tagger": "light<adj><sint>"}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate&q=let+there+be+light' |
|||
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans+tagger&q=let+there+be+light' |
|||
[{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans+tagger&q=let+there+be+light' |
|||
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+tagger&q=let+there+be+light' |
|||
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans&q=let+there+be+light' |
|||
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] |
|||
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans+tagger&q=let+there+be+light' |
|||
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] |
|||
</pre> |
|-
| '''/listLanguageNames''' |
|||
| Get localized language names |
|||
| |
|||
*'''locale''': language to get localized language names in |
|||
*'''languages''': list of '+' delimited language codes to retrieve localized names for (optional - if not specified, all available codes will be returned) |
|||
| The returned JS Object contains a mapping of requested language codes to localized language names |
|||
<pre> |
|||
$ curl 'http://localhost:2737/listLanguageNames?locale=fr&languages=ca+en+mk+tat+kk' |
|||
{"ca": "catalan", "en": "anglais", "kk": "kazakh", "mk": "macédonien", "tat": "tatar"} |
|||
</pre> |
|||
|- |
|||
| '''/calcCoverage''' |
|||
| Get coverage of a language on a text |
|||
| |
|||
*'''lang''': language to analyze with |
|||
*'''q''': text to analyze for coverage |
|||
| The returned JS Array contains a single floating point value ≤ 1 that indicates the coverage. |
|||
<pre style="white-space: pre-wrap; |
|||
white-space: -moz-pre-wrap; |
|||
white-space: -pre-wrap; |
|||
white-space: -o-pre-wrap; |
|||
word-wrap: break-word;"> |
|||
$ curl 'http://localhost:2737/calcCoverage?lang=en-es&q=Whereas disregard and contempt for which have outraged the conscience of mankind'
|||
[0.9230769230769231] |
|||
</pre> |
|||
|- |
|||
| '''/identifyLang''' |
|||
| Return a list of languages with probabilities of the text being in that language. Uses CLD2 if that's installed, otherwise will try any analyser modes. |
|||
| |
|||
*'''q''': text which you would like to compute probabilities for |
|||
| The returned JS Object contains a mapping from language codes to probabilities. |
|||
<pre> |
|||
$ curl 'http://localhost:2737/identifyLang?q=This+is+a+piece+of+text.' |
|||
{"ca": 0.19384234, "en": 0.98792465234, "kk": 0.293442432, "zh": 0.002931001} |
|||
</pre> |
|||
|- |
|||
| '''/stats''' |
|||
| Return some statistics about pair usage, uptime, portion of time spent actively translating |
|||
| |
|||
*'''requests=N''' (optional): limit period-based stats to last N requests |
|||
| Note that period-based stats are limited to 3600 seconds by default (see -T argument to servlet.py) |
|||
<pre> |
|||
$ curl -Ss localhost:2737/stats|jq .responseData |
|||
{ |
|||
"holdingPipes": 0, |
|||
"periodStats": { |
|||
"totTimeSpent": 10.760803, |
|||
"ageFirstRequest": 19.609394, |
|||
"totChars": 2718, |
|||
"requests": 8, |
|||
"charsPerSec": 252.58 |
|||
}, |
|||
"runningPipes": { |
|||
"eng-spa": 1 |
|||
}, |
|||
"useCount": { |
|||
"eng-spa": 8 |
|||
}, |
|||
"uptime": 26 |
|||
} |
|||
</pre> |
|||
|- |
|||
| '''/spellCheck''' |
|||
| '''Note: This endpoint is not yet available in the main branch.''' Handles spell-checking requests using Voikko or Divvun spell checkers. |
|||
| |
|||
*'''q''': the text to be spell-checked (string, required, e.g. <code>қазақша билмеймін</code>)
*'''lang''': the language of the text (string, required, e.g. <code>kaz</code>)
*'''spellchecker''': the spell checker to use (string, optional, defaults to <code>voikko</code>, e.g. <code>divvun</code>)
|||
| The output is a JSON array where each element represents a token from the input text. Each token includes the following information: |
|||
<pre> |
|||
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz' |
|||
[ |
|||
{"token": "қазақша", "known": true, "sugg": []}, |
|||
{"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейміз"]} |
|||
] |
|||
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz&spellchecker=divvun' |
|||
[ |
|||
{"token": "қазақша", "known": true, "sugg": []}, |
|||
{"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейтін", "білмейін", "білмейміз", "иілмеймін", "тілмеймін", "ілмеймін"]} |
|||
] |
|||
</pre> |
|||
|} |
|} |
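Clients of <code>/translate</code> usually only want the translated string out of the ScaleMT-style envelope. A minimal Python sketch of that step follows, using the success response from the table as sample data; the function name is hypothetical, and treating any <code>responseStatus</code> other than 200 as an error is an assumption about how failures are reported.

```python
import json


def translated_text(body):
    """Extract responseData.translatedText from a /translate reply."""
    data = json.loads(body)
    status = data.get("responseStatus")
    if status != 200:
        # Assumption: errors use the same envelope with another status code.
        raise RuntimeError(f"APY returned status {status}: {data.get('responseDetails')}")
    return data["responseData"]["translatedText"]


body = '{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}'
print(translated_text(body))
```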
||
== SSL ==

APY supports HTTPS out of the box. To test with a self-signed certificate, create a certificate and key by running:

<pre>
openssl req -new -x509 -keyout server.key -out server.crt -days 365 -nodes
</pre>

Then run APY with <code>--ssl-key server.key --ssl-cert server.crt</code>, and test with HTTPS and the -k argument to curl (-k makes curl accept self-signed or even slightly "lying" certificates):

<pre>
curl -k -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze
</pre>

If you have a real signed certificate, you should be able to use curl without -k for the domain which the certificate is signed for:

<pre>
curl -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze
</pre>

Remember to open port 2737 to your server.
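The curl <code>-k</code> test has a rough equivalent in Python's standard library, sketched below. The helper names are invented for illustration; disabling certificate verification like this is only reasonable for testing a self-signed setup, and the commented-out call assumes an APY instance is listening on localhost.

```python
import ssl
from urllib.parse import urlencode
from urllib.request import urlopen


def insecure_context():
    """TLS context that accepts self-signed certificates (testing only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # has to be switched off before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def analyse(base, lang, text, context=None):
    """Send a GET request to /analyze and return the raw response body."""
    url = f"{base}/analyze?{urlencode({'lang': lang, 'q': text})}"
    with urlopen(url, context=context) as resp:
        return resp.read().decode("utf-8")


# Needs a running server, so it is left commented out:
# print(analyse("https://localhost:2737", "kaz", "Сен бардың ба?", insecure_context()))
```

With a properly signed certificate, the <code>context</code> argument can simply be left out.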
|||
|||
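If you test an HTTPS-enabled APY from Python rather than curl, the equivalent of <code>curl -k</code> is a TLS context with certificate checks turned off. This is only a sketch for local testing with a self-signed certificate; <code>insecure_context</code> and <code>fetch</code> are hypothetical helper names, and the URL assumes APY is listening on localhost:2737.

```python
import ssl
import urllib.request


def insecure_context():
    """TLS context that skips certificate checks, like curl -k. Local testing only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # has to be disabled before verify_mode can be CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, context=None):
    """GET the URL and return the decoded body (needs a running server)."""
    with urllib.request.urlopen(url, context=context) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    ctx = insecure_context()
    print(ctx.verify_mode)
    # With APY running under --ssl-key/--ssl-cert, uncomment:
    # print(fetch("https://localhost:2737/listPairs", context=ctx))
```

With a properly signed certificate, call <code>fetch</code> without a context so that normal verification applies.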
== Gateway ==

A gateway for APY is located in the [https://github.com/apertium/apertium-apy same directory] and provides functionality such as silently intercepting and forwarding requests, and aggregating APY instance capabilities for overriding <code>/list</code> requests. For example, a gateway in front of two servers that offer different language pairs will report the aggregated capabilities to the client, hiding the existence of two servers.

A list of APY servers is a required positional argument; an example server list is [https://github.com/apertium/apertium-apy/blob/master/serverlist-example provided] in the same directory. If the gateway is asked to run on an already occupied port, it will traverse the available ports until it can bind to a free one.

The gateway currently operates as a "fastest"-paradigm load balancer that continuously adapts to changing circumstances by basing its routing on the clients' requests. On initialization, all servers are assigned a weight of 0; consequently, each server will eventually be used as the gateway determines the server speeds. The gateway stores a moving average of the last x requests for each <code>(mode, language)</code> and forwards requests to the fastest server, measured in units of response time per response length.
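The "fastest" routing idea described for the gateway can be sketched as follows. This is an illustration under stated assumptions, not the code in gateway.py: the class name, method names and the window size of 5 are all hypothetical, and the weight is simply the moving average of seconds per response character.

```python
from collections import defaultdict, deque


class FastestBalancer:
    """Hypothetical sketch of a 'fastest' load balancer; not APY's gateway code."""

    def __init__(self, servers, window=5):
        self.servers = list(servers)
        # (server, mode, language) -> last `window` samples of seconds per character
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def weight(self, server, mode, lang):
        """Moving average of time per response length; 0 until the server is measured."""
        samples = self.samples[(server, mode, lang)]
        return sum(samples) / len(samples) if samples else 0.0

    def pick(self, mode, lang):
        """Return the server with the lowest weight for this (mode, language)."""
        return min(self.servers, key=lambda srv: self.weight(srv, mode, lang))

    def record(self, server, mode, lang, seconds, response_length):
        """Store one measurement after a request has been answered."""
        self.samples[(server, mode, lang)].append(seconds / max(response_length, 1))
```

Because unmeasured servers keep a weight of 0, every server gets tried at least once before the measurements start to decide the routing.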
==Running on init== |
|||
===Systemd=== |
|||
See [[Apy/Debian]] for the quickstart. |
|||
====Running as a --user unit====

If you want to be able to start and stop APY as a non-root user, you'll first have to get your administrator to run some commands. Say your user is named "tussenvoegsel"; the admin will have to do:

<pre>
sudo apt-get install dbus libpam-systemd   # or dnf on Fedora etc.
sudo loginctl enable-linger tussenvoegsel
</pre>

To read the logs without sudo, the admin will also have to enable persistent logs (see [[Apertium-apy#Persistent_logs|below]]).

Then, as your "tussenvoegsel" user, do

<pre>
mkdir -p ~/.config/systemd/user/
git clone https://github.com/apertium/apertium-apy
cp ~/apertium-apy/tools/systemd/apy.service ~/.config/systemd/user/
</pre>

Now edit ~/.config/systemd/user/apy.service: remove PrivateTmp, set the User to "tussenvoegsel" (or whatever your user is), and point the WorkingDirectory/ExecStart paths at /home/tussenvoegsel/apertium-apy.

Here's a full example apy.service file:

<pre>
$ cat ~/.config/systemd/user/apy.service
[Unit]
Description=Translation server and API for Apertium
Documentation=http://wiki.apertium.org/wiki/Apertium-apy
After=network.target

[Service]
WorkingDirectory=/home/tussenvoegsel/apertium-apy
ExecStart=/usr/bin/python3 /home/tussenvoegsel/apertium-apy/servlet.py /usr/share/apertium/modes
Restart=always
WatchdogSec=10s

[Install]
WantedBy=multi-user.target
</pre>

You should now be able to do:

<pre>
systemctl --user daemon-reload   # re-read the edited apy.service file
systemctl --user start apy       # start apy immediately
systemctl --user stop apy        # stop apy immediately
systemctl --user enable apy      # make apy start after next reboot
systemctl --user status apy      # check if apy is running
journalctl -f --user-unit apy    # follow the apy logs
journalctl -n100 --user-unit apy # show last 100 lines of apy logs
curl 'localhost:2737/listPairs'  # show installed pairs
curl 'localhost:2737/translate?q=ja+nu&langpair=sme|nob' # translate some words
</pre>
====Persistent logs====

By default, logs are not persistent across reboots nor readable without sudo. The commands below fix this:

<pre>
sudo mkdir /var/log/journal
sudo systemctl restart systemd-journald
</pre>

===Upstart===

You can use upstart scripts to automatically run APY and html-tools on startup and respawn the processes when they get killed. If you don't have upstart installed: <code>sudo apt-get install upstart</code>

The apertiumconfig file contains the paths of some Apertium directories and of the serverlist file. It can be saved anywhere; each script below reads it through its <code>CONFIG</code> variable, so set that to wherever you saved the file. Make sure the paths are correct!

/home/user/apertiumconfig

<pre>
APERTIUMPATH=/home/user
APYPATH=/home/user/apertium-apy
SERVERLIST=/home/user/serverlist
HTMLTOOLSPATH=/home/user/apertium-html-tools
#optional, see 'Logging':
LOGFILE=/home/user/apertiumlog
</pre>

The following upstart scripts have to be saved in <code>/etc/init</code>.

apertium-all.conf

<pre>
description "start/stop all apertium services"
start on startup
</pre>

apertium-apy.conf

<pre>
description "apertium-apy init script"
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
env CONFIG=/etc/default/apertium
script
. $CONFIG
python3 $APYPATH/servlet.py $APERTIUMPATH
end script
</pre>

apertium-apy-gateway.conf |
|||
<pre> |
|||
description "apertium-apy gateway init script" |
|||
start on starting apertium-all |
|||
stop on stopped apertium-all |
|||
respawn |
|||
respawn limit 50 300 |
|||
env CONFIG=/home/user/apertiumconfig |
|||
script |
|||
. $CONFIG |
|||
python3 $APYPATH/gateway.py $SERVERLIST |
|||
end script |
|||
</pre> |
|||
apertium-html-tools.conf |
|||
<pre> |
|||
description "apertium-html-tools init script" |
|||
start on starting apertium-all |
|||
stop on stopped apertium-all |
|||
respawn |
|||
respawn limit 50 300 |
|||
env CONFIG=/etc/default/apertium |
|||
script |
|||
. $CONFIG |
|||
cd $HTMLTOOLSPATH |
|||
python3 -m http.server 8888 |
|||
end script |
|||
</pre> |
|||
Use <code>sudo start apertium-all</code> to start all services. Just like the filenames, the jobs are called <code>apertium-apy</code>, <code>apertium-apy-gateway</code> and <code>apertium-html-tools</code>. |
|||
The jobs can be independently started by: <code>sudo start JOB</code> |
|||
You can stop them by using <code>sudo stop JOB</code> |
|||
Restart: <code>sudo restart JOB</code> |
|||
View the status and PID: <code>sudo status JOB</code> |
|||
====Logging==== |
|||
The log files of the processes can be found in the <code>/var/log/upstart/</code> folder. |
|||
The starting/stopping of the jobs can be logged by appending this to the end of <code>apertium-apy.conf</code>, <code>apertium-apy-gateway.conf</code> and <code>apertium-html-tools.conf</code> files. |
|||
<pre> |
|||
pre-start script |
|||
. $CONFIG |
|||
touch $LOGFILE |
|||
echo "`date` $UPSTART_JOB started" >> $LOGFILE |
|||
end script |
|||
post-stop script |
|||
. $CONFIG |
|||
touch $LOGFILE |
|||
echo "`date` $UPSTART_JOB stopped" >> $LOGFILE
|||
end script |
|||
</pre> |
==TODO== |
||
* hfst-proc -g and lrx-proc don't work with null-flushing, see https://sourceforge.net/p/hfst/bugs/240/ and https://sourceforge.net/p/apertium/tickets/45/ |
|||
* It should be possible to set a time-out for translation threads, so if a translation is taking too long, it gets killed and the queue moves along. |
|||
* translation cache |
|||
* It should use one lock per pipeline, since we don't need to wait for mk-en just because sme-nob is running. |
|||
* variants like ca_valencia, oc_aran and pt_BR look odd on the web page? |
|||
* http://stackoverflow.com/a/487281/69663 recommends select/polling over threading (http://docs.python.org/3.3/library/socketserver.html for more on the differences) but requires either lots of manually written dispatching code (http://pymotw.com/2/select/) or a framework like Twisted. |
|||
* gateway: we need a way to have a second server running only the most popular language pairs, and a gateway that sends requests to whichever server has the requested pair. Simply doing -j2 is not a good solution, since we'd waste a lot of RAM on keeping open pipelines that are rarely used. (Or we could turn off pipelines after not being used for a while …) |
|||
* some language pairs still don't work (sme-nob?) |
|||
* hfst-proc -g doesn't work with null-flushing (or?) |
|||
==Troubleshooting== |
|||
=== CRITICAL:root:apy.py APy needs a UTF-8 locale, please set … === |
|||
Run <pre>export LC_ALL=C.UTF-8</pre>
and add that line to your ~/.bashrc so that it also applies to new shells.
|||
See also [[Installation_troubleshooting#Warning:_unsupported_locale.2C_fallback_to_.22C.22.22]]. |
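To see which encoding Python itself picks up from your environment, a small check like the one below can help. It is only a diagnostic sketch and not how APY performs its own check; <code>is_utf8</code> is a hypothetical helper name.

```python
import codecs
import locale


def is_utf8(encoding):
    """True if the encoding name is an alias of UTF-8 (e.g. 'utf8', 'UTF-8')."""
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False


if __name__ == "__main__":
    encoding = locale.getpreferredencoding()
    if is_utf8(encoding):
        print(encoding, "looks fine")
    else:
        print(encoding, "is not UTF-8; try: export LC_ALL=C.UTF-8")
```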
|||
=== listen tcp 0.0.0.0:2737: bind: address already in use === |
|||
Probably apy is already running, or some other program is holding the port open. |
|||
See what programs are using port 2737 with |
|||
<pre> |
|||
lsof -i :2737 |
|||
</pre> |
|||
or |
|||
<pre> |
|||
netstat -pna | grep 2737 |
|||
</pre> |
|||
If you're using docker, you may have to <code>sudo</code> those commands (lsof and netstat don't write anything, so that Should Be Safe™) |
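If lsof and netstat are not available, a quick check from Python works too. This sketch only tells you whether something accepts TCP connections on the port, not which program it is; <code>port_in_use</code> is a hypothetical helper name, and the host defaults to localhost.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 2737 in use:", port_in_use(2737))
```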
|||
===forking problems on systemd 228 === |
|||
If you get errors like |
|||
<pre> |
|||
HTTPServerRequest(protocol='http', host='apy.projectjj.com', method='GET', uri='/translate?langpair=nob%7Cnno&q=ikke%0A%0A&callback=_jqjsp&_146183949405=', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'Keep-Alive', 'Cookie': '_pk_ref.1.9697=%5B%22%2C%22%22%2C146183942%2C%22https%3A%2F%2Fwww.google.no%2F%22%5D; _pk_id.1.9697=96baa844663e946.1441366937.7.146839495.1461839482.; _pk_ses.1.9697=*', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, sdch', 'X-Forwarded-Server': 'www.apertium.org, apy.projectjj.com', 'X-Forwarded-For': '152.93.00.00, 193.145.00.00', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.0 Safari/537.36', 'Accept-Language': 'nb-NO,nb;q=0.8,no;q=0.6,nn;q=0.4,en-US;q=0.2,en;q=0.2', 'Host': 'apy.projectjj.com', 'Referer': 'https://www.apertium.org/index.nob.html?dir=nob-nno', 'X-Forwarded-Host': 'www.apertium.org, apy.projectjj.com'}) |
|||
Traceback (most recent call last): |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/web.py", line 1415, in _execute |
|||
result = yield result |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run |
|||
value = future.result() |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run |
|||
yielded = self.gen.throw(*exc_info) |
|||
File "/home/apertium/apertium-apy/servlet.py", line 389, in get |
|||
self.get_argument('markUnknown', default='yes')) |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run |
|||
value = future.result() |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run |
|||
yielded = self.gen.throw(*exc_info) |
|||
File "/home/apertium/apertium-apy/servlet.py", line 369, in translateAndRespond |
|||
translated = yield pipeline.translate(toTranslate, nosplit) |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run |
|||
value = future.result() |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run |
|||
yielded = self.gen.throw(*exc_info) |
|||
File "/home/apertium/apertium-apy/translation.py", line 69, in translate |
|||
parts = yield [translateNULFlush(part, self) for part in all_split] |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run |
|||
value = future.result() |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 656, in callback |
|||
result_list.append(f.result()) |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result |
|||
raise_exc_info(self._exc_info) |
|||
File "<string>", line 3, in raise_exc_info |
|||
File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 879, in run |
|||
yielded = self.gen.send(value) |
|||
File "/home/apertium/apertium-apy/translation.py", line 214, in translateNULFlush |
|||
proc_deformat = Popen("apertium-deshtml", stdin=PIPE, stdout=PIPE) |
|||
File "/usr/lib/python3.5/subprocess.py", line 947, in __init__ |
|||
restore_signals, start_new_session) |
|||
File "/usr/lib/python3.5/subprocess.py", line 1480, in _execute_child |
|||
restore_signals, start_new_session, preexec_fn) |
|||
BlockingIOError: [Errno 11] Resource temporarily unavailable |
|||
</pre> |
|||
on systems with systemd>=228 and linux>=4.3, then it's likely you're hitting the TasksMax systemd attribute, which puts a limit of 512 tasks per cgroup(?) or 4096 per user (similar to ulimit task limits). See http://unix.stackexchange.com/questions/253903/creating-threads-fails-with-resource-temporarily-unavailable-with-4-3-kernel/255603#255603 for info; basically you want to change the DefaultTasksMax or UserTasksMax settings.
|||
===logging errors=== |
|||
If you encounter errors involving <code>enable_pretty_logging()</code> while starting APY, comment out the line with a leading <code>#</code> to solve the issue. |
|||
: What was the error? This should be possible to fix / work around. |
|||
===High IO usage=== |
|||
If you are logging unknowns (-f / --missing-freqs), you should probably also give some value to -M (e.g. -M1000), otherwise you might get a lot of disk usage on that sqlite file. |
|||
==='return' with argument inside generator on python 3.2 or older=== |
|||
<pre> |
|||
Traceback (most recent call last): |
|||
File "./servlet.py", line 25, in <module>
import translation
|||
File "translation.py", line 132 |
|||
return proc_reformat.communicate()[0].decode('utf-8') |
|||
SyntaxError: 'return' with argument inside generator |
|||
</pre> |
|||
Solution: upgrade to Python 3.3 or newer. |
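The error comes from a language change: since Python 3.3, a generator may end with <code>return value</code>, and the value travels on the <code>StopIteration</code> exception. The sketch below shows the construct in isolation; <code>read_all</code> and <code>run</code> are made-up names and have nothing to do with translation.py.

```python
def read_all(chunks):
    """Generator that yields progress and then returns a final value."""
    collected = []
    for chunk in chunks:
        collected.append(chunk)
        yield len(collected)
    return "".join(collected)  # a SyntaxError before Python 3.3


def run(generator):
    """Drive a generator to completion and hand back the value it returned."""
    try:
        while True:
            next(generator)
    except StopIteration as stop:
        return stop.value


if __name__ == "__main__":
    print(run(read_all(["a", "b", "c"])))
```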
|||
==Docs== |
|||
* [[/Translation]] |
|||
* [[/Debian]] – quickstart installation guide for running your very own APY server on Debian, Ubuntu etc. |
|||
* [[/Fedora]] – quickstart installation guide for running your very own APY server on Fedora |
|||
== Please cite == |
|||
* https://www.aclweb.org/anthology/W18-2207/ |
|||
[[Category:Tools]] |
[[Category:Services]] |
|||
[[Category:Documentation]] |
Latest revision as of 19:54, 1 August 2024
Apertium-APy stands for "Apertium API in Python". It's a simple Apertium API server written in Python 3, meant as a drop-in replacement for ScaleMT. Its primary/intended purpose is requests from web applications, though it's fairly versatile. It is currently found in GitHub, where servlet.py contains the relevant web server bits. The server is used by front ends like apertium-html-tools (on apertium.org) and Mediawiki Content Translation.
The https://apertium.org page uses an installation which currently only runs released language pairs (also available from https://apertium.org/apy if you prefer). However, APY is very easy to set up on your own server, where you can run all the development pairs and even analysers and taggers (like what http://turkic.apertium.org does), read on for how to do that.
Test it!
$ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse
[["алдым/алд<n><px1sg><nom>/алд<n><px1sg><nom>+э<cop><aor><p3><pl>/алд<n><px1sg><nom>+э<cop><aor><p3><sg>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]]
Installation
See /Debian for a complete quickstart installation guide for Debian, Ubuntu, Linux Mint, etc that uses the prebuilt binaries.
First, install apertium/lttoolbox/apertium-lex-tools, and your language pairs. See Installation for how to do this.
You should have Python 3.4 or newer (though 3.2 has been reported to work as of 324a185).
APY uses Tornado 3.1 or newer as its web framework. Ensure that you install the Python 3.4 (or newer) versions of any dependencies. On Debian/Ubuntu, you can do
sudo apt-get install build-essential python3-dev python3-pip zlib1g-dev subversion
sudo pip3 install --upgrade tornado
Then clone APY from github and run it:
git clone git@github.com:apertium/apertium-apy.git
cd apertium-apy
./servlet.py /usr/share/apertium # the server will use all .mode files from under this directory, use /usr/local/share/apertium for "make install"ed pairs
See ./servlet.py --help for documentation on how to start APY. Here are some popular optional arguments:
- -l --lang-names: path to sqlite3 database of localized language names (see #List localised language names; you should include this if you're using apertium-html-tools)
- -p --port: port to run server on (2737 by default)
- -c --ssl-cert: path to SSL certificate
- -k --ssl-key: path to SSL key file
- -j --num-processes: number of http processes to run (default = 1; use 0 to run one http server per core, where each http server runs all available language pairs)
- -s --nonpairs-path: include .mode files from this directory, like with the main arg, but skip translator (pair) modes, only include analyser/translator/generator modes from this directory (handy for use with apertium checkout)
- -f --missing-freqs: path to sqlite3 database of words that were unknown (requires sudo apt-get install sqlite3)
- -i --max-pipes-per-pair: how many pipelines we can have per language pair (per http server), default = 1
- -u --max-users-per-pipe: if there are this many concurrent users in the least-used pipeline of a pair (and we haven't reached max-pipes-per-pair), start a new pipeline (default = 5)
- -m --max-idle-secs: after each translation request, go through the list of language pairs and shut down any pair that hasn't been used in the last MAX_IDLE_SECS seconds (to save on RAM)
- -n --min-pipes-per-pair: when shutting down idle pairs, keep at least this many open (default = 0)
- -r --restart-pipe-after: if a pipeline has been used for this many requests, shut it down (to avoid possible memory creep if a pair has bugs) after it has handled its current requests
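How -i and -u interact can be pictured with a small sketch. It follows the wording of the two options above and is not taken from APY's source; the <code>Pipeline</code> class and <code>choose_pipeline</code> function are hypothetical.

```python
class Pipeline:
    """Stand-in for one running translation pipeline of a language pair."""

    def __init__(self):
        self.users = 0  # number of requests currently using this pipeline


def choose_pipeline(pipelines, max_pipes_per_pair=1, max_users_per_pipe=5):
    """Return the pipeline to use, starting a new one when allowed and needed."""
    if not pipelines:
        pipelines.append(Pipeline())
    least_used = min(pipelines, key=lambda pipe: pipe.users)
    # Start another pipeline only if even the least-used one is crowded
    # and the per-pair limit has not been reached yet.
    if least_used.users >= max_users_per_pipe and len(pipelines) < max_pipes_per_pair:
        least_used = Pipeline()
        pipelines.append(least_used)
    return least_used
```

With the defaults (one pipe per pair), the second condition never holds, so all users of a pair share a single pipeline.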
Installing dependencies without root
If you don't have root, you can still install the python dependencies with
$ pip3 install --user --upgrade tornado
(But your server still needs build-essential python3-dev python3-pip zlib1g-dev installed.)
Then you just need to run
PYTHONPATH="/usr/local/lib/python3.3/site-packages:${PYTHONPATH}"; export PYTHONPATH
before starting APY.
Installing dependencies without root nor pip3
Your server still needs python3 (and probably build-essential python3-dev zlib1g-dev), but this is simpler if you don't want to mess with pip.
Just go to https://pypi.python.org/pypi/tornado/#downloads and get the newest version .tar.gz source release; say it got stored as ~/Nedlastingar/tornado-4.3.tar.gz, then do
cd apertium-apy
tar xf ~/Nedlastingar/tornado-4.3.tar.gz
( cd tornado-4.3 && python3 setup.py build )
ln -s tornado-4.3/build/lib*/tornado tornado
Optional features
List localised language names
If you use apertium-html-tools, you probably want localised language names instead of three-letter codes. To get this, first install sqlite3 (on Debian/Ubuntu that's sudo apt-get install sqlite3), then do
make
to create the langNames.db used for the /listLanguageNames function.
Language identification
The /identifyLang function can provide language identification.
If you install Compact Language Detection 2 (CLD2), you get fast and fairly accurate language detection. Installation can be a bit tricky though.
- Ubuntu: see http://blog.xanda.org/2014/04/02/installing-compact-language-detection-2-cld2-on-ubuntu/
- Arch Linux: install python-cld2-hg from AUR.
Alternatively, you can start servlet.py with the -s argument pointing to a directory of language pairs with analyser modes, in which case APY will try to do language detection by analysing the text and finding which analyser had the least unknowns. This is a bit slow though :-)
APY will prefer using CLD2 if it's available, otherwise fall back to analyser coverage.
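The analyser-coverage fallback can be pictured like this. It is a rough sketch of the idea ("which analyser had the least unknowns"), not APY's implementation: the function names are hypothetical, the input is assumed to be a list of <code>surface/reading/...</code> strings per language, and unknown words are assumed to carry Apertium's usual <code>*</code> mark on their only reading.

```python
def unknown_count(analyses):
    """Count tokens whose first reading starts with '*', the unknown-word mark."""
    count = 0
    for analysis in analyses:
        _surface, _sep, readings = analysis.partition("/")
        if readings.startswith("*"):
            count += 1
    return count


def guess_language(analyses_by_lang):
    """Pick the language whose analyser left the fewest tokens unknown."""
    return min(analyses_by_lang, key=lambda lang: unknown_count(analyses_by_lang[lang]))


if __name__ == "__main__":
    sample = {
        "kaz": ["Сен/сен<prn><pers><p2><sg><nom>", "бардың/бар<v><iv><ifi><p2><sg>"],
        "eng": ["Сен/*Сен", "бардың/*бардың"],
    }
    print(guess_language(sample))
```

Each candidate language needs a full analysis pass over the text, which is why this route is slower than CLD2.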
Usage
APY supports three types of requests: GET, POST, and JSONP. GET and POST are possible only if APY is running on the same server as the client, due to cross-site scripting restrictions; JSONP requests, however, are permitted in any context. APY can easily be tested using curl:
curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord
It can also be tested through your browser or through other HTTP calls. Unfortunately, curl does not decode JSON output by default; to make testing easier, an APY Sandbox is provided with Apertium-html-tools.
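Any HTTP client can be used in place of curl. The sketch below builds a <code>/translate</code> URL and decodes the ScaleMT-style response documented in the table that follows; the helper names are hypothetical, the sample body is the documented kaz|tat response, and treating every <code>responseStatus</code> other than 200 as a failure is an assumption of this sketch.

```python
import json
import urllib.parse
import urllib.request


def translate_url(base, langpair, text):
    """Build a /translate URL with percent-encoded parameters."""
    query = urllib.parse.urlencode({"langpair": langpair, "q": text})
    return base.rstrip("/") + "/translate?" + query


def translated_text(body):
    """Extract the translation from a ScaleMT-style JSON body."""
    data = json.loads(body)
    if data.get("responseStatus") != 200:  # assumption: non-200 means failure
        raise RuntimeError("APY error: %r" % data.get("responseDetails"))
    return data["responseData"]["translatedText"]


SAMPLE = ('{"responseStatus": 200, "responseData": '
          '{"translatedText": "Син барныңмы?"}, "responseDetails": null}')

if __name__ == "__main__":
    url = translate_url("http://localhost:2737", "kaz|tat", "Сен бардың ба?")
    print(url)
    print(translated_text(SAMPLE))
    # With APY running locally, uncomment:
    # with urllib.request.urlopen(url) as response:
    #     print(translated_text(response.read().decode("utf-8")))
```

Unlike raw curl output, <code>json.loads</code> turns escapes such as <code>\u00e9</code> back into readable characters.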
URL | Function | Parameters | Output |
---|---|---|---|
/listPairs | List available language pairs |
|
To be consistent with ScaleMT, the returned JS Object contains a responseData key with an Array of language pair objects with keys sourceLanguage and targetLanguage .
$ curl 'http://localhost:2737/listPairs' {"responseStatus": 200, "responseData": [ {"sourceLanguage": "kaz", "targetLanguage": "tat"}, {"sourceLanguage": "tat", "targetLanguage": "kaz"}, {"sourceLanguage": "mk", "targetLanguage": "en"} ], "responseDetails": null} |
/list | List available mode information |
|
The returned JS Object contains a mapping from language pairs to mode names (used internally by Apertium).
$ curl 'http://localhost:2737/list?q=analyzers' {"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph", "tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"} $ curl 'http://localhost:2737/list?q=generators' {"en-es": "en-es-generador", "fin": "fin-gener", "es-en": "es-en-generador"} $ curl 'http://localhost:2737/list?q=taggers' {"es-en": "es-en-tagger", "en-es": "en-es-tagger", "mk-en": "mk-en-tagger", "tat-kaz": "tat-kaz-tagger", "kaz-tat": "kaz-tat-tagger", "kaz": "kaz-tagger"} |
/translate | Translate text |
For more about formatting, please see Format Handling. |
To be consistent with ScaleMT, the returned JS Object contains a responseData key with a JS Object that has a translatedText key containing the translated text.
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?' {"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null} $ echo Сен бардың ба? > myfile $ curl --data-urlencode 'q@myfile' 'http://localhost:2737/translate?langpair=kaz|tat' {"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null} The following two queries contain nonstandard whitespace characters and are equivalent: $ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&deformat=txt&reformat=txt' {"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null} $ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This works well&format=txt' {"responseStatus": 200, "responseData": {"translatedText": "Esto trabaja\u2001bien"}, "responseDetails": null} The following two queries illustrate the difference between the $ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html' {"responseData": {"translatedText": "Qué hace este trabajo?"}, "responseDetails": null, "responseStatus": 200} $ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html-noent' {"responseData": {"translatedText": "Qu\u00e9 hace este trabajo?"}, "responseDetails": null, "responseStatus": 200} |
/translateDoc | Translate a document (.odt, .txt, .rtf, .html, .docx, .pptx, .xlsx, .tex) |
|
Returns the translated document.
$ curl --form 'file=@/path/to/kaz.odt' 'http://localhost:2737/translateDoc?langpair=kaz|tat' > tat.odt |
/analyze or /analyse | Morphologically analyze text |
|
The returned JS Array contains JS Arrays in the format [analysis, input-text] .
$ curl -G --data "lang=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze [["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"], ["?/?<sent>","?"]] |
/generate | Generate surface forms from text |
|
The returned JS Array contains JS Arrays in the format [generated, input-text] .
$ curl -G --data "lang=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate [["сен","^сен<v><tv><imp><p2><sg>$ "]] |
/perWord | Perform morphological tasks per word |
|
The returned JS Array contains JS Objects each containing the key input and up to 4 other keys corresponding to the requested modes (tagger , morph , biltrans and translate ).
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph&q=let+there+be+light' [{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger&q=let+there+be+light' [{"input": "let", "tagger": "let<vblex><pp>"}, {"input": "there", "tagger": "there<adv>"}, {"input": "be", "tagger": "be<vbser><inf>"}, {"input": "light", "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+tagger&q=let+there+be+light' [{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=translate&q=let+there+be+light' [{"input": "let", "translate": ["dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=biltrans&q=let+there+be+light' [{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans&q=let+there+be+light' [{"input": "let", "translate": 
["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans&q=let+there+be+light' [{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+biltrans&q=let+there+be+light' [{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+translate&q=let+there+be+light' [{"input": "let", "translate": ["dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": 
["ligero<adj>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate&q=let+there+be+light' [{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans+tagger&q=let+there+be+light' [{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans+tagger&q=let+there+be+light' [{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], 
"tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+tagger&q=let+there+be+light' [{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans&q=let+there+be+light' [{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}] curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans+tagger&q=let+there+be+light' [{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, 
{"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}] |
/listLanguageNames | Get localized language names |
|
The returned JS Object contains a mapping of requested language codes to localized language names
$ curl 'http://localhost:2737/listLanguageNames?locale=fr&languages=ca+en+mk+tat+kk' {"ca": "catalan", "en": "anglais", "kk": "kazakh", "mk": "macédonien", "tat": "tatar"} |
/calcCoverage | Get coverage of a language on a text |
|
The returned JS Array contains a single floating point value ≤ 1 that indicates the coverage.
$ curl 'http://localhost:2737/getCoverage?lang=en-es&q=Whereas disregard and contempt for which have outraged the conscience of mankind' [0.9230769230769231] |
/identifyLang | Return a list of languages with probabilities of the text being in that language. Uses CLD2 if that's installed, otherwise will try any analyser modes. |
|
The returned JS Object contains a mapping from language codes to probabilities.
$ curl 'http://localhost:2737/identifyLang?q=This+is+a+piece+of+text.' {"ca": 0.19384234, "en": 0.98792465234, "kk": 0.293442432, "zh": 0.002931001} |
/stats | Return some statistics about pair usage, uptime, portion of time spent actively translating |
|
Note that period-based stats are limited to 3600 seconds by default (see -T argument to servlet.py)
$ curl -Ss localhost:2737/stats|jq .responseData { "holdingPipes": 0, "periodStats": { "totTimeSpent": 10.760803, "ageFirstRequest": 19.609394, "totChars": 2718, "requests": 8, "charsPerSec": 252.58 }, "runningPipes": { "eng-spa": 1 }, "useCount": { "eng-spa": 8 }, "uptime": 26 } |
/spellCheck | Note: This endpoint is not yet available in the main branch. Handles spell-checking requests using Voikko or Divvun spell checkers. |
|
The output is a JSON array where each element represents a token from the input text. Each token includes the following information:
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz' [ {"token": "қазақша", "known": true, "sugg": []}, {"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейміз"]} ] $ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz&spellchecker=divvun' [ {"token": "қазақша", "known": true, "sugg": []}, {"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейтін", "білмейін", "білмейміз", "иілмеймін", "тілмеймін", "ілмеймін"]} ] |
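The endpoints in the table above can also be called from Python rather than curl. The following is only a sketch: the helper names apy_url and misspelled are made up for illustration, APY_BASE assumes a local APY on the default port, and the sample response is a shortened copy of the /spellCheck output shown above.

```python
import json
from urllib.parse import urlencode

APY_BASE = "http://localhost:2737"  # assumption: local APY on the default port


def apy_url(endpoint, **params):
    """Build a request URL for an APY endpoint, percent-encoding the parameters."""
    return "{}/{}?{}".format(APY_BASE, endpoint.lstrip("/"), urlencode(params))


def misspelled(spellcheck_response):
    """Map each unknown token in a /spellCheck response to its suggestions."""
    tokens = json.loads(spellcheck_response)
    return {t["token"]: t["sugg"] for t in tokens if not t["known"]}


# Shortened copy of the /spellCheck sample response from the table above.
sample = (
    '[{"token": "қазақша", "known": true, "sugg": []}, '
    '{"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін"]}]'
)

url = apy_url("listLanguageNames", locale="fr", languages="ca en")
unknown = misspelled(sample)
```

The URL can then be fetched with urllib.request.urlopen (or any HTTP client) against a running server; spaces in parameter values are encoded as +, as in the curl examples.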
SSL
APY supports HTTPS out of the box. To test with a self-signed certificate, create a certificate and key by running:
openssl req -new -x509 -keyout server.key -out server.crt -days 365 -nodes
Then run APY with --ssl-key server.key --ssl-cert server.crt, and test with HTTPS and the -k argument to curl (-k makes curl accept self-signed or otherwise unverifiable certificates):
curl -k -G --data "mode=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze
If you have a real signed certificate, you should be able to use curl without -k for the domain which the certificate is signed for:
curl -G --data "mode=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze
Remember to open port 2737 to your server.
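A Python client needs the equivalent of curl's -k to talk to a server with a self-signed certificate. The sketch below uses only the standard library; insecure_context and fetch are illustrative names, and disabling verification is only reasonable for local testing.

```python
import ssl
import urllib.request


def insecure_context():
    """TLS context that skips certificate checks, like curl -k. Testing only."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, verify=True):
    """GET a URL; verify=False accepts self-signed certificates."""
    ctx = ssl.create_default_context() if verify else insecure_context()
    with urllib.request.urlopen(url, context=ctx) as resp:
        return resp.read().decode("utf-8")
```

For example, fetch("https://localhost:2737/listPairs", verify=False) should work against an APY started with the self-signed certificate created above, while verify=True is the right choice once a properly signed certificate is installed.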
Gateway
A gateway for APY is located in the same directory and provides functionality such as silently intercepting and forwarding requests, and aggregating APY instance capabilities for overriding /list requests. For example, a gateway given access to two servers with different capabilities (in terms of language pairs) will report the aggregated capabilities to the client, hiding the existence of two servers.
A list of APY servers is a required positional argument; an example server list is provided in the same directory. If the gateway is requested to run on an already occupied port, it will attempt to traverse the available ports until it can bind to a free one.
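The port fallback described above could look roughly like the following. This is a sketch, not the gateway's code: it assumes the traversal simply tries successive port numbers, and bind_first_free and max_tries are names invented here.

```python
import socket


def bind_first_free(start_port, max_tries=50, host="127.0.0.1"):
    """Return (socket, port) bound to the first free port at or above start_port."""
    last_port = min(start_port + max_tries, 65536)
    for port in range(start_port, last_port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port is taken, try the next one
            continue
        return sock, port
    raise OSError("no free port in range {}-{}".format(start_port, last_port - 1))
```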
The gateway currently operates as a "fastest"-paradigm load balancer that continuously adapts to changing circumstances by basing its routing on the clients' requests. On initialization, all servers are assigned a weight of 0; consequently, each server will eventually be utilized as the gateway determines the server speeds. The gateway stores a moving average of the last x requests for each (mode, language) and forwards requests to the fastest server, as measured in units of response time per response length.
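The routing idea can be illustrated with a small sketch. This is not the gateway's actual code: the class name, the window of 10 (standing in for "the last x requests") and the tie-breaking rule (first server in the list wins) are all assumptions made for the example.

```python
from collections import defaultdict, deque


class FastestBalancer:
    """Route each (mode, language) request to the server with the lowest
    moving average of response time per response length."""

    def __init__(self, servers, window=10):
        self.servers = list(servers)
        # samples[server][(mode, lang)] holds recent seconds-per-character values
        self.samples = defaultdict(
            lambda: defaultdict(lambda: deque(maxlen=window))
        )

    def weight(self, server, key):
        recent = self.samples[server][key]
        # Unmeasured servers weigh 0, so each one gets tried at least once.
        return sum(recent) / len(recent) if recent else 0.0

    def pick(self, mode, lang):
        """Return the server currently believed to be fastest for this key."""
        return min(self.servers, key=lambda s: self.weight(s, (mode, lang)))

    def record(self, server, mode, lang, seconds, response_length):
        """Store one measurement after a request has been answered."""
        per_char = seconds / max(response_length, 1)
        self.samples[server][(mode, lang)].append(per_char)
```

A caller would pick() a server, forward the request, time it, and record() the result, so a server that slows down is gradually routed around as its moving average rises.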
Running on init
Systemd
See Apy/Debian for the quickstart.
Running as a --user unit
If you want to be able to start and stop apy as a non-root user, you'll first have to get your administrator to run some commands. If your user is named "tussenvoegsel", the admin will have to run:
sudo apt-get install dbus libpam-systemd # or dnf on Fedora etc.
sudo loginctl enable-linger tussenvoegsel
To read the logs without sudo, the admin will also have to enable persistent logs (see below).
Then as your "tussenvoegsel" user, do
mkdir -p ~/.config/systemd/user/
git clone https://github.com/apertium/apertium-apy
cp ~/apertium-apy/tools/systemd/apy.service ~/.config/systemd/user/
Now edit .config/systemd/user/apy.service and remove PrivateTmp, set the User to "tussenvoegsel" (or whatever it is) and WorkingDirectory/ExecStart paths to /home/tussenvoegsel/apertium-apy.
Here's a full example apy.service file:
$ cat ~/.config/systemd/user/apy.service
[Unit]
Description=Translation server and API for Apertium
Documentation=http://wiki.apertium.org/wiki/Apertium-apy
After=network.target

[Service]
WorkingDirectory=/home/tussenvoegsel/apertium-apy
ExecStart=/usr/bin/python3 /home/tussenvoegsel/apertium-apy/servlet.py /usr/share/apertium/modes
Restart=always
WatchdogSec=10s

[Install]
WantedBy=multi-user.target
You should now be able to do:
systemctl --user daemon-reload # re-read the edited apy.service file
systemctl --user start apy # start apy immediately
systemctl --user stop apy # stop apy immediately
systemctl --user enable apy # make apy start after next reboot
systemctl --user status apy # check if apy is running
journalctl -f --user-unit apy # follow the apy logs
journalctl -n100 --user-unit apy # show last 100 lines of apy logs
curl 'localhost:2737/listPairs' # show installed pairs
curl 'localhost:2737/translate?q=ja+nu&langpair=sme|nob' # translate some words
Persistent logs
By default, logs are not persistent across reboots nor readable without sudo. The below commands fix this:
sudo mkdir /var/log/journal
sudo systemctl restart systemd-journald
Upstart
You can use upstart scripts to automatically run the apy and html-tools on startup and respawn the processes when they get killed. If you don't have upstart installed: sudo apt-get install upstart
The apertiumconfig file contains paths of some apertium directories and the serverlist file. It can be saved anywhere. Make sure the paths are correct!
/home/user/apertiumconfig
APERTIUMPATH=/home/user
APYPATH=/home/user/apertium-apy
SERVERLIST=/home/user/serverlist
HTMLTOOLSPATH=/home/user/apertium-html-tools
#optional, see 'Logging':
LOGFILE=/home/user/apertiumlog
The following upstart scripts have to be saved in /etc/init.
apertium-all.conf
description "start/stop all apertium services"
start on startup
apertium-apy.conf
description "apertium-apy init script"
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
env CONFIG=/etc/default/apertium
script
    . $CONFIG
    python3 $APYPATH/servlet.py $APERTIUMPATH
end script
apertium-apy-gateway.conf
description "apertium-apy gateway init script"
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
env CONFIG=/home/user/apertiumconfig
script
    . $CONFIG
    python3 $APYPATH/gateway.py $SERVERLIST
end script
apertium-html-tools.conf
description "apertium-html-tools init script"
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
env CONFIG=/etc/default/apertium
script
    . $CONFIG
    cd $HTMLTOOLSPATH
    python3 -m http.server 8888
end script
Use sudo start apertium-all to start all services. Just like the filenames, the jobs are called apertium-apy, apertium-apy-gateway and apertium-html-tools.
The jobs can be independently started by: sudo start JOB
You can stop them by using sudo stop JOB
Restart: sudo restart JOB
View the status and PID: sudo status JOB
Logging
The log files of the processes can be found in the /var/log/upstart/ folder.
The starting/stopping of the jobs can be logged by appending this to the end of the apertium-apy.conf, apertium-apy-gateway.conf and apertium-html-tools.conf files.
pre-start script
    . $CONFIG
    touch $LOGFILE
    echo "`date` $UPSTART_JOB started" >> $LOGFILE
end script
post-stop script
    . $CONFIG
    touch $LOGFILE
    echo "`date` $UPSTART_JOB stopped" >> $LOGFILE
end script
TODO
- hfst-proc -g and lrx-proc don't work with null-flushing, see https://sourceforge.net/p/hfst/bugs/240/ and https://sourceforge.net/p/apertium/tickets/45/
- translation cache
- variants like ca_valencia, oc_aran and pt_BR look odd on the web page?
- gateway: we need a way to have a second server running only the most popular language pairs, and a gateway that sends requests to whichever server has the requested pair. Simply doing -j2 is not a good solution, since we'd waste a lot of RAM on keeping open pipelines that are rarely used. (Or we could turn off pipelines after not being used for a while …)
Troubleshooting
CRITICAL:root:apy.py APy needs a UTF-8 locale, please set …
Do
export LC_ALL=C.UTF-8
and put that line in your ~/.bashrc
See also Installation_troubleshooting#Warning:_unsupported_locale.2C_fallback_to_.22C.22.22.
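To check from Python whether the current environment would pass, something like the following can help. It is only a sketch and not the check APY itself performs; is_utf8 and locale_is_utf8 are illustrative names.

```python
import codecs
import locale


def is_utf8(encoding):
    """True if an encoding name is an alias of UTF-8 (e.g. 'utf8', 'UTF-8')."""
    try:
        return codecs.lookup(encoding).name == "utf-8"
    except LookupError:
        return False  # unknown encoding name


def locale_is_utf8():
    """Check the encoding of the locale the process was started with."""
    return is_utf8(locale.getpreferredencoding(False))
```

Running python3 -c with a call to locale_is_utf8() before and after the export should show the difference on a system whose default locale is plain C.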
listen tcp 0.0.0.0:2737: bind: address already in use
Probably apy is already running, or some other program is holding the port open.
See what programs are using port 2737 with
lsof -i :2737
or
netstat -pna | grep 2737
If you're using docker, you may have to sudo those commands (lsof and netstat don't write anything, so that Should Be Safe™)
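If neither lsof nor netstat is at hand, a quick check from Python tells you whether anything is listening on the port (though not which program). A minimal sketch; port_in_use is a made-up helper name:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(2737) before starting apy shows whether the default port is already taken.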
forking problems on systemd 228
If you get errors like
HTTPServerRequest(protocol='http', host='apy.projectjj.com', method='GET', uri='/translate?langpair=nob%7Cnno&q=ikke%0A%0A&callback=_jqjsp&_146183949405=', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'Keep-Alive', 'Cookie': '_pk_ref.1.9697=%5B%22%2C%22%22%2C146183942%2C%22https%3A%2F%2Fwww.google.no%2F%22%5D; _pk_id.1.9697=96baa844663e946.1441366937.7.146839495.1461839482.; _pk_ses.1.9697=*', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, sdch', 'X-Forwarded-Server': 'www.apertium.org, apy.projectjj.com', 'X-Forwarded-For': '152.93.00.00, 193.145.00.00', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.0 Safari/537.36', 'Accept-Language': 'nb-NO,nb;q=0.8,no;q=0.6,nn;q=0.4,en-US;q=0.2,en;q=0.2', 'Host': 'apy.projectjj.com', 'Referer': 'https://www.apertium.org/index.nob.html?dir=nob-nno', 'X-Forwarded-Host': 'www.apertium.org, apy.projectjj.com'}) Traceback (most recent call last): File "/home/apertium/.local/lib/python3.4/site-packages/tornado/web.py", line 1415, in _execute result = yield result File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run value = future.result() File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run yielded = self.gen.throw(*exc_info) File "/home/apertium/apertium-apy/servlet.py", line 389, in get self.get_argument('markUnknown', default='yes')) File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run value = future.result() File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", 
line 876, in run yielded = self.gen.throw(*exc_info) File "/home/apertium/apertium-apy/servlet.py", line 369, in translateAndRespond translated = yield pipeline.translate(toTranslate, nosplit) File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run value = future.result() File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run yielded = self.gen.throw(*exc_info) File "/home/apertium/apertium-apy/translation.py", line 69, in translate parts = yield [translateNULFlush(part, self) for part in all_split] File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run value = future.result() File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 656, in callback result_list.append(f.result()) File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result raise_exc_info(self._exc_info) File "<string>", line 3, in raise_exc_info File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 879, in run yielded = self.gen.send(value) File "/home/apertium/apertium-apy/translation.py", line 214, in translateNULFlush proc_deformat = Popen("apertium-deshtml", stdin=PIPE, stdout=PIPE) File "/usr/lib/python3.5/subprocess.py", line 947, in __init__ restore_signals, start_new_session) File "/usr/lib/python3.5/subprocess.py", line 1480, in _execute_child restore_signals, start_new_session, preexec_fn) BlockingIOError: [Errno 11] Resource temporarily unavailable
on systems with systemd>=228 and linux>=4.3, then it's likely you're hitting the TasksMax systemd attribute, which puts a limit of 512 tasks per cgroup(?) or 4096 per user (similar to ulimit task limits). See http://unix.stackexchange.com/questions/253903/creating-threads-fails-with-resource-temporarily-unavailable-with-4-3-kernel/255603#255603 for info; basically you want to change the DefaultTasksMax or UserTasksMax settings.
logging errors
If you encounter errors involving enable_pretty_logging() while starting APY, comment out the line with a leading # to solve the issue.
- What was the error? This should be possible to fix / work around.
High IO usage
If you are logging unknowns (-f / --missing-freqs), you should probably also give some value to -M (e.g. -M1000), otherwise you might get a lot of disk usage on that sqlite file.
'return' with argument inside generator on python 3.2 or older
Traceback (most recent call last):
  File "./servlet.py", line 25, in <module>
    import translation
  File "translation.py", line 132
    return proc_reformat.communicate()[0].decode('utf-8')
SyntaxError: 'return' with argument inside generator
Solution: upgrade to Python 3.3 or newer.
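The reason is that returning a value from a generator function only became legal syntax in Python 3.3 (PEP 380). The toy example below, which is unrelated to APY's own code, shows the construct and how the returned value reaches the caller:

```python
def read_all(chunks):
    """Generator that yields each chunk and then returns the joined result."""
    collected = []
    for chunk in chunks:
        collected.append(chunk)
        yield chunk
    return "".join(collected)  # a SyntaxError before Python 3.3


def run(gen):
    """Drive a generator to completion and hand back its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value  # the generator's return value travels here
```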
Docs
- /Translation
- /Debian – quickstart installation guide for running your very own APY server on Debian, Ubuntu etc.
- /Fedora – quickstart installation guide for running your very own APY server on Fedora