Apertium-apy

From Apertium
Revision as of 19:26, 1 August 2024 by Harikrishna

Apertium-APy stands for "Apertium API in Python". It's a simple Apertium API server written in Python 3, meant as a drop-in replacement for ScaleMT. Its primary purpose is serving requests from web applications, though it's fairly versatile. It is currently hosted on GitHub, where servlet.py contains the relevant web server bits. The server is used by front ends like apertium-html-tools (on apertium.org) and Mediawiki Content Translation.


The https://apertium.org page uses an installation which currently only runs released language pairs (also available from https://apertium.org/apy if you prefer). However, APY is very easy to set up on your own server, where you can run all the development pairs and even analysers and taggers (like what http://turkic.apertium.org does). Read on for how to do that.

Test it!

$ curl -G --data "lang=kir&modes=morph&q=алдым" https://beta.apertium.org/apy/analyse

[["алдым/алд<n><px1sg><nom>/алд<n><px1sg><nom>+э<cop><aor><p3><pl>/алд<n><px1sg><nom>+э<cop><aor><p3><sg>/ал<v><tv><ifi><p1><sg>/ал<vaux><ifi><p1><sg>", "алдым"]]

Installation

See /Debian for a complete quickstart installation guide for Debian, Ubuntu, Linux Mint, etc., which uses the prebuilt binaries.

First, install apertium/lttoolbox/apertium-lex-tools, and your language pairs. See Installation for how to do this.

You should have Python 3.4 or newer (though 3.2 has been reported to work as of 324a185).

APY uses Tornado 3.1 or newer as its web framework. Ensure that you install the Python 3.4 (or newer) versions of any dependencies. On Debian/Ubuntu, you can do

sudo apt-get install build-essential python3-dev python3-pip zlib1g-dev subversion
sudo pip3 install --upgrade tornado

Then clone APY from GitHub and run it:

git clone https://github.com/apertium/apertium-apy.git
cd apertium-apy
./servlet.py /usr/share/apertium   # the server will use all .mode files from under this directory, use /usr/local/share/apertium for "make install"ed pairs

See ./servlet.py --help for documentation on how to start APY. Here are some popular optional arguments:

  • -l --lang-names: path to sqlite3 database of localized language names (see #List localised language names; you should include this if you're using apertium-html-tools)
  • -p --port: port to run server on (2737 by default)
  • -c --ssl-cert: path to SSL certificate
  • -k --ssl-key: path to SSL key file
  • -j --num-processes: number of http processes to run (default = 1; use 0 to run one http server per core, where each http server runs all available language pairs)
  • -s --nonpairs-path: include .mode files from this directory, like with the main arg, but skip translator (pair) modes and only include the other modes, such as analysers and generators (handy for use with an apertium checkout)
  • -f --missing-freqs: path to sqlite3 database of words that were unknown (requires sudo apt-get install sqlite3)
  • -i --max-pipes-per-pair: how many pipelines we can have per language pair (per http server), default = 1
  • -u --max-users-per-pipe: if there are this many concurrent users in the least-used pipeline of a pair (and we haven't reached max-pipes-per-pair), start a new pipeline (default = 5)
  • -m --max-idle-secs: after each translation request, go through the list of language pairs and shut down any pair that hasn't been used in the last MAX_IDLE_SECS seconds (to save on RAM)
  • -n --min-pipes-per-pair: when shutting down idle pairs, keep at least this many open (default = 0)
  • -r --restart-pipe-after: if a pipeline has been used for this many requests, shut it down (to avoid possible memory creep if a pair has bugs) after it has handled its current requests
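
The interplay between -i (--max-pipes-per-pair) and -u (--max-users-per-pipe) can be sketched roughly as below. This is only an illustration of the rule described in the list above, not APY's actual code; the function name and the list-of-counts representation are assumptions made for the example.

```python
def needs_new_pipeline(users_per_pipe, max_pipes_per_pair=1, max_users_per_pipe=5):
    """Decide whether to start another pipeline for one language pair.

    users_per_pipe: number of concurrent users of each pipeline that is
    currently open for the pair (hypothetical representation).
    """
    if not users_per_pipe:
        # No pipeline is open yet, so one has to be started.
        return True
    if len(users_per_pipe) >= max_pipes_per_pair:
        # The -i limit is reached; requests must share existing pipelines.
        return False
    # Start a new one only if even the least-used pipeline is crowded (-u).
    return min(users_per_pipe) >= max_users_per_pipe


print(needs_new_pipeline([5, 7], max_pipes_per_pair=3))  # True
print(needs_new_pipeline([2, 7], max_pipes_per_pair=3))  # False
print(needs_new_pipeline([9], max_pipes_per_pair=1))     # False
```

With the defaults (-i 1, -u 5), a pair never gets a second pipeline, so raising -u alone has no effect unless -i is raised too.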

Installing dependencies without root

If you don't have root, you can still install the python dependencies with

$ pip3 install --user --upgrade tornado

(But your server still needs build-essential python3-dev python3-pip zlib1g-dev installed.)

Then you just need to run

PYTHONPATH="/usr/local/lib/python3.3/site-packages:${PYTHONPATH}"; export PYTHONPATH

before starting APY.

Installing dependencies without root nor pip3

Your server still needs python3 (and probably build-essential python3-dev zlib1g-dev), but this is simpler if you don't want to mess with pip.

Just go to https://pypi.python.org/pypi/tornado/#downloads and get the newest version .tar.gz source release; say it got stored as ~/Nedlastingar/tornado-4.3.tar.gz, then do

cd apertium-apy
tar xf ~/Nedlastingar/tornado-4.3.tar.gz 
( cd tornado-4.3 && python3 setup.py build )
ln -s tornado-4.3/build/lib*/tornado tornado

Optional features

List localised language names

If you use apertium-html-tools, you probably want localised language names instead of three-letter codes. To get this, first install sqlite3 (on Debian/Ubuntu that's sudo apt-get install sqlite3), then do

make

to create the langNames.db used for the /listLanguageNames function.

Language identification

The /identifyLang function can provide language identification.

If you install Compact Language Detection 2 (CLD2), you get fast and fairly accurate language detection. Installation can be a bit tricky though.


Alternatively, you can start servlet.py with the -s argument pointing to a directory of language pairs with analyser modes, in which case APY will try to do language detection by analysing the text and finding which analyser had the least unknowns. This is a bit slow though :-)

APY will prefer using CLD2 if it's available, otherwise fall back to analyser coverage.
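
The analyser-coverage fallback can be pictured roughly like this. The sketch assumes each analysed unit looks like "surface/analysis1/analysis2" (as in the /analyze examples on this page) and that an unknown word gets a single analysis starting with "*", as in the Apertium stream format. The helper names and the sample data are made up for illustration; this is not APY's own implementation.

```python
def is_known(unit):
    """True if the unit has at least one analysis that is not '*unknown'."""
    _surface, _slash, analyses = unit.partition("/")
    return bool(analyses) and not analyses.startswith("*")


def coverage(units):
    """Fraction of analysed units that the analyser recognised."""
    if not units:
        return 0.0
    return sum(1 for unit in units if is_known(unit)) / len(units)


def best_language(analyses_by_lang):
    """Pick the language whose analyser left the fewest unknowns."""
    return max(analyses_by_lang, key=lambda lang: coverage(analyses_by_lang[lang]))


# Invented sample: the same two words run through two analysers.
sample = {
    "spa": ["this/*this", "works/*works"],
    "eng": ["this/this<prn>", "works/work<vblex>"],
}
print(best_language(sample))  # eng
```

Since every candidate analyser has to process the whole text, the cost grows with the number of analyser modes, which is why this route is slower than CLD2.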

Usage

APY supports three types of requests: GET, POST, and JSONP. Because of browsers' cross-site scripting restrictions, GET and POST requests from a web page are possible only if APY is running on the same server as the client; JSONP requests are permitted in any context, which makes them useful for pages served from elsewhere. Using curl, APY can easily be tested:

curl -G --data "lang=kaz-tat&modes=morph&q=алдым" http://localhost:2737/perWord

It can also be tested through your browser or through other HTTP calls. Since curl does not decode JSON output by default, an APY Sandbox is provided with Apertium-html-tools to make testing easier.
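
For scripted testing, a small Python client using only the standard library might look like the following. It is a minimal sketch that assumes an APY instance on localhost with the default port 2737; the helper names (build_url, translated_text, translate) are invented for this example and are not part of APY.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

APY_URL = "http://localhost:2737"  # assumption: local APY on the default port


def build_url(endpoint, **params):
    """Build a GET URL for an APY endpoint, percent-encoding the parameters."""
    return "{}/{}?{}".format(APY_URL, endpoint.lstrip("/"), urlencode(params))


def translated_text(body):
    """Extract the translation from a /translate JSON response body."""
    response = json.loads(body)
    if response.get("responseStatus") != 200:
        raise RuntimeError("APY returned an error: {!r}".format(response))
    return response["responseData"]["translatedText"]


def translate(text, langpair):
    """Send one /translate request (needs a running APY server)."""
    with urlopen(build_url("translate", langpair=langpair, q=text)) as reply:
        return translated_text(reply.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the URL; call translate() once a server is running.
    print(build_url("translate", langpair="kaz|tat", q="Сен бардың ба?"))
```

Letting urlencode handle the query string avoids hand-escaping characters such as "|" and spaces, and json.loads turns escapes like \u00e9 in the output back into readable text.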

URL Function Parameters Output
/listPairs List available language pairs
  • include_deprecated_codes: give this parameter to include old ISO-639-1 codes in output
To be consistent with ScaleMT, the returned JS Object contains a responseData key with an Array of language pair objects with keys sourceLanguage and targetLanguage.
$ curl 'http://localhost:2737/listPairs'

{"responseStatus": 200, "responseData": [
 {"sourceLanguage": "kaz", "targetLanguage": "tat"}, 
 {"sourceLanguage": "tat", "targetLanguage": "kaz"}, 
 {"sourceLanguage": "mk", "targetLanguage": "en"}
], "responseDetails": null}
/list List available mode information
  • q: type of information to list
    • pairs (alias for /listPairs)
    • analyzers/analysers
    • generators
    • taggers/disambiguators
The returned JS Object contains a mapping from language pairs to mode names (used internally by Apertium).
$ curl 'http://localhost:2737/list?q=analyzers'
{"mk-en": "mk-en-morph", "en-es": "en-es-anmor", "kaz-tat": "kaz-tat-morph", 
 "tat-kaz": "tat-kaz-morph", "fin": "fin-morph", "es-en": "es-en-anmor", "kaz": "kaz-morph"}
$ curl 'http://localhost:2737/list?q=generators'
{"en-es": "en-es-generador", "fin": "fin-gener", "es-en": "es-en-generador"}
$ curl 'http://localhost:2737/list?q=taggers'
{"es-en": "es-en-tagger", "en-es": "en-es-tagger", "mk-en": "mk-en-tagger",
 "tat-kaz": "tat-kaz-tagger", "kaz-tat": "kaz-tat-tagger", "kaz": "kaz-tagger"}
/translate Translate text
  • langpair: language pair to use for translation
  • q: text to translate
  • markUnknown=no (optional): include this to remove "*" in front of unknown words
  • deformat: deformatter to be used: one of html (default), txt, rtf
  • reformat: reformatter to be used: one of html, html-noent (default), txt, rtf
  • format: if deformatter and reformatter are the same, they can be specified here

For more about formatting, please see Format Handling.

To be consistent with ScaleMT, the returned JS Object contains a responseData key with a JS Object whose translatedText key holds the translated text.
$ curl 'http://localhost:2737/translate?langpair=kaz|tat&q=Сен+бардың+ба?'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}
$ echo Сен бардың ба? > myfile
$ curl --data-urlencode 'q@myfile' 'http://localhost:2737/translate?langpair=kaz|tat'
{"responseStatus": 200, "responseData": {"translatedText": "Син барныңмы?"}, "responseDetails": null}

The following two queries contain nonstandard whitespace characters and are equivalent:

$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This    works well&deformat=txt&reformat=txt'
{"responseStatus": 200, "responseData": {"translatedText": "Esto    trabaja\u2001bien"}, "responseDetails": null}
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=This    works well&format=txt'
{"responseStatus": 200, "responseData": {"translatedText": "Esto    trabaja\u2001bien"}, "responseDetails": null}

The following two queries illustrate the difference between the html and html-noent reformatter:

$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html'
{"responseData": {"translatedText": "Qu&eacute; hace este trabajo?"}, "responseDetails": null, "responseStatus": 200}
$ curl 'http://localhost:2737/translate?langpair=eng|spa&q=How does this work?&reformat=html-noent'
{"responseData": {"translatedText": "Qu\u00e9 hace este trabajo?"}, "responseDetails": null, "responseStatus": 200}
/translateDoc Translate a document (.odt, .txt, .rtf, .html, .docx, .pptx, .xlsx, .tex)
  • langpair: language pair to use for translation
  • file: document to translate
  • markUnknown=no (optional): include this to remove "*" in front of unknown words
Returns the translated document.
$ curl --form 'file=@/path/to/kaz.odt' 'http://localhost:2737/translateDoc?langpair=kaz|tat' > tat.odt
/analyze or /analyse Morphologically analyze text
  • lang: language to use for analysis
  • q: text to analyze
The returned JS Array contains JS Arrays in the format [analysis, input-text].
$ curl -G --data "lang=kaz&q=Сен+бардың+ба?" http://localhost:2737/analyze
[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "], ["бардың ба/бар<adj><subst><gen>+ма<qst>/бар<v><iv><ifi><p2><sg>+ма<qst>","бардың ба"], ["?/?<sent>","?"]]
/generate Generate surface forms from text
  • lang: language to use for generation
  • q: text to generate
The returned JS Array contains JS Arrays in the format [generated, input-text].
$ curl -G --data "lang=kaz&q=^сен<v><tv><imp><p2><sg>$" http://localhost:2737/generate
[["сен","^сен<v><tv><imp><p2><sg>$ "]]
/perWord Perform morphological tasks per word
  • lang: language to use for tasks
  • modes: morphological tasks to perform on text (15 combinations possible - delimit using '+')
    • tagger/disambig
    • biltrans
    • translate
    • morph
  • q: text to perform tasks on
The returned JS Array contains JS Objects each containing the key input and up to 4 other keys corresponding to the requested modes (tagger, morph, biltrans and translate).
curl 'http://localhost:2737/perWord?lang=en-es&modes=morph&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger&q=let+there+be+light'
[{"input": "let", "tagger": "let<vblex><pp>"}, {"input": "there", "tagger": "there<adv>"}, {"input": "be", "tagger": "be<vbser><inf>"}, {"input": "light", "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+tagger&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=translate&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=biltrans&q=let+there+be+light'
[{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+biltrans&q=let+there+be+light'
[{"input": "let", "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=tagger+translate&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=translate+biltrans+tagger&q=let+there+be+light'
[{"input": "let", "translate": ["dejar<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "translate": ["all\u00ed<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "translate": ["ser<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "translate": ["ligero<adj>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+biltrans+tagger&q=let+there+be+light'
[{"input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+tagger&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "tagger": "light<adj><sint>"}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"]}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"]}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"]}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"]}]

curl 'http://localhost:2737/perWord?lang=en-es&modes=morph+translate+biltrans+tagger&q=let+there+be+light'
[{"translate": ["dejar<vblex><pp>"], "input": "let", "morph": ["let<vblex><inf>", "let<vblex><pres>", "let<vblex><past>", "let<vblex><pp>"], "biltrans": ["dejar<vblex><inf>", "dejar<vblex><pres>", "dejar<vblex><past>", "dejar<vblex><pp>"], "tagger": "let<vblex><pp>"}, {"translate": ["all\u00ed<adv>"], "input": "there", "morph": ["there<adv>"], "biltrans": ["all\u00ed<adv>"], "tagger": "there<adv>"}, {"translate": ["ser<vbser><inf>"], "input": "be", "morph": ["be<vbser><inf>"], "biltrans": ["ser<vbser><inf>"], "tagger": "be<vbser><inf>"}, {"translate": ["ligero<adj>"], "input": "light", "morph": ["light<n><sg>", "light<adj><sint>", "light<vblex><inf>", "light<vblex><pres>"], "biltrans": ["luz<n><f><sg>", "ligero<adj>", "encender<vblex><inf>", "encender<vblex><pres>"], "tagger": "light<adj><sint>"}]

/listLanguageNames Get localized language names
  • locale: language to get localized language names in
  • languages: list of '+' delimited language codes to retrieve localized names for (optional - if not specified, all available codes will be returned)
The returned JS Object contains a mapping of requested language codes to localized language names
$ curl 'http://localhost:2737/listLanguageNames?locale=fr&languages=ca+en+mk+tat+kk'
{"ca": "catalan", "en": "anglais", "kk": "kazakh", "mk": "macédonien", "tat": "tatar"}
/calcCoverage Get coverage of a language on a text
  • lang: language to analyze with
  • q: text to analyze for coverage
The returned JS Array contains a single floating point value ≤ 1 that indicates the coverage.
$ curl 'http://localhost:2737/calcCoverage?lang=en-es&q=Whereas disregard and contempt for which have outraged the conscience of mankind'
[0.9230769230769231]
/identifyLang Return a list of languages with probabilities of the text being in that language. Uses CLD2 if that's installed, otherwise will try any analyser modes.
  • q: text which you would like to compute probabilities for
The returned JS Object contains a mapping from language codes to probabilities.
$ curl 'http://localhost:2737/identifyLang?q=This+is+a+piece+of+text.'
{"ca": 0.19384234, "en": 0.98792465234, "kk": 0.293442432, "zh": 0.002931001}
/stats Return some statistics about pair usage, uptime, portion of time spent actively translating
  • requests=N (optional): limit period-based stats to last N requests
Note that period-based stats are limited to 3600 seconds by default (see -T argument to servlet.py)
$ curl -Ss localhost:2737/stats|jq .responseData
{
  "holdingPipes": 0,
  "periodStats": {
    "totTimeSpent": 10.760803,
    "ageFirstRequest": 19.609394,
    "totChars": 2718,
    "requests": 8,
    "charsPerSec": 252.58
  },
  "runningPipes": {
    "eng-spa": 1
  },
  "useCount": {
    "eng-spa": 8
  },
  "uptime": 26
}
/spellCheck Handles spell-checking requests using Voikko or Divvun spell checkers.
  • q: The text to be spell-checked (String, Required, e.g., `қазақша билмеймін`)
  • lang: The language of the text (String, Required, e.g., `kaz`)
  • spellchecker: The spell checker to use (String, Optional, Defaults to `voikko`, e.g., `divvun`)
The output is a JSON array where each element represents a token from the input text. Each element has the keys token (the word itself), known (whether the spell checker accepts it) and sugg (a list of suggested corrections):
$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz'
[
  {"token": "қазақша", "known": true, "sugg": []},
  {"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейміз"]}
]

$ curl 'http://localhost:2737/spellCheck?q=қазақша билмеймін&lang=kaz&spellchecker=divvun'
[
  {"token": "қазақша", "known": true, "sugg": []},
  {"token": "билмеймін", "known": false, "sugg": ["білмеймін", "билеймін", "билемеймін", "бөлмеймін", "білмейтін", "білмейін", "білмейміз", "иілмеймін", "тілмеймін", "ілмеймін"]}
]
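
The [analysis, input-text] pairs returned by /analyze (see the table above) pack the surface form and all its readings into one slash-separated string. A rough way to take them apart in Python is sketched below; the function name is made up, the sample body is shortened from the /analyze example above, and the sketch deliberately ignores escaped slashes inside words.

```python
import json


def parse_analysis(unit):
    """Split "surface/analysis1/analysis2" into (surface, [analyses])."""
    surface, *analyses = unit.split("/")
    return surface, analyses


# Shortened copy of the /analyze output shown in the table above.
body = ('[["Сен/сен<v><tv><imp><p2><sg>/сен<prn><pers><p2><sg><nom>","Сен "],'
        ' ["?/?<sent>","?"]]')

for unit, original in json.loads(body):
    surface, analyses = parse_analysis(unit)
    print(repr(original), surface, analyses)
```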

SSL

APY supports HTTPS out of the box. To test with a self-signed signature, create a certificate and key by running:

openssl req -new -x509 -keyout server.key -out server.crt -days 365 -nodes

Then run APY with --ssl-key server.key --ssl-cert server.crt, and test with HTTPS and the -k argument to curl (-k means curl accepts self-signed or even slightly "lying" signatures):

curl -k -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://localhost:2737/analyze

If you have a real signed certificate, you should be able to use curl without -k for the domain which the certificate is signed for:

curl -G --data "lang=kaz-tat&q=Сен+бардың+ба?" https://oohlookatmeimencrypted.com:2737/analyze

Remember to open port 2737 to your server.
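
When testing an HTTPS-enabled APY from Python instead of curl, the standard ssl module offers a similar choice between trusting the self-signed certificate explicitly and skipping verification. The helper below is a sketch with an invented name; server.crt refers to the file created by the openssl command above.

```python
import ssl


def context_for_self_signed(cert_path=None):
    """Return an SSLContext for talking to an APY with a self-signed cert.

    With cert_path (e.g. "server.crt") that certificate is trusted
    explicitly. Without it, verification is switched off entirely, like
    curl -k, which is only reasonable for local testing.
    """
    context = ssl.create_default_context()
    if cert_path is not None:
        context.load_verify_locations(cafile=cert_path)
    else:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


insecure = context_for_self_signed()
print(insecure.verify_mode == ssl.CERT_NONE)  # True
```

The resulting context can be passed as urllib.request.urlopen(url, context=...). When trusting server.crt explicitly, the host name in the URL still has to match the name given when the certificate was created.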

Gateway

A gateway for APY is located in the same directory and provides functionality such as silently intercepting and forwarding requests, and aggregating the capabilities of APY instances so that it can answer /list requests itself. For example, a gateway given access to two servers with different capabilities, in terms of language pairs, will report the aggregated capabilities to the client, hiding the existence of two servers.

A list of APY servers is a required positional argument; an example server list is provided in the same directory. If the gateway is requested to run on an already occupied port, it will attempt to traverse the available ports until it can bind to a free one.

The gateway currently operates as a "Fastest"-paradigm load balancer that continuously adapts to changing circumstances by basing its routing on the clients' requests. On initialization, all servers are assigned a weight of 0; consequently, each server will eventually be used as the gateway determines the server speeds. The gateway stores a moving average over the last x requests for each (mode, language) combination and forwards requests to the fastest server, measured in response time per unit of response length.
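
The routing idea can be illustrated with a small moving-average balancer. This is a sketch under stated assumptions, not the gateway's real code: the class and method names, the window size of 10 and the server URLs are all invented for the example.

```python
from collections import defaultdict, deque


class FastestBalancer:
    """Route each (mode, language) to the server with the lowest average
    response time per response character over the last `window` requests."""

    def __init__(self, servers, window=10):
        self.servers = list(servers)
        self.window = window
        # samples[(mode, lang)][server] -> recent seconds-per-character values
        self.samples = defaultdict(
            lambda: defaultdict(lambda: deque(maxlen=self.window)))

    def weight(self, server, mode, lang):
        history = self.samples[(mode, lang)][server]
        # Unmeasured servers weigh 0, so each one gets tried at least once.
        return sum(history) / len(history) if history else 0.0

    def choose(self, mode, lang):
        return min(self.servers, key=lambda s: self.weight(s, mode, lang))

    def record(self, server, mode, lang, seconds, response_length):
        per_char = seconds / max(response_length, 1)
        self.samples[(mode, lang)][server].append(per_char)


balancer = FastestBalancer(["http://a:2737", "http://b:2737"])
balancer.record("http://a:2737", "translate", "eng-spa", 0.50, 100)
print(balancer.choose("translate", "eng-spa"))  # http://b:2737 (unmeasured)
balancer.record("http://b:2737", "translate", "eng-spa", 0.20, 100)
print(balancer.choose("translate", "eng-spa"))  # http://b:2737 (faster)
```

Dividing by the response length keeps a server from looking slow merely because it happened to receive the longest texts.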

Running on init

Systemd

See Apy/Debian for the quickstart.

Running as a --user unit

If you want to be able to start and stop apy as a non-root user, you'll first have to get your administrator to run some commands. Say your user is named "tussenvoegsel", the admin will have to do:

sudo apt-get install dbus libpam-systemd  # or dnf on Fedora etc.
sudo loginctl enable-linger tussenvoegsel

To read the logs without sudo, the admin will also have to enable persistent logs (see below).


Then as your "tussenvoegsel" user, do

mkdir -p ~/.config/systemd/user/
git clone https://github.com/apertium/apertium-apy
cp ~/apertium-apy/tools/systemd/apy.service ~/.config/systemd/user/

Now edit .config/systemd/user/apy.service and remove PrivateTmp, set the User to "tussenvoegsel" (or whatever it is) and WorkingDirectory/ExecStart paths to /home/tussenvoegsel/apertium-apy.

Here's a full example apy.service file:

$ cat ~/.config/systemd/user/apy.service 
[Unit]
Description=Translation server and API for Apertium
Documentation=http://wiki.apertium.org/wiki/Apertium-apy
After=network.target
[Service]
WorkingDirectory=/home/tussenvoegsel/apertium-apy
ExecStart=/usr/bin/python3 /home/tussenvoegsel/apertium-apy/servlet.py /usr/share/apertium/modes
Restart=always
WatchdogSec=10s
[Install]
WantedBy=default.target


You should now be able to do:

systemctl --user daemon-reload   # re-read the edited apy.service file
systemctl --user start apy       # start apy immediately
systemctl --user stop apy        # stop apy immediately
systemctl --user enable apy      # make apy start after next reboot
systemctl --user status apy      # check if apy is running
journalctl -f --user-unit apy    # follow the apy logs
journalctl -n100 --user-unit apy # show last 100 lines of apy logs
curl 'localhost:2737/listPairs'  # show installed pairs
curl 'localhost:2737/translate?q=ja+nu&langpair=sme|nob' # translate some words

Persistent logs

By default, logs are not persistent across reboots nor readable without sudo. The below commands fix this:

sudo mkdir /var/log/journal
sudo systemctl restart systemd-journald

Upstart

You can use upstart scripts to automatically run the apy and html-tools on startup and respawn the processes when they get killed. If you don't have upstart installed: sudo apt-get install upstart

The apertiumconfig file contains paths of some apertium directories and the serverlist file. It can be saved anywhere, as long as the CONFIG variable in each script below points to it. Make sure the paths are correct!

/home/user/apertiumconfig

APERTIUMPATH=/home/user
APYPATH=/home/user/apertium-apy
SERVERLIST=/home/user/serverlist
HTMLTOOLSPATH=/home/user/apertium-html-tools
#optional, see 'Logging':
LOGFILE=/home/user/apertiumlog  

The following upstart scripts have to be saved in /etc/init.

apertium-all.conf

description "start/stop all apertium services"
     
start on startup

apertium-apy.conf

description "apertium-apy init script"

start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300

env CONFIG=/etc/default/apertium

script
    . $CONFIG
    python3 $APYPATH/servlet.py $APERTIUMPATH
end script

apertium-apy-gateway.conf

description "apertium-apy gateway init script"
     
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
     
env CONFIG=/home/user/apertiumconfig

script
    . $CONFIG
    python3 $APYPATH/gateway.py $SERVERLIST
end script 

apertium-html-tools.conf

description "apertium-html-tools init script"
           
start on starting apertium-all
stop on stopped apertium-all
respawn
respawn limit 50 300
     
env CONFIG=/etc/default/apertium

script
    . $CONFIG
    cd $HTMLTOOLSPATH
    python3 -m http.server 8888
end script

Use sudo start apertium-all to start all services. Just like the filenames, the jobs are called apertium-apy, apertium-apy-gateway and apertium-html-tools.

The jobs can be independently started by: sudo start JOB

You can stop them by using sudo stop JOB

Restart: sudo restart JOB

View the status and PID: sudo status JOB

Logging

The log files of the processes can be found in the /var/log/upstart/ folder.

The starting/stopping of the jobs can be logged by appending this to the end of apertium-apy.conf, apertium-apy-gateway.conf and apertium-html-tools.conf files.

pre-start script
	. $CONFIG
	touch $LOGFILE
	echo "`date` $UPSTART_JOB started" >> $LOGFILE	
end script

post-stop script
	. $CONFIG
	touch $LOGFILE
	echo "`date` $UPSTART_JOB stopped" >> $LOGFILE
end script

TODO

  • hfst-proc -g and lrx-proc don't work with null-flushing, see https://sourceforge.net/p/hfst/bugs/240/ and https://sourceforge.net/p/apertium/tickets/45/
  • translation cache
  • variants like ca_valencia, oc_aran and pt_BR look odd on the web page?
  • gateway: we need a way to have a second server running only the most popular language pairs, and a gateway that sends requests to whichever server has the requested pair. Simply doing -j2 is not a good solution, since we'd waste a lot of RAM on keeping open pipelines that are rarely used. (Or we could turn off pipelines after not being used for a while …)

Troubleshooting

CRITICAL:root:apy.py APy needs a UTF-8 locale, please set …

Do

 export LC_ALL=C.UTF-8

and put that line in your ~/.bashrc

See also Installation_troubleshooting#Warning:_unsupported_locale.2C_fallback_to_.22C.22.22.
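
To see which locale setting a process would actually pick up, the usual POSIX precedence (LC_ALL over LC_CTYPE over LANG) can be checked from Python. This is a standalone diagnostic sketch with invented function names; it is not how APY itself performs the check.

```python
import os


def effective_ctype_locale(environ=os.environ):
    """Return the locale name that decides the character encoding.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            return value
    return "C"


def looks_like_utf8(locale_name):
    """True for names such as C.UTF-8 or en_US.utf8."""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


print(looks_like_utf8(effective_ctype_locale()))
```

If this prints False in the environment APY is started from (for example inside a service unit, which does not read ~/.bashrc), the export needs to be set there as well.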

listen tcp 0.0.0.0:2737: bind: address already in use

Probably apy is already running, or some other program is holding the port open.

See what programs are using port 2737 with

lsof -i :2737

or

netstat -pna | grep 2737

If you're using docker, you may have to sudo those commands (lsof and netstat don't write anything, so that Should Be Safe™)
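
The same check can be scripted, for instance before starting APY from a deployment script. The sketch below only reports whether something accepts TCP connections on the port, not which program it is; the function name is an assumption.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


print(port_in_use(2737))
```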

forking problems on systemd 228

If you get errors like

   HTTPServerRequest(protocol='http', host='apy.projectjj.com', method='GET', uri='/translate?langpair=nob%7Cnno&q=ikke%0A%0A&callback=_jqjsp&_146183949405=', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Connection': 'Keep-Alive', 'Cookie': '_pk_ref.1.9697=%5B%22%2C%22%22%2C146183942%2C%22https%3A%2F%2Fwww.google.no%2F%22%5D; _pk_id.1.9697=96baa844663e946.1441366937.7.146839495.1461839482.; _pk_ses.1.9697=*', 'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate, sdch', 'X-Forwarded-Server': 'www.apertium.org, apy.projectjj.com', 'X-Forwarded-For': '152.93.00.00, 193.145.00.00', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.0 Safari/537.36', 'Accept-Language': 'nb-NO,nb;q=0.8,no;q=0.6,nn;q=0.4,en-US;q=0.2,en;q=0.2', 'Host': 'apy.projectjj.com', 'Referer': 'https://www.apertium.org/index.nob.html?dir=nob-nno', 'X-Forwarded-Host': 'www.apertium.org, apy.projectjj.com'})
    Traceback (most recent call last):
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/web.py", line 1415, in _execute
        result = yield result
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
        value = future.result()
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
        yielded = self.gen.throw(*exc_info)
      File "/home/apertium/apertium-apy/servlet.py", line 389, in get
        self.get_argument('markUnknown', default='yes'))
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
        value = future.result()
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
        yielded = self.gen.throw(*exc_info)
      File "/home/apertium/apertium-apy/servlet.py", line 369, in translateAndRespond
        translated = yield pipeline.translate(toTranslate, nosplit)
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
        value = future.result()
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 876, in run
        yielded = self.gen.throw(*exc_info)
      File "/home/apertium/apertium-apy/translation.py", line 69, in translate
        parts = yield [translateNULFlush(part, self) for part in all_split]
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 870, in run
        value = future.result()
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 656, in callback
        result_list.append(f.result())
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 215, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
      File "/home/apertium/.local/lib/python3.4/site-packages/tornado/gen.py", line 879, in run
        yielded = self.gen.send(value)
      File "/home/apertium/apertium-apy/translation.py", line 214, in translateNULFlush
        proc_deformat = Popen("apertium-deshtml", stdin=PIPE, stdout=PIPE)
      File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
        restore_signals, start_new_session)
      File "/usr/lib/python3.5/subprocess.py", line 1480, in _execute_child
        restore_signals, start_new_session, preexec_fn)
    BlockingIOError: [Errno 11] Resource temporarily unavailable

on systems with systemd>=228 and linux>=4.3, then it's likely you're hitting the TasksMax systemd attribute, which by default puts a limit of 512 tasks per unit (cgroup) or 4096 per user (similar to ulimit task limits). See http://unix.stackexchange.com/questions/253903/creating-threads-fails-with-resource-temporarily-unavailable-with-4-3-kernel/255603#255603 for info; basically you want to raise the DefaultTasksMax setting (in /etc/systemd/system.conf) or the UserTasksMax setting (in /etc/systemd/logind.conf).
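To see which limit actually applies to a running service, you can ask systemd for the unit's TasksMax property. The sketch below wraps `systemctl show -p TasksMax` in Python; the unit name apy.service is an assumption (use whatever your service is called), and both function names are made up for illustration.

```python
import subprocess


def parse_tasks_max(show_output):
    """Parse the 'TasksMax=...' line printed by `systemctl show -p TasksMax`.

    Returns an int, or None when no limit is set.
    """
    for line in show_output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "TasksMax":
            value = value.strip()
            # Unlimited is shown as "infinity"; some systemd versions
            # print the largest 64-bit value instead.
            if value in ("infinity", str(2**64 - 1)):
                return None
            return int(value)
    raise ValueError("no TasksMax property in output")


def unit_tasks_max(unit="apy.service"):  # unit name is an assumption
    """Ask systemd for the task limit of one unit."""
    out = subprocess.check_output(
        ["systemctl", "show", "-p", "TasksMax", unit],
        universal_newlines=True,
    )
    return parse_tasks_max(out)
```

If unit_tasks_max() returns a small number such as 512 while APY keeps many pipelines open, the limit is a likely cause of the BlockingIOError above.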

logging errors

If APY fails at startup with an error involving enable_pretty_logging(), comment out that line (prefix it with #) to work around the issue.

What was the error? This should be possible to fix / work around.

High IO usage

If you are logging unknown words (-f / --missing-freqs), you should probably also give some value to -M (e.g. -M1000); otherwise you might get a lot of disk I/O on the sqlite file that stores them.
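Why a memory limit helps can be shown with a small sketch. This is not APY's own code: the class, table and column names are hypothetical, and it assumes that -M bounds how many unknown words are held in memory before they are written out. Counts accumulate in a dictionary and reach the sqlite file in one transaction per batch instead of one write per word.

```python
import sqlite3
from collections import Counter


class UnknownWordLog:
    """Hypothetical sketch: count unknown words in memory and write them to
    sqlite only once more than `mem_limit` distinct words have piled up."""

    def __init__(self, db_path, mem_limit=1000):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS missing "
            "(word TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
        )
        self.mem_limit = mem_limit
        self.pending = Counter()
        self.flushes = 0  # how many times the database was written to

    def note_unknown(self, word):
        self.pending[word] += 1
        if len(self.pending) > self.mem_limit:
            self.flush()

    def flush(self):
        with self.conn:  # one transaction for the whole batch
            for word, n in self.pending.items():
                self.conn.execute(
                    "INSERT OR IGNORE INTO missing (word, freq) VALUES (?, 0)",
                    (word,),
                )
                self.conn.execute(
                    "UPDATE missing SET freq = freq + ? WHERE word = ?",
                    (n, word),
                )
        self.pending.clear()
        self.flushes += 1
```

With mem_limit=0 every unknown word triggers a write, which is the kind of behaviour that causes heavy I/O; a larger limit trades a little memory for far fewer writes.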

'return' with argument inside generator on Python 3.2 or older

Traceback (most recent call last):
  File "./servlet.py", line 25, in <module>
    import translation
  File "translation.py", line 132
    return proc_reformat.communicate()[0].decode('utf-8')
SyntaxError: 'return' with argument inside generator

Solution: upgrade to Python 3.3 or newer, the first version that allows return with a value inside a generator.
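For background, the construct that older interpreters reject looks like the sketch below. The function name read_reply is made up for illustration and is not taken from translation.py. On Python 3.3 or newer the returned value travels in the StopIteration exception; on 3.2 the file does not even compile, which is why the error shows up at import time.

```python
def read_reply():
    """A generator that returns a value, which needs Python 3.3+."""
    chunk = yield "waiting"
    return chunk.decode("utf-8")


gen = read_reply()
next(gen)  # run up to the first yield
try:
    gen.send(b"hei")  # resume the generator; it then returns
except StopIteration as stop:
    result = stop.value  # the value given to `return`
```

Because the failure is a SyntaxError raised while compiling the module, a version check placed inside the same file cannot catch it.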

Docs

  • /Translation
  • /Debian – quickstart installation guide for running your very own APY server on Debian, Ubuntu etc.
  • /Fedora – quickstart installation guide for running your very own APY server on Fedora

Please cite