User:Kiara

Kiara's page

Suggestion task:

Notes

1. How to work with APY from the command line: http://wiki.apertium.org/wiki/APY#Usage

2. How to launch Suggestions from the command line: https://github.com/goavki/apertium-html-tools/pull/35

Suggestion docs:

This is for the APY page:

To run APY in Google reCAPTCHA mode, use:

./servlet.py /usr/local/share/apertium/ --wiki-username=WikiUsername --wiki-password=WikiPassword -rs=YourRecaptchaSecret

-b, --bypass-token: generates a testing token that bypasses reCAPTCHA

URL: /suggest
Function: Generates a suggestion on the target wiki page using a testing token.
Parameters:
context: the sentence containing the word
word: the word for which a suggestion is made
newWord: the suggestion
langpair: the language pair to use for translation
g-recaptcha-response: the testing token generated when running APY (only a testing token can be used with curl)
Output: Returns the status. If it is "Success", the suggestion has been posted on the target wiki page.

Notes:
The correct wiki page URL is required (wiki_util.py).
Production use of Google reCAPTCHA requires registration (https://developers.google.com/recaptcha/).
The correct keys are required both when starting APY and in the html-tools config file.

Example:
curl --data 'context=otro+mundo&word=*mundo&newWord=MUNDO&langpair=esp|eng&g-recaptcha-response=testingToken' http://localhost:2737/suggest
{"responseStatus": 200, "responseData": {"status": "Success"}, "responseDetails": null}
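The same request can also be sent from Python. The following is only a minimal sketch using the standard library: the helper names build_suggest_body and send_suggestion are made up for illustration, and the URL is the localhost address from the curl call above. It assumes APY was started with a testing token.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Host and port are taken from the curl example; adjust for your own setup.
APY_URL = "http://localhost:2737/suggest"


def build_suggest_body(context, word, new_word, langpair, token):
    """Encode the five /suggest parameters as a form-encoded request body."""
    fields = {
        "context": context,
        "word": word,
        "newWord": new_word,
        "langpair": langpair,
        "g-recaptcha-response": token,
    }
    return urlencode(fields).encode("utf-8")


def send_suggestion(body, url=APY_URL):
    """POST the body to APY and return the decoded JSON reply."""
    request = Request(url, data=body, method="POST")
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With APY running locally, send_suggestion(build_suggest_body("otro mundo", "*mundo", "MUNDO", "esp|eng", "testingToken")) should return the same JSON object that curl prints.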

This is for the html-tools page:

ENABLED: turns the suggestion mode on or off (True/False)
RECAPTCHA_SITE_KEY: the reCAPTCHA site key, which can be obtained by registering at https://developers.google.com/recaptcha/
CONTEXT_WRAP: the number of context words taken from the left

Speller backlog:

1. Localize 'Any ideas?' (fixed, open question)

2. Punctuation (fixed)

3. Documentation

4. Button glitch (fixed)

5. Hovering over a misspelled word highlights it in black, with a second underline (fixed)

6. After a word has been updated, it stays red, even though the underline disappears (fixed)

7. An error message for a missing -translate mode (fixed, open question)

Language detection

Apertium code	langdetect code	Language
af	af	Afrikaans
ara	ar	Arabic
an	N/A	Aragonese
ast	N/A	Asturian
bg	bg	Bulgarian
	bn	Bengali
br	N/A	Breton
ca	ca	Catalan
	cs	Czech
cy	cy	Welsh
dan	da	Danish
	de	German
	el	Greek
en	en	English
eo	N/A	Esperanto
es	es	Spanish
	et	Estonian
eu	N/A	Basque
	fa	Persian
	fi	Finnish
fra	fr	French
gl	N/A	Galician
	gu	Gujarati
	he	Hebrew
hin	hi	Hindi
	hr	Croatian
	hu	Hungarian
id	id	Indonesian
is	N/A	Icelandic
it	it	Italian
	ja	Japanese
kaz	N/A (kk)	Kazakh
	kn	Kannada
	ko	Korean
	lt	Lithuanian
	lv	Latvian
mk	mk	Macedonian
	ml	Malayalam
	mr	Marathi (Marāṭhī)
ms	N/A	Malaysian
mt	N/A	Maltese
nob	N/A (nb)	Bokmål
	ne	Nepali
nl	nl	Dutch
nno	N/A (nn)	Norwegian Nynorsk
nor	no	Norwegian
oc	N/A	Occitan
	pa	Panjabi
	pl	Polish
pt	pt	Portuguese
ro	ro	Romanian
	ru	Russian
hbs	N/A (sh)	Serbo-Croatian
sme	N/A (se)	Northern Sami
	sk	Slovak
slv	sl	Slovenian
	so	Somali
	sq	Albanian
swe (sv)	sv	Swedish
	sw	Swahili
	ta	Tamil
	te	Telugu
	th	Thai
	tl	Tagalog
	tr	Turkish
tat	N/A (tt)	Tatar
	uk	Ukrainian
urd	ur	Urdu
	vi	Vietnamese
N/A	zh-cn	Chinese (Simplified and using Mainland Chinese terms)
N/A	zh-tw	Chinese (Traditional and using Taiwanese terms)
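The table above can be used to convert a langdetect result into an Apertium code. The sketch below is only an illustration: it covers just the rows where the two codes differ plus one identical pair, and the names LANGDETECT_TO_APERTIUM and to_apertium_code are hypothetical.

```python
# Subset of the table above: langdetect code -> Apertium code.
LANGDETECT_TO_APERTIUM = {
    "af": "af",
    "ar": "ara",
    "da": "dan",
    "fr": "fra",
    "hi": "hin",
    "no": "nor",
    "sl": "slv",
    "sv": "swe",
    "ur": "urd",
}


def to_apertium_code(langdetect_code):
    """Return the Apertium code for a langdetect code, or None if this subset has no entry."""
    return LANGDETECT_TO_APERTIUM.get(langdetect_code)
```

Codes that have no Apertium entry in the table (for example de) return None here, so the caller can decide how to handle languages without an Apertium counterpart.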

How to train a new language model:

1. Install the langdetect library (https://github.com/Mimino666/langdetect).

$ pip install langdetect

Supported Python versions: 2.6, 2.7, 3.x.

2. Prepare the training data.

For instance, use Wikipedia dumps (http://wiki.apertium.org/wiki/Wikipedia_Extractor).

3. Train the model (https://github.com/Mimino666/langdetect#how-to-add-new-language)

You need to create a new language profile. The easiest way to do it is to use the langdetect.jar tool, which can generate language profiles from Wikipedia abstract database files or plain text.

Wikipedia abstract database files can be retrieved from "Wikipedia Downloads" (http://download.wikimedia.org/). Their filenames have the form '(language code)wiki-(version)-abstract.xml' (e.g. 'enwiki-20101004-abstract.xml').

usage: java -jar langdetect.jar --genprofile -d [directory path] [language codes]

Specify the directory that contains the abstract databases with the -d option.
The tool can also handle gzip-compressed files.

Remark: The Chinese database filenames look like 'zhwiki-(version)-abstract-zh-cn.xml' or 'zhwiki-(version)-abstract-zh-tw.xml', so they must be renamed to 'zh-cnwiki-(version)-abstract.xml' or 'zh-twwiki-(version)-abstract.xml'.
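The renaming described in the remark can be scripted. This is a rough sketch only: the function name renamed_abstract is hypothetical, and it assumes the filenames follow exactly the pattern given in the remark (no .gz suffix).

```python
import re

# Matches e.g. 'zhwiki-20101004-abstract-zh-cn.xml' (pattern taken from the remark).
_ZH_ABSTRACT = re.compile(
    r"^zhwiki-(?P<version>[^-]+)-abstract-(?P<variant>zh-(?:cn|tw))\.xml$"
)


def renamed_abstract(filename):
    """Return the name langdetect.jar expects, or the input unchanged if it does not match."""
    match = _ZH_ABSTRACT.match(filename)
    if match is None:
        return filename
    return "{}wiki-{}-abstract.xml".format(match.group("variant"), match.group("version"))
```

The result can be passed to os.rename together with the original name; files for other languages are returned unchanged, so the function can be applied to every file in the directory.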

To generate a language profile from plain text, use the genprofile-text command.

usage: java -jar langdetect.jar --genprofile-text -l [language code] [text file path]

For more details, see the language-detection wiki: https://code.google.com/archive/p/language-detection/wikis/Tools.wiki.
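When profiles are generated for several languages, the two usage lines above can be wrapped in a small script. The sketch below only assembles the argument lists shown in the usage lines; the function names and the default jar path "langdetect.jar" are assumptions, and Java must be available for run_command to work.

```python
import subprocess


def genprofile_command(directory, language_codes, jar="langdetect.jar"):
    """Build the --genprofile call for Wikipedia abstract database files."""
    return ["java", "-jar", jar, "--genprofile", "-d", directory] + list(language_codes)


def genprofile_text_command(language_code, text_path, jar="langdetect.jar"):
    """Build the --genprofile-text call for a plain text file."""
    return ["java", "-jar", jar, "--genprofile-text", "-l", language_code, text_path]


def run_command(command):
    """Run the assembled command and raise CalledProcessError if the tool fails."""
    subprocess.run(command, check=True)
```

For example, run_command(genprofile_text_command("kaz", "kaz.txt")) would generate a profile from a hypothetical plain text file kaz.txt.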

4. Locate the folder where langdetect is installed.

5. Copy the new language model to the profiles folder:

 cp [options] /usr/local/lib/python3.4/dist-packages/langdetect/profiles/
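Steps 4 and 5 can also be done from Python, which avoids hard-coding the dist-packages path. This is a minimal sketch: install_profile and profiles_dir are hypothetical helper names, and the only assumption about langdetect is that its profiles live in a "profiles" folder inside the package directory, as in the path above.

```python
import os
import shutil


def profiles_dir(package_dir):
    """Return the profiles folder inside the given langdetect package directory."""
    return os.path.join(package_dir, "profiles")


def install_profile(profile_path, package_dir):
    """Copy one generated profile file into the profiles folder and return its new path."""
    target_dir = profiles_dir(package_dir)
    if not os.path.isdir(target_dir):
        raise FileNotFoundError("no profiles folder in " + package_dir)
    return shutil.copy(profile_path, target_dir)
```

The package directory can be found with os.path.dirname(langdetect.__file__) after import langdetect; writing into a system-wide installation may require elevated permissions.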

6. Test the installed models:

from langdetect import detector_factory

# Load all profiles from the profiles folder, then list their language codes.
detector_factory.init_factory()
print(detector_factory._factory.langlist)
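To check for specific codes instead of reading the printed list by eye, the same calls can be wrapped as below. This is a sketch only: installed_languages and missing_languages are hypothetical helpers, and _factory is a private attribute of langdetect, so it may change between versions.

```python
def installed_languages():
    """Return the language codes langdetect has loaded (langdetect is imported lazily)."""
    from langdetect import detector_factory

    detector_factory.init_factory()
    return list(detector_factory._factory.langlist)


def missing_languages(installed, expected):
    """Return the expected codes that are not among the installed ones, in the given order."""
    installed_set = set(installed)
    return [code for code in expected if code not in installed_set]
```

For example, missing_languages(installed_languages(), ["kk", "tt"]) returns an empty list once profiles for both codes have been copied into the profiles folder.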

(nno-nob): now, → azazaz (I must go now, thank you very much eee)
(nno-nob): you → ccc (I must go now, thank you very much eee)
(eng-spa): eee → 123333 (que ir ahora, muchas gracias eee)
(eng-spa): fff → 123567899 (fff Rana)
(eng-spa): fff → zaxscdfvb (fff Rana)
(eng-spa): fff → ;lkjhgfds (fff Rana)
(kaz-tat): Харғыл → Һаргыл (Харғыл барғыл)
(spa-eng): multidisciplinario → multidisciplinary (State multidisciplinario)
(eng-spa): fff → 123345678 (fff Rana)
(eng-spa): jabberwock → . (Como jabberwock)
(eng-spa): jabberwock → diaverboca (Como jabberwock)

(eng-spa): gg → qazz234234 (fff Rana £@¡)
(eng-spa): eee → wertyuiojlugiut (ahora, muchas gracias eee /!£$ £@¡)
(eng-spa): eee → [z[z[z[z[z[z[z (ahora, muchas gracias eee /!£$ £@¡)
(spa-eng): zxzx → новое слово (I am not a zxzx.)
(spa-eng): zaxaxa → это новое слово (am not a zxzx zaxaxa)
(spa-eng): cat → dgdggg ttt (rad cat)
(spa-eng): cat → gato rojo (rad cat)
(spa-eng): rad → gato rojo (rad cat)
(spa-eng): cat → gatooo (rad cat)
(spa-eng): cat → qxdcty666 (rad cat)
(spa-eng): memem → thisIsTest (Dodo foo foxie car carum memem goo cat-cat vfrtyty poipoi koo)
(spa-eng): rnh → wdqbbyth (sader gvbfj cacafe rnh gdejb)
(spa-eng_US): rad → хует (rad cat tat)
(spa-eng): dva → zvn,kloi]] (ras dva tri)
