Difference between revisions of "User:Kiara"
Latest revision as of 01:44, 8 March 2018
Kiara's page
Suggestion task:
Notes
1. How to work with APY from the command line: http://wiki.apertium.org/wiki/APY#Usage
2. How to launch Suggestions from the command line: https://github.com/apertium/apertium-html-tools/pull/35
Suggestion docs:
This is for the apy page
To run apy in Google reCAPTCHA mode, use:
./servlet.py /usr/local/share/apertium/ --wiki-username=WikiUsername --wiki-password=WikiPassword -rs=YourRecaptchaSecret
- -b, --bypass-token: generates a testing token that can be used to bypass reCAPTCHA
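URL | Function | Parameters | Output |
---|---|---|---|
/suggest | Generates a suggestion on the target wiki page using a testing token. | context: sentence; word: the word for which a suggestion is made; newWord: the suggestion; langpair: language pair to use for translation; g-recaptcha-response: testing token generated when running apy (only the testing token can be used with curl) | Returns the status. If "Success", the suggestion is posted on the target wiki page. |
Note that the correct wiki-page URL is required (wiki_util.py).
For production use of Google reCAPTCHA, registration is required (https://developers.google.com/recaptcha/).
Note that correct keys are required when starting apy and in the html-tools config file.
curl --data 'context=otro+mundo&word=*mundo&newWord=MUNDO&langpair=esp|eng&g-recaptcha-response=testingToken' http://localhost:2737/suggest
{"responseStatus": 200, "responseData": {"status": "Success"}, "responseDetails": null}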
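The same request can be prepared from Python. This is a minimal sketch, assuming apy listens on localhost:2737 as in the curl example and was started with the bypass token; the helper names build_suggest_request and is_success are hypothetical and not part of apy.

```python
import json
import urllib.parse
import urllib.request

# Assumed local apy address, taken from the curl example.
APY_SUGGEST_URL = "http://localhost:2737/suggest"


def build_suggest_request(context, word, new_word, langpair, token,
                          url=APY_SUGGEST_URL):
    """Build (but do not send) a POST request for the /suggest endpoint."""
    fields = {
        "context": context,
        "word": word,
        "newWord": new_word,
        "langpair": langpair,
        "g-recaptcha-response": token,
    }
    # urlencode produces the same form-encoded body that curl --data sends.
    data = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def is_success(response_body):
    """Return True if a /suggest JSON response reports status "Success"."""
    payload = json.loads(response_body)
    data = payload.get("responseData") or {}
    return (payload.get("responseStatus") == 200
            and data.get("status") == "Success")
```

To actually send the request, pass the result of build_suggest_request to urllib.request.urlopen while apy is running, and feed the decoded response body to is_success.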
This is for the html-tools page:
- ENABLED: turns suggestion mode on or off (True/False)
- RECAPTCHA_SITE_KEY: the reCAPTCHA site key, which can be obtained by registering at https://developers.google.com/recaptcha/
- CONTEXT_WRAP: the number of context words taken from the left
Speller backlog:
1. Localize 'Any ideas?' (fixed, with a question)
2. Punctuation (fixed)
3. Documentation
4. Button glitch (fixed)
5. Hovering over a misspelled word highlights it in black, with a second underline. (fixed)
6. After a word has been updated, it stays red, even though the underline disappears. (fixed)
7. An error message for missed -translate mode (fixed, with a question)
Language detection
| Apertium code | langdetect code | Language |
|---|---|---|
| af | af | Afrikaans |
| ara | ar | Arabic |
| an | N/A | Aragonese |
| ast | N/A | Asturian |
| bg | bg | Bulgarian |
| | bn | Bengali |
| br | N/A | Breton |
| ca | ca | Catalan |
| | cs | Czech |
| cy | cy | Welsh |
| dan | da | Danish |
| | de | German |
| | el | Greek |
| en | en | English |
| eo | N/A | Esperanto |
| es | es | Spanish |
| | et | Estonian |
| eu | N/A | Basque |
| | fa | Persian |
| | fi | Finnish |
| fra | fr | French |
| gl | N/A | Galician |
| | gu | Gujarati |
| | he | Hebrew |
| hin | hi | Hindi |
| | hr | Croatian |
| | hu | Hungarian |
| id | id | Indonesian |
| is | N/A | Icelandic |
| it | it | Italian |
| | ja | Japanese |
| kaz | N/A (kk) | Kazakh |
| | kn | Kannada |
| | ko | Korean |
| | lt | Lithuanian |
| | lv | Latvian |
| mk | mk | Macedonian |
| | ml | Malayalam |
| | mr | Marathi (Marāṭhī) |
| ms | N/A | Malaysian |
| mt | N/A | Maltese |
| nob | N/A (nb) | Bokmål |
| | ne | Nepali |
| nl | nl | Dutch |
| nno | N/A (nn) | Norwegian Nynorsk |
| nor | no | Norwegian |
| oc | N/A | Occitan |
| | pa | Panjabi |
| | pl | Polish |
| pt | pt | Portuguese |
| ro | ro | Romanian |
| | ru | Russian |
| hbs | N/A (sh) | Serbo-Croatian |
| sme | N/A (se) | Northern Sami |
| | sk | Slovak |
| slv | sl | Slovenian |
| | so | Somali |
| | sq | Albanian |
| swe (sv) | sv | Swedish |
| | sw | Swahili |
| | ta | Tamil |
| | te | Telugu |
| | th | Thai |
| | tl | Tagalog |
| | tr | Turkish |
| tat | N/A (tt) | Tatar |
| | uk | Ukrainian |
| urd | ur | Urdu |
| | vi | Vietnamese |
| N/A | zh-cn | Chinese (Simplified and using Mainland Chinese terms) |
| N/A | zh-tw | Chinese (Traditional and using Taiwanese terms) |
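The table can be turned into a lookup for converting a langdetect result into an Apertium code. The sketch below is only an illustration: it includes just the rows where both columns are filled in, it uses "swe" for the row listed as "swe (sv)", and the names APERTIUM_TO_LANGDETECT and to_apertium are hypothetical.

```python
# Pairs copied from the table rows where both codes are given.
# Rows marked N/A or left empty are deliberately omitted.
APERTIUM_TO_LANGDETECT = {
    "af": "af", "ara": "ar", "bg": "bg", "ca": "ca", "cy": "cy",
    "dan": "da", "en": "en", "es": "es", "fra": "fr", "hin": "hi",
    "id": "id", "it": "it", "mk": "mk", "nl": "nl", "nor": "no",
    "pt": "pt", "ro": "ro", "slv": "sl", "swe": "sv", "urd": "ur",
}

# Reverse lookup: langdetect code -> Apertium code.
LANGDETECT_TO_APERTIUM = {
    detected: apertium for apertium, detected in APERTIUM_TO_LANGDETECT.items()
}


def to_apertium(langdetect_code):
    """Return the Apertium code for a langdetect code, or None if unmapped."""
    return LANGDETECT_TO_APERTIUM.get(langdetect_code)
```

Returning None for unmapped codes lets the caller decide what to do when langdetect reports a language that has no Apertium code in the table.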
How to train a new language model:
1. Install Langdetect library (https://github.com/Mimino666/langdetect).
$ pip install langdetect
Supported Python versions: 2.6, 2.7, 3.x.
2. Prepare the training data.
For instance, use Wikipedia dumps (http://wiki.apertium.org/wiki/Wikipedia_Extractor).
3. Train the model (https://github.com/Mimino666/langdetect#how-to-add-new-language)
You need to create a new language profile. The easiest way is to use the langdetect.jar tool, which can generate language profiles from Wikipedia abstract database files or from plain text.
Wikipedia abstract database files can be retrieved from "Wikipedia Downloads" (http://download.wikimedia.org/). Their filenames have the form '(language code)wiki-(version)-abstract.xml' (e.g. 'enwiki-20101004-abstract.xml').
usage: java -jar langdetect.jar --genprofile -d [directory path] [language codes]
- Specify the directory that contains the abstract databases with the -d option.
- The tool can handle gzip-compressed files.
Remark: Chinese database filenames look like 'zhwiki-(version)-abstract-zh-cn.xml' or 'zhwiki-(version)-abstract-zh-tw.xml', so they must be renamed to 'zh-cnwiki-(version)-abstract.xml' or 'zh-twwiki-(version)-abstract.xml'.
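The renaming described in the remark can be scripted. This is a minimal sketch: the function name normalized_abstract_name is hypothetical, and the optional '.gz' suffix handling is an assumption based only on the note that gzip-compressed files are accepted.

```python
import re

# Matches the Chinese abstract dumps named in the remark; the optional
# ".gz" suffix is an assumption, not something the remark states.
_ZH_ABSTRACT = re.compile(
    r"^zhwiki-(?P<version>[^-]+)-abstract-(?P<variant>zh-cn|zh-tw)"
    r"\.xml(?P<gz>\.gz)?$"
)


def normalized_abstract_name(filename):
    """Return the filename in '(language code)wiki-(version)-abstract.xml' form.

    Chinese dumps such as 'zhwiki-<version>-abstract-zh-cn.xml' become
    'zh-cnwiki-<version>-abstract.xml'; any other name is returned unchanged.
    """
    match = _ZH_ABSTRACT.match(filename)
    if match is None:
        return filename
    return "{variant}wiki-{version}-abstract.xml{gz}".format(
        variant=match.group("variant"),
        version=match.group("version"),
        gz=match.group("gz") or "",
    )
```

The function only computes the new name; the actual rename can then be done with os.rename on each file in the directory passed to -d.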
To generate a language profile from plain text, use the genprofile-text command.
usage: java -jar langdetect.jar --genprofile-text -l [language code] [text file path]
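Both usage lines can be assembled as argument lists for a subprocess call. This is a sketch under the assumption that java is on the PATH and langdetect.jar is in the current directory; the function names are hypothetical, and only the flags shown in the usage lines are used.

```python
def genprofile_command(directory, language_codes, jar="langdetect.jar"):
    """Arguments for generating profiles from Wikipedia abstract databases."""
    return ["java", "-jar", jar, "--genprofile", "-d", directory] + list(language_codes)


def genprofile_text_command(language_code, text_path, jar="langdetect.jar"):
    """Arguments for generating one profile from a plain-text file."""
    return ["java", "-jar", jar, "--genprofile-text", "-l", language_code, text_path]
```

Either list can be passed to subprocess.run(command, check=True). Building the arguments as a list avoids shell quoting problems with paths that contain spaces.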
For more details see language-detection Wiki: https://code.google.com/archive/p/language-detection/wikis/Tools.wiki.
4. Locate the folder where Langdetect is installed
5. Copy the new language model to the Profiles folder
cp [options] [new language profile] /usr/local/lib/python3.4/dist-packages/langdetect/profiles/
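Steps 4 and 5 can also be done from Python, which avoids hard-coding the dist-packages path. This is a minimal sketch; find_profiles_dir and install_profile are hypothetical helper names, and it assumes the profiles live in a 'profiles' folder inside the installed langdetect package, as the path above suggests.

```python
import importlib.util
import os
import shutil


def find_profiles_dir():
    """Return the 'profiles' directory of the installed langdetect, or None."""
    spec = importlib.util.find_spec("langdetect")
    if spec is None or spec.origin is None:
        return None  # langdetect is not installed in this environment
    # spec.origin points at langdetect/__init__.py inside the package folder.
    return os.path.join(os.path.dirname(spec.origin), "profiles")


def install_profile(profile_path, profiles_dir):
    """Copy a generated profile file into the given profiles directory."""
    return shutil.copy(profile_path, profiles_dir)
```

Writing into a system-wide dist-packages folder may require elevated permissions, just as with cp.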
6. Test the installed models:
from langdetect import detector_factory
detector_factory.init_factory()
print(detector_factory._factory.langlist)
- (nno-nob): now, → azazaz (I must go now, thank you very much eee)
- (nno-nob): you → ccc (I must go now, thank you very much eee)
- (eng-spa): eee → 123333 (que ir ahora, muchas gracias eee)
- (eng-spa): fff → 123567899 (fff Rana)
- (eng-spa): fff → zaxscdfvb (fff Rana)
- (eng-spa): fff → ;lkjhgfds (fff Rana)
- (kaz-tat): Харғыл → Һаргыл (Харғыл барғыл)
- (spa-eng): multidisciplinario → multidisciplinary (State multidisciplinario)
- (eng-spa): fff → 123345678 (fff Rana)
- (eng-spa): jabberwock → . (Como jabberwock)
- (eng-spa): jabberwock → diaverboca (Como jabberwock)
- (eng-spa): gg → qazz234234 (fff Rana £@¡)
- (eng-spa): eee → wertyuiojlugiut (ahora, muchas gracias eee /!£$ £@¡)
- (eng-spa): eee → [z[z[z[z[z[z[z (ahora, muchas gracias eee /!£$ £@¡)
- (spa-eng): zxzx → новое слово (I am not a zxzx.)
- (spa-eng): zaxaxa → это новое слово (am not a zxzx zaxaxa)
- (spa-eng): cat → dgdggg ttt (rad cat)
- (spa-eng): cat → gato rojo (rad cat)
- (spa-eng): rad → gato rojo (rad cat)
- (spa-eng): cat → gatooo (rad cat)
- (spa-eng): cat → qxdcty666 (rad cat)
- (spa-eng): memem → thisIsTest (Dodo foo foxie car carum memem goo cat-cat vfrtyty poipoi koo)
- (spa-eng): rnh → wdqbbyth (sader gvbfj cacafe rnh gdejb)
- (spa-eng_US): rad → хует (rad cat tat)
- (spa-eng): dva → zvn,kloi]] (ras dva tri)
- (spa-eng): dva → qaazxxdvcgyjmki, (ras dva tri)
- (spa-eng): dva → fdfdfdfdfd (ras dva tri)
- (spa-eng): cat → tgbnjkjlks (rad cat)
- (spa-eng): rad → 12367890 (rad cat)
- (eng-spa): limba → hgyturnkglg (sa limba sarda No est)
- (eng-spa): Sushain → Sushi (hola, Sushain.)
- (eng-spa): limba → bimba (sa limba sarda No est los pidarasos)
- (spa-eng): cfre → 13 (rad re cfre cdfgr)
- (spa-eng): cfre → 1313 (rad re cfre cdfgr)
- (spa-eng): re → 131313 (rad re cfre cdfgr)
- (spa-eng): chet → 21315667 (odin dva tri chet pt shest sem vosem deviat)
- (spa-eng): odin → qazzf (shest sem vosem deviat odin dva tri chet pt shest)
- (spa-eng): shest → shest (odin dva tri chet pt shest sem vosem deviat)
- (spa-eng): deviat → deviat (odin dva tri chet pt)
- (eng-spa): vosem → vosem (odin dva tri chet pt shest sem vosem vosem deviat)
- (eng-spa): vosem → uosem (odin dva tri chet pt shest sem vosem vosem deviat)
- (eng-spa): shest → 666 (odin dva tri chet pt shest sem vosem vosem deviat)
- (spa-eng): pt → 5 (odin dva tri chet pt)
- (spa-eng): pt → 5 (odin dva tri chet pt)
- (spa-eng): pt → 5 (odin dva tri chet pt)
- (spa-eng): shest → 666 (odin dva tri chet pt shest sem vosem vosem deviat)
- (spa-eng): vosem → 888 (tri chet pt shest sem, vosem deviat)
- (spa-eng): dva → dva2 (odin dva tri)
- (spa-eng): dva → 2 (odin dva tri)
- (spa-eng): dva → 2222 (odin dva tri)
- (spa-eng): dva → 22222 (odin dva tri vosem chet pt shest)
- (spa-eng): dva → 222222 (odin dva tri vosem chet pt shest)
- (spa-eng): dva → 2222222 (odin dva tri vosem chet pt shest)
- (spa-eng): vosem → 8 (chet pt shest sem, vosem vosem deviat d)
- (eng-spa): anyything → anything (ahora. No necesitas para instalar anyything. Si sigues las instrucciones aquí)
- (eng-spa): anyything → 333 (ahora. No necesitas para instalar anyything. Si sigues las instrucciones aquí)
- (eng-spa): anyything → 444 (ahora. No necesitas para instalar anyything. Si sigues las instrucciones aquí)