Difference between revisions of "Specific resources per language"
Dharjunior (talk | contribs) |
(→Urdu) |
||
(19 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
{{Github-migration-check}} |
|||
{{TOCD}} |
{{TOCD}} |
||
The incubator can be found in the 'incubator' |
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs which haven't completely matured and are under work. |
||
Line 14: | Line 13: | ||
;Resources |
;Resources |
||
* http://mylanguages.org/learn_albanian.php |
|||
* http://www.albanianoverview.com/grammar.htm |
|||
* http://www.seelrc.org:8080/grammar/pdf/albanian_bookmarked.pdf |
|||
* http://www.idividi.com.mk/recnik/index.htm -- albanian--macedonian dictionary (non-free) |
* http://www.idividi.com.mk/recnik/index.htm -- albanian--macedonian dictionary (non-free) |
||
Line 25: | Line 25: | ||
===[[Assamese and Hindi]]=== |
===[[Assamese and Hindi]]=== |
||
:''Dictionary: [https:// |
:''Dictionary: [https://github.com/apertium/apertium-as-hi/blob/91f3c38b0c636deb620cbd27725d63dd763c5f0b/apertium-as-hi.hi.dix Assamese-Hindi Bidix]'' |
||
Line 40: | Line 40: | ||
===[[Bulgarian]]=== |
===[[Bulgarian]]=== |
||
:''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]'' |
|||
* https://link.springer.com/article/10.1007/s11185-010-9059-2 |
|||
;Resources |
|||
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology] |
|||
===[[Cornish]]=== |
===[[Cornish]]=== |
||
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix from SourceForge]'' |
|||
[No Longer Accessible] |
|||
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]'' |
:''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]'' |
||
'''This resource has not been migrated to GitHub from SVN |
|||
''' |
|||
;Resources |
;Resources |
||
* https://www.freelang.net/online/cornish.php |
|||
* [http://www.cornishtranslator.com/ Cornish Translator] |
|||
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist] |
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist] |
||
===[[Czech]]=== |
===[[Czech]]=== |
||
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]'' |
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]'' |
||
'''This resource has not been migrated to GitHub from SVN |
|||
''' |
|||
:''Dictionary: [https://github.com/apertium/apertium-eo-cs/blob/c16fa21194a285941307a68e420c194a1825ebc3/apertium-eo-cs.eo-cs.dix Czech-Esperanto Bidix]'' |
|||
:''Dictionary: [https://github.com/apertium/apertium-cs-sl/tree/062fa172705e16f77302a8096df3733581079fb8 Czech-Slovenian Bidix]'' |
|||
;Resources |
;Resources |
||
* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords |
|||
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links] |
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links] |
||
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics] |
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics] |
||
Line 79: | Line 81: | ||
* http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL |
* http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL |
||
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language] |
|||
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)] |
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)] |
||
<pre> |
<pre> |
||
Line 107: | Line 108: | ||
;Resources |
;Resources |
||
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL |
|||
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password |
|||
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL |
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL |
||
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown |
|||
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL |
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL |
||
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL |
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL |
||
Line 156: | Line 154: | ||
===[[Nogai]]=== |
===[[Nogai]]=== |
||
'''Contents to be added''' |
|||
; Resources |
|||
* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary] |
|||
===[[Ossetian]]=== |
===[[Ossetian]]=== |
||
Line 169: | Line 165: | ||
===[[Piemontese]]=== |
===[[Piemontese]]=== |
||
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]'' |
:''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]'' |
||
'''This resource has not been migrated to GitHub from SVN |
|||
''' |
|||
;Resources |
;Resources |
||
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain |
|||
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)." |
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)." |
||
Line 187: | Line 185: | ||
===[[Punjabi]]=== |
===[[Punjabi]]=== |
||
'''Contents to be added''' |
|||
; Resources |
|||
* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon] |
|||
===[[Quechua]]=== |
===[[Quechua]]=== |
||
Line 212: | Line 208: | ||
*[http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality. |
*[http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality. |
||
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries]. |
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries]. |
||
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary] |
|||
===[[Sanskrit]] '''संस्कृतम्'''=== |
===[[Sanskrit]] '''संस्कृतम्'''=== |
||
Line 220: | Line 215: | ||
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln] |
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln] |
||
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary |
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary |
||
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download]. |
|||
===[[Slovakian]]=== |
===[[Slovakian]]=== |
||
Line 227: | Line 221: | ||
;Resources |
;Resources |
||
* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English) |
|||
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish) |
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish) |
||
* http://www.angelfire.com/sk3/quality/Slovak_declension.html |
* http://www.angelfire.com/sk3/quality/Slovak_declension.html |
||
* http://www.juls.savba.sk/msj/ |
|||
===[[Thai]]=== |
===[[Thai]]=== |
||
Line 239: | Line 231: | ||
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Monodix]'' |
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Monodix]'' |
||
==Github Migration== |
|||
;Resources |
|||
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ — GPL analyser of Urdu |
|||
For languages whose resources are not yet on GitHub, you can use [[apertium-init]] to make their corresponding repository and add the files from SVN to that repository. |
|||
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system |
|||
Latest revision as of 20:40, 11 December 2019
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs which have not yet matured and are still under development.
Specific resources per language
Here are some links to resources that might be useful for expanding on work in the Incubator. Add further resources that would help in building a language pair below, and try to mark each one with its licence, or at least as free/non-free.
See also the individual language pages.
Albanian
- Dictionary: Albanian Monodix
- Resources
- http://mylanguages.org/learn_albanian.php
- http://www.seelrc.org:8080/grammar/pdf/albanian_bookmarked.pdf
- http://www.idividi.com.mk/recnik/index.htm -- albanian--macedonian dictionary (non-free)
Armenian
- Dictionary: Armenian Monodix
- Resources
Assamese and Hindi
- Dictionary: Assamese-Hindi Bidix
--- Anusuya
Belarusian
Bengali
- http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali
- http://anubadok.sf.net/ -- See above
Bulgarian
Cornish
- Dictionary: Cornish Monodix from SourceForge
This resource has not been migrated to GitHub from SVN
- Resources
Czech
- Dictionary: apertium-pl-cs.cs.dix.xml
This resource has not been migrated to GitHub from SVN
- Dictionary: Czech-Esperanto Bidix
- Dictionary: Czech-Slovenian Bidix
- Resources
- James Naughton's links
- Some complications with diacritics
- Czech morphological guesser - 'free', but not open source
Faroese
- Dictionary: Faroese Monodix
- Resources
Finnish
- See also: Omorfi
- Resources
- http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL
- Helsinki Finite-State Transducer Technology (HFST)
s = lemma
hn = homonymy ref
t = inflection info
tn = inflection number (referring to table)
av = ref to consonant gradation
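The field abbreviations listed above can be read programmatically once the word list is downloaded. The following is a minimal sketch, assuming the list is XML in which each entry wraps `s`, `hn` and `t` elements, with `tn` and `av` nested inside `t`. Only the tag names come from the key above; the wrapper element name `st`, the nesting, and the sample entry are assumptions made for illustration and should be checked against the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample entry, for illustration only; the wrapper element
# name "st" and the nesting of tn/av inside t are assumptions.
SAMPLE_ENTRY = "<st><s>aallokko</s><t><tn>4</tn><av>A</av></t></st>"


def parse_entry(xml_text):
    """Map one word-list entry onto the field names from the key."""
    entry = ET.fromstring(xml_text)
    inflection = entry.find("t")  # t = inflection info (may be absent)
    return {
        "lemma": entry.findtext("s"),      # s = lemma
        "homonym": entry.findtext("hn"),   # hn = homonymy ref, None if missing
        "inflection_number": (
            inflection.findtext("tn") if inflection is not None else None
        ),                                 # tn = inflection table number
        "gradation": (
            inflection.findtext("av") if inflection is not None else None
        ),                                 # av = consonant gradation ref
    }


if __name__ == "__main__":
    print(parse_entry(SAMPLE_ENTRY))
```

Values are kept as strings so that nothing is lost if a field turns out to hold something other than a plain number or letter.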
German and English
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for "Ding: A Dictionary LookUp program" (version 1.5 2007-04-09) from Frank Richter, Technische Universität Chemnitz
Greek
- Dictionary: Greek Monodix
- Greek-English Dictionary: Greek-English Dictionary
- Resources
- Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/
Hebrew
- Resources
- http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL
- http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL
- http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL
Hindi
- See also: Hindi
- Resources
- POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2
- https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix
- https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix
- https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list
- https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix
Iranian Persian
- Dictionary: Persian Monodix
- Resources
Ingush
- Resources
- Lexical database (non-free)
- Ingush-English dict (non-free)
Latvian
- Resources
- https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)
- See also
Lithuanian
- Dictionary: Lithuanian Monodix
- Resources
Nogai
Contents to be added
Ossetian
- Dictionary: Ossetian Monodix
- Resources
- Ossetian: Grammatical Sketch — quite nice and comprehensive.
- Ossetic National Corpus
Piemontese
- Dictionary: Piemontese Monodix from SourceForge
This resource has not been migrated to GitHub from SVN
- Resources
- http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."
Portuguese
Although Apertium has a stable es-pt pair, the Brazilian Portuguese Dictionary built at NILC (Universidade de São Paulo) for Unitex has much better coverage and could perhaps be used to improve it.
- Resources
We believe it has an LGPL license.
Punjabi
Contents to be added
Quechua
- Resources
- http://www.runasimipi.org/
- AVENUE Quechua-Spanish system. (ask Francis Tyers)
Russian
- Dictionary: monodix
- Bidix: Polish-Russian
- Bidix: English-Russian
- Resources
- http://www.alphadictionary.com/rusgrammar/
- http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf
- Russian analyser - non-free, Windows only
- Using Czech resources for the morphological analysis of Russian
- Pere - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.
- Russian--Tajik phrase dictionary, 41k entries.
Sanskrit संस्कृतम्
- Dictionary: Sanskrit Monodix
- Resources
- Sanskrit Lexicon at Uni-Koeln
- Apte's En-Sa dictionary
Slovakian
- Dictionary: Slovak Monodix
- Resources
- http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)
- http://www.angelfire.com/sk3/quality/Slovak_declension.html
Thai
- https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause
Urdu
- Dictionary: Urdu Monodix
- Bidix: Hindi-Urdu Bidix
GitHub Migration
For languages whose resources are not yet on GitHub, you can use apertium-init to make their corresponding repository and add the files from SVN to that repository.
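The migration steps described above can be sketched as a small dry-run planner that only lists the commands and does not execute them. This is a sketch under stated assumptions, not an official procedure: `migration_commands`, the example SVN URL and the commit message are hypothetical, and the plan assumes that `apertium-init <code>` creates a directory named `apertium-<code>`. Check the apertium-init documentation for its actual options before running anything.

```python
import shlex


def migration_commands(module, svn_file_url):
    """Plan (but do not run) the commands for migrating one SVN file.

    module       -- language code or pair, e.g. "pms" (assumed input format)
    svn_file_url -- SVN URL of the file to import (placeholder, not verified)
    """
    repo_dir = f"apertium-{module}"  # assumed name of the generated directory
    filename = svn_file_url.rstrip("/").split("/")[-1]
    return [
        # Create the module skeleton.
        ["apertium-init", module],
        # Copy the file out of SVN without SVN metadata.
        ["svn", "export", svn_file_url, f"{repo_dir}/{filename}"],
        # Harmless if the directory is already a git repository.
        ["git", "-C", repo_dir, "init"],
        ["git", "-C", repo_dir, "add", filename],
        ["git", "-C", repo_dir, "commit", "-m", f"Import {filename} from SVN"],
    ]


if __name__ == "__main__":
    # Hypothetical URL used only to show the shape of the output.
    plan = migration_commands(
        "pms", "https://example.org/svn/incubator/apertium-it-pms.pms.dix"
    )
    for command in plan:
        print(shlex.join(command))
```

Printing the plan first makes it easy to review each step, and to adjust the SVN URL or target file name, before running the commands by hand.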