Difference between revisions of "Specific resources per language"
(added new link) |
Dharjunior (talk | contribs) |
(44 intermediate revisions by 14 users not shown) | |||
Line 1: | Line 1: | ||
{{TOCD}} |
− | The incubator can be found |
+ | The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs that have not yet fully matured and are still under development. |
− | ==Albanian== |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-mk-sq.sq.dix apertium-mk-sq.sq.dix]'' |
− | ;Resources |
+ | ==Specific resources per language== |
+ | |||
+ | Here are links to resources that may be useful for expanding work in the incubator. Add below any resources that would help in building these pairs, and try to mark each one with its licence, or at least as free or non-free. |
+ | |||
+ | See also the individual language pages. |
+ | |||
+ | ===[[Albanian]]=== |
+ | :''Dictionary: [https://github.com/apertium/apertium-sqi/blob/master/apertium-sqi.sqi.dix Albanian Monodix]'' |
+ | |||
+ | ;Resources |
* http://www.albanianoverview.com/grammar.htm |
* http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free) |
− | ==Armenian== |
+ | ===[[Armenian]]=== |
− | :''Dictionary: [ |
+ | :''Dictionary: [https://github.com/apertium/apertium-hye/blob/master/apertium-hye.hye.dix Armenian Monodix]'' |
;Resources |
Line 16: | Line 23: | ||
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons |
+ | ===[[Assamese and Hindi]]=== |
+ | :''Dictionary: [https://github.com/apertium/apertium-as-hi/blob/91f3c38b0c636deb620cbd27725d63dd763c5f0b/apertium-as-hi.hi.dix Assamese-Hindi Bidix]'' |
+ | --- Anusuya |
− | ==Belarusian== |
+ | |||
+ | ===[[Belarusian]]=== |
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language] |
− | == |
+ | ===[[Bengali]]=== |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix]'' |
+ | * http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- free-software English→Bengali machine translation |
− | ;Resources |
+ | * http://anubadok.sf.net/ -- See above |
+ | ===[[Bulgarian]]=== |
− | * [http://www.cornishtranslator.com/ Cornish Translator] |
+ | :''Dictionary: [https://raw.githubusercontent.com/apertium/apertium-bul/master/apertium-bul.bul.dix Bulgarian Monodix]'' |
− | * [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist] |
− | |||
− | ==Bulgarian== |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-mk-bg.bg.dix apertium-mk-bg.bg.dix]'' |
;Resources |
Line 37: | Line 45: | ||
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology] |
+ | ===[[Cornish]]=== |
+ | |||
+ | :''Dictionary: [https://sourceforge.net/projects/apertium/files/apertium-cy-en/0.1.0/ Cornish Monodix from SourceForge]'' |
+ | |||
+ | '''This resource has not been migrated to GitHub from SVN |
+ | ''' |
+ | |||
+ | ;Resources |
+ | |||
+ | * [http://www.cornishtranslator.com/ Cornish Translator] |
+ | * [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist] |
+ | ===[[Czech]]=== |
+ | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/incubator/apertium-pl-cs.cs.dix.xml apertium-pl-cs.cs.dix.xml]'' |
+ | '''This resource has not been migrated to GitHub from SVN |
+ | ''' |
+ | :''Dictionary: [https://github.com/apertium/apertium-eo-cs/blob/c16fa21194a285941307a68e420c194a1825ebc3/apertium-eo-cs.eo-cs.dix Czech-Esperanto Bidix]'' |
− | ==Czech== |
− | :''Dictionary: [ |
+ | :''Dictionary: [https://github.com/apertium/apertium-cs-sl/tree/062fa172705e16f77302a8096df3733581079fb8 Czech-Slovenian Bidix]'' |
;Resources |
Line 48: | Line 71: | ||
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source |
+ | ===[[Faroese]]=== |
− | ==German - English== |
+ | :''Dictionary: [https://github.com/apertium/apertium-fao/blob/master/apertium-fao.fao.dix Faroese Monodix]'' |
+ | |||
+ | ;Resources |
+ | * [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ] |
+ | * [https://github.com/apertium/apertium-fao-isl/blob/master/apertium-fao-isl.fao-isl.rlx Faroese Constraint Grammar] |
+ | * [http://www.archive.org/details/frskanthologi00denmgoog Faroese-Danish dictionary from 1886] |
+ | |||
+ | ===[[Finnish]]=== |
+ | {{see-also|Omorfi}} |
+ | ;Resources |
+ | |||
+ | * http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL |
+ | * [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language] |
+ | * [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)] |
+ | <pre> |
+ | s = lemma |
+ | hn = homonymy ref |
+ | t = inflection info |
+ | tn = inflection number (referring to table) |
+ | av = ref to consonant gradation |
+ | </pre> |
+ | |||
+ | ===[[German and English]]=== |
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for [http://www-user.tu-chemnitz.de/~fri/ding/ "Ding: A Dictionary LookUp program"] (version 1.5 2007-04-09) from Frank Richter, [http://tu-chemnitz.de Technische Universität Chemnitz] |
− | :'' |
+ | :''[https://github.com/apertium/apertium-eng-deu/blob/master/apertium-eng-deu.eng-deu.dix German-English Dictionary]'' |
− | ==Greek== |
+ | ===[[Greek]]=== |
− | :''Dictionary: [ |
+ | :''Dictionary: [https://github.com/apertium/apertium-ell/blob/master/apertium-ell.ell.dix Greek Monodix] |
+ | :''Greek-English Dictionary: [https://github.com/apertium/apertium-ell-eng/blob/master/apertium-ell-eng.eng.dix Greek-English Dictionary] |
;Resources |
Line 61: | Line 108: | ||
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/ |
− | == |
+ | ===[[Hebrew]]=== |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-hi-ur.hi.dix apertium-hi-ur.hi.dix] |
;Resources |
+ | * http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in an unusual XLS format -- GPL |
− | * Morphological analyser: http://www.iiit.net/ltrc/morph/index.htm (GPL) |
+ | * http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for undotted Hebrew text) -- GPL, but the download link is password-protected |
− | * POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2 |
+ | * http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger for Hebrew -- GPL |
+ | * http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for undotted Hebrew text -- license unknown |
+ | * http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL |
+ | * http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL |
− | == |
+ | ===[[Hindi]]=== |
+ | {{see-also|Hindi}} |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-tg-fa.fa.dix apertium-tg-fa.fa.dix]'' |
;Resources |
+ | * POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2 |
− | * [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian] |
+ | * https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix |
− | ==Portuguese== |
+ | * https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix |
+ | * https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list |
+ | * https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix |
+ | |||
− | Even if Apertium has a stable es-pt pair, the coverage of the Brazilian Portuguese Dictionary built at NILC (Universidade de Sao Paulo) for Unitex is much better, and could be used perhaps to improve it. |
+ | |||
+ | ===[[Iranian Persian]]=== |
+ | :''Dictionary: [https://github.com/apertium/apertium-pes/blob/master/apertium-pes.pes.dix Persian Monodix]'' |
;Resources |
+ | * [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian] |
− | * [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil] |
+ | ===[[Ingush]]=== |
− | We believe it has a LGPL license. |
+ | ; Resources |
− | ==Russian== |
+ | * [http://www.linguistics.berkeley.edu/~ingush/database.html Lexical database] (non-free) |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-pl-ru.ru.dix.xml monodix]'' |
+ | * [http://books.google.com/books?id=J7wqVHeRWdwC&pg=PA5&lpg=PA5&dq=ingush+father&source=bl&ots=N8TDZudzGZ&sig=JO9X_Y9gio7dUhZWeyZX7j17iPw&hl=ca&ei=vfq4TM6CH86OjAfO94XaDg&sa=X&oi=book_result&ct=result&resnum=3&ved=0CB8Q6AEwAg#v=onepage&q=ingush%20father&f=false Ingush-English dict] (non-free) |
− | :''Bidix: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-pl-ru.pl-ru.dix.xml Polish-Russian]'' |
− | :''Bidix: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-en-ru.en-ru.dix.xml English-Russian] |
+ | ===[[Latvian]]=== |
;Resources |
+ | * https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml) |
+ | ;See also |
− | * http://www.alphadictionary.com/rusgrammar/ |
+ | * [[Latvian and Russian]] |
− | * http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf |
− | * [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only |
− | * [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian] |
− | *[http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukranian<->English dictionaries. Built from alignments, low quality. |
+ | ===[[Lithuanian]]=== |
− | ==Slovakian== |
− | :''Dictionary: [ |
+ | :''Dictionary: [https://github.com/apertium/apertium-lit/blob/master/apertium-lit.lit.dix Lithuanian Monodix]'' |
;Resources |
+ | ===[[Nogai]]=== |
− | * http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English) |
− | * http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish) |
− | * http://www.angelfire.com/sk3/quality/Slovak_declension.html |
− | * http://www.juls.savba.sk/msj/ |
+ | ; Resources |
− | ==Swedish - Danish== |
+ | |||
− | :''Pair: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-sv-da/ apertium-sv-da]'' |
+ | * [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary] |
+ | |||
+ | ===[[Ossetian]]=== |
+ | :''Dictionary: [https://github.com/apertium/apertium-oss/blob/master/apertium-oss.oss.dix Ossetian Monodix]'' |
;Resources |
+ | * [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] — quite nice and comprehensive. |
− | * http://w3.msi.vxu.se/~nivre/research/Talbanken05.html (A 300,000-word tree-bank: it is in XML, all words are nicely tagged with PAROLE-style tags, and it should be easy to build a morphological analyser and a PoS tagger from it; authors are likely be happy to let us use it if we cite them). |
+ | * [http://www.ossetic-studies.org/ Ossetic National Corpus] |
− | * http://www.isv.cbs.dk/~mbk/treebank/ (Danish tree bank, 100,000-word, as above, under the GPL) |
− | * http://www.ling.su.se/staff/sofia/suc/suc.html (Stockholm Umeå Corpus: 1,000,000 Swedish words, tagged; a license has to be granted by authors - it was used for apertium-sv-da) |
+ | ===[[Piemontese]]=== |
− | ==Quechua== |
+ | :''Dictionary: [https://sourceforge.net/p/apertium/svn/HEAD/tree/incubator/apertium-it-pms.pms.dix Piemontese Monodix from SourceForge]'' |
+ | '''This resource has not been migrated to GitHub from SVN |
+ | ''' |
;Resources |
+ | * http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain |
− | * http://www.runasimipi.org/ |
+ | * http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar including a 17k-word Piemontese--Italian dictionary (POS-tagged and partly annotated for inflection). The site states: "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)." |
− | * AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]]) |
+ | ===[[Portuguese]]=== |
− | ==Norwegian== |
− | {{see-also|North Germanic languages}} |
− | ''See: [[Norsk ordbank]]'' |
+ | Although Apertium has a stable es-pt pair, the Brazilian Portuguese dictionary built at NILC (Universidade de São Paulo) for Unitex has much better coverage and could perhaps be used to improve it. |
− | ==Urdu== |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-hi-ur.ur.dix apertium-hi-ur.ur.dix]'' |
;Resources |
− | * http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ — GPL analyser of Urdu |
− | * http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system |
+ | * [http://www.nilc.icmc.usp.br/nilc/projects/unitex-pb/web/dicionarios.html Recursos Lexicais Português do Brasil] |
− | ==Lithuanian== |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-en-lt.lt.dix apertium-en-lt.lt.dix]'' |
+ | We believe it is licensed under the LGPL. |
− | ;Resources |
− | == |
+ | ===[[Punjabi]]=== |
− | ;Resources |
+ | ; Resources |
+ | * [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon] |
− | * http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL |
− | * [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language] |
− | * [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)] |
− | <pre> |
− | s = lemma |
− | hn = homonymy ref |
− | t = inflection info |
− | tn = inflection number (referring to table) |
− | av = ref to consonant gradation |
− | </pre> |
− | == |
+ | ===[[Quechua]]=== |
;Resources |
+ | * http://www.runasimipi.org/ |
− | * http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL |
+ | * AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]]) |
− | == |
+ | ===[[Russian]]=== |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-it-pms.pms.dix apertium-it-pms.pms.dix]'' |
− | ;Resources |
+ | :''Dictionary: [https://github.com/apertium/apertium-rus/blob/master/apertium-rus.rus.dix monodix]'' |
− | * http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain |
+ | :''Bidix: [https://github.com/apertium/apertium-pol-rus/blob/master/apertium-pol-rus.pol-rus.dix Polish-Russian]'' |
− | * http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)." |
+ | :''Bidix: [https://github.com/apertium/apertium-rus-eng/blob/master/apertium-ru-en.ru.dix English-Russian] |
+ | ;Resources |
− | ==Bengali== |
+ | * http://www.alphadictionary.com/rusgrammar/ |
− | * http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali |
+ | * http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf |
− | * http://anubadok.sf.net/ -- See above |
+ | * [http://www.cic.ipn.mx/~sidorov/rmorph/index.html Russian analyser] - non-free, Windows only |
+ | * [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian] |
+ | *[http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality. |
+ | * [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries]. |
+ | * [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary] |
+ | ===[[Sanskrit]] '''संस्कृतम्'''=== |
− | ==Ossetian== |
− | :''Dictionary: [ |
+ | :''Dictionary: [https://github.com/apertium/apertium-san/blob/master/apertium-san.san.dix Sanskrit Monodix] |
;Resources |
+ | * [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln] |
+ | * [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary |
+ | * [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download]. |
+ | ===[[Slovakian]]=== |
− | * [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] — quite nice and comprehensive. |
+ | :''Dictionary: [https://github.com/apertium/apertium-slk/blob/master/apertium-slk.slk.dix Slovak Monodix]'' |
− | ==Asturian== |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-es-ast.ast.dix apertium-es-ast.ast.dix]'' |
;Resources |
+ | * http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English) |
− | * [http://www.academiadelallingua.com/diccionariu/index.php? Asturian Dictionary from Asturian Language Academy] — Good resource but only in Asturian. |
+ | * http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish) |
− | * [http://mas.lne.es/diccionario/ Dialectal Asturian Dictionary] — Asturian variants into Spanish. |
+ | * http://www.angelfire.com/sk3/quality/Slovak_declension.html |
+ | * http://www.juls.savba.sk/msj/ |
− | == |
+ | ===[[Thai]]=== |
+ | * https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause |
− | :''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-fo-is.fo.dix apertium-fo-is.fo.dix]'' |
+ | |||
+ | ===[[Urdu]]=== |
+ | :''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]'' |
+ | :''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]'' |
;Resources |
+ | * http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ — GPL analyser of Urdu |
− | * [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ] |
+ | * http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system |
− | * [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-fo-is.fo.rle Faroese Constraint Grammar] |
+ | |||
+ | |||
+ | ==GitHub Migration== |
+ | |||
+ | For languages whose resources are not yet on GitHub, you can use [[apertium-init]] to create the corresponding repository and then add the files from SVN to it. |
+ | |||
− | ==See also== |
− | *[[Attic]] |
[[Category:Development]] |
+ | [[Category:Repository]] |
+ | [[Category:Documentation in English]] |
Revision as of 13:39, 30 November 2018
The incubator can be found in the 'incubator' column in https://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs that have not yet fully matured and are still under development.
Specific resources per language
Here are links to resources that may be useful for expanding work in the incubator. Add below any resources that would help in building these pairs, and try to mark each one with its licence, or at least as free or non-free.
See also the individual language pages.
Albanian
- Dictionary: Albanian Monodix
- Resources
- http://www.albanianoverview.com/grammar.htm
- http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free)
Armenian
- Dictionary: Armenian Monodix
- Resources
Assamese and Hindi
- Dictionary: Assamese-Hindi Bidix
--- Anusuya
Belarusian
Bengali
- http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- free-software English→Bengali machine translation
- http://anubadok.sf.net/ -- See above
Bulgarian
- Dictionary: Bulgarian Monodix
- Resources
Cornish
- Dictionary: Cornish Monodix from SourceForge
This resource has not been migrated to GitHub from SVN
- Resources
Czech
- Dictionary: apertium-pl-cs.cs.dix.xml
This resource has not been migrated to GitHub from SVN
- Dictionary: Czech-Esperanto Bidix
- Dictionary: Czech-Slovenian Bidix
- Resources
- Most frequent words. Also includes lists of the most frequent bigrams and trigrams, but these are of little use as multiwords.
- James Naughton's links
- Some complications with diacritics
- Czech morphological guesser - 'free', but not open source
Faroese
- Dictionary: Faroese Monodix
- Resources
Finnish
- See also: Omorfi
- Resources
- http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL
- Omorfi–Open Morphology for Finnish language
- Helsinki Finite-State Transducer Technology (HFST)
s = lemma
hn = homonymy ref
t = inflection info
tn = inflection number (referring to table)
av = ref to consonant gradation
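If these abbreviations are the XML element names used in the Kotus word list (an assumption; check the downloaded file before relying on it), the list can be read with Python's standard library alone. The sketch below is illustrative only: the root element name `kotus-sanalista`, the sample entries and the helper `parse_entries` are hypothetical, not taken from the list itself or from any Apertium tool.

```python
import xml.etree.ElementTree as ET

# Illustrative sample with an ASSUMED structure: <st> = entry, <s> = lemma,
# <hn> = homonym number, <t> = inflection info holding <tn> (inflection
# class) and <av> (consonant gradation). Not copied from the real list.
SAMPLE = """<kotus-sanalista>
  <st><s>aakkonen</s><t><tn>38</tn></t></st>
  <st><hn>1</hn><s>kuusi</s><t><tn>27</tn></t></st>
  <st><s>aallokko</s><t><tn>4</tn><av>A</av></t></st>
  <st><s>aakkosellisesti</s></st>
</kotus-sanalista>"""


def parse_entries(xml_text):
    """Hypothetical helper: turn word-list XML into a list of dicts.

    Fields missing from an entry come back as None.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for st in root.iter("st"):
        t = st.find("t")  # inflection block; may be absent
        entries.append({
            "lemma": st.findtext("s"),
            "homonym": st.findtext("hn"),
            "inflection_class": t.findtext("tn") if t is not None else None,
            "gradation": t.findtext("av") if t is not None else None,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry)
```

Because missing fields are returned as `None`, entries without an inflection block or without consonant gradation can be filtered out easily, for example to collect every lemma of one inflection class before writing paradigms.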
German and English
German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for "Ding: A Dictionary LookUp program" (version 1.5 2007-04-09) from Frank Richter, Technische Universität Chemnitz
Greek
- Dictionary: Greek Monodix
- Greek-English Dictionary: Greek-English Dictionary
- Resources
- Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/
Hebrew
- Resources
- http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in an unusual XLS format -- GPL
- http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for undotted Hebrew text) -- GPL, but the download link is password-protected
- http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger for Hebrew -- GPL
- http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for undotted Hebrew text -- license unknown
- http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL
- http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL
Hindi
- See also: Hindi
- Resources
- POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2
- https://github.com/unhammer/apertium-en-hi/blob/master/apertium-en-hi.en.dix
- https://github.com/apertium/apertium-hin/blob/master/apertium-hin.hin.dix
- https://github.com/apertium/apertium-urd-hin/blob/master/dev/en-hi-ur.list
- https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix
Iranian Persian
- Dictionary: Persian Monodix
- Resources
Ingush
- Resources
- Lexical database (non-free)
- Ingush-English dict (non-free)
Latvian
- Resources
- https://github.com/PeterisP/morphology GPL full-form dictionary (https://github.com/PeterisP/morphology/blob/master/src/main/resources/Lexicon.xml)
- See also
Lithuanian
- Dictionary: Lithuanian Monodix
- Resources
Nogai
- Resources
Ossetian
- Dictionary: Ossetian Monodix
- Resources
- Ossetian: Grammatical Sketch — quite nice and comprehensive.
- Ossetic National Corpus
Piemontese
- Dictionary: Piemontese Monodix from SourceForge
This resource has not been migrated to GitHub from SVN
- Resources
- http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain
- http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar including a 17k-word Piemontese--Italian dictionary (POS-tagged and partly annotated for inflection). The site states: "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."
Portuguese
Although Apertium has a stable es-pt pair, the Brazilian Portuguese dictionary built at NILC (Universidade de São Paulo) for Unitex has much better coverage and could perhaps be used to improve it.
- Resources
We believe it is licensed under the LGPL.
Punjabi
- Resources
Quechua
- Resources
- http://www.runasimipi.org/
- AVENUE Quechua-Spanish system. (ask Francis Tyers)
Russian
- Dictionary: monodix
- Bidix: Polish-Russian
- Bidix: English-Russian
- Resources
- http://www.alphadictionary.com/rusgrammar/
- http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf
- Russian analyser - non-free, Windows only
- Using Czech resources for the morphological analysis of Russian
- Pere - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.
- Russian--Tajik phrase dictionary, 41k entries.
- Another Tajik--Russian dictionary
Sanskrit संस्कृतम्
- Dictionary: Sanskrit Monodix
- Resources
Slovakian
- Dictionary: Slovak Monodix
- Resources
- http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)
- http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)
- http://www.angelfire.com/sk3/quality/Slovak_declension.html
- http://www.juls.savba.sk/msj/
Thai
- https://github.com/veer66/Yaitron Yaitron English-Thai and Thai-English XML dictionary, license seems standard 4-clause
Urdu
- Dictionary: Urdu Monodix
- Bidix: Hindi-Urdu Bidix
- Resources
- http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ — GPL analyser of Urdu
- http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system
GitHub Migration
For languages whose resources are not yet on GitHub, you can use apertium-init to create the corresponding repository and then add the files from SVN to it.