Specific resources per language
Revision as of 17:26, 21 September 2009
The incubator can be found at http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/. It provides a place to put dictionaries and other material that is useful in constructing language pairs. On this page you can list resources that will be useful in that work. Try to mark each one with its licence, or at least as free or non-free.
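Most language entries below point to an Apertium monolingual dictionary (.dix) in the incubator. The Python below is a rough sketch for getting an overview of such a file: it counts how many lemmas use each paradigm. It assumes the usual .dix layout, with `<pardefs>` plus `<section>` blocks whose `<e lm="...">` entries reference paradigms through `<par n="..."/>`. The sample dictionary and the paradigm name `house__n` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature monolingual dictionary in the usual Apertium .dix layout.
SAMPLE_DIX = """<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs><sdef n="n"/><sdef n="sg"/><sdef n="pl"/></sdefs>
  <pardefs>
    <pardef n="house__n">
      <e><p><l></l><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="horse"><i>horse</i><par n="house__n"/></e>
  </section>
</dictionary>"""


def paradigm_counts(dix_text: str) -> Counter:
    """Count how many section entries reference each paradigm."""
    root = ET.fromstring(dix_text)
    counts = Counter()
    # Only entries inside <section> are lemmas; entries inside <pardefs> are skipped.
    for section in root.iter("section"):
        for entry in section.iter("e"):
            for par in entry.iter("par"):
                counts[par.get("n")] += 1
    return counts


def lemmas(dix_text: str) -> list:
    """Return the lm attribute of every entry in the sections, in file order."""
    root = ET.fromstring(dix_text)
    return [e.get("lm") for s in root.iter("section")
            for e in s.iter("e") if e.get("lm")]


if __name__ == "__main__":
    print(paradigm_counts(SAMPLE_DIX))
    print(lemmas(SAMPLE_DIX))
```

For a real file such as apertium-mk-sq.sq.dix, read it as UTF-8 and pass its contents in place of `SAMPLE_DIX`.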
Albanian
- Dictionary: apertium-mk-sq.sq.dix
- Resources
- http://www.albanianoverview.com/grammar.htm
- http://www.idividi.com.mk/recnik/index.htm -- Albanian--Macedonian dictionary (non-free)
Armenian
- Dictionary: apertium-hy-en.hy.dix
- Resources
- http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons
Asturian
- Dictionary: apertium-es-ast.ast.dix
- Resources
- Asturian Dictionary from Asturian Language Academy — Good resource but only in Asturian.
- Dialectal Asturian Dictionary — Asturian variants into Spanish.
Belarusian
- Resources
- GFDL grammar of the language: http://www.vitba.org/fofmb/fofmb.html
Bengali
- http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- free software machine translation system for English→Bengali
- http://anubadok.sf.net/ -- project page for the same system
Bulgarian
- Dictionary: apertium-mk-bg.bg.dix
- Resources
- Bulgarian verbal morphology: http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf
Cornish
- Dictionary: apertium-cy-kw.kw.dix
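- Resources
- Cornish Translator: http://www.cornishtranslator.com/
- Cornish-Welsh bilingual wordlist: http://kevindonnelly.org.uk/kernewek/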
Czech
- Dictionary: apertium-pl-cs.cs.dix.xml
- Resources
- Most frequent words. Also includes a list of the most frequent bigrams and trigrams, but these are of little use as multiwords.
- James Naughton's links
- Some complications with diacritics
- Czech morphological guesser - 'free', but not open source
Faroese
- Dictionary: apertium-fo-is.fo.dix
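- Resources
- U. Tromsø -- Faroese analyser: http://giellatekno.uit.no/cgi/d-fao.eng.html
- Faroese Constraint Grammar: http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-fo-is.fo.rle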
Finnish
- Resources
- http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL
- Omorfi (Open Morphology for Finnish)
- Helsinki Finite-State Transducer Technology (HFST)
s = lemma
hn = homonymy ref
t = inflection info
tn = inflection number (referring to table)
av = ref to consonant gradation
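The legend above reads like a list of XML element names. Below is a minimal Python reader that pulls those fields out per headword. It assumes each headword sits in an `<st>` element with `<s>`, an optional `<hn>`, and a `<t>` block holding `<tn>` and an optional `<av>`. The root element name, the exact nesting, and all the sample values are assumptions made for illustration; they are not copied from the real Kotus file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Illustrative sample only: element names follow the legend, values are made up.
SAMPLE = """<kotus-sanalista>
  <st><s>aakkonen</s><t><tn>38</tn></t></st>
  <st><s>kuusi</s><hn>2</hn><t><tn>27</tn><av>F</av></t></st>
</kotus-sanalista>"""


@dataclass
class Entry:
    lemma: Optional[str]       # s
    homonym: Optional[str]     # hn, only present for homonyms
    infl_class: Optional[str]  # tn, number of the inflection table
    gradation: Optional[str]   # av, consonant-gradation reference


def parse_entries(xml_text: str) -> List[Entry]:
    """Collect one Entry per <st> element; missing fields become None."""
    root = ET.fromstring(xml_text)
    entries = []
    for st in root.iter("st"):
        entries.append(Entry(
            lemma=st.findtext("s"),
            homonym=st.findtext("hn"),
            infl_class=st.findtext("t/tn"),
            gradation=st.findtext("t/av"),
        ))
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry)
```

Grouping the resulting entries by `infl_class` and `gradation` is one way to decide which paradigms a Finnish .dix would need first.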
German - English
German-English bilingual dictionary (>216,000 entries), generated from the linguistic data (GPL version 2 or later) distributed with "Ding: A Dictionary LookUp program" (version 1.5, 2007-04-09) by Frank Richter, Technische Universität Chemnitz
- Dictionary: apertium-de-en.dix
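The sketch below shows one way Ding data might be turned into bilingual dictionary entries. It assumes the usual Ding line layout: `German :: English`, with ` | ` between parallel sub-entries, `; ` between variants, and annotations such as `{n}` in braces or brackets. Items are paired by position, and the output is bare bidix `<e>` entries with spaces written as `<b/>`. Real entries would also need part-of-speech tags (`<s n="..."/>`), which this sketch leaves out. The function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Annotations such as {n}, {pl} or [Br.] are dropped, with any space before them.
ANNOTATION = re.compile(r"\s*(\{[^}]*\}|\[[^\]]*\])")


def clean(term: str) -> str:
    """Remove Ding annotations and normalise whitespace."""
    return " ".join(ANNOTATION.sub("", term).split())


def ding_pairs(line: str) -> list:
    """Return (German, English) pairs, aligning sub-entries and variants by position."""
    line = line.strip()
    if not line or line.startswith("#") or " :: " not in line:
        return []  # comments, blank or malformed lines
    de_side, en_side = line.split(" :: ", 1)
    pairs = []
    for de_sub, en_sub in zip(de_side.split(" | "), en_side.split(" | ")):
        for de, en in zip(de_sub.split("; "), en_sub.split("; ")):
            de, en = clean(de), clean(en)
            if de and en:
                pairs.append((de, en))
    return pairs


def to_bidix_entry(de: str, en: str) -> str:
    """Render a pair as an untagged bidix entry; blanks inside terms become <b/>."""
    def side(text: str) -> str:
        return "<b/>".join(escape(word) for word in text.split(" "))
    return f"<e><p><l>{side(de)}</l><r>{side(en)}</r></p></e>"


if __name__ == "__main__":
    for de, en in ding_pairs("Abend {m} | am Abend :: evening | in the evening"):
        print(to_bidix_entry(de, en))
```

Position-based pairing silently drops extra variants when the two sides of a line have different numbers of them, so lines like that are worth reviewing by hand.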
Greek
- Dictionary: apertium-en-el.el.dix
- Resources
- Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/
Hebrew
- Resources
- http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL
Hindi
- Dictionary: apertium-hi-ur.hi.dix
- Resources
- Morphological analyser: http://www.iiit.net/ltrc/morph/index.htm (GPL)
- POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2
Iranian Persian
- Dictionary: apertium-tg-fa.fa.dix
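- Resources
- Grammar of Persian: http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1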
Lithuanian
- Dictionary: apertium-en-lt.lt.dix
- Resources
Norwegian
- See also: North Germanic languages
See: Norsk ordbank
Ossetian
- Dictionary: apertium-os-fa.os.dix
- Resources
- Ossetian: Grammatical Sketch -- quite nice and comprehensive.
Piemontese
- Dictionary: apertium-it-pms.pms.dix
- Resources
- http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain
- http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar including a 17,000-word Piemontese--Italian dictionary (POS-tagged and partly annotated for inflection). The site states: "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."
Portuguese
Although Apertium has a stable es-pt pair, the Brazilian Portuguese dictionary built at NILC (Universidade de São Paulo) for Unitex has much better coverage and could perhaps be used to improve it.
- Resources
We believe the NILC dictionary is licensed under the LGPL.
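Unitex inflected-form dictionaries (presumably including the NILC one) use the DELAF layout, with one entry per line written as `form,lemma.CODES:inflection`. The minimal Python sketch below parses such lines and lists the lemmas that are missing from a set of lemmas already covered by an Apertium dictionary. It assumes there are no backslash-escaped commas or dots, and it treats an empty lemma field as equal to the form. The `known_lemmas` set and the `Anl` code in the sample are made up for illustration.

```python
from typing import List, NamedTuple, Optional


class DelafEntry(NamedTuple):
    form: str
    lemma: str
    pos: str               # first grammatical code, e.g. "N"
    extra: List[str]       # further "+"-separated codes, if any
    inflection: List[str]  # ":"-separated inflection codes, e.g. ["fp"]


def parse_delaf(line: str) -> Optional[DelafEntry]:
    """Parse 'form,lemma.CODES:infl1:infl2'; escaped characters are not handled."""
    line = line.strip()
    if "," not in line:
        return None
    form, rest = line.split(",", 1)
    if "." not in rest:
        return None
    lemma, codes = rest.split(".", 1)
    gram, *inflection = codes.split(":")
    pos, *extra = gram.split("+")
    # An empty lemma field means the lemma is identical to the form.
    return DelafEntry(form, lemma or form, pos, extra, inflection)


def missing_lemmas(lines, known_lemmas):
    """Return sorted (lemma, pos) pairs whose lemma is not in known_lemmas."""
    missing = set()
    for line in lines:
        entry = parse_delaf(line)
        if entry is not None and entry.lemma not in known_lemmas:
            missing.add((entry.lemma, entry.pos))
    return sorted(missing)


if __name__ == "__main__":
    sample = ["casas,casa.N:fp", "casa,.N:fs", "gatos,gato.N+Anl:mp"]
    print(parse_delaf(sample[2]))
    print(missing_lemmas(sample, known_lemmas={"casa"}))
```

Unitex dictionary files may be UTF-16 encoded, so check the encoding before reading them line by line.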
Quechua
- Resources
- http://www.runasimipi.org/
- AVENUE Quechua-Spanish system. (ask Francis Tyers)
Russian
- Dictionary: monodix
- Bidix: Polish-Russian
- Bidix: English-Russian
- Resources
- http://www.alphadictionary.com/rusgrammar/
- http://www.seelrc.org:8080/grammar/pdf/stand_alone_russian.pdf
- Russian analyser - non-free, Windows only
- Using Czech resources for the morphological analysis of Russian
- Pere - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.
Slovakian
- Dictionary: apertium-pl-sk.sk.dix
- Resources
- http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)
- http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)
- http://www.angelfire.com/sk3/quality/Slovak_declension.html
- http://www.juls.savba.sk/msj/
Swedish - Danish
- Pair: apertium-sv-da
- Resources
- http://w3.msi.vxu.se/~nivre/research/Talbanken05.html (a 300,000-word treebank in XML; all words are tagged with PAROLE-style tags, so it should be easy to build a morphological analyser and a PoS tagger from it; the authors are likely to be happy to let us use it if we cite them)
- http://www.isv.cbs.dk/~mbk/treebank/ (Danish treebank, 100,000 words, as above, under the GPL)
- http://www.ling.su.se/staff/sofia/suc/suc.html (Stockholm Umeå Corpus: 1,000,000 tagged Swedish words; the authors must grant a license; it was used for apertium-sv-da)
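The remark that a PoS tagger should be easy to build from Talbanken05 can be illustrated with the simplest baseline. It records which tags each word form occurs with and always picks the most frequent one. The sketch assumes the treebank has first been exported to one `form<TAB>tag` pair per line rather than read from its native XML. The tags in the sample are placeholders, not real PAROLE tags, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_lexicon(lines):
    """Map each lower-cased word form to a Counter of the tags it was seen with.

    Assumes one token per line as "form<TAB>tag[<TAB>...]"; blank lines
    (sentence boundaries) and malformed lines are skipped.
    """
    lexicon = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue
        lexicon[fields[0].lower()][fields[1]] += 1
    return lexicon


def most_frequent_tag_tagger(lexicon, default_tag):
    """Return a baseline tagger that picks each known word's most frequent tag."""
    best = {form: tags.most_common(1)[0][0] for form, tags in lexicon.items()}

    def tag(tokens):
        # Unknown words fall back to default_tag.
        return [(tok, best.get(tok.lower(), default_tag)) for tok in tokens]

    return tag


SAMPLE = ["Hon\tPRON", "läser\tVERB", "en\tDET", "bok\tNOUN", "",
          "en\tNUM", "bok\tNOUN", "en\tDET"]

if __name__ == "__main__":
    tagger = most_frequent_tag_tagger(build_lexicon(SAMPLE), default_tag="NOUN")
    print(tagger(["Hon", "läser", "en", "tidning"]))
```

The same lexicon, keyed by form, is also a natural starting point for listing word forms that an analyser such as the one in apertium-sv-da does not yet cover.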
Urdu
- Dictionary: apertium-hi-ur.ur.dix
- Resources
- http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ — GPL analyser of Urdu
- http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system