Difference between revisions of "Specific resources per language"

From Apertium
(added new link)
(rearranged languages alphabetically)
Line 16: Line 16:
 
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons
 
* http://www.armeniapedia.org/index.php?title=Category:Armenian_Language_Lessons
   
  +
==Asturian==
  +
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-es-ast.ast.dix apertium-es-ast.ast.dix]''
  +
;Resources
  +
  +
* [http://www.academiadelallingua.com/diccionariu/index.php? Asturian Dictionary from Asturian Language Academy] — Good resource but only in Asturian.
  +
* [http://mas.lne.es/diccionario/ Dialectal Asturian Dictionary] — Asturian variants into Spanish.
   
   
Line 22: Line 28:
 
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]
 
* [http://www.vitba.org/fofmb/fofmb.html GFDL grammar of the language]
   
==Cornish==
+
==Bengali==
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix]''
 
   
  +
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali
;Resources
 
  +
* http://anubadok.sf.net/ -- See above
   
* [http://www.cornishtranslator.com/ Cornish Translator]
 
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]
 
   
 
==Bulgarian==
 
==Bulgarian==
Line 37: Line 41:
 
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]
 
* [http://www.sfs.nphil.uni-tuebingen.de/iscl/Theses/zhechev.pdf Bulgarian verbal morphology]
   
  +
  +
==Cornish==
  +
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix]''
  +
  +
;Resources
  +
  +
* [http://www.cornishtranslator.com/ Cornish Translator]
  +
* [http://kevindonnelly.org.uk/kernewek/ Cornish-Welsh bilingual wordlist]
   
   
Line 47: Line 59:
 
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]
 
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]
 
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source
 
* [http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Morphology/index.html Czech morphological guesser] - 'free', but not open source
  +
  +
==Faroese==
  +
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-fo-is.fo.dix apertium-fo-is.fo.dix]''
  +
  +
;Resources
  +
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]
  +
* [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-fo-is.fo.rle Faroese Constraint Grammar]
  +
  +
==Finnish==
  +
  +
;Resources
  +
  +
* http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL
  +
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]
  +
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]
  +
<pre>
  +
s = lemma
  +
hn = homonymy ref
  +
t = inflection info
  +
tn = inflection number (referring to table)
  +
av = ref to consonant gradation
  +
</pre>
   
 
==German - English==
 
==German - English==
Line 60: Line 94:
   
 
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/
 
* Greek <-> Ukrainian, Russian, Polish Grammar & Dictionary: http://ellinika.gnu.org.ua/
  +
  +
  +
  +
==Hebrew==
  +
  +
;Resources
  +
  +
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL
   
 
==Hindi==
 
==Hindi==
Line 68: Line 110:
 
* Morphological analyser: http://www.iiit.net/ltrc/morph/index.htm (GPL)
 
* Morphological analyser: http://www.iiit.net/ltrc/morph/index.htm (GPL)
 
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2
 
* POS tagged English-Hindi wordlist: http://indlinux.sourceforge.net/downloads/files/hindidict.txt.bz2
  +
   
 
==Iranian Persian==
 
==Iranian Persian==
Line 75: Line 118:
   
 
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]
 
* [http://books.google.com/books?vid=OCLC20216670&id=Ru1ncSqiRXkC&printsec=titlepage&hl=de#PPA24,M1 Grammar of Persian]
  +
  +
==Lithuanian==
  +
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-en-lt.lt.dix apertium-en-lt.lt.dix]''
  +
  +
;Resources
  +
  +
==Ossetian==
  +
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-os-fa.os.dix apertium-os-fa.os.dix]''
  +
  +
;Resources
  +
  +
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.
  +
  +
==Norwegian==
  +
{{see-also|North Germanic languages}}
  +
''See: [[Norsk ordbank]]''
  +
  +
==Piemontese==
  +
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-it-pms.pms.dix apertium-it-pms.pms.dix]''
  +
;Resources
  +
  +
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain
  +
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). The site states: "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."
  +
  +
  +
   
 
==Portuguese==
 
==Portuguese==
Line 85: Line 154:
   
 
We believe it has an LGPL license.
 
We believe it has an LGPL license.
  +
  +
==Quechua==
  +
  +
;Resources
  +
  +
* http://www.runasimipi.org/
  +
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])
  +
   
 
==Russian==
 
==Russian==
Line 99: Line 176:
 
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]
 
* [http://citeseer.ist.psu.edu/cache/papers/cs2/433/http:zSzzSzwww.ling.ohio-state.eduzSz~hanazSzbibliozSzHanaFeldmanBrew2004-RusMorphLite.pdf/hana04resourcelight.pdf Using Czech resources for the morphological analysis of Russian]
 
*[http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.
 
*[http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.
  +
   
 
==Slovakian==
 
==Slovakian==
Line 119: Line 197:
 
* http://www.ling.su.se/staff/sofia/suc/suc.html (Stockholm Umeå Corpus: 1,000,000 Swedish words, tagged; a license has to be granted by authors - it was used for apertium-sv-da)
 
* http://www.ling.su.se/staff/sofia/suc/suc.html (Stockholm Umeå Corpus: 1,000,000 Swedish words, tagged; a license has to be granted by authors - it was used for apertium-sv-da)
   
==Quechua==
 
 
;Resources
 
 
* http://www.runasimipi.org/
 
* AVENUE Quechua-Spanish system. (ask [[User:Francis Tyers|Francis Tyers]])
 
 
==Norwegian==
 
{{see-also|North Germanic languages}}
 
''See: [[Norsk ordbank]]''
 
   
 
==Urdu==
 
==Urdu==
Line 137: Line 205:
 
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system
 
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system
   
==Lithuanian==
 
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-en-lt.lt.dix apertium-en-lt.lt.dix]''
 
   
;Resources
 
 
==Finnish==
 
 
;Resources
 
 
* http://kaino.kotus.fi/sanat/nykysuomi/ &mdash; full form list for Finnish -- LGPL
 
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]
 
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]
 
<pre>
 
s = lemma
 
hn = homonymy ref
 
t = inflection info
 
tn = inflection number (referring to table)
 
av = ref to consonant gradation
 
</pre>
 
 
==Hebrew==
 
 
;Resources
 
 
* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL
 
 
==Piemontese==
 
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-it-pms.pms.dix apertium-it-pms.pms.dix]''
 
;Resources
 
 
* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain
 
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). The site states: "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."
 
 
==Bengali==
 
 
* http://bengalinux.sourceforge.net/cgi-bin/anubadok/index.pl -- Free software translation for English→Bengali
 
* http://anubadok.sf.net/ -- See above
 
 
==Ossetian==
 
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-os-fa.os.dix apertium-os-fa.os.dix]''
 
 
;Resources
 
 
* [http://www.azargoshnasp.net/languages/ossetian/grammersketchossetian.pdf Ossetian: Grammatical Sketch] &mdash; quite nice and comprehensive.
 
 
==Asturian==
 
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-es-ast.ast.dix apertium-es-ast.ast.dix]''
 
;Resources
 
 
* [http://www.academiadelallingua.com/diccionariu/index.php? Asturian Dictionary from Asturian Language Academy] &mdash; Good resource but only in Asturian.
 
* [http://mas.lne.es/diccionario/ Dialectal Asturian Dictionary] &mdash; Asturian variants into Spanish.
 
 
==Faroese==
 
:''Dictionary: [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-fo-is.fo.dix apertium-fo-is.fo.dix]''
 
 
;Resources
 
* [http://giellatekno.uit.no/cgi/d-fao.eng.html U. Tromsø -- Faroese analyser ]
 
* [http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-fo-is.fo.rle Faroese Constraint Grammar]
 
   
 
==See also==
 
==See also==

Revision as of 17:26, 21 September 2009

The incubator can be found here. It provides a place for people to put dictionaries and other material that is useful for constructing language pairs. Use this page to list resources that will help with that work. Try to mark each one with its licence, or at least as free or non-free.

Albanian

Dictionary: apertium-mk-sq.sq.dix
Resources

Armenian

Dictionary: apertium-hy-en.hy.dix
Resources

Asturian

Dictionary: apertium-es-ast.ast.dix
Resources


Belarusian

Bengali


Bulgarian

Dictionary: apertium-mk-bg.bg.dix
Resources


Cornish

Dictionary: apertium-cy-kw.kw.dix
Resources


Czech

Dictionary: apertium-pl-cs.cs.dix.xml
Resources

Faroese

Dictionary: apertium-fo-is.fo.dix
Resources

Finnish

Resources
s = lemma
hn = homonymy ref
t = inflection info
tn = inflection number (referring to table)
av = ref to consonant gradation
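
The field legend above could be read into records along the following lines. This is only a sketch: the field names s, hn, t, tn and av come from the legend, but the XML layout, the <st> entry wrapper, the <wordlist> root and the sample values are assumptions made for illustration and may not match the actual word list file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sample data. Only the tag names s, hn, t, tn and av are taken
# from the legend; <wordlist>, <st> and the values are illustrative assumptions.
SAMPLE = """<wordlist>
<st><s>aakkonen</s><t><tn>38</tn></t></st>
<st><s>aallokko</s><t><tn>4</tn><av>A</av></t></st>
<st><s>kuusi</s><hn>1</hn><t><tn>24</tn></t></st>
</wordlist>"""


@dataclass
class Entry:
    lemma: Optional[str]              # s  = lemma
    homonym: Optional[int]            # hn = homonymy ref
    inflection_class: Optional[int]   # tn = inflection number (refers to a table)
    gradation: Optional[str]          # av = ref to consonant gradation


def parse_entries(xml_text: str) -> List[Entry]:
    """Turn each assumed <st> element into an Entry; missing fields become None."""
    root = ET.fromstring(xml_text)
    entries = []
    for st in root.iter("st"):
        t = st.find("t")  # t = inflection info, holds tn and optionally av
        tn = t.findtext("tn") if t is not None else None
        av = t.findtext("av") if t is not None else None
        hn = st.findtext("hn")
        entries.append(Entry(
            lemma=st.findtext("s"),
            homonym=int(hn) if hn else None,
            inflection_class=int(tn) if tn else None,
            gradation=av,
        ))
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print(entry)
```

Keeping the optional fields as None rather than empty strings makes it easy to tell entries without gradation or homonymy information apart from those that have it.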

German - English

German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for "Ding: A Dictionary LookUp program" (version 1.5 2007-04-09) from Frank Richter, Technische Universität Chemnitz

Dictionary: apertium-de-en.dix

Greek

Dictionary: apertium-en-el.el.dix
Resources


Hebrew

Resources

Hindi

Dictionary: apertium-hi-ur.hi.dix
Resources


Iranian Persian

Dictionary: apertium-tg-fa.fa.dix
Resources

Lithuanian

Dictionary: apertium-en-lt.lt.dix
Resources

Ossetian

Dictionary: apertium-os-fa.os.dix
Resources

Norwegian

See also: North Germanic languages

See: Norsk ordbank

Piemontese

Dictionary: apertium-it-pms.pms.dix
Resources



Portuguese

Although Apertium has a stable es-pt pair, the Brazilian Portuguese dictionary built at NILC (Universidade de São Paulo) for Unitex has much better coverage and could perhaps be used to improve it.

Resources

We believe it has an LGPL license.

Quechua

Resources


Russian

Dictionary: monodix
Bidix: Polish-Russian
Bidix: English-Russian
Resources


Slovakian

Dictionary: apertium-pl-sk.sk.dix
Resources

Swedish - Danish

Pair: apertium-sv-da
Resources


Urdu

Dictionary: apertium-hi-ur.ur.dix
Resources


See also