Difference between revisions of "Specific resources per language"

From Apertium
 
(12 intermediate revisions by the same user not shown)
Line 64: Line 64:
;Resources
;Resources


* [http://nlp.fi.muni.cz/nlp/aisa/NlpCz/Frekvence_slov_lemmat.html Most frequent words] Also includes a list of the most frequent bi- and tri-grams, but these are of little use as multiwords
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]
* [http://users.ox.ac.uk/~tayl0010/links.html James Naughton's links]
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]
* [http://www.czech-language.cz/alphabet/alph-krtiny.html Some complications with diacritics]
Line 82: Line 81:


* http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL
* http://kaino.kotus.fi/sanat/nykysuomi/ — full form list for Finnish -- LGPL
* [https://kitwiki.csc.fi/twiki/bin/view/KitWiki/OMorFiSFSTVersion#Installation Omorfi–Open Morphology for Finnish language]
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]
* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]
<pre>
<pre>
Line 110: Line 108:
;Resources
;Resources


* http://www.mila.cs.technion.ac.il/english/resources/lexicons/ lexicons for Hebrew, in weird XLS format -- GPL
* http://www.mila.cs.technion.ac.il/english/resources/software_downloads/index.html Hebrew Morphological Analyzer (for Hebrew undotted text) -- GPL, but download link behind a password
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL
* http://www.cs.technion.ac.il/~barhaim/MorphTagger/ HMM-based part-of-speech tagger For Hebrew -- GPL
* http://www.cs.technion.ac.il/~erelsgl/bxi/hmntx/teud.html Probabilistic Morphological Analyzer for Hebrew undotted text -- license unknown
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL
* http://hspell.ivrix.org.il/ The hspell Hebrew spell-checker has a mode for analyzing morphological data -- GPL
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL
* http://www.code972.com/blog/hebmorph/ HebMorph is the analyser powering hspell's capabilities -- GPL
Line 159: Line 154:
===[[Nogai]]===
===[[Nogai]]===


'''Contents to be added'''
; Resources

* [http://ksirov.ru/%D1%8F%D0%B7%D1%8B%D0%BA%D0%B8/%D0%BD%D0%BE%D0%B3%D0%B0%D0%B9%D1%81%D0%BA%D0%B8%D0%B9 Grammar Sketch and Russian-Nogai dictionary]


===[[Ossetian]]===
===[[Ossetian]]===
Line 178: Line 171:
;Resources
;Resources


* http://members.fortunecity.it/dotorcarlo/vocen1.html Piemontese--English -- public domain
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."
* http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."


Line 193: Line 185:
===[[Punjabi]]===
===[[Punjabi]]===


'''Contents to be added'''
; Resources

* [http://www.lama.univ-savoie.fr/~humayoun/punjabi/index.html Punjabi lexicon]


===[[Quechua]]===
===[[Quechua]]===
Line 218: Line 208:
*[http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.
*[http://sourceforge.net/projects/pere/ Pere] - free translator, including Russian<->Ukrainian<->English dictionaries. Built from alignments, low quality.
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].
* [http://www.revdanica.com/xdxf/tmp/Muzafarov/inXDXF/rus2taj.xdxf Russian--Tajik phrase dictionary, 41k entries].
* [http://www.lugattj.com/news.php?tid=1&ln=en Another Tajik--Russian dictionary]


===[[Sanskrit]] '''संस्कृतम्'''===
===[[Sanskrit]] '''संस्कृतम्'''===
Line 226: Line 215:
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]
* [http://www.sanskrit-lexicon.uni-koeln.de/ Sanskrit Lexicon at Uni-Koeln]
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary
* [http://www.sanskrit-lexicon.uni-koeln.de/aequery/index.html Apte's En-Sa] dictionary
* [http://www.sanskrit-lexicon.uni-koeln.de/download.html Material available for download].


===[[Slovakian]]===
===[[Slovakian]]===
Line 233: Line 221:
;Resources
;Resources


* http://old.bohemica.com/slovak/slovakgrammar.pdf (Slovakian, with some English)
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)
* http://pl.wiktionary.org/wiki/Aneks:J%C4%99zyk_s%C5%82owacki_-_tabele_koniugacji (In Polish)
* http://www.angelfire.com/sk3/quality/Slovak_declension.html
* http://www.angelfire.com/sk3/quality/Slovak_declension.html
* http://www.juls.savba.sk/msj/


===[[Thai]]===
===[[Thai]]===
Line 244: Line 230:
:''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]''
:''Dictionary: [https://github.com/apertium/apertium-urd/blob/master/apertium-urd.urd.dix Urdu Monodix]''
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]''
:''Bidix: [https://github.com/apertium/apertium-urd-hin/blob/master/apertium-urd-hin.urd-hin.dix Hindi-Urdu Bidix]''

;Resources
* http://www.lama.univ-savoie.fr/~humayoun/UrduMorph/ &mdash; GPL analyser of Urdu
* http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm -- Urdu--English MT system



==Github Migration==
==Github Migration==

Latest revision as of 20:40, 11 December 2019

The incubator can be found in the 'incubator' column at https://apertium.github.io/apertium-on-github/source-browser.html. It houses language pairs that have not yet matured and are still under development.


Specific resources per language

Here are some links to resources that might be useful for expanding work in the Incubator. Add any resources below that would help in building a language pair, and try to mark each one with its licence, or at least as free/non-free.

See also the individual language pages.

Albanian

Dictionary: Albanian Monodix
Resources

Armenian

Dictionary: Armenian Monodix
Resources

Assamese and Hindi

Dictionary: Assamese-Hindi Bidix


--- Anusuya

Belarusian

Bengali

Bulgarian

Cornish

Dictionary: Cornish Monodix from SourceForge

This resource has not been migrated to GitHub from SVN

Resources

Czech

Dictionary: apertium-pl-cs.cs.dix.xml

This resource has not been migrated to GitHub from SVN

Dictionary: Czech-Esperanto Bidix
Dictionary: Czech-Slovenian Bidix
Resources

Faroese

Dictionary: Faroese Monodix
Resources

Finnish

See also: Omorfi
Resources
s = lemma
hn = homonymy ref
t = inflection info
tn = inflection number (referring to table)
av = ref to consonant gradation
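
The field legend above reads like the element names of an XML word list. As a minimal sketch, assuming each entry is an `<st>` element containing `<s>`, an optional `<hn>`, and a `<t>` block with `<tn>` and an optional `<av>` (this nesting and the two sample entries are assumptions for illustration, not taken from this page), the fields could be read in Python like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the nesting of the elements is an assumption.
SAMPLE = """<kotus-sanalista>
<st><s>aalto</s><t><tn>1</tn><av>I</av></t></st>
<st><s>kuusi</s><hn>1</hn><t><tn>24</tn></t></st>
</kotus-sanalista>"""


def parse_entries(xml_text):
    """Return one dict per <st> entry; absent fields are None."""
    root = ET.fromstring(xml_text)
    entries = []
    for st in root.iter("st"):
        entries.append({
            "lemma": st.findtext("s"),                # s  = lemma
            "homonym": st.findtext("hn"),             # hn = homonymy ref
            "inflection_number": st.findtext("t/tn"), # tn = inflection number
            "gradation": st.findtext("t/av"),         # av = consonant gradation ref
        })
    return entries


entries = parse_entries(SAMPLE)
```

Each dictionary keeps the raw string values, so the inflection number still has to be mapped to a paradigm separately before it is of use in a monodix.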

German and English

German-English bilingual dictionary (>216,000 entries), generated from linguistic data (GPL Version 2 or later) available for "Ding: A Dictionary LookUp program" (version 1.5 2007-04-09) from Frank Richter, Technische Universität Chemnitz

German-English Dictionary

Greek

Dictionary: Greek Monodix
Greek-English Dictionary: Greek-English Dictionary
Resources

Hebrew

Resources

Hindi

See also: Hindi
Resources


Iranian Persian

Dictionary: Persian Monodix
Resources

Ingush

Resources

Latvian

Resources
See also

Lithuanian

Dictionary: Lithuanian Monodix
Resources

Nogai

Contents to be added

Ossetian

Dictionary: Ossetian Monodix
Resources

Piemontese

Dictionary: Piemontese Monodix from SourceForge

This resource has not been migrated to GitHub from SVN

Resources
  • http://digilander.libero.it/dotor43/indexit.html -- Piemontese grammar incl. 17k word Piemontese--Italian dictionary (POS tagged and partly annotated for inflection). site suggests "© These pages can be freely used for all purposes, but not for political reasons, and not against the laws (no matter what is the country)."

Portuguese

Although Apertium has a stable es-pt pair, the Brazilian Portuguese Dictionary built at NILC (Universidade de São Paulo) for Unitex has much better coverage and could perhaps be used to improve it.

Resources

We believe it is released under an LGPL license.

Punjabi

Contents to be added

Quechua

Resources

Russian

Dictionary: monodix
Bidix: Polish-Russian
Bidix: English-Russian
Resources

Sanskrit संस्कृतम्

Dictionary: Sanskrit Monodix
Resources

Slovakian

Dictionary: Slovak Monodix
Resources

Thai

Urdu

Dictionary: Urdu Monodix
Bidix: Hindi-Urdu Bidix

Github Migration

For languages whose resources are not yet on GitHub, you can use apertium-init to create the corresponding repository and then add the files from SVN to that repository.
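
The migration step described above can be sketched in Python. This is only an illustration under stated assumptions: that `apertium-init` is called with a bare language code and creates a directory named `apertium-<code>`, and that the old files sit in a local SVN working copy. The helper names `init_command` and `copy_from_svn` and the code `xyz` are hypothetical.

```python
import shutil
from pathlib import Path


def init_command(lang_code):
    """Build the apertium-init call for a monolingual module (not executed here)."""
    return ["apertium-init", lang_code]


def copy_from_svn(svn_dir, repo_dir):
    """Copy files from an SVN working copy into repo_dir, skipping .svn metadata.

    Existing files with the same relative path in repo_dir are overwritten.
    Returns the relative paths that were copied.
    """
    svn_dir, repo_dir = Path(svn_dir), Path(repo_dir)
    copied = []
    for src in sorted(svn_dir.rglob("*")):
        rel = src.relative_to(svn_dir)
        if ".svn" in rel.parts or not src.is_file():
            continue
        dest = repo_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel.as_posix())
    return copied
```

For the hypothetical code `xyz`, one would run the command returned by `init_command("xyz")` (for example with `subprocess.run`), then call `copy_from_svn` with the SVN checkout path and `"apertium-xyz"`, and review the result before committing, since files generated by apertium-init may be replaced by the SVN versions.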