Difference between revisions of "Corpora"
Revision as of 13:42, 20 September 2007
Lists of corpora under free licences (public domain, CC-BY-SA, GPL, etc.)
Corpora
- Southeast European Times — http://xixona.dlsi.ua.es/~fran/SE-Times-Corpus.tar.gz — English, Turkish, Bulgarian, Macedonian, Serbo-Croatian, Albanian, Greek, Romanian — approx. 9,000 paragraphs (paragraph-aligned), 90,000–120,000 words.
- South African Government Services — http://xixona.dlsi.ua.es/~fran/services-gov-za-en_ZA-af_ZA.txt — English–Afrikaans — approx. 2,500 sentences (sentence-aligned), 49,375 words.