Difference between revisions of "Corpora"

From Apertium
m (== Corpus tools == BootCaT/Bitextor)
Line 10:
* South African Government Services — http://xixona.dlsi.ua.es/~fran/services-gov-za-en_ZA-af_ZA.txt — English—Afrikaans — 2,500 approx. sentence aligned, 49,375 words.
* IJS-ELAN — http://nl.ijs.si/elan/ — English-Slovenian

== Corpus tools ==

* BootCaT — http://sslmit.unibo.it/~baroni/bootcat.html Simple Utilities to Bootstrap Corpora and Terms from the Web
* Bitextor — http://sourceforge.net/projects/bitextor/ - Bootstrap bilingual corpora from the web


[[Category:Resources]]

Revision as of 21:59, 29 August 2008

Lists of corpora under free licences (public domain, CC-BY-SA, GPL, etc.)

Corpora

Use this if you want to do English--<something> (funny alignments for non-English pairs)
Use this if you want to do <anything>--<anything>

Corpus tools