Corpora

From Apertium
Revision as of 19:29, 30 January 2010 by 77.206.237.80

Lists of corpora under free licences (public domain, CC-BY-SA, GPL, etc.).

You might also want to use Wikipedia as a corpus; see Tagger_training#Creating_a_corpus or Building_dictionaries#Wikipedia_dumps, and the cleanup script at Calculating_coverage.

Corpora

Use this if you want to do English--<something> (alignments between two non-English languages are unreliable)
Use this if you want to do <anything>--<anything>
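Many parallel corpora are distributed as a pair of plain-text files with one sentence per line, where line n of one file is the translation of line n of the other. This is an assumption about the format, so check each corpus's own documentation. Under that assumption, a minimal Python sketch for reading such a pair might look like the following. The function names `align_lines` and `read_parallel` are hypothetical and not part of any Apertium tool.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple


def align_lines(src_lines: Iterable[str], tgt_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Pair line n of the source side with line n of the target side.

    Hypothetical helper: assumes both sides are line-aligned.
    """
    for lineno, (s, t) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if s is None or t is None:
            # Unequal line counts mean the two sides are not line-aligned.
            raise ValueError(f"line counts differ (first mismatch at line {lineno})")
        s, t = s.strip(), t.strip()
        if s and t:  # skip pairs where either side is blank
            yield s, t


def read_parallel(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield sentence pairs from two line-aligned UTF-8 text files (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        yield from align_lines(src, tgt)
```

Raising an error on unequal line counts, rather than silently truncating as plain `zip` would, makes a misaligned file pair fail loudly instead of producing wrong sentence pairs.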

Corpus tools