Difference between revisions of "Corpora"

From Apertium
Line 1:

  Lists of corpora under free licences (public domain, CC-BY-SA, GPL, etc.).

- You might also want to use Wikipedia as a corpus, see [[Tagger_training#Creating_a_corpus]].
+ You might also want to use Wikipedia as a corpus, see [[Tagger_training#Creating_a_corpus]] or [[Building_dictionaries#Wikipedia_dumps]] and the cleanup script at [[Calculating_coverage]].

  ==Corpora==

Revision as of 12:25, 7 December 2009

Lists of corpora under free licences (public domain, CC-BY-SA, GPL, etc.).

You might also want to use Wikipedia as a corpus; see Tagger_training#Creating_a_corpus or Building_dictionaries#Wikipedia_dumps, and the cleanup script at Calculating_coverage.
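
As a rough illustration of what turning Wikipedia text into a corpus involves, the sketch below strips the most common wikitext markup from a string. It is not the cleanup script from Calculating_coverage; the function name `strip_wiki_markup` and the set of rules are assumptions made for this example, and nested templates, tables and other complex markup are not handled.

```python
import re

def strip_wiki_markup(text: str) -> str:
    """Reduce a chunk of wikitext to rough plain text (illustrative only)."""
    # Drop simple, non-nested templates such as {{citation needed}}.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Drop <ref>...</ref> footnotes, then any remaining HTML-like tags.
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", "", text)
    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove bold/italic quote runs.
    text = re.sub(r"'{2,}", "", text)
    # == Heading == -> Heading (one heading per line).
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text,
                  flags=re.MULTILINE)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

Applied to each article body extracted from a dump, this yields text that is closer to what a tagger or coverage test expects, although a real corpus would normally need further filtering.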

Corpora

Use this if you want to work on English--<something> (alignments for non-English pairs can be odd)
Use this if you want to do <anything>--<anything>

Corpus tools