Difference between revisions of "Corpora"

From Apertium
(Tatoeba Project)
Line 14:
 
* OPUS — http://urd.let.rug.nl/tiedeman/OPUS/index.php — Open Source multilingual corpora
 
 
* Open-Tran — http://www.open-tran.eu — single point of access to translations of open-source software in many languages (downloadable as SQLite databases)
 
+ * Tatoeba Project — http://tatoeba.fr/ — database of example sentences translated into several languages
 
== Corpus tools ==
 

Revision as of 11:17, 8 December 2009

Lists of corpora under free licences (public domain, CC-BY-SA, GPL, etc.).

You might also want to use Wikipedia as a corpus; see Tagger_training#Creating_a_corpus or Building_dictionaries#Wikipedia_dumps, and the cleanup script at Calculating_coverage.

Corpora

Use this if you want to do English--<something> (funny alignments for non-English pairs)
Use this if you want to do <anything>--<anything>
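
The Open-Tran entry mentions that its translations are downloadable as SQLite databases. This page does not document their schema, so the following is only a minimal sketch of how one might inspect such a file with Python's standard sqlite3 module before using it as a corpus. The function names and the file name open-tran.db are hypothetical; table names are read from the file itself, not assumed.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def sample_rows(db_path, table, limit=5):
    """Return up to `limit` rows from `table` as a list of tuples."""
    # Quote the identifier so unusual table names cannot break the query.
    quoted = '"' + table.replace('"', '""') + '"'
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            f"SELECT * FROM {quoted} LIMIT ?", (limit,)
        ).fetchall()
    finally:
        conn.close()


# Hypothetical usage, assuming a downloaded file named open-tran.db:
#
#     for table in list_tables("open-tran.db"):
#         print(table, sample_rows("open-tran.db", table, limit=3))
```

Looking at a few rows per table first should show which columns hold the source and target strings, which is needed before extracting sentence pairs.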

Corpus tools