Corpora

From Apertium
Revision as of 19:29, 30 January 2010

Lists of corpora under free licences (public domain, CC-BY-SA, GPL, etc.).

You might also want to use Wikipedia as a corpus; see Tagger_training#Creating_a_corpus or Building_dictionaries#Wikipedia_dumps, as well as the cleanup script at Calculating_coverage.

Corpora

Use this if you want to do English–<something> pairs (alignments between two non-English languages can be odd).
Use this if you want to do any <anything>–<anything> pair.

* OPUS (http://urd.let.rug.nl/tiedeman/OPUS/index.php): open-source multilingual corpora.
* Open-Tran (http://www.open-tran.eu): a single point of access to translations of open-source software in many languages, downloadable as SQLite databases.
* Tatoeba Project (http://tatoeba.org/): a database of example sentences translated into several languages.

Corpus tools