RFERL corpora

From Apertium
Revision as of 07:07, 25 December 2011 by Firespeaker (talk | contribs)

Radio Free Europe / Radio Liberty (RFE/RL) runs news services in a number of Central Asian languages. Its content is essentially free for public use with attribution.

link to usage info

Recently we discovered how to build a corpus from their website.

Kyrgyz

2009

  • Number of stems: 4.1M
  • Coverage: 87.4%

2010

  • Number of stems: 3.4M
  • Coverage: 88%

Kazakh

2009

2010

  • Number of stems: 3.2M
  • Coverage: 85.4%
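
The coverage figures on this page are presumably naive coverage: the share of corpus tokens for which the morphological analyser returns at least one analysis. As a minimal sketch, assuming the corpus has already been run through an analyser and the output is in Apertium stream format (unknown words appear as ^form/*form$), the percentage could be computed roughly like this. The function name naive_coverage and the sample line with its tags are made up for illustration, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the percentage of lexical units that have at least one analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # The analyser marks an unknown word by prefixing its analysis with '*'.
        if not match.group(2).startswith("*"):
            known += 1
    if total == 0:
        return 0.0
    return 100.0 * known / total


# Illustrative analyser output (hypothetical tags); the second word is unknown.
sample = (
    "^бул/бул<prn><dem>$ ^жаңылык/*жаңылык$ "
    "^үй/үй<n><nom>$ ^китеп/китеп<n><nom>$"
)
print(f"{naive_coverage(sample):.1f}")  # prints 75.0
```

For a real corpus the analysed text would be read from a file instead of a string literal. Note that naive coverage counts a token as covered if it gets any analysis at all, not necessarily a correct one.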