Difference between revisions of "RFERL corpora"

From Apertium
Line 1: Line 1:
Radio Free Europe / Radio Liberty runs news services in a number of Central Asian languages. The information is essentially free for public use with attribution. {{comment|link to usage info}}
+
Radio Free Europe / Radio Liberty runs news services in a number of Central Asian languages. The information is essentially free for public use with attribution. {{comment|need link to usage info}}
   
Recently we discovered [[Turkish and Kyrgyz/Making a corpus from azattyk|how to build a corpus from their website]].
+
We discovered [[Turkish and Kyrgyz/Making a corpus from azattyk|how a corpus could be built from their website]], and have begun to build a few. Currently we have corpora for Kazakh and Kyrgyz, covering only a couple years' worth of articles.
   
 
== Kyrgyz ==
 
== Kyrgyz ==

Revision as of 17:32, 4 January 2012

Radio Free Europe / Radio Liberty runs news services in a number of Central Asian languages. The information is essentially free for public use with attribution.

[Comment: need link to usage info]

We discovered how a corpus could be built from their website, and have begun to build a few. Currently we have corpora for Kazakh and Kyrgyz, each covering only a couple of years' worth of articles.

Kyrgyz

2009

  • Number of stems: 4.1M
  • Coverage: 87.4%

2010

  • Number of stems: 3.4M
  • Coverage: 88%

Kazakh

2009

2010

  • Number of stems: 3.2M
  • Coverage: 85.4%
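
This page does not say how the coverage figures were computed. One common reading is naive coverage: the share of corpus tokens for which the morphological analyser returns at least one analysis. The sketch below illustrates that reading only. It assumes the corpus has already been run through an analyser that emits Apertium stream format, where an unknown word appears as `^word/*word$`. The function name `naive_coverage`, the sample words, and the tags in the sample are hypothetical and are not taken from the actual Kyrgyz or Kazakh analysers.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the percentage of lexical units that have a known analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # Unknown words are marked by an analysis that starts with '*'.
        if not match.group(2).startswith("*"):
            known += 1
    if total == 0:
        return 0.0
    return 100.0 * known / total


# Hypothetical analyser output: four tokens, one of them unknown.
sample = "^бул/бул<prn><dem>$ ^сөз/*сөз$ ^китеп/китеп<n><nom>$ ^жок/жок<adj>$"
print(round(naive_coverage(sample), 1))
```

Under this reading, a figure such as 88% would mean that roughly 12 in every 100 tokens of that year's corpus received no analysis.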