RFERL corpora
Radio Free Europe / Radio Liberty (RFE/RL) runs news services in a number of Central Asian languages. The content is essentially free for public use with attribution (a link to the usage terms still needs to be added here).
We worked out how to build a corpus from their website and have begun building a few. Currently we have corpora for Kazakh and Kyrgyz, covering only a couple of years' worth of articles.
Kyrgyz
- Site: azattyk.org
- Coverage with: kymorph
2009
- Number of stems: 4.1M
- Coverage: ~87.4%
2010
- Number of stems: 3.4M
- Coverage: ~88%
Kazakh
- Site: azattyq.org
2009
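- Number of stems: value not rendered here (the source points to the page RFERL corpus/kk/2009/stems)
- Coverage: value not rendered here (the source points to the page Kazmorph/coverage/rferl2009)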
2010
- Number of stems: 3.2M
- Coverage: ~85.4%
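
This page does not say how the coverage figures were computed. One common way to get a naive coverage figure is to run the corpus through a morphological analyser and take the share of tokens that receive at least one analysis. The sketch below assumes the analyser output is in the Apertium stream format, where an unknown token appears as ^word/*word$. The function name, the sample sentence, and its tags are hypothetical and for illustration only. They are not real kymorph output.

```python
import re

# Apertium stream format: an analysed token looks like ^surface/analysis$,
# an unknown token looks like ^surface/*surface$.
# Simplification: escaped characters inside tokens are not handled.
TOKEN_RE = re.compile(r"\^(.*?)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    tokens = TOKEN_RE.findall(analysed_text)
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if "/*" in token)
    return (len(tokens) - unknown) / len(tokens)


# Hypothetical analyser output (the tags are illustrative, not real kymorph tags).
sample = "^бул/бул<det><dem>$ ^китеп/китеп<n><nom>$ ^азаттык/*азаттык$ ^./.<sent>$"
print(f"{naive_coverage(sample):.1%}")  # 3 of 4 tokens analysed -> 75.0%
```

For a full corpus the analysed text would be read from a file rather than a string. The result is only an upper bound on usable coverage, since a token can receive an analysis that is wrong.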

