Difference between revisions of "RFERL corpora"

Line 9:
  === 2009 ===
  * Number of stems: {{:RFERL corpus/ky/2009/stems}}
- * Coverage: {{:Kymorph/coverage/rferl2009}}
+ * Coverage: ~{{:Kymorph/coverage/rferl2009}}%

  === 2010 ===
  * Number of stems: {{:RFERL corpus/ky/2010/stems}}
- * Coverage: {{:Kymorph/coverage/rferl2010}}
+ * Coverage: ~{{:Kymorph/coverage/rferl2010}}%

  == Kazakh ==
Line 20:
  === 2009 ===
  * Number of stems: {{:RFERL corpus/kk/2009/stems}}
- * Coverage: {{:Kazmorph/coverage/rferl2009}}
+ * Coverage: ~{{:Kazmorph/coverage/rferl2009}}%

  === 2010 ===
  * Number of stems: {{:RFERL corpus/kk/2010/stems}}
- * Coverage: {{:Kazmorph/coverage/rferl2010}}
+ * Coverage: ~{{:Kazmorph/coverage/rferl2010}}%

Revision as of 17:33, 4 January 2012

Radio Free Europe / Radio Liberty (RFE/RL) runs news services in a number of Central Asian languages. Its content is essentially free for public use, provided it is attributed.

TODO: add a link to the usage information.

We have worked out how to build a corpus from their website and have begun building a few. So far we have corpora for Kyrgyz and Kazakh, each covering only a couple of years' worth of articles (2009 and 2010).

Kyrgyz

2009

  • Number of stems: 4.1M
  • Coverage: ~87.4%

2010

  • Number of stems: 3.4M
  • Coverage: ~88%

Kazakh

2009

2010

  • Number of stems: 3.2M
  • Coverage: ~85.4%
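
This page does not say how the coverage figures are calculated. A common reading is that coverage is the share of corpus tokens for which the morphological analyser returns at least one analysis. The sketch below shows that calculation, assuming the corpus has already been run through the analyser and is in Apertium stream format, where an unrecognised word appears as `^word/*word$`. The function name and the sample string are invented for illustration, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped '/', '^' or '$' inside the unit)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the percentage of lexical units with at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An unknown word is marked by an analysis that starts with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return 100.0 * known / len(units)


# Invented sample: four tokens, one of which the analyser does not know.
sample = (
    "^сөз/сөз<n><nom>$ ^белгисиз/*белгисиз$ "
    "^жана/жана<cnjcoo>$ ^китеп/китеп<n><nom>$"
)
print(f"{coverage(sample):.1f}%")
```

On a real corpus the input would be the analyser's output for the whole file, not a short string like the sample.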