Difference between revisions of "Turkish and Kyrgyz/Kymorph article"
Firespeaker (talk | contribs)
Revision as of 07:29, 5 October 2011
Outline
Morphotactica
Morphophonologia
Corpora
- Which corpora to use?
  - Wikipedia
  - Azattyk
- Concerns
  - Wikipedia is messy; should we have an automated cleaning process or get stats as-is?
    - Use aq-wikicrp; this way it is reproducible.
Numbers
Size of corpora

| | wikipedia | azattyk |
|---|---|---|
| num words | 271005 | |
| XML file size | >3.8MB | |