Difference between revisions of "Turkish and Kyrgyz/Kymorph article"
Firespeaker (talk | contribs)
Line 14:
== Numbers ==
+ {|class="wikitable"
+ |+ size of corpora
+ |-
+ |
+ ! wikipedia
+ ! azattyk
+ |-
+ ! num words
+ | 271005
+ |
+ |-
+ ! xml file size
+ | >3.8MB
+ |
+ |}
Revision as of 07:29, 5 October 2011
Outline
Morphotactica
Morphophonologia
Corpora
- Which corpora to use?
  - Wikipedia
  - Azattyk
- Concerns
  - Wikipedia is messy; should we have an automated cleaning process or get stats as-is?
    - Use aq-wikicrp; this way it is reproducible.
Numbers
| | wikipedia | azattyk |
|---|---|---|
| num words | 271005 | |
| xml file size | >3.8MB | |
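
The two figures in the table above (word count and XML file size) could be gathered with something like the following. This is only a sketch: the page does not say how the 271005 figure was computed, so the tokenization rule (a word is a run of Unicode word characters) and the helper names `count_words` and `corpus_stats` are assumptions, and the input is assumed to be a well-formed XML file.

```python
import os
import re
import xml.etree.ElementTree as ET

# Assumption: a "word" is a maximal run of Unicode word characters
# (letters, digits, underscore). In Python 3, \w matches Cyrillic letters.
WORD_RE = re.compile(r"\w+")


def count_words(text):
    """Count word tokens in a string under the assumed tokenization."""
    return len(WORD_RE.findall(text))


def corpus_stats(xml_path):
    """Return (num_words, size_in_bytes) for a well-formed XML corpus file.

    Markup is dropped and only text content is counted. Text chunks are
    joined with spaces, so a word split by an inline tag counts as two.
    """
    root = ET.parse(xml_path).getroot()
    text = " ".join(root.itertext())
    return count_words(text), os.path.getsize(xml_path)
```

Running `corpus_stats` on the same extracted file each time keeps the numbers reproducible, whichever cleaning decision is made for Wikipedia; the Azattyk column can be filled in the same way once that corpus is available.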