Difference between revisions of "Turkish and Kyrgyz/Kymorph article"
		
		
		
		
		
		
Firespeaker (talk | contribs)
Line 14:
== Numbers ==
{|class="wikitable"
|+ size of corpora
|-
|
! wikipedia
! azattyk
|-
! num words
| 271005
|
|-
! xml file size
| >3.8MB
|
|}
Revision as of 07:29, 5 October 2011
Outline
Morphotactica
Morphophonologia
Corpora
- Which corpora to use?
 - Wikipedia
 - Azattyk
- Concerns
 - Wikipedia is messy; should we have an automated cleaning process or get stats as-is?
  - Use aq-wikicrp; this way it is reproducible.
Numbers
size of corpora
| | wikipedia | azattyk |
|---|---|---|
| num words | 271005 | |
| xml file size | >3.8MB | |
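
A "num words" figure like the one above depends on how words are counted, so it may help to pin the counting rule down. The following is only a minimal sketch, not the method used to obtain 271005: it assumes the corpus is a well-formed XML file and that a word is any run of letters (digits and punctuation excluded). The names `count_words` and `WORD_RE` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed tokenization rule: a "word" is a maximal run of Unicode letters.
# Digits, underscores, and punctuation are not counted.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(xml_text: str) -> int:
    """Count word tokens in the text content of an XML document."""
    root = ET.fromstring(xml_text)
    # itertext() yields only text nodes, so tag and attribute names are skipped.
    text = " ".join(root.itertext())
    return len(WORD_RE.findall(text))


if __name__ == "__main__":
    # Tiny made-up sample, not taken from either corpus.
    sample = "<corpus><doc>Бишкек шаары, 2011-жыл.</doc><doc>Кыргыз тили</doc></corpus>"
    print(count_words(sample))
```

Running the same script over both the Wikipedia and Azattyk files would at least make the two counts comparable, whichever cleaning step comes before it.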