Difference between revisions of "Turkish and Kyrgyz/Kymorph article"
		
		
		
		
		
		
Firespeaker (talk | contribs)
Revision as of 14:27, 7 October 2011
Outline
General background
- Submitting abstract to: LREC 2012 Istanbul
 - Deadline: October 15, 2011
 
Similar articles
- Abu Zaher Md. Faridee & Francis M. Tyers - Development of a morphological analyser for Bengali
 - Çağrı Çöltekin - A Freely Available Morphological Analyzer for Turkish
 
Morphotactica
- Irregular negatives of many verb forms
 
Morphophonologia
- /рн/ nouns
 
Corpora
- Which corpora to use?
  - Wikipedia
    - punktgen.py ky.crp.txt ky.pickle
    - aq-wikicrp -x -t ky.pickle kywiki-20110923-pages-articles.xml kywp.xml
  - Azattyk
- concerns
  - Wikipedia is messy; should we have an automated cleaning process or get stats as-is?
    - Use aq-wikicrp, this way it is reproducible.
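If a separate cleaning pass were ever wanted on top of aq-wikicrp, the key property would be that it is deterministic, so the stats can be regenerated from the same dump. The following is only a minimal sketch of that idea: the function name `clean_line` and all of the regular expressions are hypothetical illustrations and are not taken from aq-wikicrp or any existing tool.

```python
import re

# Hypothetical patterns for leftover wiki markup (assumptions, not from aq-wikicrp).
_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                    # {{template}} calls, non-nested
_LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")   # [[target|label]] or [[target]]
_TAG = re.compile(r"<[^>]+>")                                # stray HTML-like tags
_SPACE = re.compile(r"\s+")


def clean_line(line):
    """Strip common wiki markup from one line, always in the same fixed order."""
    line = _TEMPLATE.sub("", line)      # drop templates entirely
    line = _LINK.sub(r"\1", line)       # keep only the visible link label
    line = _TAG.sub("", line)           # drop tags, keep their text content
    return _SPACE.sub(" ", line).strip()  # normalise whitespace


if __name__ == "__main__":
    print(clean_line("{{stub}} [[Бишкек|Бишкек шаары]] <b>борбор</b>  шаар"))
```

Because every step is a pure function of the input line, running it twice over the same dump gives identical output, which is the reproducibility concern raised above.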
 
Numbers
| | wikipedia | azattyk 2010 | all azattyk |
|---|---|---|---|
| num articles | 1531(?, ?) | 9803 (6627?) | |
| num words | 271005 | 3394686 | |
| xml file size | 3.8MB | 49MB | |
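The article and word counts in the table could be recomputed with a short script along these lines. This is a sketch under stated assumptions: the element name `entry` and the layout of the sample XML are hypothetical and may not match what aq-wikicrp actually writes to kywp.xml, and "word" here simply means a whitespace-separated token.

```python
import xml.etree.ElementTree as ET


def corpus_stats(xml_text, article_tag="entry"):
    """Count articles and whitespace-separated words in a corpus XML string.

    article_tag is a hypothetical element name; adjust it to the real schema.
    """
    root = ET.fromstring(xml_text)
    articles = list(root.iter(article_tag))
    # Join text nodes with a space so adjacent sentences do not fuse into one token.
    num_words = sum(len(" ".join(a.itertext()).split()) for a in articles)
    return {"num_articles": len(articles), "num_words": num_words}


# Tiny made-up sample; the real input would be the generated corpus file.
sample = (
    "<corpus>"
    "<entry><sentence>Бул биринчи сүйлөм.</sentence></entry>"
    "<entry><sentence>Экинчи макала.</sentence>"
    "<sentence>Дагы бир сүйлөм.</sentence></entry>"
    "</corpus>"
)
stats = corpus_stats(sample)
print(stats)
```

Counting tokens by whitespace keeps punctuation attached to words, so the figure would differ somewhat from a count based on a proper tokeniser; whichever definition is used should be stated next to the numbers.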