Turkish and Kyrgyz/Kymorph article
Revision as of 16:50, 13 October 2011 by Firespeaker
Outline
General background
- Submitting an abstract to LREC 2012 (Istanbul)
- Deadline: October 15, 2011
Similar articles
- Abu Zaher Md. Faridee & Francis M. Tyers - Development of a morphological analyser for Bengali
- Çağrı Çöltekin - A Freely Available Morphological Analyzer for Turkish
Morphotactica
- Irregular negatives of many verb forms
Morphophonologia
- /рн/ nouns
Corpora
- Which corpora to use?
  - Wikipedia
    - punktgen.py ky.crp.txt ky.pickle
    - aq-wikicrp -x -t ky.pickle kywiki-20110923-pages-articles.xml kywp.xml
  - Azattyk
- Wikipedia
  - Concerns
    - Wikipedia is messy; should we have an automated cleaning process or get stats as-is?
      - Use aq-wikicrp; that way the process is reproducible.
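The point of an automated cleaning pass is that it is deterministic: the same dump always yields the same text, so the statistics can be regenerated. As a minimal sketch of that idea, assuming raw MediaWiki markup as input, the hypothetical `strip_wiki_markup` below applies a fixed sequence of regular-expression rules. It is an illustration only, not how aq-wikicrp works internally, and it ignores tables, references, file links and much other markup.

```python
import re

# Hypothetical rule set for illustration only; it is NOT how aq-wikicrp works
# and ignores tables, references, file links and much other MediaWiki markup.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                    # innermost {{...}}
PIPED_LINK = re.compile(r"\[\[[^\[\]|]*\|([^\[\]]*)\]\]")   # [[target|label]] -> label
PLAIN_LINK = re.compile(r"\[\[([^\[\]|]*)\]\]")             # [[target]] -> target
EMPHASIS = re.compile(r"'{2,}")                             # '' italic / ''' bold markers
SPACES = re.compile(r"[ \t]+")


def strip_wiki_markup(text: str) -> str:
    """Apply a fixed sequence of rules, so the same input always yields the same output."""
    # Remove templates innermost-first until nothing changes (handles nesting).
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub("", text)
    text = PIPED_LINK.sub(r"\1", text)
    text = PLAIN_LINK.sub(r"\1", text)
    text = EMPHASIS.sub("", text)
    text = SPACES.sub(" ", text)
    # Drop blank lines and trim the remaining ones.
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


print(strip_wiki_markup("'''Бишкек''' [[Кыргызстан|Кыргызстандын]] борбору.{{cite}}"))
# prints: Бишкек Кыргызстандын борбору.
```

Because every rule is fixed and applied in a fixed order, rerunning the script on the same dump reproduces the same corpus, which is the property the notes want from aq-wikicrp.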
Numbers
| | Wikipedia | Azattyk 2010 | All Azattyk |
|---|---|---|---|
| Number of articles | 1531 (?, ?) | 9803 (6627?) | |
| Number of words | 271005 | 3394686 | |
| XML file size | 3.8 MB | 49 MB | |
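Counts like those in the table could be regenerated with a small script. This is a minimal sketch under stated assumptions, not the method behind the figures above (the notes do not say how they were computed): it assumes the corpus XML wraps each article in a single element, whose tag name `article` is a placeholder rather than anything taken from aq-wikicrp's output, and it treats a "word" as a whitespace-separated token. The function name `corpus_stats` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def corpus_stats(path: str, article_tag: str = "article") -> dict:
    """Count articles, whitespace-separated words and file size of an XML corpus.

    `article_tag` is a placeholder: set it to whichever element wraps one
    article in the real corpus file (not documented in these notes).
    """
    articles = 0
    words = 0
    # iterparse reads the file incrementally; clearing each counted article keeps memory low.
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == article_tag:
            articles += 1
            # Join all text nodes inside the article with spaces, then split on whitespace.
            words += len(" ".join(elem.itertext()).split())
            elem.clear()  # discard the article's content once counted
    # Assumption: MB taken as 2**20 bytes; the table does not say which unit it used.
    size_mb = os.path.getsize(path) / 2**20
    return {"articles": articles, "words": words, "size_mb": round(size_mb, 1)}
```

For example, `corpus_stats("kywp.xml", article_tag=...)`, with the tag filled in from the actual file, would give one column of the table. Counting whitespace tokens is the simplest definition of "word"; a tokenizer that splits off punctuation would give different totals, so whichever definition is used should be stated alongside the numbers.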