Difference between revisions of "Turkish and Kyrgyz/Kymorph article"

From Apertium
Line 1:
  == Outline ==
+
+ == General background ==
+ * Submitting abstract to: [http://www.lrec-conf.org/lrec2012/ LREC 2012 Istanbul]
+ * Deadline: October 15, 2011
+
+ == Similar articles ==
+ * [http://www.mt-archive.info/FreeRBMT-2009-Faridee.pdf Abu Zaher Md. Faridee & Francis M. Tyers - Development of a morphological analyser for Bengali]
+ * [http://www.let.rug.nl/coltekin/papers/coltekin-lrec2010.pdf Çagrı Çöltekin - A Freely Available Morphological Analyzer for Turkish]

  == Morphotactica ==
+ * Irregular negatives of many verb forms

  == Morphophonologia ==
+ * /рн/ nouns

  == Corpora ==

Revision as of 18:18, 6 October 2011

Outline

General background

Similar articles

Morphotactica

  • Irregular negatives of many verb forms

Morphophonologia

  • /рн/ nouns

Corpora

  • Which corpora to use?
    • Wikipedia
      1. punktgen.py ky.crp.txt ky.pickle
      2. aq-wikicrp -x -t ky.pickle kywiki-20110923-pages-articles.xml kywp.xml
    • Azattyk
  • Concerns
    • Wikipedia is messy: should we clean it with an automated process or compute statistics on the text as-is?
      • Use aq-wikicrp, so that the cleaning step is reproducible.

Numbers

Size of corpora

                 wikipedia   azattyk
num words        271005
xml file size    >3.8MB
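
A word count like the one above is only comparable across corpora if the counting rule is fixed. The following is a minimal sketch of one such rule, not the method used to obtain the figure above: the regular expression, the function names count_words and count_words_in_file, and the decision to ignore digits are all assumptions made for illustration.

```python
import re
import sys

# Assumption: a "word" is a run of Unicode letters, optionally joined by
# internal hyphens or apostrophes. Digits and punctuation are not counted.
WORD_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def count_words(lines):
    """Return the number of word tokens in an iterable of text lines."""
    return sum(len(WORD_RE.findall(line)) for line in lines)


def count_words_in_file(path, encoding="utf-8"):
    """Count word tokens in a plain-text corpus file, reading line by line."""
    with open(path, encoding=encoding) as handle:
        return count_words(handle)


if __name__ == "__main__":
    # Hypothetical usage: python countwords.py ky.crp.txt
    if len(sys.argv) == 2:
        print(count_words_in_file(sys.argv[1]))
```

Running the same script over both the Wikipedia and the Azattyk text would give counts produced by an identical rule; a different definition of "word" would give different totals.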