Apertium-kaz-kir/TODO

Revision as of 21:24, 8 July 2013 by Firespeaker

By 13 July

  • add 800 stems
    mostly nouns, verbs, adjectives (i.e., simple categories)
    • 100 top stems from the Wikipedia corpus
    • 100 top stems from the rferl/azattyq corpus
    • 100 top stems from the Bible corpus
    • 100 top stems from the Quran corpus
    • any 400 words marked i="yes" in the dix
      • sort these into their appropriate sections
      • fix the Kyrgyz translation if needed
  • Get WER (word error rate) on texts/azattyq_24455849.txt down to around 10%
  • Fix the following minor problems:
    • the same word should not be entered twice with different capitalisation:
      • құран=куран / Құран=Куран (remove one of them)
      • пайғамбар=пайгамбар / Пайғамбар=Пайгамбар (remove one of them)
    • "шәксіз" is not a Kyrgyz word
    • there's an issue with -ақ; I think we'll need to work on it together
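
The capitalisation item above could be checked mechanically. The following is only a rough sketch, assuming the bilingual dix uses the usual <e><p><l>…</l><r>…</r></p></e> entry layout with <s n="…"/> tags. The sample data is made up for illustration (the кітап entry is not from this page), the function names are hypothetical, and multiword entries using <g> are not handled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative sample only; a real run would read the actual bilingual dix file.
SAMPLE_DIX = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>құран<s n="n"/></l><r>куран<s n="n"/></r></p></e>
    <e><p><l>Құран<s n="n"/></l><r>Куран<s n="n"/></r></p></e>
    <e><p><l>кітап<s n="n"/></l><r>китеп<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def lemma_text(side):
    """Return the lemma of an <l> or <r> element, skipping <s> tag elements."""
    parts = [side.text or ""]
    for child in side:
        if child.tag == "b":  # <b/> stands for a space inside a lemma
            parts.append(" ")
        parts.append(child.tail or "")
    return "".join(parts).strip()


def find_case_duplicates(dix_xml):
    """Group entries whose left-side lemmas differ only in capitalisation.

    Returns a dict mapping (lowercased lemma, tags) to the sorted list of
    (left lemma, right lemma) pairs, for groups with more than one spelling.
    """
    groups = defaultdict(set)
    root = ET.fromstring(dix_xml)
    for pair in root.iter("p"):
        left, right = pair.find("l"), pair.find("r")
        if left is None or right is None:
            continue
        tags = tuple(s.get("n") for s in left.iter("s"))
        lemma = lemma_text(left)
        groups[(lemma.lower(), tags)].add((lemma, lemma_text(right)))
    return {
        key: sorted(entries)
        for key, entries in groups.items()
        if len({left_lemma for left_lemma, _ in entries}) > 1
    }


if __name__ == "__main__":
    for (folded, tags), entries in find_case_duplicates(SAMPLE_DIX).items():
        print(folded, tags, entries)
```

Each reported group still needs a manual decision about which capitalisation to keep; the sketch only lists candidates.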