Apertium-kaz-kir/TODO


Revision as of 04:02, 23 July 2013 by Firespeaker


By midterm

primary goals:
- total 6500 stems in dix
- azattyq_24455849 WER ≤8%
- trimmed coverage ≥72%
- clean testvoc for the following categories:
  - <postadv> <ij>
  - <num> <post>
  - <cnjcoo> <cnjadv> <cnjsub>
  - <adv>
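
The coverage goal above can be checked roughly as follows. This is a minimal sketch, assuming the corpus has already been run through the (trimmed) analyser and saved in Apertium stream format, where an unknown word appears as ^form/*form$. The function name `coverage` and the sample line (including its tags) are made up for illustration, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters such as \/ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed: str) -> float:
    """Return the share of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    # The analyser marks an unknown word by prefixing its only "analysis" with *.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Illustrative input only; the tags here are not taken from the real dictionaries.
sample = (
    "^үй/үй<n><nom>$ ^шәксіз/*шәксіз$ ^және/және<cnjcoo>$ "
    "^мен/мен<prn><pers><p1><sg><nom>/мен<post>$"
)
print(f"coverage: {coverage(sample):.2%}")
```

Three of the four units in the sample have an analysis, so the sketch reports 75%. Counting tokens rather than types matches how a corpus-level percentage such as ≥72% is usually read, but that reading is an assumption here.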

By 22 July

Add another 1000 words
Finish WER process for texts/azattyq_24455849.txt
Work with JNW on testvoc for closed categories.

By 14 July

Add 800 stems, mostly nouns, verbs, and adjectives (i.e., simple categories):
- 100 top stems from wikipedia corpus
- 100 top stems from rferl/azattyq corpus
- 100 top stems from bible corpus
- 100 top stems from quran corpus
- any 400 words marked i="yes" in dix
  - sort these into their appropriate sections
  - fix the Kyrgyz translation when needed (many will need to be fixed)
  - remove the i="yes" attribute
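
One way to pick the top words from each corpus is sketched below. It assumes the corpus is available as plain text and that the forms already covered are available as a set; both the function name `top_missing_forms` and that set are hypothetical. It counts surface forms, not stems, so the resulting list still has to be reduced to stems by hand.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined by internal hyphens.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def top_missing_forms(corpus_text, known_forms, n=100):
    """Return up to n (form, count) pairs, most frequent first, not in known_forms."""
    counts = Counter(m.group(0).lower() for m in WORD.finditer(corpus_text))
    missing = [(form, c) for form, c in counts.most_common() if form not in known_forms]
    return missing[:n]


# Toy input; a real run would read the wikipedia, azattyq, bible or quran corpus.
text = "үй үй кітап кітап кітап бала"
print(top_missing_forms(text, known_forms={"үй"}, n=100))
```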

Start work on WER for texts/azattyq_24455849.txt
- Run the text through kaz-kir and write the output to texts/azattyq_24455849.kaz-kir.txt
- Add words etc. to the transducer as needed until no */#/@ marks remain in the output
- Copy to texts/azattyq_24455849.kaz-kir-postedited.txt
- Post-edit until the postedited Kyrgyz is clean
- Add lexical selection rules and transfer rules as needed
- Goal: get WER down to around 10%
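
The two checks in this process (no */#/@ left, and WER against the postedited text) can be sketched as below. This is not the project's own evaluation tool, only a minimal illustration: WER is taken as word-level edit distance divided by the length of the postedited reference, tokens are split on whitespace, and the names `wer` and `count_marks` are made up for the example.

```python
import re

# * = unknown word, @ = lexical transfer failure, # = generation failure.
# A mark counts only when it starts a token, so "a@b" is not matched.
MARK = re.compile(r"(?<!\w)[*#@](?=\w)")


def count_marks(text: str) -> int:
    """Count tokens in translator output that still carry a */#/@ mark."""
    return len(MARK.findall(text))


def wer(hypothesis: str, reference: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference text is empty")
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitution))
        prev = cur
    return prev[-1] / len(ref)


# Placeholder tokens stand in for the real kaz-kir output and its postedited copy.
machine_output = "w1 *w2 w3 w4"
postedited = "w1 w2 w3 w4"
print(count_marks(machine_output), wer(machine_output, postedited))
```

In practice the two strings would be read from texts/azattyq_24455849.kaz-kir.txt and texts/azattyq_24455849.kaz-kir-postedited.txt.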

Fix the following minor problems:
- words should not be entered with different capitalisation:
  - құран=куран / Құран=Куран (remove one of them)
  - пайғамбар=пайгамбар / Пайғамбар=Пайгамбар (remove one of them)
- "шәксіз" is not a Kyrgyz word
- there's an issue with -ақ; I think we'll need to work on it together
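
Entries that differ only in capitalisation, like the two pairs above, can be found mechanically. The sketch below assumes the (Kazakh, Kyrgyz) pairs have already been extracted from the dix into a list of tuples; the extraction step and the function name `case_duplicates` are hypothetical and not shown.

```python
from collections import defaultdict


def case_duplicates(pairs):
    """Group (left, right) pairs that are identical once both sides are lowercased."""
    groups = defaultdict(set)
    for left, right in pairs:
        groups[(left.lower(), right.lower())].add((left, right))
    # Keep only groups that contain more than one distinct spelling.
    return [sorted(group) for group in groups.values() if len(group) > 1]


pairs = [
    ("құран", "куран"),
    ("Құран", "Куран"),
    ("пайғамбар", "пайгамбар"),
    ("Пайғамбар", "Пайгамбар"),
    ("үй", "үй"),
]
for group in case_duplicates(pairs):
    print(group)
```

Which spelling of each group to keep is still a manual decision.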

Retrieved from "https://wiki.apertium.org/w/index.php?title=Apertium-kaz-kir/TODO&oldid=42858"