Difference between revisions of "Apertium-kaz-kir/TODO"
Firespeaker (talk | contribs)
Revision as of 16:45, 13 July 2013
By 14 July
- Add 800 stems
  - mostly nouns, verbs, adjectives (i.e., simple categories)
  - 100 top stems from the Wikipedia corpus
  - 100 top stems from the rferl/azattyq corpus
  - 100 top stems from the Bible corpus
  - 100 top stems from the Quran corpus
  - any 400 words marked i="yes" in the dix
    - sort these into their appropriate sections
    - fix the Kyrgyz translation when needed (many will need to be fixed)
    - remove the i="yes" part
- Start work on WER for texts/azattyq_24455849.txt
  - Use kaz-kir and output to texts/azattyq_24455849.kaz-kir.txt
  - Add words etc. to the transducer as needed until no */#/@ remain in the output
  - Copy the output to texts/azattyq_24455849.kaz-kir-postedited.txt
  - Post-edit until the postedited Kyrgyz is clean
  - Add lexical selection rules and transfer rules as needed
  - Goal: get WER down to around 10%
- Fix the following minor problems:
  - words should not be entered with different capitalisation:
    - құран=куран / Құран=Куран (remove one of them)
    - пайғамбар=пайгамбар / Пайғамбар=Пайгамбар (remove one of them)
  - "шәксіз" is not a Kyrgyz word
  - there's an issue with -ақ; I think we'll need to work on it together
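The "until there are no */#/@" step above can be checked mechanically. What follows is only a rough sketch, not part of the Apertium toolchain: it assumes the usual Apertium convention that `*` marks an unknown word, `@` a word that failed lexical transfer and `#` a form that could not be generated, and it assumes the marker is attached directly to the front of the word. The function name `count_marked` is made up for illustration.

```python
import re
from collections import Counter

# A marker (*, # or @) that does not sit inside a word, followed by the word itself.
# Assumption: markers are glued to the front of the affected word in the output.
MARKED = re.compile(r"(?<!\w)([*#@])(\w+)")

def count_marked(text):
    """Count (marker, word) pairs still flagged in translated text."""
    counts = Counter()
    for marker, word in MARKED.findall(text):
        counts[(marker, word)] += 1
    return counts
```

To use it, read texts/azattyq_24455849.kaz-kir.txt as UTF-8, pass the contents to `count_marked`, and work through the most frequent entries first; an empty result means the output is free of the three markers.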
By 22 July
- Add another 1000 words
- Finish WER process for texts/azattyq_24455849.txt
- Work with JNW on testvoc for closed categories.
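For reference, the WER figure targeted above is the word-level edit distance between the machine output and the post-edited text, divided by the number of words in the post-edited text. The sketch below is a minimal illustration of that definition, not the project's own evaluation script: it assumes plain whitespace tokenisation and case-sensitive comparison, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    reference  -- the post-edited (clean) text
    hypothesis -- the raw kaz-kir output
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference text is empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Here the reference would be the contents of texts/azattyq_24455849.kaz-kir-postedited.txt and the hypothesis the contents of texts/azattyq_24455849.kaz-kir.txt; a return value of about 0.10 corresponds to the 10% goal.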