Difference between revisions of "User:Firespeaker/GSoC2014"

Revision as of 19:20, 23 January 2014

vanilla transducers:

Increase apertium-uzb coverage to >90%
- expand morphology
- expand lexicon
Clean up apertium-tur, bring coverage to >90%
- fix some phonology
- clean up some morphotactics
- bring in line with apertium-kaz/etc.
Clean up apertium-kir, bring coverage to >90%
- improve morphotactics
- bring in line with apertium-kaz/etc.
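
One way to track the >90% coverage targets above is to count unanalysed tokens in analyser output. This is a minimal sketch, assuming the output is in the Apertium stream format (`^surface/analysis$`) with unknown tokens marked by a leading `*`; the `coverage` helper and the sample analyses are invented for illustration, and the regex ignores backslash-escaped characters.

```python
import re
from collections import Counter

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \/ are not handled)
LU = re.compile(r"\^([^/$^]*)/([^$]*)\$")

def coverage(analysed: str):
    """Return (share of tokens with an analysis, Counter of unknown forms)."""
    total = 0
    unknown = Counter()
    for surface, analyses in LU.findall(analysed):
        total += 1
        # Unanalysed tokens come out as ^form/*form$
        if analyses.startswith("*"):
            unknown[surface.lower()] += 1
    known = total - sum(unknown.values())
    return (known / total if total else 0.0), unknown

# Invented sample; the tags are illustrative, not taken from apertium-uzb.
sample = "^Men/men<prn><pers><p1><sg><nom>$ ^kitobni/kitob<n><acc>$ ^foo/*foo$ ^./.<sent>$"
cov, missing = coverage(sample)
print(f"coverage: {cov:.1%}")
print(missing.most_common(10))
```

The `Counter` of unknown forms doubles as a frequency list of candidates to add to the lexicon.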

hard forms:

Keep lists of difficult-to-classify forms and take a shot at them periodically with a concordancer

trimmed transducers:

especially in need of attention:

model basic transfer grammar for each language (with remapping rules to the other languages)
- Get Turkish relative "ki" to Kyrgyz relative clauses working
- Get transfer working for both directions

Get better corpus for Uzbek
Run transducers against corpora and add the most frequently missing stems and any missing morphology
Keep regression test corpus
Run frequent WER tests and tweak grammars/dixes so that the texts consistently have <10% WER
Try as much as possible to work on everything in parallel, but have goals defined in series
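
The <10% WER goal above could be checked with a small script like the following. This is a minimal sketch, assuming WER means word-level edit distance (substitutions, insertions, deletions) divided by the number of words in the reference translation; the `wer` function name and the whitespace tokenisation are assumptions, not part of any existing Apertium tool.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical post-edited reference vs. raw MT output
print(wer("a b c d", "a x c"))
```

Running this over each text in the regression corpus after every grammar or dix change would show whether the texts stay under the threshold.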

@@ Line 12: / Line 12: @@
 === CG, lrx ===
+* We should start keeping track of the number of lrx rules.
+* We could quantify CG progress with per-token ambiguity measures across corpora?
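
A per-token ambiguity measure could be as simple as the mean number of analyses per known token, compared before and after running CG. This is a minimal sketch, assuming Apertium stream format input with readings separated by `/`; the `mean_ambiguity` helper and the sample readings are invented for illustration.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \/ are not handled)
LU = re.compile(r"\^([^/$^]*)/([^$]*)\$")

def mean_ambiguity(analysed: str) -> float:
    """Mean number of readings per token, skipping unknown (*) tokens."""
    counts = []
    for _surface, analyses in LU.findall(analysed):
        if analyses.startswith("*"):
            continue  # unknown tokens have no readings to disambiguate
        counts.append(len(analyses.split("/")))
    return sum(counts) / len(counts) if counts else 0.0

# Invented sample: the first token has two readings before disambiguation.
before = "^at/at<n><nom>/at<v><tv><imp>$ ^keldi/kel<v><iv><ifi><p3><sg>$"
after = "^at/at<n><nom>$ ^keldi/kel<v><iv><ifi><p3><sg>$"
print(mean_ambiguity(before), mean_ambiguity(after))
```

A value of 1.0 after CG would mean every known token is fully disambiguated; it says nothing about whether the surviving reading is the correct one.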
 == To-do list ==