Difference between revisions of "User:Firespeaker/GSoC2014"

Revision as of 05:25, 13 March 2014

vanilla transducers:

Increase apertium-uzb coverage to >90%
- expand morphology
- expand lexicon
Clean up apertium-tur, bring coverage to >90%
- fix some phonology
- clean up some morphotactics
- bring in line with apertium-kaz/etc.
Clean up apertium-kir, bring coverage to >90%
- improve morphotactics
- bring in line with apertium-kaz/etc.
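
The coverage targets above can be checked with a small script. The following is a minimal sketch, not the project's own tooling: it assumes the analyser output is in Apertium stream format, where each token appears as `^surface/analysis$` and an unknown word's analysis begins with `*`. The function name `coverage_report` and the sample tags are hypothetical, and escaped characters in the stream are not handled.

```python
import re
from collections import Counter

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped '/', '^' or '$' inside the unit)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage_report(analysed_text, top_n=10):
    """Return (coverage, most frequent unknown surface forms).

    coverage is the share of tokens that received at least one analysis;
    unknown tokens are those whose analysis starts with '*'.
    """
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        if analyses.startswith("*"):
            unknown[surface.lower()] += 1
    known = total - sum(unknown.values())
    coverage = known / total if total else 0.0
    return coverage, unknown.most_common(top_n)
```

The list of most frequent unknown forms gives the order in which to add missing stems; a result of 0.9 or higher corresponds to the >90% goal.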

hard forms:

Keep lists of difficult-to-classify forms and take a shot at them periodically with a concordancer

trimmed transducers:

especially in need of attention:

Model a basic transfer4 grammar for each language (with remapping rules to the other languages)
- Get Turkish relative "ki" to Kyrgyz relative clauses working
- Get transfer working for both directions

Make an effort to get a clean testvoc for kaz-kir (both directions) and tur-kir (mainly tur→kir)
See if it'll be possible to get a clean testvoc for tur-uzb (take a stab at it a few times)

Get a better corpus for Uzbek
Run transducers against corpora and add the most frequently missing stems and any missing morphology
Keep a regression test corpus
Run frequent WER tests and tweak grammars/dixes so that the texts consistently have <10% WER
Try as much as possible to work on everything in parallel, but have goals defined in series
Document tur-uzb better on the wiki
Run testvoc regularly on various categories for various translation directions
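
For the WER goal above, word error rate is the word-level edit distance between a machine translation and a reference translation, divided by the number of words in the reference. The following is a minimal sketch under simple assumptions (whitespace tokenisation, case-sensitive comparison); the function name `word_error_rate` is hypothetical and this is not the project's evaluation tool.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)
```

A text meets the goal when the returned value is below 0.10, i.e. fewer than one word in ten of the reference needs to be changed.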

@@ Line 1: / Line 1: @@
 {{TOCD}}
 '''Turkic pairs from nursery to release quality'''
-* [[User:Firespeaker/GSoC2014/Workplan|Workplan]]
 * [[User:Firespeaker/GSoC2014/Application draft|Application draft]]
+* [[User:Firespeaker/GSoC2014/Workplan|Workplan]]
+* [[User:Firespeaker/GSoC2014/Progress|Progress]]
 == Current status ==
 === Bidixes ===