User:Firespeaker/GSoC2014

From Apertium


Revision as of 19:02, 23 January 2014 by Firespeaker.



Contents

1 Current status
2 To-do list
3 Plan of attack

Current status

Pairs

kaz-kir ( stems)
tur-kir ( stems)
tur-uzb ( stems)

Transducers

apertium-kaz - (~94.5% coverage, 36,595 stems) - production
apertium-kir - (~90.4% coverage, 14,424 stems) - working
apertium-tur - (~87.3% coverage, 17,221 stems) - working
apertium-uzb - (~82.9% coverage, 34,470 stems) - development
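Coverage figures like these are usually naive coverage: the share of corpus tokens that receive at least one analysis. Below is a minimal Python sketch of computing it. It assumes the corpus has already been run through the analyser with lt-proc, whose output wraps each token as ^surface/analysis1/analysis2$ and marks unanalysed forms as ^surface/*surface$. The function name `coverage` and the sample sentence are illustrative and are not part of any Apertium tool. Escape handling is simplified. The same pass also yields a frequency list of unknown forms, which is a natural starting point for adding the most frequent missing stems.

```python
import re
from collections import Counter

# One lexical unit ^...$ in lt-proc output, allowing backslash-escaped characters.
LU_RE = re.compile(r"\^((?:\\.|[^\\$])*)\$")
# Slashes that separate surface form and analyses (simplified: not preceded by a backslash).
SLASH_RE = re.compile(r"(?<!\\)/")


def coverage(stream: str) -> tuple[float, Counter]:
    """Return (naive coverage, frequency of unknown surface forms).

    A token counts as unknown when its first analysis starts with '*',
    which is how lt-proc marks forms the analyser cannot handle.
    """
    total = 0
    unknown: Counter = Counter()
    for match in LU_RE.finditer(stream):
        parts = SLASH_RE.split(match.group(1))
        surface, analyses = parts[0], parts[1:]
        total += 1
        if not analyses or analyses[0].startswith("*"):
            unknown[surface] += 1
    if total == 0:
        return 0.0, unknown
    return 1 - sum(unknown.values()) / total, unknown
```

For example, `cov, unknown = coverage(open("analysed.txt", encoding="utf-8").read())` (hypothetical file name) followed by `unknown.most_common(50)` lists the 50 most frequent unanalysed forms.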

CG, lrx

To-do list

morphological transducer work

vanilla transducers:

Increase apertium-uzb coverage to >90%
- expand morphology
- expand lexicon
Clean up apertium-tur, bring coverage to >90%
- fix some phonology
- clean up some morphotactics
- bring in line with apertium-kaz/etc.
Clean up apertium-kir, bring coverage to >90%
- improve morphotactics
- bring in line with apertium-kaz/etc.

hard forms:

Keep lists of difficult-to-classify forms and periodically work through them using a concordancer
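A concordancer lists every occurrence of a form together with its surrounding words. This can help decide how to classify a hard form based on how it is actually used. The keyword-in-context (KWIC) sketch below only illustrates the idea and is not the concordancer used for this work. The function `kwic` and its `width` parameter are hypothetical. Tokenisation is a simple regex that keeps apostrophe-joined Uzbek Latin forms such as o'qidim together.

```python
import re

# Words (optionally joined by an apostrophe, as in Uzbek Latin o'qidim) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")


def kwic(text: str, form: str, width: int = 3) -> list[str]:
    """Return one line per occurrence of `form`, with `width` tokens of context on each side."""
    tokens = TOKEN_RE.findall(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == form.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keyword column lines up.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines
```

Printing `"\n".join(kwic(corpus_text, "kitob"))` gives an aligned column of contexts that can be scanned quickly.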

trimmed transducers:

bring trimmed coverage close to 90% for each transducer

CG and lrx work

especially in need of attention:

Apertium-uzb
Apertium-kir

Grammar stuff

model a basic transfer4 grammar for each language (with remapping rules to the other languages)
- Get Turkish relative "ki" to Kyrgyz relative clauses working
- Get transfer working for both directions

Plan of attack

Get a better corpus for Uzbek
Run the transducers against corpora and add the most frequent missing stems and any missing morphology
Keep a regression test corpus
Run frequent WER tests and tweak grammars/dixes so that the texts consistently have <10% WER
Try as much as possible to work on everything in parallel, but have goals defined in series
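WER here is word error rate. It is the minimum number of word substitutions, deletions and insertions needed to turn the MT output into a reference translation, divided by the number of words in the reference. The sketch below computes it in Python with the standard dynamic-programming edit distance over whitespace-separated words. The function name `wer` is hypothetical. A real evaluation would also need consistent tokenisation and case handling.

```python
def wer(hypothesis: str, reference: str) -> float:
    """Word error rate: word-level edit distance divided by the number of reference words."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 output words and the first j reference words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # extra word in the output (insertion)
                curr[j - 1] + 1,         # reference word missing from the output (deletion)
                prev[j - 1] + (h != r),  # substitution, or free match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Checking the <10% goal then amounts to testing `wer(output, reference) < 0.10` for each text in the test set.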

Retrieved from "https://wiki.apertium.org/w/index.php?title=User:Firespeaker/GSoC2014&oldid=46464"