Difference between revisions of "Apertium-kaz"
Firespeaker (talk | contribs) m |
m (This TODO got obsolete) |
Line 19:
  == Current State ==
  {{LangStats | lang = kaz | corpus1 = Әуезов | corpus2 = bible | corpus3 = azattyq2010 | corpus4 = wp2011 | corpus5 = quran}}
- == To-do ==
- === Improve coverage ===
- * Causatives
- * collective numbers
- * fix demonstratives
- * vowel harmony of single-syllable words with у and и
- === Future ===
- * run tests on morphophonology
Revision as of 19:18, 9 August 2013
Apertium-kaz is a morphological analyser/generator for Kazakh, currently under development. It is intended to be compatible with transducers for other Turkic languages so that translation between them is possible. It is used in the following language pairs:
Installation
apertium-kaz is currently located in incubator/apertium-kaz.
Dependency tree
- hfst (svn ≥r1916)
- foma
- flex
- foma
- apertium
- lttoolbox
Current State
- Number of stems: 36,595
- Disambiguation rules: 150
- Coverage: ~94.5%
corpus | words | coverage |
---|---|---|
Әуезов | 155K | ~92.89% |
bible | 577K | ~95.29% |
azattyq2010 | 3.2M | ~95.07% |
wp2011 | 850K | ~90.72% |
quran | 107K | ~96.71% |
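As a rough illustration of what a coverage figure like those above measures, the sketch below computes naive coverage from analyser output in Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token has an analysis starting with `*`. The function name `naive_coverage` and the sample tags are illustrative assumptions, and this is not necessarily how the figures on this page were produced. It also ignores escaped characters in the stream.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Assumption: no escaped '/', '^' or '$' characters occur inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)((?:/[^/$^]*)*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that received a known analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # group(2) looks like "/analysis1/analysis2"; drop the empty first piece.
        analyses = match.group(2).split("/")[1:]
        # Unknown tokens are marked with a leading '*' on the analysis.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical analyser output; the tags are for illustration only.
    sample = "^алма/алма<n><nom>$ ^xyz/*xyz$ ^бар/бар<adj>/бар<v><iv>$"
    print(f"{naive_coverage(sample):.2%}")
```

Counting tokens rather than distinct word forms means frequent words weigh more, which is the usual sense of "coverage" over a running-text corpus.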