Difference between revisions of "Apertium-kaz"
Revision as of 01:35, 9 January 2013
Kazmorph is a morphological analyser/generator for Kazakh, currently under development. It is intended to be compatible with the transducers for other Turkic languages, so that it can be used for translation between Kazakh and those languages.
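As an analyser, Kazmorph maps a surface form to one or more lemma-plus-tags readings; as a generator, it maps readings back to surface forms. Assuming its analyses are emitted in the Apertium stream format, roughly `^surface/lemma<tag>.../lemma<tag>...$` with unknown words written as `^word/*word$`, the minimal sketch below reads such output into Python tuples. The tag names in the example are hypothetical, not Kazmorph's actual tagset, and escaped characters (`\^`, `\/`, `\$`) are not handled.

```python
import re
from typing import List, NamedTuple, Tuple


class Reading(NamedTuple):
    lemma: str
    tags: List[str]


LU_RE = re.compile(r"\^(.*?)\$")   # one lexical unit: ^...$
TAG_RE = re.compile(r"<([^>]+)>")  # one tag: <...>


def parse_lu(body: str) -> Tuple[str, List[Reading]]:
    """Split 'surface/lemma<t1><t2>/lemma<t1>' into the surface form and its readings.

    Unknown words (analysis starting with '*') yield an empty reading list.
    """
    surface, *analyses = body.split("/")
    readings = []
    for analysis in analyses:
        if analysis.startswith("*"):
            continue
        lemma = analysis.split("<", 1)[0]
        readings.append(Reading(lemma, TAG_RE.findall(analysis)))
    return surface, readings


def parse_stream(text: str) -> List[Tuple[str, List[Reading]]]:
    """Parse every ^...$ unit in a line of analyser output."""
    return [parse_lu(body) for body in LU_RE.findall(text)]


# Hypothetical output: one ambiguous word (two readings) and one unknown word.
for surface, readings in parse_stream("^алма/алма<n><nom>/алма<n><attr>$ ^xyz/*xyz$"):
    print(surface, [(r.lemma, r.tags) for r in readings])
```

Keeping every reading of an ambiguous word, rather than the first one, leaves the choice to the disambiguation step.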
Installation
Kazmorph is currently located in incubator/apertium-kaz.
Dependency tree
- hfst (svn ≥r1916)
- foma
- flex
- foma
- apertium
- lttoolbox
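A dependency tree like the one above implies an install order: each tool must be built before anything that needs it, which is a topological sort. The sketch below uses Python's standard `graphlib` module (3.9+). The edges in `DEPENDS_ON` are assumed for illustration and should be checked against each package's own build instructions; `apertium-kaz` is included as a hypothetical top-level node.

```python
from graphlib import TopologicalSorter

# ASSUMED dependency edges: package -> packages that must be installed first.
# Verify against each package's own build instructions before relying on them.
DEPENDS_ON = {
    "flex": set(),
    "foma": {"flex"},
    "hfst": {"foma", "flex"},
    "lttoolbox": set(),
    "apertium": {"lttoolbox"},
    "apertium-kaz": {"hfst", "apertium"},  # hypothetical top-level node
}


def build_order(deps):
    """Return one valid install order: every package after its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


print(" -> ".join(build_order(DEPENDS_ON)))
```

Several orders can be valid; `static_order` returns just one of them.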
Current State
- Number of stems: 36,595
- Disambiguation rules: 150
- Coverage: ~94.5%
corpus | words | coverage |
---|---|---|
Әуезов | 155K | ~92.89% |
bible | 577K | ~95.29% |
azattyq2010 | 3.2M | ~95.07% |
wp2011 | 850K | ~90.72% |
quran | 107K | ~96.71% |
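Per-corpus coverage figures like these are usually naive coverage in Apertium: the share of tokens that receive at least one analysis, whether or not that analysis is correct. The sketch below computes it from analyser output, assuming the Apertium stream format where an unknown token looks like `^word/*word$`. The function name is hypothetical, and this is not necessarily the script that produced the table; here punctuation counts as a token, and scripts differ on that point.

```python
import re

# Content of each lexical unit ^...$ in the Apertium stream format.
LU_RE = re.compile(r"\^(.*?)\$")


def naive_coverage(stream: str) -> float:
    """Percentage of tokens that received at least one analysis.

    A token counts as unknown when its first analysis starts with '*',
    as in ^xyz/*xyz$. Escaped '^', '/' and '$' are not handled.
    """
    units = LU_RE.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        parts = unit.split("/", 1)
        if len(parts) == 2 and not parts[1].startswith("*"):
            known += 1
    return 100.0 * known / len(units)


sample = "^алма/алма<n><nom>$ ^xyz/*xyz$ ^үй/үй<n><nom>$ ^./.<sent>$"
print(f"{naive_coverage(sample):.2f}%")
```

Three of the four sample tokens are known, so the sample scores 75%.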
To-do
Improve coverage
- causatives
- collective numbers
- fix demonstratives
- vowel harmony of single-syllable words containing у or и
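The vowel-harmony item arises because Kazakh suffixes have back and front variants chosen by the stem's last vowel (for example алмалар but үйлер), while the letters у and и are spelled the same in back and front words. A stem whose only vowel is у or и therefore gives no spelling clue. The sketch below, assuming only the native vowel letters matter, picks the class from the last unambiguous vowel and otherwise falls back to a per-word lexicon; the `harmony` function and the lexicon entry for су are illustrative, not taken from Kazmorph.

```python
from typing import Dict, Optional

BACK = set("аоұы")     # back vowel letters
FRONT = set("әөүіе")   # front vowel letters
# у and и are deliberately in neither set: they are spelled the same in
# back and front words, so they carry no harmony information on their own.


def harmony(word: str, lexicon: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return 'back' or 'front' for suffix selection, or None if undecidable.

    The last back/front vowel letter decides. If the word has none (e.g. a
    single-syllable stem whose only vowel is у or и), look the word up in
    `lexicon`, a hypothetical per-stem table of harmony classes.
    """
    for ch in reversed(word.lower()):
        if ch in BACK:
            return "back"
        if ch in FRONT:
            return "front"
    return (lexicon or {}).get(word.lower())


print(harmony("алма"), harmony("үй"), harmony("кітап"))
print(harmony("су"), harmony("су", {"су": "back"}))
```

Storing the class per stem, rather than guessing from spelling, is one way to make such words deterministic.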
Future
- run tests on morphophonology
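Morphophonology tests can be written as pairs of a lexical form and its expected surface form, checked against the generator. The harness below is a sketch: `run_tests` and `naive_generate` are hypothetical names, the tags are illustrative, and `naive_generate` stands in for the real generator. It applies vowel harmony but ignores consonant alternations, so the кітап case is expected to fail.

```python
from typing import Callable, Dict, List, Tuple


def run_tests(generate: Callable[[str], str],
              cases: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Generate each lexical form; return (lexical, expected, got) for every mismatch."""
    failures = []
    for lexical, expected in cases.items():
        got = generate(lexical)
        if got != expected:
            failures.append((lexical, expected, got))
    return failures


# Lexical form -> expected surface form (tag names are hypothetical).
CASES = {
    "алма<n><pl>": "алмалар",
    "үй<n><pl>": "үйлер",
    "кітап<n><pl>": "кітаптар",
}

BACK_VOWELS = "аоұы"
ALL_VOWELS = "аоұыәөүіе"


def naive_generate(lexical: str) -> str:
    """Hypothetical stand-in generator: picks -лар/-лер by vowel harmony only,
    ignoring the consonant alternations (-дар, -тар, ...) a real one needs."""
    stem = lexical.split("<", 1)[0]
    last_vowel = next((ch for ch in reversed(stem) if ch in ALL_VOWELS), "а")
    return stem + ("лар" if last_vowel in BACK_VOWELS else "лер")


for lexical, expected, got in run_tests(naive_generate, CASES):
    print(f"FAIL {lexical}: expected {expected}, got {got}")
```

In a real setup, `generate` would wrap a call to the compiled Kazmorph generator instead of the stand-in.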