Apertium-kaz

From Apertium
Revision as of 16:07, 22 February 2013 by Firespeaker (talk | contribs)
Jump to navigation Jump to search

Kazmorph is a morphological analyser/generator for Kazakh, currently under development. It is intended to be compatible with transducers for other Turkic languages so that they can be translated between. It's used in the following language pairs:

Installation

kazmorph is currently located in incubator/apertium-kaz.

Dependency tree

  • hfst (svn ≥r1916)
    • foma
      • flex
  • apertium
    • lttoolbox

Current State

{{#set_param_default | corpus1 | None }} {{#set_param_default | corpus2 | None }} {{#set_param_default | corpus3 | None }} {{#set_param_default | corpus4 | None }} {{#set_param_default | corpus5 | None }} {{#set_param_default | corpus6 | None }} {{#set_param_default | corpus7 | None }} {{#set_param_default | corpus8 | None }} {{#set_param_default | corpus9 | None }} {{#set_param_default | corpus10 | None }}

  • Number of stems: 36,595 {{#ifneq | | | () }}
  • Disambiguation rules: 150
  • Coverage: ~94.5%

{{#ifneq | Әуезов | None |

{{#ifneq | Әуезов corpus | | | }}

}}

{{#ifneq | bible | None |

{{#ifneq | | | | }}

}}

{{#ifneq | azattyq2010 | None |

{{#ifneq | RFERL_corpora | | | }}

}}

{{#ifneq | wp2011 | None |

{{#ifneq | | | | }}

}}

{{#ifneq | quran | None |

{{#ifneq | | | | }}

}}

{{#ifneq | {{{corpus6}}} | None |

{{#ifneq | | | | }}

}}

{{#ifneq | {{{corpus7}}} | None |

{{#ifneq | | | | }}

}}

{{#ifneq | {{{corpus8}}} | None |

{{#ifneq | | | | }}

}}

{{#ifneq | {{{corpus9}}} | None |

{{#ifneq | | | | }}

}}

{{#ifneq | {{{corpus10}}} | None |

{{#ifneq | | | | }}

}}

corpuswordscoverage
<nowinter>Әуезов</nowinter>Әуезов155K ~92.89%
<nowinter>[[|bible]]</nowinter>bible577K ~95.29%
<nowinter>azattyq2010</nowinter>azattyq20103.2M ~95.07%
<nowinter>[[|wp2011]]</nowinter>wp2011850K ~90.72%
<nowinter>[[|quran]]</nowinter>quran107K ~96.71%
<nowinter>[[|{{{corpus6}}}]]</nowinter>{{{corpus6}}} ~%
<nowinter>[[|{{{corpus7}}}]]</nowinter>{{{corpus7}}} ~%
<nowinter>[[|{{{corpus8}}}]]</nowinter>{{{corpus8}}} ~%
<nowinter>[[|{{{corpus9}}}]]</nowinter>{{{corpus9}}} ~%
<nowinter>[[|{{{corpus10}}}]]</nowinter>{{{corpus10}}} ~%

To-do

Improve coverage

  • Causitives
  • collective numbers
  • fix demonstratives
  • vowel harmony of single-syllable words with у and и

Future

  • run tests on morphophonology