Apertium-kaz

Installation

Apertium-kaz is currently located in languages/apertium-kaz.

Dependency tree

hfst (svn ≥r1916)
apertium
- lttoolbox
VISL-CG3
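Before building, a small shell function can check that the tools from the dependency tree are on your $PATH. This is a minimal sketch. The name check_deps is hypothetical. The program names you pass it are the usual executables of those packages, not a list taken from the apertium-kaz build files: hfst-lexc and hfst-twolc from hfst, lt-comp from lttoolbox, apertium, and vislcg3 from VISL CG-3.

```shell
# check_deps CMD...: print "missing: CMD" for every program that is not on
# $PATH and return 1 if anything is missing (0 otherwise).
# Hypothetical helper; the program names are supplied by the caller.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_deps hfst-lexc hfst-twolc lt-comp apertium vislcg3 prints one line for each program it cannot find. It returns non-zero if anything is missing.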

For spell checking

If you're compiling the apertium-kaz spell checker, you'll additionally need these dependencies:

hfst-ospell (./configure --enable-zhfst)
- see Installation; it can be installed from Tino's repositories
corevoikko/libvoikko/src/tools/voikkospell (./configure --enable-hfst)

You'll want to configure apertium-kaz with --enable-ospell, build it with make, and then copy the resulting kaz.zhfst to ~/.voikko/3/kk.zhfst.
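The copy step can be wrapped in a small shell function. This is a sketch under stated assumptions. install_kk_speller is a hypothetical name. It expects that ./configure --enable-ospell and make have already produced kaz.zhfst, and it only performs the copy to ~/.voikko/3/kk.zhfst described above.

```shell
# install_kk_speller [ZHFST]: copy the speller produced by an --enable-ospell
# build (kaz.zhfst by default) to ~/.voikko/3/kk.zhfst, the location
# described above for use with voikkospell -d kk.
# Hypothetical helper, not part of apertium-kaz.
install_kk_speller() {
  local src="${1:-kaz.zhfst}"
  if [ ! -f "$src" ]; then
    echo "install_kk_speller: $src not found; build apertium-kaz with --enable-ospell first" >&2
    return 1
  fi
  mkdir -p "$HOME/.voikko/3" && cp "$src" "$HOME/.voikko/3/kk.zhfst"
}
```

Run it from the apertium-kaz directory after make has finished.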

Then you can do this:

$ echo "қазақша билмеймін" | tr ' ' '\n' | voikkospell -d kk -s
C: қазақша
W: билмеймін
S: билеймін
S: білмеймін
S: билемеймін
S: бөлмеймін
S: билемейміз
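As the sample output suggests, voikkospell uses three prefixes. A word it accepts is marked C:, a word it rejects is marked W:, and each suggestion for the preceding rejected word is marked S:. The shell sketch below condenses such output into one line per rejected word. Its name, misspelled_with_suggestions, is hypothetical, and it assumes exactly that three-prefix format.

```shell
# misspelled_with_suggestions: read voikkospell -s output on stdin and print
# "word: suggestion1, suggestion2, ..." for every word marked W:.
# Assumes the C:/W:/S: line prefixes shown in the sample output.
misspelled_with_suggestions() {
  awk '
    # print the pending rejected word (if any) with its collected suggestions
    function flush() {
      if (word != "") print word ": " sugg
      word = ""; sugg = ""
    }
    /^C: / { flush(); next }                      # accepted word: nothing to report
    /^W: / { flush(); word = substr($0, 4); next } # start a new rejected word
    /^S: / { s = substr($0, 4); sugg = (sugg == "" ? s : sugg ", " s); next }
    END    { flush() }
  '
}
```

Appended to the pipeline above (... | voikkospell -d kk -s | misspelled_with_suggestions), the sample output would be condensed to the single line "билмеймін: билеймін, білмеймін, билемеймін, бөлмеймін, билемейміз".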

Current State

Number of stems: 36,595
Disambiguation rules: 150
Coverage: ~94.5%
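Coverage here means the share of tokens in a corpus that the analyser recognises. A common way to measure it with Apertium tools is to run a corpus through the morphological analyser and count the lexical units marked as unknown. This page does not say how its own figures were produced, so the function below is only a sketch of that counting step. coverage is a hypothetical name, and the function assumes analyser output in the Apertium stream format, where an unknown word appears as ^word/*word$.

```shell
# coverage: read analyser output in the Apertium stream format on stdin and
# print the percentage of lexical units (^...$) that got at least one
# analysis. Units containing "/*" are counted as unknown (an assumption
# about how unknown words are marked).
coverage() {
  grep -o '\^[^$]*\$' | awk '
    { total++ }                 # one lexical unit per line from grep -o
    /\/\*/ { unknown++ }        # ^word/*word$ = no analysis found
    END {
      if (total == 0) { print "no lexical units"; exit 1 }
      printf "%.2f%%\n", 100 * (total - unknown) / total
    }
  '
}
```

For example, piping the analysis of a corpus (produced with apertium-kaz's morphological analyser) into coverage prints a single figure such as 94.50%. The function does not handle escaped $ characters inside words.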

corpus	words	coverage
Әуезов	155K	~92.89%
bible	577K	~95.29%
azattyq2010	3.2M	~95.07%
wp2013	18.2M	~90.10%
quran	107K	~96.71%
udhr	1.5K	~96.86%

Developers

We have several language pairs involving Kazakh, and each of them contains its own lexc, twol and rlx file for Kazakh. We don't edit those files directly, though. Instead, we edit the kaz.lexc, kaz.twol and kaz.rlx files in apertium-kaz. We then import them into the language-pair directories with a script (update-morphs.bash in each language-pair directory). The script simply copies the twol and rlx files, since they don't need to be adapted to a particular language pair. The lexc file, however, is trimmed: only the stems that are also found in the bilingual dictionary of the receiving pair are kept.
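The trimming step can be illustrated with a simplified shell/awk sketch. It is not update-morphs.bash itself, and trim_lexc is a hypothetical name. The sketch makes several assumptions:

- the lemmas of the bilingual dictionary have already been extracted into a plain list, one per line;
- you know which LEXICONs hold stems;
- lexc escapes such as % and multiword stems can be ignored.

```shell
# trim_lexc WORDLIST LEXC STEM_LEXICONS
#   WORDLIST       one lemma per line (e.g. taken from the bilingual dictionary)
#   LEXC           the monolingual lexc file
#   STEM_LEXICONS  comma-separated names of the LEXICONs that hold stems
# Prints the trimmed lexc on stdout. Lines outside the stem lexicons are
# copied verbatim; inside them, an entry is kept only if its lemma (the text
# before the first colon or blank) occurs in WORDLIST.
# Hypothetical helper; assumes WORDLIST is non-empty.
trim_lexc() {
  awk -v stems="$3" '
    BEGIN { n = split(stems, s, ","); for (i = 1; i <= n; i++) stemlex[s[i]] = 1 }
    NR == FNR { keep[$0] = 1; next }            # first file: the lemma list
    /^LEXICON[ \t]/ { cur = $2; print; next }   # remember the current lexicon
    !(cur in stemlex) { print; next }           # morphotactics etc.: copy as-is
    /^[ \t]*!/ || /^[ \t]*$/ { print; next }    # comments and blank lines
    {
      entry = $0; sub(/^[ \t]+/, "", entry)
      lemma = entry; sub(/[: \t].*/, "", lemma)
      if (lemma in keep) print
    }
  ' "$1" "$2"
}
```

A call might look like trim_lexc lemmas.txt kaz.lexc Nouns,Verbs > trimmed.lexc. All file and lexicon names here are placeholders.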

For further details on how this works and a step-by-step guide (taking the Kazakh-Tatar pair as an example), see Kazakh_and_Tatar#Development_workflow.

A natural consequence of this approach is that any change made to apertium-kaz affects every Kazakh-to-X and X-to-Kazakh language pair, so a few rules have to be followed:

Do not change tags or the order of tags, and do not make any other change that is certain to break things in other pairs, without first discussing it on the apertium-turkic mailing list. If you are absolutely sure the change is necessary (e.g. it was discussed earlier and is now due), at least notify others via the list;
If you encounter a word in the lexc that seems to be miscategorized, or a multiword that is really a combination of two lexemes (like барлық жерде 'everywhere', literally 'in all places'), do not delete it! Mark it with Use/MT instead.
Write descriptive commit messages.
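Plain grep is enough for reviewing or filtering entries marked Use/MT. The sketch below assumes the mark is written as the text Use/MT in the entry's trailing comment, e.g. ! Use/MT at the end of the line. list_use_mt and strip_use_mt are hypothetical helper names, not part of apertium-kaz.

```shell
# list_use_mt LEXC: print the entries marked Use/MT with their line numbers,
# e.g. to review them. Assumes the mark appears literally as "Use/MT".
list_use_mt() { grep -n 'Use/MT' "$1"; }

# strip_use_mt LEXC: print the lexc file without the Use/MT entries, as a
# build that should not see MT-only entries might do.
strip_use_mt() { grep -v 'Use/MT' "$1"; }
```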

Kazakh - қазақ тілі
language transducer
Coverage:	~94.5%
Stems:	36,595
Vanilla stems:	27,433
Paradigms:
Location:	apertium-kaz (languages)
Families:	Turkic languages
Areas:	Languages of Central Asia, Languages of the former Soviet Union
Lang info	Kazakh
