Apertium-kaz

Installation

Apertium-kaz is currently located in incubator/apertium-kaz.

Dependency tree

hfst (svn ≥r1916)
- foma
  - flex
apertium
- lttoolbox
VISL-CG

Current State

Number of stems: 36,595
Disambiguation rules: 150
Coverage: ~94.5%

corpus	words	coverage
Әуезов	155K	~92.89%
bible	577K	~95.29%
azattyq2010	3.2M	~95.07%
wp2011	850K	~90.72%
quran	107K	~96.71%
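
This page does not say how the coverage figures were obtained. As a rough illustration, the sketch below computes "naive coverage", meaning the share of tokens that receive at least one analysis, from analyser output in the Apertium stream format, where an unknown word is marked with a leading `*` (for example `^foo/*foo$`). The function name `naive_coverage` and the sample tags are assumptions made for this example, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters (\^, \/, \$) are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Return the fraction of lexical units that have at least one analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        analyses = match.group(2)
        # An unknown word is output as ^word/*word$, so its only
        # "analysis" starts with an asterisk.
        if not analyses.startswith("*"):
            known += 1
    return known / total if total else 0.0


if __name__ == "__main__":
    # Illustrative input; the tags are placeholders, not actual apertium-kaz output.
    sample = "^бұл/бұл<det><dem>$ ^кітап/кітап<n><nom>$ ^қзқ/*қзқ$"
    print(f"{naive_coverage(sample):.2%}")
```

In practice the input would be the output of the morphological analyser run over a whole corpus; the figures in the table above may have been computed differently.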

Developers

We have several language pairs involving Kazakh, and every pair contains a lexc, twol and rlx file for this language. We don't edit these files directly, though. Instead, we edit the kaz.lexc, kaz.twol and kaz.rlx files located in apertium-kaz, and then import them into the language pair directories using a script. The script simply copies the twol and rlx files, since they don't need to be tweaked for a particular language pair, but it "trims" the lexc file, leaving only those stems that are also found in the bilingual dictionary of the pair the files are being imported into.

For further details on how this works and a step-by-step guide (taking the Kazakh-Tatar pair as an example), see Kazakh_and_Tatar#Development_workflow.
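
The trimming idea described above can be sketched as follows. This is not the actual import script, only a minimal illustration under several assumptions: stems live in named lexicons that the caller lists explicitly (`stem_lexicons`, hypothetical), each stem entry sits on its own line in the form `lemma:surface Continuation ;`, and the Kazakh side of the bilingual dictionary is the `<l>` element. The function names `bidix_lemmas` and `trim_lexc` are invented for this example, and multiword lemmas are not handled.

```python
import xml.etree.ElementTree as ET


def bidix_lemmas(bidix_xml):
    """Collect the lemmas on the left side (<l>) of a bilingual dictionary."""
    root = ET.fromstring(bidix_xml)
    lemmas = set()
    for left in root.iter("l"):
        # The text before the first <s n="..."/> tag is taken as the lemma.
        lemma = (left.text or "").strip()
        if lemma:
            lemmas.add(lemma)
    return lemmas


def trim_lexc(lexc_text, keep, stem_lexicons):
    """Drop stem entries whose lemma is not in `keep`; leave other lines as-is."""
    kept_lines = []
    current_lexicon = None
    for line in lexc_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("LEXICON "):
            current_lexicon = stripped.split()[1]
            kept_lines.append(line)
            continue
        is_entry = bool(stripped) and not stripped.startswith("!")
        if current_lexicon in stem_lexicons and is_entry:
            # Lemma is whatever precedes the first ':' (or the first token).
            parts = stripped.split(":", 1)[0].split()
            lemma = parts[0] if parts else ""
            if lemma not in keep:
                continue  # stem is absent from the bilingual dictionary
        kept_lines.append(line)
    return "\n".join(kept_lines)
```

Here the twol and rlx files need no such processing, which matches the description above: only the lexc file depends on which stems the pair's bilingual dictionary contains.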
