Apertium-uzb

Installation

Apertium-uzb is currently located in languages/apertium-uzb.

Developers

Dealing with Cyrillic and Latin

Plan A

There will be two separate sets of lexc and twol files (.lat and .cyr), each with its own continuation lexica and rules, though a single twol may be enough given how simple the rules are. There will also be a master .dix in Latin script, with comments in a standardised format giving the Cyrillic form (the other way around is also possible).

There will also be a simple script that checks the master .dix for entries lacking a Cyrillic comment in the standard format, generates the missing comments automatically, and updates the Cyrillic dix. Each automatically converted word is marked with "TOCHECK" (or something similar) in its comment. Someone then goes through everything marked "TOCHECK", fixes the conversion where needed, and removes the marker.
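The checking step could be sketched roughly as follows in Python. The comment format `<!-- cyr: ... -->` is an assumption made for illustration (the page does not define the standard format), and `convert` is a hypothetical callable that maps a Latin-script lemma to Cyrillic; the regular expressions assume one `<e>` entry per line.

```python
import re

# Matches a monodix entry and captures its lemma, e.g. <e lm="kitob">
ENTRY = re.compile(r'<e\b[^>]*\blm="([^"]*)"')
# Assumed "standard format" for the Cyrillic comment: <!-- cyr: WORD ... -->
CYR_COMMENT = re.compile(r'<!--\s*cyr:\s*\S+.*?-->')


def annotate(lines, convert):
    """Add a TOCHECK Cyrillic comment to every entry line that lacks one.

    `convert` is any function taking a Latin-script lemma and returning
    its (unchecked) Cyrillic form.
    """
    out = []
    for line in lines:
        match = ENTRY.search(line)
        if match and not CYR_COMMENT.search(line):
            cyr = convert(match.group(1))
            line = f"{line.rstrip()} <!-- cyr: {cyr} TOCHECK -->"
        out.append(line)
    return out
```

Lines that already carry a comment in the assumed format, and lines that are not entries, pass through unchanged, so the script can be rerun safely after a checker has removed "TOCHECK" from an entry.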

In this way we can trivially "convert" the dix to Cyrillic, and even convert the stems in lexc whenever it is copied or updated from -uzb.

Plan B

The Cyrillic lexc and dix will be generated from the Latin-script ones.

A script will take all the stems from dix and automatically convert them to Cyrillic, updating a three-column text-file database (Latin, Cyrillic, Checked). The Checked column will have two states: TOCHECK and GOOD. This will allow a checker to fix the output of the conversion script for corner cases (mostly Russian words).
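A minimal Python sketch of this conversion step is given below. The mapping table is a simplified, incomplete assumption (lowercase only, greedy digraph matching), and the tab-separated layout and all function names are hypothetical. Its mistakes on corner cases, for instance Russian loans where "ts" should become "ц", are exactly what the Checked column is meant to catch.

```python
# Assumed, incomplete Latin -> Cyrillic table for lowercase Uzbek.
DIGRAPHS = {
    "o'": "ў", "g'": "ғ", "sh": "ш", "ch": "ч",
    "ye": "е", "yo": "ё", "yu": "ю", "ya": "я",
}
SINGLE = {
    "a": "а", "b": "б", "d": "д", "e": "е", "f": "ф", "g": "г",
    "h": "ҳ", "i": "и", "j": "ж", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "q": "қ", "r": "р", "s": "с",
    "t": "т", "u": "у", "v": "в", "x": "х", "y": "й", "z": "з",
    "'": "ъ",
}


def to_cyrillic(word):
    """Naive transliteration; unknown characters pass through unchanged."""
    # Normalise the apostrophe variants used for o' and g'.
    for quote in ("\u02bb", "\u2018", "\u2019"):
        word = word.replace(quote, "'")
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        elif i == 0 and word[i] == "e":
            out.append("э")  # word-initial e is written э
            i += 1
        else:
            out.append(SINGLE.get(word[i], word[i]))
            i += 1
    return "".join(out)


def parse_db(text):
    """Read 'Latin<TAB>Cyrillic<TAB>Checked' rows into a dict."""
    db = {}
    for line in text.splitlines():
        if line.strip():
            latin, cyrillic, checked = line.split("\t")
            db[latin] = (cyrillic, checked)
    return db


def update_db(db, stems):
    """Add unseen stems as TOCHECK; rows already present are left alone."""
    for stem in stems:
        if stem not in db:
            db[stem] = (to_cyrillic(stem), "TOCHECK")
    return db


def dump_db(db):
    """Serialise the database back to tab-separated text, sorted by stem."""
    return "".join(f"{l}\t{c}\t{s}\n" for l, (c, s) in sorted(db.items()))
```

Because existing rows are never overwritten, a correction made by the checker survives later runs of the script.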

Another script will then generate Cyrillic versions of the dix and lexc from the Latin-script versions, using the above-mentioned database.
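For the dix side, the generation step might look something like the following Python sketch. It assumes one entry per line, that the database has been loaded into a dict mapping each Latin stem to a (Cyrillic, Checked) pair, and that both the `lm` attribute and the `<i>` contents are looked up as whole strings. The function name and the paradigm name in the usage example are hypothetical.

```python
import re

LM_ATTR = re.compile(r'(lm=")([^"]*)(")')
I_BODY = re.compile(r'(<i>)([^<]*)(</i>)')


def cyrillicise_dix_line(line, db):
    """Replace Latin stems in lm="..." and <i>...</i> with database forms.

    `db` maps a Latin stem to a (cyrillic, checked) tuple. Stems missing
    from the database are left in Latin script so they are easy to spot.
    """
    def swap(match):
        row = db.get(match.group(2))
        if row is None:
            return match.group(0)
        return match.group(1) + row[0] + match.group(3)

    line = LM_ATTR.sub(swap, line)
    return I_BODY.sub(swap, line)


db = {"kitob": ("китоб", "GOOD")}
latin_entry = '<e lm="kitob"><i>kitob</i><par n="N1__n"/></e>'
cyrillic_entry = cyrillicise_dix_line(latin_entry, db)
```

Whether rows still marked TOCHECK should be used or skipped at this stage is a design choice the plan leaves open; this sketch uses every row it finds.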

Uzbek - o'zbek tili, ўзбек тили
language transducer
Coverage:	~82.9%
Stems:	34,470
Vanilla stems:	34,465
Paradigms:	1
Location:
Families:	Turkic languages
Areas:	Languages of Central Asia, Languages of the former Soviet Union
Lang info:	Uzbek

