Difference between revisions of "Apertium-kaz"

(changed to use Labeled Section Transclusion)
Line 15:

  == Current State ==
- * Number of stems: {{:Apertium-kaz/stems}}
+ * Number of stems: {{#lst:Apertium-kaz/stats|stems}}
- * Coverage: {{:Apertium-kaz/coverage/average}}
+ * Coverage: ~{{:Apertium-kaz/coverage/average}}%

  {| class="wikitable"
Line 23:
  |-
  |[[Әуезов corpus|Әуезов]]
- |align="right"|155K
+ |align="right"| {{#lst:Apertium-kaz/stats|Әуезов-words}}
- | ~{{:Apertium-kaz/coverage/Әуезов}}%
+ | ~{{#lst:Apertium-kaz/stats|Әуезов-coverage}}%
  |-
  | bible
- |align="right"| {{:bible corpora/kk/stems}}
+ |align="right"| {{#lst:Apertium-kaz/stats|bible-words}}
- | ~{{:Apertium-kaz/coverage/bible}}%
+ | ~{{#lst:Apertium-kaz/stats|bible-coverage}}%
  |-
  | [[RFERL corpora|azattyq]] 2010
- |align="right"| {{:RFERL corpus/kk/2010/stems}}
+ |align="right"| {{#lst:Apertium-kaz/stats|azattyq2010-words}}
- | ~{{:Apertium-kaz/coverage/rferl2010}}%
+ | ~{{#lst:Apertium-kaz/stats|azattyq2010-coverage}}%
  |-
  |wp 2011-11
- |align="right"| 0.84M
+ |align="right"| {{#lst:Apertium-kaz/stats|wp2011-words}}
- | ~{{:Apertium-kaz/coverage/wp}}%
+ | ~{{#lst:Apertium-kaz/stats|wp2011-coverage}}%
  |-
  | quran
- |align="right"| 107K
+ |align="right"| {{#lst:Apertium-kaz/stats|quran-words}}
- | ~{{:Apertium-kaz/coverage/quran}}%
+ | ~{{#lst:Apertium-kaz/stats|quran-coverage}}%
  |-
  |}

Revision as of 18:15, 6 January 2013

Kazmorph is a morphological analyser/generator for Kazakh, currently under development. It is intended to be compatible with transducers for other Turkic languages, so that text can be translated between them.

Installation

Kazmorph is currently located in incubator/apertium-kaz.

Dependency tree

  • hfst (svn ≥r1916)
    • foma
      • flex
  • apertium
    • lttoolbox
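
The dependency tree above implies an installation order. The following Python sketch derives one, assuming that a nested bullet means "the parent requires the child" and that apertium-kaz itself sits at the root; both readings are assumptions, and the dictionary is only an illustration, not part of the Apertium tooling.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each package maps to the packages that must be installed before it.
# Assumption: the nesting in the dependency tree means "requires".
DEPENDS_ON = {
    "apertium-kaz": {"hfst", "apertium"},
    "hfst": {"foma"},
    "foma": {"flex"},
    "apertium": {"lttoolbox"},
}


def build_order(depends_on):
    """Return the packages in an order where dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order(DEPENDS_ON)))
```

Any order it prints places flex before foma, foma before hfst, and lttoolbox before apertium; the relative order of the two independent branches may vary.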

Current State

  • Number of stems: 36,595
  • Coverage: ~94.1%

corpus         words   coverage
Әуезов         155K    ~92.89%
bible          577K    ~95.29%
azattyq 2010   3.2M    ~95.07%
wp 2011-11     850K    ~90.72%
quran          107K    ~96.71%
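
The overall coverage figure appears to be the unweighted mean of the per-corpus figures, which is what the template name coverage/average suggests. The short Python check below, with the values copied from the table above, is consistent with that reading; the function name is illustrative, and the assumption that the mean is not weighted by corpus size comes only from this arithmetic.

```python
# Per-corpus coverage percentages, copied from the table above.
COVERAGE = {
    "Әуезов": 92.89,
    "bible": 95.29,
    "azattyq 2010": 95.07,
    "wp 2011-11": 90.72,
    "quran": 96.71,
}


def average_coverage(coverage):
    """Unweighted mean: every corpus counts equally, whatever its size."""
    return sum(coverage.values()) / len(coverage)


print(round(average_coverage(COVERAGE), 1))  # 94.1
```

Because the mean is unweighted, the 3.2M-word azattyq corpus influences the headline number no more than the 107K-word quran corpus.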

To-do

Improve coverage

  • Causatives
  • collective numbers
  • fix demonstratives
  • vowel harmony of single-syllable words with у and и

Future

  • run tests on morphophonology