Difference between revisions of "Turkic languages"

Revision as of 19:19, 10 December 2011

Status

Once a transducer reaches roughly 80% coverage across a range of corpora, we consider it "working". Above 90%, it can be considered "production".
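As a rough illustration of the thresholds above, coverage can be computed as the share of tokens that receive at least one analysis. The sketch below assumes the analyser output is in the Apertium stream format, where a token appears as `^surface/analysis$` and an unknown token as `^surface/*surface$`. The function names `coverage` and `status` are hypothetical. Using "development" as the label for anything under 80% is an assumption, taken from the state column of the table below and not from a stated rule. Escaped slashes and dollar signs inside tokens are ignored for simplicity.

```python
import re

# Assumption: each token is written as ^surface/analysis1/analysis2$ and an
# unknown token is written as ^surface/*surface$ (Apertium stream format).
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        # The first field is the surface form; the rest are analyses.
        analyses = unit.split("/")[1:]
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / len(units)


def status(cov: float) -> str:
    """Map a coverage value in [0, 1] to a state label (thresholds from the text)."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    # Assumption: below 80% corresponds to the "development" state.
    return "development"
```

For example, `status(0.88)` returns `"working"`, which matches the state listed for a transducer at ~88% coverage.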

Transducers

name	Language	ISO 639-1	ISO 639-3	formalism	state	stems	corpus	words	%cov	location	primary authors
trmorph	Turkish	`tr`	`tur`	SFST	working		SETimes	4.1M	~88%		Çağri
kymorph	Kyrgyz	`ky`	`kir`	HFST (lexc+twol)	working	8,555	azattyk 2010	3.4M	~87%	trunk/apertium-tr-ky	Jonathan, Mirlan, Fran
turmorph	Turkish	`tr`	`tur`	HFST (lexc+twol)	development	18,227	SETimes	4.1M	~72%		Gianluca
kazmorph	Kazakh	`kk`	`kaz`	HFST (lexc+twol)	development	2,018	Әуезов	147.5K	~75.5%	incubator/apertium-ky-kk	Nathan, Jonathan, Fran
kazmorph	Kazakh	`kk`	`kaz`	HFST (lexc+twol)	development	2,018	wp 2011-11	0.84M	~59%	incubator/apertium-ky-kk	Nathan, Jonathan, Fran
	Chuvash	`cv`	`chv`	HFST (lexc+twol)	development	88		88.8K	~30%	incubator/apertium-cv-ru	Hèctor
	Tatar	`tt`	`tat`						-
azmorph	Azerbaijani	`az`	`aze`	SFST	working?				-	trunk/apertium-tr-az	Gianluca

Turkic-Turkic pairs

Italic text denotes language pairs under development or in the incubator. Regular text denotes a functioning language pair in trunk, and bold text denotes a stable, well-working language pair.

	tr	az	tk	uz	ky	kk	tt	cv	ba	ug
tr	—	tr-az			tr-ky			tr-cv
az	az-tr	—
tk			—
uz				—
ky	ky-tr				—		ky-kk
kk						—	kk-tt
tt						tt-kk	—		tt-ba
cv	cv-tr							—
ba									—
ug										—

Pairs with non-Turkic languages

	tr	ky	kk	cv
en	tr-en	ky-en
fr
es
it
ru				cv-ru
mn			mn-kk

Tagset

A rough guide to the tagsets used in various Turkic language transducers, with the aim of tagging equivalent phenomena identically. In the following table, ^A marks an Apertium tag and ^T marks a TRmorph tag.

Phenomenon	Morphology	Description	Tag(s)	Language(s)
Case
Ablative case	-DAn	Case indicating movement away	`<abl>`	Pan-Turkic
Tense, aspect, mood
Imperative	-ø	Mood for giving orders	`<imp>`^A, `<t_imp>`^T	Pan-Turkic
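One way to keep equivalent phenomena tagged the same across transducers is a small tag-mapping step. The sketch below is illustrative only: the single entry in the mapping is the imperative pair from the table above, while the function name `normalise_tags` and the shape of the analysis strings are assumptions, not the actual output format of either transducer.

```python
import re

# Only the imperative pair comes from the table; further entries would have
# to be filled in from the full tagset.
TRMORPH_TO_APERTIUM = {
    "t_imp": "imp",
}

# Matches a single tag such as <imp> and captures the name inside the brackets.
TAG = re.compile(r"<([^<>]+)>")


def normalise_tags(analysis: str, mapping=TRMORPH_TO_APERTIUM) -> str:
    """Rewrite each <tag> using the mapping, leaving unmapped tags unchanged."""
    return TAG.sub(
        lambda m: "<" + mapping.get(m.group(1), m.group(1)) + ">",
        analysis,
    )
```

With this mapping, a hypothetical analysis such as `git<v><t_imp>` becomes `git<v><imp>`, and tags that are already shared, such as `<abl>`, pass through unchanged.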

@@ Line 41: / Line 41: @@
 |rowspan=2| HFST (lexc+twol)
 |rowspan=2| development
-|rowspan=2| 2,007
+|rowspan=2| 2,018
 |[[Әуезов corpus|Әуезов]]
 |147.5K
-| ~74.5%
+| ~75.5%
 |rowspan=2| incubator/apertium-[[ky-kk]]
 |rowspan=2| [[User:nathan0n5ire|Nathan]], [[User:Firespeaker|Jonathan]], [[User:Francis Tyers|Fran]]
