Difference between revisions of "Talk:Turkic languages"

Latest revision as of 23:52, 20 June 2016

Classification

attributive attr = things that act like adjectives
predicative pred
substantive subst = things that act like nouns
adverbial advl = things that act like adverbs (??)

Hierarchy

noun (default 'subst') + DECL-NOUN
noun->adj = n.attr + DECL-ADJ !! but with no comparison levels (no <comp>)

adj (default 'attr') + DECL-ADJ
adj->noun = adj.subst + DECL-NOUN

num (default 'attr') + DECL-NUM
num->noun = num.subst + DECL-NOUN

prn (default 'subst') + DECL-NOUN

det (default 'attr') + NO-DECL

v
v->noun = v.ger + DECL-NOUN

v->adj = v.glp + DECL-ADJ

v->adv = v.prc + DECL-ADV

Types of non-finite verbal forms:

Adverbial participle: <gnL> (e.g. <gnc> "Conditional adverbial participle")
Verbal adjectives: <gpL> (e.g. <gpi> "Imperfect verbal adjective")
Gerunds: <gerN> (e.g. <ger1> "Past/present gerund")
Participles: <prcN> (e.g. <prc1> "Realis participle")
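
The hierarchy above can be read as a lookup table: each part of speech has a default function and declension class, and each listed conversion picks a different declension class. A minimal sketch of that reading follows. The short key "n" for noun, the dictionary names and the `declension` helper are assumptions made for illustration; they are not part of any existing Apertium tool.

```python
# Default function and declension class per part of speech,
# transcribed from the hierarchy above. "v" has no default listed there.
DEFAULTS = {
    "n": ("subst", "DECL-NOUN"),
    "adj": ("attr", "DECL-ADJ"),
    "num": ("attr", "DECL-NUM"),
    "prn": ("subst", "DECL-NOUN"),
    "det": ("attr", "NO-DECL"),
}

# Conversions listed above: (source POS, function tag) -> declension class.
CONVERSIONS = {
    ("n", "attr"): "DECL-ADJ",
    ("adj", "subst"): "DECL-NOUN",
    ("num", "subst"): "DECL-NOUN",
    ("v", "ger"): "DECL-NOUN",
    ("v", "glp"): "DECL-ADJ",
    ("v", "prc"): "DECL-ADV",
}


def declension(pos, function=None):
    """Return the declension class for a stem of `pos` used in `function`.

    With no function given, the default function of the POS is assumed.
    """
    default = DEFAULTS.get(pos)
    if default is not None and function in (None, default[0]):
        return default[1]
    if (pos, function) in CONVERSIONS:
        return CONVERSIONS[(pos, function)]
    raise KeyError(f"no declension class listed for {pos!r} used as {function!r}")
```

For example, `declension("adj")` gives the adjective's own class, while `declension("adj", "subst")` gives the noun class used when an adjective acts as a noun.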

What about 'cop' and 'pred'

The copula is i- (p.79)
- -(y) (pres)
- -(y)DI (past)
- -(y)mIş (evid)
- -(y)sA (cond)

Notes on resolving morphological ambiguity

Arguments against just having a different tag:

 e.g. güzel<n>/güzel<adj>

We lose the tag denoting the principal function of the stem
We can't tell the CG to choose the principal function
We can't tell the difference between 'real' N/A ambiguity and "derivation" ambiguity

Arguments against just piling one tag on top of another:

 e.g. güzel<adj>/güzel<adj><n>

Having two POS in a word makes things confusing
Having two POS tags in a word makes it difficult to write CG rules

Arguments against having a "zero derivation":

 e.g. güzel<adj>/güzel<adj><D_n><n>

It's ugly and stupid
Having two POS tags in a word makes it difficult to write CG rules

Adjective

güzel            'beautiful'          güzel<adj>/güzel<adj><subst>/güzel<adj><advl>
güzelim          'my beauty'          güzel<adj><subst><px1sg>
güzel konuştu    'she spoke well'     güzel<adj>/güzel<adj><subst>/güzel<adj><advl>
güzel bir köpek  'a beautiful dog'    güzel<adj>/güzel<adj><subst>/güzel<adj><advl>

küçük            'small'              küçük<adj>/küçük<adj><subst>/küçük<adj><advl>
küçük kızlar     'little girls'       küçük<adj>/küçük<adj><subst>/küçük<adj><advl>
küçükler         'little one(s)'      küçük<adj><subst><pl>/küçük<n>+i<cop><pres><p3><pl>

kötü             'bad'                kötü<adj>/kötü<adj><subst>/kötü<adj><advl>
kötü araba       '(a) bad car'        kötü<adj>/kötü<adj><subst>/kötü<adj><advl>
kötü yüzmek      'to swim badly'      kötü<adj>/kötü<adj><subst>/kötü<adj><advl>
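
In the scheme shown in these examples, each reading keeps its principal POS as the first tag, and an optional function tag (<subst>, <advl>) marks non-default use. A rough sketch of how a script could split such a cell and recover the function of each reading follows. The names `read` and `readings` and the set of function tags are assumptions for illustration, and readings joined with "+" (such as the copula reading of küçükler) are deliberately not handled.

```python
import re

TAG = re.compile(r"<([^<>]+)>")

# Function tags from the Classification section (assumed to be the full set).
FUNCTION_TAGS = {"attr", "pred", "subst", "advl"}

# Default function per POS, taken from the Hierarchy section.
DEFAULT_FUNCTION = {
    "n": "subst",
    "adj": "attr",
    "num": "attr",
    "prn": "subst",
    "det": "attr",
}


def read(analysis):
    """Split one reading such as 'güzel<adj><subst>' into
    (lemma, principal POS, function, remaining tags)."""
    lemma = analysis.split("<", 1)[0]
    tags = TAG.findall(analysis)
    pos = tags[0]
    # An explicit function tag wins; otherwise fall back to the POS default.
    function = next(
        (t for t in tags[1:] if t in FUNCTION_TAGS),
        DEFAULT_FUNCTION.get(pos),
    )
    rest = [t for t in tags[1:] if t not in FUNCTION_TAGS]
    return lemma, pos, function, rest


def readings(cell):
    """Split a '/'-separated list of readings into parsed tuples."""
    return [read(a) for a in cell.split("/")]
```

Because the principal POS is always the first tag, a later disambiguation step can choose between function tags without losing track of what the stem basically is.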

Adverbs

şimdi            'now'                şimdi<adv>
şimdilerde       'nowadays'           şimdilerde<adv>

I think you want both of these as <adv>. Historically it's something like "şu emdi<n??>" and "şu emdilerde/emdi<n??><pl><loc>", but for our purposes this is irrelevant. —Firespeaker 19:40, 26 February 2012 (UTC)

More to the point, this isn't any sort of productive process we're seeing here; my point is that it's an isolated productive-looking form because of its unique history. —Firespeaker 19:41, 26 February 2012 (UTC)

Nouns

evdeki          'the one in the house'     ev<n><locattr>/ev<n><locsub>
evdekinde       'in the one in the house'  ev<n><locsub><loc>

Miscellaneous

$ echo Evlerimizdeymişler | hfst-proc tr-cv.automorf.hfst 
^Evlerimizdeymişler/Ev<n><pl><px1pl><loc>+i<cop><evid><p3><pl>$
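
The analyser output above packs a surface form and its readings into one `^...$` unit, with "/" between readings and "+" between the parts of a single reading. A minimal sketch of unpacking such a unit follows. The function name `parse_unit` is made up for illustration, and the sketch assumes the unit contains no escaped "/", "+" or "<" characters, which real stream output may contain.

```python
import re

TAG = re.compile(r"<([^<>]+)>")


def parse_unit(unit):
    """Parse '^surface/reading1/reading2$' into (surface, readings).

    Each reading is a list of (lemma, tags) pairs, one pair per
    '+'-joined part. Escaped characters are not handled.
    """
    if not (unit.startswith("^") and unit.endswith("$")):
        raise ValueError("expected a unit of the form ^...$")
    surface, *analyses = unit[1:-1].split("/")
    readings = []
    for analysis in analyses:
        parts = []
        for sub in analysis.split("+"):
            lemma = sub.split("<", 1)[0]
            parts.append((lemma, TAG.findall(sub)))
        readings.append(parts)
    return surface, readings
```

Applied to the unit above, this yields one reading with two parts: the noun Ev with its number, possessive and case tags, and the copula i with its tense and person tags.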

Compound tenses

Things to think about:

analysis length:
- ^келген эмеспи/кел<v><iv><neg><past><p3><pl>+бы<qst>/кел<v><iv><neg><past><p3><sg>+бы<qst>/кел<vaux><neg><past><p3><pl>+бы<qst>/кел<vaux><neg><past><p3><sg>+бы<qst>$, vs.
- ^келген/кел<v><iv><past>/кел<vaux><past>$ ^эмеспи/эмес<neg><p3><sg>+бы<qst>/эмес<neg><p3><pl>+бы<qst>$
tag/morpheme reordering should be done in transfer, e.g. the Turkish->Chuvash negative imperative or Chuvash->Turkish possessives.
what about different spacing: do you ever get more than one space, an nbsp, or formatting between e.g. келген and эмеспи, or anything else that isn't a single ASCII space?
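
On the spacing question, one option is to collapse every run of whitespace to a single ASCII space before looking up a two-word form as one unit. A minimal sketch of that step follows. The name `normalise_gaps` is hypothetical, and the sketch covers only whitespace characters: formatting between the words is not handled and would need separate treatment.

```python
import re

# Runs of whitespace, with the no-break space (U+00A0) and the narrow
# no-break space (U+202F) listed explicitly for clarity.
GAP = re.compile(r"[\s\u00a0\u202f]+")


def normalise_gaps(text):
    """Collapse each run of whitespace to one ASCII space and trim the ends."""
    return GAP.sub(" ", text).strip()
```

After this step, "келген эмеспи" written with a double space, a tab or an nbsp all compare equal to the single-space form.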

Resources

The following is a mail to the Corpora list. It might be a good idea to have a 'Resources' page or section for Turkic languages, as is done on the language pages.

Message: 6
Date: Wed, 25 Jun 2014 19:54:04 +0200
From: "Christian Chiarcos" <christian.chiarcos@web.de>
Subject: Re: [Corpora-List] Turkic dictionaries
To: "corpora@uib.no" <corpora@uib.no>

Dear all,

I would like to thank everyone who responded to my request and who helped
me in personal conversation, in particular, Emily Bender, Jost Gippert,
Max Ionov, Irina Nevskaya, Monika Rind-Pawlowski, Vit Suchomel, Francis
Tyers, and Mardan Wushouer. Please find a summary, with URLs, brief
description and licensing information below (no particular order):


(A) Dictionaries/Wordlists in machine-readable formats

(A.1) Gilles Sérasset's DBnary
http://kaiko.getalp.org/about-dbnary/
machine-readable (RDF) dictionaries generated from Wiktionary, incl.
Turkish
CC-BY-SA

(A.2) Mardan Wushouer's wordlists
http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Chinese_Uyghur_Bilingual_Dictionary_v1.zip
http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Chinese_Kazakh_Bilingual_Dictionary_v1.zip
http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Uyghur_Kazakh_Bilingual_Dictionary_v1.zip
plain word lists for Chinese-Uyghur, Chinese-Kazakh, Uyghur-Kazakh
CC-BY-NC

(A.3) Altaic etymological dictionary
http://starling.rinet.ru/cgi-bin/bdescr.cgi?root=config&morpho=0&basename=\data\alt\turcet
includes 26 Turkic languages, available online and as DBase dump
copyright restricted

(A.4) Freelang
http://freelang.net
English (and partially, French) word lists for 28 Turkic languages (mostly
small), proprietary list format
freeware (i.e., no modification)

(A.5) Apertium Turkic
http://wiki.apertium.org/wiki/Turkic_languages#Pairs
word lists for Turkic-Azeri, Kazakh-Tatar, 12 more pairs of Turkic
languages under development
open source (hosted at Sourceforge)

(A.6) RELISH
http://tla.mpi.nl/relish/
lexicons for Chalkan and Tuva, provided by the RELISH project
available online, XML
licensing to be clarified

(A.7) PanLex
http://panlex.org
huge collection of word lists in a unified representation (SQL, RDF)
incl. Azeri, Gagauz, Kazakh, Kirgiz, Turkish, Turkmen, Uzbek, etc.
different (mostly open) licenses depending on the original source

(A.8) Intercontinental Dictionary Series
http://lingweb.eva.mpg.de/ids/, http://datahub.io/de/dataset/ids
word lists of minimal core vocabulary
Azeri, Kumyk, Nogai, Terekeme (Azerbaijan dialect)
plain text or RDF
CC-BY-NC-ND


(B) Human-readable dictionaries/wordlists that can be easily converted
into machine-readable formats

(B.1) Wiktionary, various languages (see A.1)
http://wiktionary.org
incl. Azeri, Kazakh, Kirgiz, Tatar, Turkish, Turkmen
CC-BY-SA

(B.2) Chalkan dictionary
http://sprachen.sprachsignale.de/tschalkanisch/tschalkanisch.html
German
available for academic use, with attribution, non-commercial

(B.3) Shorica
http://shoriya.ngpi.rdtc.ru/
Shor dictionary and corpus
copyright to be clarified, currently offline (last accessed mid-May 2014)

(B.4) Karachay-Balkar dictionary
http://www.elbrusoid.org/dictionary/
Karachay-Balkar - Russian dictionary
copyright restricted

(B.5) Tatar dictionary
http://tatar.com.ru/dict/dict.php
Tatar-Russian dictionary
copyright restricted

(B.6) Khakassian dictionary
http://khakas.altaica.ru/dictionary/
Khakas - English and Khakas - Russian
copyright restricted



(C) other resources

(C.1) Altaica
http://altaica.narod.ru/e_v-turks.htm
link and resource collection, includes machine-readable and human-readable
dictionaries for 17 Turkic languages (not replicated above)

(C.2) Pre-Islamic Old Turkic Texts (VATEC)
http://vatec2.fkidg1.uni-frankfurt.de/
glossed corpus (XML) from which a German-Old Turkic word list can be
compiled
copyright restricted

(C.3) Glosbe
http://glosbe.com
online access to word lists and translation memories
Azeri, Karachay-Balkar, Kazakh, Tatar, Turkish, Turkmen, Uzbek, etc.
free online API (with severe capacity limits)


Certainly, this list is not exhaustive, so if you feel something important
is missing or incorrect, please let me know ;)

All the best,
Christian

Turkic-Turkish texts: http://ekitap.kulturturizm.gov.tr/TR,78479/turkiye-disindaki-turk-edebiyatlari-antolojisi.html


@@ Line 10: / Line 10: @@
 * noun (default 'subst') + DECL-NOUN
-*:noun->adj = n.attr + DECL-ADJ
+*:noun->adj = n.attr + DECL-ADJ           !! No <comp>arison levels though
 *adj (default 'attr') + DECL-ADJ
@@ Line 18: / Line 18: @@
 *:  num->noun = num.subst + DECL-NOUN
-*pron (default 'subst') + DECL-NOUN
+*prn (default 'subst') + DECL-NOUN
 *det (default 'attr') + NO-DECL
 *v
-*: v->ger + DECL-NOUN
+*: v->noun = v.ger + DECL-NOUN
-*: v->prc + DECL-ADJ
+*: v->adj = v.glp + DECL-ADJ
-*: v->prc.adv + DECL-ADV
+*: v->adv = v.prc + DECL-ADV
+Types of non-finite verbal forms:
+* Adverbial participle: {{tag|gnL}} (e.g. {{tag|gnc}} "Conditional adverbial participle")
+* Verbal adjectives: {{tag|gpL}} (e.g. {{tag|gpi}} "Imperfect verbal adjective")
+* Gerunds: {{tag|gerN}} (e.g. {{tag|ger1}} "Past/present gerund")
+* Participles: {{tag|prcN}} (e.g. {{tag|prc1}} "Realis participle")
 ====What about 'cop' and 'pred'====
@@ Line 42: / Line 49: @@
 * We lose the tag denoting the principle function of the stem
 * We can't tell the CG to choose the principal function
-* We can't tell the difference between `real' N/A ambiguity and
+* We can't tell the difference between `real' N/A ambiguity and "derivation" ambiguity
-   derivation ambiguity
 Arguments against just piling one tag ontop of another:
@@ Line 75: / Line 81: @@
  şimde            'now'                şimde<adv>
- şimdelerde       'nowadays'
+ şimdelerde       'nowadays'           şimdelerde<adv>
+: I think you want both of these as <adv>.  Historically it's something like "şu emdi<n??>" and "şu emdilerde/emdi<n??><pl><loc>", but for our purposes this is irrelevant.  —[[User:Firespeaker|Firespeaker]] 19:40, 26 February 2012 (UTC)
+:: More to the point, this isn't any sort of productive process we're seeing here; my point is that it's an isolated ''productive-looking'' form because of its unique history. —[[User:Firespeaker|Firespeaker]] 19:41, 26 February 2012 (UTC)
 ====Nouns====
@@ Line 89: / Line 98: @@
 ^Evlerimizdeymişler/Ev<n><pl><px1pl><loc>+i<cop><evid><p3><pl>$
 </pre>
+==Compound tenses==
+Things to think about:
+* analysis length:
+** <code>^келген эмеспи/кел<v><iv><neg><past><p3><pl>+бы<qst>/кел<v><iv><neg><past><p3><sg>+бы<qst>/кел<vaux><neg><past><p3><pl>+бы<qst>/кел<vaux><neg><past><p3><sg>+бы<qst>$</code>, vs.
+** <code>^келген/кел<v><iv><past>/кел<vaux><past>$ ^эмеспи/эмес<neg><p3><sg>+бы<qst>/эмес<neg><p3><pl>+бы<qst>$</code>
+* tag/morpheme reordering should be done by transfer, such as Turkish->Chuvash negative imperative, Chuvash->Turkish possessives.
+* what about different spacing, do you ever get >1 space, or nbsp or formatting between e.g. келген and эмеспи ? -- or anything that isn't a single ascii space ?
+== Resources ==
+Following is a mail to the Corpora list. Might be a good idea to have a 'Resources' page/section for Turkic languages, as it is done on language pages.
+<pre>
+Message: 6
+Date: Wed, 25 Jun 2014 19:54:04 +0200
+From: "Christian Chiarcos" <christian.chiarcos@web.de>
+Subject: Re: [Corpora-List] Turkic dictionaries
+To: "corpora@uib.no" <corpora@uib.no>
+Dear all,
+I would like to thank everyone who responded to my request and who helped
+me in personal conversation, in particular, Emily Bender, Jost Gippert,
+Max Ionov, Irina Nevskaya, Monika Rind-Pawlowski, Vit Suchomel, Francis
+Tyers, and Mardan Wushouer. Please find a summary, with URLs, brief
+description and licensing information below (no particular order):
+(A) Dictionaries/Wordlists in machine-readable formats
+(A.1) Gilles Sérasset's DBnary
+http://kaiko.getalp.org/about-dbnary/
+machine-readable (RDF) dictionaries generated from Wiktionary, incl.
+Turkish
+CC-BY-SA
+(A.2) Mardan Wushouer's wordlists
+http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Chinese_Uyghur_Bilingual_Dictionary_v1.zip
+http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Chinese_Kazakh_Bilingual_Dictionary_v1.zip
+http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Uyghur_Kazakh_Bilingual_Dictionary_v1.zip
+plain word lists for Chinese-Uyghur, Chinese-Kazakh, Uyghur-Kazakh
+CC-BY-NC
+(A.3) Altaic etymological dictionary
+http://starling.rinet.ru/cgi-bin/bdescr.cgi?root=config&morpho=0&basename=\data\alt\turcet
+includes 26 Turkic languages, available online and as DBase dump
+copyright restricted
+(A.4) Freelang
+http://freelang.net
+English (and partially, French) word lists for 28 Turkic languages (mostly
+small), proprietary list format
+freeware (i.e., no modification)
+(A.5) Apertium Turkic
+http://wiki.apertium.org/wiki/Turkic_languages#Pairs
+word lists for Turkic-Azeri, Kazakh-Tatar, 12 more pairs of Turkic
+languages under development
+open source (hosted at Sourceforge)
+(A.6) RELISH
+http://tla.mpi.nl/relish/
+lexicons for Chalkan and Tuva, provided by the RELISH project
+available online, XML
+licensing to be clarified
+(A.7) PanLex
+http://panlex.org
+huge collection of word lists in a unified representation (SQL, RDF)
+incl. Azeri, Gagauz, Kazakh, Kirgiz, Turkish, Turkmen, Uzbek, etc.
+different (mostly open) licenses depending on the original source
+(A.8) Intercontinental Dictionary Series
+http://lingweb.eva.mpg.de/ids/, http://datahub.io/de/dataset/ids
+word lists of minimal core vocabulary
+Azeri, Kumyk, Nogai, Terekeme (Azerbaijan dialect)
+plain text or RDF
+CC-BY-NC-ND
+(B) Human-readable dictionaries/wordlists that can be easily converted
+into machine-readable formats
+(B.1) Wiktionary, various languages (see A.1)
+http://wiktionary.org
+incl. Azeri, Kazakh, Kirgiz, Tatar, Turkish, Turkmen
+CC-BY-SA
+(B.2) Chalkan dictionary
+http://sprachen.sprachsignale.de/tschalkanisch/tschalkanisch.html
+German
+available for academic use, with attribution, non-commercial
+(B.3) Shorica
+http://shoriya.ngpi.rdtc.ru/
+Shor dictionary and corpus
+copyright to be clarified, currently offline (last accessed mid-May 2014)
+(B.4) Karachay-Balkar dictionary
+http://www.elbrusoid.org/dictionary/
+Karachay-Balkar - Russian dictionary
+copyright restricted
+(B.5) Tatar dictionary
+http://tatar.com.ru/dict/dict.php
+Tatar-Russian dictionary
+copyright restricted
+(B.6) Khakassian dictionary
+http://khakas.altaica.ru/dictionary/
+Khakas - English and Khakas - Russian
+copyright restricted
+(C) other resources
+(C.1) Altaica
+http://altaica.narod.ru/e_v-turks.htm
+link and resource collection, includes machine-readable and human-readable
+dictionaries for 17 Turkic languages (not replicated above)
+(C.2) Pre-Islamic Old Turkic Texts (VATEC)
+http://vatec2.fkidg1.uni-frankfurt.de/
+glossed corpus (XML) from which a German-Old Turkic word list can be
+compiled
+copyright restricted
+(C.3) Glosbe
+http://glosbe.com
+online access to word lists and translation memories
+Azeri, Karachay-Balkar, Kazakh, Tatar, Turkish, Turkmen, Uzbek, etc.
+free online API (with severe capacity limits)
+Certainly, this list is not exhaustive, so if you feel something important
+is missing or incorrect, please let me know ;)
+All the best,
+Christian
+</pre>
+[http://ekitap.kulturturizm.gov.tr/TR,78479/turkiye-disindaki-turk-edebiyatlari-antolojisi.html Turkic-Turkish texts].