Talk:Turkic languages
Revision as of 23:21, 23 April 2012
Classification
- attributive attr = things that act like adjectives
- predicative pred
- substantive subst = things that act like nouns
- adverbial advl = things that act like adverbs (??)
Hierarchy
- noun (default 'subst') + DECL-NOUN
  - noun->adj = n.attr + DECL-ADJ !! No <comp>arison levels though
- adj (default 'attr') + DECL-ADJ
  - adj->noun = adj.subst + DECL-NOUN
- num (default 'attr') + DECL-NUM
  - num->noun = num.subst + DECL-NOUN
- prn (default 'subst') + DECL-NOUN
- det (default 'attr') + NO-DECL
- v
  - v->noun = v.ger + DECL-NOUN
  - v->adj = v.glp + DECL-ADJ
  - v->adv = v.prc + DECL-ADV
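The hierarchy can be read as a lookup from a category plus its syntactic function to a declension paradigm. The Python below is a hypothetical encoding of that reading (the dictionaries and `declension` are illustrative names, not part of any Apertium tool); it only restates the list and assumes that a category used without an explicit function takes its default.

```python
# Hypothetical encoding of the hierarchy: (category, function) -> paradigm.
# Paradigm names are copied from the list; the table itself is illustrative.
HIERARCHY = {
    ("noun", "subst"): "DECL-NOUN",
    ("noun", "attr"): "DECL-ADJ",   # noun->adj: no comparison levels
    ("adj", "attr"): "DECL-ADJ",
    ("adj", "subst"): "DECL-NOUN",
    ("num", "attr"): "DECL-NUM",
    ("num", "subst"): "DECL-NOUN",
    ("prn", "subst"): "DECL-NOUN",
    ("det", "attr"): "NO-DECL",
    ("v", "ger"): "DECL-NOUN",
    ("v", "glp"): "DECL-ADJ",
    ("v", "prc"): "DECL-ADV",
}

# Default functions as given in parentheses; 'v' has none.
DEFAULT_FUNCTION = {"noun": "subst", "adj": "attr", "num": "attr",
                    "prn": "subst", "det": "attr"}


def declension(category, function=None):
    """Return the paradigm for a category used in a given function.

    Falls back to the category's default function when none is given.
    Raises KeyError for combinations the hierarchy does not list.
    """
    if function is None:
        function = DEFAULT_FUNCTION[category]
    return HIERARCHY[(category, function)]
```

For example, `declension("adj")` gives `DECL-ADJ`, while `declension("adj", "subst")` gives `DECL-NOUN`, mirroring the adj->noun line.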
Types of non-finite verbal forms:
- Adverbial participles: <gnL> (e.g. <gnc> "Conditional adverbial participle")
- Verbal adjectives: <gpL> (e.g. <gpi> "Imperfect verbal adjective")
- Gerunds: <gerN> (e.g. <ger1> "Past/present gerund")
- Participles: <prcN> (e.g. <prc1> "Realis participle")
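A tag can be sorted into one of these families by pattern. In this minimal sketch, `L` is assumed to stand for a single lowercase letter and `N` for one or more digits; that interpretation, and the helper name `nonfinite_type`, are assumptions rather than anything fixed by the tagset.

```python
import re

# Assumed reading of the templates: L = one lowercase letter, N = digits.
NONFINITE_PATTERNS = [
    (re.compile(r"gn[a-z]"), "adverbial participle"),
    (re.compile(r"gp[a-z]"), "verbal adjective"),
    (re.compile(r"ger[0-9]+"), "gerund"),
    (re.compile(r"prc[0-9]+"), "participle"),
]


def nonfinite_type(tag):
    """Classify a tag such as 'gnc' or '<ger1>'; return None if no family fits."""
    tag = tag.strip("<>")  # accept the tag with or without angle brackets
    for pattern, name in NONFINITE_PATTERNS:
        if pattern.fullmatch(tag):
            return name
    return None
```

Under this reading a bare `<ger>` or `<prc>` (as in the hierarchy's v.ger, v.prc) is not matched; a looser pattern would be needed if those are meant to be the same families.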
What about 'cop' and 'pred'?
- The copula is i- (p.79):
  - -(y) (pres)
  - -(y)DI (past)
  - -(y)mIş (evid)
  - -(y)sA (cond)
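To make the suffix templates concrete, here is a minimal sketch that spells out the past, evidential and conditional copula forms with the usual Turkish rules: the buffer (y) appears only after a vowel, D surfaces as t after a voiceless consonant and d otherwise, I harmonizes to ı/i/u/ü and A to a/e with the last vowel. The function `attach_copula` is hypothetical, it assumes a lowercase stem containing a vowel, and the present -(y) entry is not modelled.

```python
BACK = set("aıou")
FRONT = set("eiöü")
VOWELS = BACK | FRONT
ROUNDED = set("ouöü")
VOICELESS = set("çfhkpsşt")  # the "fıstıkçı şahap" consonants

COPULA_SUFFIXES = {"past": "(y)DI", "evid": "(y)mIş", "cond": "(y)sA"}


def last_vowel(word):
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    raise ValueError("no vowel in " + word)


def attach_copula(stem, tense):
    """Attach the copula suffix for `tense`, resolving (y), D, I and A."""
    template = COPULA_SUFFIXES[tense]
    if template.startswith("(y)"):
        template = template[3:]
        if stem[-1] in VOWELS:  # buffer y only after a vowel-final stem
            template = "y" + template
    word = stem
    for ch in template:
        v = last_vowel(word)
        if ch == "D":
            word += "t" if word[-1] in VOICELESS else "d"
        elif ch == "I":
            if v in BACK:
                word += "u" if v in ROUNDED else "ı"
            else:
                word += "ü" if v in ROUNDED else "i"
        elif ch == "A":
            word += "a" if v in BACK else "e"
        else:
            word += ch
    return word
```

For instance `attach_copula("hasta", "past")` gives `hastaydı`, `attach_copula("kitap", "past")` gives `kitaptı`, and `attach_copula("evlerimizde", "evid")` gives `evlerimizdeymiş`.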
Notes on morphological disambiguation
Arguments against just having a different tag:
e.g. güzel<n>/güzel<adj>
- We lose the tag denoting the principal function of the stem
- We can't tell the CG to choose the principal function
- We can't tell the difference between "real" N/A ambiguity and "derivation" ambiguity
Arguments against just piling one tag on top of another:
e.g. güzel<adj>/güzel<adj><n>
- Having two POS in a word makes things confusing
- Having two POS tags in a word makes it difficult to write CG rules
Arguments against having a "zero derivation":
e.g. güzel<adj>/güzel<adj><D_n><n>
- It's ugly and stupid
- Having two POS tags in a word makes it difficult to write CG rules
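The trade-off can be seen with a deliberately naive filter. This sketch is not CG syntax; it treats each reading as a string, and `naive_select` (a hypothetical stand-in for a rule keyed on a single tag) keeps every reading that contains the tag anywhere. The POS tag set used here is an assumption for the example.

```python
import re

POS_TAGS = {"n", "adj", "adv", "v"}  # assumed POS tag set for this example

SCHEMES = {
    "different tag": ["güzel<n>", "güzel<adj>"],
    "stacked tags": ["güzel<adj>", "güzel<adj><n>"],
    "zero derivation": ["güzel<adj>", "güzel<adj><D_n><n>"],
}


def tags(reading):
    return re.findall(r"<([^>]+)>", reading)


def pos_count(reading):
    """Number of part-of-speech tags in a single reading."""
    return sum(1 for t in tags(reading) if t in POS_TAGS)


def naive_select(readings, pos):
    """Keep readings that contain `pos` anywhere among their tags."""
    return [r for r in readings if pos in tags(r)]
```

With stacked tags, `naive_select(SCHEMES["stacked tags"], "adj")` keeps both readings, so a rule keyed on `<adj>` alone cannot remove the derived one. With different tags it does disambiguate, but nothing in `güzel<n>` records that adjective is the principal function. Both the stacked and zero-derivation readings have a `pos_count` of 2.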
Adjectives
- güzel 'beautiful': güzel<adj>/güzel<adj><subst>/güzel<adj><advl>
- güzelim 'my beauty': güzel<adj><subst><px1sg>
- güzel konuştu 'she spoke well': güzel<adj>/güzel<adj><subst>/güzel<adj><advl>
- güzel bir köpek 'a beautiful dog': güzel<adj>/güzel<adj><subst>/güzel<adj><advl>
- küçük 'small': küçük<adj>/küçük<adj><subst>/küçük<adj><advl>
- küçük kızlar 'little girls': küçük<adj>/küçük<adj><subst>/küçük<adj><advl>
- küçükler 'little one(s)': küçük<adj><subst><pl>/küçük<n>+i<cop><pres><p3><pl>
- kötü 'bad': kötü<adj>/kötü<adj><subst>/kötü<adj><advl>
- kötü araba '(a) bad car': kötü<adj>/kötü<adj><subst>/kötü<adj><advl>
- kötü yüzmek 'to swim badly': kötü<adj>/kötü<adj><subst>/kötü<adj><advl>
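Each ambiguous analysis above differs only in its function tag. A small helper can read that function off each reading. Following the classification at the top of the page, this sketch assumes a bare `<adj>` means the default 'attr' and a bare `<n>` the default 'subst'. The names `reading_function` and `functions` are hypothetical.

```python
import re

FUNCTIONS = ("attr", "pred", "subst", "advl")
# Defaults from the hierarchy ('noun' there corresponds to the tag <n> here).
DEFAULT_FUNCTION = {"n": "subst", "adj": "attr", "num": "attr",
                    "prn": "subst", "det": "attr"}


def reading_function(reading):
    """Return the syntactic function of one reading, or None if unknown."""
    # Only the part before '+' is the word itself; '+i<cop>...' is the copula.
    tags = re.findall(r"<([^>]+)>", reading.split("+")[0])
    for tag in tags:
        if tag in FUNCTIONS:
            return tag
    for tag in tags:
        if tag in DEFAULT_FUNCTION:
            return DEFAULT_FUNCTION[tag]
    return None


def functions(analysis):
    """List the function of each '/'-separated reading."""
    return [reading_function(r) for r in analysis.split("/")]
```

`functions("güzel<adj>/güzel<adj><subst>/güzel<adj><advl>")` returns `['attr', 'subst', 'advl']`, i.e. the three-way attr/subst/advl ambiguity that the examples leave for disambiguation.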
Adverbs
- şimde 'now': şimde<adv>
- şimdelerde 'nowadays': şimdelerde<adv>
- I think you want both of these as <adv>. Historically it's something like "şu emdi<n??>" and "şu emdilerde/emdi<n??><pl><loc>", but for our purposes this is irrelevant. —Firespeaker 19:40, 26 February 2012 (UTC)
- More to the point, this isn't any sort of productive process we're seeing here; my point is that it's an isolated productive-looking form because of its unique history. —Firespeaker 19:41, 26 February 2012 (UTC)
Nouns
- evdeki 'the one in the house': ev<n><locattr>/ev<n><locsub>
- evdekinde 'in the one in the house': ev<n><locsub><loc>
Miscellaneous
$ echo Evlerimizdeymişler | hfst-proc tr-cv.automorf.hfst
^Evlerimizdeymişler/Ev<n><pl><px1pl><loc>+i<cop><evid><p3><pl>$
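The output above is one lexical unit in the ^surface/analysis/...$ stream format, where '+' joins the noun analysis to the copula. A minimal parser for a single unit might look like the following. `parse_lu` is a hypothetical helper, and it does not handle escaped characters, blanks between units, or other stream features.

```python
import re


def parse_lu(lu):
    """Split one ^surface/analysis/...$ unit into (surface, analyses).

    Each analysis is a list of (lemma, [tags]) pairs, one per '+'-joined part.
    Escapes and superblanks are not handled in this sketch.
    """
    body = lu.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("not a lexical unit: " + lu)
    surface, *analyses = body[1:-1].split("/")
    parsed = []
    for analysis in analyses:
        parts = []
        for part in analysis.split("+"):
            lemma = part.split("<", 1)[0]
            parts.append((lemma, re.findall(r"<([^>]+)>", part)))
        parsed.append(parts)
    return surface, parsed
```

Applied to the line above, it yields the surface form `Evlerimizdeymişler` and a single analysis made of `("Ev", ["n", "pl", "px1pl", "loc"])` followed by `("i", ["cop", "evid", "p3", "pl"])`.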
Compound tenses
Things to think about:
- analysis length, with келген эмеспи as one unit:
^келген эмеспи/кел<v><iv><neg><past><p3><pl>+бы<qst>/кел<v><iv><neg><past><p3><sg>+бы<qst>/кел<vaux><neg><past><p3><pl>+бы<qst>/кел<vaux><neg><past><p3><sg>+бы<qst>$
vs. as two separate units:
^келген/кел<v><iv><past>/кел<vaux><past>$ ^эмеспи/эмес<neg><p3><sg>+бы<qst>/эмес<neg><p3><pl>+бы<qst>$
- tag/morpheme reordering should be done in transfer, e.g. for the Turkish->Chuvash negative imperative or Chuvash->Turkish possessives.
- what about different spacing? Do you ever get more than one space, a non-breaking space, or formatting between e.g. келген and эмеспи, or anything else that isn't a single ASCII space?
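Both points can be checked mechanically. A joined unit must list one reading per combination of its parts' readings (a product), while separate units list only the sum. With two readings on each side these coincide (2 × 2 = 2 + 2 = 4), but with three readings each it would be 9 against 6. The sketch below only counts combinations and does not reproduce the tag reordering seen in the joined analysis. The helper names are hypothetical. The separator check relies on Python's `\s` matching Unicode whitespace such as U+00A0; inline formatting (e.g. HTML tags) would not be detected.

```python
import re
from itertools import product

# Readings of each token, copied from the two-unit analysis above.
KELGEN = ["кел<v><iv><past>", "кел<vaux><past>"]
EMESPI = ["эмес<neg><p3><sg>+бы<qst>", "эмес<neg><p3><pl>+бы<qst>"]


def joined_readings(*token_readings):
    """One reading per combination: what a single multiword unit must list."""
    return [" ".join(combo) for combo in product(*token_readings)]


def unusual_separators(text):
    """Return whitespace runs that are anything other than one ASCII space."""
    # In str patterns, \s matches Unicode whitespace, including U+00A0.
    return [m.group() for m in re.finditer(r"\s+", text) if m.group() != " "]
```

`len(joined_readings(KELGEN, EMESPI))` is 4, and `unusual_separators("келген\u00a0эмеспи")` flags the no-break space.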