Difference between revisions of "Turkic lexicon"

From Apertium

Revision as of 05:49, 18 July 2012

Some notes on how to go about making a Turkic lexicon for use in Apertium.

Layout

General points:

  • The lexicon will be made in one file, which will have the suffix .lexc
  • The file will be laid out in the following order:
    1. The multicharacter symbols
    2. The Root lexicon, pointing to the stem lexicons
    3. The morphotactics (continuation lexica)
    4. The stem lexicons

Multicharacter symbols

Morphological categories must be encased in < and > (escaped as %< and %> in the lexc file, as in the examples below). They may contain the letters a-z and the digits 0-9; in extreme cases they may also include the letters A-Z. They must begin with a letter, not a digit.

Examples:

  • %<n%> Noun
  • %<p3%> Third person
  • %<evid%> Evidential
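
The naming rules above can be checked mechanically. The following is only a sketch in Python, not part of any Apertium tool: the function name check_tag is made up for illustration, and the underscore is allowed as an assumption, because tags such as <ger_impf> are used later on this page even though the rule above does not mention it.

```python
import re

# Tag body: a lowercase letter followed by lowercase letters or digits.
# ASSUMPTION: "_" is also accepted, since tags like <ger_impf> occur on this page.
STRICT_TAG = re.compile(r"%<[a-z][a-z0-9_]*%>")
# "Extreme cases": upper-case letters are tolerated as well.
LENIENT_TAG = re.compile(r"%<[A-Za-z][A-Za-z0-9_]*%>")


def check_tag(symbol: str) -> str:
    """Return 'ok', 'extreme' (uses A-Z) or 'invalid' for an escaped lexc tag."""
    if STRICT_TAG.fullmatch(symbol):
        return "ok"
    if LENIENT_TAG.fullmatch(symbol):
        return "extreme"
    return "invalid"


if __name__ == "__main__":
    for symbol in ["%<n%>", "%<p3%>", "%<evid%>", "%<3p%>", "%<N%>", "<n>"]:
        print(symbol, check_tag(symbol))
```

A tag that starts with a digit, or whose angle brackets are not escaped, is reported as invalid; a tag with upper-case letters is reported separately so that it can be reviewed by hand.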

For information on archiphonemes, see the corresponding page.

The list of symbols should be laid out in the following order:

  • The major parts of speech
  • The morphological categories
  • Archiphonemes
  • Other symbols, e.g. morpheme boundary, ' ', '-', etc.

Every symbol should have a comment. The comments should line up.

Morphotactics

Naming continuation lexica

  • Continuation lexica will be named in upper case, and may contain letters, numbers and the symbol -.
    • Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
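
As a rough illustration of this naming convention, the sketch below lists the LEXICON names in a piece of lexc text and flags those that do not follow it. The helper names are hypothetical, and skipping the special Root lexicon is an assumption based on the layout section above, which refers to it as "the Root lexicon".

```python
import re

# Upper-case letters, digits and "-" only, as described above.
LEXICON_NAME = re.compile(r"[A-Z0-9-]+")


def is_valid_lexicon_name(name: str) -> bool:
    """True if the name uses only upper-case letters, digits and '-'."""
    return LEXICON_NAME.fullmatch(name) is not None


def lexicon_names(lexc_text: str) -> list:
    """Collect the names that follow the LEXICON keyword, in file order."""
    names = []
    for line in lexc_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "LEXICON":
            names.append(parts[1])
    return names


def badly_named(lexc_text: str) -> list:
    """Names that break the convention; 'Root' is skipped (assumption)."""
    return [n for n in lexicon_names(lexc_text)
            if n != "Root" and not is_valid_lexicon_name(n)]


if __name__ == "__main__":
    sample = "LEXICON Root\nNouns ;\n\nLEXICON N1\n\nLEXICON DET-DEM\n\nLEXICON Adv\n"
    print(badly_named(sample))
```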

What sorts of distinctions to make

TODO: TV vs. IV, Russian vs. non-Russian in Chuvash

Stem lexicons

TODO: Why stems go in lexicon and not infinitives

Lines in the stem lexicons should follow this pattern:

  • Left side (lexical form)
  • Colon :
  • Right side (surface form)
  • Space
  • Continuation lexicon
  • Space
  • Semicolon ;
  • Space
  • Exclamation mark ! (starts a comment)
  • Space
  • Open quote "
  • Gloss (optional)
  • Close quote "

Example:

кӗнеке:кӗнек N2 ; ! "llibre, книга"
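
The pattern can be expressed as a regular expression. The sketch below is a simplification and not an Apertium or HFST tool: the function name is invented, it assumes stems without spaces or escaped characters, and it accepts the comment with or without a space between the exclamation mark and the opening quote.

```python
import re

# lexical:surface CONTINUATION ; ! "gloss"
STEM_LINE = re.compile(
    r'(?P<lexical>[^:\s]+):(?P<surface>\S+) '   # left side, colon, right side
    r'(?P<continuation>[A-Z0-9-]+) ; '          # continuation lexicon, semicolon
    r'! ?"(?P<gloss>[^"]*)"'                    # comment holding the (possibly empty) gloss
)


def parse_stem_line(line: str):
    """Return a dict of the four fields, or None if the line breaks the pattern."""
    match = STEM_LINE.fullmatch(line.strip())
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    print(parse_stem_line('кӗнеке:кӗнек N2 ; ! "llibre, книга"'))
    print(parse_stem_line('кӗнеке:кӗнек N2;'))
```

Lines that return None can then be listed for manual clean-up before the lexicon is compiled.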

Morphophonology

TODO: px3 is sIn (and why)

Categorisation

Nominals

Compound Nouns

TODO: N-N compounds with <px3>

Adjectives

  • A1: adjectives that can be both substantivised and adverbialised;
    • All three readings (<adj>, <adj.subst> and <adj.advl>)
    • have comparison levels.
  • A2: derived/not fully lexicalised adjectives without adverbial reading
    • <adj> and <adj.subst> readings
    • have comparison levels.
  • A3: derived/not fully lexicalised adjectives without adverbial reading
    • so-called "predicatives" (бар, жоқ)
    • no comparison levels at all.
  • A4: "pure" adjectives
    • no adverbial and substantive readings,
    • no comparison levels;
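
One way to keep the four classes apart is to write them down as data. The sketch below is an illustration only; the readings given for A3 (<adj> and <adj.subst>) are an assumption taken from the Chuvash examples below rather than from the list above, and the names are hypothetical.

```python
# Which readings each adjective class allows, and whether it has comparison levels.
# ASSUMPTION: A3 allows <adj> and <adj.subst>, as in the Chuvash table on this page.
ADJ_CLASSES = {
    "A1": {"readings": ("adj", "adj.subst", "adj.advl"), "comparison": True},
    "A2": {"readings": ("adj", "adj.subst"), "comparison": True},
    "A3": {"readings": ("adj", "adj.subst"), "comparison": False},
    "A4": {"readings": ("adj",), "comparison": False},
}


def classify(substantive: bool, adverbial: bool, comparison: bool):
    """Return the class name matching the observed behaviour, or None."""
    readings = ["adj"]
    if substantive:
        readings.append("adj.subst")
    if adverbial:
        readings.append("adj.advl")
    for name, props in ADJ_CLASSES.items():
        if props["readings"] == tuple(readings) and props["comparison"] == comparison:
            return name
    return None


if __name__ == "__main__":
    # An adjective that can be substantivised and compared, but not adverbialised.
    print(classify(substantive=True, adverbial=False, comparison=True))
```

A combination that returns None does not fit any of the four classes and probably deserves a closer look.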

Examples by language

Chuvash
Type | Example | Reading | Phrase
A1 | лайӑх "good" | <adj> | Ку лайӑх кĕнеке.
   | лайӑхтарах | <adj><comp> | Ку лайӑхтарахчĕ.
   | лайӑх | <adj><advl> | Вӑл лайӑх ишет.
   | лайӑххисене | <adj><subst><pl><dat> | Лайӑххисене куратӑп.
A2 | кӑвак "blue" | <adj> | Эпĕ çак кӑвак кĕнекене куратӑп.
   | кӑвактарах | <adj><comp> | Ку кӗнеке кӑвактарахче.
   | *кӑвак | <adj><advl> |
   | кӑваккисем | <adj><subst><pl> | Кӑваккисем куратӑп.
A3 | вилĕ "dead" | <adj> | Эпĕ çак вилӗ сынна куратӑп.
   | *вилĕрех, *вилĕтерех | <adj><comp> |
   | *вилĕ | <adj><advl> |
   | виллисем | <adj><subst><pl> | Эпĕ виллисем куратӑп.
A4 | тĕп "main" | <adj> | Ку тӗп кӗнеке.
   | *тĕпрех, *тĕптерех | <adj><comp> |
   | *тĕп | <adj><advl> |
   | *тĕп | <adj><subst> |
Subtypes
  • A1/A2
    Comparatives can take -рах, -тарах or both, depending on the final sound of the stem.
  • A1/A2/A3
    Substantivisation can be done with a suffix, e.g. лайӑхх·и (in fact the 3rd person possessive), or without one.
Kazakh
Tatar
Turkish

Adverbs

Postpositions

TODO: "postpositions" which take poss./case are nouns

Finite verbs

Non-finite verbs

This section outlines what categories of non-finite verb forms exist in Turkic, and how to identify the type of category created by a given affix.

Verbal nouns / gerunds

Verbal nouns are forms of verbs that allow one to use a verb phrase as a noun phrase. An example in English might be "running" in the sentence "I like running", or "eating beshbarmaq with my hands" in "I believe in eating beshbarmaq with my hands". The former sentence in Kazakh would be:

Мен | мен<prn><nom> | I
жүгіруді | жүгір<v><iv><ger><acc> | running
жақсы | жақсы<adv> | well
көремін | көр<v><tv><aor><p1><sg> | I see
"I like running."

You can also embed subjects, kind of like the English "I saw him/his running home."

Мен | мен<prn><nom> | I
оның | ол<prn><gen> | his
үйге | үй<n><dat> | to home
қарай | қарай<pst> | towards
жүгіретінін | жүгір<v><iv><ger_impf><px3sp><acc> | his running
көрдім | көр<v><tv><ifi><p1><sg> | I saw
"I saw him running home."

This same sentence could also be translated as follows, depending on whether you're focusing on the fact that he was running (previous) or that you saw him run home (following):

Мен | мен<prn><nom> | I
оның | ол<prn><gen> | his
үйге | үй<n><dat> | to home
қарай | қарай<pst> | towards
жүгіргенін | жүгір<v><iv><ger_prf><px3sp><acc> | his running
көрдім | көр<v><tv><ifi><p1><sg> | I saw
"I saw him run home."

As implied by this example, while the tense of gerunds is limited in English, gerunds in most Turkic languages can have a wide range of tense/mood/aspect/evidentiality (TMAE) combinations. Many of these are translated into languages like English as subordinate clauses, e.g. "I believe that he eats beshbarmaq with his hands":

Беспармақ | беспармақ<n><acc> | beshbarmaq
қолымен | қол<n><px3sp><inst> | with his hands
жейтініне | же<v><tv><ger_impf><px3sp><dat> | to his eating
сенемін | сен<v><tv><aor><p1><sg> | I believe
"I believe that he eats beshbarmaq with his hands."

Notice that in these examples, the verb phrase is being used as a subject, object, adjunct, etc. That is, in Turkic languages, gerunds can take any grammatical role (and morphology) that a noun phrase can take. In Kazakh, for example, the verbal nouns can take any combination of possession and/or case suffixes. They may sometimes even take plural suffixes, though often forms that appear to be a gerund followed by a plural suffix are actually plural substantivised verbal adjectives (<gpr><subst><pl>).

For Turkic languages in Apertium, the most fundamental gerund of a language (often used as an infinitive form) takes the <ger> tag, and other gerunds take tags based on <ger> plus something about their TMAE specification, such as <ger_impf> (for "imperfective gerund") or <ger_fut> (for "future gerund").

Verbal adjectives

Verbal adjectives are forms of verbs that allow one to use a verb phrase as an adjectival phrase. An example in English might be "running" in the sentence "The running man startled me" (as opposed to "the sitting man"), or "running home" in "The man running home startled me" (as opposed to "the man eating beshbarmaq"). These sentences in Kazakh would be:

Жүгіретін | жүгір<v><iv><gpr_impf> | running
адам | адам<n><nom> | man
мені | мен<prn><acc> | me
шошытты. | шошы<v><iv><caus><ifi><p3><sg> | he startled.
"The running man startled me."

Үйге | үй<n><dat> | to home
жүгіретін | жүгір<v><iv><gpr_impf> | running
адам | адам<n><nom> | man
мені | мен<prn><acc> | me
шошытты. | шошы<v><iv><caus><ifi><p3><sg> | he startled.
"The man running home startled me."

Notice that while in English verbal adjective phrases that are longer than just a verb must be placed after the noun they modify, in Kazakh verbal adjective phrases of any length are only ever placed before the noun.

Phrases formed using verbal adjectives in Turkic languages are often translated using relative clauses in languages like English (e.g., "The man who was running [home] startled me."). Note that in Turkic languages, usually any part of the verb phrase can be relativised (i.e., "extracted" from the embedded verb phrase and made into a nominal argument which the verbal adjective phrase then "modifies"). For example, "The man who I gave a fork to yesterday was eating beshbarmaq" and "The man that gave me a fork yesterday was eating beshbarmaq" can both be translated into Kazakh using verbal adjectives.

Кеше | кеше<adv> | yesterday
мен | мен<prn><nom> | I
шанышқы | шанышқы<n><acc> | fork
берген | бер<v><tv><gpr_past> | having given
адам | адам<n><nom> | man
беспармақ | беспармақ<n><nom> | beshbarmaq
жеп | же<v><tv><prc> | eating
жатқан. | жат<vaux><past><p3><sg> | was
"The man who I gave the fork to yesterday was eating beshbarmaq."

Кеше | кеше<adv> | yesterday
маған | мен<prn><dat> | to me
шанышқы | шанышқы<n><acc> | fork
берген | бер<v><tv><gpr_past> | having given
адам | адам<n><nom> | man
беспармақ | беспармақ<n><nom> | beshbarmaq
жеп | же<v><tv><prc> | eating
жатқан. | жат<vaux><past><p3><sg> | was
"The man who gave me the fork yesterday was eating beshbarmaq."

In English, there is a difference between restrictive relative clauses, which limit the reference of the noun, and non-restrictive ones, which only add information about it. E.g., "The man that I saw yesterday startled me" (specifically that man startled me) versus "The man, who I saw yesterday, startled me" (a man startled me; it happens that I saw him yesterday). In Turkic languages the default meaning of a verbal adjective is usually the restrictive one, e.g. in Kazakh:

Мен | мен<prn><nom> | I
кеше | кеше<adv> | yesterday
көрген | көр<v><tv><gpr_past> | seen
адам | адам<n><nom> | man
мені | мен<prn><acc> | me
шошытты. | шошы<v><iv><caus><ifi><p3><sg> | he startled.
"The man I saw yesterday startled me."

To get the non-restrictive meaning in a Turkic language, two finite verb forms would normally be used (e.g., "The man scared me; I saw him yesterday").

Verbal adjectives can also be substantivised to mean "the ones who ...". For example, in Kazakh, in a translation of the sentence "the ones/people I saw yesterday were eating beshbarmaq", "the ones/people I saw yesterday" would be formed like the first part of "the man I saw yesterday", but without the word "man", and a plural marker instead:

Мен | мен<prn><nom> | I
кеше | кеше<adv> | yesterday
көргендер | көр<v><tv><gpr_past><subst><pl> | ones seen
беспармақ | беспармақ<n><nom> | beshbarmaq
жеп | же<v><tv><prc> | eating
жатқан. | жат<vaux><past><p3><pl> | were
"The ones/people I saw yesterday were eating beshbarmaq."

For Turkic languages in Apertium, the tags for verbal adjectives are based on <gpr>, with a brief TMAE specification following, such as <gpr_past> (for "past-tense verbal adjective") or <gpr_impf> (for "imperfective verbal adjective"). The abbreviation "gpr" comes from the Russian phrase "глагольное прилагательное" [glaˈgolʲnəjɪ prʲilaˈgatʲɪlʲnəjɪ], which means "verbal adjective": "g" is taken from the first word and "pr" from the second.

Participles

Participles are verb forms that allow a verb phrase to be combined with other verbs for the purpose of adding information about tense/mood/aspect/voice/evidentiality (TMAVE) to the utterance. That is, they are usually used in the creation of "compound verb tenses", so the following word is almost always a verbal auxiliary.

Examples of relevant compound verb phrases in Kazakh include the following:

phrase | gloss | participle
тамақ жеп біттің | ‘you finished eating’ | ^жеп/же<v><tv><prc_prf>$
тамақ жей бердің | ‘you kept eating’ | ^жей/же<v><tv><prc_impf>$
тамақ жейтін шығармын | ‘you seem to be eating’ | ^жейтін/же<v><tv><prc_irre>$
тамақ жесең болады | ‘you may/can eat’ | ^жесең/же<v><tv><prc_cnd><p2><sg>$
тамақ жегелі жатырсың | ‘you're about to eat’ | ^жегелі/же<v><tv><prc_purp>$
etc.

For Turkic languages in Apertium, the tags for participles are based on <prc>, with a brief TMAE specification following, such as <prc_impf> (for "imperfective participle") or <prc_irre> (for "irrealis participle"). The abbreviation "prc" comes from the English term "participle" or the Russian equivalent "причастие" [prʲiˈtɕastʲijɪ].
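
Because the three non-finite categories described so far share a tag scheme (a base tag, optionally followed by "_" and a TMAE label), an analysis can be sorted into a category by looking at the base of its tags. The following is a hedged sketch only: the function and table names are made up, and it handles just single-analysis units written as ^surface/lemma<tags>$, like those in the table above.

```python
import re

# One lexical unit with a single analysis: ^surface/lemma<tag><tag>...$
LEXICAL_UNIT = re.compile(
    r"\^(?P<surface>[^/$]+)/(?P<lemma>[^<$]+)(?P<tags>(?:<[^>]+>)*)\$"
)
TAG = re.compile(r"<([^>]+)>")

# Base tags described on this page and the category they mark.
NONFINITE_BASES = {
    "ger": "verbal noun",
    "gpr": "verbal adjective",
    "prc": "participle",
}


def nonfinite_category(unit: str):
    """Return (category, tag) for the first non-finite tag found, else None."""
    match = LEXICAL_UNIT.fullmatch(unit)
    if match is None:
        return None
    for tag in TAG.findall(match.group("tags")):
        base = tag.split("_", 1)[0]  # "prc_prf" -> "prc", "ger" -> "ger"
        if base in NONFINITE_BASES:
            return NONFINITE_BASES[base], tag
    return None


if __name__ == "__main__":
    print(nonfinite_category("^жеп/же<v><tv><prc_prf>$"))
    print(nonfinite_category("^көрген/көр<v><tv><gpr_past>$"))
    print(nonfinite_category("^адам/адам<n><nom>$"))
```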

Verbal adverbs

Language specific issues

Turkmen: stem-final voiced and voiceless stops

In Turkmen, there are three types of stem-final stops:

  • voiced stops
  • voiceless stops
  • stops that are voiceless syllable-finally and voiced intervocalically

TODO: finish description of this and explain how it can be / is dealt with

Chuvash: Russian loans ending in -a with non-final stress