Turkic lexicon
Some notes on how to go about making a Turkic lexicon for use in Apertium.
Layout
General points:
- The lexicon will be made in one file, with the suffix .lexc
- The file will be laid out in the following order:
  - The multicharacter symbols
  - The Root lexicon, pointing to the stem lexicons
  - The morphotactics (continuation lexica)
  - The stem lexicons
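The ordering above can be sketched as a minimal file skeleton. This is only an illustration: the lexicon name `Nouns` and the contents of `N2` are assumptions, and a real `N2` would continue to number, possession and case lexica rather than ending the word.

```lexc
! 1. Multicharacter symbols
Multichar_Symbols

%<n%>    ! Noun

! 2. Root lexicon, pointing to the stem lexicons
LEXICON Root

Nouns ;

! 3. Morphotactics (continuation lexica)
LEXICON N2

%<n%>: # ;    ! tag on the lexical side, nothing on the surface side (placeholder)

! 4. Stem lexicons ("Nouns" is a hypothetical name)
LEXICON Nouns

кӗнеке:кӗнек N2 ; ! "llibre, книга"
```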
 
Multicharacter symbols
Morphological categories must be enclosed in < and >. They may contain the letters a-z and the digits 0-9; in extreme cases they may also include the letters A-Z. They must begin with a letter, not a digit. In the .lexc file the angle brackets are escaped with %, as in the examples below.
Examples:
- %<n%>: Noun
- %<p3%>: Third person
- %<evid%>: Evidential
For information on archiphonemes, see the corresponding page.
The list of symbols should be laid out in the following order:
- The major parts of speech
- The morphological categories
- Archiphonemes
- Other symbols, e.g. morpheme boundary, ' ', '-'
Every symbol should have a comment. The comments should line up.
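A symbol section following this order might look like the sketch below. The three example tags come from this page; `%<adj%>`, `%<v%>`, `%<pl%>`, the archiphoneme `%{A%}` and the boundary symbol `%>` are assumptions added for illustration, not a required inventory.

```lexc
Multichar_Symbols

! Major parts of speech
%<n%>       ! Noun
%<adj%>     ! Adjective
%<v%>       ! Verb

! Morphological categories
%<p3%>      ! Third person
%<evid%>    ! Evidential
%<pl%>      ! Plural

! Archiphonemes
%{A%}       ! Hypothetical harmonising vowel

! Other symbols
%>          ! Morpheme boundary
```

Every symbol carries a `!` comment, and the comments start in the same column.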
Morphotactics
Naming continuation lexica
- Continuation lexica will be named in upper case, and may contain letters, numbers and the symbol -.
- Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
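As a rough sketch, the three example names would appear as lexicon headers like this. The entries inside each lexicon, the `%<det%>`, `%<dem%>` and `%<adv%>` tags, and the `CASES` continuation are assumptions for illustration; the fragment presumes the tags are declared under Multichar_Symbols and that `CASES` is defined elsewhere in the file.

```lexc
LEXICON N1

%<n%>: CASES ;              ! CASES is a hypothetical continuation lexicon

LEXICON DET-DEM

%<det%>%<dem%>: # ;         ! hypothetical tags for a demonstrative determiner

LEXICON ADV

%<adv%>: # ;                ! adverbs end the word here in this sketch
```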
What sorts of distinctions to make
TODO: TV vs. IV, Russian vs. non-Russian in Chuvash
Stem lexicons
TODO: Why stems go in lexicon and not infinitives
Lines in the stem lexicons should follow this pattern:
- Left side (lexical form)
- Colon :
- Right side (surface form)
- Space 
- Continuation lexicon
- Space 
- Semicolon ;
- Space 
- Exclamation mark !
- Open quote "
- Gloss (optional)
- Close quote "
Example:
кӗнеке:кӗнек N2 ; ! "llibre, книга"
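Placed in a stem lexicon, the same entry might look like the sketch below. The lexicon name `Nouns` is hypothetical, and writing an empty pair of quotes when there is no gloss is one reading of "gloss (optional)" in the pattern above. The second line is an alternative form of the first, not an additional entry.

```lexc
LEXICON Nouns    ! hypothetical stem lexicon name

! lexical form : surface form, continuation lexicon, semicolon, gloss comment
кӗнеке:кӗнек N2 ; ! "llibre, книга"

! alternative: the same entry with the gloss left empty
кӗнеке:кӗнек N2 ; ! ""
```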
Morphophonology
TODO: px3 is sIn (and why)
Categorisation
Nominals
Compound Nouns
TODO: N-N compounds with <px3>
Adjectives
- A1: adjectives that can be both substantivised and adverbialised
  - all three readings (<adj>, <adj.subst> and <adj.advl>)
  - have comparison levels
- A2: derived/not fully lexicalised adjectives without adverbial reading
  - <adj> and <adj.subst> readings
  - have comparison levels
- A3: derived/not fully lexicalised adjectives without adverbial reading
  - so-called "predicatives" (бар, жоқ)
  - no comparison levels at all
- A4: "pure" adjectives
  - no adverbial and substantive readings
  - no comparison levels
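One possible way to wire these four classes as continuation lexica is sketched below. `ADJ-LEVELS` (positive or comparative, then end of word) and `ADJ-SUBST` (nominal inflection of a substantivised adjective) are hypothetical lexica assumed to be defined elsewhere. Giving A3 a substantivised reading follows the Chuvash table on this page rather than the list above, and how comparison combines with the substantivised reading is not modelled.

```lexc
LEXICON A1    ! substantivised and adverbialised, with comparison

%<adj%>: ADJ-LEVELS ;
%<adj%>%<subst%>: ADJ-SUBST ;
%<adj%>%<advl%>: ADJ-LEVELS ;

LEXICON A2    ! no adverbial reading, with comparison

%<adj%>: ADJ-LEVELS ;
%<adj%>%<subst%>: ADJ-SUBST ;

LEXICON A3    ! no adverbial reading, no comparison

%<adj%>: # ;
%<adj%>%<subst%>: ADJ-SUBST ;

LEXICON A4    ! "pure" adjectives: <adj> only

%<adj%>: # ;
```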
 
Examples by language
Chuvash
| Type | Example | Reading | Phrase |
|---|---|---|---|
| A1 | лайӑх "good" | <adj> | Ку лайӑх кĕнеке. |
| | лайӑхтарах | <adj><comp> | Ку лайӑхтарахчĕ. |
| | лайӑх | <adj><advl> | Вӑл лайӑх ишет. |
| | лайӑххисем | <adj><subst><pl> | |
| A2 | кӑвак "blue" | <adj> | |
| | кӑвакрах | <adj><comp> | |
| | *кӑвак | <adj><advl> | |
| | кӑвак | <adj><subst><pl> | |
| A3 | вилĕ "dead" | <adj> | |
| | *вилĕрех, *вилĕтерех | <adj><comp> | |
| | *вилĕ | <adj><advl> | |
| | вилĕ | <adj><subst><pl> | |
| A4 | тĕп "main" | <adj> | |
| | *тĕпрех, *тĕптерех | <adj><comp> | — |
| | *тĕп | <adj><advl> | — |
| | *тĕп | <adj><subst> | — |
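In a stem lexicon, the four Chuvash adjectives from the table could be assigned to their classes as in this sketch. The lexicon name `Adjectives` is hypothetical, and the surface stems are assumed to be identical to the lexical forms.

```lexc
LEXICON Adjectives    ! hypothetical stem lexicon name

лайӑх:лайӑх A1 ; ! "good"
кӑвак:кӑвак A2 ; ! "blue"
вилĕ:вилĕ A3 ; ! "dead"
тĕп:тĕп A4 ; ! "main"
```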
Kazakh
Tatar
Turkish
Adverbs
Postpositions
TODO: "postpositions" which take poss./case are nouns
Finite verbs
Non-finite verbs
This section outlines what categories of non-finite verb forms exist in Turkic, and how to identify the type of category created by a given affix.
Verbal nouns / gerunds
Verbal nouns are forms of verbs that allow one to use a verb phrase as a noun phrase. An example in English might be "running" in the sentence "I like running", or "eating beshbarmaq with my hands" in "I believe in eating beshbarmaq with my hands". The former sentence in Kazakh would be:
| Мен | жүгіруді | жақсы | көремін |
|---|---|---|---|
| мен<prn><nom> | жүгір<v><iv><ger><acc> | жақсы<adv> | көр<v><tv><aor><p1><sg> |
| I | running | well | I see |

"I like running."
 
Gerunds can also take an embedded subject, much like the English "I saw him/his running home."
| Мен | оның | үйге | қарай | жүгіретінін | көрдім |
|---|---|---|---|---|---|
| мен<prn><nom> | ол<prn><gen> | үй<n><dat> | қарай<post> | жүгір<v><iv><ger_impf><px3sp><acc> | көр<v><tv><ifi><p1><sg> |
| I | his | to home | towards | his running | I saw |

"I saw him running home."
 
This same sentence could also be translated as follows, depending on whether you're focusing on the fact that he was running (previous) or that you saw him run home (following):
| Мен | оның | үйге | қарай | жүгіргенін | көрдім |
|---|---|---|---|---|---|
| мен<prn><nom> | ол<prn><gen> | үй<n><dat> | қарай<post> | жүгір<v><iv><ger_prf><px3sp><acc> | көр<v><tv><ifi><p1><sg> |
| I | his | to home | towards | his running | I saw |

"I saw him run home."
 
As implied by this example, while the tense of gerunds is limited in English, gerunds in most Turkic languages can have a wide range of tense/mood/aspect/evidentiality (TMAE) combinations. Many of these are translated into languages like English as subordinate clauses, e.g. "I believe that he eats beshbarmaq with his hands.":
| Беспармақ | қолымен | жейтініне | сенемін |
|---|---|---|---|
| беспармақ<n><acc> | қол<n><px3sp><inst> | же<v><ger_impf><px3sp><dat> | сен<v><tv><aor><p1><sg> |
| beshbarmaq | with his hands | to his eating | I believe |

"I believe that he eats beshbarmaq with his hands."
 
Notice that in these examples, the verb phrase is being used as a subject, object, adjunct, etc.  That is, in Turkic languages, gerunds can take any grammatical role (and morphology) that a noun phrase can take.  In Kazakh, for example, the verbal nouns can take any combination of possession and/or case suffixes.  They may sometimes even take plural suffixes, though often forms that appear to be a gerund followed by a plural suffix are actually plural substantivised verbal adjectives (<gpr><subst><pl>).
For Turkic languages in Apertium, the most fundamental gerund of a language (often used as an infinitive form) takes the <ger> tag. Other gerunds take tags built from <ger> plus something about their TMAE specification, such as <ger_impf> (for "imperfective gerund") or <ger_fut> (for "future gerund").
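The tagging scheme can be sketched in lexc as follows, using the Kazakh forms from the examples above (жүгіруді, жүгіретінін, жүгіргенін). This is an illustration under several assumptions: the lexicon names `V-GERUNDS` and `GER-NOMINAL` are hypothetical, `GER-NOMINAL` (possession and case suffixes) is presumed to be defined elsewhere, and the suffixes are written as literal strings that only fit stems like жүгір, where a real lexicon would use archiphonemes.

```lexc
Multichar_Symbols

%<ger%>         ! Gerund (basic verbal noun)
%<ger_impf%>    ! Imperfective gerund
%<ger_prf%>     ! Perfective gerund
%<ger_fut%>     ! Future gerund
%>              ! Morpheme boundary

LEXICON V-GERUNDS    ! hypothetical continuation lexicon for verb stems

%<ger%>:%>у GER-NOMINAL ;           ! as in жүгіруді
%<ger_impf%>:%>етін GER-NOMINAL ;   ! as in жүгіретінін
%<ger_prf%>:%>ген GER-NOMINAL ;     ! as in жүгіргенін
```

Routing every gerund to the same nominal continuation reflects the point that gerunds can take the possession and case morphology of a noun phrase.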
Verbal adjectives
Participles
Verbal adverbs
Language specific issues
Turkmen: stem-final voiced and voiceless stops
In Turkmen, there are three types of stem-final stops:
- voiced stops
- voiceless stops
- stops that are voiceless syllable-finally and voiced intervocalically
TODO: finish description of this and explain how it can be / is dealt with

