Turkic lexicon
Revision as of 19:00, 11 July 2012
Some notes on how to go about making a Turkic lexicon for use in Apertium.
Layout
General points:
- The lexicon is kept in a single file with the extension .lexc.
- The file is laid out in the following order:
  - The multicharacter symbols
  - The Root lexicon, pointing to the stem lexicons
  - The morphotactics (continuation lexica)
  - The stem lexicons
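The ordering rules above can be checked mechanically. The sketch below is a hypothetical helper, not part of Apertium. It assumes the standard lexc keywords Multichar_Symbols and LEXICON Root, and it only verifies the first two points (symbols before any lexicon, Root as the first lexicon), because continuation lexica and stem lexicons cannot be told apart by name alone. Comment stripping is simplified and ignores the %! escape.

```python
import re

def check_layout(lexc_text: str) -> list[str]:
    """Return layout problems found in a .lexc source (empty list if none)."""
    problems = []
    symbols_at = None  # line index of the Multichar_Symbols declaration
    lexicons = []      # (line index, lexicon name) in file order
    for i, raw in enumerate(lexc_text.splitlines()):
        # Simplification: drop everything after '!' (ignores the %! escape).
        line = raw.split("!", 1)[0].strip()
        if symbols_at is None and re.match(r"Multichar_Symbols\b", line):
            symbols_at = i
        m = re.match(r"LEXICON\s+(\S+)", line)
        if m:
            lexicons.append((i, m.group(1)))
    if symbols_at is None:
        problems.append("no Multichar_Symbols section")
    elif lexicons and lexicons[0][0] < symbols_at:
        problems.append("Multichar_Symbols must come before the first LEXICON")
    if not lexicons or lexicons[0][1] != "Root":
        problems.append("the first lexicon should be LEXICON Root")
    return problems
```

An empty list means the file passed both checks; the remaining ordering (continuation lexica before stem lexicons) still has to be reviewed by hand.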
 
Multicharacter symbols
Morphological categories must be enclosed in < and >. Their names may contain the lowercase letters a-z and the digits 0-9; in exceptional cases they may also include the uppercase letters A-Z. A name must begin with a letter, not a digit. In the .lexc file the angle brackets are escaped with %, as in the examples below.
Examples:
- %<n%>      ! Noun
- %<p3%>     ! Third person
- %<evid%>   ! Evidential
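A minimal sketch of the naming rules for category tags, assuming a symbol is given either in plain form (<n>) or in its lexc-escaped form (%<n%>). The function name check_tag and its three return values are hypothetical. Uppercase letters are reported as "discouraged" rather than rejected, since the rules allow them in exceptional cases.

```python
import re

# Body of a category tag: a letter first, then letters or digits only.
_TAG_BODY = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def check_tag(symbol: str) -> str:
    """Classify a tag as 'ok', 'discouraged' (uses A-Z) or 'invalid'."""
    if symbol.startswith("%<") and symbol.endswith("%>"):
        body = symbol[2:-2]  # lexc-escaped form, e.g. %<n%>
    elif symbol.startswith("<") and symbol.endswith(">"):
        body = symbol[1:-1]  # plain form, e.g. <n>
    else:
        return "invalid"
    if not _TAG_BODY.fullmatch(body):
        return "invalid"  # empty, starts with a digit, or has other characters
    return "discouraged" if any(c.isupper() for c in body) else "ok"
```

With this sketch, %<p3%> is accepted, <3p> is rejected because it starts with a digit, and <Evid> is flagged as discouraged.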
For information on archiphonemes, see the corresponding page.
The list of symbols should be laid out in the following order:
- The major parts of speech
- The morphological categories
- Archiphonemes
- Other symbols, e.g. the morpheme boundary, ' ', '-', etc.
Every symbol should have a comment, and the comments should be aligned in a single column.
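As a hypothetical illustration of the ordering and alignment conventions, the helper below renders (group, symbol, comment) triples as a Multichar_Symbols block. The group labels and the function name are assumptions. It refuses symbols without a comment and pads every symbol to the same width so that the ! comments line up.

```python
# Group order taken from the list above; the group labels are hypothetical.
GROUP_ORDER = ("pos", "category", "archiphoneme", "other")

def format_multichar(entries):
    """Render (group, symbol, comment) triples as an aligned Multichar_Symbols block."""
    entries = list(entries)
    for _, symbol, comment in entries:
        if not comment.strip():
            raise ValueError(f"symbol {symbol} has no comment")
    # list.sort() is stable, so entries keep their input order within a group.
    entries.sort(key=lambda e: GROUP_ORDER.index(e[0]))
    width = max((len(symbol) for _, symbol, _ in entries), default=0)
    lines = ["Multichar_Symbols"]
    for _, symbol, comment in entries:
        # Pad every symbol to the same width so the '!' comments line up.
        lines.append(f"{symbol.ljust(width)}  ! {comment}")
    return "\n".join(lines)
```

Called with the three example tags from above in any order, it puts %<n%> (a part of speech) first, followed by %<p3%> and %<evid%> in their input order, with every ! in the same column.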
Morphotactics
Naming continuation lexica
- Continuation lexica are named in upper case and may contain letters, numbers and the hyphen (-).
- Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
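The naming rule can be checked with a regular expression. This sketch assumes lexicon names are read from lines that start with LEXICON; the function name is hypothetical. LEXICON Root is exempted, because lexc requires the start lexicon to have exactly that name.

```python
import re

# Uppercase letters, digits and hyphens, per the naming rule above.
_NAME = re.compile(r"[A-Z0-9-]+")
_LEXICON_LINE = re.compile(r"^\s*LEXICON\s+(\S+)", re.MULTILINE)

def bad_lexicon_names(lexc_text: str) -> list[str]:
    """Return the continuation lexicon names that break the naming rule."""
    names = _LEXICON_LINE.findall(lexc_text)
    # Root is lexc's fixed start lexicon, so it is exempt from the rule.
    return [n for n in names if n != "Root" and not _NAME.fullmatch(n)]
```

For a file declaring Root, N1, DET-DEM, adv and N_2, only adv (lower case) and N_2 (underscore) would be reported.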
What sorts of distinctions to make
TODO: TV vs. IV, Russian vs. non-Russian in Chuvash
Stem lexicons
TODO: Why stems, rather than infinitives, go in the lexicon
Each line in a stem lexicon should follow this pattern:
- Left side (lexical form)
- Colon :
- Right side (surface form)
- Space
- Continuation lexicon
- Space
- Semicolon ;
- Space
- Exclamation mark !
- Space
- Open quote "
- Gloss (optional)
- Close quote "
Example (Chuvash кӗнеке 'book', glossed in Catalan and Russian):
кӗнеке:кӗнек N2 ; ! "llibre, книга"
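The line pattern above can be captured in a pair of small helpers. They are hypothetical, not an Apertium API, and assume that the quotes are kept even when the gloss is empty and that the lexical form contains no colon or whitespace.

```python
import re

# One stem entry: lexical:surface CONT ; ! "gloss"
_STEM_LINE = re.compile(r'([^:\s]+):(\S+) (\S+) ; ! "(.*)"')

def format_stem(lexical: str, surface: str, cont: str, gloss: str = "") -> str:
    """Build a stem-lexicon line in the layout described above."""
    return f'{lexical}:{surface} {cont} ; ! "{gloss}"'

def parse_stem(line: str) -> tuple:
    """Split a stem-lexicon line into (lexical, surface, cont, gloss)."""
    m = _STEM_LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a stem line: {line!r}")
    return m.groups()
```

Formatting the example entry with format_stem("кӗнеке", "кӗнек", "N2", "llibre, книга") reproduces the line shown above, and parse_stem splits it back into its four parts.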
Morphophonology
TODO: px3 is sIn (and why)
Categorisation
Nominals
Compound Nouns
TODO: N-N compounds with <px3>
Adjectives
- A1: adjectives that can be both substantivised and adverbialised
  - all three readings (<adj>, <adj><subst> and <adj><advl>) have comparison levels
- A2: derived/not fully lexicalised adjectives without an adverbial reading
  - <adj> and <adj><subst> readings
  - have comparison levels
- A3: derived/not fully lexicalised adjectives without an adverbial reading
  - includes the so-called "predicatives" (бар 'there is', жоқ 'there is not')
  - no comparison levels at all
- A4: "pure" adjectives
  - no adverbial or substantive readings
  - no comparison levels
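One way to make the four classes explicit is a small table of boolean features. The encoding below is an illustrative assumption, not how Apertium stores them: A3 is given an <adj><subst> reading following the Chuvash table below, and placing <comp> after the other tags is also assumed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdjClass:
    subst: bool  # has an <adj><subst> reading
    advl: bool   # has an <adj><advl> reading
    comp: bool   # readings take comparison (<comp>)

ADJ_CLASSES = {
    "A1": AdjClass(subst=True, advl=True, comp=True),
    "A2": AdjClass(subst=True, advl=False, comp=True),
    # Assumption: A3 keeps <adj><subst>, following the Chuvash table.
    "A3": AdjClass(subst=True, advl=False, comp=False),
    "A4": AdjClass(subst=False, advl=False, comp=False),
}

def readings(class_name: str) -> list[str]:
    """List the tag sequences an adjective of the given class may receive."""
    c = ADJ_CLASSES[class_name]
    base = ["<adj>"]
    if c.subst:
        base.append("<adj><subst>")
    if c.advl:
        base.append("<adj><advl>")
    # Assumption: <comp> is appended after the other tags.
    comparatives = [tags + "<comp>" for tags in base] if c.comp else []
    return base + comparatives
```

Under this encoding an A4 adjective gets only <adj>, while an A1 adjective gets six tag sequences (three readings, each with and without <comp>).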
 
Examples by language
Chuvash
| Type | Example | Reading | Phrase |
|---|---|---|---|
| A1 | лайӑх | <adj> | Ку лайӑх кĕнеке. |
| | лайӑхтӑрӑх | <adj><comp> | Ку лайӑхтӑрӑхче. |
| | лайӑх | <adj><advl> | Вӑл лайӑх иҫет. |
| | лайӑхисем | <adj><subst><pl> | |
| A2 | кӑвак | <adj> | |
| | кӑвакрӑх | <adj><comp> | |
| | *кӑвак | <adj><advl> | |
| | кӑвак | <adj><subst><pl> | |
| A3 | вилĕ | <adj> | |
| | *вилĕрӑх, *вилĕтĕрĕх | <adj><comp> | |
| | *вилĕ | <adj><advl> | |
| | вилĕ | <adj><subst><pl> | |
| A4 | тĕп | <adj> | |
| | *тĕпрĕх, *тĕптĕрĕх | <adj><comp> | — |
| | *тĕп | <adj><advl> | — |
| | *тĕп | <adj><subst> | — |

Forms marked with an asterisk (*) are ungrammatical for the given reading.
Kazakh
Tatar
Turkish
Adverbs
Postpositions
TODO: "postpositions" which take poss./case are nouns
Finite verbs
Non-finite verbs
This section outlines which categories of non-finite verb forms exist in Turkic and how to identify the category created by a given affix.
Language specific issues
Turkmen: stem-final voiced and voiceless stops
In Turkmen, there are three types of stem-final stops:
- voiced stops
- voiceless stops
- stops that are voiceless syllable-finally and voiced intervocalically
TODO: finish description of this and explain how it can be / is dealt with
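Pending that description, the intended behaviour can be modelled as a three-way class on each stem. The sketch below is hypothetical: it handles only p/t/k, assumes alternating stems are stored with the voiceless stop, and uses made-up stems and suffixes. In an actual Apertium transducer this would more likely be expressed with an archiphoneme and a twol rule than in Python.

```python
from enum import Enum

class StopClass(Enum):
    VOICED = 1       # stem-final stop is always voiced
    VOICELESS = 2    # stem-final stop is always voiceless
    ALTERNATING = 3  # voiceless syllable-finally, voiced intervocalically

# Assumptions: only p/t/k are handled, and stems of the alternating class
# are stored with the voiceless stop.
VOICED_OF = {"p": "b", "t": "d", "k": "g"}
VOWELS = set("aäeiyoöuü")  # Turkmen Latin-script vowel letters (y is a vowel)

def attach(stem: str, stop_class: StopClass, suffix: str) -> str:
    """Join stem and suffix, voicing an alternating stop between vowels."""
    if (
        stop_class is StopClass.ALTERNATING
        and len(stem) >= 2
        and stem[-1] in VOICED_OF
        and stem[-2] in VOWELS      # stop is preceded by a vowel ...
        and suffix[:1] in VOWELS    # ... and followed by one
    ):
        stem = stem[:-1] + VOICED_OF[stem[-1]]
    return stem + suffix
```

For example, a hypothetical alternating stem tap gives taby before a vowel-initial suffix -y but taplar before -lar, while the same stem marked VOICELESS gives tapy.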

