Turkic lexicon
Some notes on how to go about making a Turkic lexicon for use in Apertium.
Layout
General points:
- The lexicon will be made in one file, with the suffix .lexc
- The file will be laid out in the following order:
  - The multicharacter symbols
  - The Root lexicon, pointing to the stem lexicons
  - The morphotactics (continuation lexica)
  - The stem lexicons
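The ordering above can be sketched as a minimal .lexc file. This is only an illustration of the layout: the stem lexicon name Nouns and the contents of N2 are assumptions made for the sketch, and a real file would hold many more symbols and lexica.

```lexc
! 1. The multicharacter symbols
Multichar_Symbols

%<n%>    ! Noun

! 2. The Root lexicon, pointing to the stem lexicons
LEXICON Root

Nouns ;    ! "Nouns" is a hypothetical stem lexicon name

! 3. The morphotactics (continuation lexica)
LEXICON N2

%<n%>: # ;    ! add the noun tag, then end the word (simplified)

! 4. The stem lexicons
LEXICON Nouns

кӗнеке:кӗнек N2 ; ! "llibre, книга"
```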
 
Multicharacter symbols
Morphological categories must be enclosed in < and >. They may contain the letters a-z and the digits 0-9; in extreme cases they may also include the letters A-Z. They must begin with a letter, not a digit. In the .lexc file the angle brackets are escaped with %, as in the examples below.
Examples:
- %<n%> Noun
- %<p3%> Third person
- %<evid%> Evidential
For information on archiphonemes, see the corresponding page.
The list of symbols should be laid out in the following order:
- The major parts of speech
- The morphological categories
- Archiphonemes
- Other symbols, e.g. morpheme boundary, ' ', '-', etc.
Every symbol should have a comment. The comments should line up.
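As a rough sketch of what such a section might look like, the block below follows the ordering given above and lines the comments up in one column. The three tags come from the examples on this page; the archiphoneme %{A%} and the morpheme boundary %> are assumed notations used only for illustration and should be checked against the archiphoneme page.

```lexc
Multichar_Symbols

! Major parts of speech
%<n%>       ! Noun

! Morphological categories
%<p3%>      ! Third person
%<evid%>    ! Evidential

! Archiphonemes (assumed example symbol)
%{A%}       ! Vowel that alternates by harmony

! Other symbols (assumed notation)
%>          ! Morpheme boundary
```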
Morphotactics
Naming continuation lexica
- Continuation lexica will be named in upper case, and may contain letters, numbers and the symbol -.
  - Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
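A short fragment showing how lexica named this way might appear in the morphotactics section. The tags %<det%>, %<dem%> and %<adv%> are assumptions for the sake of the example and would need to be declared under Multichar_Symbols; each lexicon here simply adds its tags and ends the word, which is much simpler than real morphotactics.

```lexc
LEXICON N1

%<n%>: # ;             ! noun tag, then end of word

LEXICON DET-DEM

%<det%>%<dem%>: # ;    ! hypothetical tags for a demonstrative determiner

LEXICON ADV

%<adv%>: # ;           ! hypothetical adverb tag
```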
Stem lexicons
TODO: Why stems go in lexicon and not infinitives
Lines in the stem lexicons should follow this pattern:
- Left side (lexical form)
- Colon :
- Right side (surface form)
- Space 
- Continuation lexicon
- Space 
- Semicolon ;
- Space 
- Exclamation mark !
- Open quote "
- Gloss (optional)
- Close quote "
Example:
кӗнеке:кӗнек N2 ; ! "llibre, книга"
Morphophonology
Categorisation
Nominals
Compound Nouns
TODO: N-N compounds with <px3>
Finite verbs
Non-finite verbs
Language specific issues
Turkmen: stem-final voiced and voiceless stops
In Turkmen, there are three types of stem-final stops:
- voiced stops
- voiceless stops
- stops that are voiceless syllable-finally and voiced intervocalically
TODO: finish description of this and explain how it can be / is dealt with

