Difference between revisions of "Turkic lexicon"
Revision as of 13:01, 14 July 2012
Some notes on how to go about making a Turkic lexicon for use in Apertium.
Layout
General points:
- The lexicon will be made in one file, which will have the suffix .lexc
- The file will be laid out in the following order:
  - The multicharacter symbols
  - The Root lexicon, pointing to the stem lexicons
  - The morphotactics (continuation lexica)
  - The stem lexicons
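The ordering above might look like the following skeleton. This is only a sketch: the lexicon names Nouns and N1, the tag %<nom%> and the stem entry are placeholders chosen for illustration, not names prescribed by these notes.

```lexc
! 1. Multicharacter symbols
Multichar_Symbols

%<n%>      ! Noun
%<nom%>    ! Nominative (hypothetical tag, for illustration only)

! 2. Root lexicon, pointing to the stem lexicons
LEXICON Root

Nouns ;

! 3. Morphotactics (continuation lexica)
LEXICON N1

%<n%>%<nom%>: # ;        ! adds the tags on the lexical side, nothing on the surface

! 4. Stem lexicons
LEXICON Nouns

stem:stem N1 ; ! "placeholder entry"
```

Keeping the stems at the end means that the part of the file that grows fastest stays separate from the symbols and morphotactics, which change less often.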
Multicharacter symbols
Morphological categories must be encased in < and > tags. They may contain the letters a-z and the numbers 0-9; in extreme cases they may also include the letters A-Z. They must begin with a letter, not a number.
Examples:
- %<n%>: Noun
- %<p3%>: Third person
- %<evid%>: Evidential
For information on archiphonemes, see the corresponding page.
The list of symbols should be laid out in the following order:
- The major parts of speech
- The morphological categories
- Archiphonemes
- Other symbols, e.g. Morpheme boundary, ' ', '-' etc.
Every symbol should have a comment. The comments should line up.
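A symbol section following this order could look like the sketch below. The three tags are the examples given above; the archiphoneme %{A%} and the morpheme boundary %> are assumed notations used only for illustration, since the actual conventions are described on the archiphoneme page rather than here.

```lexc
Multichar_Symbols

! Major parts of speech
%<n%>       ! Noun

! Morphological categories
%<p3%>      ! Third person
%<evid%>    ! Evidential

! Archiphonemes (notation assumed, for illustration only)
%{A%}       ! Hypothetical archiphoneme

! Other symbols (notation assumed, for illustration only)
%>          ! Morpheme boundary
```

The comments start in the same column, so the list can be scanned quickly when checking whether a symbol has already been declared.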
Morphotactics
Naming continuation lexica
- Continuation lexica will be named in upper case, and may contain letters, numbers and the symbol -.
  - Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
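As a rough illustration of how such names appear in a file, the fragment below declares the three example lexica. Their contents are hypothetical: the tags %<det%>, %<dem%> and %<adv%> are assumed for the sake of the example and would need to be declared under Multichar_Symbols, and each lexicon simply ends the word with #.

```lexc
LEXICON N1

%<n%>: # ;            ! a first noun class

LEXICON DET-DEM

%<det%>%<dem%>: # ;   ! demonstrative determiners (hypothetical tags)

LEXICON ADV

%<adv%>: # ;          ! adverbs (hypothetical tag)
```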
What sorts of distinctions to make
TODO: TV vs. IV, Russian vs. non-Russian in Chuvash
Stem lexicons
TODO: Why stems go in lexicon and not infinitives
Lines in the stem lexicons should follow this pattern:
- Left side (lexical form)
- Colon :
- Right side (surface form)
- Space
- Continuation lexicon
- Space
- Semicolon ;
- Space
- Exclamation mark !
- Open quote "
- Gloss (optional)
- Close quote "
Example:
кӗнеке:кӗнек N2 ; ! "llibre, книга"
Morphophonology
TODO: px3 is sIn (and why)
Categorisation
Nominals
Compound Nouns
TODO: N-N compounds with <px3>
Adjectives
- A1: adjectives that can be both substantivised and adverbialised;
  - All three readings (<adj>, <adj.subst> and <adj.advl>)
  - have comparison levels.
- A2: derived/not fully lexicalised adjectives without adverbial reading
  - <adj> and <adj.subst> readings
  - have comparison levels.
- A3: derived/not fully lexicalised adjectives without adverbial reading
  - so-called "predicatives" (бар, жоқ)
  - no comparison levels at all.
- A4: "pure" adjectives
  - no adverbial and substantive readings,
  - no comparison levels;
Examples by language
Chuvash
Type | Example | Reading | Phrase |
---|---|---|---|
A1 | лайӑх "good" | <adj> | Ку лайӑх кĕнеке. |
 | лайӑхтарах | <adj><comp> | Ку лайӑхтарахчĕ. |
 | лайӑх | <adj><advl> | Вӑл лайӑх ишет. |
 | лайӑххисем | <adj><subst><pl> | |
A2 | кӑвак "blue" | <adj> | |
 | кӑвакрах | <adj><comp> | |
 | *кӑвак | <adj><advl> | |
 | кӑвак | <adj><subst><pl> | |
A3 | вилĕ "dead" | <adj> | |
 | *вилĕрех, *вилĕтерех | <adj><comp> | |
 | *вилĕ | <adj><advl> | |
 | вилĕ | <adj><subst><pl> | |
A4 | тĕп "main" | <adj> | |
 | *тĕпрех, *тĕптерех | <adj><comp> | — |
 | *тĕп | <adj><advl> | — |
 | *тĕп | <adj><subst> | — |
Kazakh
Tatar
Turkish
Adverbs
Postpositions
TODO: "postpositions" which take poss./case are nouns
Finite verbs
Non-finite verbs
This section outlines what categories of non-finite verb forms exist in Turkic, and how to identify the type of category created by a given affix.
Language specific issues
Turkmen: stem-final voiced and voiceless stops
In Turkmen, there are three types of stem-final stops:
- voiced stops
- voiceless stops
- stops that are voiceless syllable-finally and voiced intervocalically
TODO: finish description of this and explain how it can be / is dealt with