Difference between revisions of "Turkic lexicon"


Revision as of 13:01, 14 July 2012

Some notes on how to go about making a Turkic lexicon for use in Apertium.

Layout

General points:

  • The lexicon will be kept in a single file with the suffix .lexc
  • The file will be laid out in the following order:
    1. The multicharacter symbols
    2. The Root lexicon, pointing to the stem lexicons
    3. The morphotactics (continuation lexica)
    4. The stem lexicons

Multicharacter symbols

Morphological categories must be enclosed in < and >. They may contain the letters a-z and the digits 0-9; in extreme cases they may also include the letters A-Z. They must begin with a letter, never with a digit. In the .lexc file the angle brackets are escaped with %, as in the examples below.

Examples:

  • %<n%> Noun
  • %<p3%> Third person
  • %<evid%> Evidential
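
The naming rule for category symbols can be checked mechanically. The following is a minimal sketch in Python, assuming the symbols are written in their escaped lexc form (%<...%>); the function name is_valid_tag and the allow_upper switch are hypothetical and only illustrate the rule as stated above.

```python
import re

# Assumption: tags appear in escaped lexc form, e.g. %<n%>.
# Strict form: lower-case letters and digits, starting with a letter.
STRICT_TAG = re.compile(r"%<[a-z][a-z0-9]*%>")
# Lenient form for the "extreme cases" where A-Z is tolerated.
LENIENT_TAG = re.compile(r"%<[A-Za-z][A-Za-z0-9]*%>")


def is_valid_tag(symbol, allow_upper=False):
    """Return True if symbol follows the tag naming rule."""
    pattern = LENIENT_TAG if allow_upper else STRICT_TAG
    return pattern.fullmatch(symbol) is not None
```

With this sketch, %<n%> and %<p3%> pass, while a tag beginning with a digit such as %<3p%> is rejected.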

For information on archiphonemes, see the corresponding page.

The list of symbols should be laid out in the following order:

  • The major parts of speech
  • The morphological categories
  • Archiphonemes
  • Other symbols, e.g. the morpheme boundary, ' ', '-', etc.

Every symbol should have a comment. The comments should line up.
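
One way to keep the comments lined up is to pad every symbol to the width of the longest one before writing the comment. The helper below is a small sketch of that idea; the name align_comments is hypothetical, and it relies only on the fact that ! starts a comment in lexc.

```python
def align_comments(entries):
    """Format (symbol, comment) pairs so that all "!" comments share a column."""
    # Width of the longest symbol; default=0 keeps an empty list from failing.
    width = max((len(symbol) for symbol, _ in entries), default=0)
    return [f"{symbol.ljust(width)} ! {comment}" for symbol, comment in entries]


lines = align_comments([
    ("%<n%>", "Noun"),
    ("%<p3%>", "Third person"),
    ("%<evid%>", "Evidential"),
])
```

Each resulting line has its ! in the same column, regardless of how long the symbol is.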

Morphotactics

Naming continuation lexica

  • Continuation lexica will be named in upper case, and may contain letters, numbers and the symbol -.
    • Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
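
The naming convention for continuation lexica can be expressed as a single pattern. This is a sketch under the assumption that "letters" means the Latin letters A-Z; the function name is hypothetical, and the special Root lexicon is deliberately outside the rule.

```python
import re

# Assumption: only upper-case Latin letters, digits and "-" are allowed.
LEXICON_NAME = re.compile(r"[A-Z0-9-]+")


def is_valid_lexicon_name(name):
    """Return True if name follows the continuation-lexicon naming convention."""
    return LEXICON_NAME.fullmatch(name) is not None
```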

What sorts of distinctions to make

TODO: TV vs. IV, Russian vs. non-Russian in Chuvash

Stem lexicons

TODO: Why stems go in lexicon and not infinitives

Each line in a stem lexicon should follow this pattern:

  • Left side (lexical form)
  • Colon :
  • Right side (surface form)
  • Space
  • Continuation lexicon
  • Space
  • Semicolon ;
  • Space
  • Exclamation mark ! (starts the comment)
  • Space
  • Open quote "
  • Gloss (optional)
  • Close quote "

Example:

кӗнеке:кӗнек N2 ; ! "llibre, книга"
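
The line pattern can also be written as a regular expression, which is handy for spotting malformed entries. The sketch below assumes that continuation lexicon names consist of upper-case Latin letters, digits and -, and it accepts an optional space between ! and the opening quote because the example line has one; the name parse_stem_line is hypothetical.

```python
import re

# Assumptions: lexicon names are [A-Z0-9-]+, and a single optional space
# may separate "!" from the opening quote (as in the example entry).
STEM_LINE = re.compile(
    r'(?P<lexical>[^:\s]+):(?P<surface>[^:\s]+) '
    r'(?P<continuation>[A-Z0-9-]+) ; ! ?"(?P<gloss>[^"]*)"'
)


def parse_stem_line(line):
    """Return the fields of a stem-lexicon line as a dict, or None if malformed."""
    match = STEM_LINE.fullmatch(line.strip())
    return match.groupdict() if match else None


entry = parse_stem_line('кӗнеке:кӗнек N2 ; ! "llibre, книга"')
```

A line with a missing space before the semicolon, for instance, does not match and yields None.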

Morphophonology

TODO: px3 is sIn (and why)

Categorisation

Nominals

Compound Nouns

TODO: N-N compounds with <px3>

Adjectives

  • A1: adjectives that can be both substantivised and adverbialised
    • all three readings (<adj>, <adj.subst> and <adj.advl>)
    • have comparison levels
  • A2: derived/not fully lexicalised adjectives without an adverbial reading
    • <adj> and <adj.subst> readings
    • have comparison levels
  • A3: derived/not fully lexicalised adjectives without an adverbial reading
    • so-called "predicatives" (бар, жоқ)
    • no comparison levels at all
  • A4: "pure" adjectives
    • no adverbial or substantive readings
    • no comparison levels
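
The four classes can be summarised as three yes/no properties. The sketch below encodes them in Python; the names AdjClass, ADJ_CLASSES and readings are hypothetical, and the substantive reading for A3 is an assumption based on the Chuvash examples rather than on the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdjClass:
    substantive: bool  # has a substantivised reading
    adverbial: bool    # has an adverbial reading
    comparison: bool   # has comparison levels


ADJ_CLASSES = {
    "A1": AdjClass(substantive=True, adverbial=True, comparison=True),
    "A2": AdjClass(substantive=True, adverbial=False, comparison=True),
    # Assumption: A3 keeps the substantive reading (inferred from the examples).
    "A3": AdjClass(substantive=True, adverbial=False, comparison=False),
    "A4": AdjClass(substantive=False, adverbial=False, comparison=False),
}


def readings(name):
    """List the tag sequences that a class is expected to allow."""
    cls = ADJ_CLASSES[name]
    result = ["<adj>"]
    if cls.substantive:
        result.append("<adj><subst>")
    if cls.adverbial:
        result.append("<adj><advl>")
    if cls.comparison:
        result.append("<adj><comp>")
    return result
```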

Examples by language

Chuvash
Type  Example                Reading            Phrase
A1    лайӑх "good"           <adj>              Ку лайӑх кĕнеке.
      лайӑхтарах             <adj><comp>        Ку лайӑхтарахчĕ.
      лайӑх                  <adj><advl>        Вӑл лайӑх ишет.
      лайӑххисем             <adj><subst><pl>
A2    кӑвак "blue"           <adj>
      кӑвакрах               <adj><comp>
      *кӑвак                 <adj><advl>
      кӑвак                  <adj><subst><pl>
A3    вилĕ "dead"            <adj>
      *вилĕрех, *вилĕтерех   <adj><comp>
      *вилĕ                  <adj><advl>
      вилĕ                   <adj><subst><pl>
A4    тĕп "main"             <adj>
      *тĕпрех, *тĕптерех     <adj><comp>
      *тĕп                   <adj><advl>
      *тĕп                   <adj><subst>
Kazakh
Tatar
Turkish

Adverbs

Postpositions

TODO: "postpositions" which take poss./case are nouns

Finite verbs

Non-finite verbs

This section outlines what categories of non-finite verb forms exist in Turkic, and how to identify the type of category created by a given affix.

Language specific issues

Turkmen: stem-final voiced and voiceless stops

In Turkmen, there are three types of stem-final stops:

  • voiced stops
  • voiceless stops
  • stops that are voiceless syllable-finally and voiced intervocalically

TODO: finish description of this and explain how it can be / is dealt with

Chuvash: Russian loans ending in -a with non-final stress