Turkic lexicon

From Apertium

Revision as of 03:40, 20 April 2012

Some notes on how to build a Turkic lexicon for use in Apertium.

Layout

General points:

  • The lexicon is kept in a single file with the extension .lexc.
  • The file will be laid out in the following order:
    1. The multicharacter symbols
    2. The Root lexicon, pointing to the stem lexicons
    3. The morphotactics (continuation lexica)
    4. The stem lexicons
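A minimal skeleton of a .lexc file laid out in this order could look like the sketch below. It is only an illustration: the tags %<n%> and %<nom%> (angle brackets escaped with %, as is usual in Apertium lexc files), the stem lexicon name Nouns and the contents of N2 are assumptions. The stem entry itself is the example given under Stem lexicons.

```lexc
! 1. Multicharacter symbols
Multichar_Symbols

%<n%>       ! Noun (assumed tag)
%<nom%>     ! Nominative (assumed tag)

! 2. Root lexicon, pointing to the stem lexicons
LEXICON Root

Nouns ;

! 3. Morphotactics (continuation lexica)
LEXICON N2

! Hypothetical content: add the noun and nominative tags, then end the word
%<n%>%<nom%>: # ;

! 4. Stem lexicons
LEXICON Nouns

кӗнеке:кӗнек N2 ; ! "llibre, книга"
```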

Multicharacter symbols

The list of symbols should be laid out in the following order:

  • The major parts of speech
  • The morphological categories
  • Archiphonemes
  • Other symbols, e.g. the morpheme boundary, ' ' and '-'

Every symbol should have a comment, and the comments should be aligned in a single column.
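A Multichar_Symbols block following this order, with one comment per symbol and the comments aligned, might look like the sketch below. The particular tags and archiphonemes are assumptions chosen for illustration, not a fixed inventory for any Turkic language.

```lexc
Multichar_Symbols

! Major parts of speech
%<n%>           ! Noun
%<v%>           ! Verb
%<adj%>         ! Adjective

! Morphological categories
%<pl%>          ! Plural
%<nom%>         ! Nominative
%<gen%>         ! Genitive

! Archiphonemes (assumed names)
%{A%}           ! Low vowel, realised according to vowel harmony
%{I%}           ! High vowel, realised according to vowel harmony

! Other symbols
%>              ! Morpheme boundary
```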

Morphotactics

Naming continuation lexica

  • Continuation lexica are named in upper case and may contain letters, numbers and the hyphen (-).
    • Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
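Continuation lexica named according to these rules could look like the sketch below. N1, DET-DEM and ADV are the names listed above; N-CASE, the comments describing each lexicon and the entries inside them are hypothetical, and the tags are assumed to be declared in Multichar_Symbols.

```lexc
LEXICON N1              ! hypothetical: a noun class

%<n%>: N-CASE ;

LEXICON N-CASE          ! hypothetical name, hyphen joins the parts

%<nom%>: # ;

LEXICON DET-DEM         ! demonstrative determiners

%<det%>%<dem%>: # ;

LEXICON ADV             ! adverbs

%<adv%>: # ;
```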

Stem lexicons

Lines in the stem lexicons should follow this pattern:

  • Left side (lexical form)
  • Colon :
  • Right side (surface form)
  • Space
  • Continuation lexicon
  • Space
  • Semicolon ;
  • Space
  • Exclamation mark !
  • Space
  • Open quote "
  • Gloss (optional)
  • Close quote "

Example:

кӗнеке:кӗнек N2 ; ! "llibre, книга"

Categorisation

Nominals

Finite verbs

Non-finite verbs

Language-specific issues