Turkic lexicon
Revision as of 03:40, 20 April 2012
Some notes on how to go about making a Turkic lexicon for use in Apertium.
Layout
General points:
- The lexicon will be kept in a single file with the suffix .lexc.
- The file will be laid out in the following order:
  - The multicharacter symbols
  - The Root lexicon, pointing to the stem lexicons
  - The morphotactics (continuation lexica)
  - The stem lexicons
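As an illustration of the layout above, the following minimal Python sketch checks the section order of a .lexc file. It is a hypothetical helper, not part of Apertium or HFST. The function name `layout_problems` and its `stem_lexica` parameter are assumptions. Because lexc itself does not mark which lexicons hold stems, the caller has to name them.

```python
import re

# Matches a lexicon header such as "LEXICON N1"; group 1 is the lexicon name.
LEXICON_HEADER = re.compile(r"LEXICON\s+(\S+)")


def layout_problems(text: str, stem_lexica: set[str]) -> list[str]:
    """Report departures from the recommended .lexc section order (hypothetical helper).

    Lexicons named in `stem_lexica` are treated as stem lexicons; any other
    lexicon after the first one (which should be Root) is treated as a
    continuation lexicon (morphotactics).
    """
    problems: list[str] = []
    names: list[str] = []  # lexicon names in file order
    saw_symbols = False
    for line in text.splitlines():
        stripped = line.strip()  # comment lines start with "!" and never match below
        if stripped.startswith("Multichar_Symbols"):
            if names:
                problems.append("Multichar_Symbols must come before any LEXICON")
            saw_symbols = True
            continue
        match = LEXICON_HEADER.match(stripped)
        if match:
            names.append(match.group(1))
    if not saw_symbols:
        problems.append("no Multichar_Symbols section")
    if not names or names[0] != "Root":
        problems.append("the first lexicon must be Root")
    # Once a stem lexicon has appeared, no continuation lexicon may follow it.
    seen_stems = False
    for name in names[1:]:
        if name in stem_lexica:
            seen_stems = True
        elif seen_stems:
            problems.append(f"continuation lexicon {name} follows the stem lexicons")
    return problems
```

A file that puts a continuation lexicon after the stem lexicons, or omits the symbol declarations, gets one message per problem. An empty list means the order matches the layout above.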
Multicharacter symbols
The list of symbols should be laid out in the following order:
- The major parts of speech
- The morphological categories
- Archiphonemes
- Other symbols, e.g. the morpheme boundary, ' ', '-', etc.
Every symbol should have a comment, and the comments should be aligned in a single column.
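The ordering and alignment rules for multicharacter symbols can be applied mechanically. The sketch below is hypothetical. The category labels (`pos`, `morph`, `archiphoneme`, `other`) and the function `format_multichar_symbols` are assumptions. The tag spellings in the usage example (`%<n%>`, `%{A%}`, `%>`) only illustrate lexc's `%` escaping and are not a fixed inventory.

```python
# Hypothetical category labels, listed in the recommended order.
CATEGORY_ORDER = ["pos", "morph", "archiphoneme", "other"]


def format_multichar_symbols(entries: list[tuple[str, str, str]]) -> str:
    """Render (symbol, category, comment) triples as an ordered, aligned block."""
    rank = {category: i for i, category in enumerate(CATEGORY_ORDER)}
    # sorted() is stable, so symbols keep their relative order within a category.
    ordered = sorted(entries, key=lambda entry: rank[entry[1]])
    width = max((len(symbol) for symbol, _, _ in ordered), default=0)
    lines = ["Multichar_Symbols", ""]
    for symbol, _, comment in ordered:
        # Pad every symbol to the same width so that the "!" comments line up.
        lines.append(f"{symbol.ljust(width)}  ! {comment}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_multichar_symbols([
        ("%{A%}", "archiphoneme", "Archiphoneme (illustrative)"),
        ("%<pl%>", "morph", "Plural"),
        ("%<n%>", "pos", "Noun"),
        ("%>", "other", "Morpheme boundary"),
    ]))
```

Because the sort is stable, the order in which symbols are listed within a category is preserved. Only the category order is enforced.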
Morphotactics
Naming continuation lexica
- Continuation lexica will be named in upper case and may contain letters, numbers and the symbol '-'.
- Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
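A hypothetical check for the naming rule is shown below. It assumes that "letters" means the ASCII capitals A to Z. The name `is_valid_continuation_name` is invented for illustration.

```python
import re

# Upper-case letters, digits and "-" only; restricting letters to ASCII A to Z is an assumption.
CONTINUATION_NAME = re.compile(r"[A-Z0-9-]+")


def is_valid_continuation_name(name: str) -> bool:
    """Return True if `name` follows the naming rule for continuation lexica."""
    return CONTINUATION_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["N1", "DET-DEM", "ADV", "Nouns", "N_1"]:
        print(name, is_valid_continuation_name(name))
```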
Stem lexicons
Lines in the stem lexicons should follow this pattern:
- Left side (lexical form)
- Colon (:)
- Right side (surface form)
- Space
- Continuation lexicon
- Space
- Semicolon (;)
- Space
- Exclamation mark (!)
- Space
- Open quote (")
- Gloss (optional)
- Close quote (")
Example:
кӗнеке:кӗнек N2 ; ! "llibre, книга"
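The stem-line pattern can be produced and checked mechanically. Since `!` starts a comment in lexc, the quoted gloss is ignored when the lexicon is compiled. This sketch is hypothetical: `StemEntry`, `format_stem_line` and `parse_stem_line` are invented for illustration. It also assumes that neither side of the entry contains spaces or an escaped colon (`%:`), which real lexc entries may contain.

```python
import re
from typing import NamedTuple, Optional


class StemEntry(NamedTuple):
    lexical: str       # left side (lexical form)
    surface: str       # right side (surface form)
    continuation: str  # continuation lexicon name
    gloss: str = ""    # optional gloss; the quotes are written even when it is empty


def format_stem_line(entry: StemEntry) -> str:
    # lexical:surface, space, continuation, space, ";", space, "!", space, quoted gloss
    return f'{entry.lexical}:{entry.surface} {entry.continuation} ; ! "{entry.gloss}"'


# Inverse of format_stem_line; assumes no spaces or escaped colons inside the forms.
STEM_LINE = re.compile(r'([^:\s]+):(\S+) (\S+) ; ! "([^"]*)"')


def parse_stem_line(line: str) -> Optional[StemEntry]:
    """Return the fields of a well-formed stem line, or None if it breaks the pattern."""
    match = STEM_LINE.fullmatch(line)
    return StemEntry(*match.groups()) if match else None


if __name__ == "__main__":
    entry = StemEntry("кӗнеке", "кӗнек", "N2", "llibre, книга")
    line = format_stem_line(entry)
    print(line)
    print(parse_stem_line(line) == entry)
```

Running the parser over a stem lexicon is a quick way to find lines that drift from the layout, such as a missing space before the semicolon.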