Turkic lexicon
Revision as of 03:48, 20 April 2012
Some notes on how to go about making a Turkic lexicon for use in Apertium.
Layout
General points:
- The lexicon will be made in one file, with the suffix .lexc
- The file will be laid out in the following order:
  - The multicharacter symbols
  - The Root lexicon, pointing to the stem lexicons
  - The morphotactics (continuation lexica)
  - The stem lexicons
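As a rough sketch of this ordering, a minimal .lexc file might look like the following. The stem line is the example entry given under "Stem lexicons" on this page; the stem lexicon name Nouns and the body of N2 are placeholders assumed here only so that the file is complete, not the real morphotactics.

```lexc
! 1. The multicharacter symbols
Multichar_Symbols

%<n%>       ! Noun

! 2. The Root lexicon, pointing to the stem lexicons
LEXICON Root

Nouns ;     ! "Nouns" is a hypothetical stem lexicon name

! 3. The morphotactics (continuation lexica)
LEXICON N2

%<n%>: # ;  ! placeholder body: add the noun tag, then end the word

! 4. The stem lexicons
LEXICON Nouns

кӗнеке:кӗнек N2 ; ! "llibre, книга"
```

Lexicons may be referenced before they are defined, so the Root lexicon can come first and the stem lexicons last.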
Multicharacter symbols
Morphological categories must be encased in < and > tags. They may contain the letters a-z and numbers 0-9. In extreme cases they may include the letters A-Z. They must begin with a letter, not a number. In the .lexc file the angle brackets are escaped with %, as in the examples below.
Examples:
- %<n%> Noun
- %<p3%> Third person
- %<evid%> Evidential
For information on archiphonemes, see the corresponding page.
The list of symbols should be laid out in the following order:
- The major parts of speech
- The morphological categories
- Archiphonemes
- Other symbols, e.g. Morpheme boundary, ' ', '-' etc.
Every symbol should have a comment. The comments should line up.
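A symbol list following this order might be sketched as below. Only %<n%>, %<p3%> and %<evid%> come from this page; the %<v%> tag, the archiphoneme %{A%} and the morpheme boundary notation %> are assumptions made for illustration.

```lexc
Multichar_Symbols

! Major parts of speech
%<n%>       ! Noun
%<v%>       ! Verb (assumed tag)

! Morphological categories
%<p3%>      ! Third person
%<evid%>    ! Evidential

! Archiphonemes
%{A%}       ! Hypothetical archiphoneme, for illustration only

! Other symbols
%>          ! Morpheme boundary (assumed notation)
```

Each comment starts with ! in the same column, so the comments line up whatever the length of the symbol.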
Morphotactics
Naming continuation lexica
- Continuation lexica will be named in upper case, and may contain letters, numbers and the symbol -.
  - Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
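In the morphotactics section of the file, lexica named this way could be declared roughly as follows. This is a fragment, not a complete file. The tags other than %<n%> and the one-line bodies are placeholders assumed for illustration; real continuation lexica would point on to further lexica for the suffixes.

```lexc
! Continuation lexica: upper-case names built from letters, numbers and "-"

LEXICON N1

%<n%>: # ;             ! placeholder: add the noun tag, then end the word

LEXICON DET-DEM

%<det%>%<dem%>: # ;    ! placeholder: assumed determiner and demonstrative tags

LEXICON ADV

%<adv%>: # ;           ! placeholder: assumed adverb tag
```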
Stem lexicons
Lines in the stem lexicons should follow this pattern:
- Left side (lexical form)
- Colon :
- Right side (surface form)
- Space
- Continuation lexicon
- Space
- Semicolon ;
- Space
- Exclamation mark
- Open quote "
- Gloss (optional)
- Close quote "
Example:
кӗнеке:кӗнек N2 ; ! "llibre, книга"