Turkic lexicon

From Apertium
Revision as of 04:09, 20 April 2012

Some notes on how to go about making a Turkic lexicon for use in Apertium.

Layout

General points:

  • The lexicon will be kept in a single file with the suffix .lexc
  • The file will be laid out in the following order:
    1. The multicharacter symbols
    2. The Root lexicon, pointing to the stem lexicons
    3. The morphotactics (continuation lexica)
    4. The stem lexicons
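The ordering above can be sketched as a minimal .lexc file. This is only an illustration: the stem lexicon name Nouns and the contents of N2 are placeholders assumed for the sketch, not part of these notes. The stem line is the example given under "Stem lexicons".

```lexc
! 1. The multicharacter symbols
Multichar_Symbols

%<n%>        ! Noun

! 2. The Root lexicon, pointing to the stem lexicons
LEXICON Root

Nouns ;

! 3. The morphotactics (continuation lexica)
! The contents of N2 here are a placeholder.
LEXICON N2

%<n%>: # ;

! 4. The stem lexicons
! "Nouns" is a hypothetical name.
LEXICON Nouns

кӗнеке:кӗнек N2 ; ! "llibre, книга"
```

In this sketch each part refers only to lexicons defined further down the file, so the file can be read top to bottom from Root to the individual stems.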

Multicharacter symbols

Morphological categories must be enclosed in < and >. In the .lexc file both characters are escaped with %, as in the examples below. Tag names may contain the letters a-z and the digits 0-9; in extreme cases they may also include the letters A-Z. They must begin with a letter, not a digit.

Examples:

  • %<n%> Noun
  • %<p3%> Third person
  • %<evid%> Evidential

For information on archiphonemes, see the corresponding page.

The list of symbols should be laid out in the following order:

  • The major parts of speech
  • The morphological categories
  • Archiphonemes
  • Other symbols, e.g. morpheme boundary, ' ', '-', etc.

Every symbol should have a comment. The comments should line up.
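A symbol list following this order, with aligned comments, might look like the sketch below. The three tags are the examples given above; the archiphoneme %{I%} and the morpheme boundary %> are assumed notations used only for illustration.

```lexc
Multichar_Symbols

! Major parts of speech
%<n%>       ! Noun

! Morphological categories
%<p3%>      ! Third person
%<evid%>    ! Evidential

! Archiphonemes (assumed notation)
%{I%}       ! Archiphoneme I

! Other symbols (assumed notation)
%>          ! Morpheme boundary
```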

Morphotactics

Naming continuation lexica

  • Continuation lexica will be named in upper case, and may contain letters, numbers and the symbol -.
    • Examples: LEXICON N1, LEXICON DET-DEM, LEXICON ADV
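As a rough sketch, continuation lexica with these names could be declared as follows. The entries inside each lexicon, and the tags %<adv%>, %<det%> and %<dem%>, are hypothetical; the fragment also assumes that every tag has been declared under Multichar_Symbols.

```lexc
! Names are upper case and may contain letters, numbers and "-".

LEXICON N1
%<n%>: # ;              ! adds the noun tag and ends the word

LEXICON DET-DEM
%<det%>%<dem%>: # ;     ! hypothetical tags

LEXICON ADV
%<adv%>: # ;            ! hypothetical tag
```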

Stem lexicons

TODO: Why stems go in lexicon and not infinitives

Lines in the stem lexicons should follow this pattern:

  • Left side (lexical form)
  • Colon :
  • Right side (surface form)
  • Space
  • Continuation lexicon
  • Space
  • Semicolon ;
  • Space
  • Exclamation mark !
  • Open quote "
  • Gloss (optional)
  • Close quote "

Example:

кӗнеке:кӗнек N2 ; ! "llibre, книга"

Morphophonology

TODO: <px3> = {s}{I}{n} (and why)

Categorisation

Nominals

Compound Nouns

TODO: N-N compounds with <px3>


Finite verbs

Non-finite verbs

Language specific issues

Turkmen: stem-final voiced and voiceless stops

In Turkmen, there are three types of stem-final stops:

  • voiced stops
  • voiceless stops
  • stops that are voiceless syllable-finally and voiced intervocalically

TODO: finish the description of this and explain how it can be (or is) dealt with