Difference between revisions of "Apertium-specific conventions for lexc"

Revision as of 22:49, 28 September 2014

For Apertium, we use lexc to build certain transducers. We follow a few Apertium-specific conventions, outlined below.

Preferred format for stem definitions

The preferred format for stem definitions is underlying:surface CLASS ; ! "gloss", optionally followed by one or more conditions. For example:

бул:бу DET-DEM ; ! "this" ! Dir/LR
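As a rough illustration of the format, the Python sketch below splits a stem line into its parts. The names StemEntry and parse_stem are hypothetical, and the regular expression only covers the simple underlying:surface CLASS ; ! "gloss" ! Condition shape shown above. It does not handle escaped colons, entries without a colon, or glosses that themselves contain "!".

```python
import re
from typing import List, NamedTuple, Optional

class StemEntry(NamedTuple):
    underlying: str        # upper side, e.g. бул
    surface: str           # lower side, e.g. бу
    cont_class: str        # continuation class, e.g. DET-DEM
    gloss: Optional[str]   # text of the quoted gloss, if present
    conditions: List[str]  # remaining comment fields, e.g. ["Dir/LR"]

# Hypothetical pattern for: underlying:surface CLASS ; <comment>
STEM_RE = re.compile(
    r"^\s*(?P<upper>[^:\s]+):(?P<lower>\S*)\s+(?P<cls>\S+)\s*;\s*(?P<comment>.*)$"
)

def parse_stem(line: str) -> StemEntry:
    m = STEM_RE.match(line)
    if m is None:
        raise ValueError(f"not in 'underlying:surface CLASS ;' format: {line!r}")
    # Comment fields are separated by "!"; empty pieces are dropped.
    fields = [f.strip() for f in m.group("comment").split("!") if f.strip()]
    gloss = None
    if fields and len(fields[0]) >= 2 and fields[0][0] == fields[0][-1] == '"':
        gloss = fields.pop(0)[1:-1]  # strip the surrounding quotes
    return StemEntry(m.group("upper"), m.group("lower"), m.group("cls"), gloss, fields)

entry = parse_stem('бул:бу DET-DEM ; ! "this" ! Dir/LR')
print(entry)
```

Treating everything after the gloss as a list of conditions keeps the parser agnostic about which conditions exist.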

Morpheme boundary

We use %> as a morpheme boundary indicator in lexc.
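Because % is the escape character in lexc, %> stands for a literal > symbol in the string. The hypothetical sketch below, using the invented form book%>s, shows what the boundary marks: splitting on it recovers the morphemes, while deleting it naively approximates the surface string. In a real grammar the boundary would typically be removed by the phonological rules (for example twol), not by plain string replacement.

```python
from typing import List

BOUNDARY = "%>"  # lexc source spelling of the literal ">" boundary symbol

def split_morphemes(form: str) -> List[str]:
    """Split a lexc-escaped form at morpheme boundaries."""
    return form.split(BOUNDARY)

def strip_boundaries(form: str) -> str:
    """Naively delete boundaries; real grammars do this with phonological rules."""
    return form.replace(BOUNDARY, "")

print(split_morphemes("book%>s"))   # ['book', 's']
print(strip_boundaries("book%>s"))  # books
```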

Conditions

There are a few special conditions we use: ! Dir/LR, ! Dir/RL, and ! Use/MT. These let us grep out lines so that we can build separate left-to-right and right-to-left transducers, as well as separate MT-specific and vanilla transducers. Otherwise, lexc simply treats them as comments.
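The filtering can be sketched in Python as below, roughly the equivalent of running grep -v 'Dir/RL' (or grep -v 'Dir/LR') over the lexc file. The function filter_lexc is hypothetical. It assumes that a line tagged Dir/XX belongs only to the XX transducer and that Use/MT lines are kept only in the MT-specific build; the exact rules in a given language module may differ.

```python
from typing import Iterable, Iterator

def filter_lexc(lines: Iterable[str], direction: str, mt: bool) -> Iterator[str]:
    """Yield the lexc lines for one transducer variant (hypothetical helper).

    Assumes a line tagged "! Dir/XX" belongs only to direction XX, and that
    lines tagged "! Use/MT" are kept only when building the MT transducer.
    """
    if direction not in ("LR", "RL"):
        raise ValueError("direction must be 'LR' or 'RL'")
    opposite = "RL" if direction == "LR" else "LR"
    for line in lines:
        if f"! Dir/{opposite}" in line:
            continue  # reserved for the other direction
        if not mt and "! Use/MT" in line:
            continue  # MT-only entry, left out of the vanilla transducer
        yield line

lexc_lines = [
    'бул:бу DET-DEM ; ! "this" ! Dir/LR',  # example entry from above
    'foo:foo N ; ! "foo" ! Use/MT',         # hypothetical MT-only entry
]
print(list(filter_lexc(lexc_lines, "RL", mt=False)))  # []
```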

Bracketed multi-character symbols

We define certain types of multi-character symbols for various purposes.

Archiphonemes

Archiphonemes are defined with curly braces, e.g. %{A%}.

Features

Features are defined with square brackets, e.g. %[%-coop%].
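The Python sketch below splits a lexc string into symbols, treating %{...%} as an archiphoneme, %[...%] as a feature and %> as the morpheme boundary. It assumes bracketed symbols are self-delimiting, which is a simplification: lexc itself matches against the symbols declared in Multichar_Symbols. The function tokenize and the input x%{A%}y%[%-coop%]%>z are invented for illustration.

```python
import re
from typing import List, Tuple

# Named alternatives are tried left to right; bracketed forms come first.
SYMBOL_RE = re.compile(
    r"(?P<archiphoneme>%\{.+?%\})"  # e.g. %{A%}
    r"|(?P<feature>%\[.+?%\])"      # e.g. %[%-coop%]
    r"|(?P<boundary>%>)"            # morpheme boundary
    r"|(?P<char>%.|.)",             # any other escaped or plain character
    re.DOTALL,
)

def tokenize(form: str) -> List[Tuple[str, str]]:
    """Return (symbol, kind) pairs; kind is the name of the matching group."""
    return [(m.group(), m.lastgroup) for m in SYMBOL_RE.finditer(form)]

for symbol, kind in tokenize("x%{A%}y%[%-coop%]%>z"):
    print(kind, symbol)
```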
