Apertium-specific conventions for lexc

From Apertium

For Apertium, we use lexc for certain transducers. We follow some Apertium-specific conventions, outlined below. This wiki also has a comparison of lttoolbox and lexc formats.

Preferred format for stem definitions

The preferred format for stem definitions is underlying:surface CLASS ; ! "gloss", optionally followed by conditions, for example:

бул:бу DET-DEM ; ! "this" ! Dir/LR
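
To show how the fields of this format line up, here is a minimal Python sketch that splits one stem line into its parts. The regular expression, the StemEntry class and parse_stem are hypothetical helpers, not part of any Apertium tool, and they assume the simple one-entry-per-line layout shown above (in particular, no escaped colon inside the underlying form).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical pattern for:  underlying:surface CLASS ; ! "gloss" ! Cond ...
# Simplification: the underlying form may not contain a colon or whitespace.
STEM_RE = re.compile(
    r'^\s*(?P<underlying>[^:\s]+):(?P<surface>\S+)\s+'  # underlying:surface
    r'(?P<cls>\S+)\s*;\s*'                              # CLASS ;
    r'(?:!\s*"(?P<gloss>[^"]*)")?'                      # optional ! "gloss"
    r'(?P<rest>.*)$'                                    # trailing conditions
)


@dataclass
class StemEntry:
    underlying: str
    surface: str
    cls: str
    gloss: Optional[str]
    conditions: List[str] = field(default_factory=list)


def parse_stem(line: str) -> Optional[StemEntry]:
    """Split one stem definition into its fields; None if it does not match."""
    m = STEM_RE.match(line)
    if m is None:
        return None
    # Everything after the gloss is a run of "! Condition" comments.
    conditions = [c.strip() for c in m.group("rest").split("!") if c.strip()]
    return StemEntry(
        underlying=m.group("underlying"),
        surface=m.group("surface"),
        cls=m.group("cls"),
        gloss=m.group("gloss"),
        conditions=conditions,
    )
```

Applied to the example line above, parse_stem yields the underlying form "бул", the surface form "бу", the class DET-DEM, the gloss "this" and the condition list ["Dir/LR"]. Lines that are not stem definitions, such as LEXICON headers, return None.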

Morpheme boundary

We use %> as a morpheme boundary indicator in lexc.
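
Because tags also end in %>, a script that looks for morpheme boundaries has to skip over whole tags before treating a bare %> as a boundary. The Python sketch below does that; split_morphemes, strip_boundaries and the form kitap%>lar are hypothetical illustrations, and other lexc escapes are ignored.

```python
import re

# A tag (%<...%>) is consumed whole so that its closing %> is not
# mistaken for a morpheme boundary; a bare %> is a boundary.
TOKEN_RE = re.compile(r"%<.*?%>|%>")


def split_morphemes(form: str) -> list[str]:
    """Split a lexc-level string at each bare %> boundary."""
    morphemes, start = [], 0
    for m in TOKEN_RE.finditer(form):
        if m.group(0) == "%>":
            morphemes.append(form[start:m.start()])
            start = m.end()
    morphemes.append(form[start:])
    return morphemes


def strip_boundaries(form: str) -> str:
    """Delete bare %> boundaries but leave tags such as %<pl%> intact."""
    return TOKEN_RE.sub(lambda m: "" if m.group(0) == "%>" else m.group(0), form)
```

For example, split_morphemes("kitap%>lar") gives ["kitap", "lar"], while a tag sequence like %<n%>%<pl%> is left as a single piece.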

Conditions

There are a few special conditions we use: ! Dir/LR, ! Dir/RL, and ! Use/MT. These let us filter lines out with grep, so that we can build different left-to-right and right-to-left transducers, as well as separate MT-specific and vanilla transducers. Otherwise, lexc simply treats these conditions as comments.
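
The grep step can be emulated in Python to see which entries end up in which build. This is a sketch under an assumed policy, not the actual Apertium Makefile rules: a line marked Dir/LR is kept only for the left-to-right transducer, Dir/RL only for the right-to-left one, and Use/MT only for the MT-specific build. filter_lexc and the sample entries are hypothetical.

```python
def filter_lexc(lines: list[str], direction: str, for_mt: bool) -> list[str]:
    """Emulate `grep -v` filtering on condition comments.

    Assumed (hypothetical) policy:
      * "Dir/LR" lines are kept only when direction == "LR",
      * "Dir/RL" lines are kept only when direction == "RL",
      * "Use/MT" lines are kept only when for_mt is True.
    """
    if direction not in ("LR", "RL"):
        raise ValueError("direction must be 'LR' or 'RL'")
    other = "RL" if direction == "LR" else "LR"
    excluded = [f"Dir/{other}"]
    if not for_mt:
        excluded.append("Use/MT")
    # Like `grep -v PATTERN`: drop any line that contains an excluded marker.
    return [ln for ln in lines if not any(pat in ln for pat in excluded)]
```

As with grep, this is a plain substring match, so a marker that happened to appear inside a gloss would also trigger it.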

We also use the comment ! TOCHECK to indicate that a stem needs to be verified for accuracy, spelling, classification, etc.
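
Collecting the stems that still await review amounts to grep -n TOCHECK. A Python equivalent that also reports line numbers might look like this; lines_to_check is a hypothetical helper, and the pattern assumes the marker is written as "!" followed by optional spaces and TOCHECK.

```python
import re

# Matches the "! TOCHECK" review marker (spacing after "!" may vary).
TOCHECK_RE = re.compile(r"!\s*TOCHECK\b")


def lines_to_check(lines: list[str]) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for every entry flagged TOCHECK."""
    return [(n, ln) for n, ln in enumerate(lines, start=1) if TOCHECK_RE.search(ln)]
```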

Bracketed multi-character symbols

We define several kinds of multi-character symbols, each marked by its own type of bracket.

Tags

Tags are defined with less-than and greater-than signs, e.g. %<pl%>.

Archiphonemes

Archiphonemes are defined with curly braces, e.g. %{A%}.

Features

Features are defined with square brackets, e.g. %[%-coop%].
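
Since each kind of symbol has its own bracket, a script can tell tags, archiphonemes and features apart just by looking at the opening bracket. The Python sketch below does this with lazy regular-expression matches; find_symbols is a hypothetical helper, it assumes the closing delimiter never occurs inside a symbol, and the entry %<pl%>:%>l%{A%}r is only an illustrative example.

```python
import re

# Each bracket type marks a different kind of multichar symbol.
KINDS = {"<": "tag", "{": "archiphoneme", "[": "feature"}

# Lazy matches so that adjacent symbols such as %<n%>%<pl%> are split apart.
# Simplification: assumes the closing delimiter never occurs inside a symbol.
SYMBOL_RE = re.compile(r"%<.*?%>|%\{.*?%\}|%\[.*?%\]")


def find_symbols(text: str) -> list[tuple[str, str]]:
    """Return (kind, symbol) pairs for every bracketed multichar symbol."""
    # group(0)[1] is the opening bracket right after the escaping "%".
    return [(KINDS[m.group(0)[1]], m.group(0)) for m in SYMBOL_RE.finditer(text)]
```

A bare %> morpheme boundary is not picked up, because it is not preceded by an opening %<. One possible use is checking that every symbol found in the entries is also declared in the Multichar_Symbols section.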

Syntax highlighting and folding in vim

If you want lexc syntax highlighting and/or folding in vim, you can get the latest version of the lexc vim plugin at this GitHub address. Feel free to fork it, add features, and submit pull requests :)

Some other options are listed on the vim page.