Apertium-specific conventions for lexc

Latest revision as of 11:48, 26 September 2016

For Apertium, we use lexc (https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstLexcAndTwolcTutorial) for certain transducers. The Apertium-specific conventions we employ are outlined below. This wiki also has a comparison of the lttoolbox and lexc formats.

Preferred format for stem definitions

The preferred format for stem definitions is underlying:surface CLASS ; ! "gloss", with optional following conditions, for example:

бул:бу DET-DEM ; ! "this" ! Dir/LR
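
To make the parts of this format explicit, here is a minimal Python sketch that splits a stem definition into its fields. The regular expression and the function name parse_stem are illustrative assumptions, not part of lexc or of any Apertium tool, and the sketch does not handle %-escaped colons or spaces inside stems.

```python
import re

# Hypothetical pattern for: underlying:surface CLASS ; ! "gloss" [! Condition ...]
# Assumption: stems contain no %-escaped colons or spaces.
STEM_RE = re.compile(
    r'^(?P<underlying>[^:\s]+):(?P<surface>\S+)\s+'  # underlying:surface
    r'(?P<cls>\S+)\s*;\s*'                           # continuation class, then ';'
    r'!\s*"(?P<gloss>[^"]*)"'                        # gloss in a lexc comment
    r'(?P<rest>.*)$'                                 # optional trailing conditions
)

def parse_stem(line):
    """Split one stem definition into its fields, or return None if it does not match."""
    m = STEM_RE.match(line.strip())
    if m is None:
        return None
    # Each trailing "! Something" comment is treated as one condition.
    conditions = re.findall(r'!\s*(\S+)', m.group("rest"))
    return {
        "underlying": m.group("underlying"),
        "surface": m.group("surface"),
        "class": m.group("cls"),
        "gloss": m.group("gloss"),
        "conditions": conditions,
    }

entry = parse_stem('бул:бу DET-DEM ; ! "this" ! Dir/LR')
print(entry)
```

For the example line above, the sketch yields the underlying form бул, the surface form бу, the class DET-DEM, the gloss "this", and the single condition Dir/LR.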

Morpheme boundary

We use %> as a morpheme boundary indicator in lexc.

Conditions

There are a few special conditions we use: ! Dir/LR, ! Dir/RL, and ! Use/MT. These let us filter lines out with grep when compiling, so that we can build different right-to-left and left-to-right transducers, as well as separate MT-specific and vanilla transducers. lexc itself simply interprets these conditions as comments.

We also use the comment ! TOCHECK to indicate that a stem needs to be verified for accuracy, spelling, classification, etc.
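
The filtering step described above can be sketched in Python as a stand-in for grep. This is only an illustration: the function name drop_conditions is hypothetical, the second sample entry is invented for the example, and which conditions are excluded for which transducer is decided by each language's build setup, which this page does not specify.

```python
def drop_conditions(lines, excluded):
    """Return the lines that mention none of the excluded condition names.

    Mirrors an inverted grep: a line is dropped as soon as it contains
    any of the given markers (e.g. "Dir/LR").
    """
    return [line for line in lines if not any(name in line for name in excluded)]

# Sample input: the first entry is the example from this page,
# the second is a made-up entry without any condition.
lexc_lines = [
    'бул:бу DET-DEM ; ! "this" ! Dir/LR',
    'бул:бул DET-DEM ; ! "this"',
]

# Assumption for illustration: this build excludes entries marked Dir/LR.
kept = drop_conditions(lexc_lines, ["Dir/LR"])
for line in kept:
    print(line)
```

Because the conditions live inside lexc comments, the unfiltered file still compiles as-is; the filter only decides which entries reach a particular transducer.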

Bracketed multi-character symbols

We define certain types of multi-character symbols for various purposes.

Tags

Tags are defined with less-than and greater-than signs, e.g. %<pl%>.

Archiphonemes

Archiphonemes are defined with curly braces, e.g. %{A%}.

Features

Features are defined with square brackets, e.g. %[%-coop%].
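
In all three cases the brackets (and the hyphen in the feature example) are escaped with %. The following Python sketch shows that escaping pattern; the helper escape_symbol and its set of escaped characters are assumptions chosen to cover the examples on this page, not a complete list of characters that lexc treats as special.

```python
# Assumption: only the characters seen in this page's examples are escaped.
SPECIAL = set("<>{}[]-")

def escape_symbol(symbol):
    """Prefix each special character with % to get the lexc spelling of a symbol."""
    return "".join("%" + ch if ch in SPECIAL else ch for ch in symbol)

for plain in ["<pl>", "{A}", "[-coop]"]:
    print(plain, "->", escape_symbol(plain))
```

The three inputs correspond to the tag, archiphoneme, and feature examples given in this section.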

Syntax highlighting and folding in vim

If you want lexc syntax highlighting and/or folding in vim, you can get the latest version of the lexc vim plugin at https://github.com/jonorthwash/dotfiles/tree/master/vim. Feel free to fork, add features, and submit pull requests :)

Some other options are listed on the vim page.