Apertium-specific conventions for lexc
Latest revision as of 11:48, 26 September 2016
For Apertium, we use lexc for certain transducers. There are some Apertium-specific conventions we employ, outlined below. This wiki also has a comparison of lttoolbox and lexc formats.
Preferred format for stem definitions
The preferred format for stem definitions is underlying:surface CLASS ; ! "gloss", optionally followed by conditions, for example:
бул:бу DET-DEM ; ! "this" ! Dir/LR
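To make the layout of such a line concrete, here is a minimal Python sketch that splits a stem definition into its parts. The regular expression, the function name `parse_stem`, and the field names are illustrative assumptions rather than part of lexc or any Apertium tool, and the pattern only covers the simple layout shown above (for instance, it does not handle escaped colons).

```python
import re

# Hypothetical pattern for: underlying:surface CLASS ; ! "gloss" ! Condition
# It is an assumption made for illustration, not a full lexc parser.
STEM_LINE = re.compile(
    r'^(?P<upper>[^:\s]+):(?P<lower>\S+)\s+'  # underlying:surface
    r'(?P<cls>\S+)\s*;'                        # continuation class, then ";"
    r'(?:\s*!\s*"(?P<gloss>[^"]*)")?'          # optional ! "gloss"
    r'(?P<rest>.*)$'                           # optional trailing conditions
)


def parse_stem(line):
    """Return the parts of a stem definition, or None if the line does not match."""
    m = STEM_LINE.match(line.strip())
    if m is None:
        return None
    # Conditions are comments of the form "! Dir/..." or "! Use/...".
    conditions = re.findall(r'!\s*((?:Dir|Use)/\w+)', m.group('rest'))
    return {
        'underlying': m.group('upper'),
        'surface': m.group('lower'),
        'class': m.group('cls'),
        'gloss': m.group('gloss'),
        'conditions': conditions,
    }


entry = parse_stem('бул:бу DET-DEM ; ! "this" ! Dir/LR')
```

For the example line above, `entry` holds the underlying form, the surface form, the class `DET-DEM`, the gloss, and the single condition `Dir/LR`.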
Morpheme boundary
We use %> as a morpheme boundary indicator in lexc.
Conditions
There are a few special conditions we use: ! Dir/LR, ! Dir/RL, and ! Use/MT. These let us grep out lines to build separate left-to-right and right-to-left transducers, as well as separate MT-specific and vanilla transducers. Otherwise, lexc simply interprets them as comments.
We also use the comment ! TOCHECK to indicate that a stem needs to be verified for accuracy, spelling, classification, etc.
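The filtering that these conditions make possible can be sketched in Python as below. This is a rough equivalent of running `grep -v` over the lexc source, not the actual Apertium build rule. The helper name `drop_marked` and the two placeholder entries are invented for illustration, and the choice to build each directional variant by dropping the lines marked for the opposite direction is an assumption about how the markers are used.

```python
def drop_marked(lines, marker):
    """Return only the lines that do not contain the given condition marker."""
    return [line for line in lines if marker not in line]


source = [
    'бул:бу DET-DEM ; ! "this" ! Dir/LR',  # example entry from this page
    'aaa:aaa N1 ; ! "placeholder"',        # invented placeholder, no condition
    'bbb:bb N1 ; ! "placeholder" ! Dir/RL',  # invented placeholder
]

# Assumed usage: each variant drops the lines marked for the other direction.
lr_lines = drop_marked(source, 'Dir/RL')
rl_lines = drop_marked(source, 'Dir/LR')
```

Unmarked lines survive in both variants, so only entries that need direction-specific treatment have to carry a condition. The same helper could drop `Use/MT` lines in the same way.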
Bracketed multi-character symbols
We define certain types of multi-character symbols for various purposes.
Tags
Tags are defined with less-than and greater-than signs, e.g. %<pl%>.
Archiphonemes
Archiphonemes are defined with curly braces, e.g. %{A%}.
Features
Features are defined with square brackets, e.g. %[%-coop%].
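As a quick illustration of how the three bracket types can be told apart mechanically, the Python sketch below labels each bracketed symbol it finds in a string. The pattern and the function name `classify_symbols` are assumptions made for this example and are not part of lexc or HFST.

```python
import re

# One alternative per bracket type; each symbol is delimited by %-escaped brackets.
BRACKETED = re.compile(
    r'%<(?P<tag>.+?)%>'
    r'|%\{(?P<archiphoneme>.+?)%\}'
    r'|%\[(?P<feature>.+?)%\]'
)


def classify_symbols(text):
    """Return (kind, symbol) pairs for every bracketed symbol found in text."""
    return [(m.lastgroup, m.group(0)) for m in BRACKETED.finditer(text)]


symbols = classify_symbols('%{A%}%<pl%>%[%-coop%]')
```

A bare morpheme boundary %> is not reported, because the pattern only matches a %> that closes a symbol opened with %<.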
Syntax highlighting and folding in vim
If you want lexc syntax highlighting and/or folding in vim, you can get the latest version of the lexc vim plugin at https://github.com/jonorthwash/dotfiles/tree/master/vim. Feel free to fork, add features, and submit pull requests :)
Some other options are listed on the vim page.