Difference between revisions of "Apertium-specific conventions for lexc"
Firespeaker (lexc.vim)
Revision as of 01:19, 29 September 2014
For Apertium, we use lexc for certain transducers. We follow some Apertium-specific conventions, outlined below.
Preferred format for stem definitions
The preferred format for stem definitions is underlying:surface CLASS ; ! "gloss", optionally followed by conditions, for example:
бул:бу DET-DEM ; ! "this" ! Dir/LR
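To make the field layout concrete, here is a minimal Python sketch that splits a stem line of this shape into its parts. The function name `parse_stem` and the regular expressions are illustrative assumptions, not part of any Apertium tool, and the sketch does not handle `%`-escaped characters inside the underlying or surface form.

```python
import re

# Hypothetical helper: matches `underlying:surface CLASS ; ! "gloss"` plus trailing comments.
STEM_LINE = re.compile(
    r'^\s*(?P<underlying>[^:\s]+):(?P<surface>\S+)\s+'  # underlying:surface
    r'(?P<cls>\S+)\s*;'                                 # continuation class, then ';'
    r'(?:\s*!\s*"(?P<gloss>[^"]*)")?'                   # optional ! "gloss"
    r'(?P<rest>.*)$'                                    # anything after the gloss
)
# The three conditions named on this page.
CONDITION = re.compile(r'!\s*(Dir/LR|Dir/RL|Use/MT)\b')


def parse_stem(line):
    """Return the fields of a stem line as a dict, or None if it does not match."""
    m = STEM_LINE.match(line)
    if m is None:
        return None
    return {
        "underlying": m.group("underlying"),
        "surface": m.group("surface"),
        "class": m.group("cls"),
        "gloss": m.group("gloss"),
        "conditions": CONDITION.findall(m.group("rest")),
    }


entry = parse_stem('бул:бу DET-DEM ; ! "this" ! Dir/LR')
```

For the example line above, `entry` holds the underlying form бул, the surface form бу, the class DET-DEM, the gloss "this", and the single condition Dir/LR.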
Morpheme boundary
We use %> as a morpheme boundary indicator in lexc.
Conditions
There are a few special conditions we use: ! Dir/LR, ! Dir/RL, and ! Use/MT. These let us grep out lines in order to build different right-to-left and left-to-right transducers, as well as separate MT-specific and vanilla transducers. Otherwise, lexc simply interprets them as comments.
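The filtering step can be sketched in Python roughly as follows. This is an illustration, not the actual build rule: the function name `filter_lexc` is hypothetical, and the sketch assumes that a line marked Dir/LR belongs only in the left-to-right transducer, a line marked Dir/RL only in the right-to-left one, and a line marked Use/MT only in the MT-specific build.

```python
import re

# The three conditions named on this page, written as lexc comments.
CONDITION = re.compile(r'!\s*(Dir/LR|Dir/RL|Use/MT)\b')


def filter_lexc(lines, direction, mt=False):
    """Keep the lines that belong in one variant of the transducer.

    direction: "LR" or "RL"; mt: True for the MT-specific variant.
    Assumed semantics (see the lead-in): drop lines marked for the
    opposite direction, and drop Use/MT lines unless mt is True.
    """
    opposite = "Dir/RL" if direction == "LR" else "Dir/LR"
    kept = []
    for line in lines:
        conds = set(CONDITION.findall(line))
        if opposite in conds:
            continue
        if "Use/MT" in conds and not mt:
            continue
        kept.append(line)
    return kept


# Hypothetical entries, for illustration only.
source = [
    'бул:бу DET-DEM ; ! "this" ! Dir/LR',
    'a:b N1 ; ! "x" ! Dir/RL',
    'c:d N1 ; ! "y" ! Use/MT',
    'e:f N1 ; ! "z"',
]
lr_vanilla = filter_lexc(source, "LR")
rl_mt = filter_lexc(source, "RL", mt=True)
```

Lines without any condition are kept in every variant, which matches the point that lexc itself treats the conditions as ordinary comments.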
Bracketed multi-character symbols
We define certain types of multi-character symbols for various purposes.
Tags
Tags are defined with less-than and greater-than signs, e.g. %<pl%>.
Archiphonemes
Archiphonemes are defined with curly braces, e.g. %{A%}.
Features
Features are defined with square brackets, e.g. %[%-coop%].
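The three bracket conventions can be told apart mechanically. The following Python sketch classifies a bracketed symbol by its delimiters; the function name `classify_symbol` and the returned labels are assumptions made for illustration, not part of lexc or Apertium.

```python
import re

# Opening delimiter -> (expected closing delimiter, kind of symbol).
BRACKETS = {
    "<": (">", "tag"),
    "{": ("}", "archiphoneme"),
    "[": ("]", "feature"),
}
# A whole symbol: %-escaped opening bracket, a body, %-escaped closing bracket.
SYMBOL = re.compile(r'^%([<{\[])(.+)%([>}\]])$')


def classify_symbol(symbol):
    """Return (kind, body) for a bracketed multi-character symbol, else None."""
    m = SYMBOL.match(symbol)
    if m is None:
        return None
    opener, body, closer = m.groups()
    expected_closer, kind = BRACKETS[opener]
    if closer != expected_closer:
        return None  # mismatched brackets, e.g. %<pl%}
    return kind, body


examples = [classify_symbol(s) for s in ("%<pl%>", "%{A%}", "%[%-coop%]")]
```

With the three examples from this page, `examples` is `[("tag", "pl"), ("archiphoneme", "A"), ("feature", "%-coop")]`; the escaped hyphen inside the feature stays part of its body.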
Syntax highlighting in vim
If you want lexc syntax highlighting in vim, you can use something like the following. Put it in ~/.vim/syntax/lexc.vim, and don't forget to put au BufRead,BufNewFile *.lexc set filetype=lexc in ~/.vim/ftdetect/lexc.vim.
" Vim syntax file " Language: lexc/twolc " Maintainer: Jonathan Washington " Last Change: 2014-09-28 " Version: 0.2 if version < 600 syntax clear elseif exists("b:current_syntax") finish endif " Keywords syn keyword lexcLexicon LEXICON nextgroup=lexcLexiconB skipwhite syn match lexcLexiconB "[a-zA-ZА-ЯӐ-ӲҪÁ-Úá-ú_\-][a-zA-ZА-ЯӐ-ӲҪÁ-Úá-ú0-9_\-]*" contained syn match lexcLexiconB "[\#\_]" " Identifiers syn match lexcFlagDiacritic "@[^@][^@]*@" " Symbols syn match lexcSymbol +\\["'\\]+ contained syn match lexcSymbol "[\:]" " Comment syn match lexcComment "\!.*$" " Operators syn match lexcOperator "[\.\*\+\?|\\\^]" syn match lexcEscapedChar "%." syn match lexcApertiumMorphBoundary "%>" syn keyword lexcTodo contained TODO FIXME CHECK NOTE BUG syn match lexcApertiumSpecial contained "Dir\/[LR][LR]" syn match lexcApertiumSpecial contained "Use\/MT" syn match lexcComment "\!.*$" contains=lexcApertiumSpecial,lexcTodo " More Identifiers " This stuff needs to come after the lexcEscapedChar syn match lexcApertiumLeftBrackets "%[{<\[]" syn match lexcApertiumRightBrackets "%[}>\]]" syn match lexcApertiumMC "%[<{\[].\{-}%[>}\]]"hs=s+2,he=e-2 contains=lexcApertiumLeftBrackets,lexcApertiumRightBrackets hi def link lexcLabel Label hi def link lexcLexicon Statement hi def link lexcLexiconB Function hi def link lexcComment Comment hi def link lexcOperator Operator hi def link lexcFlagDiacritic Identifier hi def link lexcString String hi def link lexcSymbol String hi def link lexcTodo Todo hi def link lexcPointer Operator hi def link lexcEscapedChar Delimiter " Apertium-specific stuff hi def link lexcApertiumMorphBoundary String hi def link lexcApertiumSpecial PreCondit hi def link lexcApertiumLeftBrackets Delimiter hi def link lexcApertiumRightBrackets Delimiter hi def link lexcApertiumMC Label let b:current_syntax = "lexc"