Apertium-specific conventions for lexc

From Apertium

Revision as of 05:51, 22 November 2014

For Apertium, we use lexc for certain transducers. There are some Apertium-specific conventions we employ, outlined below. This wiki also has a comparison of the lttoolbox and lexc formats.

Preferred format for stem definitions

The preferred format for stem definitions is underlying:surface CLASS ; ! "gloss", optionally followed by conditions. For example:

бул:бу DET-DEM ; ! "this" ! Dir/LR
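
To make the parts of such a line explicit, here is a minimal Python sketch that splits a stem definition into its fields. The helper name `parse_stem_line` and the regular expression are illustrative assumptions, not part of lexc or of any Apertium tool, and the pattern deliberately ignores %-escaped colons or spaces inside stems.

```python
import re

# Hypothetical helper (not an Apertium tool): splits one stem line of the form
#   underlying:surface CLASS ; ! "gloss" ! Condition
# Simplification: %-escaped colons or spaces inside stems are not handled.
STEM_LINE = re.compile(
    r'^(?P<upper>[^:\s]+):(?P<lower>\S+)\s+'  # underlying:surface pair
    r'(?P<cls>\S+)\s*;'                        # continuation class, then ';'
    r'(?:\s*!\s*"(?P<gloss>[^"]*)")?'          # optional gloss comment
    r'(?P<rest>.*)$'                           # anything left, e.g. conditions
)

# The three conditions named in this document.
CONDITION = re.compile(r'!\s*(Dir/LR|Dir/RL|Use/MT)')


def parse_stem_line(line):
    """Return a dict of fields, or None if the line is not a stem definition."""
    m = STEM_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "underlying": m.group("upper"),
        "surface": m.group("lower"),
        "class": m.group("cls"),
        "gloss": m.group("gloss"),
        "conditions": CONDITION.findall(m.group("rest")),
    }


if __name__ == "__main__":
    print(parse_stem_line('бул:бу DET-DEM ; ! "this" ! Dir/LR'))
```

Lines that do not follow the pattern, such as LEXICON headers, simply yield None.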

Morpheme boundary

We use %> as a morpheme boundary indicator in lexc.

Conditions

There are a few special conditions we use: ! Dir/LR, ! Dir/RL, and ! Use/MT. Because each condition sits on the same line as its entry, we can filter lines with grep to build separate left-to-right and right-to-left transducers, as well as separate MT-specific and vanilla transducers. lexc itself simply treats them as comments.
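
The filtering idea can be sketched in Python as the equivalent of `grep -v`. The function name `drop_marked_lines`, the sample entries, and the choice of which marker is dropped for which build are all assumptions made for illustration; check the Makefile of the language module you are working on for the actual rules.

```python
def drop_marked_lines(lines, markers):
    """Keep only lines containing none of the markers (like `grep -v`)."""
    return [line for line in lines if not any(m in line for m in markers)]


# Made-up sample entries, only to show the filtering; not real lexicon data.
SAMPLE = [
    'stem1:stem1 N1 ; ! "one"',
    'stem2:stem2 N1 ; ! "two" ! Dir/LR',
    'stem3:stem3 N1 ; ! "three" ! Dir/RL',
    'stem4:stem4 N1 ; ! "four" ! Use/MT',
]

# Assumption: an entry marked Dir/RL is left out of the left-to-right build,
# and an entry marked Dir/LR is left out of the right-to-left build.
left_to_right = drop_marked_lines(SAMPLE, ["Dir/RL"])
right_to_left = drop_marked_lines(SAMPLE, ["Dir/LR"])

# Assumption: the vanilla (non-MT) build leaves out entries marked Use/MT.
vanilla = drop_marked_lines(SAMPLE, ["Use/MT"])

if __name__ == "__main__":
    for name, build in [("LR", left_to_right), ("RL", right_to_left),
                        ("vanilla", vanilla)]:
        print(name, len(build), "entries")
```

A plain substring test is used on purpose so that the behaviour matches what grep would do on the same file.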

Bracketed multi-character symbols

We define certain types of multi-character symbols for various purposes.

Tags

Tags are defined with less-than and greater-than signs, e.g. %<pl%>.

Archiphonemes

Archiphonemes are defined with curly braces, e.g. %{A%}.

Features

Features are defined with square brackets, e.g. %[%-coop%].
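
The three bracket conventions (tags, archiphonemes, features) can be told apart mechanically. The following Python sketch is an illustration only: the helper name `find_symbols` and the sample string are assumptions, not taken from an actual lexicon.

```python
import re

# One alternative per bracket type; each symbol starts with an escaped opening
# bracket and ends with the matching escaped closing bracket.
SYMBOL = re.compile(
    r'%<(?P<tag>.+?)%>'
    r'|%\{(?P<archiphoneme>.+?)%\}'
    r'|%\[(?P<feature>.+?)%\]'
)


def find_symbols(text):
    """Return (kind, symbol) pairs for every bracketed multi-character symbol."""
    return [(m.lastgroup, m.group(0)) for m in SYMBOL.finditer(text)]


if __name__ == "__main__":
    # Illustrative string: two tags on the left of the colon, then a morpheme
    # boundary (%>) and an archiphoneme on the right.
    print(find_symbols('%<n%>%<pl%>:%>л%{A%}р'))
    print(find_symbols('%[%-coop%]'))
```

Note that a bare %> (the morpheme boundary) is not reported, because it has no opening bracket before it.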

Syntax highlighting in vim

If you want lexc syntax highlighting in vim, you can put something like the following in ~/.vim/syntax/lexc.vim. Don't forget to also put au BufRead,BufNewFile *.lexc set filetype=lexc in ~/.vim/ftdetect/lexc.vim.

" Vim syntax file
" Language: lexc/twolc
" Maintainer: Jonathan Washington
" Last Change: 2014-09-28
" Version: 0.2
 
 
if version < 600
  syntax clear
elseif exists("b:current_syntax")
  finish
endif
 
" Keywords
syn keyword lexcLexicon LEXICON nextgroup=lexcLexiconB skipwhite
syn match lexcLexiconB "[a-zA-ZА-ЯӐ-ӲҪÁ-Úá-ú_\-][a-zA-ZА-ЯӐ-ӲҪÁ-Úá-ú0-9_\-]*" contained
syn match lexcLexiconB "[\#\_]"
 
" Identifiers
syn match lexcFlagDiacritic   "@[^@][^@]*@"
 
" Symbols
syn match lexcSymbol +\\["'\\]+ contained
syn match lexcSymbol "[\:]"
 
" Comment
syn match lexcComment "\!.*$"
 
" Operators
syn match lexcOperator "[\.\*\+\?|\\\^]"
syn match lexcEscapedChar "%."
syn match lexcApertiumMorphBoundary "%>"

syn keyword lexcTodo contained TODO FIXME CHECK NOTE BUG

syn match lexcApertiumSpecial contained "Dir\/[LR][LR]"
syn match lexcApertiumSpecial contained "Use\/MT"

syn match lexcComment "\!.*$" contains=lexcApertiumSpecial,lexcTodo

" More Identifiers
" This stuff needs to come after the lexcEscapedChar

syn match lexcApertiumLeftBrackets "%[{<\[]"
syn match lexcApertiumRightBrackets "%[}>\]]"

syn match lexcApertiumMC "%[<{\[].\{-}%[>}\]]"hs=s+2,he=e-2 contains=lexcApertiumLeftBrackets,lexcApertiumRightBrackets

 
hi def link lexcLabel                   Label
hi def link lexcLexicon                 Statement
hi def link lexcLexiconB                Function
hi def link lexcComment                 Comment
hi def link lexcOperator                Operator
hi def link lexcFlagDiacritic           Identifier
hi def link lexcString                  String
hi def link lexcSymbol                  String
hi def link lexcTodo                    Todo
hi def link lexcPointer                 Operator
hi def link lexcEscapedChar             Delimiter

" Apertium-specific stuff
hi def link lexcApertiumMorphBoundary   String
hi def link lexcApertiumSpecial         PreCondit
hi def link lexcApertiumLeftBrackets    Delimiter
hi def link lexcApertiumRightBrackets   Delimiter
hi def link lexcApertiumMC              Label
 
let b:current_syntax = "lexc"

Some other options are listed on the vim page.