Apertium-specific conventions for lexc

From Apertium

Latest revision as of 11:48, 26 September 2016

For Apertium, we use lexc (https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstLexcAndTwolcTutorial) for certain transducers. There are some Apertium-specific conventions we employ, outlined below. This wiki also has a comparison of the lttoolbox and lexc formats.

Preferred format for stem definitions

The preferred format for stem definitions is underlying:surface CLASS ; ! "gloss", with optional following conditions, for example:

бул:бу DET-DEM ; ! "this" ! Dir/LR
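
As a rough illustration of this layout, the following Python sketch splits a stem line into its parts. The name parse_stem_line and the regular expression are assumptions made for this example and are not part of lexc or of any Apertium tool. It only handles the simple one-line form shown above and does not handle escaped colons (%:) or spaces inside the stem.

```python
import re

# Hypothetical helper (not an Apertium tool). Matches lines of the form
#   underlying:surface CLASS ; ! "gloss" ! Condition ...
STEM_LINE = re.compile(
    r'^\s*(?P<underlying>[^:\s]+):(?P<surface>\S+)\s+'  # underlying:surface
    r'(?P<cls>\S+)\s*;\s*'                                # continuation class
    r'(?:!\s*"(?P<gloss>[^"]*)")?'                        # optional gloss comment
    r'(?P<rest>.*)$'                                      # optional conditions
)


def parse_stem_line(line):
    """Return a dict describing one stem definition, or None if no match."""
    m = STEM_LINE.match(line)
    if m is None:
        return None
    # Whatever follows the gloss is read as "! Something" condition comments.
    conditions = [c.strip() for c in m.group("rest").split("!") if c.strip()]
    return {
        "underlying": m.group("underlying"),
        "surface": m.group("surface"),
        "class": m.group("cls"),
        "gloss": m.group("gloss"),
        "conditions": conditions,
    }
```

Applied to the example line above, this should yield the underlying form бул, the surface form бу, the class DET-DEM, the gloss "this", and the single condition Dir/LR.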

Morpheme boundary

We use %> as a morpheme boundary indicator in lexc.

Conditions

There are a few special conditions we use: ! Dir/LR, ! Dir/RL, and ! Use/MT. They let us filter lines with grep to build separate right-to-left and left-to-right transducers, as well as separate MT-specific and vanilla transducers. To lexc itself they are simply comments.

We also use the comment ! TOCHECK to indicate that a stem needs to be verified for accuracy, spelling, classification, etc.
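
The filtering described above is normally done with grep. Purely as a sketch of the same idea, the Python below drops lines that carry given markers and lists the lines flagged TOCHECK. The function names and the sample lines (other than the бул example) are made up for illustration, and which markers to drop for which transducer is an assumption here; check it against your language module's build rules.

```python
import re

# Matches a condition marker inside a lexc comment, e.g. "! Dir/LR".
CONDITION = re.compile(r'!\s*(Dir/LR|Dir/RL|Use/MT|TOCHECK)\b')

# Made-up sample lines; only the first comes from this page.
SAMPLE = [
    'бул:бу DET-DEM ; ! "this" ! Dir/LR',
    'aaa:aaa N1 ; ! "example" ! Dir/RL',
    'bbb:bbb N1 ; ! "example" ! Use/MT',
    'ccc:ccc N1 ; ! "example" ! TOCHECK',
    'ddd:ddd N1 ; ! "example"',
]


def conditions_of(line):
    """Return the set of condition markers found on one lexc line."""
    return set(CONDITION.findall(line))


def drop_marked(lines, unwanted):
    """Keep only the lines that carry none of the unwanted markers."""
    unwanted = set(unwanted)
    return [line for line in lines if not (conditions_of(line) & unwanted)]


def lines_to_check(lines):
    """Return the lines flagged with ! TOCHECK for manual review."""
    return [line for line in lines if "TOCHECK" in conditions_of(line)]
```

For example, drop_marked(SAMPLE, {"Dir/RL"}) keeps every sample line except the one marked Dir/RL, which is roughly what grep -v 'Dir/RL' would do.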

Bracketed multi-character symbols

We define certain types of multi-character symbols for various purposes.

Tags

Tags are defined with less-than and greater-than signs, e.g. %<pl%>.

Archiphonemes

Archiphonemes are defined with curly braces, e.g. %{A%}.

Features

Features are defined with square brackets, e.g. %[%-coop%].
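
To make the three bracket conventions and the %> morpheme boundary concrete, here is a small Python sketch that tokenizes a lexc string and labels each piece. The tokenize function and its category names are assumptions for this example, not part of lexc or HFST. Note that a tag also ends in %>, so the pattern tries the full tag before a bare boundary.

```python
import re

# One alternative per convention. Order matters: "%<...%>" is tried before
# a bare "%>" so that the end of a tag is not read as a morpheme boundary.
TOKEN = re.compile(
    r'(?P<tag>%<.+?%>)'               # tags, e.g. %<pl%>
    r'|(?P<archiphoneme>%\{.+?%\})'   # archiphonemes, e.g. %{A%}
    r'|(?P<feature>%\[.+?%\])'        # features, e.g. %[%-coop%]
    r'|(?P<boundary>%>)'              # morpheme boundary
    r'|(?P<escaped>%.)'               # any other escaped character
    r'|(?P<char>.)',                  # a plain character
    re.DOTALL,
)


def tokenize(text):
    """Split a lexc string into (kind, text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]
```

With a made-up string such as ab%>%{A%}r, this gives two plain characters, a boundary, the archiphoneme %{A%}, and a final plain character.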

Syntax highlighting and folding in vim

If you want lexc syntax highlighting and/or folding in vim, you can get the latest version of the lexc vim plugin at https://github.com/jonorthwash/dotfiles/tree/master/vim. Feel free to fork, add features, and submit pull requests :)

Some other options are listed on the vim page.