Difference between revisions of "Apertium-specific conventions for lexc"

Latest revision as of 11:48, 26 September 2016

Preferred format for stem definitions

The preferred format for a stem definition is underlying:surface CLASS ; ! "gloss", optionally followed by conditions. For example:

бул:бу DET-DEM ; ! "this" ! Dir/LR
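
As a rough illustration of how this format breaks down, the following Python sketch splits one stem line into its parts. The names STEM_RE, CONDITION_RE, and parse_stem are hypothetical and not part of any Apertium tool, and the pattern assumes a simple entry with no escaped spaces or colons inside the stem.

```python
import re

# Assumed shape: underlying:surface CLASS ; ! "gloss" [! condition ...]
# (hypothetical helper; does not handle escaped spaces or colons in stems)
STEM_RE = re.compile(
    r'^(?P<underlying>[^:\s]+):(?P<surface>\S+)\s+'
    r'(?P<cls>\S+)\s*;\s*'
    r'!\s*"(?P<gloss>[^"]*)"'
    r'(?P<rest>.*)$'
)
# Comment markers described on this page.
CONDITION_RE = re.compile(r'!\s*(Dir/LR|Dir/RL|Use/MT|TOCHECK)\b')


def parse_stem(line):
    """Split one stem line into its parts, or return None if it does not fit."""
    match = STEM_RE.match(line.strip())
    if match is None:
        return None
    return {
        "underlying": match.group("underlying"),
        "surface": match.group("surface"),
        "class": match.group("cls"),
        "gloss": match.group("gloss"),
        "conditions": CONDITION_RE.findall(match.group("rest")),
    }


entry = parse_stem('бул:бу DET-DEM ; ! "this" ! Dir/LR')
```

Applied to the example line, the sketch separates the two sides of the stem, the continuation class, the gloss, and the trailing condition.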

Morpheme boundary

We use %> as a morpheme boundary indicator in lexc.

Conditions

We use a few special conditions: ! Dir/LR, ! Dir/RL, and ! Use/MT. Because each one sits in a comment, lines carrying it can be filtered out with grep before compilation. This lets us build different right-to-left and left-to-right transducers, as well as separate MT-specific and vanilla transducers. lexc itself simply treats these conditions as comments.

We also use the comment ! TOCHECK to indicate that a stem needs to be verified for accuracy, spelling, classification, etc.
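
The filtering step described above can be sketched in Python as follows. The function drop_marked and the second sample entry are hypothetical. Which marker gets dropped for which transducer is decided by the build setup, so the call that removes Dir/LR lines is only an illustration, not a statement of how any particular Makefile works.

```python
import re


def drop_marked(lines, marker):
    """Return the lines that do not carry `marker` in a trailing comment."""
    pattern = re.compile(r"!\s*" + re.escape(marker) + r"\b")
    return [line for line in lines if not pattern.search(line)]


lexc_lines = [
    'бул:бу DET-DEM ; ! "this" ! Dir/LR',
    'бул:бул DET-DEM ; ! "this"',  # hypothetical unmarked entry
]

# Illustration only: build a variant of the lexicon without Dir/LR lines.
without_lr = drop_marked(lexc_lines, "Dir/LR")
```

This mirrors what an inverted grep does on the command line: marked lines are removed from one variant of the file, and unmarked lines pass through unchanged.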

Bracketed multi-character symbols

We define certain types of multi-character symbols for various purposes.

Tags

Tags are defined with less-than and greater-than signs, e.g. %<pl%>.

Archiphonemes

Archiphonemes are defined with curly braces, e.g. %{A%}.

Features

Features are defined with square brackets, e.g. %[%-coop%].
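
The three bracket conventions can be told apart mechanically. The Python sketch below is a hypothetical helper (BRACKETS and classify_symbol are not part of any Apertium or HFST tool) that names the kind of a multi-character symbol from its escaped brackets.

```python
# Escaped opening and closing brackets for each kind of symbol on this page.
BRACKETS = {
    ("%<", "%>"): "tag",
    ("%{", "%}"): "archiphoneme",
    ("%[", "%]"): "feature",
}


def classify_symbol(symbol):
    """Return 'tag', 'archiphoneme', 'feature', or None for anything else."""
    for (opening, closing), kind in BRACKETS.items():
        has_body = len(symbol) > len(opening) + len(closing)
        if symbol.startswith(opening) and symbol.endswith(closing) and has_body:
            return kind
    return None


kinds = [classify_symbol(s) for s in ("%<pl%>", "%{A%}", "%[%-coop%]", "%>")]
```

The bare morpheme boundary %> has no opening bracket, so the helper does not mistake it for a tag.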

Syntax highlighting and folding in vim

If you want lexc syntax highlighting and/or folding in vim, you can get the latest version of the lexc vim plugin from GitHub at https://github.com/jonorthwash/dotfiles/tree/master/vim. Feel free to fork, add features, and submit pull requests :)

Some other options are listed on the vim page.

@@ Line 13: / Line 13: @@
 == Conditions ==
 There are a few special conditions we use: <code>! Dir/LR</code>, <code>! Dir/RL</code>, and <code>! Use/MT</code>.  These allow us to grep out lines to have different right-to-left and left-to-right transducers, and also have separate MT-specific and vanilla transducers.  Otherwise lexc simply interprets these as comments.
+We also use the comment <code>! TOCHECK</code> to indicate that a stem needs to be verified for accuracy, spelling, classification, etc.
 == Bracketed multi-character symbols ==
@@ Line 26: / Line 28: @@
 Features are defined with square brackets, e.g. <code>%[%-coop%]</code>.
-== Syntax highlighting in vim ==
+== Syntax highlighting and folding in vim ==
-If you want to have lexc syntax highlighted in vim, you can use something like the following, which you should put in ~/.vim/syntax/lexc.vim (don't forget to put <code>au BufRead,BufNewFile *.lexc set filetype=lexc</code> in ~/.vim/ftdetect/lexc.vim).
+If you want to have lexc syntax highlighting and/or folding in vim, you can get latest version of lexc vim plugin at [https://github.com/jonorthwash/dotfiles/tree/master/vim this github address].  Feel free to fork, add features, and submit pull requests :)
-<pre>
-" Vim syntax file
-" Language: lexc/twolc
-" Maintainer: Jonathan Washington
-" Last Change: 2014-09-28
-" Version: 0.2
-if version < 600
-  syntax clear
-elseif exists("b:current_syntax")
-  finish
-endif
-" Keywords
-syn keyword lexcLexicon LEXICON nextgroup=lexcLexiconB skipwhite
-syn match lexcLexiconB "[a-zA-ZА-ЯӐ-ӲҪÁ-Úá-ú_\-][a-zA-ZА-ЯӐ-ӲҪÁ-Úá-ú0-9_\-]*" contained
-syn match lexcLexiconB "[\#\_]"
-" Identifiers
-syn match lexcFlagDiacritic   "@[^@][^@]*@"
-" Symbols
-syn match lexcSymbol +\\["'\\]+ contained
-syn match lexcSymbol "[\:]"
-" Comment
-syn match lexcComment "\!.*$"
-" Operators
-syn match lexcOperator "[\.\*\+\?|\\\^]"
-syn match lexcEscapedChar "%."
-syn match lexcApertiumMorphBoundary "%>"
-syn keyword lexcTodo contained TODO FIXME CHECK NOTE BUG
-syn match lexcApertiumSpecial contained "Dir\/[LR][LR]"
-syn match lexcApertiumSpecial contained "Use\/MT"
-syn match lexcComment "\!.*$" contains=lexcApertiumSpecial,lexcTodo
-" More Identifiers
-" This stuff needs to come after the lexcEscapedChar
-syn match lexcApertiumLeftBrackets "%[{<\[]"
-syn match lexcApertiumRightBrackets "%[}>\]]"
-syn match lexcApertiumMC "%[<{\[].\{-}%[>}\]]"hs=s+2,he=e-2 contains=lexcApertiumLeftBrackets,lexcApertiumRightBrackets
-hi def link lexcLabel                   Label
-hi def link lexcLexicon                 Statement
-hi def link lexcLexiconB                Function
-hi def link lexcComment                 Comment
-hi def link lexcOperator                Operator
-hi def link lexcFlagDiacritic           Identifier
-hi def link lexcString                  String
-hi def link lexcSymbol                  String
-hi def link lexcTodo                    Todo
-hi def link lexcPointer                 Operator
-hi def link lexcEscapedChar             Delimiter
-" Apertium-specific stuff
-hi def link lexcApertiumMorphBoundary   String
-hi def link lexcApertiumSpecial         PreCondit
-hi def link lexcApertiumLeftBrackets    Delimiter
-hi def link lexcApertiumRightBrackets   Delimiter
-hi def link lexcApertiumMC              Label
-let b:current_syntax = "lexc"
-</pre>
 Some other options are listed on the [[vim]] page.
+[[Category:Documentation in English]]
 [[Category:Documentation]]
 [[Category:lexc]]
 [[Category:HFST]]
+[[Category:Writing dictionaries]]
