Voikkospell
Installation
m5w/corevoikko, a fork of corevoikko, supports apertium stream format.
To clone it, execute the following command:
git clone https://github.com/m5w/corevoikko.git corevoikko
First, install libvoikko's dependencies. Next, execute the following commands:
cd corevoikko/libvoikko
./configure
make
sudo make install
If you do not have root privileges, or would like to choose where libvoikko is installed, execute the following instead. (Otherwise, installation is complete.)
cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install
Finally, add "$PREFIX/bin" to your "$PATH" by appending the following lines to your .profile:
PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
    export PATH="$PREFIX/bin:$PATH"
fi
Using voikkospell with apertium Stream Format
Invoke voikkospell with --apertium-stream. voikkospell then expects apertium-stream-formatted input instead of a list of words.
Words in apertium Stream Format
apertium stream format encodes words as lexical units. Each begins with a ^
^ . . .
and ends with a $.
^ . . . $
The word immediately follows the ^,
^word . . .$
and a / immediately follows the word. If the word is unknown, *word follows;
^word/*word$
otherwise, all the word's analyses follow, delimited by /'s.
^word/word<n><sg>/word<vblex><inf>/word<vblex><pres>$
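The layout of a lexical unit described above can be sketched in a few lines of Python. This is only an illustration, not voikkospell's own parser: the function name parse_lexical_unit is hypothetical, and the sketch assumes the unit contains no escape sequences.

```python
def parse_lexical_unit(unit):
    """Split one lexical unit, e.g. '^word/*word$', into its parts.

    Assumption: the unit contains no backslash escape sequences.
    Returns (surface_form, analyses, known).
    """
    if len(unit) < 2 or unit[0] != "^" or unit[-1] != "$":
        raise ValueError("a lexical unit must begin with ^ and end with $")
    # The word comes first; everything after it is delimited by /.
    surface, *analyses = unit[1:-1].split("/")
    # An unknown word carries a single pseudo-analysis starting with *.
    known = bool(analyses) and not analyses[0].startswith("*")
    return surface, analyses, known


print(parse_lexical_unit("^word/*word$"))
print(parse_lexical_unit("^word/word<n><sg>/word<vblex><inf>/word<vblex><pres>$"))
```

The first call reports the word as unknown; the second returns its three analyses.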
Escaping
To use ^, $, /, <, and > as characters, one must escape them. Each escape sequence begins with a \,
\ . . .
and a character follows. voikkospell then interprets that character literally. Note that the character can be any wide character, including a newline.
To use \ itself as a character, one must escape it as well.
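As a rough sketch of these escaping rules, the following Python helpers add and remove the backslashes. The names RESERVED, escape, and unescape are hypothetical and are not part of voikkospell; the reserved set covers only the characters named in this section.

```python
# Characters this section says must be escaped, plus the backslash itself.
RESERVED = "\\^$/<>"


def escape(text):
    """Prefix every reserved character with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def unescape(text):
    """Drop each escaping backslash and keep the character after it."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # The escaped character may be anything, including a newline.
            # A dangling backslash at the end of the input is dropped.
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)


print(escape("3 < 4 and a/b"))
print(unescape(escape("3 < 4 and a/b")))
```

Round-tripping a string through escape and then unescape returns the original string.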
Superblanks
One can also escape a run of characters that is not part of a lexical unit by encoding it as a superblank. Each superblank begins with a [
[ . . .
and ends with a ].
[ . . . ]
Each ^, $, /, <, and > between the [ and the ] is interpreted literally.
To use [ and ] as characters, one must escape them.
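To make the superblank rules concrete, here is a small Python sketch that separates superblanks from the rest of a stream while honouring backslash escapes. The function name split_superblanks is hypothetical, and the sketch makes no claim about how voikkospell itself scans its input; it treats lexical units as ordinary text.

```python
def split_superblanks(stream):
    """Return (is_superblank, text) pairs covering the whole stream."""
    parts = []
    start = 0        # index where the current piece begins
    inside = False   # True between an unescaped [ and its ]
    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\\":
            i += 2   # skip the escaped character, whatever it is
            continue
        if not inside and ch == "[":
            if i > start:
                parts.append((False, stream[start:i]))
            start, inside = i, True
        elif inside and ch == "]":
            parts.append((True, stream[start:i + 1]))
            start, inside = i + 1, False
        i += 1
    if start < len(stream):
        # Trailing text, or a superblank that was never closed.
        parts.append((inside, stream[start:]))
    return parts


for is_blank, text in split_superblanks(r".[][<p>]Hi.[<\/p>]"):
    print("superblank" if is_blank else "text      ", repr(text))
```

An escaped bracket such as \[ does not open a superblank, because the scanner skips the character after every backslash.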
An HTML Example
Let's spellcheck the following webpage.
<!DOCTYPE html>
<html>
<head>
<title>An HTML Example</title>
</head>
<body>
<p>
This is an HTML example.
</p>
</body>
</html>
Running apertium-deshtml on it yields the following.
.[][<!DOCTYPE html> <html> <head> <title>]An HTML Example.[][<\/title> <\/head> <body> <p> ]This is an HTML example..[][ <\/p> <\/body> <\/html> ]
Note that all the <'s, >'s, and /'s are enclosed in superblanks. In fact, everything except the title and body paragraph is escaped. However, the words of the title and paragraph are not yet encoded as lexical units. Running lt-proc on the output yields the following, suitable for voikkospell --apertium-stream.
^./.<sent>$[][<!DOCTYPE html> <html> <head> <title>]^An/A<det><ind><sg>$ ^HTML/HTML<n><acr><sp>$ ^Example/Example<n><sg>$^./.<sent>$[][<\/title> <\/head> <body> <p> ]^This/This<det><dem><sg>/This<prn><tn><mf><sg>$ ^is/be<vbser><pri><p3><sg>$ ^an/a<det><ind><sg>$ ^HTML/HTML<n><acr><sp>$ ^example/example<n><sg>$^./.<sent>$^./.<sent>$[][ <\/p> <\/body> <\/html>
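To see which words such a stream actually carries, one can pull the surface form out of every lexical unit and skip the superblanks. The Python sketch below does this with a single regular expression. The names TOKEN and surface_forms are hypothetical, and this is an illustration of the format, not a description of voikkospell's implementation.

```python
import re

TOKEN = re.compile(
    r"\[(?:\\.|[^\\\]])*\]"                 # a superblank: matched, then ignored
    r"|\^(?P<surface>(?:\\.|[^\\/$])*)"     # ^ followed by the surface form
    r"(?:/(?:\\.|[^\\$])*)?\$"              # optional analyses, then the closing $
    r"|\\.|.",                              # any other escaped or plain character
    re.DOTALL,
)


def surface_forms(stream):
    """List the surface form of each lexical unit, in stream order."""
    return [m.group("surface")
            for m in TOKEN.finditer(stream)
            if m.group("surface") is not None]


# The first part of the lt-proc output shown above.
sample = (r"^./.<sent>$[][<!DOCTYPE html> <html> <head> <title>]"
          r"^An/A<det><ind><sg>$ ^HTML/HTML<n><acr><sp>$ "
          r"^Example/Example<n><sg>$^./.<sent>$"
          r"[][<\/title> <\/head> <body> <p> ]")
print(surface_forms(sample))
# ['.', 'An', 'HTML', 'Example', '.']
```

The HTML tags never show up in the result because they sit inside superblanks, which is the point of running apertium-deshtml first.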