Voikkospell
Revision as of 21:51, 17 December 2015
Installation
m5w/corevoikko, a fork of corevoikko, supports the apertium stream format.
To clone it, execute the following command:
git clone https://github.com/m5w/corevoikko.git corevoikko
First, install libvoikko's dependencies. Next, execute the following commands:
cd corevoikko/libvoikko
./configure
make
sudo make install
If you ran the commands above, installation is complete. If instead you do not have root privileges or would like to choose where libvoikko is installed, execute the following:
cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install
Finally, add "$PREFIX/bin" to your "$PATH" by appending the following lines to your .profile:
PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
    export PATH="$PREFIX/bin:$PATH"
fi
Using voikkospell with apertium Stream Format
Invoke voikkospell with --apertium-stream. voikkospell then expects input in apertium stream format instead of a list of words.
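For scripting, the invocation can be wrapped in a small helper. The Python function below is a minimal sketch and not part of voikkospell: the name run_voikkospell is hypothetical, and it assumes that voikkospell is on your PATH and reads the stream from standard input. What voikkospell writes back is not described on this page, so the helper simply returns its standard output unchanged.

```python
import subprocess
from typing import Optional, Sequence


def run_voikkospell(stream: str, cmd: Optional[Sequence[str]] = None) -> str:
    """Send apertium-stream-formatted text to voikkospell; return its stdout.

    Assumption: voikkospell is on PATH and reads the stream from stdin.
    `cmd` lets callers substitute another command line.
    """
    if cmd is None:
        cmd = ["voikkospell", "--apertium-stream"]
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(
        list(cmd), input=stream, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, run_voikkospell("^word/*word$\n") would pipe a single lexical unit through voikkospell --apertium-stream.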
Words in apertium Stream Format
The apertium stream format encodes words as lexical units. Each lexical unit begins with a ^ and ends with a $:
^ . . . $
The word immediately follows the ^:
^word . . . $
A / immediately follows the word. If the word is unknown, *word follows the /:
^word/*word$
Otherwise, all of the word's analyses follow, separated by /'s:
^word/word<n><sg>/word<vblex><inf>/word<vblex><pres>$
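These encoding rules are simple enough to sketch in a few lines of Python. The helpers lexical_unit and parse_lexical_unit are hypothetical names used only for illustration. They assume a single, well-formed lexical unit, do no escaping of their own, and keep escape sequences intact so that an escaped / does not split a field.

```python
def lexical_unit(word: str, analyses: list[str]) -> str:
    """Encode a word as ^word/analysis1/analysis2$, or ^word/*word$ if unknown.

    The word and analyses are assumed to be escaped already.
    """
    if not analyses:
        return f"^{word}/*{word}$"
    return "^" + "/".join([word, *analyses]) + "$"


def parse_lexical_unit(lu: str) -> tuple[str, list[str]]:
    """Split one well-formed lexical unit into (word, analyses).

    An unknown word (^word/*word$) yields an empty analysis list.
    Escape sequences are kept as-is, so an escaped / does not split a field.
    """
    if not (lu.startswith("^") and lu.endswith("$")):
        raise ValueError(f"not a lexical unit: {lu!r}")
    body = lu[1:-1]
    fields: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            current.append(body[i : i + 2])  # keep "\x" together
            i += 2
            continue
        if body[i] == "/":
            fields.append("".join(current))
            current = []
        else:
            current.append(body[i])
        i += 1
    fields.append("".join(current))
    word, analyses = fields[0], fields[1:]
    if analyses == ["*" + word]:
        analyses = []  # the unknown-word marker is not a real analysis
    return word, analyses
```

Round-tripping works in both directions: parse_lexical_unit(lexical_unit("word", ["word<n><sg>"])) gives back ("word", ["word<n><sg>"]).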
Escaping
To use ^, $, /, <, and > as literal characters, one must escape them. Each escape sequence begins with a \, which is followed by a single character:
\ . . .
voikkospell interprets that character literally. It can be any wide character, including a newline.
To use \ itself as a literal character, one must escape it too, writing \\.
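A sketch of these escaping rules in Python, with the hypothetical helper names escape and unescape. Besides ^, $, /, <, >, and \, it also escapes [ and ], which the Superblanks section below says must be escaped as well.

```python
# Characters that need a backslash in apertium stream format: ^ $ / < > from
# the Escaping section, [ ] from the Superblanks section, and \ itself.
SPECIAL = frozenset("^$/<>[]\\")


def escape(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def unescape(text: str) -> str:
    """Undo escape(): drop each escaping backslash, keep the next character."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # The escaped character may be anything, including a newline.
            # A lone trailing backslash has nothing to escape and is dropped.
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)
```

Because unescape takes whatever character follows a backslash, it also accepts escape sequences that escape() itself would never produce, such as an escaped newline.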
Superblanks
Characters that are not encoded in lexical units can also be escaped all at once by enclosing them in a superblank. Each superblank begins with a [ and ends with a ]:
[ . . . ]
Each ^, $, /, <, and > between the [ and the ] is interpreted literally.
To use [ and ] as literal characters, one must escape them.
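To see how superblanks and lexical units divide a stream, here is a minimal tokenizer sketch in Python. The function tokenize and its token kinds ("text", "blank", "lu") are illustrative assumptions, not part of voikkospell. It keeps escape sequences attached to their token and does not look inside lexical units.

```python
def tokenize(stream: str) -> list[tuple[str, str]]:
    """Split a stream into ("text" | "blank" | "lu", raw) pairs.

    "blank" is a superblank [ ... ] and "lu" a lexical unit ^ ... $.
    Escape sequences stay attached to their token, and joining the raw
    pieces gives back the original stream.
    """
    tokens: list[tuple[str, str]] = []
    buf: list[str] = []
    kind = "text"
    closer = {"blank": "]", "lu": "$"}

    def flush() -> None:
        # Emit the buffered characters as one token of the current kind.
        if buf:
            tokens.append((kind, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(stream):
        ch = stream[i]
        if ch == "\\" and i + 1 < len(stream):
            buf.append(stream[i : i + 2])  # escape sequence: "\" plus one char
            i += 2
            continue
        if kind == "text" and ch in "[^":
            flush()  # end the pending plain text
            kind = "blank" if ch == "[" else "lu"
            buf.append(ch)
        elif kind != "text" and ch == closer[kind]:
            buf.append(ch)
            flush()
            kind = "text"
        else:
            # Any other character joins the current token; in particular,
            # ^ $ / < > inside a superblank are just literal characters.
            buf.append(ch)
        i += 1
    flush()  # trailing text, or an unterminated superblank or lexical unit
    return tokens
```

For example, tokenize(r"[<b>]^word/*word$") returns [("blank", "[<b>]"), ("lu", "^word/*word$")].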
An HTML Example
Let's spellcheck the following webpage.
<!DOCTYPE html>
<html>
<head>
<title>An HTML Example</title>
</head>
<body>
<p>
This is an HTML example.
</p>
</body>
</html>
Running apertium-deshtml on it yields the following.
.[][<!DOCTYPE html>
<html>
<head>
<title>]An HTML Example.[][<\/title>
<\/head>
<body>
<p>
]This is an HTML example..[][
<\/p>
<\/body>
<\/html>
]
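As a rough check of what is left for spellchecking, the sketch below pulls the text between superblanks out of the apertium-deshtml output above. The regular expression and the helper names text_outside_superblanks and words are assumptions for illustration, and splitting words on runs of \w characters is a simplification, not voikkospell's own tokenization.

```python
import re

# One token is either a superblank ("[", escapes or characters other than
# "\" and "]", then "]") or a run of text (escapes or characters other
# than "\" and "[").
TOKEN = re.compile(
    r"(?P<blank>\[(?:\\.|[^\\\]])*\])|(?P<text>(?:\\.|[^\\\[])+)", re.DOTALL
)

# The apertium-deshtml output for the example page, as shown above.
DESHTML_OUTPUT = r""".[][<!DOCTYPE html>
<html>
<head>
<title>]An HTML Example.[][<\/title>
<\/head>
<body>
<p>
]This is an HTML example..[][
<\/p>
<\/body>
<\/html>
]"""


def text_outside_superblanks(stream: str) -> list[str]:
    """Return the (still escaped) text between superblanks."""
    return [m.group("text") for m in TOKEN.finditer(stream) if m.group("text")]


def words(stream: str) -> list[str]:
    """Simplified word split, assumed for illustration: runs of \\w characters."""
    return re.findall(r"\w+", " ".join(text_outside_superblanks(stream)))


if __name__ == "__main__":
    print(text_outside_superblanks(DESHTML_OUTPUT))
    print(words(DESHTML_OUTPUT))
```

On this input, text_outside_superblanks returns ".", "An HTML Example." and "This is an HTML example..": the page's visible text plus the extra periods that appear in the apertium-deshtml output, with every tag tucked away in superblanks.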