Voikkospell
Installation
m5w/corevoikko, a fork of corevoikko, supports apertium stream format.
To clone it, execute the following command:
git clone https://github.com/m5w/corevoikko.git corevoikko
First, install libvoikko's dependencies. Next, execute the following commands:
cd corevoikko/libvoikko
./configure
make
sudo make install
If you do not have root privileges, or would like to choose where libvoikko is installed, execute the following instead. (Otherwise, installation is complete.)
cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install
Finally, add "$PREFIX/bin" to your "$PATH" by appending the following lines to your .profile:
PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
    export PATH="$PREFIX/bin:$PATH"
fi
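After opening a new login shell, one can check that the binary is actually found. The following is only a minimal sketch; have_tool is a hypothetical helper name, not something libvoikko provides.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME resolves to a command on the current PATH.
# (have_tool is a hypothetical helper name, not part of libvoikko.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool voikkospell; then
    echo "voikkospell found at: $(command -v voikkospell)"
else
    echo "voikkospell not found; check that \"\$PREFIX/bin\" is on PATH" >&2
fi
```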
Using voikkospell with apertium Stream Format
Invoke voikkospell with --apertium-stream. voikkospell then expects apertium-stream-formatted input instead of a list of words.
Words in apertium Stream Format
apertium stream format encodes words as lexical units. Each begins with a ^
^ . . .
and ends with a $.
^ . . . $
The word immediately follows the ^,
^word . . .$
and a / immediately follows the word. If the word is unknown, *word follows;
^word/*word$
otherwise, all the word's analyses follow, delimited by /'s.
^word/word<n><sg>/word<vblex><inf>/word<vblex><pres>$
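The layout described above can be sketched as a small shell function that assembles one lexical unit from a word and its analyses. make_lu is a hypothetical helper name, and the sketch assumes the word and analyses contain no reserved characters that would need escaping.

```shell
#!/bin/sh
# make_lu WORD [ANALYSIS...]: print one lexical unit in apertium stream format.
# With no analyses the word is marked unknown (^word/*word$).
# (make_lu is a hypothetical helper; it does not escape reserved characters.)
make_lu() {
    word=$1
    shift
    if [ "$#" -eq 0 ]; then
        printf '^%s/*%s$' "$word" "$word"
    else
        printf '^%s' "$word"
        for analysis in "$@"; do
            printf '/%s' "$analysis"
        done
        printf '$'
    fi
}

# An unknown word, then a word with three analyses.
make_lu word; echo
make_lu word 'word<n><sg>' 'word<vblex><inf>' 'word<vblex><pres>'; echo
```

In practice an analyser such as lt-proc produces these units; building them by hand is mainly useful for small tests.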
Escaping
To use ^, $, /, <, and > as characters, one must escape them. Each escape sequence begins with a \,
\ . . .
and a character follows. voikkospell then interprets the character literally. Note that the character can be any wide character, including a newline.
To use \ itself as a character, one must escape it as well.
\\
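Escaping by hand is error-prone, so a filter along these lines may help when preparing plain text. This is a sketch under assumptions: escape_reserved is a hypothetical helper name, it also escapes [ and ] (which the superblank rules require outside a superblank), and because sed works line by line it leaves newlines untouched.

```shell
#!/bin/sh
# escape_reserved: read text on stdin and prefix each reserved character
# (\ ^ $ / < > [ ]) with a backslash.
# sed works line by line, so newlines pass through unescaped.
# (escape_reserved is a hypothetical helper name.)
escape_reserved() {
    sed 's|[][\\^$/<>]|\\&|g'
}

# Example: every reserved character in this line gains a leading backslash.
printf '%s\n' 'a/b <c> $5 [x] \ ^' | escape_reserved
```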
Superblanks
One can also escape multiple characters by encoding them as a superblank. Each superblank begins with a [
[ . . .
and ends with a ].
[ . . . ]
Each ^, $, /, <, and > between the [ and the ] is interpreted literally.
To use [ and ] as characters, one must escape them.
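As an illustration of these rules, the following sketch wraps arbitrary text in a superblank, escaping only [, ], and \ and leaving the other reserved characters alone. superblank is a hypothetical helper name; the trailing x is a sentinel so that command substitution does not strip trailing newlines from the text.

```shell
#!/bin/sh
# superblank TEXT: print TEXT wrapped in [ ], escaping only \ [ and ] inside.
# The other reserved characters (^ $ / < >) are left as they are.
# (superblank is a hypothetical helper name.)
superblank() {
    # Append a sentinel "x" so trailing newlines in TEXT survive $(...).
    escaped=$(printf '%sx\n' "$1" | sed 's|[][\\]|\\&|g')
    escaped=${escaped%x}
    printf '[%s]' "$escaped"
}

superblank '^/<>$'; echo
superblank 'a]b\c'; echo
```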
Examples
Trailing Newline
$ echo '' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '\n', '\n' expected to follow '['

^
Aborted
Plain echo appends a newline, and an unescaped newline outside a superblank is an error. For this reason, when piping text directly to voikkospell --apertium-stream, use echo -n. It is not necessary to do this when piping through tools such as apertium-deshtml, which encapsulate all newlines in superblanks.
One could also escape the newline:
$ echo '\' | voikkospell --apertium-stream
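Since echo -n is not specified by POSIX and behaves differently across shells, printf '%s' is a more portable way to write text without a trailing newline. A minimal sketch, with emit as a hypothetical helper name:

```shell
#!/bin/sh
# emit TEXT: write TEXT with no trailing newline, like `echo -n` but portable.
# (emit is a hypothetical helper name.)
emit() {
    printf '%s' "$1"
}

# Count the bytes written: the newline added by plain echo is the extra byte.
emit '^a/*a$' | wc -c
echo '^a/*a$' | wc -c

# Intended use (requires voikkospell, so it is not run here):
#   emit '^a/*a$' | voikkospell --apertium-stream
```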
Unanalysed Word
$ echo -n '^a/*a$' | voikkospell --apertium-stream
W: a
Analysed Words
One Tag
$ echo -n '^b/b<A>$' | voikkospell --apertium-stream
W: b
More Than One Tag
$ echo -n '^c/c<B><C>$' | voikkospell --apertium-stream
W: c
Ambiguous Word
$ echo -n '^d/d<D>/d<E><F>$' | voikkospell --apertium-stream
W: d
Multiwords
One Word with Inner Inflection
$ echo -n '^e f/e<G># f/e<H><I># f$' | voikkospell --apertium-stream
W: e f
More Than One Word
Without Inner Inflection
$ echo -n '^gh/g<J>+h<K><L>/g<M>+h<N>$' | voikkospell --apertium-stream
W: gh
With Inner Inflection
$ echo -n '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' | \
    voikkospell --apertium-stream
W: i jk
W: lm n
Reserved Characters
\, ^, /, <, >, and $ are reserved.
\
$ echo -n '\' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:1: unexpected end-of-file following '\', end-of-file expected to follow ']' or '$'
\
^
Aborted
^
$ echo -n '^' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:2: unexpected end-of-file following '^', end-of-file expected to follow ']' or '$'
^
 ^
Aborted
/
$ echo -n '/' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '/', '/' expected to follow '[', to follow '>' immediately, or to follow '^' or '#' not immediately
/
^
Aborted
<
$ echo -n '<' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '<', '<' expected to follow '[', to follow '>' immediately, or to follow '/' or '+' not immediately
<
^
Aborted
>
$ echo -n '>' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '>', '>' expected to follow '[' or to follow '<' not immediately
>
^
Aborted
$
$ echo -n '$' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '$', '$' expected to follow '[', to follow '>' immediately, or to follow '*' or '#' not immediately
$
^
Aborted
Escape
To avoid these errors, escape all reserved characters.
$ echo -n '\\\^\/\<\>\$' | voikkospell --apertium-stream
Superblank
Alternatively, one can enclose reserved characters in superblanks.
$ echo -n '[^/<>$]' | voikkospell --apertium-stream
However, \ must be escaped.
$ echo -n '[\]' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:3: unexpected end-of-file following '[', end-of-file expected to follow ']' or '$'
[\]
  ^
Aborted
Putting It All Together
Let's spellcheck a webpage!
voikkospell's webpage has a mixture of English and Finnish words, so we should get a good mixture of correct and incorrect spellings. (voikkospell prefixes each word it accepts with C: and each word it rejects with W:.)
Since voikkospell only checks spelling, it doesn't matter which analyser we use. In this example, I use apertium-en-ca's English analyser.
$ curl -s http://voikko.puimula.org/ | apertium-deshtml | \
    lt-proc ~/svn.code.sf.net/p/apertium/svn/trunk/apertium-en-ca/en-ca.automorf.bin | \
    voikkospell --apertium-stream
W: .
C: Voikko
W: Free
W: linguistic
W: software
W: for
W: Finnish
W: .
W: Free
W: linguistic
W: software
W: and
C: data
W: for
W: Finnish
W: .
C: Käyttäjät
W: Users
W: .
C: Käytä
C: Voikkoa
C: verkossa
W: .
W: Use
C: Voikko
W: online
W: .
C: Lataa
C: Voikon
C: asennuspaketti
W: .
C: Käyttö
C: sovellusohjelmissa
W: .
C: Käyttö
C: Linux
W: -
C: jakeluissa
W: .
C: Kielityökalut
C: LibreOfficessa
W: .
C: Usein
C: kysyttyjä
C: kysymyksiä
W: .
C: Yhteystiedot
W: .
W: Developers
W: .
W: Source
W: code
W: repositories
W: .
W: Development
W: wiki
W: .
W: Using
W: with
W: Java
W: .
W: Contributors
W: .
W: Contributing
W: .
C: Joukahainen
W: (
W: Finnish
W: vocabulary
W: )
W: .
C: Ohjeita
C: testaajille
W: .
W: Additional
W: reading
W: .
C: Jakelijat
W: Distributors
W: .
W: Source
C: file
W: releases
W: .
C: Release
W: notes
W: .
W: Supported
W: platforms
W: .
C: Linux
W: .
W: FreeBSD
W: .
W: Mac
W: OS
C: X
W: .
C: Windows
W: .
W: Architecture
W: and
W: history
W: .
W: Bugs
W: and
W: feature
W: requests
W: .
W: Communication
W: and
W: contact
W: information
W: .
C: Voikko
W: is
C: a
W: spelling
W: and
W: grammar
W: checker
W: ,
W: hyphenator
W: and
W: collection
W: of
W: related
W: linguistic
C: data
W: for
W: Finnish
W: language
W: .
W: Most of
W: the
W: material
C: on
W: this
C: web
W: site
W: is
W: in
W: English
W: .
W: Pages
W: written
W: in
W: Finnish
W: contain
W: information
W: for
W: end
W: users
W: who
W: may
W: not
W: always
W: understand
W: English
W: .
W: .
C: Tämä
C: on
C: Voikko
W: -
C: kielityökalujen
C: kotisivu
W: .
C: Voikko
C: on
C: ohjelmisto
C: suomen
C: kielen
C: oikeinkirjoituksen
C: ja
C: kieliopin
C: tarkistamiseen
W: ,
C: tavutukseen
C: sekä
C: sanojen
C: analysointiin
W: .
C: Tämä
C: sivusto
C: on
C: suurelta
C: osin
C: englanniksi
W: ,
C: koska
C: kaikki
C: Voikon
C: kanssa
C: työskentelevät
C: ohjelmistokehittäjät
C: eivät
C: osaa
C: suomea
W: .
W: .
C: Uutisia
W: News
W: .
C: 2015
W: -
C: 11
W: -
C: 12
W: :
W: Transitioning
W: the
W: Finnish
W: dictionary
W: from
W: Malaga
W: to
W: VFST
W: .
C: 2014
W: -
C: 01
W: -
C: 26
W: :
C: Tilastoja
C: vuodelta
C: 2013
C: ja
C: kehityssuunnitelmia
C: alkuvuodelle
C: 2014
W: .
C: 2013
W: -
C: 10
W: -
C: 07
W: :
C: Käyttäjäkyselyn
C: tulokset
C: ja
C: tilannepäivitystä
W: .
C: 2013
W: -
C: 02
W: -
C: 03
W: :
C: Tilastoja
C: vuodelta
C: 2012
C: ja
C: kehityssuunnitelmia
C: vuodelle
C: 2013
W: .
C: 2012
W: -
C: 08
W: -
C: 23
W: :
C: Voikko
W: for
W: Android
W: available
W: for
W: early
W: preview
W: .
C: 2012
W: -
C: 04
W: -
C: 25
W: :
C: Suomen
C: kielen
W: VFST
W: -
C: morfologian
C: kehitys
C: aloitettu
W: .
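With this much output, one may only care about the words voikkospell rejected, which it reports with the W: prefix. A minimal sketch of such a filter follows; misspelled is a hypothetical helper name, and the sample lines are copied from the output above. Note that punctuation is reported with W: as well, so a real pipeline would likely filter that out too.

```shell
#!/bin/sh
# misspelled: read voikkospell output on stdin and print only the words
# reported with the W: prefix, one per line, without the prefix.
# (misspelled is a hypothetical helper name.)
misspelled() {
    sed -n 's/^W: //p'
}

# Sample input copied from the voikkospell output shown above.
printf '%s\n' 'C: Voikko' 'W: Free' 'W: linguistic' 'C: data' | misspelled

# Intended use (requires voikkospell, so it is not run here):
#   ... | voikkospell --apertium-stream | misspelled | sort -u
```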