Voikkospell
Installation
m5w/corevoikko, a fork of corevoikko, supports the apertium stream format.
To clone it, execute the following command:
git clone https://github.com/m5w/corevoikko.git corevoikko
First, install libvoikko's dependencies. Next, execute the following commands:
cd corevoikko/libvoikko
./configure
make
sudo make install
If you do not have root privileges, or would like to choose where libvoikko is installed, execute the following instead. (Otherwise, installation is complete.)
cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install
Finally, add "$PREFIX/bin" to your PATH by appending the following lines to your ~/.profile:
PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
export PATH="$PREFIX/bin:$PATH"
fi
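To confirm that these lines took effect (after logging in again, or after running . ~/.profile), one can check whether the prefix's bin directory is on PATH. The following is a minimal POSIX sh sketch; path_contains is a hypothetical helper, not part of corevoikko, and the directory in the usage line below assumes the example prefix above.

```sh
# Hypothetical helper (not part of corevoikko): succeed if directory $1
# is one of the colon-separated entries in $PATH, fail otherwise.
path_contains() {
  # Surrounding both sides with ':' lets the first and last entries match too.
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, path_contains "$HOME/install/corevoikko/bin" && command -v voikkospell should print the location of the voikkospell the shell will run.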
Using voikkospell with apertium Stream Format
Invoke voikkospell with --apertium-stream. voikkospell then expects apertium-stream-formatted input instead of a list of words.
Words in apertium Stream Format
apertium stream format encodes words as lexical units. Each begins with a ^
^ . . .
and ends with a $.
^ . . . $
The word immediately follows the ^,
^word . . .$
and a / immediately follows the word. If the word is unknown, *word follows;
^word/*word$
otherwise, all the word's analyses follow, delimited by /'s.
^word/word<n><sg>/word<vblex><inf>/word<vblex><pres>$
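A lexical unit can also be assembled from the shell, which is handy for feeding hand-made test input to voikkospell. The sketch below assumes a POSIX shell; make_lu is a hypothetical name, and it copies its arguments verbatim, so any reserved characters in them must already be escaped.

```sh
# Hypothetical helper: print one lexical unit in apertium stream format.
#   make_lu WORD            -> ^WORD/*WORD$        (unknown word)
#   make_lu WORD A1 A2 ...  -> ^WORD/A1/A2...$     (analysed word)
# Arguments are copied verbatim; no escaping is performed.
make_lu() {
  [ "$#" -ge 1 ] || return 1   # a word is required
  word=$1
  shift
  if [ "$#" -eq 0 ]; then
    printf '^%s/*%s$' "$word" "$word"
  else
    printf '^%s' "$word"
    for analysis in "$@"; do
      printf '/%s' "$analysis"   # analyses are delimited by /'s
    done
    printf '$'
  fi
}
```

For instance, make_lu word 'word<n><sg>' 'word<vblex><inf>' 'word<vblex><pres>' | voikkospell --apertium-stream pipes in the lexical unit shown above. printf writes no trailing newline, so the input ends exactly at the $.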
Escaping
To use ^, $, /, <, and > as characters, one must escape them. Each escape sequence begins with a \,
\ . . .
and a character follows. voikkospell then interprets the character literally. Note that the character can be any wide character, including newlines.
To use \ as a character, one must escape it too, as \\.
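Escaping can be automated with sed. This sketch assumes a POSIX shell and sed; apertium_escape is a hypothetical helper. Besides the six characters above, it also escapes [ and ], which the stream format reserves for superblanks. The command substitution drops any trailing newlines from the argument, which suits single words but not arbitrary text.

```sh
# Hypothetical helper: prefix every reserved character (\ ^ $ / < > [ ])
# with a backslash. The | delimiter keeps sed from treating the / inside
# the bracket expression as the end of the pattern.
apertium_escape() {
  printf '%s' "$(printf '%s' "$1" | sed 's|[][\\^$/<>]|\\&|g')"
}
```

For example, apertium_escape '^/<>$' | voikkospell --apertium-stream sends the escaped string \^\/\<\>\$.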
Superblanks
Characters outside lexical units can also be escaped all at once by enclosing them in a superblank. Each superblank begins with a [
[ . . .
and ends with a ].
[ . . . ]
Each ^, $, /, <, and > between the [ and the ] is interpreted literally.
To use [ and ] as characters, one must escape them.
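Text that is not part of any word, such as markup, can be wrapped in a superblank with a similar sketch. Here only \, [ and ] are escaped, since the other reserved characters are literal inside a superblank. apertium_superblank is a hypothetical helper, and the sketch assumes GNU sed, which does not append a newline that its input lacked (some other sed implementations may).

```sh
# Hypothetical helper: wrap text (newlines allowed) in a superblank,
# escaping only \, [ and ], which are still special between [ and ].
apertium_superblank() {
  printf '['
  printf '%s' "$1" | sed 's|[][\\]|\\&|g'
  printf ']'
}
```

For example, apertium_superblank '<p class="intro">' | voikkospell --apertium-stream sends [<p class="intro">].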
Examples
Trailing Newline
$ echo '' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '\n', '\n' expected to follow '['
^
Aborted
echo appends a newline, and outside a superblank an unescaped newline is unexpected. For this reason, when piping text directly to voikkospell --apertium-stream, use echo -n, which suppresses the newline. This is not necessary when piping through tools such as apertium-deshtml, which encapsulate all newlines in superblanks.
One could also escape the newline:
$ echo '\' | voikkospell --apertium-stream
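POSIX leaves the behaviour of echo -n unspecified, and some shells' echo (dash's, for example) also interprets backslashes, which would alter escaped input such as the examples below. A printf-based sketch avoids both problems; echo_n and strip_trailing_newlines are hypothetical helper names.

```sh
# Hypothetical replacement for `echo -n`: print the arguments, joined by
# spaces, with no trailing newline and no backslash interpretation.
echo_n() {
  printf '%s' "$*"
}

# Hypothetical helper: copy stdin to stdout minus all trailing newlines
# (command substitution strips them), e.g. for a file saved by an editor.
strip_trailing_newlines() {
  printf '%s' "$(cat)"
}
```

For example, echo_n '^a/*a$' | voikkospell --apertium-stream, or strip_trailing_newlines < input.txt | voikkospell --apertium-stream, where input.txt stands for any hypothetical input file.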
Unanalysed Word
$ echo -n '^a/*a$' | voikkospell --apertium-stream
W: a
Analysed Words
One Tag
$ echo -n '^b/b<A>$' | voikkospell --apertium-stream
W: b
More Than One Tag
$ echo -n '^c/c<B><C>$' | voikkospell --apertium-stream
W: c
Ambiguous Word
$ echo -n '^d/d<D>/d<E><F>$' | voikkospell --apertium-stream
W: d
Multiwords
One Word with Inner Inflection
$ echo -n '^e f/e<G># f/e<H><I># f$' | voikkospell --apertium-stream
W: e f
More Than One Word
Without Inner Inflection
$ echo -n '^gh/g<J>+h<K><L>/g<M>+h<N>$' | voikkospell --apertium-stream
W: gh
With Inner Inflection
$ echo -n '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' | \
  voikkospell --apertium-stream
W: i jk
W: lm n
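In each example above, voikkospell appears to check the surface form, the text between ^ and the first /, even when that form spans several words. To preview those forms without running voikkospell, a rough sketch with grep and sed follows. It assumes grep -o (available in GNU and BSD grep, though not required by POSIX) and ignores escapes and superblanks, so it only suits simple input like these examples; surface_forms is a hypothetical name.

```sh
# Hypothetical helper: print the surface form of each lexical unit on its
# own line. Matches '^' followed by everything up to the first '/' or '$',
# then strips the leading '^'. Escapes and superblanks are NOT handled.
surface_forms() {
  grep -o '\^[^/$]*' | sed 's/^\^//'
}
```

For example, piping the input of the last example through surface_forms prints i jk and lm n, the same forms voikkospell reports.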
Reserved Characters
\, ^, /, <, >, and $ are reserved.
\
$ echo -n '\' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:1: unexpected end-of-file following '\', end-of-file expected to follow ']' or '$'
\
^
Aborted
^
$ echo -n '^' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:2: unexpected end-of-file following '^', end-of-file expected to follow ']' or '$'
^
 ^
Aborted
/
$ echo -n '/' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '/', '/' expected to follow '[', to follow '>' immediately, or to follow '^' or '#' not immediately
/
^
Aborted
<
$ echo -n '<' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '<', '<' expected to follow '[', to follow '>' immediately, or to follow '/' or '+' not immediately
<
^
Aborted
>
$ echo -n '>' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '>', '>' expected to follow '[' or to follow '<' not immediately
>
^
Aborted
$
$ echo -n '$' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '$', '$' expected to follow '[', to follow '>' immediately, or to follow '*' or '#' not immediately
$
^
Aborted
Escape
To avoid these errors, escape all reserved characters.
$ echo -n '\\\^\/\<\>\$' | voikkospell --apertium-stream
Superblank
Alternatively, one can enclose reserved characters in superblanks.
$ echo -n '[^/<>$]' | voikkospell --apertium-stream
However, \ must still be escaped inside a superblank; otherwise it escapes the closing ], and the superblank never ends.
$ echo -n '[\]' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:3: unexpected end-of-file following '[', end-of-file expected to follow ']' or '$'
[\]
  ^
Aborted
Putting It All Together
Let's spellcheck a webpage!
Voikko's webpage has a mixture of English and Finnish words, so we should get a good mixture of correct and incorrect spellings. In voikkospell's output, C: marks a word it accepts as correct and W: marks a word it flags as wrong.
Since voikkospell only checks spelling, it doesn't matter which analyser we use. In this example, I use apertium-en-ca's English analyser.
$ curl -s http://voikko.puimula.org/ | apertium-deshtml | \
  lt-proc ~/svn.code.sf.net/p/apertium/svn/trunk/apertium-en-ca/en-ca.automorf.bin | \
  voikkospell --apertium-stream
W: .
C: Voikko
W: Free
W: linguistic
W: software
W: for
W: Finnish
W: .
W: Free
W: linguistic
W: software
W: and
C: data
W: for
W: Finnish
W: .
C: Käyttäjät
W: Users
W: .
C: Käytä
C: Voikkoa
C: verkossa
W: .
W: Use
C: Voikko
W: online
W: .
C: Lataa
C: Voikon
C: asennuspaketti
W: .
C: Käyttö
C: sovellusohjelmissa
W: .
C: Käyttö
C: Linux
W: -
C: jakeluissa
W: .
C: Kielityökalut
C: LibreOfficessa
W: .
C: Usein
C: kysyttyjä
C: kysymyksiä
W: .
C: Yhteystiedot
W: .
W: Developers
W: .
W: Source
W: code
W: repositories
W: .
W: Development
W: wiki
W: .
W: Using
W: with
W: Java
W: .
W: Contributors
W: .
W: Contributing
W: .
C: Joukahainen
W: (
W: Finnish
W: vocabulary
W: )
W: .
C: Ohjeita
C: testaajille
W: .
W: Additional
W: reading
W: .
C: Jakelijat
W: Distributors
W: .
W: Source
C: file
W: releases
W: .
C: Release
W: notes
W: .
W: Supported
W: platforms
W: .
C: Linux
W: .
W: FreeBSD
W: .
W: Mac
W: OS
C: X
W: .
C: Windows
W: .
W: Architecture
W: and
W: history
W: .
W: Bugs
W: and
W: feature
W: requests
W: .
W: Communication
W: and
W: contact
W: information
W: .
C: Voikko
W: is
C: a
W: spelling
W: and
W: grammar
W: checker
W: ,
W: hyphenator
W: and
W: collection
W: of
W: related
W: linguistic
C: data
W: for
W: Finnish
W: language
W: .
W: Most of
W: the
W: material
C: on
W: this
C: web
W: site
W: is
W: in
W: English
W: .
W: Pages
W: written
W: in
W: Finnish
W: contain
W: information
W: for
W: end
W: users
W: who
W: may
W: not
W: always
W: understand
W: English
W: .
W: .
C: Tämä
C: on
C: Voikko
W: -
C: kielityökalujen
C: kotisivu
W: .
C: Voikko
C: on
C: ohjelmisto
C: suomen
C: kielen
C: oikeinkirjoituksen
C: ja
C: kieliopin
C: tarkistamiseen
W: ,
C: tavutukseen
C: sekä
C: sanojen
C: analysointiin
W: .
C: Tämä
C: sivusto
C: on
C: suurelta
C: osin
C: englanniksi
W: ,
C: koska
C: kaikki
C: Voikon
C: kanssa
C: työskentelevät
C: ohjelmistokehittäjät
C: eivät
C: osaa
C: suomea
W: .
W: .
C: Uutisia
W: News
W: .
C: 2015
W: -
C: 11
W: -
C: 12
W: :
W: Transitioning
W: the
W: Finnish
W: dictionary
W: from
W: Malaga
W: to
W: VFST
W: .
C: 2014
W: -
C: 01
W: -
C: 26
W: :
C: Tilastoja
C: vuodelta
C: 2013
C: ja
C: kehityssuunnitelmia
C: alkuvuodelle
C: 2014
W: .
C: 2013
W: -
C: 10
W: -
C: 07
W: :
C: Käyttäjäkyselyn
C: tulokset
C: ja
C: tilannepäivitystä
W: .
C: 2013
W: -
C: 02
W: -
C: 03
W: :
C: Tilastoja
C: vuodelta
C: 2012
C: ja
C: kehityssuunnitelmia
C: vuodelle
C: 2013
W: .
C: 2012
W: -
C: 08
W: -
C: 23
W: :
C: Voikko
W: for
W: Android
W: available
W: for
W: early
W: preview
W: .
C: 2012
W: -
C: 04
W: -
C: 25
W: :
C: Suomen
C: kielen
W: VFST
W: -
C: morfologian
C: kehitys
C: aloitettu
W: .
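To summarise a long run like this one, one can tally the two kinds of lines. This awk sketch assumes the line format shown above (C: for accepted words, W: for flagged ones); spelling_summary is a hypothetical name.

```sh
# Hypothetical helper: count voikkospell output lines starting with "C: "
# (accepted) and "W: " (flagged). Unset awk counters print as 0.
spelling_summary() {
  awk '/^C: / { c++ } /^W: / { w++ } END { printf "correct=%d wrong=%d\n", c, w }'
}
```

Appending | spelling_summary to the pipeline above prints the two totals; piping through grep '^W: ' instead lists only the flagged words.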