Voikkospell

From Apertium
Revision as of 13:46, 15 December 2015 by M5w

Installation

m5w/corevoikko, a fork of corevoikko, supports the Apertium stream format.

To clone it, execute the following command:

git clone https://github.com/m5w/corevoikko.git corevoikko

First, install libvoikko's dependencies. Next, execute the following commands:

cd corevoikko/libvoikko
./configure
make
sudo make install

Installation is now complete, unless you do not have root privileges or would like to choose where libvoikko is installed. In that case, execute the following commands instead of the ones above:

cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install

Finally, if you installed to a custom prefix, add "$PREFIX/bin" to your "$PATH" by appending the following lines to your .profile:

PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
        export PATH="$PREFIX/bin:$PATH"
fi
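The .profile snippet above prepends the directory every time it is sourced. As a minimal sketch, assuming a POSIX shell, the same step can be made idempotent so that repeated sourcing does not add duplicate entries. path_prepend and VOIKKO_PREFIX are hypothetical names chosen for illustration; they are not part of corevoikko.

```shell
# Prepend a directory to PATH only if it exists and is not already listed.
# path_prepend is a hypothetical helper, not something corevoikko provides.
path_prepend() {
    dir=$1
    [ -d "$dir" ] || return 0          # skip directories that do not exist
    case ":$PATH:" in
        *":$dir:"*) ;;                 # already present: do nothing
        *) PATH="$dir:$PATH" ;;
    esac
    export PATH
}

# Same example prefix as above; adjust it to wherever you installed.
VOIKKO_PREFIX="$HOME/install/corevoikko"
path_prepend "$VOIKKO_PREFIX/bin"
```

After starting a new login shell, running command -v voikkospell should print the path of the installed binary if the directory was added correctly.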

Using voikkospell with apertium Stream Format

Invoke voikkospell with --apertium-stream.

Unlike in normal voikkospell usage, each word does not need to be on its own line.

Examples

Trailing Newline

$ echo '' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '\n', '\n' expected to follow '['


^
Aborted

For this reason, when piping text directly to voikkospell --apertium-stream, use echo -n. It is not necessary to do this when piping through tools such as apertium-deshtml, which encapsulate all newlines in superblanks.

One could also escape the newline:

$ echo '\' | voikkospell --apertium-stream
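The behaviour of echo -n is not the same in every shell, so printf '%s' is a more portable way to write text without a trailing newline. The sketch below only compares byte counts and does not call voikkospell; the sample lexical unit is taken from the "Unanalysed Word" example.

```shell
# Sample data only: the lexical unit from the "Unanalysed Word" example.
stream='^a/*a$'

# printf '%s' writes the argument verbatim and never appends a newline.
printf '%s' "$stream" | wc -c

# echo appends a newline, so it writes one byte more.
echo "$stream" | wc -c
```

Under that assumption, printf '%s' '^a/*a$' | voikkospell --apertium-stream should behave like the echo -n form used in the examples.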

Unanalysed Word

$ echo -n '^a/*a$' | voikkospell --apertium-stream
W: a
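To spellcheck plain words without running an analyser first, each word can be wrapped as an unanalysed lexical unit of the form shown above. This is a minimal sketch: to_unanalysed is a hypothetical helper, and it assumes the words contain no reserved characters.

```shell
# Wrap each argument as an unanalysed lexical unit: ^word/*word$
# to_unanalysed is a hypothetical helper; it does NOT escape reserved characters.
to_unanalysed() {
    sep=''
    for word in "$@"; do
        printf '%s^%s/*%s$' "$sep" "$word" "$word"
        sep=' '                        # separate later units with one space
    done
}

# Prints two lexical units on one line, without a trailing newline.
to_unanalysed a b
```

The output can then be piped to voikkospell --apertium-stream. Because the helper writes no trailing newline, echo -n is not needed.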

Analysed Words

One Tag

$ echo -n '^b/b<A>$' | voikkospell --apertium-stream
W: b

More Than One Tag

$ echo -n '^c/c<B><C>$' | voikkospell --apertium-stream
W: c

Ambiguous Word

$ echo -n '^d/d<D>/d<E><F>$' | voikkospell --apertium-stream
W: d

Multiwords

One Word with Inner Inflection

$ echo -n '^e f/e<G># f/e<H><I># f$' | voikkospell --apertium-stream
W: e f

More Than One Word

Without Inner Inflection

$ echo -n '^gh/g<J>+h<K><L>/g<M>+h<N>$' | voikkospell --apertium-stream
W: gh

With Inner Inflection

$ echo -n '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' | \
voikkospell --apertium-stream
W: i jk
W: lm n
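Judging from the examples above, voikkospell reports only the surface form of each lexical unit, that is, the text between ^ and the first /, and ignores the analyses. The sketch below extracts those surface forms with sed so you can see what will be checked. surface_forms is a hypothetical helper, and it assumes the input contains no escaped reserved characters or superblanks.

```shell
# Replace every lexical unit ^surface/analysis.../analysis$ with its surface form.
# surface_forms is a hypothetical helper; escaped characters are not handled.
surface_forms() {
    sed 's|\^\([^/]*\)/[^$]*\$|\1|g'
}

# Sample data: the ambiguous word and the two multiwords from the examples above.
printf '%s\n' '^d/d<D>/d<E><F>$' | surface_forms
printf '%s\n' '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' | surface_forms
```

The first command prints d and the second prints i jk lm n, matching the words that follow W: in the transcripts above.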

Reserved Characters

\, ^, /, <, >, and $ are reserved.

\

$ echo -n '\' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:1: unexpected end-of-file following '\', end-of-file expected to follow ']' or '$'
\
^
Aborted

^

$ echo -n '^' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:2: unexpected end-of-file following '^', end-of-file expected to follow ']' or '$'
^
^
Aborted

/

$ echo -n '/' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '/', '/' expected to follow '[', to follow '>' immediately, or to follow '^' or '#' not immediately
/
^
Aborted

<

$ echo -n '<' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '<', '<' expected to follow '[', to follow '>' immediately, or to follow '/' or '+' not immediately
<
^
Aborted

>

$ echo -n '>' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '>', '>' expected to follow '[' or to follow '<' not immediately
>
^
Aborted

$

$ echo -n '$' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '$', '$' expected to follow '[', to follow '>' immediately, or to follow '*' or '#' not immediately
$
^
Aborted

Escape

To avoid these errors, escape all reserved characters.

$ echo -n '\\\^\/\<\>\$' | voikkospell --apertium-stream
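Escaping by hand is error-prone, so the step can be automated with sed. The following is a minimal sketch that covers only the six reserved characters listed on this page; escape_reserved is a hypothetical helper, not a tool shipped with corevoikko or Apertium.

```shell
# Put a backslash in front of each of \ ^ / < > $ in the input.
# escape_reserved is a hypothetical helper covering only those six characters.
escape_reserved() {
    sed 's|[\\^/<>$]|\\&|g'
}

# Sample data: text containing a reserved character.
printf '%s\n' '1 < 2' | escape_reserved
```

Feeding the six reserved characters through the helper yields the same escaped string as the command above. The escaped text is safe only as plain text between lexical units; it is not itself a lexical unit.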
Superblank

Alternatively, one can enclose reserved characters in superblanks.

$ echo -n '[^/<>$]' | voikkospell --apertium-stream

However, \ must still be escaped inside a superblank:

$ echo -n '[\]' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:3: unexpected end-of-file following '[', end-of-file expected to follow ']' or '$'
[\]
  ^
Aborted

Putting It All Together

Let's spellcheck a webpage!

The Voikko home page mixes English and Finnish text, so a Finnish spellchecker should accept some words (reported with C:) and reject others (reported with W:).

Since voikkospell only checks the spelling of each word and ignores its analyses, it doesn't matter which analyser we use. This example uses apertium-en-ca's English analyser.

$ curl -s http://voikko.puimula.org/ | apertium-deshtml | \
lt-proc ~/svn.code.sf.net/p/apertium/svn/trunk/apertium-en-ca/en-ca.automorf.bin | \
voikkospell --apertium-stream
W: .
C: Voikko
W: Free
W: linguistic
W: software
W: for
W: Finnish
W: .
W: Free
W: linguistic
W: software
W: and
C: data
W: for
W: Finnish
W: .
C: Käyttäjät
W: Users
W: .
C: Käytä
C: Voikkoa
C: verkossa
W: .
W: Use
C: Voikko
W: online
W: .
C: Lataa
C: Voikon
C: asennuspaketti
W: .
C: Käyttö
C: sovellusohjelmissa
W: .
C: Käyttö
C: Linux
W: -
C: jakeluissa
W: .
C: Kielityökalut
C: LibreOfficessa
W: .
C: Usein
C: kysyttyjä
C: kysymyksiä
W: .
C: Yhteystiedot
W: .
W: Developers
W: .
W: Source
W: code
W: repositories
W: .
W: Development
W: wiki
W: .
W: Using
W: with
W: Java
W: .
W: Contributors
W: .
W: Contributing
W: .
C: Joukahainen
W: (
W: Finnish
W: vocabulary
W: )
W: .
C: Ohjeita
C: testaajille
W: .
W: Additional
W: reading
W: .
C: Jakelijat
W: Distributors
W: .
W: Source
C: file
W: releases
W: .
C: Release
W: notes
W: .
W: Supported
W: platforms
W: .
C: Linux
W: .
W: FreeBSD
W: .
W: Mac
W: OS
C: X
W: .
C: Windows
W: .
W: Architecture
W: and
W: history
W: .
W: Bugs
W: and
W: feature
W: requests
W: .
W: Communication
W: and
W: contact
W: information
W: .
C: Voikko
W: is
C: a
W: spelling
W: and
W: grammar
W: checker
W: ,
W: hyphenator
W: and
W: collection
W: of
W: related
W: linguistic
C: data
W: for
W: Finnish
W: language
W: .
W: Most of
W: the
W: material
C: on
W: this
C: web
W: site
W: is
W: in
W: English
W: .
W: Pages
W: written
W: in
W: Finnish
W: contain
W: information
W: for
W: end
W: users
W: who
W: may
W: not
W: always
W: understand
W: English
W: .
W: .
C: Tämä
C: on
C: Voikko
W: -
C: kielityökalujen
C: kotisivu
W: .
C: Voikko
C: on
C: ohjelmisto
C: suomen
C: kielen
C: oikeinkirjoituksen
C: ja
C: kieliopin
C: tarkistamiseen
W: ,
C: tavutukseen
C: sekä
C: sanojen
C: analysointiin
W: .
C: Tämä
C: sivusto
C: on
C: suurelta
C: osin
C: englanniksi
W: ,
C: koska
C: kaikki
C: Voikon
C: kanssa
C: työskentelevät
C: ohjelmistokehittäjät
C: eivät
C: osaa
C: suomea
W: .
W: .
C: Uutisia
W: News
W: .
C: 2015
W: -
C: 11
W: -
C: 12
W: :
W: Transitioning
W: the
W: Finnish
W: dictionary
W: from
W: Malaga
W: to
W: VFST
W: .
C: 2014
W: -
C: 01
W: -
C: 26
W: :
C: Tilastoja
C: vuodelta
C: 2013
C: ja
C: kehityssuunnitelmia
C: alkuvuodelle
C: 2014
W: .
C: 2013
W: -
C: 10
W: -
C: 07
W: :
C: Käyttäjäkyselyn
C: tulokset
C: ja
C: tilannepäivitystä
W: .
C: 2013
W: -
C: 02
W: -
C: 03
W: :
C: Tilastoja
C: vuodelta
C: 2012
C: ja
C: kehityssuunnitelmia
C: vuodelle
C: 2013
W: .
C: 2012
W: -
C: 08
W: -
C: 23
W: :
C: Voikko
W: for
W: Android
W: available
W: for
W: early
W: preview
W: .
C: 2012
W: -
C: 04
W: -
C: 25
W: :
C: Suomen
C: kielen
W: VFST
W: -
C: morfologian
C: kehitys
C: aloitettu
W: .
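Output this long is easier to read once summarised. As a minimal sketch, the helpers below count the lines with each prefix and list the distinct words reported with W:. count_prefix and flagged_words are hypothetical names, and the sample data is a handful of lines in the same format as the output above.

```shell
# Count the lines that start with the given prefix letter (C or W).
count_prefix() {
    grep -c "^$1: "
}

# List each distinct word reported with W:, sorted bytewise.
flagged_words() {
    grep '^W: ' | cut -c4- | LC_ALL=C sort -u
}

# Sample data in voikkospell's output format.
sample='C: Voikko
W: Free
W: .
C: data
W: Free'

printf '%s\n' "$sample" | count_prefix C
printf '%s\n' "$sample" | count_prefix W
printf '%s\n' "$sample" | flagged_words
```

To summarise a real run, save the pipeline's output to a file and redirect that file into either helper. Note that punctuation such as . is also reported with W:, so it inflates the count unless filtered out.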