Difference between revisions of "Voikkospell"
Revision as of 13:46, 15 December 2015
Installation
m5w/corevoikko, a fork of corevoikko, supports the Apertium stream format.
To clone it, execute the following command:
git clone https://github.com/m5w/corevoikko.git corevoikko
First, install libvoikko's dependencies. Next, execute the following commands:
cd corevoikko/libvoikko
./configure
make
sudo make install
If you do not have root privileges, or would like to choose where libvoikko is installed, execute the following instead (otherwise, installation is complete):
cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install
Finally, add "$PREFIX/bin" to your "$PATH" by appending the following lines to your .profile:
PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
    export PATH="$PREFIX/bin:$PATH"
fi
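Because .profile can be sourced more than once (nested login shells, for example), the same directory may end up in "$PATH" several times. The following is a minimal sketch of an idempotent variant. The function name prepend_to_path is hypothetical, and the install location is the same example prefix used above.

```shell
# Prepend a directory to PATH only if it is not already listed.
# prepend_to_path is an illustrative helper, not part of corevoikko.
prepend_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
}

PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX/bin" ]; then
    prepend_to_path "$PREFIX/bin"
    export PATH
fi
```

After opening a new login shell, `command -v voikkospell` should print the path of the newly installed binary.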
Using voikkospell with apertium Stream Format
Invoke voikkospell with --apertium-stream.
Unlike in normal voikkospell usage, words do not each need to be on their own line.
Examples
Trailing Newline
$ echo '' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '\n', '\n' expected to follow '['
^
Aborted
For this reason, when piping text directly to voikkospell --apertium-stream, use echo -n. It is not necessary to do this when piping through tools such as apertium-deshtml, which encapsulate all newlines in superblanks.
One could also escape the newline:
$ echo '\' | voikkospell --apertium-stream
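The behaviour of echo -n differs between shells, so a more portable way to avoid the trailing newline is printf '%s'. The sketch below wraps it in a small helper. The name emit_stream is hypothetical and is not part of voikkospell or Apertium.

```shell
# Write the arguments (joined by single spaces) with no trailing newline.
# emit_stream is an illustrative helper name.
emit_stream() {
    printf '%s' "$*"
}

# Intended use, assuming voikkospell is installed as described above:
#   emit_stream '^a/*a$' | voikkospell --apertium-stream
```

Counting bytes with `emit_stream '^a/*a$' | wc -c` and `echo '^a/*a$' | wc -c` shows the difference: the second pipeline carries one extra byte, the newline that triggers the error above.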
Unanalysed Word
$ echo -n '^a/*a$' | voikkospell --apertium-stream
W: a
Analysed Words
One Tag
$ echo -n '^b/b<A>$' | voikkospell --apertium-stream
W: b
More Than One Tag
$ echo -n '^c/c<B><C>$' | voikkospell --apertium-stream
W: c
Ambiguous Word
$ echo -n '^d/d<D>/d<E><F>$' | voikkospell --apertium-stream
W: d
Multiwords
One Word with Inner Inflection
$ echo -n '^e f/e<G># f/e<H><I># f$' | voikkospell --apertium-stream
W: e f
More Than One Word
Without Inner Inflection
$ echo -n '^gh/g<J>+h<K><L>/g<M>+h<N>$' | voikkospell --apertium-stream
W: gh
With Inner Inflection
$ echo -n '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' | \
voikkospell --apertium-stream
W: i jk
W: lm n
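In every example above, the word that voikkospell reports is the surface form: the text between ^ and the first /. The following sketch pulls those surface forms out of a stream with grep and sed, which can help when predicting what will be checked. The function name surface_forms is hypothetical, and the sketch assumes that surface forms contain no escaped reserved characters and that grep supports the widely available -o option.

```shell
# Print the surface form of each lexical unit read from standard input,
# one per line. Assumes no escaped reserved characters in surface forms.
surface_forms() {
    grep -o '\^[^/$]*/' | sed 's|^\^||; s|/$||'
}

# Example:
#   printf '%s' '^d/d<D>/d<E><F>$' | surface_forms
```

Analyses after the first / are ignored, which matches the examples: tags and ambiguity do not change the reported word.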
Reserved Characters
\, ^, /, <, >, and $ are reserved.
\
$ echo -n '\' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
what(): 1:1: unexpected end-of-file following '\', end-of-file expected to follow ']' or '$'
\
^
Aborted
^
$ echo -n '^' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
what(): 1:2: unexpected end-of-file following '^', end-of-file expected to follow ']' or '$'
^
 ^
Aborted
/
$ echo -n '/' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '/', '/' expected to follow '[', to follow '>' immediately, or to follow '^' or '#' not immediately
/
^
Aborted
<
$ echo -n '<' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '<', '<' expected to follow '[', to follow '>' immediately, or to follow '/' or '+' not immediately
<
^
Aborted
>
$ echo -n '>' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '>', '>' expected to follow '[' or to follow '<' not immediately
>
^
Aborted
$
$ echo -n '$' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '$', '$' expected to follow '[', to follow '>' immediately, or to follow '*' or '#' not immediately
$
^
Aborted
Escape
To avoid these errors, escape all reserved characters.
$ echo -n '\\\^\/\<\>\$' | voikkospell --apertium-stream
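Escaping by hand is error-prone for longer text, so the rule can be applied mechanically. The sketch below uses sed to put a backslash in front of each of the six reserved characters listed above. The function name escape_reserved is hypothetical. The square brackets that delimit superblanks are deliberately left untouched, since the list above does not include them; extend the bracket expression if your input may contain them.

```shell
# Backslash-escape the reserved characters \ ^ / < > $ on standard input.
# escape_reserved is an illustrative helper name.
escape_reserved() {
    sed 's|[\\^/<>$]|\\&|g'
}

# Intended use, assuming voikkospell is installed as described above:
#   printf '%s' 'a^b' | escape_reserved | voikkospell --apertium-stream
```

For example, the input `\^/<>$` becomes `\\\^\/\<\>\$`, the same string used in the command above.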
Superblank
Alternatively, one can enclose reserved characters in superblanks.
$ echo -n '[^/<>$]' | voikkospell --apertium-stream
However, \ must be escaped.
$ echo -n '[\]' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
what(): 1:3: unexpected end-of-file following '[', end-of-file expected to follow ']' or '$'
[\]
  ^
Aborted
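The two rules for superblanks (reserved characters may appear unescaped, but \ must still be escaped) can be combined in a small helper. This is a minimal sketch; the function name superblank is hypothetical, and it does not handle text that itself contains ], which is outside what this page describes.

```shell
# Wrap the first argument in a superblank, doubling every backslash.
# superblank is an illustrative helper; text containing ']' is not handled.
superblank() {
    printf '[%s]' "$(printf '%s' "$1" | sed 's|\\|\\\\|g')"
}

# Intended use, assuming voikkospell is installed as described above:
#   superblank 'a\b^/<>$' | voikkospell --apertium-stream
```

Like the printf-based examples earlier, the helper writes no trailing newline, so its output can be piped directly.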
Putting It All Together
Let's spellcheck a webpage!
voikkospell's webpage has a mixture of English and Finnish words, so we should get a good mixture of correct (C:) and incorrect (W:) spellings.
Since voikkospell only checks spelling, it doesn't matter which analyser we use. In this example, I use apertium-en-ca's English analyser.
$ curl -s http://voikko.puimula.org/ | apertium-deshtml | lt-proc ~/svn.code.sf.net/p/apertium/svn/trunk/apertium-en-ca/en-ca.automorf.bin | voikkospell --apertium-stream
W: .
C: Voikko
W: Free
W: linguistic
W: software
W: for
W: Finnish
W: .
W: Free
W: linguistic
W: software
W: and
C: data
W: for
W: Finnish
W: .
C: Käyttäjät
W: Users
W: .
C: Käytä
C: Voikkoa
C: verkossa
W: .
W: Use
C: Voikko
W: online
W: .
C: Lataa
C: Voikon
C: asennuspaketti
W: .
C: Käyttö
C: sovellusohjelmissa
W: .
C: Käyttö
C: Linux
W: -
C: jakeluissa
W: .
C: Kielityökalut
C: LibreOfficessa
W: .
C: Usein
C: kysyttyjä
C: kysymyksiä
W: .
C: Yhteystiedot
W: .
W: Developers
W: .
W: Source
W: code
W: repositories
W: .
W: Development
W: wiki
W: .
W: Using
W: with
W: Java
W: .
W: Contributors
W: .
W: Contributing
W: .
C: Joukahainen
W: (
W: Finnish
W: vocabulary
W: )
W: .
C: Ohjeita
C: testaajille
W: .
W: Additional
W: reading
W: .
C: Jakelijat
W: Distributors
W: .
W: Source
C: file
W: releases
W: .
C: Release
W: notes
W: .
W: Supported
W: platforms
W: .
C: Linux
W: .
W: FreeBSD
W: .
W: Mac
W: OS
C: X
W: .
C: Windows
W: .
W: Architecture
W: and
W: history
W: .
W: Bugs
W: and
W: feature
W: requests
W: .
W: Communication
W: and
W: contact
W: information
W: .
C: Voikko
W: is
C: a
W: spelling
W: and
W: grammar
W: checker
W: ,
W: hyphenator
W: and
W: collection
W: of
W: related
W: linguistic
C: data
W: for
W: Finnish
W: language
W: .
W: Most of
W: the
W: material
C: on
W: this
C: web
W: site
W: is
W: in
W: English
W: .
W: Pages
W: written
W: in
W: Finnish
W: contain
W: information
W: for
W: end
W: users
W: who
W: may
W: not
W: always
W: understand
W: English
W: .
W: .
C: Tämä
C: on
C: Voikko
W: -
C: kielityökalujen
C: kotisivu
W: .
C: Voikko
C: on
C: ohjelmisto
C: suomen
C: kielen
C: oikeinkirjoituksen
C: ja
C: kieliopin
C: tarkistamiseen
W: ,
C: tavutukseen
C: sekä
C: sanojen
C: analysointiin
W: .
C: Tämä
C: sivusto
C: on
C: suurelta
C: osin
C: englanniksi
W: ,
C: koska
C: kaikki
C: Voikon
C: kanssa
C: työskentelevät
C: ohjelmistokehittäjät
C: eivät
C: osaa
C: suomea
W: .
W: .
C: Uutisia
W: News
W: .
C: 2015
W: -
C: 11
W: -
C: 12
W: :
W: Transitioning
W: the
W: Finnish
W: dictionary
W: from
W: Malaga
W: to
W: VFST
W: .
C: 2014
W: -
C: 01
W: -
C: 26
W: :
C: Tilastoja
C: vuodelta
C: 2013
C: ja
C: kehityssuunnitelmia
C: alkuvuodelle
C: 2014
W: .
C: 2013
W: -
C: 10
W: -
C: 07
W: :
C: Käyttäjäkyselyn
C: tulokset
C: ja
C: tilannepäivitystä
W: .
C: 2013
W: -
C: 02
W: -
C: 03
W: :
C: Tilastoja
C: vuodelta
C: 2012
C: ja
C: kehityssuunnitelmia
C: vuodelle
C: 2013
W: .
C: 2012
W: -
C: 08
W: -
C: 23
W: :
C: Voikko
W: for
W: Android
W: available
W: for
W: early
W: preview
W: .
C: 2012
W: -
C: 04
W: -
C: 25
W: :
C: Suomen
C: kielen
W: VFST
W: -
C: morfologian
C: kehitys
C: aloitettu
W: .
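Output of this length is easier to digest when summarised. The following sketch counts the C: and W: lines and lists the distinct words reported as misspelled, using only awk, grep, cut, and sort. The function names summarise_spelling and misspelled_words are hypothetical, and the sketch assumes the one-result-per-line "C: word" / "W: word" format shown above.

```shell
# Count correct (C:) and wrong (W:) results read from standard input.
summarise_spelling() {
    awk '
        /^C: / { correct++ }
        /^W: / { wrong++ }
        END    { printf "correct=%d wrong=%d\n", correct, wrong }
    '
}

# List each distinct word reported as wrong, skipping tokens that
# contain no letters or digits (punctuation such as "." or "-").
misspelled_words() {
    grep '^W: ' | cut -c4- | grep '[[:alnum:]]' | sort -u
}

# Intended use: append either function to the pipeline above, e.g.
#   ... | voikkospell --apertium-stream | summarise_spelling
```

Punctuation is reported as W: in the output above, so misspelled_words filters it out, while summarise_spelling counts every line as reported.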