Difference between revisions of "Voikkospell"
Revision as of 13:46, 15 December 2015
Installation
m5w/corevoikko, a fork of corevoikko, supports apertium stream format.
To clone it, execute the following command:
git clone https://github.com/m5w/corevoikko.git corevoikko
First, install libvoikko's dependencies. Next, execute the following commands:
cd corevoikko/libvoikko
./configure
make
sudo make install
If you do not have root privileges, or would like to choose where libvoikko is installed, execute the following instead (otherwise, installation is complete):
cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install
Finally, add "$PREFIX/bin" to your "$PATH" by appending the following lines to your .profile:
PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
    export PATH="$PREFIX/bin:$PATH"
fi
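If .profile is sourced more than once, the snippet above prepends the same directory repeatedly. The following is a minimal sketch of a guard against that; the function name path_prepend is hypothetical and not part of corevoikko, and the PREFIX value is the same example path used above.

```shell
# Hypothetical helper: prepend a directory to PATH only if it exists
# and is not already listed, so repeated sourcing is harmless.
path_prepend() {
    dir=$1
    [ -d "$dir" ] || return 0          # skip directories that do not exist
    case ":$PATH:" in
        *":$dir:"*) ;;                 # already present, nothing to do
        *) PATH="$dir:$PATH" ;;
    esac
    export PATH
}

PREFIX="$HOME/install/corevoikko"      # e.g.
path_prepend "$PREFIX/bin"
```

Afterwards, command -v voikkospell should print the path of the installed binary if the installation succeeded.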
Using voikkospell with apertium Stream Format
Invoke voikkospell with --apertium-stream.
Unlike in normal voikkospell usage, each word does not need to be on its own line.
Examples
Trailing Newline
$ echo '' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '\n', '\n' expected to follow '['
^
Aborted
For this reason, when piping text directly to voikkospell --apertium-stream, use echo -n. It is not necessary to do this when piping through tools such as apertium-deshtml, which encapsulate all newlines in superblanks.
One could also escape the newline:
$ echo '\' | voikkospell --apertium-stream
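The behaviour of echo -n differs between shells, so a script that must run under several of them may prefer printf, which POSIX specifies. Below is a small sketch under that assumption; the function name emit_raw is made up for illustration.

```shell
# Hypothetical helper: print the argument exactly as given,
# with no trailing newline appended.
emit_raw() {
    printf '%s' "$1"
}

# Intended usage (requires the patched voikkospell on PATH):
#   emit_raw '^a/*a$' | voikkospell --apertium-stream
```

Because printf '%s' never interprets its argument as a format string, backslashes and other reserved characters in the input are passed through unchanged.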
Unanalysed Word
$ echo -n '^a/*a$' | voikkospell --apertium-stream
W: a
Analysed Words
One Tag
$ echo -n '^b/b<A>$' | voikkospell --apertium-stream
W: b
More Than One Tag
$ echo -n '^c/c<B><C>$' | voikkospell --apertium-stream
W: c
Ambiguous Word
$ echo -n '^d/d<D>/d<E><F>$' | voikkospell --apertium-stream
W: d
Multiwords
One Word with Inner Inflection
$ echo -n '^e f/e<G># f/e<H><I># f$' | voikkospell --apertium-stream
W: e f
More Than One Word
Without Inner Inflection
$ echo -n '^gh/g<J>+h<K><L>/g<M>+h<N>$' | voikkospell --apertium-stream
W: gh
With Inner Inflection
$ echo -n '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' | \
    voikkospell --apertium-stream
W: i jk
W: lm n
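In each example above, the word that voikkospell reports is the surface form: the text between ^ and the first /. As a rough illustration of that rule, the sketch below pulls surface forms out of a stream with grep and cut. The function name surface_forms is hypothetical, grep -o is a GNU/BSD extension rather than POSIX, and the pattern assumes the input contains no escaped reserved characters or superblanks.

```shell
# Hypothetical helper: list the surface form of every lexical unit on stdin.
# Assumes no escaped reserved characters and no superblanks in the input.
surface_forms() {
    grep -o '\^[^/$]*' |   # match from each ^ up to the first / or $
        cut -c2-           # drop the leading ^
}

printf '%s' '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' |
    surface_forms
```

This is only a way to preview which words will be checked; it does not replace the stream parser inside voikkospell.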
Reserved Characters
\, ^, /, <, >, and $ are reserved.
\
$ echo -n '\' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:1: unexpected end-of-file following '\', end-of-file expected to follow ']' or '$'
\
^
Aborted
^
$ echo -n '^' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:2: unexpected end-of-file following '^', end-of-file expected to follow ']' or '$'
^
^
Aborted
/
$ echo -n '/' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '/', '/' expected to follow '[', to follow '>' immediately, or to follow '^' or '#' not immediately
/
^
Aborted
<
$ echo -n '<' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '<', '<' expected to follow '[', to follow '>' immediately, or to follow '/' or '+' not immediately
<
^
Aborted
>
$ echo -n '>' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '>', '>' expected to follow '[' or to follow '<' not immediately
>
^
Aborted
$
$ echo -n '$' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
  what():  1:1: unexpected '$', '$' expected to follow '[', to follow '>' immediately, or to follow '*' or '#' not immediately
$
^
Aborted
Escape
To avoid these errors, escape all reserved characters.
$ echo -n '\\\^\/\<\>\$' | voikkospell --apertium-stream
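Escaping by hand is error-prone for longer input, so it can be scripted. The sketch below prefixes each of the six reserved characters listed above with a backslash using sed. The function name escape_reserved is hypothetical, and the character set is taken from this page only; whether other characters (for example the square brackets that delimit superblanks) also need escaping in a given context is not covered here.

```shell
# Hypothetical helper: backslash-escape the reserved characters
# \ ^ / < > $ in text read from stdin.
escape_reserved() {
    # '|' is used as the sed delimiter so that '/' can appear in the bracket
    # expression; '&' in the replacement stands for the matched character.
    sed 's|[\\^/<>$]|\\&|g'
}

# Intended usage (requires the patched voikkospell on PATH):
#   printf '%s' 'a^b/c' | escape_reserved | voikkospell --apertium-stream
```

Applied to the six reserved characters in sequence, this produces the same escaped string as the manual example above.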
Superblank
Alternatively, one can enclose reserved characters in superblanks.
$ echo -n '[^/<>$]' | voikkospell --apertium-stream
However, \ must be escaped.
$ echo -n '[\]' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
  what():  1:3: unexpected end-of-file following '[', end-of-file expected to follow ']' or '$'
[\]
^
Aborted
Putting It All Together
Let's spellcheck a webpage!
voikkospell's webpage has a mixture of English and Finnish words, so we should get a good mixture of words reported as correct (C:) and as misspelled (W:).
Since voikkospell only checks spelling, it doesn't matter which analyser we use. In this example, I use apertium-en-ca's English analyser.
$ curl -s http://voikko.puimula.org/ | apertium-deshtml | lt-proc ~/svn.code.sf.net/p/apertium/svn/trunk/apertium-en-ca/en-ca.automorf.bin | voikkospell --apertium-stream
W: .
C: Voikko
W: Free
W: linguistic
W: software
W: for
W: Finnish
W: .
W: Free
W: linguistic
W: software
W: and
C: data
W: for
W: Finnish
W: .
C: Käyttäjät
W: Users
W: .
C: Käytä
C: Voikkoa
C: verkossa
W: .
W: Use
C: Voikko
W: online
W: .
C: Lataa
C: Voikon
C: asennuspaketti
W: .
C: Käyttö
C: sovellusohjelmissa
W: .
C: Käyttö
C: Linux
W: -
C: jakeluissa
W: .
C: Kielityökalut
C: LibreOfficessa
W: .
C: Usein
C: kysyttyjä
C: kysymyksiä
W: .
C: Yhteystiedot
W: .
W: Developers
W: .
W: Source
W: code
W: repositories
W: .
W: Development
W: wiki
W: .
W: Using
W: with
W: Java
W: .
W: Contributors
W: .
W: Contributing
W: .
C: Joukahainen
W: (
W: Finnish
W: vocabulary
W: )
W: .
C: Ohjeita
C: testaajille
W: .
W: Additional
W: reading
W: .
C: Jakelijat
W: Distributors
W: .
W: Source
C: file
W: releases
W: .
C: Release
W: notes
W: .
W: Supported
W: platforms
W: .
C: Linux
W: .
W: FreeBSD
W: .
W: Mac
W: OS
C: X
W: .
C: Windows
W: .
W: Architecture
W: and
W: history
W: .
W: Bugs
W: and
W: feature
W: requests
W: .
W: Communication
W: and
W: contact
W: information
W: .
C: Voikko
W: is
C: a
W: spelling
W: and
W: grammar
W: checker
W: ,
W: hyphenator
W: and
W: collection
W: of
W: related
W: linguistic
C: data
W: for
W: Finnish
W: language
W: .
W: Most of
W: the
W: material
C: on
W: this
C: web
W: site
W: is
W: in
W: English
W: .
W: Pages
W: written
W: in
W: Finnish
W: contain
W: information
W: for
W: end
W: users
W: who
W: may
W: not
W: always
W: understand
W: English
W: .
W: .
C: Tämä
C: on
C: Voikko
W: -
C: kielityökalujen
C: kotisivu
W: .
C: Voikko
C: on
C: ohjelmisto
C: suomen
C: kielen
C: oikeinkirjoituksen
C: ja
C: kieliopin
C: tarkistamiseen
W: ,
C: tavutukseen
C: sekä
C: sanojen
C: analysointiin
W: .
C: Tämä
C: sivusto
C: on
C: suurelta
C: osin
C: englanniksi
W: ,
C: koska
C: kaikki
C: Voikon
C: kanssa
C: työskentelevät
C: ohjelmistokehittäjät
C: eivät
C: osaa
C: suomea
W: .
W: .
C: Uutisia
W: News
W: .
C: 2015
W: -
C: 11
W: -
C: 12
W: :
W: Transitioning
W: the
W: Finnish
W: dictionary
W: from
W: Malaga
W: to
W: VFST
W: .
C: 2014
W: -
C: 01
W: -
C: 26
W: :
C: Tilastoja
C: vuodelta
C: 2013
C: ja
C: kehityssuunnitelmia
C: alkuvuodelle
C: 2014
W: .
C: 2013
W: -
C: 10
W: -
C: 07
W: :
C: Käyttäjäkyselyn
C: tulokset
C: ja
C: tilannepäivitystä
W: .
C: 2013
W: -
C: 02
W: -
C: 03
W: :
C: Tilastoja
C: vuodelta
C: 2012
C: ja
C: kehityssuunnitelmia
C: vuodelle
C: 2013
W: .
C: 2012
W: -
C: 08
W: -
C: 23
W: :
C: Voikko
W: for
W: Android
W: available
W: for
W: early
W: preview
W: .
C: 2012
W: -
C: 04
W: -
C: 25
W: :
C: Suomen
C: kielen
W: VFST
W: -
C: morfologian
C: kehitys
C: aloitettu
W: .
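Output this long is easier to judge as a tally. The following sketch counts the C: and W: lines that voikkospell prints; the function name count_verdicts and the summary format are made up for illustration, and the only assumption about voikkospell is the "C: word" / "W: word" line format shown above.

```shell
# Hypothetical helper: count lines starting with "C: " and "W: " on stdin
# and print a one-line summary.
count_verdicts() {
    awk '
        /^C: / { correct++ }
        /^W: / { wrong++ }
        END    { printf "correct=%d wrong=%d\n", correct, wrong }
    '
}

# Intended usage: append it to the pipeline above, e.g.
#   ... | voikkospell --apertium-stream | count_verdicts
```

Note that punctuation tokens such as "." and "-" are reported as W: in the run above, so the wrong count includes them unless they are filtered out first.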