Voikkospell
====Trailing Newline====

<pre>
$ echo '' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '\n', '\n' expected to follow '['
^
Aborted
</pre>

For this reason, when piping text directly to <code>voikkospell --apertium-stream</code>, use <code>echo -n</code>. It is not necessary to do this when piping through tools such as <code>apertium-deshtml</code>, which encapsulate all newlines in superblanks.

One could also escape the newline:

<pre>
$ echo '\' | voikkospell --apertium-stream
</pre>
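<code>echo -n</code> works in bash and most other shells, but POSIX leaves its behaviour implementation-defined. A portable alternative is <code>printf '%s'</code>, which never appends a newline. The sketch below wraps it in a hypothetical helper, <code>strip_trailing_newlines</code> (not part of voikkospell or Apertium). The command substitution <code>"$(cat)"</code> is what removes the trailing newlines. Newlines inside the text are left untouched, so the helper only suits single-line input.

```shell
# strip_trailing_newlines: hypothetical helper, not part of voikkospell.
# "$(cat)" reads all of stdin; command substitution removes every trailing
# newline, and printf '%s' prints the result without adding one back.
strip_trailing_newlines() {
    printf '%s' "$(cat)"
}

# Usage (assumes voikkospell from m5w/corevoikko is on $PATH):
#   echo '^a/*a$' | strip_trailing_newlines | voikkospell --apertium-stream
```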

====Unanalysed Word====

<pre>
$ echo -n '^a/*a$' | voikkospell --apertium-stream
W: a
</pre>

====Analysed Words====

=====One Tag=====

<pre>
$ echo -n '^b/b<A>$' | voikkospell --apertium-stream
W: b
</pre>

=====More Than One Tag=====

<pre>
$ echo -n '^c/c<B><C>$' | voikkospell --apertium-stream
W: c
</pre>

====Ambiguous Word====

<pre>
$ echo -n '^d/d<D>/d<E><F>$' | voikkospell --apertium-stream
W: d
</pre>

====Multiwords====

=====One Word with Inner Inflection=====

<pre>
$ echo -n '^e f/e<G># f/e<H><I># f$' | voikkospell --apertium-stream
W: e f
</pre>

=====More Than One Word=====

======Without Inner Inflection======

<pre>
$ echo -n '^gh/g<J>+h<K><L>/g<M>+h<N>$' | voikkospell --apertium-stream
W: gh
</pre>

======With Inner Inflection======

<pre>
$ echo -n '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' | \
voikkospell --apertium-stream
W: i jk
W: lm n
</pre>
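In each example above, voikkospell reports the surface form of the lexical unit: the text between the <code>^</code> and the first <code>/</code>, including any spaces in a multiword. To preview which strings will be checked, the hypothetical <code>surface_forms</code> function below extracts those forms with <code>sed</code>, <code>grep -o</code> (a widespread extension, not POSIX) and <code>cut</code>. It is a simplified sketch that assumes the stream contains no escape sequences and no superblanks spanning lines. It is not how voikkospell itself parses its input.

```shell
# surface_forms: hypothetical helper; prints the surface form of each
# lexical unit, one per line. Simplification: assumes no escape sequences
# and no superblank that spans more than one line.
surface_forms() {
    # 1. sed deletes superblanks so their contents are never taken as words.
    # 2. grep -o prints each '^' plus everything up to the next '/' or '$'.
    # 3. cut drops the leading '^'.
    sed 's/\[[^]]*]//g' | grep -o '\^[^/$]*' | cut -c2-
}

# Usage:
#   echo -n '^e f/e<G># f/e<H><I># f$' | surface_forms
```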

====Reserved Characters====

<code>\</code>, <code>^</code>, <code>/</code>, <code><</code>, <code>></code>, and <code>$</code> are reserved.

=====\=====

<pre>
$ echo -n '\' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
what(): 1:1: unexpected end-of-file following '\', end-of-file expected to follow ']' or '$'
\
^
Aborted
</pre>

=====^=====

<pre>
$ echo -n '^' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
what(): 1:2: unexpected end-of-file following '^', end-of-file expected to follow ']' or '$'
^
^
Aborted
</pre>

=====/=====

<pre>
$ echo -n '/' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '/', '/' expected to follow '[', to follow '>' immediately, or to follow '^' or '#' not immediately
/
^
Aborted
</pre>

=====<=====

<pre>
$ echo -n '<' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '<', '<' expected to follow '[', to follow '>' immediately, or to follow '/' or '+' not immediately
<
^
Aborted
</pre>

=====>=====

<pre>
$ echo -n '>' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '>', '>' expected to follow '[' or to follow '<' not immediately
>
^
Aborted
</pre>

=====$=====

<pre>
$ echo -n '$' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedReservedCharacter'
what(): 1:1: unexpected '$', '$' expected to follow '[', to follow '>' immediately, or to follow '*' or '#' not immediately
$
^
Aborted
</pre>

=====Escape=====

To avoid these errors, escape all reserved characters.

<pre>
$ echo -n '\\\^\/\<\>\$' | voikkospell --apertium-stream
</pre>
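Escaping by hand becomes tedious for longer text. Here is a minimal sketch, assuming GNU sed (which, unlike some other implementations, does not add a newline to input that lacks one). The hypothetical <code>apertium_escape</code> function puts a <code>\</code> before every reserved character and before <code>[</code> and <code>]</code>. It then appends a <code>\</code> to every line except the last, so that newlines inside the text are escaped too. The input itself must still not end with a newline.

```shell
# apertium_escape: hypothetical helper, not part of voikkospell or Apertium.
# Reads plain text on stdin and writes it with every reserved character
# escaped, so that the whole input is read as literal text.
apertium_escape() {
    # -e 1: put a backslash before \ ^ / < > $ [ and ]
    #       ("\\&" in the replacement is a literal backslash plus the match).
    # -e 2: on every line except the last, append a backslash, which
    #       escapes the newline that follows it.
    sed -e 's|[][\\^/<>$]|\\&|g' -e '$!s|$|\\|'
}
```

For example, <code>printf '%s' 'x < y' | apertium_escape</code> prints <code>x \< y</code>, which can then be piped to <code>voikkospell --apertium-stream</code>.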

=====Superblank=====

Alternatively, one can enclose reserved characters in superblanks.

<pre>
$ echo -n '[^/<>$]' | voikkospell --apertium-stream
</pre>

However, <code>\</code> must still be escaped inside a superblank. Otherwise it escapes the closing <code>]</code>, and the superblank is never closed:

<pre>
$ echo -n '[\]' | voikkospell --apertium-stream
terminate called after throwing an instance of 'Apertium::ApertiumStream::UnexpectedEndOfFile'
what(): 1:3: unexpected end-of-file following '[', end-of-file expected to follow ']' or '$'
[\]
^
Aborted
</pre>
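To pass a whole run of text through untouched, it can be wrapped in a superblank by a script. Here is a sketch with a hypothetical <code>superblank</code> function. It escapes <code>\</code>, <code>[</code> and <code>]</code>, because an unescaped <code>]</code> would close the superblank early and an unescaped <code>\</code> would escape the character after it. It leaves <code>^</code>, <code>/</code>, <code><</code>, <code>></code> and <code>$</code> as they are. Escaping <code>[</code> inside the superblank is a cautious assumption rather than something the examples above require. Note that command substitution drops any trailing newlines from the argument.

```shell
# superblank: hypothetical helper; wraps its argument in a superblank.
# Only \ [ and ] are escaped; ^ / < > $ are taken literally inside [ ].
superblank() {
    # "escaped" is a global variable; POSIX sh has no portable "local".
    escaped=$(printf '%s' "$1" | sed 's|[][\\]|\\&|g')
    printf '[%s]' "$escaped"
}

# Usage:
#   superblank '<p>' | voikkospell --apertium-stream
```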

====Putting It All Together====

Let's spellcheck a webpage!

voikkospell's webpage has a mixture of English and Finnish words, so we should get a good mix of correct and incorrect spellings.

Since voikkospell only checks spelling, it doesn't matter which analyser we use. In this example, I use <code>apertium-en-ca</code>'s English analyser.

Each output line begins with <code>C:</code> if voikkospell considers the word correctly spelled and <code>W:</code> if it does not. As expected, most Finnish words are marked <code>C:</code> and most English words <code>W:</code>.

<pre>
$ curl -s http://voikko.puimula.org/ | apertium-deshtml | \
lt-proc ~/svn.code.sf.net/p/apertium/svn/trunk/apertium-en-ca/en-ca.automorf.bin | \
voikkospell --apertium-stream
W: .
C: Voikko
W: Free
W: linguistic
W: software
W: for
W: Finnish
W: .
W: Free
W: linguistic
W: software
W: and
C: data
W: for
W: Finnish
W: .
C: Käyttäjät
W: Users
W: .
C: Käytä
C: Voikkoa
C: verkossa
W: .
W: Use
C: Voikko
W: online
W: .
C: Lataa
C: Voikon
C: asennuspaketti
W: .
C: Käyttö
C: sovellusohjelmissa
W: .
C: Käyttö
C: Linux
W: -
C: jakeluissa
W: .
C: Kielityökalut
C: LibreOfficessa
W: .
C: Usein
C: kysyttyjä
C: kysymyksiä
W: .
C: Yhteystiedot
W: .
W: Developers
W: .
W: Source
W: code
W: repositories
W: .
W: Development
W: wiki
W: .
W: Using
W: with
W: Java
W: .
W: Contributors
W: .
W: Contributing
W: .
C: Joukahainen
W: (
W: Finnish
W: vocabulary
W: )
W: .
C: Ohjeita
C: testaajille
W: .
W: Additional
W: reading
W: .
C: Jakelijat
W: Distributors
W: .
W: Source
C: file
W: releases
W: .
C: Release
W: notes
W: .
W: Supported
W: platforms
W: .
C: Linux
W: .
W: FreeBSD
W: .
W: Mac
W: OS
C: X
W: .
C: Windows
W: .
W: Architecture
W: and
W: history
W: .
W: Bugs
W: and
W: feature
W: requests
W: .
W: Communication
W: and
W: contact
W: information
W: .
C: Voikko
W: is
C: a
W: spelling
W: and
W: grammar
W: checker
W: ,
W: hyphenator
W: and
W: collection
W: of
W: related
W: linguistic
C: data
W: for
W: Finnish
W: language
W: .
W: Most of
W: the
W: material
C: on
W: this
C: web
W: site
W: is
W: in
W: English
W: .
W: Pages
W: written
W: in
W: Finnish
W: contain
W: information
W: for
W: end
W: users
W: who
W: may
W: not
W: always
W: understand
W: English
W: .
W: .
C: Tämä
C: on
C: Voikko
W: -
C: kielityökalujen
C: kotisivu
W: .
C: Voikko
C: on
C: ohjelmisto
C: suomen
C: kielen
C: oikeinkirjoituksen
C: ja
C: kieliopin
C: tarkistamiseen
W: ,
C: tavutukseen
C: sekä
C: sanojen
C: analysointiin
W: .
C: Tämä
C: sivusto
C: on
C: suurelta
C: osin
C: englanniksi
W: ,
C: koska
C: kaikki
C: Voikon
C: kanssa
C: työskentelevät
C: ohjelmistokehittäjät
C: eivät
C: osaa
C: suomea
W: .
W: .
C: Uutisia
W: News
W: .
C: 2015
W: -
C: 11
W: -
C: 12
W: :
W: Transitioning
W: the
W: Finnish
W: dictionary
W: from
W: Malaga
W: to
W: VFST
W: .
C: 2014
W: -
C: 01
W: -
C: 26
W: :
C: Tilastoja
C: vuodelta
C: 2013
C: ja
C: kehityssuunnitelmia
C: alkuvuodelle
C: 2014
W: .
C: 2013
W: -
C: 10
W: -
C: 07
W: :
C: Käyttäjäkyselyn
C: tulokset
C: ja
C: tilannepäivitystä
W: .
C: 2013
W: -
C: 02
W: -
C: 03
W: :
C: Tilastoja
C: vuodelta
C: 2012
C: ja
C: kehityssuunnitelmia
C: vuodelle
C: 2013
W: .
C: 2012
W: -
C: 08
W: -
C: 23
W: :
C: Voikko
W: for
W: Android
W: available
W: for
W: early
W: preview
W: .
C: 2012
W: -
C: 04
W: -
C: 25
W: :
C: Suomen
C: kielen
W: VFST
W: -
C: morfologian
C: kehitys
C: aloitettu
W: .
</pre>
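Output this long is easier to review in summary form. Here is a sketch with a hypothetical <code>spelling_summary</code> function. It uses awk to count the <code>C:</code> (correct) and <code>W:</code> (wrong) lines, then lists each distinct word flagged as wrong in order of first appearance.

```shell
# spelling_summary: hypothetical helper; reads voikkospell output on stdin
# ("C: word" = correct, "W: word" = wrong), prints both counts, then the
# distinct misspelled words in order of first appearance.
spelling_summary() {
    awk '
        /^C: / { correct++ }
        /^W: / {
            wrong++
            word = substr($0, 4)    # drop the "W: " prefix
            if (!(word in seen)) { seen[word] = 1; order[++n] = word }
        }
        END {
            printf "correct: %d\nwrong: %d\n", correct, wrong
            for (i = 1; i <= n; i++) print order[i]
        }
    '
}
```

In the pipeline above, it would be appended after <code>voikkospell --apertium-stream</code> as one more <code>| spelling_summary</code> stage.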
Revision as of 21:48, 17 December 2015
==Installation==

m5w/corevoikko, a fork of corevoikko, supports apertium stream format.

To clone it, execute the following command:

<pre>
git clone https://github.com/m5w/corevoikko.git corevoikko
</pre>

First, install libvoikko's dependencies. Next, execute the following commands:

<pre>
cd corevoikko/libvoikko
./configure
make
sudo make install
</pre>

If you do not have root privileges, or would like to choose where libvoikko is installed, execute the following commands instead (otherwise, installation is complete):

<pre>
cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install
</pre>

Finally, add <code>"$PREFIX/bin"</code> to your <code>"$PATH"</code> by appending the following lines to your <code>.profile</code>:

<pre>
PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
    export PATH="$PREFIX/bin:$PATH"
fi
</pre>
==Using voikkospell with apertium Stream Format==

Invoke voikkospell with <code>--apertium-stream</code>. voikkospell then expects apertium-stream-formatted input instead of a list of words.
===Words in apertium Stream Format===

apertium stream format encodes words as lexical units. Each lexical unit begins with a <code>^</code> and ends with a <code>$</code>:

<pre>
^ . . . $
</pre>

The word immediately follows the <code>^</code>:

<pre>
^word . . .$
</pre>

A <code>/</code> immediately follows the word. If the word is unknown, <code>*word</code> follows the <code>/</code>:

<pre>
^word/*word$
</pre>

Otherwise, all of the word's analyses follow, delimited by <code>/</code>:

<pre>
^word/word<n><sg>/word<vblex><inf>/word<vblex><pres>$
</pre>
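Test input in this format can be generated with <code>printf</code>. The hypothetical <code>lexical_unit</code> function below is a sketch, not part of any Apertium tool. It takes a word and zero or more analyses. With no analyses, it marks the word as unknown (<code>^word/*word$</code>); otherwise, it joins the analyses with <code>/</code>. It does not escape its arguments.

```shell
# lexical_unit: hypothetical helper that prints one lexical unit.
#   lexical_unit WORD               -> ^WORD/*WORD$   (unknown word)
#   lexical_unit WORD ANALYSIS...   -> ^WORD/ANALYSIS/...$
# Arguments are not escaped; any reserved characters in them must already
# be escaped by the caller.
lexical_unit() {
    word=$1
    shift
    if [ "$#" -eq 0 ]; then
        printf '^%s/*%s$' "$word" "$word"
    else
        printf '^%s' "$word"
        for analysis in "$@"; do
            printf '/%s' "$analysis"
        done
        printf '$'
    fi
}

# Usage:
#   lexical_unit word 'word<n><sg>' | voikkospell --apertium-stream
```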
===Escaping===

To use <code>^</code>, <code>$</code>, <code>/</code>, <code><</code>, and <code>></code> as characters, one must escape them. Each escape sequence begins with a <code>\</code>:

<pre>
\ . . .
</pre>

A single character follows the <code>\</code>, and voikkospell interprets that character literally. The character can be any wide character, including a newline.

To use <code>\</code> as a character, one must escape it as well, as <code>\\</code>.
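The escaping rule (a <code>\</code> followed by any character stands for that character) can be reversed with a single sed substitution, which is handy for reading escaped text back. Here is a sketch of a hypothetical <code>apertium_unescape</code> function. Because sed works line by line, it handles escapes within a line but not an escaped newline.

```shell
# apertium_unescape: hypothetical helper; replaces each "\c" with "c".
# sed scans left to right and never rescans replaced text, so "\\\^"
# becomes "\^" rather than "^". Escaped newlines are not handled.
apertium_unescape() {
    sed 's/\\\(.\)/\1/g'
}
```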
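===Superblanks===

One can also escape multiple characters that are not encoded in lexical units by encoding them as a superblank. Each superblank begins with a <code>[</code> and ends with a <code>]</code>:

<pre>
[ . . . ]
</pre>

Each <code>^</code>, <code>$</code>, <code>/</code>, <code><</code>, and <code>></code> between the <code>[</code> and the <code>]</code> is interpreted literally.

To use <code>[</code> and <code>]</code> as characters, one must escape them.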
===An HTML Example===

Let's spellcheck the following webpage:

<pre>
<!DOCTYPE html>
<html>
<head>
<title>An HTML Example</title>
</head>
<body>
<p>
This is an HTML example.
</p>
</body>
</html>
</pre>