Difference between revisions of "Voikkospell"
Line 97: | Line 97: | ||
To use <code>[</code> and <code>]</code> as characters, one must escape them. |
To use <code>[</code> and <code>]</code> as characters, one must escape them. |
||
− | === |
+ | ===An HTML Example=== |
+ | Let's spellcheck the following webpage: |
||
− | ====Trailing Newline==== |
||
− | <pre> |
||
− | $ echo '' | voikkospell --apertium-stream |
||
− | terminate called after throwing an instance of 'Apertium::ApertiumStream::Unexpe |
||
− | ctedReservedCharacter' |
||
− | what(): 1:1: unexpected '\n', '\n' expected to follow '[' |
||
− | |||
− | |||
− | ^ |
||
− | Aborted |
||
− | </pre> |
||
− | |||
− | For this reason, when piping text directly to <code>voikkospell --apertium-stream</code>, use <code>echo -n</code>. It is not necessary to do this when piping through tools such as <code>apertium-deshtml</code>, which encapsulate all newlines in superblanks. |
||
− | |||
− | One could also escape the newline: |
||
− | |||
− | <pre> |
||
− | $ echo '\' | voikkospell --apertium-stream |
||
− | </pre> |
||
− | |||
− | ====Unanalysed Word==== |
||
− | <pre> |
||
− | $ echo -n '^a/*a$' | voikkospell --apertium-stream |
||
− | W: a |
||
− | </pre> |
||
− | |||
− | ====Analysed Words==== |
||
− | =====One Tag===== |
||
− | <pre> |
||
− | $ echo -n '^b/b<A>$' | voikkospell --apertium-stream |
||
− | W: b |
||
− | </pre> |
||
− | |||
− | =====More Than One Tag===== |
||
− | <pre> |
||
− | $ echo -n '^c/c<B><C>$' | voikkospell --apertium-stream |
||
− | W: c |
||
− | </pre> |
||
− | |||
− | ====Ambiguous Word==== |
||
− | <pre> |
||
− | $ echo -n '^d/d<D>/d<E><F>$' | voikkospell --apertium-stream |
||
− | W: d |
||
− | </pre> |
||
− | |||
− | ====Multiwords==== |
||
− | =====One Word with Inner Inflection===== |
||
− | <pre> |
||
− | $ echo -n '^e f/e<G># f/e<H><I># f$' | voikkospell --apertium-stream |
||
− | W: e f |
||
− | </pre> |
||
− | |||
− | =====More Than One Word===== |
||
− | ======Without Inner Inflection====== |
||
− | <pre> |
||
− | $ echo -n 'gh/g<J>+h<K><L>/g<M>+h<N>$' | voikkospell --apertium-stream |
||
− | W: gh |
||
− | </pre> |
||
− | |||
− | ======With Inner Inflection====== |
||
− | <pre> |
||
− | $ echo -n '^i jk/i<O># j+k<P><Q>/i<R># j+k<S>$ ^lm n/l<T>+m<U># n/l<V>+m<W># n$' | \ |
||
− | voikkospell --apertium-stream |
||
− | W: i jk |
||
− | W: lm n |
||
− | </pre> |
||
− | |||
− | ====Reserved Characters==== |
||
− | <code>\</code>, <code>^</code>, <code>/</code>, <code><</code>, <code>></code>, and <code>$</code> are reserved. |
||
− | |||
− | =====\===== |
||
− | <pre> |
||
− | $ echo -n '\' | voikkospell --apertium-stream |
||
− | terminate called after throwing an instance of 'Apertium::ApertiumStream::Unexpe |
||
− | ctedEndOfFile' |
||
− | what(): 1:1: unexpected end-of-file following '\', end-of-file expected to fo |
||
− | llow ']' or '$' |
||
− | \ |
||
− | ^ |
||
− | Aborted |
||
− | </pre> |
||
− | |||
− | =====^===== |
||
− | <pre> |
||
− | $ echo -n '^' | voikkospell --apertium-stream |
||
− | terminate called after throwing an instance of 'Apertium::ApertiumStream::Unexpe |
||
− | ctedEndOfFile' |
||
− | what(): 1:2: unexpected end-of-file following '^', end-of-file expected to fo |
||
− | llow ']' or '$' |
||
− | ^ |
||
− | ^ |
||
− | Aborted |
||
− | </pre> |
||
− | |||
− | =====/===== |
||
− | <pre> |
||
− | $ echo -n '/' | voikkospell --apertium-stream |
||
− | terminate called after throwing an instance of 'Apertium::ApertiumStream::Unexpe |
||
− | ctedReservedCharacter' |
||
− | what(): 1:1: unexpected '/', '/' expected to follow '[', to follow '>' immedi |
||
− | ately, or to follow '^' or '#' not immediately |
||
− | / |
||
− | ^ |
||
− | Aborted |
||
− | </pre> |
||
− | |||
− | =====<===== |
||
− | <pre> |
||
− | $ echo -n '<' | voikkospell --apertium-stream |
||
− | terminate called after throwing an instance of 'Apertium::ApertiumStream::Unexpe |
||
− | ctedReservedCharacter' |
||
− | what(): 1:1: unexpected '<', '<' expected to follow '[', to follow '>' immedi |
||
− | ately, or to follow '/' or '+' not immediately |
||
− | < |
||
− | ^ |
||
− | Aborted |
||
− | </pre> |
||
− | |||
− | =====>===== |
||
− | <pre> |
||
− | $ echo -n '>' | voikkospell --apertium-stream |
||
− | terminate called after throwing an instance of 'Apertium::ApertiumStream::Unexpe |
||
− | ctedReservedCharacter' |
||
− | what(): 1:1: unexpected '>', '>' expected to follow '[' or to follow '<' not |
||
− | immediately |
||
− | > |
||
− | ^ |
||
− | Aborted |
||
− | </pre> |
||
− | |||
− | =====$===== |
||
− | <pre> |
||
− | $ echo -n '$' | voikkospell --apertium-stream |
||
− | terminate called after throwing an instance of 'Apertium::ApertiumStream::Unexpe |
||
− | ctedReservedCharacter' |
||
− | what(): 1:1: unexpected '$', '$' expected to follow '[', to follow '>' immedi |
||
− | ately, or to follow '*' or '#' not immediately |
||
− | $ |
||
− | ^ |
||
− | Aborted |
||
− | </pre> |
||
− | |||
− | =====Escape===== |
||
− | To avoid these errors, escape all reserved characters. |
||
− | |||
− | <pre> |
||
− | $ echo -n '\\\^\/\<\>\$' | voikkospell --apertium-stream |
||
− | </pre> |
||
− | |||
− | =====Superblank===== |
||
− | Alternatively, one can enclose reserved characters in superblanks. |
||
− | |||
− | <pre> |
||
− | $ echo -n '[^/<>$]' | voikkospell --apertium-stream |
||
− | </pre> |
||
− | |||
− | However, <code>\</code> must be escaped. |
||
− | |||
− | <pre> |
||
− | $ echo -n '[\]' | voikkospell --apertium-stream |
||
− | terminate called after throwing an instance of 'Apertium::ApertiumStream::Unexpe |
||
− | ctedEndOfFile' |
||
− | what(): 1:3: unexpected end-of-file following '[', end-of-file expected to fo |
||
− | llow ']' or '$' |
||
− | [\] |
||
− | ^ |
||
− | Aborted |
||
− | </pre> |
||
− | |||
− | ====Putting It All Together==== |
||
− | |||
− | Let's spellcheck a webpage! |
||
− | |||
− | voikkospell's webpage has a mixture of English and Finnish words, so we should get a good mixture of correct and incorrect spellings. |
||
− | |||
− | Since voikkospell only checks spelling, it doesn't matter which analyser we use. In this example, I use <code>apertium-en-ca</code>'s English analyser. |
||
<pre> |
<pre> |
||
+ | <!DOCTYPE html> |
||
− | $ curl -s http://voikko.puimula.org/ | apertium-deshtml | \ |
||
+ | <html> |
||
− | lt-proc ~/svn.code.sf.net/p/apertium/svn/trunk/apertium-en-ca/en-ca.automorf.bin | \ |
||
+ | <head> |
||
− | voikkospell --apertium-stream |
||
+ | <title>An HTML Example</title> |
||
− | W: . |
||
+ | </head> |
||
− | C: Voikko |
||
+ | <body> |
||
− | W: Free |
||
+ | <p> |
||
− | W: linguistic |
||
+ | This is an HTML example. |
||
− | W: software |
||
+ | </p> |
||
− | W: for |
||
+ | </body> |
||
− | W: Finnish |
||
+ | </html> |
||
− | W: . |
||
− | W: Free |
||
− | W: linguistic |
||
− | W: software |
||
− | W: and |
||
− | C: data |
||
− | W: for |
||
− | W: Finnish |
||
− | W: . |
||
− | C: Käyttäjät |
||
− | W: Users |
||
− | W: . |
||
− | C: Käytä |
||
− | C: Voikkoa |
||
− | C: verkossa |
||
− | W: . |
||
− | W: Use |
||
− | C: Voikko |
||
− | W: online |
||
− | W: . |
||
− | C: Lataa |
||
− | C: Voikon |
||
− | C: asennuspaketti |
||
− | W: . |
||
− | C: Käyttö |
||
− | C: sovellusohjelmissa |
||
− | W: . |
||
− | C: Käyttö |
||
− | C: Linux |
||
− | W: - |
||
− | C: jakeluissa |
||
− | W: . |
||
− | C: Kielityökalut |
||
− | C: LibreOfficessa |
||
− | W: . |
||
− | C: Usein |
||
− | C: kysyttyjä |
||
− | C: kysymyksiä |
||
− | W: . |
||
− | C: Yhteystiedot |
||
− | W: . |
||
− | W: Developers |
||
− | W: . |
||
− | W: Source |
||
− | W: code |
||
− | W: repositories |
||
− | W: . |
||
− | W: Development |
||
− | W: wiki |
||
− | W: . |
||
− | W: Using |
||
− | W: with |
||
− | W: Java |
||
− | W: . |
||
− | W: Contributors |
||
− | W: . |
||
− | W: Contributing |
||
− | W: . |
||
− | C: Joukahainen |
||
− | W: ( |
||
− | W: Finnish |
||
− | W: vocabulary |
||
− | W: ) |
||
− | W: . |
||
− | C: Ohjeita |
||
− | C: testaajille |
||
− | W: . |
||
− | W: Additional |
||
− | W: reading |
||
− | W: . |
||
− | C: Jakelijat |
||
− | W: Distributors |
||
− | W: . |
||
− | W: Source |
||
− | C: file |
||
− | W: releases |
||
− | W: . |
||
− | C: Release |
||
− | W: notes |
||
− | W: . |
||
− | W: Supported |
||
− | W: platforms |
||
− | W: . |
||
− | C: Linux |
||
− | W: . |
||
− | W: FreeBSD |
||
− | W: . |
||
− | W: Mac |
||
− | W: OS |
||
− | C: X |
||
− | W: . |
||
− | C: Windows |
||
− | W: . |
||
− | W: Architecture |
||
− | W: and |
||
− | W: history |
||
− | W: . |
||
− | W: Bugs |
||
− | W: and |
||
− | W: feature |
||
− | W: requests |
||
− | W: . |
||
− | W: Communication |
||
− | W: and |
||
− | W: contact |
||
− | W: information |
||
− | W: . |
||
− | C: Voikko |
||
− | W: is |
||
− | C: a |
||
− | W: spelling |
||
− | W: and |
||
− | W: grammar |
||
− | W: checker |
||
− | W: , |
||
− | W: hyphenator |
||
− | W: and |
||
− | W: collection |
||
− | W: of |
||
− | W: related |
||
− | W: linguistic |
||
− | C: data |
||
− | W: for |
||
− | W: Finnish |
||
− | W: language |
||
− | W: . |
||
− | W: Most of |
||
− | W: the |
||
− | W: material |
||
− | C: on |
||
− | W: this |
||
− | C: web |
||
− | W: site |
||
− | W: is |
||
− | W: in |
||
− | W: English |
||
− | W: . |
||
− | W: Pages |
||
− | W: written |
||
− | W: in |
||
− | W: Finnish |
||
− | W: contain |
||
− | W: information |
||
− | W: for |
||
− | W: end |
||
− | W: users |
||
− | W: who |
||
− | W: may |
||
− | W: not |
||
− | W: always |
||
− | W: understand |
||
− | W: English |
||
− | W: . |
||
− | W: . |
||
− | C: Tämä |
||
− | C: on |
||
− | C: Voikko |
||
− | W: - |
||
− | C: kielityökalujen |
||
− | C: kotisivu |
||
− | W: . |
||
− | C: Voikko |
||
− | C: on |
||
− | C: ohjelmisto |
||
− | C: suomen |
||
− | C: kielen |
||
− | C: oikeinkirjoituksen |
||
− | C: ja |
||
− | C: kieliopin |
||
− | C: tarkistamiseen |
||
− | W: , |
||
− | C: tavutukseen |
||
− | C: sekä |
||
− | C: sanojen |
||
− | C: analysointiin |
||
− | W: . |
||
− | C: Tämä |
||
− | C: sivusto |
||
− | C: on |
||
− | C: suurelta |
||
− | C: osin |
||
− | C: englanniksi |
||
− | W: , |
||
− | C: koska |
||
− | C: kaikki |
||
− | C: Voikon |
||
− | C: kanssa |
||
− | C: työskentelevät |
||
− | C: ohjelmistokehittäjät |
||
− | C: eivät |
||
− | C: osaa |
||
− | C: suomea |
||
− | W: . |
||
− | W: . |
||
− | C: Uutisia |
||
− | W: News |
||
− | W: . |
||
− | C: 2015 |
||
− | W: - |
||
− | C: 11 |
||
− | W: - |
||
− | C: 12 |
||
− | W: : |
||
− | W: Transitioning |
||
− | W: the |
||
− | W: Finnish |
||
− | W: dictionary |
||
− | W: from |
||
− | W: Malaga |
||
− | W: to |
||
− | W: VFST |
||
− | W: . |
||
− | C: 2014 |
||
− | W: - |
||
− | C: 01 |
||
− | W: - |
||
− | C: 26 |
||
− | W: : |
||
− | C: Tilastoja |
||
− | C: vuodelta |
||
− | C: 2013 |
||
− | C: ja |
||
− | C: kehityssuunnitelmia |
||
− | C: alkuvuodelle |
||
− | C: 2014 |
||
− | W: . |
||
− | C: 2013 |
||
− | W: - |
||
− | C: 10 |
||
− | W: - |
||
− | C: 07 |
||
− | W: : |
||
− | C: Käyttäjäkyselyn |
||
− | C: tulokset |
||
− | C: ja |
||
− | C: tilannepäivitystä |
||
− | W: . |
||
− | C: 2013 |
||
− | W: - |
||
− | C: 02 |
||
− | W: - |
||
− | C: 03 |
||
− | W: : |
||
− | C: Tilastoja |
||
− | C: vuodelta |
||
− | C: 2012 |
||
− | C: ja |
||
− | C: kehityssuunnitelmia |
||
− | C: vuodelle |
||
− | C: 2013 |
||
− | W: . |
||
− | C: 2012 |
||
− | W: - |
||
− | C: 08 |
||
− | W: - |
||
− | C: 23 |
||
− | W: : |
||
− | C: Voikko |
||
− | W: for |
||
− | W: Android |
||
− | W: available |
||
− | W: for |
||
− | W: early |
||
− | W: preview |
||
− | W: . |
||
− | C: 2012 |
||
− | W: - |
||
− | C: 04 |
||
− | W: - |
||
− | C: 25 |
||
− | W: : |
||
− | C: Suomen |
||
− | C: kielen |
||
− | W: VFST |
||
− | W: - |
||
− | C: morfologian |
||
− | C: kehitys |
||
− | C: aloitettu |
||
− | W: . |
||
</pre> |
</pre> |
||
Revision as of 21:48, 17 December 2015
Installation
m5w/corevoikko, a fork of corevoikko, supports apertium stream format.
To clone it, execute the following command:
git clone https://github.com/m5w/corevoikko.git corevoikko
First, install libvoikko's dependencies. Next, execute the following commands:
cd corevoikko/libvoikko
./configure
make
sudo make install
If you do not have root privileges, or would like to choose where libvoikko is installed, execute the following instead (otherwise, installation is complete):
cd corevoikko/libvoikko
PREFIX="$HOME/install/corevoikko" # e.g.
./configure --prefix="$PREFIX"
make
make install
Finally, add "$PREFIX/bin" to your "$PATH" by appending the following lines to your .profile:
PREFIX="$HOME/install/corevoikko" # e.g.
if [ -d "$PREFIX" ]; then
    export PATH="$PREFIX/bin:$PATH"
fi
Using voikkospell with apertium Stream Format
Invoke voikkospell with --apertium-stream. voikkospell then expects apertium-stream-formatted input instead of a list of words.
Words in apertium Stream Format
apertium stream format encodes words as lexical units. Each begins with a ^
^ . . .
and ends with a $.
^ . . . $
The word immediately follows the ^,
^word . . .$
and a / immediately follows the word. If the word is unknown, *word follows;
^word/*word$
otherwise, all the word's analyses follow, delimited by /'s.
^word/word<n><sg>/word<vblex><inf>/word<vblex><pres>$
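The layout described above can be sketched as a small shell function. This is only an illustration: lexical_unit is a hypothetical helper name, not part of voikkospell or apertium, and it assumes that the word and its analyses contain no reserved characters that would need escaping.

```shell
# lexical_unit SURFACE [ANALYSIS...]
# Hypothetical helper (not part of voikkospell or apertium).
# With no analyses, prints the unknown-word form ^SURFACE/*SURFACE$.
# Otherwise prints ^SURFACE/ANALYSIS/...$ with the analyses delimited by /.
# Assumes the arguments contain no reserved characters.
lexical_unit() {
    surface=$1
    shift
    if [ "$#" -eq 0 ]; then
        printf '^%s/*%s$\n' "$surface" "$surface"
        return
    fi
    unit="^$surface"
    for analysis in "$@"; do
        unit="$unit/$analysis"
    done
    printf '%s$\n' "$unit"
}
```

For example, lexical_unit word prints ^word/*word$, and lexical_unit word 'word<n><sg>' 'word<vblex><inf>' prints ^word/word<n><sg>/word<vblex><inf>$.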
Escaping
To use ^, $, /, <, and > as characters, one must escape them. Each escape sequence begins with a \,
\ . . .
and a character follows. voikkospell then interprets that character literally. Note that the character can be any wide character, including a newline.
To use \ itself as a character, one must escape it as well.
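As a rough sketch of the escaping rule, the function below prefixes each reserved character with a \ using sed. The name escape_reserved is hypothetical, and the character set is an assumption limited to the characters this page names (\, ^, $, /, <, >, [ and ]). It works line by line and does not escape newlines.

```shell
# escape_reserved TEXT
# Hypothetical helper: prints TEXT with a backslash inserted before each of
# the characters \ ^ $ / < > [ ] so that they are read literally.
# The sed bracket expression lists ] first so that it is taken literally;
# | is used as the s-command delimiter because / is one of the escaped characters.
escape_reserved() {
    printf '%s' "$1" | sed 's|[][\\^$/<>]|\\&|g'
}
```

For example, escape_reserved 'a/b' prints a\/b.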
Superblanks
One can also escape multiple characters that are not part of a lexical unit by enclosing them in a superblank. Each superblank begins with a [
[ . . .
and ends with a ].
[ . . . ]
Each ^, $, /, <, and > between the [ and the ] is interpreted literally.
To use [ and ] as characters, one must escape them.
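The following is a minimal sketch of wrapping arbitrary text, such as an HTML tag, in a superblank. The name superblank is hypothetical. The function escapes [ and ] as described above. It also escapes \, which is an assumption based on the escaping rules rather than something this section states for superblanks.

```shell
# superblank TEXT
# Hypothetical helper: prints TEXT enclosed in [ and ].
# Inside the superblank only [ ] and \ are escaped (the \ is an assumption);
# ^ $ / < > are left alone because they are read literally there.
superblank() {
    printf '[%s]\n' "$(printf '%s' "$1" | sed 's|[][\\]|\\&|g')"
}
```

For example, superblank '<p>' prints [<p>], and superblank 'a]b' prints [a\]b].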
An HTML Example
Let's spellcheck the following webpage:
<!DOCTYPE html>
<html>
<head>
<title>An HTML Example</title>
</head>
<body>
<p>
This is an HTML example.
</p>
</body>
</html>