User:Ranjan19
This guide shows how to compile a speller for an lttoolbox-based analyser and test it on some text. Morphologies created on the Apertium platform can be used directly as spellcheckers through libvoikko and the libreoffice-voikko extension, and that is what we do here. Apertium HFST transducers can be compiled into libraries that libvoikko uses for spell checking, including suggestions.
HFST-based languages such as apertium-kaz already do this, so we follow their lead and borrow heavily from their implementation. We first convert the lttoolbox binary to HFST and then generate a zhfst file (the package used as a spell-checker) for an lttoolbox-based language; Hindi is the sample language here.
As a prerequisite, install libreoffice-voikko on Ubuntu/Debian. This guide does most of the setup by compiling things manually.
You should be able to make do with just packages from apt-get, so try that first: see Using apertium spellers with libreoffice-voikko on Debian. If one of the packages isn't working, (parts of) this guide may be helpful.
Install prerequisites
First add the repository by following Prerequisites for Debian.
Then install speller-prerequisites:
sudo apt-get install libreoffice python3 git make sed findutils zip unzip pkg-config gettext \
    libxml++2.6-dev libarchive-dev zlib1g-dev automake autoconf libtool flex bison g++ libreadline-dev hfst
Manually compile other prerequisites
hfst-ospell
wget http://downloads.sourceforge.net/project/hfst/hfst/source/hfstospell-0.4.0.tar.gz
tar xvf hfstospell-0.4.0.tar.gz
cd hfstospell-0.4.0
./configure --enable-zhfst
make -j4
sudo make install
Libvoikko
sudo apt-get install hfst-ospell-dev
wget http://www.puimula.org/voikko-sources/libvoikko/libvoikko-4.0.tar.gz
tar xvf libvoikko-4.0.tar.gz
cd libvoikko-4.0
./autogen.sh
./configure --with-dictionary-path=$HOME/.voikko --enable-hfst
make -j4
sudo make install
You may also have to put
export LD_LIBRARY_PATH=/usr/local/lib
in your ~/.bashrc
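If you re-run the setup several times, it is easy to end up with the same export line repeated in ~/.bashrc. A minimal sketch of one way to avoid that is below; `add_line_once` is a hypothetical helper name introduced here, not part of any of the packages above.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
add_line_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage (not run here):
# add_line_once "$HOME/.bashrc" 'export LD_LIBRARY_PATH=/usr/local/lib'
```

Open a new shell (or source ~/.bashrc) afterwards so the variable takes effect.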
voikko-fi
wget http://www.puimula.org/voikko-sources/voikko-fi/voikko-fi-2.0.tar.gz
tar xvf voikko-fi-2.0.tar.gz
cd voikko-fi-2.0
PATH=/usr/local/voikko/bin:$PATH make vvfst
sudo make vvfst-install DESTDIR=/usr/local/voikkodict
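Before moving on, it can save time to confirm that the command-line tools used later in this guide are actually on your PATH. The sketch below is one way to do that; `missing_tools` is a hypothetical helper name, and the tool list is simply collected from the commands that appear in this guide.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tool names taken from the commands used later in this guide.
missing_tools lt-print hfst-txt2fst hfst-fst2fst hfst-ospell voikkospell zip
```

If nothing is printed, every tool was found; any name that is printed still needs to be installed or added to PATH.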
Install language modules
This is based on Minimal installation from SVN:
- Get the Kazakh language module. As mentioned earlier, we reuse parts of its speller implementation, so it does not need to be built.
svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-kaz/
- Get the Hindi language module and build it.
svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-hin/
cd apertium-hin
./autogen.sh --enable-ospell
make -j4
Converting lttoolbox binary into hfst
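lt-print hin.autogen.bin | hfst-txt2fst -e ε > hin.autogen.hfst
cd ..
Here `lt-print foo.autogen.bin` prints the transducer in AT&T text format (the content of a .att file), which hfst-txt2fst takes as input. The `-e ε` option tells hfst-txt2fst which symbol denotes epsilon.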
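To make the intermediate format more concrete, here is a rough sketch using a tiny hand-written file in AT&T style. It assumes four tab-separated columns per transition (source state, target state, input symbol, output symbol) and a shorter line for a final state; depending on the lttoolbox version, real `lt-print` output may carry an extra weight column. The toy transducer and the helper names `att_summary` and `att_output_side` are invented for illustration and do not come from apertium-hin.

```shell
# Count transitions (lines with at least four fields) and final states (shorter lines).
att_summary() {
    awk -F'\t' 'NF >= 4 { t++ } NF > 0 && NF < 4 { f++ }
                END { print "transitions:", t + 0, "final states:", f + 0 }' "$1"
}

# Concatenate the output-side symbols of all transitions, skipping epsilon.
att_output_side() {
    awk -F'\t' 'NF >= 4 && $4 != "ε" { printf "%s", $4 } END { print "" }' "$1"
}

# Toy transducer (invented): maps c a t <n> on the input side to c a t on the output side.
sample_att=$(mktemp)
printf '0\t1\tc\tc\n1\t2\ta\ta\n2\t3\tt\tt\n3\t4\t<n>\tε\n4\n' > "$sample_att"

att_summary "$sample_att"       # transitions: 4 final states: 1
att_output_side "$sample_att"   # cat

rm -f "$sample_att"
```

The output side is the part that `hfst-project --project=lower` keeps later in this guide, which is why the speller ends up accepting surface forms rather than analyses.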
Generating a zhfst package from the generated hfst file
The steps below replicate how apertium-kaz generates its zhfst file.
- Copy “dev/editdist.py” and “speller” from apertium-kaz to apertium-hin.
cp apertium-kaz/dev/editdist.py apertium-hin/dev/editdist.py
cp -r apertium-kaz/speller apertium-hin/speller
- Run the following commands in the Hindi language module. They are the zhfst build steps that run during `make` in apertium-kaz, with the file names changed to Hindi.
cd apertium-hin
cat hin.autogen.hfst | hfst-fst2fst -t | hfst-project --project=lower | hfst-minimise | hfst-fst2fst -f olw -o acceptor.default.hfst
grep -v -e "^#" -e "^$" speller/words.default.txt | hfst-strings2fst -j -o words.default.hfst
echo "?*;" | hfst-regexp2fst -S -o anystar.hfst
grep -v -e "^#" -e "^$" speller/strings.default.txt | hfst-strings2fst -j | hfst-concatenate anystar.hfst - \
    | hfst-concatenate - anystar.hfst -o strings.default.hfst
python dev/editdist.py -v -s -d 1 -e '@0@' -i speller/editdist.default.txt -a acceptor.default.hfst > editdist.default.hfst.txt
hfst-txt2fst -i editdist.default.hfst.txt -e '@0@' -o editdist.default.hfst
rm -f editdist.default.hfst.txt
hfst-disjunct strings.default.hfst editdist.default.hfst | hfst-minimise | hfst-repeat -f 1 -t 2 -o editstrings.default.hfst
hfst-disjunct words.default.hfst editstrings.default.hfst | hfst-fst2fst -f olw -o errmodel.default.hfst
rm -f hin.zhfst
zip -Z store -j hin.zhfst acceptor.default.hfst errmodel.default.hfst speller/index.xml
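Two of the commands above start with `grep -v -e "^#" -e "^$"`, which drops comment lines and blank lines before the remaining entries reach hfst-strings2fst. The sketch below shows that filter on its own; the sample file contents are invented placeholders rather than real entries from speller/words.default.txt, and `strip_comments` is a hypothetical wrapper name.

```shell
# Same filter as in the build steps: -v inverts the match, and the two -e
# patterns match lines starting with '#' and completely empty lines.
strip_comments() {
    grep -v -e "^#" -e "^$" "$1"
}

# Invented sample: two comments, one blank line, two placeholder entries.
sample=$(mktemp)
printf '# a comment\n\nentry-one\n# another comment\nentry-two\n' > "$sample"

strip_comments "$sample"
# entry-one
# entry-two

rm -f "$sample"
```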
If these commands succeed, the file hin.zhfst should now exist in the Hindi language module directory.
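Since a failed step in a long pipeline can leave an empty or missing file behind, a quick check of the expected outputs may help. This is only a sketch: `require_files` is a hypothetical helper, and it checks that files exist and are non-empty, not that their contents are valid transducers.

```shell
# Hypothetical helper: report any listed file that is missing or empty.
require_files() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return $status
}

# Usage (run inside apertium-hin after the build steps; not run here):
# require_files acceptor.default.hfst errmodel.default.hfst speller/index.xml hin.zhfst
```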
Using the zhfst file as a spellchecker and testing it from the command line
There are two ways to go about this.
- Using hfst-ospell
echo "कौशल" | hfst-ospell -S hin.zhfst
echo "आदमी" | hfst-ospell -S hin.zhfst
Both words are spelled correctly, so the expected output is “The word is in the lexicon”. This does not work at present: hfst-ospell instead reports “word not in the lexicon, cannot provide corrections”.
echo "अगामी" | hfst-ospell -S hin.zhfst
In this case the word is misspelled and needs only a trivial correction, so the expected output is a list of suggested corrections. This does not work at present either: hfst-ospell reports “word not in the lexicon, cannot provide corrections”.
- Using voikkospell
cd apertium-hin
cp hin.zhfst ~/.voikko/3/hin.zhfst
echo "अगामी" | tr ' ' '\n' | voikkospell -d hin.zhfst -s
In this case the word is misspelled and needs only a trivial correction, so the expected output is a list of suggested corrections.
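The `tr ' ' '\n'` step in the voikkospell command makes no difference for a single word, but it matters once you pass a whole sentence, because it puts each word on its own line. A small sketch of just that step, using the three test words from this guide (`split_words` is a hypothetical wrapper name):

```shell
# Turn space-separated text into one word per line, as the tr step above does.
split_words() {
    tr ' ' '\n'
}

echo "कौशल आदमी अगामी" | split_words
# कौशल
# आदमी
# अगामी
```

The output of `split_words` can then be piped into the same voikkospell command shown above.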