Hfst
HFST is the Helsinki Finite-State Technology toolkit. It is formalism-compatible with both lexc and twolc, much as foma is with xfst. It is currently used in apertium-sme-nob and apertium-fin-sme.
The IRC channel is #hfst at irc.freenode.net (you may try irc://irc.freenode.net/#hfst if your browser supports it, or enter #hfst into http://webchat.freenode.net/ if you want a web client). The HFST Wiki has some very good documentation (see especially the page HfstReadme when you run into compilation problems).
HFST is built as a set of wrappers around several possible back-ends (Foma, OpenFST, SFST, …). If you want to use HFST for anything serious, you'll need at least one of these back-ends installed. Fortunately, the latest release of HFST bundles the back-ends in the package and installs whichever back-ends you need along with HFST itself.
Installing the latest release
The pre-packaged tarball includes all the regular back-end dependencies, which makes it much simpler to install than fetching each back-end manually.
You will need the regular build dependencies (automake, autoconf, libtool, flex, bison, g++). If you've already installed apertium/lttoolbox, these should already be present; if not, they are easy to install with your package manager. Mac OS X users might need to install Xcode (registration required).
Download the latest version, named something like hfst-X.Y.Z.tar.gz, from [1], and unpack it. Then follow the instructions in the README file, typically:
$ tar -xzf hfst-X.Y.Z.tar.gz
$ cd hfst-X.Y.Z/
$ ./configure --with-foma --enable-lexc
$ make
$ sudo make install
$ sudo ldconfig
Note: you can add --with-unicode-handler=glib (or --with-unicode-handler=ICU) to the ./configure step if you have glib (or ICU) installed and want better Unicode case folding.
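If you are not sure which Unicode handler is available, you can script the choice. The sketch below is only an illustration. It assumes that pkg-config is installed and that the glib and ICU development files register the pkg-config modules glib-2.0 and icu-uc. The function name hfst_release_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print ./configure flags for the release tarball,
# adding a Unicode handler only when pkg-config can find glib or ICU.
hfst_release_flags() {
    flags="--with-foma --enable-lexc"
    if command -v pkg-config >/dev/null 2>&1; then
        if pkg-config --exists glib-2.0; then
            flags="$flags --with-unicode-handler=glib"
        elif pkg-config --exists icu-uc; then
            flags="$flags --with-unicode-handler=ICU"
        fi
    fi
    printf '%s\n' "$flags"
}
```

Use it as ./configure $(hfst_release_flags), leaving the substitution unquoted so that each flag becomes a separate argument. Without pkg-config, the function falls back to the plain flags from the README example.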
HFST from svn
Prerequisites
Required no matter what:
- automake, autoconf, libtool, flex, bison, g++, svn
On Mac OS X you may need to install Xcode (registration required).
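Before starting, you can check which of the tools above are missing. This is a minimal sketch, and missing_tools is a hypothetical helper. The executable names are assumptions: some distributions install libtoolize but no separate libtool binary, so you may prefer to check for libtoolize.

```shell
#!/bin/sh
# Print every named tool that is not found on PATH, one per line.
# Returns 0 if all tools were found, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_tools automake autoconf libtoolize flex bison g++ svn prints the tools you still need to install. It returns a non-zero status if any are missing.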
If you have never installed the pre-packaged tarball (see Installing the latest release above), you will have to install the prerequisite back-ends manually:
Required back-ends:
- OpenFST
  - YOU HAVE TO INSTALL THIS!
Semi-optional back-ends:
- Foma -- used for lexc and xfst (sequential rewrite rules)
  - remember to pass --enable-lexc --with-foma to ./configure to use this. IF YOU PLAN ON COMPILING ANY LEXC FILES, THIS IS BASICALLY MANDATORY.
Optional back-ends:
- SFST -- makes hfst-substitute a lot faster
  - remember to pass --with-sfst to ./configure to use this. DON'T INSTALL THIS UNLESS YOU REALLY NEED TO; IT MAKES EVERYTHING HARDER, NOT EASIER.
You can also use glib or ICU to handle Unicode operations (configure --with-unicode-handler={glib,ICU}).
Compiling HFST3 from svn
First, check out the code from the SVN repository:
$ svn co https://hfst.svn.sourceforge.net/svnroot/hfst/trunk/hfst3
Next, change into the checked-out directory:
$ cd hfst3/
Then run autoreconf -i:
$ autoreconf -i
Now it's time to configure the package. How you do this depends on which back-ends you installed earlier. If you've installed all of them, use
$ ./configure --enable-lexc --enable-calculate --with-foma --with-sfst --prefix=/home/USERNAME/local/
Note: replace USERNAME with your own username. If you don't know it, you can find out by typing whoami.
If you've installed only Foma and OpenFST, use
$ ./configure --enable-lexc --with-foma --prefix=/home/USERNAME/local/
and if you've installed only SFST and OpenFST, use
$ ./configure --with-sfst --enable-calculate --prefix=/home/USERNAME/local/
Note: if you want to install HFST in /usr/local (the default prefix), leave out the --prefix option at the end of the configure command.
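The three configure variants above differ only in their flags, so they can be captured in one helper. This sketch assumes you picked one of the three back-end combinations described above. The name hfst_configure_flags and its all/foma/sfst arguments are hypothetical; they are not HFST options.

```shell
#!/bin/sh
# Hypothetical helper mirroring the three configure commands above.
# $1: back-ends installed besides OpenFST: all, foma or sfst.
# $2: optional install prefix; omit it to use the default /usr/local.
# Assumes the prefix contains no spaces (the output is word-split).
hfst_configure_flags() {
    case "$1" in
        all)  flags="--enable-lexc --enable-calculate --with-foma --with-sfst" ;;
        foma) flags="--enable-lexc --with-foma" ;;
        sfst) flags="--with-sfst --enable-calculate" ;;
        *)    echo "unknown back-end set: $1" >&2; return 1 ;;
    esac
    if [ -n "$2" ]; then
        flags="$flags --prefix=$2"
    fi
    printf '%s\n' "$flags"
}
```

For instance, ./configure $(hfst_configure_flags foma "$HOME/local/") runs the Foma variant. On a typical Linux system $HOME is /home/USERNAME, so this matches the prefix used above.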
Now for the easier part: build the package by running
$ make
then install it (Note: put sudo in front of the command if you are installing into /usr/local; otherwise, no sudo!)
$ make install
and finally (this might not be necessary on your Mac)
$ sudo ldconfig
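If you installed into a prefix such as /home/USERNAME/local/, the HFST programs end up in its bin directory, which is normally not on your PATH. Other builds may also fail to find the HFST libraries and pkg-config files there. Below is a minimal sketch for your shell startup file (for example ~/.profile). It assumes the usual bin, lib and lib/pkgconfig layout below the prefix. The names prepend_path and use_hfst_prefix are hypothetical.

```shell
#!/bin/sh
# Prepend DIR to the colon-separated variable named VAR,
# unless DIR is already in it.
prepend_path() {
    var=$1 dir=$2
    eval "current=\${$var:-}"
    case ":$current:" in
        *":$dir:"*) ;;  # already present: leave the variable unchanged
        *)
            if [ -n "$current" ]; then
                eval "$var=\$dir:\$current"
            else
                eval "$var=\$dir"
            fi
            export "$var"
            ;;
    esac
}

# Make programs, shared libraries and pkg-config files under PREFIX visible.
use_hfst_prefix() {
    prefix=${1%/}  # drop a trailing slash, e.g. /home/me/local/ -> /home/me/local
    prepend_path PATH "$prefix/bin"
    prepend_path LD_LIBRARY_PATH "$prefix/lib"
    prepend_path PKG_CONFIG_PATH "$prefix/lib/pkgconfig"
}
```

For example, use_hfst_prefix "$HOME/local" makes hfst-lexc and the other tools available in new shells. Unlike a plain export VAR=dir, prepend_path keeps any existing entries and never adds the same directory twice.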
Troubleshooting
If, during the ./configure step, you see
checking for GNU libc compatible malloc... no
[…]
checking for GNU libc compatible realloc... no
and then, during make, a bunch of errors like
/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope
try the following:
$ sudo ldconfig
$ export LD_LIBRARY_PATH=/usr/local/lib
$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
and then run ./configure and make again.
If, during make, you see errors like
xre_parse.cc:2293:24: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
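try instead
$ make CXXFLAGS=-fpermissive
Note that setting CXXFLAGS on the make command line replaces the flags chosen by ./configure (usually -g -O2). Use make CXXFLAGS="-g -O2 -fpermissive" if you want to keep them.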
For more advice on installation problems, have a look at the HfstReadme page.
Using
$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst
$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup> orð orð+N+Neu+Sg+Nom+Indef
orð orð+N+Neu+Sg+Acc+Indef
orð orð+N+Neu+Pl+Nom+Indef
orð orð+N+Neu+Pl+Acc+Indef
lookup> $
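When you call hfst-lookup from a script, you usually want only the analyses. The sketch below rests on two assumptions about the output format, not guarantees stated on this page:
- output lines are tab-separated (input, analysis, and in HFST3 a weight), even though the transcript above shows the columns as spaces;
- unknown words get an analysis ending in +?.

The name hfst_analyses is hypothetical. Because the function keeps only the second column, a lookup> prompt in front of the first result does not affect it.

```shell
#!/bin/sh
# Keep only the analysis column of hfst-lookup output read from stdin.
# Assumes tab-separated columns: input, analysis, optional weight.
# Lines with fewer than two columns (blank separators, bare prompts) are
# skipped, as are analyses ending in "+?" (assumed to mark unknown words).
hfst_analyses() {
    awk 'BEGIN { FS = "\t" } NF >= 2 && $2 !~ /[+][?]$/ { print $2 }'
}
```

With these assumptions, echo "orð" | hfst-lookup ../bin/fao-morph.hfst | hfst_analyses should print the four analyses from the transcript above, one per line.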
To compile lexc code, first concatenate all the lexc files:
$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
    adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
    abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
    numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
    interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
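If one file name in the cat command above is mistyped, cat only prints an error message and still writes ../tmp/lexc-all.txt without that lexicon. A minimal sketch that refuses to write the output when any input is missing follows; concat_lexc is a hypothetical name.

```shell
#!/bin/sh
# concat_lexc OUTFILE INFILE...: concatenate the lexc sources in the given
# order, but do not create OUTFILE if any input file is missing.
concat_lexc() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "concat_lexc: missing input file: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}
```

Call it with the output file first, followed by the same file list as in the cat command, e.g. concat_lexc ../tmp/lexc-all.txt fao-lex.txt noun-fao-lex.txt and so on.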
To compile this, just use the hfst-lexc program:
$ hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin
To compile the twol rules, just use the hfst-twolc program:
$ hfst-twolc twol-fao.txt > twol-fao.bin
Then, to compose the lexicon with the rule file, use hfst-compose-intersect:
$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst
This creates a generator. If you want an analyser, just invert the generator with hfst-invert:
$ hfst-invert fao-gen.hfst -o fao-morph.hfst
Note that these commands use the HFST2 options; see HFST2 vs HFST3 below for what has changed.
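The steps above can be collected into one script. This is a minimal sketch, not the project's Makefile.hfst. It uses the HFST3 option order described in HFST2 vs HFST3 below (hfst-lexc with -o before the input, hfst-compose-intersect with -1/-2). It also adds the hfst-fst2fst -O step from that section to produce an optimized-lookup analyser. Several parts are assumptions:
- the build_fao name and the RUN dry-run hook are hypothetical;
- passing -i/-o to hfst-twolc is based on the input/output options of the other HFST3 tools;
- all intermediate files are kept in the current directory, with .hfst instead of .bin.

```shell
#!/bin/sh
# build_fao: compile the lexicon and the rules, then build the generator,
# the analyser and an optimized-lookup analyser (HFST3-style options).
# Set RUN=echo to print the commands instead of running them (a dry run).
build_fao() {
    run=${RUN:-}
    # Stop at the first failing step so later steps never use stale files.
    $run hfst-lexc -o lexc-fao.hfst ../tmp/lexc-all.txt || return 1
    $run hfst-twolc -i twol-fao.txt -o twol-fao.hfst || return 1
    $run hfst-compose-intersect -1 lexc-fao.hfst -2 twol-fao.hfst \
        -o fao-gen.hfst || return 1
    $run hfst-invert fao-gen.hfst -o fao-morph.hfst || return 1
    $run hfst-fst2fst -O -i fao-morph.hfst -o fao-morph.hfst.ol
}
```

Run it from fao/src after the concatenation step. RUN=echo build_fao shows the commands first; build_fao then runs them.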
HFST2 vs HFST3
There have been some changes. Notably:
- In twol files, a / in alphabetic symbols has to be escaped, e.g. %+Der%/st instead of %+Der/st.
- In twol files, you can no longer have sets on the left-hand side of a rule, so write
  Vx:Vy /<= _ ; where Vx in Set1 Vy in Set2 ;
  where you would previously have written
  Set1:Set2 /<= _ ;
- The old -r option to hfst-twolc is now uppercase: -R
- hfst-lookup-optimize is gone; use instead
  hfst-fst2fst -O -i infile.hfst -o outfile.hfst.ol
- hfst-lexc needs the output option to come before the lexc input file, e.g.
  hfst-lexc -o outfile.hfst mylexicon.lexc
- hfst-compose-intersect uses -1 (number one) instead of -l (letter L) for the lexicon, and -2 for the rule file, e.g.
  hfst-compose-intersect -1 lexicon.hfst -2 rules.twol.hfst -o generator.hfst