HFST is the Helsinki Finite-State Toolkit. It is formalism-compatible with both lexc and twolc, so it relates to those tools roughly as foma does to xfst. It is currently used in apertium-sme-nob, apertium-fin-sme, apertium-kaz-tat and a few other pairs that involve Turkic languages.
The IRC channel is #hfst on irc.freenode.net (you may try irc://irc.freenode.net/#hfst if your browser supports it, or enter #hfst into http://webchat.freenode.net/ if you want a web client). The HFST Wiki has some very good documentation (see especially the page HfstReadme when you run into compilation problems).
HFST is built as a set of wrappers over several possible back-ends (Foma, OpenFST, SFST, …). If you want to use HFST for anything serious, you'll need at least one of these back-ends installed. Fortunately, the latest versions of HFST bundle the back-ends and will install whichever one you need along with HFST itself.
Building and installing HFST
You will need the regular build dependencies:
automake, autoconf, libtool, flex, bison, g++, libreadline-dev
If you've already installed apertium/lttoolbox these should be installed already; if not, they should be easily installable with your package manager, e.g.
sudo apt-get install automake autoconf libtool flex bison g++ libreadline-dev
- Arch Linux:
sudo pacman -S base-devel
MacOS X users might need to install XCode (registration required).
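Before configuring, it can save time to confirm that the build tools listed above are actually on your PATH. The following is a minimal sketch, not part of HFST: check_build_tools is a hypothetical helper name, and checking for libtoolize (rather than libtool) assumes the usual layout of the libtool package on Linux distributions.

```shell
# check_build_tools: hypothetical helper, not shipped with HFST.
# Prints one line per tool and returns non-zero if any tool is missing.
check_build_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# libreadline-dev is a library package, not a command, so it is not checked here.
# Assumption: the libtool package provides the libtoolize command.
check_build_tools automake autoconf libtoolize flex bison g++ \
    || echo "Install the missing tools with your package manager before building."
```

Anything reported as missing should be installable with the package-manager commands shown above.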
Either use the latest release (recommended for users), or go with the bleeding-edge SVN version (recommended for developers).
$ svn co svn://svn.code.sf.net/p/hfst/code/trunk/hfst3 hfst3
$ cd hfst3/
$ autoreconf -i
(The autoreconf step is only needed when using SVN, not with the tarball.)
Download the latest release, named something like hfst-X.Y.Z.tar.gz, from http://sourceforge.net/projects/hfst/files/, then
$ tar -xzf hfst-X.Y.Z.tar.gz
$ cd hfst-X.Y.Z/
(replacing X.Y.Z with the version you downloaded)
In the configure step, you'll need to decide which back-ends you want to include.
- Included by default.
- Foma – used for lexc and xfst (sequential rewrite rules)
Add --enable-lexc --with-foma to ./configure to use this
- IF YOU PLAN ON COMPILING ANY LEXC FILES, THIS IS BASICALLY MANDATORY
For most users, this is enough:
$ ./configure --with-foma --enable-lexc
The above command configures HFST to be installed to /usr/local in the make install step (below).
If you want hfst and back-ends installed somewhere else, you can do
$ ./configure --with-foma --enable-lexc --prefix=/home/USERNAME/local/
Note: USERNAME stands for your own username, so replace it accordingly. If you don't know what it is, you can find out by typing
You can also add --with-unicode-handler=glib (or --with-unicode-handler=ICU) to the ./configure step if you have glib (or ICU) installed and want better Unicode case folding.
Compile and install
Build the package by running
$ make
Then you need to install. (Note: use sudo make install if you configured the install location as /usr/local, or did not give a --prefix in the configure step; otherwise, no sudo!)
$ make install
And finally, unless you have a Mac, you may need to do:
$ sudo ldconfig
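If you configured HFST with a custom --prefix, the shell and the dynamic linker may not find the installed programs and libraries on their own. The following is a minimal sketch for Linux, assuming the $HOME/local prefix from the configure example above and the conventional bin/ and lib/ directories under it; both are assumptions, so adjust them to whatever you passed to ./configure.

```shell
# Assumed install prefix; change this to the --prefix given to ./configure.
PREFIX="$HOME/local"

# Make the installed programs, shared libraries and pkg-config files discoverable.
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PKG_CONFIG_PATH="$PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

# Report whether the HFST tools used later on this page can now be found.
for tool in hfst-lexc hfst-twolc hfst-compose-intersect hfst-invert hfst-lookup; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "ok: $tool"
    else
        echo "not found: $tool"
    fi
done
```

To make the settings permanent, the three export lines can go into your shell start-up file.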
If, during the ./configure step, you see
checking for GNU libc compatible malloc... no
[…]
checking for GNU libc compatible realloc... no
and then during make a bunch of errors like:
/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope
, try the following:
sudo ldconfig
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
and then ./configure and make.
If, during make, you see errors like
xre_parse.cc:2293:24: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
For more advice on installation problems, have a look at the Hfst Readme page.
$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst
$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup> orð orð+N+Neu+Sg+Nom+Indef
orð orð+N+Neu+Sg+Acc+Indef
orð orð+N+Neu+Pl+Nom+Indef
orð orð+N+Neu+Pl+Acc+Indef
lookup> $
To compile the lexc code, first concatenate all the lexc files:
$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
    adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
    abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
    numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
    interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
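With this many input files, a mistyped name makes cat print an error but still write an incomplete output file. As a sketch only, the concatenation could be wrapped in a small function that checks every input first; concat_lexc is a hypothetical helper name, not an HFST tool.

```shell
# concat_lexc OUT IN...: hypothetical helper, not part of HFST.
# Refuses to write OUT unless every input file is readable, then
# concatenates the inputs in the order given.
concat_lexc() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing lexc file: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}

# Usage (same file order as in the cat command above):
#   concat_lexc ../tmp/lexc-all.txt fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt
```

The function keeps the argument order, so the files end up in the output in the same sequence as with plain cat.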
To compile the concatenated file, use hfst-lexc:
$ hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin
To compile the twol rules, use hfst-twolc:
$ hfst-twolc twol-fao.txt > twol-fao.bin
And then to compose the lexicon and rule file, use
$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst
This will create a generator. If you want an analyser, you just need to invert the generator with
$ hfst-invert fao-gen.hfst -o fao-morph.hfst
HFST2 vs HFST3
There have been some changes. Notably:
- In twol files, a / in alphabetic symbols has to be escaped, e.g.
- In twol files, you can no longer have Sets on the left-hand side of a rule, so write
Vx:Vy /<= _ ; where Vx in Set1 Vy in Set2 ;
where you previously would have written
Set1:Set2 /<= _ ;
- The old -r option to hfst-twolc is now uppercase: -R
- hfst-lookup-optimize is gone; use this instead:
hfst-fst2fst -O -i infile.hfst -o outfile.hfst.ol
- hfst-lexc needs the outfile option to be before the lexc (input), e.g.
hfst-lexc -o outfile.hfst mylexicon.lexc
- hfst-compose-intersect uses -1 (number one) instead of -l (letter L), and -2 for the rule-file. E.g.
hfst-compose-intersect -1 lexicon.hfst -2 rules.twol.hfst -o generator.hfst
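Putting the HFST3 changes together, the earlier lexicon-plus-rules build could look roughly like the sequence printed by the sketch below. It only prints the commands, so it is safe to run without HFST installed. print_hfst3_build is a hypothetical helper, the intermediate file names (lexicon.hfst, rules.twol.hfst, generator.hfst, analyser.hfst) are placeholders, and the flags are the ones quoted on this page rather than a checked reference for any particular HFST version.

```shell
# print_hfst3_build LEXC TWOL: hypothetical helper, not part of HFST.
# Prints an HFST3-style command sequence for building a generator,
# an analyser and an optimised-lookup analyser. File names are placeholders
# and must not contain spaces, since they are pasted into the text verbatim.
print_hfst3_build() {
    lexc=$1   # concatenated lexc source
    twol=$2   # twol rule source
    cat <<EOF
hfst-lexc -o lexicon.hfst $lexc
hfst-twolc $twol > rules.twol.hfst
hfst-compose-intersect -1 lexicon.hfst -2 rules.twol.hfst -o generator.hfst
hfst-invert generator.hfst -o analyser.hfst
hfst-fst2fst -O -i analyser.hfst -o analyser.hfst.ol
EOF
}

print_hfst3_build lexc-all.txt twol-fao.txt
```

Once the printed commands look right for your setup, the output can be piped to sh -e so that the build stops at the first failing step.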