Difference between revisions of "Hfst"
Revision as of 14:37, 23 November 2016
hfst is the Helsinki finite-state toolkit. It is formalism-compatible with both lexc and twolc, so it is to those tools roughly what foma is to xfst. It is currently being used in apertium-sme-nob, apertium-fin-sme, apertium-kaz-tat and in a few other pairs which involve Turkic languages.
The IRC channel is #hfst at irc.freenode.net (you may try irc://irc.freenode.net/#hfst if your browser supports it, or enter #hfst into http://webchat.freenode.net/ if you want a web client). The HFST Wiki has some very good documentation (see especially the page HfstReadme when you run into compilation problems).
HFST is actually built as a set of wrappers over several possible back-ends: Foma, OpenFST, SFST, … The latest versions of HFST include the back-ends you need, so there's no reason to install any of these back-ends separately.
Building and installing HFST
See Installation: for most real operating systems you can now get pre-built packages of HFST (as well as other core tools) through your regular package manager.
If you wish to hack on the HFST C++ code itself (or you are on some system that doesn't have packages yet), you can follow this procedure:
Install prerequisites
You will need the regular build dependencies:
automake, autoconf, libtool, flex, bison, g++, libreadline-dev
If you've already installed apertium/lttoolbox these should be installed already; if not, they should be easily installable with your package manager, e.g.
- Ubuntu:
sudo apt-get install automake autoconf libtool flex bison g++ libreadline-dev
- Arch Linux:
sudo pacman -S base-devel
- MacOS X users should install the general Prerequisites_for_Mac_OS_X first, then
sudo port install bison readline
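Before moving on, it can help to confirm that the build tools are actually on your PATH. The following is only a minimal sketch: the helper name missing_tools is made up for this example, and it checks commands only, so it cannot tell you whether the readline development headers are present. libtool is left out on purpose because the name of its command differs between systems.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}

# Check the command-line build tools from the list above.
missing=$(missing_tools automake autoconf flex bison g++)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All checked build tools were found."
fi
```

Anything it reports as missing can then be installed with the package manager commands listed above.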
Download HFST
Either use the latest release (recommended for users), or go with the bleeding-edge SVN version (recommended for developers).
From SVN
Note: hfst has moved to GitHub (https://github.com/hfst/hfst.git), and downloading it from SourceForge (sf) will give you an outdated version.
$ svn co svn://svn.code.sf.net/p/hfst/code/trunk/hfst3 hfst3
$ cd hfst3/
$ ./autogen.sh
(The autogen step is only needed when using SVN, not with the tarball.)
Released tarball
Download the latest release, named something like hfst-X.Y.Z.tar.gz, from http://sourceforge.net/projects/hfst/files/, then
$ tar -xzf hfst-X.Y.Z.tar.gz
$ cd hfst-X.Y.Z/
(replacing X.Y.Z with the version you downloaded)
Configure
In the configure step, you can turn features and back-ends on or off. The OpenFST back-end is included in the HFST distribution; foma and SFST are not, and they are not recommended, since they typically cause more trouble than they are worth.
For most users, this should work:
$ ./configure --enable-proc --without-foma --enable-lexc --enable-all-tools
The above command will configure it to be installed to /usr/local in the make install step (below).
If you want hfst and back-ends installed somewhere else, you can do
$ ./configure --enable-proc --without-foma --enable-lexc --enable-all-tools --prefix=/home/USERNAME/local/
Note: USERNAME stands for your own username, which you need to substitute. If you don't know what it is, you can find out by typing whoami
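One way to avoid typing the username by hand is to let the shell expand $HOME, which on most Linux systems is /home/USERNAME. The sketch below only prints the configure command, so you can check it before running it. The function name hfst_configure_cmd is made up for this example; the flags are the ones shown above.

```shell
# Hypothetical helper: print the configure command for a given install prefix.
# Without an argument it falls back to /usr/local, the default prefix.
hfst_configure_cmd() {
    prefix=${1:-/usr/local}
    echo "./configure --enable-proc --without-foma --enable-lexc --enable-all-tools --prefix=$prefix"
}

# Assumption: $HOME points at your home directory, e.g. /home/USERNAME.
hfst_configure_cmd "$HOME/local"
```

Run the printed command from inside the HFST source directory.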
You can also add --with-unicode-handler=glib (or --with-unicode-handler=ICU) to the ./configure step if you have glib (or ICU) installed and want better Unicode case folding.
Compile and install
If your automake version is older than 1.14 (check with automake --version), first do:
$ scripts/generate-cc-files.sh
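The version check can also be scripted. This is a rough sketch, not part of HFST: the helper version_older_than is made up for this example. It assumes a sort that supports -V (GNU coreutils does) and that the version number is the last word on the first line of the automake --version output.

```shell
# Hypothetical helper: succeed if version $1 is strictly older than version $2.
# Assumes a sort with the -V (version sort) option.
version_older_than() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Only run the check when automake is installed.
if command -v automake >/dev/null 2>&1; then
    # Assumption: the version is the last field of the first output line.
    installed=$(automake --version | head -n 1 | awk '{print $NF}')
    if version_older_than "$installed" 1.14; then
        echo "automake $installed is older than 1.14: run scripts/generate-cc-files.sh before make"
    else
        echo "automake $installed: no extra step needed"
    fi
fi
```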
Build by running
$ make
Then install. Note: you need to use sudo make install if you are installing to /usr/local (that is, if you did not give a --prefix in the configure step); otherwise, no sudo!
$ make install
And finally, unless you have a Mac, you may need to do:
$ sudo ldconfig
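If you are unsure whether the install step needs sudo, a rough test is whether your user can write to the install prefix. The sketch below generalises the /usr/local rule above to a writability check, which is an assumption of this example rather than something the HFST build does. The function name install_command is made up.

```shell
# Hypothetical helper: print the install command suited to a given prefix.
# It walks up to the nearest existing directory and checks if it is writable.
install_command() {
    dir=$1
    while [ ! -d "$dir" ] && [ "$dir" != "/" ]; do
        dir=$(dirname "$dir")
    done
    if [ -w "$dir" ]; then
        echo "make install"
    else
        echo "sudo make install"
    fi
}

# The default prefix when no --prefix was given to ./configure.
install_command /usr/local
```

For a prefix under your home directory this should print plain make install.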
Troubleshooting
If, when running make with old autotools (pre-1.14?), you see
make[5]: *** No rule to make target `xre_parse.hh', needed by `xre_lex.ll'. Stop.
run scripts/generate-cc-files.sh and then make again.
If, during the ./configure step, you see
checking for GNU libc compatible malloc... no
[…]
checking for GNU libc compatible realloc... no
and then during make a bunch of errors like:
/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope
try the following:
sudo ldconfig
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
and then ./configure and make.
If, during make, you see errors like
xre_parse.cc:2293:24: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
try instead
make CXXFLAGS=-fpermissive
If, when compiling a dictionary, you end up in a "foma" prompt where you can type stuff, you should remove anything related to foma or "hfst-xfst" from your system, and build HFST anew as described above.
For more advice on installation problems, have a look at the Hfst Readme page.
See also Foma, OpenFST and SFST for problems regarding the back-ends.
Using
$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst
$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup> orð orð+N+Neu+Sg+Nom+Indef
orð orð+N+Neu+Sg+Acc+Indef
orð orð+N+Neu+Pl+Nom+Indef
orð orð+N+Neu+Pl+Acc+Indef
lookup>
$
To compile lexc code, first concatenate all the lexc files:
$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
    adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
    abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
    numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
    interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
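With this many input files, a mistyped name makes cat print an error but still write the remaining files to the output, which can go unnoticed. A small wrapper can refuse to write anything unless every input is readable. This is only a sketch and the function name concat_lexc is made up for this example.

```shell
# Hypothetical helper: concatenate lexc files into one output file,
# but only if every input file is readable. Argument order is preserved.
# Usage: concat_lexc OUTPUT INPUT1 [INPUT2 ...]
concat_lexc() {
    out=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || { echo "missing lexc file: $f" >&2; return 1; }
    done
    cat "$@" > "$out"
}
```

It would be called with the output file first, for example concat_lexc ../tmp/lexc-all.txt fao-lex.txt noun-fao-lex.txt, followed by the rest of the files in the same order as above.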
To compile this, just use the hfst-lexc program,
$ hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin
To compile the twol rules, just use the hfst-twolc program,
$ hfst-twolc twol-fao.txt > twol-fao.bin
And then to compose the lexicon and rule file, use hfst-compose-intersect:
$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst
This will create a generator. If you want an analyser, you just need to invert the generator with hfst-invert:
$ hfst-invert fao-gen.hfst -o fao-morph.hfst
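The four steps above can be chained so that a failure in one stops the rest. The sketch below uses exactly the command forms shown on this page. The function name build_transducers is made up, and it writes every intermediate file to a single output directory, which is an assumption of this example (the commands above mix ../bin/ and the current directory).

```shell
# Hypothetical helper: build a generator and an analyser.
# $1 = concatenated lexc source, $2 = twol rule file, $3 = output directory.
build_transducers() {
    lexc_src=$1
    twol_src=$2
    out_dir=$3
    mkdir -p "$out_dir" &&
    hfst-lexc < "$lexc_src" > "$out_dir/lexc-fao.bin" &&
    hfst-twolc "$twol_src" > "$out_dir/twol-fao.bin" &&
    hfst-compose-intersect -l "$out_dir/lexc-fao.bin" "$out_dir/twol-fao.bin" \
        -o "$out_dir/fao-gen.hfst" &&
    hfst-invert "$out_dir/fao-gen.hfst" -o "$out_dir/fao-morph.hfst"
}

# Only attempt the build when HFST is installed and both inputs exist.
if command -v hfst-lexc >/dev/null 2>&1 &&
   [ -r ../tmp/lexc-all.txt ] && [ -r twol-fao.txt ]; then
    build_transducers ../tmp/lexc-all.txt twol-fao.txt ../bin
fi
```

Because the steps are joined with &&, a failed lexc or twolc compilation stops the function before the compose and invert steps run on stale files.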