Difference between revisions of "Hfst"
Revision as of 08:54, 22 August 2011
HFST is the Helsinki Finite-State Toolkit. It is formalism-compatible with both lexc and twolc, so it relates to those tools roughly as foma relates to xfst. It is currently used in apertium-sme-nob and apertium-fin-sme.
The IRC channel is #hfst at irc.freenode.net (you may try irc://irc.freenode.net/#hfst if your browser supports it, or enter #hfst into http://webchat.freenode.net/ if you want a web client).
Prerequisites
- automake, autoconf, libtool
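Before running autogen.sh it can help to confirm that the build tools are on your PATH. The following is a minimal sketch, not part of HFST: the function name missing_tools is hypothetical, and it only checks whether a command with the given name can be found.

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example (run by hand): list the missing build prerequisites.
#   missing_tools automake autoconf libtool
```

On some distributions the libtool package installs only libtoolize, so a reported missing libtool is not necessarily a problem.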
HFST is a sort of meta-package with several backends. To do anything useful, you'll need at least one (preferably all) of:
Compiling HFST3
Subversion checkout
- "MacOS X note: you need XCode installed on your Mac. It came with your computer, and can be downloaded from Apple (registration required)"
$ svn co https://hfst.svn.sourceforge.net/svnroot/hfst/trunk hfst
$ cd hfst/hfst3/
$ sh autogen.sh
$ ./configure --prefix=/home/USERNAME/local/ # remove --prefix if you just want it in /usr/local
$ make
$ sudo make install
$ sudo ldconfig
Prepackaged tarball
Download the latest version from [1] and unpack it. Then follow the instructions in the README file, i.e.:
$ cd hfst-3.0/
$ sh autogen.sh
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
Troubleshooting
If, during the ./configure step, you see
checking for GNU libc compatible malloc... no
[…]
checking for GNU libc compatible realloc... no
and then, during make, a bunch of errors like
/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope
try the following:
$ sudo ldconfig
$ export LD_LIBRARY_PATH=/usr/local/lib
$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
and then run ./configure and make again.
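Note that the two export lines above overwrite any value those variables already had. If you rely on existing entries, a variant that prepends instead may be safer. This is only a sketch: the helper name prepend_path is hypothetical, and it assumes the usual colon-separated list format.

```shell
#!/bin/sh
# Prepend a directory to a colon-separated path list unless it is already there.
# $1 = directory to add, $2 = current value of the list (may be empty).
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;          # already present: keep as is
        *)        printf '%s\n' "$1${2:+:$2}" ;; # otherwise put it in front
    esac
}

LD_LIBRARY_PATH=$(prepend_path /usr/local/lib "${LD_LIBRARY_PATH:-}")
PKG_CONFIG_PATH=$(prepend_path /usr/local/lib/pkgconfig "${PKG_CONFIG_PATH:-}")
export LD_LIBRARY_PATH PKG_CONFIG_PATH
```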
For more advice on installation problems, have a look at the Hfst Readme page.
Compiling HFST2
Some regressions in HFST3 currently make it impossible to use with apertium-sme-nob (last tested: revision 1204).
Use revision 627 of HFST2:
$ svn co -r627 https://hfst.svn.sourceforge.net/svnroot/hfst/branches/hfst2
$ cd hfst2
$ autoreconf -i
$ ./configure --prefix=/home/fran/local/ # remove --prefix if you just want it in /usr/local
$ make
$ sudo make install
Using
$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst
$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup> orð orð+N+Neu+Sg+Nom+Indef
orð orð+N+Neu+Sg+Acc+Indef
orð orð+N+Neu+Pl+Nom+Indef
orð orð+N+Neu+Pl+Acc+Indef
lookup>
$
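If you only want the analyses from output like the above, a small filter can strip the prompt and the surface form. This is a sketch under assumptions: it takes each result line to be a surface form, whitespace, then the analysis, optionally preceded by the lookup> prompt. The function name analyses_only is hypothetical.

```shell
#!/bin/sh
# Reduce hfst-lookup style output to one analysis per line.
# Assumes lines of the form "<surface form> <analysis>", optionally
# prefixed with the "lookup> " prompt; lines with fewer than two
# fields (such as a bare prompt) are dropped.
analyses_only() {
    sed 's/^lookup> *//' | awk 'NF >= 2 { print $2 }'
}

# Example (run by hand):
#   echo "orð" | hfst-lookup ../bin/fao-morph.hfst | analyses_only
```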
To compile lexc code, first concatenate all the lexc files:
$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
      adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
      abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
      numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
      interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
To compile this, just use the hfst-lexc program:
$ hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin
To compile the twol rules, just use the hfst-twolc program:
$ hfst-twolc twol-fao.txt > twol-fao.bin
And then to compose the lexicon and rule file, use hfst-compose-intersect:
$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst
This will create a generator. If you want an analyser, you just need to invert the generator with hfst-invert:
$ hfst-invert fao-gen.hfst -o fao-morph.hfst
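The four steps above can be collected into one place. The sketch below only prints the commands for a given language code so that you can review them first. The function name hfst2_build_commands is hypothetical, and it assumes that lexc-all.txt and all intermediate files sit in the current directory, which simplifies the paths used above.

```shell
#!/bin/sh
# Print the HFST2 build commands (lexc, twolc, compose-intersect, invert)
# for one language code, e.g. "fao". Nothing is executed here.
# Assumption: every input and output file is in the current directory.
hfst2_build_commands() {
    lang=$1
    cat <<EOF
hfst-lexc < lexc-all.txt > lexc-${lang}.bin
hfst-twolc twol-${lang}.txt > twol-${lang}.bin
hfst-compose-intersect -l lexc-${lang}.bin twol-${lang}.bin -o ${lang}-gen.hfst
hfst-invert ${lang}-gen.hfst -o ${lang}-morph.hfst
EOF
}

# Example (run by hand): review the commands, then execute them,
# stopping at the first failure.
#   hfst2_build_commands fao
#   hfst2_build_commands fao | sh -e
```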
HFST2 vs HFST3
There have been some changes. Notably:
- In twol files, a / in alphabetic symbols has to be escaped, e.g. %+Der%/st instead of %+Der/st.
- In twol files, you can no longer have Sets on the left-hand side of a rule, so write Vx:Vy /<= _ ; where Vx in Set1 Vy in Set2 ; where you would previously have written Set1:Set2 /<= _ ;
- The old -r option to hfst-twolc is now uppercase: -R
- hfst-lookup-optimize is gone; use hfst-fst2fst -O -i infile.hfst -o outfile.hfst.ol instead.
- hfst-lexc needs the outfile option to come before the lexc (input) file, e.g. hfst-lexc -o outfile.hfst mylexicon.lexc
- hfst-compose-intersect uses -1 (number one) instead of -l (letter L), and -2 for the rule-file, e.g. hfst-compose-intersect -1 lexicon.hfst -2 rules.twol.hfst -o generator.hfst
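For build scripts that have to work with both versions, the hfst-compose-intersect difference can be isolated in one helper. This is a sketch using only the two argument forms described on this page. The function name compose_intersect_cmd is hypothetical, and it prints the command rather than running it.

```shell
#!/bin/sh
# Print the hfst-compose-intersect command line for HFST major version 2 or 3.
# $1 = version (2 or 3), $2 = lexicon, $3 = rule file, $4 = output file.
compose_intersect_cmd() {
    case $1 in
        2) printf 'hfst-compose-intersect -l %s %s -o %s\n' "$2" "$3" "$4" ;;
        3) printf 'hfst-compose-intersect -1 %s -2 %s -o %s\n' "$2" "$3" "$4" ;;
        *) printf 'unsupported HFST version: %s\n' "$1" >&2
           return 1 ;;
    esac
}

# Example (run by hand):
#   compose_intersect_cmd 3 lexicon.hfst rules.twol.hfst generator.hfst
```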