Difference between revisions of "Hfst"
Nathan0n5ire (talk | contribs) |
Revision as of 21:41, 24 November 2011
HFST is the Helsinki Finite-State Toolkit. It is formalism-compatible with both lexc and twolc, much as foma is compatible with xfst. It is currently used in apertium-sme-nob and apertium-fin-sme.
The IRC channel is #hfst at irc.freenode.net (you may try irc://irc.freenode.net/#hfst if your browser supports it, or enter #hfst into http://webchat.freenode.net/ if you want a web client). The HFST Wiki has some very good documentation (see especially the page HfstReadme when you run into compilation problems).
Prerequisites
Required:
- automake, autoconf, libtool
- OpenFST
Semi-Optional Backends:
- Foma -- used for lexc and xfst (sequential rewrite rules)
  - remember to pass --enable-lexc --with-foma to ./configure to use this
  - if you plan on compiling any lexc files, this is basically mandatory
Optional Backends:
- SFST -- makes hfst-substitute a lot faster
  - remember to pass --with-sfst to ./configure to use this
You can also use glib or ICU to handle Unicode operations (configure --with-unicode-handler={glib,ICU}).
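Before going further, it can help to confirm that the build tools listed above are on your PATH. The following is only a minimal sketch and not part of HFST: the function name missing_tools is made up for illustration, and the command names may differ on your system (some distributions provide libtoolize without a libtool command).

```shell
#!/bin/sh
# Hypothetical helper (not part of HFST): print every named command
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Build tools named in the prerequisites; adjust the names for your system
# (libtool is sometimes only available as libtoolize).
missing=$(missing_tools automake autoconf libtool)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```

The check only looks for commands; it says nothing about whether OpenFST, Foma or SFST are installed, since those are libraries.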
Compiling HFST3
Subversion checkout
- Mac OS X note: you need Xcode installed on your Mac. It came with your computer, and can also be downloaded from Apple (registration required).
First, check out the code from the Subversion repository:
$ svn co https://hfst.svn.sourceforge.net/svnroot/hfst/trunk/hfst3
Next, change into the downloaded directory:
$ cd hfst3/
And then run autoreconf -i
$ autoreconf -i
Now configure the package. The options depend on which backends you installed earlier. If you've installed all of them, use
$ ./configure --enable-lexc --with-foma --with-sfst --prefix=/home/USERNAME/local/
If you've only installed foma and openfst use
$ ./configure --enable-lexc --with-foma --prefix=/home/USERNAME/local/
and if you've only installed sfst and openfst use
$ ./configure --with-sfst --prefix=/home/USERNAME/local/
Note: If you want to install hfst in /usr/local, drop the --prefix option from the end of the configure command.
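The three configure variants above differ only in which backend flags are present. As a rough sketch under that reading, the flags can be assembled from what you have installed. The helper name hfst_configure_args and its yes/no arguments are invented for this example; only the flags themselves come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of HFST): assemble ./configure arguments
# from the backends you installed. Arguments: have_foma have_sfst [prefix]
hfst_configure_args() {
    have_foma=$1
    have_sfst=$2
    prefix=${3:-}
    args=""
    [ "$have_foma" = yes ] && args="$args --enable-lexc --with-foma"
    [ "$have_sfst" = yes ] && args="$args --with-sfst"
    # An empty prefix means "use the default", i.e. /usr/local.
    [ -n "$prefix" ] && args="$args --prefix=$prefix"
    # Strip the leading space left by the concatenation above.
    printf '%s\n' "${args# }"
}

# Print the command instead of running it, so it can be checked first.
echo "./configure $(hfst_configure_args yes yes "$HOME/local")"
```

Printing the command rather than executing it keeps the sketch harmless; copy the printed line into the hfst3 directory once it looks right.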
Now for the easier part: build the package by running
$ make
Then install it (note: put sudo in front of the command if you are installing into /usr/local)
$ make install
and finally (this might not be necessary on your Mac)
$ sudo ldconfig
Prepackaged tarball
Download the latest version from [1], and unzip. Then follow the instructions in the README file, i.e.:
$ cd hfst-3.0/
$ sh autogen.sh
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
Troubleshooting
If, during the ./configure step, you see
checking for GNU libc compatible malloc... no
[…]
checking for GNU libc compatible realloc... no
and then during make a bunch of errors like:
/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope
try the following:
$ sudo ldconfig
$ export LD_LIBRARY_PATH=/usr/local/lib
$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
and then rerun ./configure and make.
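Note that the export lines above replace whatever LD_LIBRARY_PATH and PKG_CONFIG_PATH held before. If you already rely on other entries in those variables, a gentler variant is to prepend instead. The sketch below shows one way to do that; prepend_path is a made-up helper name, not something HFST provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of HFST): prepend a directory to a
# colon-separated list, leaving an existing value in place.
prepend_path() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;       # already listed
        "::")       printf '%s\n' "$dir" ;;           # list was empty
        *)          printf '%s\n' "$dir:$current" ;;  # put new entry first
    esac
}

LD_LIBRARY_PATH=$(prepend_path /usr/local/lib "${LD_LIBRARY_PATH:-}")
PKG_CONFIG_PATH=$(prepend_path /usr/local/lib/pkgconfig "${PKG_CONFIG_PATH:-}")
export LD_LIBRARY_PATH PKG_CONFIG_PATH
```

The variables only change in the shell where this runs, so run ./configure and make from that same shell.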
If, during make, you see errors like
xre_parse.cc:2293:24: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]
try instead
make CXXFLAGS=-fpermissive
For more advice on installation problems, have a look at the HfstReadme page.
Using
$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst
$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup> orð orð+N+Neu+Sg+Nom+Indef
orð orð+N+Neu+Sg+Acc+Indef
orð orð+N+Neu+Pl+Nom+Indef
orð orð+N+Neu+Pl+Acc+Indef
lookup> $
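If you only want the analyses from a lookup session like this one, the output can be filtered with standard tools. This is a sketch under the assumption that each result line has the form "input analysis" separated by whitespace, as in the sample; your hfst-lookup version may print extra columns or a different prompt. The function name analyses_only is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: read hfst-lookup style output on stdin and print the
# analysis column. Assumes each result line is "<form> <analysis>", as in
# the sample session; "lookup>" prompts and empty lines are dropped.
analyses_only() {
    sed -e 's/^lookup> *//' | awk 'NF >= 2 { print $2 }'
}

# Sample input copied from the session, so no HFST install is needed here.
analyses_only <<'EOF'
lookup> orð orð+N+Neu+Sg+Nom+Indef
orð orð+N+Neu+Sg+Acc+Indef
lookup> 
EOF
```

In real use you would pipe the lookup command into the function, e.g. echo "orð" | hfst-lookup ../bin/fao-morph.hfst | analyses_only.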
To compile lexc code, first concatenate all the lexc files:
$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
      adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
      abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
      numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
      interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
To compile this, just use the hfst-lexc program:
hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin
To compile the twol rules, just use the hfst-twolc program:
$ hfst-twolc twol-fao.txt > twol-fao.bin
And then to compose the lexicon and rule file, use hfst-compose-intersect:
$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst
This will create a generator. If you want an analyser, you just need to invert the generator with hfst-invert:
$ hfst-invert fao-gen.hfst -o fao-morph.hfst
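The four steps (lexc, twolc, compose-intersect, invert) can be collected into one place. The sketch below only prints the commands for a given language code, so it is safe to run without HFST installed. The wrapper name print_build_steps is made up, the file names follow the pattern of the commands above with the lexicon written to the current directory (an assumption, since the lexc step above writes to ../bin/), and the option spellings are copied as-is from this section, so check them against the HFST2 vs HFST3 notes for your version.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of HFST): print the four build steps for a
# language code. It only echoes the commands, so nothing is compiled;
# review the output and run the lines yourself.
print_build_steps() {
    lang=$1
    echo "hfst-lexc < ../tmp/lexc-all.txt > lexc-$lang.bin"
    echo "hfst-twolc twol-$lang.txt > twol-$lang.bin"
    echo "hfst-compose-intersect -l lexc-$lang.bin twol-$lang.bin -o $lang-gen.hfst"
    echo "hfst-invert $lang-gen.hfst -o $lang-morph.hfst"
}

print_build_steps fao
```

Keeping the steps in one function makes the order explicit: the generator has to exist before it can be inverted into an analyser.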
HFST2 vs HFST3
There have been some changes. Notably:
- In twol files, a / in alphabetic symbols has to be escaped, e.g. %+Der%/st instead of %+Der/st.
- In twol files, you can no longer have Sets on the left-hand side of a rule, so write Vx:Vy /<= _ ; where Vx in Set1 Vy in Set2 ; where you would previously have written Set1:Set2 /<= _ ;
- The old -r option to hfst-twolc is now uppercase: -R
- hfst-lookup-optimize is gone; use hfst-fst2fst -O -i infile.hfst -o outfile.hfst.ol instead
- hfst-lexc needs the outfile option to be before the lexc (input) file, e.g. hfst-lexc -o outfile.hfst mylexicon.lexc
- hfst-compose-intersect uses -1 (number one) instead of -l (letter L), and -2 for the rule file, e.g. hfst-compose-intersect -1 lexicon.hfst -2 rules.twol.hfst -o generator.hfst
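For the first change in the list, the slash escaping can be done mechanically on a single symbol. The following is a sketch only: escape_twol_slash is an invented name, and it should be applied to individual alphabet symbols rather than whole twol files, because rule operators such as /<= contain a slash that must not be escaped. It also does not try to handle a literal %% directly before a slash.

```shell
#!/bin/sh
# Hypothetical helper: escape every "/" in a single twol alphabet symbol
# with "%", without double-escaping slashes that are already written "%/".
# Use it on individual symbols only, never on whole rule lines.
escape_twol_slash() {
    # First undo existing "%/" escapes, then escape every slash once.
    printf '%s\n' "$1" | sed -e 's,%/,/,g' -e 's,/,%/,g'
}

escape_twol_slash '%+Der/st'    # prints %+Der%/st
```

Because existing escapes are undone before re-escaping, running the helper twice on the same symbol gives the same result as running it once.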