Hfst

From Apertium

Revision as of 17:47, 13 August 2011

HFST is the Helsinki Finite-State Toolkit. It is formalism-compatible with both lexc and twolc, so it is to lexc and twolc roughly what foma is to xfst. It is currently used in apertium-sme-nob and apertium-fin-sme.

The IRC channel is #hfst at irc.freenode.net (you may try irc://irc.freenode.net/#hfst if your browser supports it, or enter #hfst into http://webchat.freenode.net/ if you want a web client).

Prerequisites

  • automake, autoconf, libtool
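You can check that these tools are installed before running autogen.sh. The sketch below is written in POSIX sh. The function name missing_tools is hypothetical, and the sketch assumes your libtool installation provides a command called libtoolize (the name can differ between platforms).

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is NOT on PATH,
# one per line. With no arguments, check the autotools prerequisites.
missing_tools() {
    [ "$#" -eq 0 ] && set -- automake autoconf libtoolize
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: warn before running autogen.sh
missing=$(missing_tools)
if [ -n "$missing" ]; then
    echo "Please install:" $missing >&2
fi
```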

HFST is a sort of meta-package with several backends. To do anything useful, you'll need at least one (preferably all) of:

Compiling HFST3

Subversion checkout

Mac OS X note: you need Xcode installed on your Mac. It came with your computer, and it can also be downloaded from Apple (registration required).
$ svn co https://hfst.svn.sourceforge.net/svnroot/hfst/trunk hfst 
$ cd hfst/hfst3/
$ sh autogen.sh
$ ./configure --prefix=/home/USERNAME/local/ # remove --prefix if you just want it in /usr/local
$ make
$ sudo make install
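If you installed with --prefix, the programs and libraries are outside the default search paths. The sketch below is a helper you could add to your shell startup file. It assumes the usual autotools layout, with bin/, lib/ and lib/pkgconfig/ under the prefix. The name use_local_prefix is hypothetical. The helper adds the new directories to the front of the existing variables rather than overwriting them.

```shell
#!/bin/sh
# Hypothetical helper: make a --prefix install visible to the shell, the
# dynamic linker and pkg-config. Assumes the standard autotools layout.
use_local_prefix() {
    prefix=${1%/}   # drop one trailing slash, e.g. from /home/USERNAME/local/
    PATH="$prefix/bin${PATH:+:$PATH}"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    PKG_CONFIG_PATH="$prefix/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    export PATH LD_LIBRARY_PATH PKG_CONFIG_PATH
}

# Example, matching --prefix=/home/USERNAME/local/ when $HOME is /home/USERNAME:
use_local_prefix "$HOME/local/"
```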

Prepackaged tarball

Download the latest version from [1] and unpack it. Then follow the instructions in the README file, i.e.:

$ cd hfst-3.0/
$ sh autogen.sh
$ ./configure
$ make
$ sudo make install
$ sudo ldconfig
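The release is distributed as a tarball, so it is usually unpacked with tar rather than unzip. The sketch below uses a hypothetical helper, unpack_tarball. The archive name hfst-3.0.tar.gz in the example is an assumption based on the directory name above.

```shell
#!/bin/sh
# Hypothetical helper: unpack a source archive into the current directory,
# choosing the decompression method from the file extension.
unpack_tarball() {
    case "$1" in
        *.tar.gz|*.tgz)   tar -xzf "$1" ;;
        *.tar.bz2|*.tbz2) tar -xjf "$1" ;;
        *.zip)            unzip -q "$1" ;;
        *) echo "unpack_tarball: unknown archive type: $1" >&2; return 1 ;;
    esac
}

# Example (archive name assumed):
#   unpack_tarball hfst-3.0.tar.gz && cd hfst-3.0/
```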

Troubleshooting

If, during the ./configure step, you see

checking for GNU libc compatible malloc... no
[…]
checking for GNU libc compatible realloc... no

and then during make a bunch of errors like:

/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope

then try the following:

sudo ldconfig
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

and then re-run ./configure and make.
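To spot this problem without reading through the whole configure run, you can save the output to a file and search it for the two check lines quoted above. The names needs_malloc_workaround and configure.out in the sketch below are hypothetical. The example simply chains together the workaround commands from this section.

```shell
#!/bin/sh
# Hypothetical helper: exit with status 0 if saved ./configure output shows
# the failed "GNU libc compatible malloc/realloc" checks.
needs_malloc_workaround() {
    grep -Eq 'checking for GNU libc compatible (malloc|realloc)\.\.\. no$' "$1"
}

# Example:
#   ./configure 2>&1 | tee configure.out
#   if needs_malloc_workaround configure.out; then
#       sudo ldconfig
#       export LD_LIBRARY_PATH=/usr/local/lib
#       export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
#       ./configure && make
#   fi
```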

For more advice on installation problems, have a look at the Hfst Readme page.

Compiling HFST2

Some regressions in HFST3 still make it impossible to use with apertium-sme-nob (last tested: revision 1204).

Use revision 627 of HFST2:

$ svn co -r627 https://hfst.svn.sourceforge.net/svnroot/hfst/branches/hfst2
$ cd hfst2
$ autoreconf -i
$ ./configure --prefix=/home/USERNAME/local/ # remove --prefix if you just want it in /usr/local
$ make
$ sudo make install

Using

$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst

$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup> 
orð	orð+N+Neu+Sg+Nom+Indef
orð	orð+N+Neu+Sg+Acc+Indef
orð	orð+N+Neu+Pl+Nom+Indef
orð	orð+N+Neu+Pl+Acc+Indef

lookup>
$
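hfst-lookup writes one line per analysis, with the input form and the analysis separated by a tab. Newer versions may also add a weight as a third column. The sketch below keeps only the analyses, for example to pass them on to another script. The helper name analyses is hypothetical, and the sketch assumes that tab-separated layout.

```shell
#!/bin/sh
# Hypothetical helper: read hfst-lookup output on stdin and print only the
# analysis column. Prompt lines ("lookup>") and blank lines contain no tab
# and are skipped. Any extra columns, such as a weight, are ignored.
analyses() {
    awk -F '\t' 'NF >= 2 { print $2 }'
}

# Example:
#   echo "orð" | hfst-lookup ../bin/fao-morph.hfst | analyses
```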

To compile lexc code, first concatenate all the lexc files:

$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
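If a name in the list above is misspelled, cat reports an error but still writes the remaining files. You can then end up compiling an incomplete lexicon without noticing. The sketch below checks that every input file exists before concatenating. The helper name concat_lexc is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: concatenate lexc source files into OUTPUT only if
# every input file exists. A typo in the list therefore stops the step
# instead of silently producing an incomplete lexicon.
# Usage: concat_lexc OUTPUT FILE...
concat_lexc() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "concat_lexc: missing input file: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}

# Example, with the start of the file list from above:
#   concat_lexc ../tmp/lexc-all.txt fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt
```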

To compile this, use the hfst-lexc program:

$ hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin

To compile the twol rules, use the hfst-twolc program:

$ hfst-twolc twol-fao.txt > twol-fao.bin

Then, to compose the lexicon with the rule file, use hfst-compose-intersect:

$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst

This creates a generator. If you want an analyser, invert the generator with hfst-invert:

$ hfst-invert fao-gen.hfst -o fao-morph.hfst
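The four compilation steps above can be combined into one function that stops at the first failing command. The sketch below uses the HFST2 commands and flags exactly as given above. The function name build_fao and the DRY_RUN switch are hypothetical additions. The sketch also assumes that every compiled file goes into ../bin, because the lookup example reads fao-morph.hfst from there.

```shell
#!/bin/sh
# Sketch of the HFST2 build steps above, run from fao/src.
# Hypothetical assumptions: all compiled files go to one directory
# (default ../bin), and paths contain no spaces. Set DRY_RUN=1 to print
# the commands instead of running them.

run() {
    # eval lets the command string contain < and > redirections
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$1"
    else
        eval "$1"
    fi
}

build_fao() {
    bin=${1:-../bin}
    # Each && makes the chain stop at the first command that fails
    run "hfst-lexc < ../tmp/lexc-all.txt > $bin/lexc-fao.bin" &&
    run "hfst-twolc twol-fao.txt > $bin/twol-fao.bin" &&
    run "hfst-compose-intersect -l $bin/lexc-fao.bin $bin/twol-fao.bin -o $bin/fao-gen.hfst" &&
    run "hfst-invert $bin/fao-gen.hfst -o $bin/fao-morph.hfst"
}
```

Running DRY_RUN=1 build_fao first prints the exact commands without creating any files. Running build_fao then performs the build.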

HFST2 vs HFST3

There have been some changes from HFST2 to HFST3. Notably:

  • In twol files, a / in alphabetic symbols has to be escaped, e.g. %+Der%/st instead of %+Der/st.
  • In twol files, you can no longer have Sets on the left-hand side of a rule. Where you previously wrote Set1:Set2 /<= _ ; you now write Vx:Vy /<= _ ; where Vx in Set1 Vy in Set2 ;
  • The old -r option to hfst-twolc is now uppercase: -R
  • hfst-lookup-optimize is gone; use hfst-fst2fst -O -i infile.hfst -o outfile.hfst.ol instead
  • hfst-lexc needs the output-file option to come before the lexc input file, e.g. hfst-lexc -o outfile.hfst mylexicon.lexc
  • hfst-compose-intersect uses -1 (number one) instead of -l (letter L) for the lexicon, and -2 for the rule file, e.g. hfst-compose-intersect -1 lexicon.hfst -2 rules.twol.hfst -o generator.hfst
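Applying the last three points gives roughly the following HFST3 version of the build. The sketch below uses only the command forms listed above. It assumes a lexicon in mylexicon.lexc and twol rules already compiled to rules.twol.hfst; both names come from the examples above, not from a real project. The names build_hfst3 and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch of an HFST3 build using only the command forms listed above.
# Hypothetical assumptions: the lexicon is in mylexicon.lexc and the twol
# rules are already compiled to rules.twol.hfst. Set DRY_RUN=1 to print
# the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

build_hfst3() {
    # HFST3 hfst-lexc: -o must come before the lexc input file
    run hfst-lexc -o lexicon.hfst mylexicon.lexc &&
    # -1 (digit one) takes the lexicon, -2 the compiled rule file
    run hfst-compose-intersect -1 lexicon.hfst -2 rules.twol.hfst -o generator.hfst &&
    # replaces HFST2's hfst-lookup-optimize
    run hfst-fst2fst -O -i generator.hfst -o generator.hfst.ol
}
```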

See also

External links