Difference between revisions of "Hfst"

From Apertium
Revision as of 11:48, 4 January 2013

hfst is the Helsinki finite-state toolkit. It is formalism-compatible with both lexc and twolc, so it relates to lexc and twolc roughly the way foma relates to xfst. It is currently used in apertium-sme-nob and apertium-fin-sme.

The IRC channel is #hfst at irc.freenode.net (you may try irc://irc.freenode.net/#hfst if your browser supports it, or enter #hfst into http://webchat.freenode.net/ if you want a web client). The HFST Wiki (https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstHome) has some very good documentation; see especially the HfstReadme page (https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstReadme) when you run into compilation problems.

HFST is built as a set of wrappers over several possible back-ends: Foma, OpenFST, SFST, …. If you want to use HFST for anything serious, you'll need at least one of these back-ends installed. Fortunately, the latest versions of HFST include the back-ends and will install whichever back-end you need along with HFST itself.

Building and installing HFST

Install prerequisites

You will need the regular build dependencies:

  • automake, autoconf, libtool, flex, bison, g++

If you've already installed apertium/lttoolbox, these should already be present; if not, they should be easy to install with your package manager, e.g. on Ubuntu: sudo apt-get install automake autoconf libtool flex bison g++, or on Arch Linux: sudo pacman -S base-devel.

MacOS X users might need to install XCode (registration required).
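As a quick sanity check before building, the sketch below reports which of the build tools listed above are not on your PATH. It is a hypothetical helper, not part of HFST, and assumes a POSIX shell. It checks libtool via libtoolize, the program autoreconf runs for libtool; on macOS with Homebrew that program may be installed as glibtoolize instead.

```shell
# Hypothetical helper (not part of HFST): print each named command that is
# not found on PATH, one per line. Prints nothing if everything is present.
missing_build_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The libtool package provides libtoolize, which is what autoreconf runs.
missing=$(missing_build_tools automake autoconf libtoolize flex bison g++)
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
else
  echo "All build tools found."
fi
```

If anything is listed, install it with your package manager as described above.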

Download HFST

Either use the latest release (recommended for users), or go with the bleeding-edge SVN version (recommended for developers).

From SVN

$ svn co svn://svn.code.sf.net/p/hfst/code/trunk/hfst3 hfst3
$ cd hfst3/
$ autoreconf -i

(Only SVN users need the autoreconf step.)

Released tarball

Download the latest release, named something like hfst-X.Y.Z.tar.gz, from http://sourceforge.net/projects/hfst/files/, then

$ tar -xzf hfst-X.Y.Z.tar.gz
$ cd hfst-X.Y.Z/

(replacing X.Y.Z with the version you downloaded)
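Whichever way you got the sources, the practical difference is that a released tarball already contains a generated configure script, while an SVN checkout only has the autoconf input (configure.ac) until you run autoreconf -i. The hypothetical helper below, a sketch assuming a POSIX shell, runs autoreconf only when it is still needed:

```shell
# Hypothetical helper (not part of HFST): succeed if the source tree in $1
# (default: current directory) has autoconf input but no configure script yet.
needs_autoreconf() {
  local dir="${1:-.}"
  { [ -f "$dir/configure.ac" ] || [ -f "$dir/configure.in" ]; } && [ ! -f "$dir/configure" ]
}

# SVN checkout: generates ./configure. Released tarball: nothing to do.
if needs_autoreconf .; then
  autoreconf -i
fi
```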

Configure

In the configure step, you'll need to decide which back-ends you want to include.

Required back-ends:

  • OpenFST
    • Included by default.

Semi-optional back-ends:

  • Foma – used for lexc and xfst (sequential rewrite rules)
    • pass --enable-lexc --with-foma to ./configure to use this
    • IF YOU PLAN ON COMPILING ANY LEXC FILES, THIS IS BASICALLY MANDATORY

Optional back-ends:

  • SFST – makes hfst-substitute faster
    • pass --with-sfst to ./configure to use this
    • DON'T INSTALL THIS UNLESS YOU REALLY NEED TO. IT MAKES EVERYTHING HARDER, NOT EASIER

For most users, this is enough:

$ ./configure --with-foma --enable-lexc

The above command will configure it to be installed to /usr/local in the make install step (below).

If you want hfst and back-ends installed somewhere else, you can do

$ ./configure --with-foma --enable-lexc --prefix=/home/USERNAME/local/

Note: replace USERNAME with your own username. If you don't know it, you can find it out by typing whoami
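When you install under a prefix such as /home/USERNAME/local/, the programs end up in its bin directory and the libraries in its lib directory, which your shell and (on Linux) the dynamic linker do not search by default. A minimal sketch, assuming a POSIX shell; hfst_prefix_env is a hypothetical helper that fills the prefix in and prints the environment settings you would add to your shell startup file:

```shell
# Hypothetical helper: print export lines that make a custom HFST prefix usable.
# $1 is the --prefix value given to ./configure.
hfst_prefix_env() {
  local prefix="${1%/}"   # drop a trailing slash so paths don't contain "//"
  # $PATH etc. are printed literally, so they only expand when the output is eval'd.
  printf 'export PATH="%s/bin:$PATH"\n' "$prefix"
  printf 'export LD_LIBRARY_PATH="%s/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"\n' "$prefix"
  printf 'export PKG_CONFIG_PATH="%s/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"\n' "$prefix"
}

PREFIX="/home/$(whoami)/local/"
hfst_prefix_env "$PREFIX"             # review these lines, or append them to ~/.bashrc
eval "$(hfst_prefix_env "$PREFIX")"   # apply them to the current shell
```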

You can also add --with-unicode-handler=glib (or --with-unicode-handler=ICU) to the ./configure step if you have glib (or ICU) installed and want better Unicode case folding.
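The configure choices above can be summarised in a small helper that assembles the ./configure arguments. This is a hypothetical sketch, not part of HFST; it only uses the flags named on this page, and it assumes the prefix contains no spaces, because the output is word-split when passed to ./configure.

```shell
# Hypothetical helper: build ./configure arguments from the choices on this page.
#   $1  back-ends, comma separated: "foma", "sfst" or "foma,sfst" (may be empty)
#   $2  optional install prefix (empty means the default, /usr/local)
#   $3  optional Unicode handler: "glib" or "ICU"
hfst_configure_args() {
  local backends="${1:-}" prefix="${2:-}" unicode="${3:-}" args=""
  case ",$backends," in *,foma,*) args="$args --with-foma --enable-lexc" ;; esac
  case ",$backends," in *,sfst,*) args="$args --with-sfst" ;; esac
  if [ -n "$prefix" ]; then args="$args --prefix=$prefix"; fi
  if [ -n "$unicode" ]; then args="$args --with-unicode-handler=$unicode"; fi
  echo "${args# }"   # strip the leading space
}

# Show the arguments for the "most users" setup with a home-directory prefix.
echo ./configure $(hfst_configure_args foma "/home/$(whoami)/local/")
```

For example, $ ./configure $(hfst_configure_args foma) runs the same command as the "For most users" line above.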


Compile and install

Build the package by running

$ make

Then install it. If you are installing to /usr/local (i.e. you did not give a --prefix in the configure step), you need to use sudo make install; otherwise, don't use sudo!

$ make install

And finally, unless you have a Mac, you may need to do:

$ sudo ldconfig
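The rules above (sudo only for a system prefix, ldconfig only on non-Mac systems) can be captured in a hypothetical helper that prints the commands to run. It is a sketch assuming a POSIX shell. It also assumes that ldconfig only helps for a system prefix: ldconfig scans the system library directories (those listed in /etc/ld.so.conf), so a home-directory prefix relies on LD_LIBRARY_PATH instead.

```shell
# Hypothetical helper: print the install commands for a given prefix.
#   $1  prefix passed to ./configure (empty means the default, /usr/local)
#   $2  operating system name as printed by "uname -s" (defaults to this machine)
hfst_install_steps() {
  local prefix="${1:-/usr/local}"
  local os="${2:-$(uname -s)}"
  case "${prefix%/}" in
    /usr/local|/usr)
      echo "sudo make install"
      # Refresh the shared-library cache; not needed on macOS (Darwin).
      if [ "$os" != Darwin ]; then
        echo "sudo ldconfig"
      fi
      ;;
    *)
      # A prefix in your home directory is writable without sudo.
      echo "make install"
      ;;
  esac
}

hfst_install_steps   # default prefix, this machine's OS
```

Review the printed commands and run them from the build directory.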


Troubleshooting

If, during the ./configure step, you see

checking for GNU libc compatible malloc... no
[…]
checking for GNU libc compatible realloc... no

and then during make a bunch of errors like:

/usr/local/include/sfst/mem.h:37:57: error: 'malloc' was not declared in this scope

try the following:

sudo ldconfig
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

and then run ./configure and make again.
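If you want to check for this symptom automatically, you can keep a copy of the configure output and search it. The sketch below is hypothetical: the file name configure.out and both function names are illustrative. Unlike the plain export lines above, it prepends /usr/local/lib to any existing search paths instead of overwriting them.

```shell
# Keep a copy of the configure output while still seeing it on screen, e.g.:
#   ./configure --with-foma --enable-lexc 2>&1 | tee configure.out

# Hypothetical helper: succeed if the saved output shows the malloc symptom.
has_malloc_symptom() {
  grep -q 'checking for GNU libc compatible malloc\.\.\. no' "$1"
}

# Hypothetical helper: the workaround from above, keeping any existing paths.
apply_malloc_workaround() {
  sudo ldconfig
  export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
}

if [ -f configure.out ] && has_malloc_symptom configure.out; then
  apply_malloc_workaround
  echo "Workaround applied; now re-run ./configure and make."
fi
```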


If, during make, you see errors like

xre_parse.cc:2293:24: error: invalid conversion from 'const char*' to 'char*' [-fpermissive]

try instead

make CXXFLAGS=-fpermissive


For more advice on installation problems, have a look at the Hfst Readme page.

See also Foma, OpenFST and SFST for compilation problems regarding the back-ends.

Using

$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst

$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup> 
orð	orð+N+Neu+Sg+Nom+Indef
orð	orð+N+Neu+Sg+Acc+Indef
orð	orð+N+Neu+Pl+Nom+Indef
orð	orð+N+Neu+Pl+Acc+Indef

lookup>
$

To compile lexc code, first concatenate all the lexc files:

$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
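A long list like this is easy to get wrong, and cat carries on (and the redirection still creates the output file) if one input is missing. A hypothetical helper, sketched for a POSIX shell, that refuses to write the combined file unless every part exists:

```shell
# Hypothetical helper: concatenate lexc files into $1, but only if all inputs exist.
#   usage: concat_lexc OUTFILE INFILE...
concat_lexc() {
  local out="$1"
  shift
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "concat_lexc: missing input file: $f" >&2
      return 1
    fi
  done
  cat "$@" > "$out"
}
```

Call it as concat_lexc ../tmp/lexc-all.txt followed by the same file list as in the cat command above.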

To compile this, use the hfst-lexc program:

$ hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin

To compile the twol rules, use the hfst-twolc program:

$ hfst-twolc twol-fao.txt > twol-fao.bin

And then to compose the lexicon and rule file, use hfst-compose-intersect:

$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst

This creates a generator. If you want an analyser, invert the generator with hfst-invert:

$ hfst-invert fao-gen.hfst -o fao-morph.hfst

HFST2 vs HFST3

There have been some changes. Notably:

  • In twol files, a / in alphabetic symbols has to be escaped, e.g. %+Der%/st instead of %+Der/st.
  • In twol files, you can no longer use sets on the left-hand side of a rule. Where you previously wrote Set1:Set2 /<= _ ; you now write Vx:Vy /<= _ ; where Vx in Set1 Vy in Set2 ;
  • The old -r option to hfst-twolc is now uppercase: -R
  • hfst-lookup-optimize is gone; use hfst-fst2fst -O -i infile.hfst -o outfile.hfst.ol instead
  • hfst-lexc needs the output option to come before the lexc input file, e.g. hfst-lexc -o outfile.hfst mylexicon.lexc
  • hfst-compose-intersect uses -1 (number one) instead of -l (letter L), and -2 for the rule-file. E.g. hfst-compose-intersect -1 lexicon.hfst -2 rules.twol.hfst -o generator.hfst
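Applying these changes to the fao example above, the manual build might look roughly like the sketch below. It is an untested outline assuming HFST3 tools on your PATH and the fao/src working directory: the hfst-lexc, hfst-compose-intersect and hfst-fst2fst options come from the list above, while the -i/-o options for hfst-twolc and hfst-invert and the .hfst file names are assumptions. Setting DRY_RUN=1 only prints the commands.

```shell
# Run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical HFST3 version of the fao build steps; stops at the first failure.
build_fao_hfst3() {
  run hfst-lexc -o lexc-fao.hfst ../tmp/lexc-all.txt || return
  run hfst-twolc -i twol-fao.txt -o twol-fao.hfst || return
  run hfst-compose-intersect -1 lexc-fao.hfst -2 twol-fao.hfst -o fao-gen.hfst || return
  run hfst-invert -i fao-gen.hfst -o fao-morph.hfst || return
  # Optional: lookup-optimized analyser (replaces the old hfst-lookup-optimize).
  run hfst-fst2fst -O -i fao-morph.hfst -o fao-morph.hfst.ol
}
```

Run DRY_RUN=1 build_fao_hfst3 first to check the commands, then build_fao_hfst3 to execute them.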
