Difference between revisions of "Hfst"

From Apertium
Revision as of 13:03, 18 November 2009

HFST is the Helsinki Finite-State Toolkit. It is formalism-compatible with both lexc and twolc, much as foma is compatible with xfst.

Prerequisites

  • automake, autoconf, libtool
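
Before building, it can help to confirm that these tools are on the PATH. The following is a minimal sketch and not part of HFST: the function name check_tools is made up for illustration, and on some systems libtool is installed under a different name (for example glibtool), which this check would not detect.

```shell
#!/bin/sh
# check_tools (hypothetical helper): report which of the named
# commands cannot be found on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the prerequisites listed above.
if check_tools automake autoconf libtool; then
    echo "all prerequisites found"
else
    echo "install the missing tools before running autoreconf"
fi
```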

Compiling

Subversion checkout

Mac OS X note: you need Xcode installed on your Mac. It is included with the computer and can also be downloaded from Apple (registration required).
$ svn co https://hfst.svn.sourceforge.net/svnroot/hfst/trunk hfst 
$ cd hfst/hfst/
$ autoreconf -i
$ ./configure --prefix=/home/fran/local/
$ make
$ sudo make install

Prepackaged tarball

Download the latest version from [1] and unpack it. Then follow the instructions in the README file, i.e.:

$ cd hfst-2.0/
$ ./configure
$ make
$ sudo make install

Using

As an example, check out the Faroese (fao) sources, build them, and look up a word:

$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst

$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup> 
orð	orð+N+Neu+Sg+Nom+Indef
orð	orð+N+Neu+Sg+Acc+Indef
orð	orð+N+Neu+Pl+Nom+Indef
orð	orð+N+Neu+Pl+Acc+Indef

lookup>
$
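
Each result line in the transcript above has two tab-separated fields: the surface form and one analysis. As a rough sketch (the function name analyses_only is invented here, and the tab-separated layout is assumed from the transcript rather than taken from the HFST documentation), the analyses can be pulled out with awk, skipping the prompt and blank lines:

```shell
#!/bin/sh
# analyses_only (hypothetical helper): read hfst-lookup output on stdin
# and print only the analysis column of tab-separated result lines.
analyses_only() {
    awk -F '\t' 'NF >= 2 { print $2 }'
}

# Demonstration on lines copied from the transcript above, so this
# runs without hfst-lookup being installed. With the real tool:
#   echo "orð" | hfst-lookup ../bin/fao-morph.hfst | analyses_only
printf 'lookup> \norð\torð+N+Neu+Sg+Nom+Indef\n\nlookup>\n' | analyses_only
```

Lines without a tab, such as the lookup> prompt, have fewer than two fields and are dropped.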

To compile lexc code, first concatenate all the lexc files:

$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
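
With this many input files, a mistyped name is easy to miss, because cat still writes a partial output file when one input is absent. The sketch below is one way to guard against that. The helper name concat_lexc is hypothetical, and the demonstration uses throwaway files rather than the real Faroese sources.

```shell
#!/bin/sh
# concat_lexc (hypothetical helper): concatenate lexc source files into
# one output file, refusing to continue if any input is missing.
# Usage: concat_lexc OUTPUT INPUT...
concat_lexc() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing lexc file: $f" >&2
            return 1
        fi
    done
    # Inputs are written in the order given on the command line.
    cat "$@" > "$out"
}

# Small demonstration with throwaway files (not the real Faroese sources).
tmpdir=$(mktemp -d)
printf 'first\n' > "$tmpdir/a.txt"
printf 'second\n' > "$tmpdir/b.txt"
concat_lexc "$tmpdir/all.txt" "$tmpdir/a.txt" "$tmpdir/b.txt"
cat "$tmpdir/all.txt"
rm -r "$tmpdir"
```

For the real build, the call would take ../tmp/lexc-all.txt as the output followed by the same file list, in the same order, as the cat command above.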

To compile this, use the hfst-lexc program:

$ hfst-lexc < ../tmp/lexc-all.txt > ../bin/lexc-fao.bin

To compile the twol rules, use the hfst-twolc program:

$ hfst-twolc twol-fao.txt > twol-fao.bin

Then, to compose the lexicon with the rule file, use hfst-compose-intersect (the command below assumes both compiled files are in the current directory):

$ hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst

This creates a generator. If you want an analyser, invert the generator with hfst-invert:

$ hfst-invert fao-gen.hfst -o fao-morph.hfst
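
The four steps (lexc, twolc, compose-intersect, invert) can be chained in a small script. This is only a sketch: the function name build_fao and the runner argument are made up for illustration, and it assumes all input and output files sit in the current directory, unlike the ../tmp and ../bin paths used in the lexc step. The commands themselves are the ones given in the text.

```shell
#!/bin/sh
# build_fao (hypothetical helper): run or print the four build steps.
# $1 is the runner: "echo" prints each command, "eval" executes it.
# The function stops at the first step that fails.
build_fao() {
    runner=$1
    "$runner" 'hfst-lexc < lexc-all.txt > lexc-fao.bin' || return 1
    "$runner" 'hfst-twolc twol-fao.txt > twol-fao.bin' || return 1
    "$runner" 'hfst-compose-intersect -l lexc-fao.bin twol-fao.bin -o fao-gen.hfst' || return 1
    "$runner" 'hfst-invert fao-gen.hfst -o fao-morph.hfst' || return 1
}

# Dry run: only print the commands. Use "build_fao eval" to execute them.
build_fao echo
```

Passing the runner as an argument keeps the dry run and the real build on the same command strings, so what is printed is what would be executed.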

External links