Hfst
Revision as of 22:59, 26 October 2009
hfst is the Helsinki Finite-State Technology toolkit. It is formalism-compatible with both lexc and twolc, so it stands to those tools roughly as foma stands to xfst.
Prerequisites
- automake, autoconf, libtool
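Before starting the build it can help to confirm that the prerequisites are actually on the PATH. The following is a minimal sketch, not part of hfst itself: the function name `missing_tools` is hypothetical, and it assumes each prerequisite is installed under the command name listed above.

```shell
# Print the name of every given tool that is not found on PATH,
# one per line. Returns 1 if at least one tool is missing, 0 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_tools automake autoconf libtool` prints nothing when all three are available.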
Compiling
Subversion checkout
- Mac OS X note: you need Xcode installed on your Mac. It came with your computer, and can also be downloaded from Apple (registration required).
$ svn co https://hfst.svn.sourceforge.net/svnroot/hfst/trunk hfst
$ cd hfst/hfst-2.0/
$ autoreconf -i
$ ./configure --prefix=/home/fran/local/
$ make
$ sudo make install
Prepackaged tarball
Download the latest version from [1] and unpack it. Then follow the instructions in the README file, i.e.:
$ cd hfst-2.0/
$ ./configure
$ make
$ sudo make install
Using
$ svn co https://victorio.uit.no/langtech/trunk/st/fao
$ cd fao/src
$ make -f Makefile.hfst
$ echo "orð" | hfst-lookup ../bin/fao-morph.hfst
lookup>
orð orð+N+Neu+Sg+Nom+Indef
orð orð+N+Neu+Sg+Acc+Indef
orð orð+N+Neu+Pl+Nom+Indef
orð orð+N+Neu+Pl+Acc+Indef
lookup>
$
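When the lookup output is fed to another program, the prompt lines usually get in the way. The sketch below keeps only the analysis strings. It assumes the output has the shape shown above (an optional `lookup>` prompt, then the surface form and the analysis separated by whitespace); the function name `lookup_analyses` is hypothetical.

```shell
# Read hfst-lookup output on stdin and print only the analysis
# (second whitespace-separated field) of each result line.
# A leading "lookup>" prompt is stripped first, so prompt-only
# lines and blank lines produce no output.
lookup_analyses() {
    awk '{ sub(/^lookup>[ \t]*/, "") } NF >= 2 { print $2 }'
}
```

It would be used as `echo "orð" | hfst-lookup ../bin/fao-morph.hfst | lookup_analyses`.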
To compile lexc code, first concatenate all the lexc files:
$ cat fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt adj-fao-lex.txt \
  adj-fao-morph.txt verb-fao-lex.txt verb-fao-morph.txt adv-fao-lex.txt \
  abbr-fao-lex.txt acro-fao-lex.txt pron-fao-lex.txt punct-fao-lex.txt \
  numeral-fao-lex.txt pp-fao-lex.txt cc-fao-lex.txt cs-fao-lex.txt \
  interj-fao-lex.txt det-fao-lex.txt > ../tmp/lexc-all.txt
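A plain `cat` still writes whatever it could read if one of the input files is missing, which leaves an incomplete lexicon behind. A small wrapper can check the inputs first. This is only a sketch; the function name `concat_lexc` is hypothetical, and the files are concatenated in the order given, as in the command above.

```shell
# Usage: concat_lexc OUTPUT INPUT...
# Concatenate the INPUT files into OUTPUT, in the order given.
# If any input is missing or unreadable, report it on stderr and
# return 1 without creating or touching OUTPUT.
concat_lexc() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            printf 'missing lexc file: %s\n' "$f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}
```

For example, `concat_lexc ../tmp/lexc-all.txt fao-lex.txt noun-fao-lex.txt noun-fao-morph.txt`, followed by the remaining files.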
Then start the hfst-lexc program and run compile-source followed by save-source:
$ hfst-lexc
...
lexc> compile-source ../tmp/lexc-all.txt
...
Minimizing...Done!
lexc> save-source ../bin/lexc-fao.bin
opening "../bin/lexc-fao.bin"
Opening '../bin/lexc-fao.bin'...
Done.
lexc> quit
To compile the twol rules, use the hfst-twolc program:
$ hfst-twolc twol-fao.txt > twol-fao.bin
Then, to compose the lexicon and the rule file, use hfst-compose-intersect:
$ hfst-compose-intersect -l ../bin/lexc-fao.bin twol-fao.bin -o ../bin/fao-gen.hfst
This creates a generator. If you want an analyser, invert the generator with hfst-invert:
$ hfst-invert ../bin/fao-gen.hfst -o ../bin/fao-morph.hfst
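The non-interactive steps (rule compilation, composition and inversion) can be collected into one function. The sketch below uses only the invocations shown above. The function names `run` and `build_fao`, the `DRY_RUN` variable and the choice to keep every intermediate file in one directory are assumptions made for illustration. It expects the lexicon to have been saved as lexc-fao.bin in that directory already.

```shell
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: build_fao [BIN_DIR]
# Compile the twol rules, compose them with the compiled lexicon,
# and invert the resulting generator into an analyser.
# BIN_DIR defaults to ../bin (an assumption, see above).
build_fao() {
    bin=${1:-../bin}
    # hfst-twolc writes to stdout, so the redirect is handled here
    # rather than through run().
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'hfst-twolc twol-fao.txt > %s/twol-fao.bin\n' "$bin"
    else
        hfst-twolc twol-fao.txt > "$bin/twol-fao.bin" || return 1
    fi
    run hfst-compose-intersect -l "$bin/lexc-fao.bin" "$bin/twol-fao.bin" \
        -o "$bin/fao-gen.hfst" || return 1
    run hfst-invert "$bin/fao-gen.hfst" -o "$bin/fao-morph.hfst"
}
```

Setting `DRY_RUN=1` before calling `build_fao` prints the three commands without running them, which is a cheap way to check the paths first.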