Difference between revisions of "SFST"
Revision as of 01:18, 24 May 2008
SFST (Stuttgart Finite State Transducer Tools) is a set of programs for writing morphological analysers.
Downloading
A packaged version, including the fst-proc program for processing Apertium input streams, can be downloaded from Apertium SVN:
$ svn co http://apertium.svn.sourceforge.net/svnroot/branches/sfst
Compiling
Follow the standard steps:
$ sh autogen.sh
$ ./configure
$ make
$ make install
Usage
To try SFST out, you can start by compiling the German transducer, SMOR, that comes with the package:
$ cd data/SMOR
$ make
After some time you will have a file called morph.a. This needs to be compacted so that it can be read by fst-proc:
$ fst-compact morph.a morph.ac
Now you can use it:
$ cd ../../src
$ echo "Ich habe ein Bier" | ./fst-proc ../data/SMOR/smor.ac
^Ich/<CAP>ich<+PPRO><pers><1><Sg><NoGend><Nom>$
^habe/haben<+V><1><Sg><Pres><Konj>/haben<+V><3><Sg><Pres><Konj>/haben<+V><Imp><Sg>/haben<+V><1><Sg><Pres><Ind>$
^ein/ein<+ART><Indef><Masc><Nom><Sg>/ein<+ART><Indef><Neut><Nom><Sg>/ein<+ART><Indef><Neut><Akk><Sg>$ ^Bier/*Bier$
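In the output above, a word the analyser does not know is returned with an asterisk in front of its only reading (here Bier). As a rough sketch, such words could be pulled out of the stream with standard text tools. The function name unknown_words and the shortened sample line are illustrative assumptions and not part of SFST or Apertium; the snippet also assumes a grep that supports the -o option (GNU and BSD grep do).

```shell
# Shortened sample of fst-proc output, taken from the example above.
output='^Ich/<CAP>ich<+PPRO><pers><1><Sg><NoGend><Nom>$ ^ein/ein<+ART><Indef><Neut><Akk><Sg>$ ^Bier/*Bier$'

# unknown_words (hypothetical helper): reads analyser output on stdin and
# prints the surface form of every lexical unit whose reading starts with '*'.
unknown_words() {
  # Keep only units of the form ^surface/*...$ and print one per line,
  # then strip everything except the surface form.
  grep -o '\^[^/$]*/\*[^$]*\$' | sed 's|^\^\([^/]*\)/.*|\1|'
}

printf '%s\n' "$output" | unknown_words
```

In practice the same function could be placed at the end of the pipeline, after ./fst-proc, to collect words that are missing from the lexicon.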
It should also work with deformatters and reformatters:
$ echo "Ich habe <em>ein</em> bier" | apertium-deshtml | ./fst-proc ../data/SMOR/smor.ac
^Ich/<CAP>ich<+PPRO><pers><1><Sg><NoGend><Nom>$
^habe/haben<+V><1><Sg><Pres><Konj>/haben<+V><3><Sg><Pres><Konj>/haben<+V><Imp><Sg>/haben<+V><1><Sg><Pres><Ind>$[
<em>]^ein/ein<+ART><Indef><Masc><Nom><Sg>/ein<+ART><Indef><Neut><Nom><Sg>/ein<+ART><Indef><Neut><Akk><Sg>$[<\/em> ]^bier/*bier$.[][
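The deformatted stream keeps the HTML markup in square brackets between the lexical units, and each unit lists its readings separated by slashes. The following is a minimal sketch of how the units could be separated from the bracketed formatting and their readings counted. The name count_analyses is a hypothetical helper, the sample is copied from the output above, and a grep with -o support is assumed.

```shell
# Part of the deformatted analyser output shown above.
sample='^habe/haben<+V><1><Sg><Pres><Konj>/haben<+V><3><Sg><Pres><Konj>/haben<+V><Imp><Sg>/haben<+V><1><Sg><Pres><Ind>$[ <em>]^ein/ein<+ART><Indef><Masc><Nom><Sg>/ein<+ART><Indef><Neut><Nom><Sg>/ein<+ART><Indef><Neut><Akk><Sg>$[<\/em> ]^bier/*bier$.[]['

# count_analyses (hypothetical helper): prints "surface<TAB>number of readings"
# for each lexical unit read from stdin.
count_analyses() {
  # grep -o extracts each ^...$ unit and drops the bracketed formatting;
  # awk splits on '/' so that field 1 is the surface form and the
  # remaining fields are the readings.
  grep -o '\^[^$]*\$' |
    awk -F'/' '{ sub(/^\^/, "", $1); print $1 "\t" NF - 1 }'
}

printf '%s\n' "$sample" | count_analyses
```

Note that an unknown word such as bier is still counted as having one reading, namely the asterisk-marked copy of the surface form.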
Morphologies
The following morphologies are available:
- Morph-IT! (Italian, 34,968 lemmas, LGPL)
- SMOR — comes in the SFST distribution (German, 1,096 lemmas, GPL)
Performance
The analysers produced are fast: a 1.3 MB analyser (SMOR) processes ~1,100 words per second. For comparison, lttoolbox processes ~5,000 words per second.
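To put these figures in perspective, the short calculation below estimates how long each tool would need for a corpus of a given size. The corpus size of one million words and the helper name seconds_for are assumptions made for illustration; the throughput values are the approximate ones quoted above.

```shell
# seconds_for (hypothetical helper): whole seconds needed to process
# a number of words at a given rate in words per second (integer division).
seconds_for() {
  words=$1
  rate=$2
  echo $(( words / rate ))
}

corpus_words=1000000   # assumed corpus size, for illustration only

echo "SFST (SMOR): $(seconds_for "$corpus_words" 1100) s"
echo "lttoolbox:   $(seconds_for "$corpus_words" 5000) s"
```

At these rates the SFST analyser would take roughly 15 minutes for a million words, against a little over 3 minutes for lttoolbox.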