Difference between revisions of "SFST"

Revision as of 01:05, 24 May 2008

SFST (Stuttgart Finite State Transducer Tools) is a set of programs that can be used for writing morphological analysers.

Downloading

A packaged version, which includes the fst-proc program for processing Apertium input streams, can be downloaded from Apertium SVN:

$ svn co http://apertium.svn.sourceforge.net/svnroot/branches/sfst

Compiling

Follow the standard steps:

$ sh autogen.sh
$ ./configure
$ make
$ make install

Usage

To try SFST out, you can start by compiling the German transducer, SMOR, that comes with the package:

$ cd data/SMOR
$ make

After some time you will have a file called morph.a. You then need to compact it so that fst-proc can read it:

$ fst-compact morph.a morph.ac

Now you can use it:

$ cd ../../src
$ echo "Ich habe ein Bier" |  ./fst-proc ../data/SMOR/smor.ac
^Ich/<CAP>ich<+PPRO><pers><1><Sg><NoGend><Nom>$ 
^habe/haben<+V><1><Sg><Pres><Konj>/haben<+V><3><Sg><Pres><Konj>/haben<+V><Imp><Sg>/haben<+V><1><Sg><Pres><Ind>$ 
^ein/ein<+ART><Indef><Masc><Nom><Sg>/ein<+ART><Indef><Neut><Nom><Sg>/ein<+ART><Indef><Neut><Akk><Sg>$ ^Bier/*Bier$
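
The output above is in the Apertium stream format: each lexical unit sits between ^ and $, the surface form comes first, and each analysis follows a /. An analysis that starts with * marks a word the analyser does not know (here, Bier). As a rough sketch, the stream can be split into one analysis per line with standard tools. The helper name split_analyses is hypothetical and not part of SFST or Apertium, it assumes a grep that supports -o, and it ignores escaped slashes inside analyses:

```shell
#!/bin/sh
# split_analyses: hypothetical helper, not part of SFST or Apertium.
# Reads Apertium stream format on stdin and prints "surface<TAB>analysis"
# for every analysis of every lexical unit.
# Simplification: escaped "/" characters inside an analysis are not handled.
split_analyses() {
    # 1. Put each ^...$ unit on its own line (grep -o is assumed available).
    # 2. Strip the leading ^ and the trailing $.
    # 3. Split on "/" and pair the surface form with each analysis.
    grep -o '\^[^$]*\$' |
    sed 's/^\^//; s/\$$//' |
    awk -F'/' '{ for (i = 2; i <= NF; i++) print $1 "\t" $i }'
}

# Sample line taken from the fst-proc output shown above.
printf '%s\n' '^ein/ein<+ART><Indef><Masc><Nom><Sg>/ein<+ART><Indef><Neut><Nom><Sg>$ ^Bier/*Bier$' |
    split_analyses
```

In practice the function would be placed at the end of the pipeline, after fst-proc, to make ambiguous forms such as habe easier to inspect.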

Morphologies

The following morphologies are available:

  • Morph-IT! (Italian, 34,968 lemmas, LGPL)
  • SMOR — comes in the SFST distribution (German, 1,096 lemmas, GPL)

Performance

The analysers produced are fast. With a 1.3 MB analyser (SMOR), 1,100 words are processed per second. For comparison, lttoolbox processes ~5,000 words per second.
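
To put the two figures side by side, the time needed for a given text and the ratio between the two rates can be worked out with simple arithmetic. The following is only a sketch: the rates are the ones quoted above, the 100,000-word corpus size is a made-up illustration rather than a measurement, and the helper name seconds_for is hypothetical.

```shell
#!/bin/sh
# Rates quoted above, in words per second.
sfst_rate=1100
lt_rate=5000
# Hypothetical corpus size, chosen only for illustration.
words=100000

# seconds_for WORDS RATE: time in seconds to process WORDS at RATE words/s.
# Hypothetical helper; uses awk for the floating-point division.
seconds_for() {
    awk -v w="$1" -v r="$2" 'BEGIN { printf "%.1f\n", w / r }'
}

echo "SFST (SMOR): $(seconds_for "$words" "$sfst_rate") s"
echo "lttoolbox:   $(seconds_for "$words" "$lt_rate") s"
# Ratio of the two rates quoted above.
awk -v a="$lt_rate" -v b="$sfst_rate" 'BEGIN { printf "ratio: %.1fx\n", a / b }'
```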

External links