Foma
foma is a finite-state toolkit that implements the Xerox lexc and xfst formalisms. It can be used to build morphologies of natural languages.
Installation
Note: foma requires libreadline to be installed. On Debian or Ubuntu, use apt-get install libreadline5-dev
- Download the .tar.gz source from the website.
- Untar it.
- Run make. If you get the error "Makefile:12: *** missing separator. Stop.", edit the Makefile and add \ to the end of lines 11 to 13.
- This will create a binary, foma, which should be copied into a directory in your PATH.
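The steps above can be sketched as a small shell function. The function name, the tarball path, and the install directory are placeholders chosen for this example, not names taken from the foma website, and the sketch assumes that the archive unpacks into a single top-level directory containing the Makefile.

```shell
# Sketch: unpack a foma source tarball, build it, and copy the binary.
# install_foma is a hypothetical helper; both arguments are supplied by the caller.
install_foma() {
    tarball="$1"    # path to the downloaded .tar.gz (placeholder)
    dest="$2"       # a directory on your PATH, for example "$HOME/bin"

    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi

    builddir=$(mktemp -d) || return 1
    # Assumption: the archive has exactly one top-level directory.
    tar -xzf "$tarball" -C "$builddir" --strip-components=1 || return 1
    make -C "$builddir" || return 1
    mkdir -p "$dest" && cp "$builddir/foma" "$dest/"
}
```

It would be called with the path of the downloaded archive and a target directory, for instance install_foma ./foma.tar.gz "$HOME/bin" (both values are illustrative).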
Example usage
First check out the Greenlandic (kal) morphology from Giellatekno SVN:
$ svn co https://victorio.uit.no/langtech/trunk/st/kal
Move to the src/ directory and combine all the lexc source files:
$ cat kal-lex.txt \
      abbr-kal-lex.txt acro-kal-lex.txt \
      noun-kal-lex.txt verb-kal-lex.txt \
      ateq-kal-lex.txt ateq-kal-morph.txt \
      punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt > kal-lex-all.lexc
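A slightly more defensive way to do the concatenation, as a sketch: the helper name combine_lexc is made up for this example and is not part of foma. It refuses to write the output if any input file is missing, so a typo in a file name does not silently produce an incomplete lexicon. The files are concatenated in the order given.

```shell
# combine_lexc OUTPUT INPUT...   (hypothetical helper, not part of foma)
# Checks that every input exists, then concatenates them in order.
combine_lexc() {
    out="$1"
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing lexc source: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}

# Usage, from the src/ directory (list the same files as in the cat command):
#   combine_lexc kal-lex-all.lexc kal-lex.txt abbr-kal-lex.txt acro-kal-lex.txt
```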
Next, remove the comments from the xfst rewrite rule file:
$ cat xfst-kal.txt | sed 's/\s\!.*$/ /g' | grep -v '^!' | sed 's/$/ /g' | grep -v 'echo' > xfst-kal.tmp
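To see what each stage of this pipeline does, here is a sketch that runs it on a few made-up lines in xfst style (the sample is invented for illustration and is not taken from xfst-kal.txt). It is written with the POSIX class [[:space:]] in place of \s and without the g flag, which should behave the same here because each pattern is anchored at the end of the line.

```shell
# Made-up sample input; "!" starts a comment.
sample='! Rules for a toy language
define Vow [ a | e | i ] ; ! vowels
echo defining consonants
define Cns [ p | t | k ] ;'

# Stage 1: cut trailing comments (whitespace followed by "!" to end of line).
# Stage 2: drop lines that start with "!".
# Stage 3: append a space to every remaining line.
# Stage 4: drop lines that contain "echo".
cleaned=$(printf '%s\n' "$sample" \
    | sed 's/[[:space:]]!.*$/ /' \
    | grep -v '^!' \
    | sed 's/$/ /' \
    | grep -v 'echo')

printf '%s\n' "$cleaned"
```

Only the two define lines should remain, each with trailing whitespace and no comment text.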
To compile the xfst code, run foma and load the rewrite rules:
foma[0]: source xfst-kal.tmp
Opening file 'xfst-kal.tmp'.
defined Vow: 348 bytes. 2 states, 6 arcs, 6 paths.
defined Cns: 741 bytes. 2 states, 19 arcs, 19 paths.
...
6.1 MB. 12474 states, 402541 arcs, Cyclic.
foma[1]:
Note the [1] in the prompt, which shows that one network is now on the stack. If you don't get this, something has gone wrong.
Next, save the compiled transducer and quit:
foma[1]: save stack xfst-kal.bin
Writing to file xfst-kal.bin.
foma[1]: quit
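The source and save stack commands can also be run without typing them at the prompt. This is a sketch that assumes your foma build accepts the -f option (read commands from a script file, then quit); the script name compile-xfst.foma is made up for this example.

```shell
# Write the two commands from the interactive session to a script file.
# The file name is a placeholder chosen for this example.
script="compile-xfst.foma"
printf '%s\n' \
    'source xfst-kal.tmp' \
    'save stack xfst-kal.bin' > "$script"

# Run the script only if foma is on the PATH and the input file exists.
# Assumption: this foma build supports "-f scriptfile".
if command -v foma >/dev/null 2>&1 && [ -f xfst-kal.tmp ]; then
    foma -f "$script"
fi
```

Keeping the commands in a file makes the compilation repeatable, for example from a Makefile, but the interactive session remains the easier way to spot errors in the rules.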
Now we compile the lexc file,
External links
- http://foma.sourceforge.net/
- Giellatekno SVN: here you can find some example morphologies in foma format.