Difference between revisions of "Foma"
Revision as of 08:48, 30 September 2009
foma is a finite-state toolkit that implements Xerox lexc and xfst. It can be used for building morphologies of natural languages.
Installation
Note: foma requires libreadline to be installed; on Debian or Ubuntu, use apt-get install libreadline5-dev
- Download the .tar.gz source from the website.
- Untar it.
- Run make. If you get the error "Makefile:12: *** missing separator. Stop.", edit the Makefile and add \ to the end of lines 11-13.
- This will create a binary foma, which should be copied into your PATH.
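As a quick check after copying the binary, a sketch along the following lines can confirm that the shell finds it. The helper name on_path is made up for illustration; command -v is the standard POSIX way to look up a command.

```shell
#!/bin/sh
# on_path is a hypothetical helper name, not part of foma.
# It succeeds if the shell can find the given command
# (for external programs, by searching the directories in PATH).
on_path() {
    command -v "$1" >/dev/null 2>&1
}

if on_path foma; then
    echo "foma found at: $(command -v foma)"
else
    echo "foma not found; copy the binary into a directory listed in PATH"
fi
```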
Example usage
First check out the Greenlandic (kal) morphology from Giellatekno SVN:
$ svn co https://victorio.uit.no/langtech/trunk/st/kal
Move to the src/ directory and combine all the lexc source files:
$ cat kal-lex.txt \
    abbr-kal-lex.txt acro-kal-lex.txt \
    noun-kal-lex.txt verb-kal-lex.txt \
    ateq-kal-lex.txt ateq-kal-morph.txt \
    punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt > kal-lex-all.lexc
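Since cat reports a missing file but still writes whatever it could read, it can be worth checking that every source file is present before combining them. The following is only a sketch; combine_lexc is a hypothetical helper name, and the file order is simply the order of its arguments.

```shell
#!/bin/sh
# combine_lexc is a hypothetical helper, not part of foma or Giellatekno.
# First argument: output file. Remaining arguments: source files, in order.
# Nothing is written unless every source file is readable.
combine_lexc() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing source file: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}

# Example call, using the file names from the command above:
# combine_lexc kal-lex-all.lexc kal-lex.txt abbr-kal-lex.txt acro-kal-lex.txt \
#     noun-kal-lex.txt verb-kal-lex.txt ateq-kal-lex.txt ateq-kal-morph.txt \
#     punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt
```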
Next, remove the comments from the xfst rewrite rule file:
$ cat xfst-kal.txt | sed 's/\s\!.*$/ /g' | grep -v '^!' | sed 's/$/ /g' | grep -v 'echo' > xfst-kal.tmp
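To see what each stage of this pipeline does, it can help to run it on a few lines of input. The sketch below wraps the same four commands in a function (strip_xfst_comments is a made-up name) and feeds it a small invented sample; the definitions in the sample are placeholders, not the real contents of xfst-kal.txt. It assumes GNU sed, because \s is a GNU extension.

```shell
#!/bin/sh
# strip_xfst_comments is a hypothetical wrapper around the pipeline above.
# It reads rule source on stdin and writes the cleaned text to stdout.
# Assumes GNU sed (\s is not supported by every sed).
strip_xfst_comments() {
    sed 's/\s\!.*$/ /g' |   # cut a trailing comment: whitespace, "!", rest of line
    grep -v '^!' |          # drop lines that start with "!"
    sed 's/$/ /g' |         # append a space to every line
    grep -v 'echo'          # drop lines that contain "echo"
}

# Invented sample input, for illustration only.
printf '%s\n' \
    '! a full-line comment' \
    'define Vow [a | e | i] ; ! a trailing comment' \
    'echo << consonants >>' \
    'define Cns [p | t | k] ;' |
    strip_xfst_comments
```

Only the two define lines should survive, each with trailing whitespace added.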
To compile the xfst code, run foma and load the rewrite rules:
foma[0]: source xfst-kal.tmp
Opening file 'xfst-kal.tmp'.
defined Vow: 348 bytes. 2 states, 6 arcs, 6 paths.
defined Cns: 741 bytes. 2 states, 19 arcs, 19 paths.
...
6.1 MB. 12474 states, 402541 arcs, Cyclic.
foma[1]:
Note the [1] in the prompt, which shows that there is now one network on the stack; if you don't get this, something has gone wrong.
Next, save the compiled transducer and quit:
foma[1]: save stack xfst-kal.bin
Writing to file xfst-kal.bin.
foma[1]: quit
Now we compile the lexc file,
External links
- http://foma.sourceforge.net/
- Giellatekno SVN — here you can find some example morphologies in foma format.