Difference between revisions of "Foma"

From Apertium

Revision as of 08:48, 30 September 2009

foma is a finite-state toolkit that implements Xerox lexc and xfst. It can be used for building morphologies of natural languages.

Installation

Note: foma requires libreadline to be installed. On Debian or Ubuntu, use apt-get install libreadline5-dev

  • Download the .tar.gz source from the website.
  • Untar
  • Run make

If make stops with the error "Makefile:12: *** missing separator. Stop.", edit the Makefile and add \ to the end of lines 11 to 13.

  • This creates a binary, foma, which should be copied into a directory on your PATH.

Example usage

First check out the Greenlandic (kal) morphology from Giellatekno SVN:

$ svn co https://victorio.uit.no/langtech/trunk/st/kal

Move to the src/ directory and combine all the lexc source files:

$ cat kal-lex.txt \
abbr-kal-lex.txt acro-kal-lex.txt \
noun-kal-lex.txt verb-kal-lex.txt \
ateq-kal-lex.txt ateq-kal-morph.txt \
punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt > kal-lex-all.lexc
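
If one of the source files is missing, cat prints an error but still writes a partial kal-lex-all.lexc. A minimal sketch of a guard against that is shown here; combine_lexc is a hypothetical helper, not part of foma or the Giellatekno sources, and it keeps the files in the order given because cat simply appends them one after another.

```shell
# combine_lexc OUT FILE...: concatenate lexc sources into OUT, in the order given.
# Hypothetical helper for illustration; not part of foma or Giellatekno.
combine_lexc() {
    out=$1
    shift
    # Refuse to build a partial lexicon if any source file is missing.
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing lexc source: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}

# Usage, run from src/ with the same files and order as the command above:
#   combine_lexc kal-lex-all.lexc kal-lex.txt \
#       abbr-kal-lex.txt acro-kal-lex.txt \
#       noun-kal-lex.txt verb-kal-lex.txt \
#       ateq-kal-lex.txt ateq-kal-morph.txt \
#       punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt
```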

Next, remove the comments from the xfst rewrite rule file:

$ cat xfst-kal.txt | sed 's/\s\!.*$/ /g' | grep -v '^!' | sed 's/$/ /g' | grep -v 'echo' > xfst-kal.tmp
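
To see what each stage of that pipeline does, it can help to run it on a few lines first. The sketch below wraps the same four commands in a hypothetical function, strip_xfst_comments, and feeds it a made-up sample (not taken from xfst-kal.txt). It assumes GNU sed, since \s is a GNU extension.

```shell
# strip_xfst_comments: read xfst source on stdin, write the cleaned text to stdout.
# Hypothetical wrapper around the pipeline above; \s assumes GNU sed.
strip_xfst_comments() {
    sed 's/\s\!.*$/ /g' |   # drop trailing comments: whitespace, "!", rest of line
    grep -v '^!' |          # drop lines that start with "!"
    sed 's/$/ /g' |         # append a space to every remaining line
    grep -v 'echo'          # drop any line containing "echo"
}

# Made-up sample input, only to show the effect of each stage.
sample=$(mktemp)
cat > "$sample" <<'EOF'
! A full-line comment
define Vow [a | i | u] ; ! a trailing comment
echo << defining consonants >>
define Cns [p | t | k] ;
EOF

strip_xfst_comments < "$sample"
rm -f "$sample"
```

Only the two define lines survive, each with trailing whitespace. Note that the last grep removes every line containing the string echo, not just echo commands, so it is worth checking that no rule in the real file happens to contain that string.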

To compile the xfst code, run foma and load the rewrite rules with source:

foma[0]: source xfst-kal.tmp 
Opening file 'xfst-kal.tmp'.
defined Vow: 348 bytes. 2 states, 6 arcs, 6 paths.
defined Cns: 741 bytes. 2 states, 19 arcs, 19 paths.
...
6.1 MB. 12474 states, 402541 arcs, Cyclic.
foma[1]: 

Note the [1] in the prompt: it is the number of networks on the stack. If the prompt still shows [0], something has gone wrong.

Next, save the compiled transducer and quit:

foma[1]: save stack xfst-kal.bin
Writing to file xfst-kal.bin.
foma[1]: quit
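
The same two commands can also be run without typing them at the prompt. The following is a sketch only: write_foma_script and the file name compile-xfst.foma are hypothetical, and it assumes that your foma build accepts -f to read commands from a file and then exit, which is worth checking against foma's own usage message.

```shell
# write_foma_script SCRIPT SRC BIN: write the two foma commands used above
# into SCRIPT. Hypothetical helper; the names are for illustration only.
write_foma_script() {
    printf 'source %s\nsave stack %s\n' "$2" "$3" > "$1"
}

write_foma_script compile-xfst.foma xfst-kal.tmp xfst-kal.bin

# Run the script only if foma is on the PATH and the cleaned source exists.
# Assumption: -f reads commands from the given file and then quits.
if command -v foma >/dev/null 2>&1 && [ -f xfst-kal.tmp ]; then
    foma -f compile-xfst.foma
fi
```

A script file keeps the build repeatable, but the interactive session is still the easier way to notice a prompt that stays at [0].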

Now we compile the lexc file,

External links