Difference between revisions of "Foma"

Latest revision as of 08:54, 23 September 2022

Installation

Note: foma requires libreadline to be installed; on Debian or Ubuntu, use apt-get install libreadline-dev

Note: foma requires zlib to be installed; on Debian, use apt-get install zlib1g-dev

wget http://dingo.sbs.arizona.edu/~mhulden/foma-0.9.15alpha.tar.gz
tar -xzvf foma-0.9.15alpha.tar.gz
cd foma
make
sudo make install

or, from svn:

svn checkout http://foma.googlecode.com/svn/trunk/foma/ foma
cd foma
#run if you do not run with sudo: sed -i.tmp "s%prefix = /usr/local%prefix = $PREFIX%" Makefile
make
sudo make install
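
Before running either set of commands, it can help to confirm that the tools they call are on the PATH. The following is a minimal sketch; the helper name missing_tools is made up for this example, and it only checks commands, not the libreadline and zlib development packages.

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools used by the tarball and svn instructions above.
missing_tools wget tar svn make
```

If nothing is printed, all four commands were found.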

Installation troubleshooting

If you get an error about -fPIC (happens on Arch Linux), do:

make clean
make CFLAGS=-fPIC
sudo make install

If you get an error like

/usr/bin/ld: cannot find -ltermcap
collect2: ld returned 1 exit status
make: *** [libfoma] Error 1

when running make, open the Makefile and change -ltermcap to -lncurses (this happens on Arch Linux and openSUSE).
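
The same change can be made from the command line. This is a sketch only: the function name use_ncurses is hypothetical, and it assumes the linker flag appears literally as -ltermcap in the Makefile.

```shell
#!/bin/sh
# Hypothetical helper: replace -ltermcap with -lncurses in the given
# Makefile, keeping the unmodified file as <name>.orig.
use_ncurses() {
    cp "$1" "$1.orig" &&
    sed 's/-ltermcap/-lncurses/g' "$1.orig" > "$1"
}

# Usage, from the foma source directory:
#   use_ncurses Makefile
```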

If you get the error "Makefile:12: *** missing separator. Stop.", edit the Makefile and add \ to the end of lines 11 to 13.

If you get an error like this (seen on Ubuntu 11.10):

/usr/bin/ld: int_stack.o: relocation R_X86_64_32S against `.bss' can not be used when making a shared object; recompile with -fPIC
int_stack.o: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [libfoma] Error 1

edit the Makefile and change a line that looks like this

CFLAGS = -O3 -Wall -D_GNU_SOURCE -std=c99 -fvisibility=hidden

to this

CFLAGS = -O3 -Wall -D_GNU_SOURCE -std=c99 -fvisibility=hidden -fPIC
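
If you prefer not to edit the Makefile by hand, the flag can be appended with sed. This is a sketch under the assumption that the Makefile has a single line starting with "CFLAGS ="; the function name add_fpic is made up for this example. It leaves the line alone if -fPIC is already present, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper: append -fPIC to the CFLAGS line of the given
# Makefile unless it is already there. The file as it was before this
# call is kept as <name>.orig.
add_fpic() {
    cp "$1" "$1.orig" &&
    sed '/^CFLAGS[[:space:]]*=/{
/-fPIC/!s/$/ -fPIC/
}' "$1.orig" > "$1"
}

# Usage, from the foma source directory:
#   add_fpic Makefile && make clean && make
```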

Example usage

First check out the Greenlandic (kal) morphology from Giellatekno SVN:

$ svn co https://victorio.uit.no/langtech/trunk/st/kal

Move to the src/ directory and combine all the lexc source files:

$ cat kal-lex.txt \
abbr-kal-lex.txt acro-kal-lex.txt \
noun-kal-lex.txt verb-kal-lex.txt \
ateq-kal-lex.txt ateq-kal-morph.txt \
punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt > kal-lex-all.lexc

Next, remove the comments from the xfst rewrite rule file:

$ cat xfst-kal.txt | sed 's/\s\!.*$/ /g' | grep -v '^!' | sed 's/$/ /g' | grep -v 'echo' > xfst-kal.tmp
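
To see what this pipeline does, it can be tried on a small sample first. The sketch below wraps the same four steps in a function (the name strip_xfst_comments is made up) and uses the POSIX class [[:space:]] in place of \s, which is a GNU sed extension. The sample lines are invented for illustration and are not taken from the real xfst-kal.txt.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline above:
#   1. blank out trailing "! ..." comments
#   2. drop lines that start with "!"
#   3. append a space to every line
#   4. drop lines containing "echo"
strip_xfst_comments() {
    sed 's/[[:space:]]!.*$/ /' "$1" | grep -v '^!' | sed 's/$/ /' | grep -v 'echo'
}

# Invented four-line sample input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
! full-line comment
define Vow [a | i | u] ; ! trailing comment
echo << defining consonants >>
define Cns [p | t | k] ;
EOF

# Only the two "define" lines should remain.
strip_xfst_comments "$sample"
rm -f "$sample"
```

Note that the last step removes any line containing the string "echo", not only echo commands, so it is worth checking the output for missing rules.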

To compile the xfst code, run foma and load the rewrite rules:

$ foma
foma[0]: source xfst-kal.tmp
Opening file 'xfst-kal.tmp'.
defined Vow: 348 bytes. 2 states, 6 arcs, 6 paths.
defined Cns: 741 bytes. 2 states, 19 arcs, 19 paths.
...
6.1 MB. 12474 states, 402541 arcs, Cyclic.
foma[1]:

Note the [1] in the prompt: it shows that one transducer is now on the stack. If you do not see it, something has gone wrong.

Next, save the compiled transducer and quit:

foma[1]: save stack xfst-kal.bin
Writing to file xfst-kal.bin.
foma[1]: quit

Now compile the lexc file, save the resulting transducer, and quit:

$ foma
foma[0]: read lexc kal-lex-all.lexc
Root...8, Z1Zmorf...59, Z1SZmorf...56, Z1PZmorf...59, Z1+ssZmorf...59, ...
Building lexicon...Determinizing...Minimizing...Done!
85.5 MB. 154826 states, 5599566 arcs, Cyclic.
foma[1]: save stack kal-lex.save
Writing to file kal-lex.save.
foma[1]: quit

The final step is to compose the two transducers (the lexicon and the rewrite rules):

$ foma
foma[0]: regex [[@"kal-lex.save"] .o. [[@"kal-lex.save"].l .o. [@"xfst-kal.bin"]] ] ;

This final step takes some time (up to 2 to 3 minutes) and a lot of processing power and RAM. The final result will be:

76.4 MB. 160041 states, 5002206 arcs, Cyclic.
foma[1]:

Then save the final transducer, and quit:

foma[1]: save stack kal.morph.bin
Writing to file kal.morph.bin.
foma[1]: quit
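
The three interactive sessions above can also be scripted. The sketch below pipes the same commands to foma on standard input, the approach the lt-view script further down this page uses. The function names and the FOMA variable are made up for this example; setting FOMA=cat gives a dry run that only prints the commands. The output foma prints when driven this way may differ from the interactive transcripts.

```shell
#!/bin/sh
# Hypothetical wrapper: FOMA can be overridden, e.g. FOMA=cat for a dry run.
run_foma() {
    "${FOMA:-foma}"
}

# Send the commands from the three sessions above to foma, one session each.
build_kal() {
    # 1. Compile the rewrite rules.
    printf '%s\n' 'source xfst-kal.tmp' 'save stack xfst-kal.bin' 'quit' | run_foma
    # 2. Compile the lexicon.
    printf '%s\n' 'read lexc kal-lex-all.lexc' 'save stack kal-lex.save' 'quit' | run_foma
    # 3. Compose the two and save the result.
    printf '%s\n' \
        'regex [[@"kal-lex.save"] .o. [[@"kal-lex.save"].l .o. [@"xfst-kal.bin"]] ] ;' \
        'save stack kal.morph.bin' 'quit' | run_foma
}

# Dry run: print the nine commands without starting foma.
( FOMA=cat; build_kal )

# Real run, from the src/ directory:
#   build_kal
```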

You can now use the transducer for analysis and generation, for example:

$ foma
foma[0]: load kal.morph.bin
76.4 MB. 160041 states, 5002206 arcs, Cyclic.
foma[1]: apply up nittartagaq
nittar+TAR+vv+TAQ+N+Abs+Sg
foma[1]: apply up kalaallisut
kalaaleq+N+Aeq+Pl
kalaaleq+N+Aeq+Sg
foma[1]: apply down kalaaleq+N+Aeq+Sg
kalaallitut
kalaallisut
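
To analyse several words in one go, the load and apply up commands can be generated and piped to foma. This is a sketch only: apply_up_all and the FOMA override are hypothetical names, and it assumes foma reads commands from standard input as in the lt-view script below.

```shell
#!/bin/sh
# Hypothetical helper: the first argument is the compiled transducer,
# the remaining arguments are the words to analyse.
# FOMA can be overridden, e.g. FOMA=cat to print the commands only.
apply_up_all() {
    transducer=$1
    shift
    {
        printf 'load %s\n' "$transducer"
        for word in "$@"; do
            printf 'apply up %s\n' "$word"
        done
    } | "${FOMA:-foma}"
}

# Dry run: print the commands that would be sent to foma.
( FOMA=cat; apply_up_all kal.morph.bin nittartagaq kalaallisut )

# Real run:
#   apply_up_all kal.morph.bin nittartagaq kalaallisut
```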

Visualising an Apertium transducer

$ lt-print no-en.autobil.bin > /tmp/no-en.txt

$ foma
foma[0]: read att /tmp/no-en.txt
foma[1]: view

Make sure you have installed a .dot renderer for converting the file to PNG. On Ubuntu, this is done with:

$ sudo apt-get install graphviz

You could also put this script in a file `lt-view` and then run `lt-view foo.automorf.bin >foo.png`:

#!/bin/sh

set -e -u

if ! command -V dot >/dev/null; then
    echo "Please install graphviz (e.g. apt install graphviz)" >&2
    exit 1
elif ! command -V foma >/dev/null; then
    echo "Please install foma (e.g. apt install foma)" >&2
    exit 1
elif [ $# -ne 1 ]; then
    echo "Expecting an lttoolbox binary as arg 1, no other args" >&2
    exit 1
elif [ -t 1 ]; then
    echo "This will write a png file – you should redirect, e.g. $0 $* > fst.png" >&2
    exit 1
fi

tmpd=$(mktemp -dt lt-view.XXXXXXXXXXX)
trap 'rm -rf "${tmpd}"' EXIT

lt-print "$1" > "${tmpd}"/att

printf 'read att %s\nprint dot >%s\n' "${tmpd}"/att "${tmpd}"/dot | foma >/dev/null

dot -Tpng "${tmpd}"/dot

External links

http://foma.sourceforge.net/
Giellatekno SVN: here you can find some example morphologies in foma format.

@@ Line 1: / Line 1: @@
+{{TOCD}}
 '''foma''' is a finite-state toolkit that implements Xerox lexc and xfst. It can be used for building morphologies of natural languages.
 == Installation ==
-Note: foma requires <code>libreadline</code> to be installed, on Debian or Ubuntu use <code>apt-get install libreadline5-dev</code>
+Note: foma requires <code>libreadline</code> to be installed, on Debian or Ubuntu use <code>apt-get install libreadline-dev</code>
+Note: foma requires <code>zlib1g-dev</code> to be installed, on Debian use <code>apt-get install zlib1g-dev</code>
-* Download the .tar.gz source from the website.
-* Untar
+<pre>wget http://dingo.sbs.arizona.edu/~mhulden/foma-0.9.15alpha.tar.gz
-* Run <code>make</code>
+tar -xzvf foma-0.9.15alpha.tar.gz
+cd foma
+make
+sudo make install</pre>
+or, from svn:
+<pre>svn checkout http://foma.googlecode.com/svn/trunk/foma/ foma
+cd foma
+#run if you do not run with sudo: sed -i.tmp "s%prefix = /usr/local%prefix = $PREFIX%" Makefile
+make
+sudo make install</pre>
+===Installation troubleshooting===
+If you get an error about -fPIC (happens on Arch Linux), do:
+<pre>make clean
+make CFLAGS=-fPIC
+sudo make install</pre>
+If you get an error like <pre>/usr/bin/ld: cannot find -ltermcap
+collect2: ld returned 1 exit status
+make: *** [libfoma] Error 1</pre> when running make, open the Makefile and change the <code>-ltermcap</code> to <code>-lncurses</code> (happens on Arch Linux and OpenSUSE).
 If you get an error <code>Makefile:12: *** missing separator.  Stop.</code>, edit the Makefile and add <code>\</code> to the end of the lines 11--13.
+If you get an error like this (I got it running ubuntu 11.10):
-* This will create a binary <code>foma</code>, which should be copied into your <code>PATH</code>.
+<pre>
+/usr/bin/ld: int_stack.o: relocation R_X86_64_32S against `.bss' can not be used when making a shared object; recompile with -fPIC
+int_stack.o: could not read symbols: Bad value
+collect2: ld returned 1 exit status
+make: *** [libfoma] Error 1
+</pre>
+edit the Makefile and change a line that looks like this
+<pre>CFLAGS = -O3 -Wall -D_GNU_SOURCE -std=c99 -fvisibility=hidden</pre>
+to this
+<pre>CFLAGS = -O3 -Wall -D_GNU_SOURCE -std=c99 -fvisibility=hidden -fPIC</pre>
+== Example usage ==
+First check out the Greenlandic (<code>kal</code>) morphology from Giellatekno SVN:
+<pre>
+$ svn co https://victorio.uit.no/langtech/trunk/st/kal
+</pre>
+Move to the <code>src/</code> directory and combine all the <code>lexc</code> source files:
+<pre>
+$ cat kal-lex.txt \
+abbr-kal-lex.txt acro-kal-lex.txt \
+noun-kal-lex.txt verb-kal-lex.txt \
+ateq-kal-lex.txt ateq-kal-morph.txt \
+punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt > kal-lex-all.lexc
+</pre>
+Next, remove the comments from the <code>xfst</code> rewrite rule file:
+<pre>
+$ cat xfst-kal.txt | sed 's/\s\!.*$/ /g' | grep -v '^!' | sed 's/$/ /g' | grep -v 'echo' > xfst-kal.tmp
+</pre>
+Compile the <code>xfst</code> code as follows, run foma and load the rewrite rules:
+<pre>
+foma[0]: source xfst-kal.tmp
+Opening file 'xfst-kal.tmp'.
+defined Vow: 348 bytes. 2 states, 6 arcs, 6 paths.
+defined Cns: 741 bytes. 2 states, 19 arcs, 19 paths.
+...
+6.1 MB. 12474 states, 402541 arcs, Cyclic.
+foma[1]:
+</pre>
+Note the <code>[1]</code>, if you don't get this something has gone wrong.
+Next, save the compiled transducer and quit:
+<pre>
+foma[1]: save stack xfst-kal.bin
+Writing to file xfst-kal.bin.
+foma[1]: quit
+</pre>
+Now we compile the lexc file and save the resulting transducer and quit:
+<pre>
+$ foma
+foma[0]: read lexc kal-lex-all.lexc
+Root...8, Z1Zmorf...59, Z1SZmorf...56, Z1PZmorf...59, Z1+ssZmorf...59, ...
+Building lexicon...Determinizing...Minimizing...Done!
+85.5 MB. 154826 states, 5599566 arcs, Cyclic.
+foma[1]: save stack kal-lex.save
+Writing to file kal-lex.save.
+foma[1]: quit
+</pre>
+The final step is to compose the two transducers (the lexicon and the rewrite rules),
+<pre>
+$ foma
+foma[0]: regex [[@"kal-lex.save"] .o. [[@"kal-lex.save"].l .o. [@"xfst-kal.bin"]] ] ;
+</pre>
+This final step takes some time, up to 2&mdash;3 minutes. It also takes a lot of processing power and RAM. The final result will be:
+<pre>
+76.4 MB. 160041 states, 5002206 arcs, Cyclic.
+foma[1]:
+</pre>
+Then save the final transducer, and quit:
+<pre>
+foma[1]: save stack kal.morph.bin
+Writing to file kal.morph.bin.
+foma[1]: quit
+</pre>
+You can now use the transducer for analysis and generation, for example,
+<pre>
+$ foma
+foma[0]: load kal.morph.bin
+76.4 MB. 160041 states, 5002206 arcs, Cyclic.
+foma[1]: apply up nittartagaq
+nittar+TAR+vv+TAQ+N+Abs+Sg
+foma[1]: apply up kalaallisut
+kalaaleq+N+Aeq+Pl
+kalaaleq+N+Aeq+Sg
+foma[1]: apply down kalaaleq+N+Aeq+Sg
+kalaallitut
+kalaallisut
+</pre>
+=== Visualising an Apertium transducer ===
+<pre>
+$ lt-print no-en.autobil.bin > /tmp/no-en.txt
+$ foma
+foma[0]: read att /tmp/no-en.txt
+foma[1]: view
+</pre>
+Make sure you have installed a .dot renderer for converting the file to PNG. On Ubuntu, this is done with:
+<pre>
+$ sudo apt-get install graphviz
+</pre>
+You could also put this script in a file `lt-view` and then `lt-view foo.automorf.bin >foo.png`:
+<pre>
+#!/bin/sh
+set -e -u
+if ! command -V dot >/dev/null; then
+    echo "Please install graphviz (e.g. apt install graphviz)" >&2
+    exit 1
+elif ! command -V foma >/dev/null; then
+    echo "Please install foma (e.g. apt install foma)" >&2
+    exit 1
+elif [ $# -ne 1 ]; then
+    echo "Expecting an lttoolbox binary as arg 1, no other args" >&2
+    exit 1
+elif [ -t 1 ]; then
+    echo "This will write a png file – you should redirect, e.g. $0 $* > fst.png" >&2
+    exit 1
+fi
+tmpd=$(mktemp -dt lt-view.XXXXXXXXXXX)
+trap 'rm -rf "${tmpd}"' EXIT
+lt-print "$1" > "${tmpd}"/att
+printf 'read att %s\nprint dot >%s\n' "${tmpd}"/att "${tmpd}"/dot | foma >/dev/null
+dot -Tpng "${tmpd}"/dot
+</pre>
 == External links ==
@@ Line 19: / Line 194: @@
-[[Category:Tools]]
+[[Category:Morphological analysers]]