Difference between revisions of "Foma"
Revision as of 08:48, 30 September 2009
foma is a finite-state toolkit that implements Xerox lexc and xfst. It can be used for building morphologies of natural languages.
Installation
Note: foma requires libreadline to be installed. On Debian or Ubuntu, use apt-get install libreadline5-dev
- Download the .tar.gz source from the website.
- Untar it.
- Run make. If you get the error "Makefile:12: *** missing separator. Stop.", edit the Makefile and add \ to the end of lines 11--13.
- This will create a binary, foma, which should be copied into a directory in your PATH.
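As a quick check of the last step, the following sketch reports whether a given tool can be found through PATH. The helper name check_tool is made up for this illustration; command -v is the standard POSIX shell lookup.

```shell
# check_tool is a hypothetical helper: it reports whether the named
# program can be found through PATH and returns non-zero if it cannot.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found in PATH" >&2
        return 1
    fi
}

# After copying the binary, this should print the location of foma.
check_tool foma || echo "copy the foma binary into a directory listed in PATH"
```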
Example usage
First check out the Greenlandic (kal) morphology from Giellatekno SVN:
$ svn co https://victorio.uit.no/langtech/trunk/st/kal
Move to the src/ directory and combine all the lexc source files:
$ cat kal-lex.txt \
    abbr-kal-lex.txt acro-kal-lex.txt \
    noun-kal-lex.txt verb-kal-lex.txt \
    ateq-kal-lex.txt ateq-kal-morph.txt \
    punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt > kal-lex-all.lexc
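If one of the source files is missing, cat prints an error but still writes a partial kal-lex-all.lexc. The sketch below is one way to guard against that: it checks that every input exists before concatenating. The function name combine_lexc is made up for this illustration; the file names are the ones listed above.

```shell
# combine_lexc is a hypothetical helper: the first argument is the output
# file, the remaining arguments are the lexc sources, joined in that order.
combine_lexc() {
    out=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing source file: $f" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}

# Only attempt the real combination when run from the src/ directory.
if [ -f kal-lex.txt ]; then
    combine_lexc kal-lex-all.lexc \
        kal-lex.txt \
        abbr-kal-lex.txt acro-kal-lex.txt \
        noun-kal-lex.txt verb-kal-lex.txt \
        ateq-kal-lex.txt ateq-kal-morph.txt \
        punct-kal-lex.txt prt-kal-lex.txt num-kal-lex.txt
fi
```

The root lexicon kal-lex.txt stays first, as in the original command.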
Next, remove the comments from the xfst rewrite rule file:
$ cat xfst-kal.txt | sed 's/\s\!.*$/ /g' | grep -v '^!' | sed 's/$/ /g' | grep -v 'echo' > xfst-kal.tmp
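To see what each stage of this pipeline does, here is a sketch that wraps it in a function and runs it on a few made-up lines. The function name strip_xfst_comments and the sample input are assumptions for illustration, and [[:space:]] is used as a portable stand-in for the GNU-specific \s.

```shell
# strip_xfst_comments is a hypothetical wrapper around the pipeline above.
# It reads an xfst script on stdin and writes the cleaned script to stdout.
strip_xfst_comments() {
    sed 's/[[:space:]]!.*$/ /' |   # cut trailing comments: whitespace, "!", rest of line
    grep -v '^!' |                 # drop lines that are entirely a comment
    sed 's/$/ /' |                 # append a space to every remaining line
    grep -v 'echo'                 # drop any line that contains "echo"
}

# Made-up sample input: a comment line, a rule with a trailing comment,
# an echo command, and a plain rule.
strip_xfst_comments <<'EOF'
! header comment
define Vow [a | i | u] ; ! vowels
echo << done >>
define Cns [p | t | k] ;
EOF
```

Only the two define lines survive. The filters are purely textual, so a rule that contains the word echo, or a literal ! after whitespace, would be affected as well.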
To compile the xfst code, start foma and load the rewrite rules with source:
foma[0]: source xfst-kal.tmp
Opening file 'xfst-kal.tmp'.
defined Vow: 348 bytes. 2 states, 6 arcs, 6 paths.
defined Cns: 741 bytes. 2 states, 19 arcs, 19 paths.
...
6.1 MB. 12474 states, 402541 arcs, Cyclic.
foma[1]:
Note the [1] in the prompt, which shows that one network is now on the stack. If you do not get this, something has gone wrong.
Next, save the compiled transducer and quit:
foma[1]: save stack xfst-kal.bin
Writing to file xfst-kal.bin.
foma[1]: quit
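The same two commands can probably be run without the interactive prompt. The sketch below writes them to a script file and passes it to foma. It assumes that your foma build accepts -f scriptfile (read commands from the file, then quit), so check the usage message of your version first. The script name compile-xfst.foma and the helper write_foma_script are made up for this illustration.

```shell
# write_foma_script is a hypothetical helper. Arguments: the xfst source
# file, the binary file to save, and the foma script file to write.
write_foma_script() {
    printf 'source %s\nsave stack %s\n' "$1" "$2" > "$3"
}

write_foma_script xfst-kal.tmp xfst-kal.bin compile-xfst.foma

# Assumption: this foma build supports -f (run the script, then exit).
# The call is skipped when foma or the rule file is not available.
if command -v foma >/dev/null 2>&1 && [ -f xfst-kal.tmp ]; then
    foma -f compile-xfst.foma
fi
```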
Now we compile the lexc file,
External links
- http://foma.sourceforge.net/
- Giellatekno SVN — here you can find some example morphologies in foma format.