Difference between revisions of "Hunmorph"

From Apertium

Revision as of 18:19, 31 March 2008

hunmorph is a set of programs for making morphological analysers and generators. Analysers and generators made with these tools could be integrated into an Apertium-based machine translation system, although they would need to be modified to change their input and output formats.

Requirements

On Debian you will need:

  • ocaml
  • ocaml-libs
  • ocaml-tools
  • ocaml-compiler-libs
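
A quick way to see which of these packages are still missing is sketched below. It assumes a Debian system with dpkg available; the function name missing_packages is made up for this example, and the package names are simply the ones listed above.

```shell
# Hypothetical helper: print every package name that dpkg does not know about.
missing_packages() {
  for pkg in "$@"; do
    # dpkg -s exits non-zero when it has no record of the package
    dpkg -s "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

missing_packages ocaml ocaml-libs ocaml-tools ocaml-compiler-libs
```

Anything it prints can then be installed with apt-get install.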

Compiling

cvs -d :pserver:anonymous:anonymous@cvs.mokk.bme.hu:/local/cvs co ocamorph
cd ocamorph
./build.sh build
cd src/lib
make
cd ../bindings/c
make
cd ../../wrappers/ocamorph
make

If you get the error "/usr/bin/ld: cannot find -lunix", check the -I include paths in the Makefile; they probably do not point to the right place. On Debian I had to change /usr/lib/ocaml/3.09.1 to /usr/lib/ocaml/3.10.1. Once everything has compiled you should have an ocamorph binary. Now go back to the root of your CVS tree.
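
The path change can be scripted rather than edited by hand. This is only a sketch: fix_ocaml_path is a hypothetical helper, and which Makefile needs the change depends on where the link error appears.

```shell
# Hypothetical helper: replace an outdated OCaml library path in a Makefile.
# A backup of the original is kept next to it with a .bak suffix.
fix_ocaml_path() {
  old=$1
  new=$2
  makefile=$3
  # '|' is the sed delimiter so the slashes in the paths need no escaping.
  # The old path is treated as a pattern, so each '.' matches any character,
  # which is harmless for version-numbered paths like these.
  sed -i.bak "s|$old|$new|g" "$makefile"
}

# Example call, using the paths mentioned above:
#   fix_ocaml_path /usr/lib/ocaml/3.09.1 /usr/lib/ocaml/3.10.1 Makefile
```

Running ocamlc -where prints the standard library directory of the installed compiler, which is the value to use as the new path.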

You can test ocamorph with the binary distribution available here. The CVS distribution does not seem to build at the moment. If you untar the file in ~/source/ you should see:

$ ls ~/source/morphdb.hu/
AUTHORS  CVS  doc  LICENCE  morphdb_hu.aff  morphdb_hu.dic  README

You can then test it with:

$ echo "programot" | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic
> programot
program/NOUN<CAS<ACC>>
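
To give an idea of the format change needed for Apertium, here is a rough sketch of a filter that rewrites output shaped like the example above into the Apertium stream format (^surface/lemma<tag>$). The function name to_apertium and the tag mapping (lower-casing and flattening the nested tags) are assumptions made for illustration, not an agreed tagset. Lines without a "/" are skipped because their meaning is not documented here.

```shell
# Hypothetical filter: ocamorph-style output on stdin, Apertium stream format on stdout.
to_apertium() {
  awk '
    # Print the lexical unit collected so far, if any.
    function emit() { if (surface != "") print "^" surface analyses "$" }

    # A line starting with "> " introduces a new surface form.
    /^> / { emit(); surface = substr($0, 3); analyses = ""; next }

    # Any other non-empty line is taken to be "lemma/TAGS".
    NF {
      slash = index($0, "/")
      if (slash == 0) next
      lemma = substr($0, 1, slash - 1)
      tags = tolower(substr($0, slash + 1))
      gsub(/[<>]+/, " ", tags)          # NOUN<CAS<ACC>> becomes "noun cas acc "
      n = split(tags, t, " ")
      out = ""
      for (i = 1; i <= n; i++) out = out "<" t[i] ">"
      analyses = analyses "/" lemma out
    }

    END { emit() }
  '
}

# Example: feed it the two output lines shown above.
printf '> programot\nprogram/NOUN<CAS<ACC>>\n' | to_apertium
```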

Performance

Timings for a 10,000-line test file, using an analyser with support for 4,000,000 word forms:

$ time cat /tmp/test | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic > /dev/null
real    0m47.224s
user    0m41.859s
sys     0m0.620s

Compile the lexicon using:

$ echo "programot" | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic --bin hu.morph.bin

ocamorph seems to need some input to analyse before it will compile the lexicon. Then re-test:

$ time cat /tmp/test | ocamorph  --bin hu.morph.bin > /dev/null
real    0m15.023s
user    0m14.625s
sys     0m0.344s
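
Comparing the two "real" times shows roughly a threefold speed-up from the compiled lexicon. The arithmetic can be checked with a small helper (speedup is just a name chosen for this example):

```shell
# Ratio of two wall-clock times in seconds, to two decimal places.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

# real times from the runs above: text lexicon vs. compiled lexicon
speedup 47.224 15.023
```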

The final size of the compiled binary is 22 MB.
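
Since compiling only has to happen once, the step can be wrapped so it runs only when the binary file is missing. This sketch assumes, as the commands above suggest, that ocamorph writes the file given to --bin when called with --aff and --dic; ensure_bin is a hypothetical helper name.

```shell
# Hypothetical helper: build the binary lexicon only if it is not there yet.
ensure_bin() {
  aff=$1
  dic=$2
  bin=$3
  if [ ! -f "$bin" ]; then
    # ocamorph appears to need something to analyse before it compiles,
    # so a single word is fed in and the analysis is discarded.
    echo "programot" | ocamorph --aff "$aff" --dic "$dic" --bin "$bin" > /dev/null
  fi
}

# Example call:
#   ensure_bin ~/source/morphdb.hu/morphdb_hu.aff ~/source/morphdb.hu/morphdb_hu.dic hu.morph.bin
```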

Further reading

  • Trón, V., Németh, L., Halácsy, P., Kornai, A., Gyepesi, G., and Varga, D. (2005) "Hunmorph: open source word analysis". Proceedings of the ACL 2005 Workshop on Software. pp. 77–85

External links