Difference between revisions of "Hunmorph"
Line 1: | Line 1: | ||
{{TOCD}} |
{{TOCD}} |
||
'''hunmorph''' is an set of programs for making morphological analysers and generators. Analysers and generators made with these tools could be integrated into an Apertium-based machine translation system, although they would need to be hacked to change the output / input format. |
'''hunmorph''' is an set of programs for making morphological analysers and generators written largely in Ocaml. Analysers and generators made with these tools could be integrated into an Apertium-based machine translation system, although they would need to be hacked to change the output / input format. Currently it only seems to support Hungarian. |
||
==Requirements== |
==Requirements== |
||
Line 75: | Line 75: | ||
==External links== |
==External links== |
||
* http://mokk.bme.hu/resources/hunmorph |
* [http://mokk.bme.hu/resources/hunmorph Hunmorph - open source morphological analyzer] |
||
[[Category:Tools]] |
[[Category:Tools]] |
Revision as of 19:56, 31 March 2008
hunmorph is an set of programs for making morphological analysers and generators written largely in Ocaml. Analysers and generators made with these tools could be integrated into an Apertium-based machine translation system, although they would need to be hacked to change the output / input format. Currently it only seems to support Hungarian.
Requirements
On Debian you will need:
- ocaml
- ocaml-libs
- ocaml-tools
- ocaml-compiler-libs
Compiling
cvs -d :pserver:anonymous:anonymous@cvs.mokk.bme.hu:/local/cvs co ocamorph cd ocamorph ./build.sh build cd src/lib make cd ../bindings/c make cd ../../wrappers/ocamorph make
If you get the error, /usr/bin/ld: cannot find -lunix
, then check the Makefile and the include -I
paths, probably they don't point to the right place. On Debian I had to change the /usr/lib/ocaml/3.09.1
for /usr/lib/ocaml/3.10.1
. After you've compiled this you should have an ocamorph binary. Now go back to the root of your CVS tree.
You can test ocamorph with the binary distribution available here. The CVS distribution does not seem to build at the moment. If you untar the file in ~/source/
you should see:
$ ls ~/source/morphdb.hu/ AUTHORS CVS doc LICENCE morphdb_hu.aff morphdb_hu.dic README
You can then test it with:
$ echo "programot" | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic > programot program/NOUN<CAS<ACC>>
Performance
For a 10,000 line test file, with a analyser with support for 4,000,000 word forms.
$ time cat /tmp/test | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic > /dev/null real 0m47.224s user 0m41.859s sys 0m0.620s
Compile the lexicon using:
$ echo "programot" | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic --bin hu.morph.bin
You seem to be required to attempt to analyse something in order to compile. Then re-test:
$ time cat /tmp/test | ocamorph --bin hu.morph.bin > /dev/null real 0m15.023s user 0m14.625s sys 0m0.344s
Final size of the compiled binary is 22Mb.
Further reading
- Trón, V., Németh, L., Halácsy, P., Kornai, A., Gyepesi, G., and Varga, D. (2005) "Hunmorph: open source word analysis". Proceedings of the ACL 2005 Workshop on Software. pp. 77--85