Difference between revisions of "Hunmorph"
Revision as of 17:56, 31 March 2008
hunmorph is a set of programs for making morphological analysers and generators.
Requirements
You will need:
- ocaml
- ocaml-libs
Compiling
cd ocamorph
./build.sh build
cd src/lib
make
cd ../bindings/c
make
cd ../../wrappers/ocamorph
make
If you get the error "/usr/bin/ld: cannot find -lunix", check the include (-I) paths in the Makefile; they probably don't point to the right place. On Debian I had to change /usr/lib/ocaml/3.09.1 to /usr/lib/ocaml/3.10.1.
After you've compiled this you should have an ocamorph binary. Now go back to the root of your CVS tree.
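To find out which directory the -I paths should point to, it may help to ask the compiler where its standard library lives. The following is a minimal sketch, assuming that ocamlc is on the PATH and that the unix library is installed as libunix.a directly in that directory (true for the OCaml 3.x layouts mentioned above, but not guaranteed elsewhere). The helper name has_libunix is made up for illustration.

```shell
# Hypothetical helper: succeeds if the given directory contains the
# library file that the linker flag -lunix refers to.
has_libunix() {
    [ -f "$1/libunix.a" ]
}

if command -v ocamlc >/dev/null 2>&1; then
    # "ocamlc -where" prints the standard library directory.
    libdir="$(ocamlc -where)"
    if has_libunix "$libdir"; then
        echo "libunix.a found in $libdir"
    else
        echo "libunix.a not found in $libdir"
    fi
else
    echo "ocamlc not found on PATH"
fi
```

If the directory reported differs from the one in the Makefile, that is a likely cause of the linker error.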
You can test ocamorph with the binary distribution available here. The CVS distribution does not seem to build at the moment. If you untar the file in ~/source/ you should see:
$ ls ~/source/morphdb.hu/
AUTHORS CVS doc LICENCE morphdb_hu.aff morphdb_hu.dic README
You can then test it with:
$ echo "programot" | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic
> programot
program/NOUN<CAS<ACC>>
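For further processing it can be convenient to put each word and its analysis on one line. The following is a rough sketch, assuming the output always has the shape shown in the example above: a line starting with "> " that echoes the input word, followed by one analysis per line. The function name pair_analyses is made up for illustration, and the sample input is simply the output from above, so the sketch runs without ocamorph installed.

```shell
# Hypothetical helper: turn ocamorph-style output into "word<TAB>analysis" pairs.
pair_analyses() {
    awk '
        /^> / { word = substr($0, 3); next }   # "> word" starts a new entry
        NF    { print word "\t" $0 }           # any other non-empty line is an analysis
    '
}

# Sample input copied from the example above.
printf '> programot\nprogram/NOUN<CAS<ACC>>\n' | pair_analyses
```

In real use the printf line would be replaced by the ocamorph pipeline, with pair_analyses appended after a final pipe.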
Performance
For a 10,000-line test file, with an analyser that supports 4,000,000 word forms:
$ time cat /tmp/test | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic > /dev/null
real 0m47.224s
user 0m41.859s
sys 0m0.620s
Compile the lexicon using:
$ echo "programot" | ocamorph --aff ~/source/morphdb.hu/morphdb_hu.aff --dic ~/source/morphdb.hu/morphdb_hu.dic --bin hu.morph.bin
ocamorph seems to need some input to analyse before it will compile the lexicon, hence the echo. Then re-test:
$ time cat /tmp/test | ocamorph --bin hu.morph.bin > /dev/null
real 0m15.023s
user 0m14.625s
sys 0m0.344s
The final size of the compiled lexicon (hu.morph.bin) is 22 MB.
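To put the two timings side by side, the speedup and throughput can be worked out from the "real" figures reported above. This is only an arithmetic sketch; the function names speedup and throughput are made up for illustration, and the numbers are those of the single run shown on this page.

```shell
# Ratio of two wall-clock times in seconds, to two decimal places.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

# Lines analysed per second, rounded to the nearest integer.
throughput() {
    awk -v lines="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", lines / secs }'
}

echo "speedup: $(speedup 47.224 15.023)x"
echo "text lexicon: $(throughput 10000 47.224) lines/s"
echo "compiled lexicon: $(throughput 10000 15.023) lines/s"
```

On these figures the compiled lexicon is roughly 3.1 times faster, going from about 212 to about 666 lines per second.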