Bytecode for transfer

A concrete example: Esperanto-English

Compare the transfer rules in apertium-eo-en.eo-en.t1x with the Java version generated from them, [http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/src/org/apertium/transfer/generated/apertium_eo_en_eo_en_t1x.java?view=markup apertium_eo_en_eo_en_t1x.java]. The generated Java source is compiled into bytecode, which the Java virtual machine then executes with its JIT (just-in-time) compiler.

Parsing /home/j/esperanto/apertium-svn/apertium/trunk/lttoolbox-java/testdata/transfer/apertium-eo-en.eo-en.t1x
// WARNING: Attribute a_np_acr is not defined. Valid attributes are: [a_nom, a_prp, a_adv, a_adj, a_vrb, a_vrb2, a_det, a_ord, a_prn, a_tns, a_nepersonaj_tempoj, a_gen, a_prs, a_nbr, a_cas, lem, lemq, lemh, whole, tags, chname, chcontent, content]
// Replacing with error_UNKNOWN_ATTR - for <transfer default="chunk">/<section-def-macros>/<def-macro n="firstWord" npar="1">/<choose>/<when>/<test>/<equal>/<clip part="a_np_acr" pos="1" side="sl">
Compiling: javac -cp dist/lttoolbox.jar transfertest/res/lttoolbox-java/testdata/transfer/apertium_eo_en_eo_en_t1x.java
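
The javac step shown in the log above can also be driven from within Java. The following is a minimal sketch, assuming a full JDK (not just a JRE) is installed. It uses the standard javax.tools API; the class name CompileTransfer and its helper methods are hypothetical and are not part of lttoolbox-java.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: runs javac programmatically
// on a generated transfer class.
class CompileTransfer {

    /** Builds the javac argument list: classpath option first, then the source file. */
    static List<String> javacArgs(String classpath, String sourceFile) {
        List<String> args = new ArrayList<>();
        args.add("-cp");
        args.add(classpath);
        args.add(sourceFile);
        return args;
    }

    /**
     * Compiles the given source file.
     * Returns javac's exit status (0 on success), or -1 if no system
     * compiler is available, which is the case when running on a JRE only.
     */
    static int compile(String classpath, String sourceFile) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            return -1;
        }
        List<String> args = javacArgs(classpath, sourceFile);
        // null streams mean: use System.in, System.out and System.err.
        return compiler.run(null, null, null, args.toArray(new String[0]));
    }

    public static void main(String[] argv) {
        // Paths taken from the log output above.
        int status = compile(
            "dist/lttoolbox.jar",
            "transfertest/res/lttoolbox-java/testdata/transfer/apertium_eo_en_eo_en_t1x.java");
        System.out.println("javac exit status: " + status);
    }
}
```

A non-zero status means the compilation failed or no compiler was found, so a caller could fall back to interpreted transfer in that case.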

Here is a speed comparison:

Interpreted transfer took 91.59 secs
bytecode compiled transfer took 15.88 secs
Speedup factor: 5.76
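
The speedup factor is the interpreted time divided by the compiled time. A minimal sketch of that arithmetic follows; the class SpeedupReport and its methods are hypothetical names for illustration, not the benchmark code that produced the numbers above. From the rounded timings it yields about 5.77, so the 5.76 reported above was presumably computed from unrounded timings.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
class SpeedupReport {

    /** Speedup = time of the slow (interpreted) run / time of the fast (compiled) run. */
    static double speedup(double interpretedSecs, double compiledSecs) {
        if (compiledSecs <= 0) {
            throw new IllegalArgumentException("compiled time must be positive");
        }
        return interpretedSecs / compiledSecs;
    }

    /** Measures the wall-clock time of a task in seconds. */
    static double timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // Timings copied from the comparison above.
        double factor = speedup(91.59, 15.88);
        System.out.println(String.format(Locale.ROOT, "Speedup factor: %.2f", factor));
    }
}
```

When timing JIT-compiled code this way, it is usually worth running the task a few times first, since the first runs include class loading and JIT warm-up.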

Further work

The Java code has not been optimized for speed, so the real potential speedup is perhaps a factor of 6-8, or even higher with a mixed mode (mixing C and Java code instead of using pure Java).
Memory usage is also higher than really needed. Among other things:
The underlying library, lttoolbox-java, uses 50% of the CPU, and it has some well-known performance issues that are fixable.
The bytecode should be run through an optimizer such as Soot.
There are many open-source Java bytecode interpreters to choose from, the most prominent being Sun's own and Kaffe (http://kaffe.org). Only Sun's has been tested. At least GCJ should be tried out.
A step for post-compiling to native code should be tried out.
With XMLVM (http://xmlvm.org/) there could be a way to target iPhones as well.
Since we have a full port of lttoolbox, Apertium could be made to run purely on Java, enabling a wide range of platforms, among them Windows, phones (J2ME or Android), web pages and server systems. Only the tagger is missing for a full system.
