Revision as of 23:59, 27 February 2010

A concrete example: Esperanto-English

So the transfer rule file http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/testdata/transfer/apertium-eo-en.eo-en.t1x?view=markup becomes the Java source file http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/src/org/apertium/transfer/generated/apertium_eo_en_eo_en_t1x.java?view=markup

which is compiled into Java bytecode and executed with the Java JIT (Just-in-time) compiler.

Parsing /home/j/esperanto/apertium-svn/apertium/trunk/lttoolbox-java/testdata/transfer/apertium-eo-en.eo-en.t1x
// WARNING: Attribute a_np_acr is not defined. Valid attributes are: [a_nom, a_prp, a_adv, a_adj, a_vrb, a_vrb2, a_det, a_ord, a_prn, a_tns, a_nepersonaj_tempoj, a_gen, a_prs, a_nbr, a_cas, lem, lemq, lemh, whole, tags, chname, chcontent, content]
// Replacing with error_UNKNOWN_ATTR - for <transfer default="chunk">/<section-def-macros>/<def-macro n="firstWord" npar="1">/<choose>/<when>/<test>/<equal>/<clip part="a_np_acr" pos="1" side="sl">
Compiling: javac -cp dist/lttoolbox.jar transfertest/res/lttoolbox-java/testdata/transfer/apertium_eo_en_eo_en_t1x.java
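
The compile-and-execute step can also be driven from inside a Java program. The following is a minimal sketch, not the lttoolbox-java implementation: it uses the standard javax.tools compiler API to compile a source string, load the resulting class and call it. The class name HypotheticalRule and its apply method are invented for illustration and do not reflect the interface of the real generated class.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileAndLoad {
    /**
     * Writes the source to a temporary directory, compiles it to bytecode,
     * loads the class and calls its public static String apply(String).
     * The apply(String) signature is an assumption made for this sketch.
     */
    static String compileAndRun(String className, String source, String input) throws Exception {
        Path dir = Files.createTempDirectory("generated");
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));

        // The system compiler is only available when running on a JDK.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("a JDK (not just a JRE) is required");
        }
        // Same effect as running: javac -d <dir> <file>
        int status = javac.run(null, null, null, "-d", dir.toString(), file.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // Load the freshly compiled class from the temporary directory.
        try (URLClassLoader loader = URLClassLoader.newInstance(new URL[] { dir.toUri().toURL() })) {
            Class<?> cls = loader.loadClass(className);
            Method apply = cls.getMethod("apply", String.class);
            return (String) apply.invoke(null, input);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a generated transfer class.
        String src = "public class HypotheticalRule {"
                + " public static String apply(String lemma) { return lemma.toUpperCase(); } }";
        System.out.println(compileAndRun("HypotheticalRule", src, "hundo")); // prints HUNDO
    }
}
```

Once the class is loaded, calls into it are ordinary Java method calls, so the JIT compiler can optimize them like any other code.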

Here is a speed comparison:

Interpreted transfer took 91.59 secs
bytecode compiled transfer took 15.88 secs
Speedup factor: 5.76

Further work

The Java code has not been optimized for speed, so the real potential speedup is perhaps a factor of 6 to 8, or even higher if using a mixed mode (mixing C and Java code instead of doing pure Java).
Memory usage is also higher than necessary, among other things.
The underlying library, lttoolbox-java, accounts for 50% of the CPU time, and it has some well-known performance issues that are fixable.
The bytecode should be run through an optimizer such as Soot (http://www.sable.mcgill.ca/soot/tutorial/optimizer/index.html).
Considering that we have a full port of lttoolbox, Apertium could be made to run purely on Java, enabling a wide range of platforms, including Windows, phones (J2ME or Android), web pages and server systems. Only the tagger is missing for a full system.

Old contents

Adapt transfer to use bytecode instead of tree walking. This task would be to write a compiler and interpreter for Apertium transfer rules, targeting the format of an off-the-shelf bytecode engine (e.g. Java, v8, kjs, ...).
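
To illustrate the difference, here is a minimal sketch, not taken from the Apertium code. The Node class, the operator names and the tag values "n", "nom" and "acc" are all hypothetical; only the attribute names a_nom and a_cas are borrowed from the Esperanto-English rule file. The interpret method walks a rule tree on every call, while compiled shows the kind of plain Java a rule compiler could emit for the same test.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TreeWalkVsCompiled {
    /** Hypothetical rule-tree node: an operator, two optional arguments and child nodes. */
    static final class Node {
        final String op;
        final String attr;
        final String value;
        final List<Node> children;

        Node(String op, String attr, String value, List<Node> children) {
            this.op = op;
            this.attr = attr;
            this.value = value;
            this.children = children;
        }
    }

    /** Tree walking: the node type is inspected again on every evaluation. */
    static boolean interpret(Node n, Map<String, String> word) {
        switch (n.op) {
            case "equal":
                return n.value.equals(word.get(n.attr));
            case "and":
                for (Node child : n.children) {
                    if (!interpret(child, word)) return false;
                }
                return true;
            case "not":
                return !interpret(n.children.get(0), word);
            default:
                throw new IllegalArgumentException("unknown op: " + n.op);
        }
    }

    /** The same test as straight-line Java, which javac turns into bytecode. */
    static boolean compiled(Map<String, String> word) {
        return "n".equals(word.get("a_nom")) && !"acc".equals(word.get("a_cas"));
    }

    /** Builds the tree for: a_nom equals "n" and not (a_cas equals "acc"). */
    static Node sampleRule() {
        Node isNoun = new Node("equal", "a_nom", "n", Collections.<Node>emptyList());
        Node isAcc = new Node("equal", "a_cas", "acc", Collections.<Node>emptyList());
        Node notAcc = new Node("not", null, null, Arrays.asList(isAcc));
        return new Node("and", null, null, Arrays.asList(isNoun, notAcc));
    }

    public static void main(String[] args) {
        Map<String, String> word = new HashMap<>();
        word.put("a_nom", "n");
        word.put("a_cas", "nom");
        System.out.println(interpret(sampleRule(), word)); // prints true
        System.out.println(compiled(word));                // prints true
    }
}
```

Both methods give the same answer. The compiled form avoids the per-node dispatch and recursion, which is the overhead that bytecode compilation is meant to remove.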

This page is to list ideas and their pros and cons.

Java bytecode

There are a zillion open source Java bytecode interpreters to choose from, most prominently Sun's own and Kaffe (http://kaffe.org).

There are also a lot of more or less easy-to-use Java bytecode generators.

Considering that lttoolbox is on its way to being ported to Java, choosing Java bytecode might eventually make Apertium run on J2ME devices (only the tagger is missing for a full system).

<jacobEo> spectie: jimregan I don't know, but I suppose that Java byte would run fastest, as there have been extremely 
  much work on optimize its speed, on different platforms....
<jacobEo> spectie: jimregan Also think in terms of some day get Apertium on a mobile phone.... then transfer in Java 
  bytecode would be the easiest thing. But if we don't at least also do Java bytecode, then we would have to write 
  a (non-Java) bytecode executor in J2ME.... pheh....
<jacobEo> spectie: jimregan Actually, if we get lttoolbox-java to work AND have Java bytecode for transfer, then we 
  instantly HAVE apertium running on phones! And also on Windows, many Unix variants, web pages, whatever can run Java bytecode.

Javascript bytecode

A Javascript engine.

External links

Google v8: Embedder's documentation


Difference between revisions of "Bytecode for transfer"
