Difference between revisions of "Bytecode for transfer"

Revision as of 00:24, 28 February 2010

A concrete example: Esperanto-English

Take a look at apertium-eo-en.eo-en.t1x and apertium_eo_en_eo_en_t1x.java (the same file converted into Java). The Java version is compiled into bytecode and executed by the JVM, whose JIT (just-in-time) compiler converts it into machine code at run time.

Parsing /home/j/esperanto/apertium-svn/apertium/trunk/lttoolbox-java/testdata/transfer/apertium-eo-en.eo-en.t1x
// WARNING: Attribute a_np_acr is not defined. Valid attributes are: [a_nom, a_prp, a_adv, a_adj, a_vrb, a_vrb2, a_det, a_ord, a_prn, a_tns, a_nepersonaj_tempoj, a_gen, a_prs, a_nbr, a_cas, lem, lemq, lemh, whole, tags, chname, chcontent, content]
// Replacing with error_UNKNOWN_ATTR - for <transfer default="chunk">/<section-def-macros>/<def-macro n="firstWord" npar="1">/<choose>/<when>/<test>/<equal>/<clip part="a_np_acr" pos="1" side="sl">
Compiling: javac -cp dist/lttoolbox.jar transfertest/res/lttoolbox-java/testdata/transfer/apertium_eo_en_eo_en_t1x.java

Here is a speed comparison:

Interpreted transfer took 91.59 secs
bytecode compiled transfer took 15.88 secs
Speedup factor: 5.76
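
To make explicit how such a figure is derived, here is a minimal sketch in Java. The class and method names are hypothetical and are not part of lttoolbox-java; the sketch only assumes that the speedup factor is the interpreted time divided by the bytecode time.

```java
import java.util.Locale;

/** Hypothetical helper for illustration only; not part of lttoolbox-java. */
final class SpeedupSketch {

    /** Returns the wall-clock time, in seconds, taken by one run of the task. */
    static double timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1e9;
    }

    /** Speedup factor, assumed here to be baseline time divided by improved time. */
    static double speedup(double baselineSecs, double improvedSecs) {
        if (improvedSecs <= 0) {
            throw new IllegalArgumentException("improved time must be positive");
        }
        return baselineSecs / improvedSecs;
    }

    public static void main(String[] args) {
        // Timings taken from the comparison above.
        double factor = speedup(91.59, 15.88);
        System.out.println(String.format(Locale.ROOT, "Speedup factor: %.2f", factor));
    }
}
```

With the two rounded timings as input the ratio comes out at roughly 5.77, marginally above the 5.76 in the log, presumably because the log was computed from unrounded timings.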

Using it in a language pair

Add an entry to modes.xml in which you replace "apertium-transfer" with "apertium-transfer-j" and use the .class file instead of the .t1x file.

For example, replace

     <program name="apertium-transfer">
       <file name="apertium-eo-en.eo-en.t1x"/>
       <file name="eo-en.t1x.bin"/>
       <file name="eo-en.autobil.bin"/>
     </program>

with

     <program name="apertium-transfer-j">
       <file name="apertium_eo_en_eo_en_t1x.class"/>
       <file name="eo-en.t1x.bin"/>
       <file name="eo-en.autobil.bin"/>
     </program>

Also add a call to apertium-preprocess-transfer-bytecode-j to Makefile.am, or run it manually:

$ apertium-preprocess-transfer-bytecode-j apertium-eo-en.eo-en.t1x apertium_eo_en_eo_en_t1x.class
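
The class file name appears to be the .t1x file name with every character that is not valid in a Java identifier replaced by an underscore. The sketch below reproduces that mapping for the file names shown above. The rule is inferred from those two names only, the class and method names are hypothetical, and the actual tool may handle other characters differently.

```java
/** Hypothetical illustration of the naming pattern; not the tool's own code. */
final class TransferClassNameSketch {

    /**
     * Maps a transfer rule file name to a Java class name by replacing
     * every character other than ASCII letters, digits and '_' with '_'.
     * Assumption: inferred from the example names in this page.
     */
    static String classNameFor(String t1xFileName) {
        return t1xFileName.replaceAll("[^A-Za-z0-9_]", "_");
    }

    /** The name of the compiled class file to reference in modes.xml. */
    static String classFileFor(String t1xFileName) {
        return classNameFor(t1xFileName) + ".class";
    }

    public static void main(String[] args) {
        System.out.println(classFileFor("apertium-eo-en.eo-en.t1x"));
    }
}
```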

Further work

The Java code has not been optimized for speed, so the real potential speedup is perhaps a factor of 6-8, or even higher if a mixed mode is used (mixing C and Java code instead of pure Java).
Memory usage is also higher than necessary.
The underlying library, lttoolbox-java, uses 50% of the CPU time and has some well-known performance issues, which are fixable.
The bytecode should be run through an optimizer, such as Soot.
There are many open-source Java bytecode interpreters to choose from, most prominently Sun's own and Kaffe (http://kaffe.org). Only Sun's has been tested. At least GCJ should also be tried.
A post-compilation step to native code should be tried out.
With XMLVM (http://xmlvm.org/) there could be a way to target iPhones as well.
Considering that we have a full port of lttoolbox, Apertium could be made to run purely on Java, enabling a wide range of platforms, including Windows, phones (J2ME or Android), web pages and server systems. Only the tagger is missing for a full system.
