Difference between revisions of "Bytecode for transfer"

Revision as of 12:01, 5 March 2012

Faster transfer (currently by a factor of about 5). Because startup takes 0.33 seconds longer, this only pays off when processing more than 100 sentences (2000 words).
Debuggable transfer. Using a Java development tool such as NetBeans, you can step through the transfer code line by line, inspecting variables and seeing exactly what is happening.
Validating transfer files

A concrete example: Esperanto-English

Take a look at apertium-eo-en.eo-en.t1x and apertium_eo_en_eo_en_t1x.java (the same file converted into Java). The Java version is compiled into bytecode and executed by the JVM, whose JIT (just-in-time) compiler converts it into machine code at run time.

Here is a speed comparison on a corpus (testdata/transfer/transferinput-en-eo.t1x.txt: 20000 sentences, 423215 words, 7527866 bytes).

Interpreted transfer took 91.59 secs
Bytecode-compiled transfer took 15.88 secs
Speedup factor: 5.76
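
As a rough check of the 100-sentence threshold mentioned at the top, the timings above can be turned into a break-even estimate. This is only a sketch: it assumes that transfer time grows linearly with the number of sentences and that the 0.33 second startup penalty is the only fixed overhead, neither of which is established by the measurements themselves. The function name break_even is made up for this illustration.

```shell
#!/bin/sh
# Rough break-even estimate from the timings above.
# Assumption: transfer time is linear in the number of sentences.
break_even() {
    # $1 = interpreted secs, $2 = bytecode secs, $3 = sentences, $4 = extra startup secs
    awk -v i="$1" -v b="$2" -v n="$3" -v s="$4" 'BEGIN {
        saved_per_sentence = (i - b) / n      # seconds saved per sentence
        printf "%.0f\n", s / saved_per_sentence
    }'
}

break_even 91.59 15.88 20000 0.33
```

With these figures the estimate comes out just under 90 sentences, which is in line with the "more than 100 sentences" rule of thumb.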

Using it

First, compile the t1x to bytecode:

$ apertium-preprocess-transfer-bytecode-j file.t1x file.class

Then replace 'apertium-transfer file.t1x' with 'apertium-transfer-j file.class'.

Using it in a language pair

Add an entry to modes.xml where you replace "apertium-transfer" with "apertium-transfer-j" and use the .class file instead of the .t1x file.

For example, replace

     <program name="apertium-transfer">
       <file name="apertium-eo-en.eo-en.t1x"/>
       <file name="eo-en.t1x.bin"/>
       <file name="eo-en.autobil.bin"/>
     </program>

with

     <program name="apertium-transfer-j">
       <file name="eo-en.t1x.class"/>
       <file name="eo-en.t1x.bin"/>
       <file name="eo-en.autobil.bin"/>
     </program>

Now you can compile manually with

$ apertium-preprocess-transfer-bytecode-j apertium-eo-en.eo-en.t1x eo-en.t1x.class
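
If the same script has to work both with and without the bytecode version, the choice between the two programs can be made at run time. The following is a minimal sketch, not part of Apertium: the helper name transfer_cmd is hypothetical, and the argument order is simply taken from the two modes.xml entries above.

```shell
#!/bin/sh
# Hypothetical helper: print the transfer command to use for a language pair.
# Argument order follows the modes.xml entries above.
transfer_cmd() {
    # $1 = .t1x rules, $2 = .class file, $3 = .t1x.bin, $4 = bilingual dictionary .bin
    if [ -f "$2" ] && command -v apertium-transfer-j >/dev/null 2>&1; then
        echo "apertium-transfer-j $2 $3 $4"
    else
        # Fall back to interpreted transfer when the .class file or the tool is missing
        echo "apertium-transfer $1 $3 $4"
    fi
}

transfer_cmd apertium-eo-en.eo-en.t1x eo-en.t1x.class eo-en.t1x.bin eo-en.autobil.bin
```

The function only prints the command line, so it can be inspected before anything is actually run.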

Adding it to your Makefile

You can also add optional support for bytecode compilation to Makefile.am:

Under the lines

$(PREFIX1).t1x.bin: $(BASENAME).$(PREFIX1).t1x
       apertium-validate-transfer $(BASENAME).$(PREFIX1).t1x
       apertium-preprocess-transfer $(BASENAME).$(PREFIX1).t1x $@

Add

       @if [ "`which apertium-preprocess-transfer-bytecode-j`" = "" ]; then echo && echo "NOTE: lttoolbox-java (used for bytecode accelerated transfer) is missing" && echo "      Therefore the following will fail (but it's OK)" && echo; fi
       -apertium-preprocess-transfer-bytecode-j $(BASENAME).$(PREFIX1).t1x $(PREFIX1).t1x.class

If lttoolbox-java is not installed, a warning is emitted and compilation continues (so things still work). The leading "-" on the second line tells make to ignore the failure of that command.

Remember to do the same for $(PREFIX2).

See http://apertium.svn.sourceforge.net/viewvc/apertium?view=rev&revision=21146 for a complete example of the changes.
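
The availability check in the Makefile rule above can also be written as a small shell function, which is easier to read and to try out on its own. This is a sketch under assumptions: the function name note_if_missing and the wording of the message are invented here, and "command -v" is used in place of "which" because it is specified by POSIX.

```shell
#!/bin/sh
# Sketch of the availability check, written as a plain shell function.
# "command -v" is the POSIX way to test whether a program is on PATH.
note_if_missing() {
    # $1 = name of the tool to look for
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "NOTE: $1 is missing; bytecode accelerated transfer will be skipped"
        return 1
    fi
    return 0
}

# "|| true" keeps a surrounding build going, like the leading "-" in the Makefile
note_if_missing apertium-preprocess-transfer-bytecode-j || true
```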

Further work

The Java code has not been optimized for speed, so the real potential speedup is perhaps a factor of 6-8, or even higher if using a mixed mode (mixing C and Java code instead of pure Java).
Memory usage is also higher than really needed. Among other things:
The underlying library, lttoolbox-java, accounts for 50% of the CPU time, and there are some well-known performance issues which are fixable.
The bytecode should be run through an optimizer, such as Soot.
There are many open-source Java bytecode interpreters to choose from, most prominently Sun's own and Kaffe (http://kaffe.org). Only Sun's has been tested. At least GCJ should be tried out.
A step for post-compiling to native code should be tried out.
With http://xmlvm.org/ there could be a way for iPhones as well
Considering that we have a full port of lttoolbox, Apertium could be made to run purely on Java, enabling a wide range of platforms, including Windows, phones (J2ME or Android), web pages and server systems. ~~Only the tagger is missing for a full system.~~

