Bytecode for transfer
Revision as of 23:28, 27 February 2010
Currently transfer is the bottleneck in Apertium: it accounts for about 95% of the CPU time. This is because the transfer file is interpreted (by walking the XML tree of the .t1x transfer file) instead of being compiled into machine code.
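To make "tree walking" concrete, here is a minimal Java sketch of an interpreter that evaluates a rule by recursively visiting the nodes of a parsed tree every time a word is processed. The node classes (`Node`, `Lit`, `Clip`, `Equal`, `Choose`) are hypothetical and only loosely modelled on t1x elements such as `<lit>`, `<clip>`, `<equal>` and `<choose>`. They are not the data structures used by apertium-transfer itself.

```java
import java.util.Map;

// Hypothetical, heavily simplified rule-tree nodes (NOT the real t1x schema),
// just enough structure to show what evaluating by tree walking looks like.
abstract class Node {
    // Evaluate this node against the attributes (lemma, tags, ...) of the current word.
    abstract String eval(Map<String, String> word);
}

// A literal string, roughly like <lit v="..."/>.
final class Lit extends Node {
    private final String value;
    Lit(String value) { this.value = value; }
    String eval(Map<String, String> word) { return value; }
}

// Reads one attribute of the current word, roughly like <clip part="lem"/>.
final class Clip extends Node {
    private final String part;
    Clip(String part) { this.part = part; }
    String eval(Map<String, String> word) {
        String v = word.get(part);
        return v == null ? "" : v;
    }
}

// String comparison; yields "true" or "false".
final class Equal extends Node {
    private final Node left, right;
    Equal(Node left, Node right) { this.left = left; this.right = right; }
    String eval(Map<String, String> word) {
        return left.eval(word).equals(right.eval(word)) ? "true" : "false";
    }
}

// If-then-else, roughly like <choose><when>...<otherwise>...
final class Choose extends Node {
    private final Node test, then, otherwise;
    Choose(Node test, Node then, Node otherwise) {
        this.test = test; this.then = then; this.otherwise = otherwise;
    }
    String eval(Map<String, String> word) {
        // Every word re-walks the tree: one dynamic dispatch per visited node.
        return test.eval(word).equals("true") ? then.eval(word) : otherwise.eval(word);
    }
}

final class TreeWalkDemo {
    // Example rule: if the lemma is "the" output "la", else output the lemma.
    static Node rule() {
        return new Choose(new Equal(new Clip("lem"), new Lit("the")),
                          new Lit("la"), new Clip("lem"));
    }
}
```

The per-node dispatch and the repeated traversal are the overhead that compiling the rules is meant to remove.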
The Java transfer bytecode compiler converts arbitrarily complex transfer files into Java source code, which is then compiled into platform-independent bytecode.
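The source-generation step can be sketched as a small code generator. Each rule node prints itself as a Java expression, so the tree is visited once at compile time rather than once per input word. Everything here is a hypothetical illustration and does not reflect the real generated file (apertium_eo_en_eo_en_t1x.java), which has its own structure. That includes the `RuleExpr` classes, the `apply(Map)` method and the variable name `w`.

```java
// Hypothetical rule-expression nodes that emit Java source for themselves.
abstract class RuleExpr {
    // Returns Java source for this node. The generated code assumes a
    // Map<String, String> named w holding the attributes of the current word.
    abstract String toJava();

    // Turns a string into a Java string literal (escapes quotes and backslashes only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}

final class LitExpr extends RuleExpr {
    private final String value;
    LitExpr(String value) { this.value = value; }
    String toJava() { return quote(value); }
}

final class ClipExpr extends RuleExpr {
    private final String part;
    ClipExpr(String part) { this.part = part; }
    String toJava() { return "w.getOrDefault(" + quote(part) + ", \"\")"; }
}

final class EqualExpr extends RuleExpr {
    private final RuleExpr left, right;
    EqualExpr(RuleExpr left, RuleExpr right) { this.left = left; this.right = right; }
    // Emits a boolean expression; this sketch does no type checking.
    String toJava() { return left.toJava() + ".equals(" + right.toJava() + ")"; }
}

final class ChooseExpr extends RuleExpr {
    private final RuleExpr test, then, otherwise;
    ChooseExpr(RuleExpr test, RuleExpr then, RuleExpr otherwise) {
        this.test = test; this.then = then; this.otherwise = otherwise;
    }
    // Emits a conditional expression: (test ? then : otherwise).
    String toJava() {
        return "(" + test.toJava() + " ? " + then.toJava() + " : " + otherwise.toJava() + ")";
    }
}

final class JavaSourceEmitter {
    // Wraps one compiled rule in a class with a static apply(Map) method.
    static String emitClass(String className, RuleExpr body) {
        StringBuilder sb = new StringBuilder();
        sb.append("public final class ").append(className).append(" {\n");
        sb.append("  public static String apply(java.util.Map<String, String> w) {\n");
        sb.append("    return ").append(body.toJava()).append(";\n");
        sb.append("  }\n");
        sb.append("}\n");
        return sb.toString();
    }

    // Example rule: if the lemma is "the" output "la", else output the lemma.
    static RuleExpr demoRule() {
        return new ChooseExpr(new EqualExpr(new ClipExpr("lem"), new LitExpr("the")),
                              new LitExpr("la"), new ClipExpr("lem"));
    }
}
```

For the example rule, the emitted method body is `return (w.getOrDefault("lem", "").equals("the") ? "la" : w.getOrDefault("lem", ""));`. This is ordinary Java that `javac` can turn into bytecode.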
During transfer, the Java Virtual Machine compiles the most frequently executed parts (the 'hot spots') into machine code.
So the transfer file http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/testdata/transfer/apertium-eo-en.eo-en.t1x?view=markup becomes the Java source file http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/src/org/apertium/transfer/generated/apertium_eo_en_eo_en_t1x.java?view=markup, which is compiled into Java bytecode and executed with the Java JIT (just-in-time) compiler.
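The page does not say how the generated source is turned into bytecode. It may simply be compiled ahead of time with `javac` as part of the build. One possible approach is to do it at runtime with the standard `javax.tools` API: compile a generated class, load it and call it. The sketch below assumes a hypothetical class `Rule1` with a static `apply(Map)` method. It needs a JDK, not just a JRE, because `ToolProvider.getSystemJavaCompiler()` returns null when no compiler is available.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class RuntimeRuleCompiler {
    // Hypothetical generated source for one rule: if lemma == "the" then "la" else lemma.
    static final String SOURCE =
          "public final class Rule1 {\n"
        + "  public static String apply(java.util.Map<String, String> w) {\n"
        + "    return (w.getOrDefault(\"lem\", \"\").equals(\"the\") ? \"la\" : w.getOrDefault(\"lem\", \"\"));\n"
        + "  }\n"
        + "}\n";

    static String compileAndRun(Map<String, String> word) throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler (running on a JRE?)");
        }

        // Write the generated source into a fresh temporary directory
        // (cleanup omitted for brevity).
        Path dir = Files.createTempDirectory("generated-rules");
        Path src = dir.resolve("Rule1.java");
        Files.write(src, SOURCE.getBytes(StandardCharsets.UTF_8));

        // Compile it to bytecode; Rule1.class is written into the same directory.
        int status = javac.run(null, null, null, "-d", dir.toString(), src.toString());
        if (status != 0) {
            throw new IllegalStateException("javac failed with status " + status);
        }

        // Load the freshly compiled class and call its static apply(Map) method.
        try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
            Class<?> rule = Class.forName("Rule1", true, loader);
            Method apply = rule.getMethod("apply", Map.class);
            return (String) apply.invoke(null, word);
        }
    }
}
```

Once loaded, `apply` is an ordinary method to the JVM. If it is called often enough, the JIT compiler will translate it to machine code like any other hot spot.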
Here is a speed test on a corpus, first with the existing apertium-transfer interpreter, then with the Java version:
j@j-laptop-nova:~/esperanto/a/lttoolbox-java/testdata/transfer$ time apertium-transfer apertium-eo-en.en-eo.t1x en-eo.t1x.bin en-eo.autobil.bin < transferinput-en-eo.t1x.txt > x
real 1m9.563s
user 1m9.264s
sys 0m0.284s
j@j-laptop-nova:~/esperanto/a/lttoolbox-java$ time java -cp dist/lttoolbox.jar org.apertium.transfer.Transfer
real 0m17.505s
user 0m18.909s
sys 0m0.268s
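The speedup implied by these timings can be checked with a few lines of Java. The sketch parses the `XmYs` format printed by the shell's `time` builtin and divides the interpreter's time by the Java version's time. The class and method names are made up for this example. The Java run's user time (18.9 s) exceeds its real time (17.5 s). That means the JVM used more than one CPU core, probably for background work such as JIT compilation or garbage collection. Both ratios are therefore printed.

```java
import java.util.Locale;

final class Speedup {
    // Parses output of the shell's `time`, e.g. "1m9.563s", into seconds.
    static double seconds(String t) {
        int m = t.indexOf('m');
        double minutes = Double.parseDouble(t.substring(0, m));
        double secs = Double.parseDouble(t.substring(m + 1, t.length() - 1)); // drop trailing 's'
        return minutes * 60 + secs;
    }

    // How many times faster the "after" run was than the "before" run.
    static double speedup(String before, String after) {
        return seconds(before) / seconds(after);
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of system settings.
        System.out.printf(Locale.ROOT, "real-time speedup: %.2fx%n", speedup("1m9.563s", "0m17.505s"));
        System.out.printf(Locale.ROOT, "user-time speedup: %.2fx%n", speedup("1m9.264s", "0m18.909s"));
        // prints:
        // real-time speedup: 3.97x
        // user-time speedup: 3.66x
    }
}
```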
Some minor bugs remain, but overall, expect transfer to run at least 4 times as fast within a couple of weeks (in the test above, wall-clock time dropped from about 69.6 s to 17.5 s).
The Java code has not been optimized for speed, so the real potential speedup may be a factor of 6-8, or even higher if a mixed mode is used (mixing C and Java code instead of doing pure Java :-)
More info: http://apertium.svn.sourceforge.net/viewvc/apertium?view=rev&revision=19218
Old contents
Adapt transfer to use bytecode instead of tree walking. This task would be to write a compiler and interpreter that translate Apertium transfer rules into the format of an off-the-shelf bytecode engine (e.g. Java, V8, KJS, ...).
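A toy stack machine illustrates the difference between bytecode and tree walking. The rule is flattened once into a linear list of instructions, and a simple loop executes them without recursing through a tree. The opcodes (`PUSH_LIT`, `JUMP_IF_FALSE`, ...) are invented for this sketch and are not the instruction set of the JVM, V8 or KJS. The task above proposes targeting an existing engine rather than a home-made one. The program here is also compiled by hand, not by a compiler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Invented opcodes for a toy stack machine (not any real engine's instruction set).
enum Op { PUSH_LIT, PUSH_CLIP, EQUAL, JUMP_IF_FALSE, JUMP, RETURN }

final class Instr {
    final Op op;
    final String arg;   // literal value or attribute name, if any
    final int target;   // jump target index, if any
    Instr(Op op, String arg, int target) { this.op = op; this.arg = arg; this.target = target; }
}

final class ToyVm {
    // Executes a flat instruction list: a loop with a program counter, no recursion.
    static String run(List<Instr> code, Map<String, String> word) {
        Deque<Object> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            Instr instr = code.get(pc++);
            switch (instr.op) {
                case PUSH_LIT:
                    stack.push(instr.arg);
                    break;
                case PUSH_CLIP: {
                    String v = word.get(instr.arg);
                    stack.push(v == null ? "" : v);
                    break;
                }
                case EQUAL: {
                    Object right = stack.pop();
                    Object left = stack.pop();
                    stack.push(left.equals(right));
                    break;
                }
                case JUMP_IF_FALSE:
                    if (!(Boolean) stack.pop()) pc = instr.target;
                    break;
                case JUMP:
                    pc = instr.target;
                    break;
                case RETURN:
                    return (String) stack.pop();
                default:
                    throw new IllegalStateException("unknown opcode " + instr.op);
            }
        }
    }

    // Hand-compiled form of: if lemma == "the" then "la" else lemma.
    static List<Instr> demoProgram() {
        List<Instr> code = new ArrayList<>();
        code.add(new Instr(Op.PUSH_CLIP, "lem", -1));    // index 0
        code.add(new Instr(Op.PUSH_LIT, "the", -1));     // index 1
        code.add(new Instr(Op.EQUAL, null, -1));         // index 2
        code.add(new Instr(Op.JUMP_IF_FALSE, null, 6));  // index 3: go to the else branch
        code.add(new Instr(Op.PUSH_LIT, "la", -1));      // index 4
        code.add(new Instr(Op.RETURN, null, -1));        // index 5
        code.add(new Instr(Op.PUSH_CLIP, "lem", -1));    // index 6: else branch
        code.add(new Instr(Op.RETURN, null, -1));        // index 7
        return code;
    }
}
```

An off-the-shelf engine such as the JVM goes one step further. It verifies the bytecode and JIT-compiles frequently executed sequences to machine code, which a hand-written loop like this one does not do.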
This page is to list ideas and their pros and cons.
Prototype
Java bytecode
- There are a great many open-source Java bytecode interpreters to choose from, most prominently Sun's own and Kaffe (http://kaffe.org).
- There are also many more-or-less easy-to-use [Java bytecode generators].
- lttoolbox is on its way to being ported to Java, so choosing Java bytecode might eventually make Apertium run on J2ME devices (only the tagger would then be missing for a full system).
<jacobEo> spectie: jimregan I don't know, but I suppose that Java byte would run fastest, as there have been extremely much work on optimize its speed, on different platforms....
<jacobEo> spectie: jimregan Also think in terms of some day get Apertium on a mobile phone.... then transfer in Java bytecode would be the easiest thing. But if we don't at least also do Java bytecode, then we would have to write a (non-Java) bytecode executor in J2ME.... pheh....
<jacobEo> spectie: jimregan Actually, if we get lttoolbox-java to work AND have Java bytecode for transfer, then we instantly HAVE apertium running on phones! And also on Windows, many Unix variants, web pages, whatever can run Java bytecode.
JavaScript bytecode
A JavaScript engine (e.g. V8 or KJS).