Difference between revisions of "Lttoolbox-java"
Revision as of 08:52, 30 November 2009
What is lttoolbox
lttoolbox is a set of tools for 1) compiling .dix files into binary files (lt-comp), 2) analysing or generating text (lt-proc), and 3) expanding a .dix file (lt-expand).
Usage
    $ java -jar dist/lttoolbox.jar
    lttoolbox: is a toolbox for lexical processing, morphological analysis and generation of words
    USAGE: java -jar dist/lttoolbox.jar [task]
    Examples:
     java -jar dist/lttoolbox.jar lt-expand dictionary.dix     expands a dictionary
     java -jar dist/lttoolbox.jar lt-comp lr dic.dix dic.bin   compiles a dictionary
     java -jar dist/lttoolbox.jar lt-proc dic.bin              morphological analysis
Reasons for a Java port
- There are several devices (mobile phones, for example) which can run quite complicated software, but only if written in Java. lttoolbox is the first step to having Apertium run on these devices.
- Windows port. It won't be as powerful as on a Unix-based system, but it will be there.
- Apertium will be the first MT system *ever* that can be demonstrated within a Java applet.
- Transfer compiled to bytecode promises a speedup of about a factor of 4 compared to what we use now (interpreted XML), and transfer dominates CPU usage when processing large amounts of text.
State of Java port
    j@j-laptop-nova:~/esperanto/apertium/lttoolbox-java/testdata/regression$ ./compare_java_and_c.sh
    C analysis is... 0.39sec
    OK
    Java analysis is... 1.91sec
    OK
    C generator -g is ... 0.33sec
    OK
    Java generator -g is ... 1.26sec
    OK
    C generator -d is ... 0.33sec
    OK
    Java generator -d is ... 1.27sec
    OK
    C generator -n is ... 0.33sec
    OK
    Java generator -n is ... 1.25sec
    OK
    C postgenerator -p is ... 0.04sec
    OK
    Java postgenerator -p is ... 0.72sec
    OK
    All tests passed
--Jacob Nordfalk 08:52, 30 November 2009 (UTC)
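To make the timings above easier to compare, the Java/C ratio can be worked out for each task. The following is only an illustrative sketch: the class name SlowdownTable and its helper are hypothetical and not part of lttoolbox-java, and the numbers are copied from the single run shown above, so the resulting ratios should be read as rough indications rather than benchmarks.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of lttoolbox-java.
class SlowdownTable {
    /** Returns how many times longer the Java run took than the C run. */
    static double slowdown(double javaSeconds, double cSeconds) {
        return javaSeconds / cSeconds;
    }

    public static void main(String[] args) {
        // Timings in seconds, copied from the regression output: {C, Java}.
        Map<String, double[]> timings = new LinkedHashMap<>();
        timings.put("analysis", new double[] {0.39, 1.91});
        timings.put("generator -g", new double[] {0.33, 1.26});
        timings.put("generator -d", new double[] {0.33, 1.27});
        timings.put("generator -n", new double[] {0.33, 1.25});
        timings.put("postgenerator -p", new double[] {0.04, 0.72});

        // Print one line per task with the Java/C ratio, rounded to one decimal.
        for (Map.Entry<String, double[]> entry : timings.entrySet()) {
            double[] t = entry.getValue();
            System.out.printf(Locale.ROOT, "%-18s %.1fx%n",
                    entry.getKey(), slowdown(t[1], t[0]));
        }
    }
}
```

On these figures the Java port takes roughly four to five times as long as the C++ version for analysis and generation, and about eighteen times as long for post-generation, where the absolute times are very small.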
Features
- Binary compatibility with lttoolbox. lttoolbox-java is able to _read_ and _write_ the binary files used by lttoolbox, and it generates exactly the same output.
- There is a comprehensive test suite that tests both lttoolbox (C++) and lttoolbox-java.
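One way to spot-check the binary compatibility claim for a given dictionary is to compile it with both implementations and compare the two resulting files byte for byte. The sketch below shows only the comparison step. The class name BinaryCompare is hypothetical, the two paths are whatever files you pass in, and this is not the project's own test suite.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of lttoolbox-java.
class BinaryCompare {
    /** Returns true when both files contain exactly the same bytes. */
    static boolean sameBytes(Path first, Path second) throws IOException {
        // Reads both files fully into memory; fine for a quick check of small files.
        return Arrays.equals(Files.readAllBytes(first), Files.readAllBytes(second));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BinaryCompare <file1> <file2>");
            return;
        }
        boolean same = sameBytes(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "identical" : "different");
    }
}
```

It would be run with two file names, for example one binary produced by the C++ lt-comp and one produced by the Java lt-comp from the same .dix file.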
Other notes
<Drew_> jacobEo: I can't find a main class in the source code, am I looking in the wrong place? :S
<jacobEo> Drew_: LTComp.java, LTExpand.java, LTProc.java
Thanks
- Nic Cottrell contributed an initial version of a Java port of lttoolbox.
- During GSoC 2009, Raphaël and Sergio worked on it, but processing still didn't work (compilation and expansion worked).
- In November 2009, Jacob Nordfalk finished it and optimized it.