Lttoolbox-java

From Apertium
Revision as of 15:54, 30 March 2009 by Francis Tyers (talk | contribs)

Nic Cottrell contributed a Java port of lttoolbox, but it still needs work before it is finished.

You don't need much knowledge of MT or NLP to work on lttoolbox-java, but you do need to know C++ and Java and be able to debug both.

You only have to understand what lt-expand, lt-comp and lt-proc do with a .dix file.

What is lttoolbox

lttoolbox is a set of tools for 1) making binary files out of .dix files (lt-comp), 2) analysing or generating text (lt-proc) and 3) expanding a .dix file (lt-expand).

Download preferably via SVN. If it fails, try [1] and [2] ("Download GNU tarball" will give a compressed archive).

Please compile lttoolbox, apertium and a language pair of your choice. You will then have the setup needed to understand the role of lttoolbox.


Required

  • Binary compatibility with lttoolbox (input and output files should be the same)
  • a test suite which runs on both lttoolbox (C++) and lttoolbox-java
  • lttoolbox-java needs to at least be able to _read_ the binary files (see 2) above: analysing or generating text (lt-proc))
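
The binary-compatibility requirement can be checked mechanically. As a minimal sketch (the class name `BinaryCompatCheck` and the two file arguments are hypothetical and not part of lttoolbox or lttoolbox-java), a test could compile the same .dix file with both implementations and compare the resulting binaries byte by byte:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for a test suite; not taken from the lttoolbox sources.
final class BinaryCompatCheck {

    /** Returns the offset of the first differing byte, or -1 if both arrays are identical. */
    static long firstMismatch(byte[] expected, byte[] actual) {
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        // Same prefix: identical only if the lengths also match.
        return expected.length == actual.length ? -1 : common;
    }

    /** Same comparison, reading both files fully into memory first. */
    static long firstMismatch(Path expected, Path actual) throws IOException {
        return firstMismatch(Files.readAllBytes(expected), Files.readAllBytes(actual));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BinaryCompatCheck <file-from-c++> <file-from-java>");
            return;
        }
        long offset = firstMismatch(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(offset < 0 ? "identical" : "first difference at byte " + offset);
    }
}
```

For example, after running the C++ lt-comp and the Java port on the same dictionary, the two output files could be passed as arguments; a reported offset points at the first place where the two outputs diverge.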


Problems

  • Right now we have a line-for-line port of the C++ code of lttoolbox in apertium-tools/lttoolbox-java. It is NOT working.
  • It is almost line-for-line identical to the C++, aside from Java/C++ differences.

But the languages are different. C++, for example, has some methods that modify simple-type variables belonging to the caller (the variable is passed by reference). In Java, simple-type variables can only be passed by value, so the caller's value is not changed. That sort of thing needs to be sorted out.
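
The difference can be illustrated with a small sketch. The names `IntRef` and `PassByValueDemo` are hypothetical and do not come from the lttoolbox-java sources; they only show two common ways of porting a C++ `int &` parameter: a small mutable holder object, or returning the new value.

```java
// Hypothetical mutable holder standing in for a C++ `int &` parameter.
final class IntRef {
    int value;

    IntRef(int value) {
        this.value = value;
    }
}

final class PassByValueDemo {

    // A naive port of a C++ method taking `int &`: only the local copy changes.
    static void incrementCopy(int n) {
        n++;
    }

    // Option 1: pass a holder object; caller and callee share the same object.
    static void incrementInPlace(IntRef n) {
        n.value++;
    }

    // Option 2: return the new value and let the caller reassign it.
    static int incremented(int n) {
        return n + 1;
    }

    public static void main(String[] args) {
        int a = 5;
        incrementCopy(a);
        System.out.println(a);        // prints 5: the caller's value is unchanged

        IntRef b = new IntRef(5);
        incrementInPlace(b);
        System.out.println(b.value);  // prints 6: the change is visible to the caller

        a = incremented(a);
        System.out.println(a);        // prints 6
    }
}
```

Which option fits best depends on the method being ported: returning the value is simpler when only one variable is modified, while a holder may be needed when a C++ method updates several of the caller's variables at once.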

  • The biggest problem is the XML handling: the XML library used by the C++ code (libxml2) calls a callback method in the code both when it meets a START tag and when it meets an END tag.
    • The Java XML library only calls the callback method at the START tag.
    • Perhaps we could find another Java XML library that can also be made to call back for end tags. Alternatively, some kind of wrapper layer could be written in between, or you could use SAX and build your own callback mechanism.
  • There might be other problems. The project simply got stranded at the XML parsing part.
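
As a hedged sketch of the SAX option mentioned above: the SAX API bundled with the JDK (`org.xml.sax`) defines an `endElement` callback alongside `startElement`, so a handler can be notified of both kinds of tag. The class name `TagEventRecorder` and the sample dictionary fragment are made up for illustration, and whether this matches the behaviour the C++ code expects from libxml2 in every detail is an assumption that would need testing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records every start and end tag it is told about.
final class TagEventRecorder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        events.add("START " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("END " + qName);
    }

    /** Parses the given XML text and returns the recorded tag events in document order. */
    static List<String> record(String xml) throws Exception {
        TagEventRecorder handler = new TagEventRecorder();
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // Made-up fragment in the style of a .dix entry.
        String xml = "<e><p><l>cat</l><r>cat</r></p></e>";
        for (String event : record(xml)) {
            System.out.println(event);
        }
    }
}
```

In a port, the bodies of `startElement` and `endElement` would dispatch to the same logic that the C++ code runs for START and END tags, instead of merely recording the events.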


Other notes

<Drew_> jacobEo: I can't find a main class in the source code, am I looking in the wrong place? :S
<jacobEo> Drew_: LTComp.java, LTExpand.java, LTProc.java

[21:08:21] Jacob Nordfalk: So, Nic, how much time do you probably have the next months? Would you like to be a co-mentor on this, or would you like to just occasionally be informed about progress?
[21:08:58] Apertium Java-lttoolboc Nic Cottrell: Well, I would love to be a co-mentor, but I fear that I might not be able to give enough time to perform that role
[21:09:12] … But I would definitely like to be in the loop and can jump in to help when I can