Difference between revisions of "Lttoolbox-java"

From Apertium
 
Notes (yes, I know it's a mess... perhaps I will clean it up later, or someone will be so kind as to do it for me :-) [[User:Jacob Nordfalk|Jacob Nordfalk]] 10:29, 30 March 2009 (UTC)
 
Nic Cottrell contributed a Java port of lttoolbox, but it needs more work before it is finished.
== What is [[lttoolbox]] ==
 
<pre>
 
<jimregan> Nic Cottrell contributed a Java port of lttoolbox
<jimregan> but it needs work to finish it
<jimregan> and a test suite, in both C++ and Java
<jacobEo> It's in apertium-tools/lttoolbox-java
<jacobEo> What is in apertium-tools/lttoolbox-java right now is NOT working.
<jacobEo> It's a line-for-line port of the C++ code of lttoolbox
<jacobEo> and the great problem is the XML handling
<jimregan> it has to be binary compatible
<jimregan> and the test suite has to be in both C++ and Java, to ensure that
<jimregan> yeah, it's almost line for line identical to the C++, aside from Java/C++ differences
<jimregan> but, the binary stuff can be hard
<jacobEo> therefore jimregan it's not that hard.
<jimregan> all you need is one bit in the wrong place, and it's useless
<jimregan> the compression stuff
<jacobEo> jimregan: The binary stuff is _probably_ easy, as you can debug the C++ and compare variables etc
<jimregan> for C++, we use libxml2

<jacobEo> Rah2: lttoolbox are making binary files out of the .dix files.
<jacobEo> Rah2: lttoolbox-java needs to at least be able to _read_ these binary files.
<jacobEo> Rah2: Pls compile lttoolbox and apertium and a language pair of your choice.
<jacobEo> Rah2: Then much more will be clear
</pre>
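jimregan's point about "the compression stuff" is easier to see with a concrete encoding. As far as we know, lttoolbox writes many of the integers in its binary files with a variable-length scheme (`Compression::multibyte_write` / `multibyte_read` in the C++ sources) in which the top two bits of the first byte say how many more bytes follow. The following is a minimal Java sketch of such a scheme, assuming values of at most 30 bits and most-significant byte first; the class name `MultibyteInt` is hypothetical, and the exact byte layout must be checked against the C++ `Compression` class before it is relied on for binary compatibility.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of a 2-bit-prefix variable-length integer encoding (verify against lttoolbox's Compression class). */
final class MultibyteInt {

    /** Encodes 0 <= value < 2^30; the first byte's top two bits give the number of extra bytes (0..3). */
    static byte[] encode(int value) {
        if (value < 0 || value >= 0x40000000) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value < 0x40) {                       // 00xxxxxx
            out.write(value);
        } else if (value < 0x4000) {              // 01xxxxxx + 1 byte
            out.write(0x40 | (value >>> 8));
            out.write(value & 0xFF);
        } else if (value < 0x400000) {            // 10xxxxxx + 2 bytes
            out.write(0x80 | (value >>> 16));
            out.write((value >>> 8) & 0xFF);
            out.write(value & 0xFF);
        } else {                                  // 11xxxxxx + 3 bytes
            out.write(0xC0 | (value >>> 24));
            out.write((value >>> 16) & 0xFF);
            out.write((value >>> 8) & 0xFF);
            out.write(value & 0xFF);
        }
        return out.toByteArray();
    }

    /** Reads one encoded integer from the stream, consuming exactly the bytes it occupies. */
    static int decode(ByteArrayInputStream in) {
        int first = in.read();
        if (first < 0) {
            throw new IllegalStateException("unexpected end of input");
        }
        int extraBytes = first >>> 6;             // top two bits
        int value = first & 0x3F;                 // low six bits
        for (int i = 0; i < extraBytes; i++) {
            int next = in.read();
            if (next < 0) {
                throw new IllegalStateException("truncated integer");
            }
            value = (value << 8) | next;
        }
        return value;
    }
}
```

In this sketch, 300 encodes to the two bytes `0x41 0x2C` and decodes back to 300. Small round-trip checks like this for each value range, compared byte by byte with what the C++ code writes, are one way to catch "one bit in the wrong place" early.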

Download preferably via [[SVN]]. If that fails:

<pre>
<jacobEo> Rah2: You don't need much knowledge of MT or NLP to do lttoolbox-java. But you need to know C++ and Java and be able to debug both
<Drew_> jacobEo: What was the location of lttoolbox again?
<jacobEo> http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox/
<jacobEo> http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium-tools/lttoolbox-java/
<jacobEo> "Download GNU tarball" will give a compressed archive
</pre>
==Required==
* Binary compatibility with lttoolbox (input and output files should be the same)
* a test suite which runs on both lttoolbox (C++) and lttoolbox-java
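For the binary-compatibility requirement, the test suite eventually has to confirm that the C++ and Java tools write byte-identical files. Below is a minimal Java sketch of that comparison step; the class `BinaryCompare` is hypothetical, and producing the two files (for example with lt-comp and with the Java port) is left to the test harness. It reports the offset of the first differing byte, which gives a concrete place to start when debugging both versions side by side. It uses `java.util.Arrays.mismatch`, so it needs Java 9 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

/** Hypothetical helper for a C++/Java binary-compatibility test. */
final class BinaryCompare {

    /** Returns -1 if identical, else the first differing offset (or the shorter length if one is a prefix). */
    static int firstDifference(byte[] expected, byte[] actual) {
        return Arrays.mismatch(expected, actual);
    }

    /** Same comparison, reading both files fully into memory. */
    static int firstDifference(Path expected, Path actual) throws IOException {
        return firstDifference(Files.readAllBytes(expected), Files.readAllBytes(actual));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BinaryCompare <expected-file> <actual-file>");
            return;
        }
        int diff = firstDifference(Paths.get(args[0]), Paths.get(args[1]));
        if (diff < 0) {
            System.out.println("identical");
        } else {
            System.out.println("first difference at byte offset " + diff);
        }
    }
}
```

Run it as `java BinaryCompare expected.bin actual.bin` (placeholder file names). A reported offset equal to the length of the shorter file means one file is a prefix of the other.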

<pre>
<jacobEo> The problem, I think, is the XML handling: The C code's library callback calls a method in the code both when it meets a START and an END tag.
<jimregan> avinesh, maybe you should say it to spectie because I'm not interested in unicode -> wx
<jimregan> not for any reason
<jacobEo> the Java's XML library only calls the callback method at the START tag.
<jimregan> jacobEo, that will be necessary for chunk merging
<jimregan> we don't have it yet, but it will be necessary
<Leftmost> jimregan, I guess I'm a bit unclear as to what form the regression tests should take. Simply translations between ga and gd?
<jimregan> because when chcontent in t2 is written in chunk mode, it will be without { or }, otherwise with
<avinesh> ok got it
<jimregan> to fit the current model, that has to be a bool set and unset on entry/exit
<Drew_> jacobEo: Is it a big job to make it work with the END tag?
<avinesh> no wx right :D
<jimregan> that's it
<jimregan> avinesh, no one told me anudev is a course supervisor :/
<jacobEo> Drew_: I don't know. Perhaps we could find another Java XML library that could be made also call for the end tags. Or some kind of wrapper-inbetween thing could be made. Or you could use SAX and make your own callback thing.
<jimregan> I think I would have expected more of his opinions if I knew he wasn't actually doing any of the work
<avinesh> umm he mainly working on anusaraka
<jacobEo> Drew_: There might be other problems. The project just got stranded on the XML parse part.
<Drew_> jacobEo: Ah, ok. I'm just compiling it now
<jacobEo> Drew_: You have to run the code to see. To do that you need to have at least one language pair running on your machine
<Drew_> jacobEo: I can't find a main class in the source code, am I looking in the wrong place? :S
<jacobEo> Drew_: The Java code?
<cseong> uhm..i am interested in improving interoperability..but what formats are u referring to ?
<Drew_> jacobEo: Yeah, I loaded the java code into eclipse but it can't find a main method to compile the .java's
<jacobEo> Drew_: LTComp.java, LTExpand.java, LTProc.java
<jimregan> avinesh, yeah. So I was right when I thought he expected us to change all of apertium to suit the analyser :/
<jacobEo> Drew_: Use Netbeans if you can. It's kinda standard here in Apertium
</pre>
 
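One option raised in the log above is to find another Java XML library that also reports end tags. As far as we know, lttoolbox's compiler reads .dix files through libxml2's pull-style `xmlTextReader` interface, and the JDK's built-in StAX API (`javax.xml.stream.XMLStreamReader`, available since Java 6) is the closest Java analogue: it returns an explicit `END_ELEMENT` event for every element. The minimal sketch below (the class name `DixPullReader` is hypothetical) only lists the start and end events of a .dix fragment. One difference worth checking when porting: for an empty element such as `<s n="n"/>`, StAX reports a start and an end event, whereas libxml2's reader reports a single element node flagged as empty, so code ported line by line from C++ may need to handle that case explicitly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: pull-parse XML with StAX and record START/END events per element. */
final class DixPullReader {

    static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.START_ELEMENT) {
                    out.add("START " + reader.getLocalName());
                } else if (type == XMLStreamConstants.END_ELEMENT) {
                    // Reported for every element, including empty ones like <s/>.
                    out.add("END " + reader.getLocalName());
                }
                // Text, comments, etc. are ignored in this sketch.
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
        return out;
    }
}
```

For `<e><p><l>a</l><r>a<s n="n"/></r></p></e>` it yields a START/END pair for every element, with `START s` immediately followed by `END s` for the empty `<s/>`.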
==Problems==
* Right now we have a line-for-line port of the C++ code of lttoolbox in apertium-tools/lttoolbox-java. It's NOT working.
 
* The Java code is almost line-for-line identical to the C++, aside from Java/C++ differences.

But the languages are different. For example, some C++ methods change variables of simple types (such as int) that the caller passes by reference. In Java, simple-type variables can only be passed by value, so the caller's variable is not changed. That sort of thing needs to be sorted out.
 
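The pass-by-reference issue described above is the kind of bug a line-for-line port introduces silently. The sketch below uses a made-up helper, `readDigits`, standing in for a C++ method that takes an `int &pos` and advances it; none of these names come from lttoolbox. A literal Java port with a plain `int` parameter compiles but leaves the caller's position unchanged; wrapping the value in a small mutable holder (here a hypothetical `IntRef`) restores the C++ behaviour. Returning the new position, or passing a one-element `int[]`, are common alternatives.

```java
/** Hypothetical illustration of porting a C++ int& parameter to Java. */
final class PassByReferenceSketch {
    // Hypothetical C++ original: int readDigits(std::wstring const &s, int &pos);

    /** One-field mutable holder standing in for int&. */
    static final class IntRef {
        int value;

        IntRef(int value) {
            this.value = value;
        }
    }

    /** Reads ASCII digits from pos.value onwards and advances pos past them. */
    static int readDigits(String s, IntRef pos) {
        int result = 0;
        while (pos.value < s.length() && isAsciiDigit(s.charAt(pos.value))) {
            result = result * 10 + (s.charAt(pos.value) - '0');
            pos.value++; // visible to the caller, like pos++ on an int&
        }
        return result;
    }

    /** Literal port with a plain int: compiles, but pos++ only changes the local copy. */
    static int readDigitsBroken(String s, int pos) {
        int result = 0;
        while (pos < s.length() && isAsciiDigit(s.charAt(pos))) {
            result = result * 10 + (s.charAt(pos) - '0');
            pos++; // lost when the method returns
        }
        return result;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

With `s = "12+34"` and `pos = new IntRef(0)`, `readDigits(s, pos)` returns 12 and leaves `pos.value` at 2. `readDigitsBroken(s, p)` with `p = 0` also returns 12, but `p` is still 0 afterwards, so the next call would re-read the same digits.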
* The biggest problem is the XML handling: in the C++ code, the XML library (libxml2) calls back into the code both when it meets a START tag and when it meets an END tag.
** The XML library used by the Java port only calls the callback method at the START tag.
** Perhaps we could find another Java XML library that can also be made to call back for END tags. Or some kind of in-between wrapper could be written. Or you could use SAX and write your own callback handling.
 
* There might be other problems. The project just got stranded on the XML parse part.
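The SAX suggestion in the list above already separates the two events: `org.xml.sax.helpers.DefaultHandler` has both a `startElement` and an `endElement` callback. The minimal sketch below (the class `LeftSideCollector` is hypothetical, not part of lttoolbox-java) uses the "boolean set on entry, unset on exit" pattern: it collects the text of every `<l>` element in a .dix fragment, which only works because the flag is cleared again in `endElement`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler: collects the text of each <l> element using start/end callbacks. */
final class LeftSideCollector extends DefaultHandler {
    private boolean inLeft = false;                  // set on <l>, cleared on </l>
    private final StringBuilder current = new StringBuilder();
    final List<String> leftSides = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("l")) {
            inLeft = true;
            current.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inLeft) {
            current.append(ch, start, length);       // may be called several times per text node
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("l")) {
            inLeft = false;
            leftSides.add(current.toString());
        }
    }

    static List<String> collect(String xml) {
        LeftSideCollector handler = new LeftSideCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.leftSides;
    }
}
```

Feeding it `<e><p><l>houses</l><r>house<s n="n"/><s n="pl"/></r></p></e>` gives `[houses]`. A real port would also have to deal with elements nested inside `<l>`, such as `<b/>`, which this sketch ignores.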
==Other notes==
 
You don't need much knowledge of MT or NLP to work on lttoolbox-java, but you do need to know C++ and Java and be able to debug both.

You only have to understand what lt-expand, lt-comp and lt-proc do with a .dix file.

<pre>
<jacobEo_> Rah1: Great. The problem right now is that the XML parser in C++ and in Java behaves differently.
<jacobEo_> Rah1: C++ parser calls back when a tag BEGINS and when it ENDS
<jacobEo_> Rah1: Java parser calls back only when a tag BEGINS
<jacobEo_> Rah1: Therefore the code for Java parsing is not working.
<jacobEo_> Rah1: Apart from that there is probably some minor things.
<Rah1> hmm, ok
<jacobEo_> Rah1: You will have to debug the C++ and the Java version and compare executions.
<jacobEo_> Rah1: The Java version is a line-for-line more or less exact port of the C++.
<Rah1> yes, I think I can do that
<jacobEo_> Rah1: But the languages are different. C++ for example has some methods where some simple type variables are changed (the reference is passed)
<jacobEo_> Rah1: But in Java simple type variables can only be passed by value, and thus the caller's value is not changed.
<jacobEo_> Rah1: That sort of things needs to be sorted out.
<Rah1> Yeah, I see the problem
<Rah1> and I think I can take care of it
<jacobEo_> So Rah1 You don't have to know much about linguistics. You only have to understand what lt-expand, lt-comp and lt-proc does with a .dix file
</pre>

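To get a feel for what lt-expand does, here is a toy Java model (not lttoolbox's algorithm) that expands one dictionary entry against a paradigm into `surface:analysis` lines, the kind of output lt-expand prints for a .dix file, e.g. `houses:house<n><pl>`. The `Ending` representation and the `house` example are hypothetical simplifications: a real .dix paradigm can nest other paradigms and restrict entries to one direction, none of which is modelled here. `List.of` needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of lt-expand-style output; not how lttoolbox actually expands a .dix. */
final class ToyExpand {

    /** One paradigm entry: the suffix added to the surface form and the tags it stands for (hypothetical). */
    static final class Ending {
        final String surfaceSuffix;
        final String tags;

        Ending(String surfaceSuffix, String tags) {
            this.surfaceSuffix = surfaceSuffix;
            this.tags = tags;
        }
    }

    /** Expands one entry (stem + lemma) against a paradigm into "surface:lemma<tags>" lines. */
    static List<String> expand(String stem, String lemma, List<Ending> paradigm) {
        List<String> lines = new ArrayList<>();
        for (Ending ending : paradigm) {
            lines.add(stem + ending.surfaceSuffix + ":" + lemma + ending.tags);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Ending> houseNoun = List.of(new Ending("", "<n><sg>"), new Ending("s", "<n><pl>"));
        for (String line : expand("house", "house", houseNoun)) {
            System.out.println(line);
        }
    }
}
```

Running `main` prints `house:house<n><sg>` and `houses:house<n><pl>`. Comparing such hand-made expectations with the real lt-expand output for a small .dix is a cheap way to learn the format before debugging the Java port.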
Revision as of 15:38, 30 March 2009