Difference between revisions of "Lttoolbox-java"

From Apertium

Revision as of 10:21, 30 March 2009

Notes

<jimregan> Nic Cottrell contributed a Java port of lttoolbox
<jimregan> but it needs work to finish it
<jimregan> and a test suite, in both C++ and Java
<jacobEo> It's in apertium-tools/lttoolbox-java
<jacobEo> What is in apertium-tools/lttoolbox-java right now is NOT working.
<jacobEo> It's a line-for-line port of the C++ code of lttoolbox


<jacobEo> and the great problem is the XML handling


<jimregan> it has to be binary compatible
<jimregan> and the test suite has to be in both C++ and Java, to ensure that
<jimregan> yeah, it's almost line for line identical to the C++, aside from Java/C++ differences
<jimregan> but, the binary stuff can be hard
<jacobEo> therefore, jimregan, it's not that hard.
<jimregan> all you need is one bit in the wrong place, and it's useless
<jacobEo> jimregan: Binary stuff?
<jimregan> medium, then
<jimregan> jacobEo, yeah
<jimregan> well
<jimregan> the compression stuff
<cseong> if I don't know one of the required languages, for example if C, C++ and XML are the requirements and I don't know XML, can I still choose it?
<jimregan> and the transducer
<jacobEo> jimregan: The binary stuff is _probably_ easy, as you can debug the C++ and compare variables etc
<jimregan> cseong, XML is easy to pick up
<jimregan> there are plenty of APIs available
<jacobEo> cseong: Which project are you thinking of?
<jimregan> for C++, we use libxml2


<jacobEo> Rah2: lttoolbox makes binary files out of the .dix files.
<jacobEo> Rah2: lttoolbox-java needs to at least be able to _read_ these binary files.
<jacobEo> Rah2: Pls compile lttoolbox and apertium and a language pair of your choice.
<jacobEo> Rah2: Then much more will be clear
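
The binary-compatibility requirement discussed above ("one bit in the wrong place, and it's useless") suggests a simple regression check: compile the same .dix file with the C++ tools and with the Java port, then compare the two outputs byte by byte. What follows is only a minimal sketch under that assumption. The class name BinaryCompare and the command-line usage are hypothetical, nothing here is taken from lttoolbox-java, and it knows nothing about the actual file format.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of lttoolbox or lttoolbox-java.
class BinaryCompare {

    // Returns the offset of the first differing byte, or -1 if both arrays are identical.
    // If one array is a strict prefix of the other, the length of the shorter one is returned.
    static int firstMismatch(byte[] expected, byte[] actual) {
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        return expected.length == actual.length ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java BinaryCompare <file-from-C++> <file-from-Java>");
            return;
        }
        byte[] expected = Files.readAllBytes(Paths.get(args[0]));
        byte[] actual = Files.readAllBytes(Paths.get(args[1]));
        int offset = firstMismatch(expected, actual);
        System.out.println(offset < 0 ? "identical" : "first difference at byte " + offset);
    }
}
```

Knowing the offset of the first mismatch narrows down where to set breakpoints when debugging the C++ and Java versions side by side.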


<jacobEo> Rah2: You don't need much knowledge of MT or NLP to do lttoolbox-java. But you need to know C++ and Java and be able to debug both
<Drew_> jacobEo: What was the location of lttoolbox again?
<jacobEo> http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox/
<jacobEo> http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium-tools/lttoolbox-java/
<jacobEo> "Download GNU tarball" will give a compressed archive

<jacobEo> The problem, I think, is the XML handling: The C code's library callback calls a method in the code both when it meets a START and an END tag.
<jimregan> avinesh, maybe you should say it to spectie because I'm not interested in unicode -> wx
<jimregan> not for any reason
<jacobEo> the Java XML library only calls the callback method at the START tag.
<jimregan> jacobEo, that will be necessary for chunk merging
<jimregan> we don't have it yet, but it will be necessary
<Leftmost> jimregan, I guess I'm a bit unclear as to what form the regression tests should take. Simply translations between ga and gd?
<jimregan> because when chcontent in t2 is written in chunk mode, it will be without { or }, otherwise with
<avinesh> ok got it
<jimregan> to fit the current model, that has to be a bool set and unset on entry/exit
<Drew_> jacobEo: Is it a big job to make it work with the END tag?
<avinesh> no wx right :D
<jimregan> that's it
<jimregan> avinesh, no one told me anudev is a course supervisor :/
<jacobEo> Drew_: I don't know. Perhaps we could find another Java XML library that could be made to also call back for the end tags. Or some kind of wrapper in between could be made. Or you could use SAX and make your own callback thing.
<jimregan> I think I would have expected more of his opinions if I knew he wasn't actually doing any of the work
<avinesh> umm he is mainly working on anusaraka
<jacobEo> Drew_: There might be other problems. The project just got stranded on the XML parse part.
<Drew_> jacobEo: Ah, ok. I'm just compiling it now
<jacobEo> Drew_: You have to run the code to see. To do that you need to have at least one language pair running on your machine
* vaasu (n=yt@123.176.16.43) has joined #apertium
<Drew_> jacobEo: I can't find a main class in the source code, am I looking in the wrong place? :S
<jacobEo> Drew_: The Java code?
<cseong> uhm.. I am interested in improving interoperability.. but what formats are you referring to?
<Drew_> jacobEo: Yeah, I loaded the java code into eclipse but it can't find a main method to compile the .java's
<jacobEo> Drew_: LTComp.java, LTExpand.java, LTProc.java
<jimregan> avinesh, yeah. So I was right when I thought he expected us to change all of apertium to suit the analyser :/
* abhiSri (i=AB-Alway@220.224.99.238) has joined #apertium
<jacobEo> Drew_: Use Netbeans if you can. It's kinda standard here in Apertium
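
One of the options mentioned above for the START/END tag problem is to use SAX with your own callbacks. The following is a minimal sketch, assuming the SAX parser that ships with the JDK (javax.xml.parsers), whose DefaultHandler offers both a startElement and an endElement hook. The class name TagEventDemo and the sample XML fragment are made up for illustration and are not taken from lttoolbox-java.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of lttoolbox-java.
class TagEventDemo {

    // Records one entry per SAX callback, for both opening and closing tags.
    static class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            events.add("START " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("END " + qName);
        }
    }

    // Parses an XML string and returns the recorded events in document order.
    static List<String> trace(String xml) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so that callers do not need to handle checked exceptions.
            throw new RuntimeException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        // Made-up fragment, used only to show the order of the callbacks.
        for (String event : trace("<e><p><l>cat</l><r>cat</r></p></e>")) {
            System.out.println(event);
        }
    }
}
```

An event trace like this can be compared against a similar trace printed from the C++ side to see where the two parsers diverge.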




<jacobEo_> Rah1: Great. The problem right now is that the XML parser in C++ and in Java behaves differently.
<jacobEo_> Rah1: C++ parser calls back when a tag BEGINS and when it ENDS
<jacobEo_> Rah1: Java parser calls back only when a tag BEGINS
<jacobEo_> Rah1: Therefore the code for Java parsing is not working.
<jacobEo_> Rah1: Apart from that there are probably some minor things.
<Rah1> hmm, ok
<jacobEo_> Rah1: You will have to debug the C++ and the Java version and compare executions.
<jacobEo_> Rah1: The Java version is a line-for-line more or less exact port of the C++.
<Rah1> yes, I think I can do that
<jacobEo_> Rah1: But the languages are different. C++ for example has some methods where some simple type variables are changed (the reference is passed)
<jacobEo_> Rah1: But in Java simple type variables can only be passed by value, and thus the caller's value is not changed.
<jacobEo_> Rah1: That sort of thing needs to be sorted out.
<Rah1> Yeah, I see the problem
<Rah1> and I think I can take care of it
<jacobEo_> So Rah1 You don't have to know much about linguistics. You only have to understand what lt-expand, lt-comp and lt-proc do with a .dix file
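
The pass-by-reference point above can be shown in a few lines. This is a minimal sketch, not code from lttoolbox-java: the names RefParamDemo, IntRef, advanceByValue and advanceByHolder are hypothetical. It contrasts a direct port of a C++ "int &" parameter, which silently stops updating the caller's variable, with one common workaround, a small mutable holder object.

```java
// Hypothetical demo class; not taken from lttoolbox-java.
class RefParamDemo {

    // Mutable holder standing in for a C++ "int &" parameter.
    static final class IntRef {
        int value;

        IntRef(int value) {
            this.value = value;
        }
    }

    // Direct port of something like "void advance(int &pos)":
    // the assignment only changes the method's local copy.
    static void advanceByValue(int pos) {
        pos = pos + 1;
    }

    // The holder reference is also passed by value, but caller and callee
    // share the same object, so the update is visible to the caller.
    static void advanceByHolder(IntRef pos) {
        pos.value = pos.value + 1;
    }

    public static void main(String[] args) {
        int plain = 5;
        advanceByValue(plain);
        System.out.println(plain);      // prints 5: the caller's value is unchanged

        IntRef held = new IntRef(5);
        advanceByHolder(held);
        System.out.println(held.value); // prints 6: the change is visible
    }
}
```

Returning the new value from the method is another option when only one variable is modified. The holder approach keeps the ported signature closer to the C++ original.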