Java port of Apertium runtime

From Apertium

Revision as of 09:00, 27 May 2010

A "Java port" of Apertium would enable use on

  • Windows,
  • J2ME/Android phones,
  • web pages (applets),
  • desktop application,
  • Java server applications.

The last two are relevant because, for example, an OpenOffice.org plugin should be platform-independent to be maintainable.

AFAIK we haven't seen anyone embedding Apertium in a desktop application. Currently Apertium can be used from a local subdirectory, but installation isn't trivial for an end user. Note that 'embedding' something isn't the same as 'using a locally installed version'.

Having a packaged, easy-to-use version of Apertium ready for embedding MT in a larger program would be very cool. Ideally this would be a self-contained Apertium JAR file, dependent only on the JRE, plus an additional JAR file per language pair.

An "embedding" approach is to use a client stub to one of our Apertium services, but there can be reasons why people prefer to have things installed locally (we don't need to repeat them here).


Still missing for a complete port of Apertium to Java are the tagger, piping/modes, interchunk/postchunk and format handling.


As I see it:

  • Interchunk/postchunk: I could do these in a very short time, so I could mentor this.
  • Tagger: The C++ code is there and some of it has already been ported to Java (lttoolbox-java parts), so someone could probably port it relatively easily if someone who understands the tagger would co-mentor that part. I'd say it's not necessary to port tagger training, just the core (bigram) tagging used during translation.
  • Piping/modes isn't too hard.
  • Format handling: I'd say it would be OK to handle only normal text.
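
To give an idea of how small the core (bigram) tagging step can be, here is a minimal sketch of textbook Viterbi decoding over a first-order HMM. It is not taken from the Apertium C++ tagger or from lttoolbox-java: the class name BigramViterbiSketch, the three tags and all probabilities are made up for illustration, and a real port would have to read its tables from the .prob file and follow the data structures of the C++ code.

```java
/** Textbook Viterbi decoding for a bigram (first-order) HMM. Illustrative only, not Apertium code. */
final class BigramViterbiSketch {

    /**
     * @param start start[t]    = P(tag t at the first word)
     * @param trans trans[p][t] = P(tag t | previous tag p)
     * @param emit  emit[i][t]  = P(word i | tag t), 0 when t is not a candidate for word i
     * @return index of the most probable tag for each word
     */
    static int[] decode(double[] start, double[][] trans, double[][] emit) {
        int n = emit.length;
        int k = start.length;
        int[] path = new int[n];
        if (n == 0) {
            return path;
        }
        double[][] best = new double[n][k]; // best log-probability of a path ending in tag t at word i
        int[][] back = new int[n][k];       // previous tag on that best path
        for (int t = 0; t < k; t++) {
            best[0][t] = Math.log(start[t]) + Math.log(emit[0][t]);
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                best[i][t] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double score = best[i - 1][p] + Math.log(trans[p][t]) + Math.log(emit[i][t]);
                    if (score > best[i][t]) {
                        best[i][t] = score;
                        back[i][t] = p;
                    }
                }
            }
        }
        // Pick the best final tag, then follow the back-pointers to the start.
        int last = 0;
        for (int t = 1; t < k; t++) {
            if (best[n - 1][t] > best[n - 1][last]) {
                last = t;
            }
        }
        path[n - 1] = last;
        for (int i = n - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }

    public static void main(String[] args) {
        String[] tags = {"det", "n", "vblex"};
        // Made-up probabilities for the two words "a test".
        double[] start = {0.6, 0.3, 0.1};
        double[][] trans = {
            {0.05, 0.90, 0.05}, // after det
            {0.10, 0.30, 0.60}, // after n
            {0.50, 0.30, 0.20}  // after vblex
        };
        double[][] emit = {
            {1.0, 0.0, 0.0},    // "a": only det is a candidate
            {0.0, 0.5, 0.5}     // "test": n or vblex
        };
        for (int t : decode(start, trans, emit)) {
            System.out.println(tags[t]);
        }
    }
}
```

With these invented numbers the ambiguous word "test" is resolved to n, because the det-to-n transition is far more likely than det-to-vblex. Training (estimating the tables) is deliberately left out, in line with the point above that only tagging during translation needs porting.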

--Jacob Nordfalk 08:33, 14 March 2010 (UTC)

Source code

Source code is here: http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/

Interested people

By #apertium IRC chat name:

Random snippets of information that someone might find useful but no one has cared to organize yet

<Unhammer> I _think_ those days are OK with me
<jacobEo> Zerocool1989: please co-ordinate with jimregan as he's also working on the code.
<jacobEo> Zerocool1989: I think you first have to try to use a lang par with interchunk and postchunk
<Zerocool1989> ok
<Zerocool1989> where can i get the interchunk code
<jacobEo> 1 sec
<jacobEo> esperanto/apertium/apertium/apertium/interchunk.cc
<jacobEo> sorry
<jacobEo> apertium dir in SVN  + /apertium/interchunk.cc
<jacobEo> but Zerocool1989 (and Kanmuri? ) you should compare with  esperanto/apertium/apertium/apertium/transfer.cc 
<jacobEo> and the Java transfer
<jacobEo> is working very differently
<jacobEo> actually, I think the best thing would be not to care about interchunk and postchunk for now, I can probably do them very fast
<jacobEo> better would be to help jimregan on the tagger work.
<jacobEo> and to sort out the modes/piping
<jacobEo> interchunk and postchunk is also only for 3+ - stage transfer. There are language pairs that only have 1 stage.
<jacobEo> so another thing that could be done is to try out an 1-stage lang pair (e.g. nn-nb) and see if you can get the parts running in Java
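
The piping/modes idea from the chat above can be sketched in Java as a list of stages, where each stage takes the output of the previous one, like a shell pipe. This is only a sketch under assumptions: the class ModePipelineSketch and its methods are hypothetical, the stage list is simplified, and each stage is a placeholder that appends its own name instead of calling the real analyser, tagger or transfer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a mode as an in-process pipeline of stages. */
final class ModePipelineSketch {

    /** Placeholder stage: appends its name so the order of execution is visible. */
    static UnaryOperator<String> stage(String name) {
        return text -> text + "|" + name;
    }

    /** Interchunk and postchunk are only added for language pairs with multi-stage transfer. */
    static List<UnaryOperator<String>> buildStages(boolean multiStageTransfer) {
        List<UnaryOperator<String>> stages = new ArrayList<>();
        stages.add(stage("analyse"));
        stages.add(stage("tag"));
        stages.add(stage("transfer"));
        if (multiStageTransfer) {
            stages.add(stage("interchunk"));
            stages.add(stage("postchunk"));
        }
        stages.add(stage("generate"));
        return stages;
    }

    /** Feeds the output of each stage into the next one. */
    static String run(List<UnaryOperator<String>> stages, String input) {
        String text = input;
        for (UnaryOperator<String> stage : stages) {
            text = stage.apply(text);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(run(buildStages(false), "text")); // text|analyse|tag|transfer|generate
        System.out.println(run(buildStages(true), "text"));
    }
}
```

A 1-stage pair would use buildStages(false), so the rest of the chain can be tried in Java before interchunk and postchunk exist.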

See also this thread: [1]


Debugging the C++ tagger

We should first debug the C++ version step by step, to see what happens:

echo "^this/this<det><dem><sg>/this<prn><tn><mf><sg>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$ ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^./.<sent>$" | ./apertium-tagger -g ../../lttoolbox-java/testdata/en-es.prob
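
For reading the input of the command above while debugging, it can help to split it into lexical units: each unit sits between ^ and $, with the surface form first and the candidate analyses after it, separated by /. The following Java sketch does just that. It is an assumption-laden helper, not part of lttoolbox-java: the class StreamFormatSketch is hypothetical, and it deliberately ignores backslash escapes and anything else the real stream format allows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that splits tagger input into lexical units. */
final class StreamFormatSketch {

    /** One lexical unit: a surface form plus its candidate analyses. */
    static final class LexicalUnit {
        final String surface;
        final List<String> analyses;

        LexicalUnit(String surface, List<String> analyses) {
            this.surface = surface;
            this.analyses = analyses;
        }
    }

    // Matches ^...$ units; backslash escapes are NOT handled in this sketch.
    private static final Pattern UNIT = Pattern.compile("\\^([^$]*)\\$");

    static List<LexicalUnit> parse(String stream) {
        List<LexicalUnit> units = new ArrayList<>();
        Matcher m = UNIT.matcher(stream);
        while (m.find()) {
            String[] parts = m.group(1).split("/");
            List<String> analyses = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            units.add(new LexicalUnit(parts[0], analyses));
        }
        return units;
    }

    public static void main(String[] args) {
        String input = "^this/this<det><dem><sg>/this<prn><tn><mf><sg>$ ^is/be<vbser><pri><p3><sg>$ "
                + "^a/a<det><ind><sg>$ ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^./.<sent>$";
        for (LexicalUnit unit : parse(input)) {
            System.out.println(unit.surface + " has " + unit.analyses.size() + " analyses");
        }
    }
}
```

On the test sentence this shows that only "this" and "test" are ambiguous, which are exactly the two words where the tagger has to make a choice.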

<jacobEo> ok chy whats the problem?
<jacobEo> chy: you must be in apertium/apertium.
<chy> Sorry I did run
<chy> ^this<prn><tn><mf><sg> ^be<vbser><pri><p3><sg>$ $^a<det><ind><sg> ^test<n><sg>$Warning (internal): kIGNORE was returned while reading a word
<chy> Word being read: .<sent>
<chy> Debug: .<sent>
<chy> $.<sent>
<jacobEo> chy: this is the Java code
<jacobEo> chy: debug the C code
<chy> ok .. we do the same for lttoolbox not lttoolbox.java
<chy> ?
<jacobEo> no, not lttoolbox dir
<jacobEo> apertium dir
<chy> ok
<jacobEo> chy: this dir: http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium/apertium/
<jacobEo> first, do 'make' in parent dir
<jacobEo> then run
<jacobEo>  ./apertium-tagger -g  ../../lttoolbox-java/testdata/en-es.prob 
<jacobEo> to run the C code

<jacobEo> echo "^this/this<det><dem><sg>/this<prn><tn><mf><sg>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$ ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^./.<sent>$" |  ./apertium-tagger -g /home/mhc/apertium/javaApertium/lttoolbox-java/testdata/en-es.prob
<chy> ^this<prn><tn><mf><sg>$ ^be<vbser><pri><p3><sg>$ ^a<det><ind><sg>$ ^test<n><sg>$^.<sent>$
<chy> sorry,,,,
<jacobEo> great, it works.
<jacobEo> chy: now, debug it :-)
<jacobEo> $ man apertium-tagger
<jacobEo> says
<jacobEo>        apertium-tagger --tagger|-g [--first|-f] PROB [--debug|-d] [INPUT [OUTPUT]]
<jacobEo> i.e. input can be in a file
<jacobEo> chy: do:
<jacobEo> echo "^this/this<det><dem><sg>/this<prn><tn><mf><sg>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$ ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^./.<sent>$" > input.txt
<jacobEo> and then
<jacobEo>  ./apertium-tagger -g /home/mhc/apertium/javaApertium/lttoolbox-java/testdata/en-es.prob input.txt