Java port of Apertium runtime
A "Java port" of Apertium would enable its use on:
- Windows,
- J2ME/Android phones,
- web pages (applets),
- desktop applications,
- Java server applications.
The last two are relevant because, for example, an OpenOffice.org plugin should be platform-independent to be maintainable.
AFAIK we haven't seen anyone embedding Apertium in a desktop application. Currently Apertium can be used from a local subdirectory, but installation isn't trivial for an end user. Note that 'embedding' something isn't the same as 'using a locally installed version'.
Having a packaged, easy-to-use version of Apertium ready for embedding MT in a larger program would be very cool. Ideally this would be a self-contained Apertium JAR file, dependent only on the JRE, plus an additional JAR file per language pair.
An alternative "embedding" approach is to use a client stub for an Apertium web service, but there can be reasons why people prefer to have things installed locally (we don't need to repeat them here).
What is still missing for a complete port of Apertium to Java is the tagger, piping/modes, interchunk/postchunk and format handling.
As I see it:
- Interchunk/postchunk: I could do these in a very short time, so I could mentor this.
- Tagger: The C++ code is there and some of it has already been ported to Java (the lttoolbox-java parts). Someone could probably port it relatively easily if someone who understands the tagger would co-mentor that part. I'd say it's not necessary to port tagger training, just the core (bigram) tagging used during translation.
- Piping/modes isn't too hard.
- Format handling: I'd say that it would be OK just to be able to handle normal text.
--Jacob Nordfalk 08:33, 14 March 2010 (UTC)
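As an illustration of the piping/modes part, here is a minimal sketch of how a mode could be represented inside a single JVM: an ordered list of stages, each taking the previous stage's output as its input. Everything in it is an assumption made for illustration. The class name `ModePipeline`, the methods `add`, `translate` and `build`, and the placeholder stages are hypothetical and are not part of lttoolbox-java or of Apertium. The stages only append their own name to the text, so the example shows the chaining and nothing about real analysis, tagging or transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: a "mode" as an in-process chain of stages. */
public class ModePipeline {
    private final List<String> names = new ArrayList<>();
    private final List<UnaryOperator<String>> stages = new ArrayList<>();

    /** Appends a stage; stages run in the order they were added. */
    public ModePipeline add(String name, UnaryOperator<String> stage) {
        names.add(name);
        stages.add(stage);
        return this;
    }

    /** Feeds the output of each stage into the next one, like a shell pipe. */
    public String translate(String text) {
        String current = text;
        for (UnaryOperator<String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    /** Returns a copy of the stage names, in execution order. */
    public List<String> stageNames() {
        return new ArrayList<>(names);
    }

    /** Placeholder stage (assumption): only marks that it has run. */
    public static UnaryOperator<String> placeholder(String name) {
        return input -> input + "|" + name;
    }

    /** Builds a 1-stage or 3-stage transfer mode out of placeholder stages. */
    public static ModePipeline build(boolean threeStageTransfer) {
        ModePipeline mode = new ModePipeline()
                .add("analysis", placeholder("analysis"))
                .add("tagger", placeholder("tagger"))
                .add("transfer", placeholder("transfer"));
        if (threeStageTransfer) {
            // Only pairs with 3-stage transfer need these two stages.
            mode.add("interchunk", placeholder("interchunk"))
                .add("postchunk", placeholder("postchunk"));
        }
        return mode.add("generation", placeholder("generation"));
    }

    public static void main(String[] args) {
        // prints "text|analysis|tagger|transfer|generation"
        System.out.println(build(false).translate("text"));
        // prints "text|analysis|tagger|transfer|interchunk|postchunk|generation"
        System.out.println(build(true).translate("text"));
    }
}
```

Passing whole strings between stages keeps the sketch short. A real port might prefer streaming between stages (for example with `Reader`/`Writer`), which is a design choice this sketch does not address.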
Source code
Source code is here: http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/
Interested people
By #apertium IRC chat name:
- chy - Danish student Moffassal Hossain - User:Chy/Gsoc 2010 Application/Java_port_of_Apertium
- Kanmuri - User:Kanmuri/GSoC_2010_Application/Java_Runtime_Port
- Zerocool1989
- keshan
- ...
Random snippets of information that someone might find useful but no one has organized yet
<Unhammer> I _think_ those days are OK with me
<jacobEo> Zerocool1989: please co-ordinate with jimregan as he's also working on the code.
<jacobEo> Zerocool1989: I think you first have to try to use a lang par with interchunk and postchunk
* sioraiocht (~tomh@unaffiliated/sioraiocht) has joined #apertium
<Zerocool1989> ok
<Zerocool1989> where can i get the interchunk code
<jacobEo> 1 sec
<jacobEo> esperanto/apertium/apertium/apertium/interchunk.cc
<jacobEo> sorry
<jacobEo> apertium dir in SVN + /apertium/interchunk.cc
<jacobEo> but Zerocool1989 (and Kanmuri? ) you should compare with esperanto/apertium/apertium/apertium/transfer.cc
<jacobEo> and the Java transfer
<jacobEo> is working very differently
<jacobEo> actually, I think the best thing would be not to care about interchunk and postchunk for now, I can probably do them very fast
<jacobEo> better would be to help jimregan on the tagger work.
<jacobEo> and to sort out the modes/piping
<jacobEo> interchunk and postchunk is also only for 3+ - stage transfer. There are language pairs that only have 1 stage.
<jacobEo> so another thing that could be done is to try out an 1-stage lang pair (e.g. nn-nb) and see if you can get the parts running in Java
See also this thread: [1]