Difference between revisions of "Java port of Apertium runtime"
(Redirected page to Lttoolbox-java) |
|||
Line 1: | Line 1: | ||
+ | #REDIRECT [[Lttoolbox-java]] |
||
− | {{TOCD}} |
||
− | A "Java port" of Apertium would enable use on |
||
− | |||
− | * Windows, |
||
− | * J2ME/Android phones, |
||
− | * web pages (applets), |
||
− | * desktop applications, |
||
− | * Java server applications. |
||
− | |||
− | The last two are relevant because, for example, an OpenOffice.org plugin should be platform-independent to be maintainable. |
||
− | |||
− | As far as I know, we haven't seen anyone embed Apertium in a desktop application. Currently Apertium can be used from a local subdirectory, but installation isn't trivial for an end user. Note that 'embedding' something isn't the same as 'using a locally installed version'. |
||
− | |||
− | Having a packaged easy-to-use version of Apertium ready for embedding MT in a larger program would be very cool. |
||
− | Ideally this would be a self-contained Apertium JAR file, dependent only on the JRE and one additional JAR file per language pair. |
||
− | |||
− | An alternative "embedding" approach is to use a client stub for one of our [[Apertium services]], but there can be reasons why people prefer to have things installed locally (we don't need to repeat them here). |
||
− | |||
− | |||
− | Still missing for a complete port of Apertium to Java are the tagger, piping/modes, interchunk/postchunk and format handling. |
||
− | |||
− | |||
− | As I see it: |
||
− | * Interchunk/postchunk I could do in very short time, so I could mentor this |
||
− | * Tagger: The C++ code is there and some has already been ported to Java ([[lttoolbox-java]] parts). Someone could probably port it relatively easily, if someone who understands the tagger would co-mentor that part. I'd say it's not necessary to port tagger training, just the core (bigram) tagging during translation. |
||
− | * Piping/modes isn't too hard. |
||
− | * Format handling: I'd say that it would be OK just to be able to handle normal text. |
||
− | |||
− | --[[User:Jacob Nordfalk|Jacob Nordfalk]] 08:33, 14 March 2010 (UTC) |
||
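To illustrate why piping/modes is considered the easy part, here is a minimal sketch of chaining translation stages in-process instead of through shell pipes. Every name in it (`ModePipelineSketch`, `run`, `demo`, and the four placeholder stages) is hypothetical and is not taken from lttoolbox-java; the stages only append a marker so that the order of execution is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: a "mode" modelled as an ordered list of text-to-text stages.
class ModePipelineSketch {

    // Feed the output of each stage into the next one, like a shell pipe.
    static String run(List<UnaryOperator<String>> stages, String input) {
        String text = input;
        for (UnaryOperator<String> stage : stages) {
            text = stage.apply(text);
        }
        return text;
    }

    // Placeholder stages (assumptions, not real Apertium code): each one
    // only tags the text with its own name.
    static String demo() {
        List<UnaryOperator<String>> stages = new ArrayList<>();
        stages.add(s -> s + " [analysed]");
        stages.add(s -> s + " [tagged]");
        stages.add(s -> s + " [transferred]");
        stages.add(s -> s + " [generated]");
        return run(stages, "text");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a real port each stage would wrap one of the ported modules, and the list would be built from the language pair's mode definition rather than hard-coded.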
− | |||
− | == Source code == |
||
− | Source code is here: http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/lttoolbox-java/ |
||
− | |||
− | == Interested people == |
||
− | |||
− | By #apertium IRC chat name: |
||
− | |||
− | * chy - Danish student Moffassal Hossain - [[User:Chy/Gsoc 2010 Application/Java_port_of_Apertium]] |
||
− | * Kanmuri - [[User:Kanmuri/GSoC_2010_Application/Java_Runtime_Port]] |
||
− | * Zerocool1989 |
||
− | * keshan |
||
− | * ... |
||
− | |||
− | == Random snippets of information that someone might find useful but no one cared to organize yet == |
||
− | <pre> |
||
− | <Unhammer> I _think_ those days are OK with me |
||
− | <jacobEo> Zerocool1989: please co-ordinate with jimregan as he's also working on the code. |
||
− | <jacobEo> Zerocool1989: I think you first have to try to use a lang par with interchunk and postchunk |
||
− | * sioraiocht (~tomh@unaffiliated/sioraiocht) has joined #apertium |
||
− | <Zerocool1989> ok |
||
− | <Zerocool1989> where can i get the interchunk code |
||
− | <jacobEo> 1 sec |
||
− | <jacobEo> esperanto/apertium/apertium/apertium/interchunk.cc |
||
− | <jacobEo> sorry |
||
− | <jacobEo> apertium dir in SVN + /apertium/interchunk.cc |
||
− | <jacobEo> but Zerocool1989 (and Kanmuri? ) you should compare with esperanto/apertium/apertium/apertium/transfer.cc |
||
− | <jacobEo> and the Java transfer |
||
− | <jacobEo> is working very differently |
||
− | <jacobEo> actually, I think the best thing would be not to care about interchunk and postchunk for now, I can probably do them very fast |
||
− | <jacobEo> better would be to help jimregan on the tagger work. |
||
− | <jacobEo> and to sort out the modes/piping |
||
− | <jacobEo> interchunk and postchunk is also only for 3+ - stage transfer. There are language pairs that only have 1 stage. |
||
− | <jacobEo> so another thing that could be done is to try out an 1-stage lang pair (e.g. nn-nb) and see if you can get the parts running in Java |
||
− | </pre> |
||
− | |||
− | See also this thread: [http://sourceforge.net/mailarchive/message.php?msg_name=20cf28cd1003080815x56dd1969h229c3f1c7c2e81e2%40mail.gmail.com] |
||
− | |||
− | |||
− | ===Debugging the C++ tagger=== |
||
− | |||
− | We should first debug the C++ version step by step, to see what happens: |
||
− | |||
− | echo "^this/this<det><dem><sg>/this<prn><tn><mf><sg>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$ ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^./.<sent>$" | ./apertium-tagger -g ../../lttoolbox-java/testdata/en-es.prob |
||
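For anyone stepping through the tagger, it may help to see how the test input is structured: each lexical unit is written as `^surface/analysis/analysis...$`, and the tagger's job is to pick one analysis per unit. The following is a rough sketch of reading that input in Java. The class and method names (`StreamFormatSketch`, `LexicalUnit`, `parse`) are made up for illustration and are not the lttoolbox-java API, and the sketch assumes the input contains no backslash-escaped `^`, `/` or `$` characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split tagger input into surface forms and candidate analyses.
class StreamFormatSketch {

    // One lexical unit: the surface form plus every ambiguous analysis.
    static final class LexicalUnit {
        final String surface;
        final List<String> analyses;

        LexicalUnit(String surface, List<String> analyses) {
            this.surface = surface;
            this.analyses = analyses;
        }
    }

    // Matches everything between '^' and the next '$'.
    // Assumption: escaped delimiters are not handled.
    private static final Pattern UNIT = Pattern.compile("\\^([^$]*)\\$");

    static List<LexicalUnit> parse(String stream) {
        List<LexicalUnit> units = new ArrayList<>();
        Matcher m = UNIT.matcher(stream);
        while (m.find()) {
            // First field is the surface form, the rest are analyses.
            String[] parts = m.group(1).split("/");
            List<String> analyses = Arrays.asList(parts).subList(1, parts.length);
            units.add(new LexicalUnit(parts[0], analyses));
        }
        return units;
    }

    public static void main(String[] args) {
        String input = "^this/this<det><dem><sg>/this<prn><tn><mf><sg>$ "
                + "^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$ "
                + "^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^./.<sent>$";
        for (LexicalUnit u : parse(input)) {
            System.out.println(u.surface + " has " + u.analyses.size() + " candidate analyses");
        }
    }
}
```

Units with more than one analysis ("this" and "test" in the example sentence) are the ambiguous ones that the tagger has to resolve.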
− | |||
− | <pre> |
||
− | <jacobEo> ok chy whats the problem? |
||
− | <jacobEo> chy: you must be in apertium/apertium. |
||
− | <chy> Sorry I did run |
||
− | <chy> ^this<prn><tn><mf><sg> ^be<vbser><pri><p3><sg>$ $^a<det><ind><sg> ^test<n><sg>$Warning (internal): kIGNORE was returned while reading a word |
||
− | <chy> Word being read: .<sent> |
||
− | <chy> Debug: .<sent> |
||
− | <chy> $.<sent> |
||
− | <jacobEo> chy: this is the Java code |
||
− | <jacobEo> chy: debug the C code |
||
− | <chy> ok .. we do the same for lttoolbox not lttoolbox.java |
||
− | <chy> ? |
||
− | <jacobEo> no, not lttoolbox dir |
||
− | <jacobEo> apertium dir |
||
− | <chy> ok |
||
− | <jacobEo> chy: this dir: http://apertium.svn.sourceforge.net/viewvc/apertium/trunk/apertium/apertium/ |
||
− | <jacobEo> first, do 'make' in parent dir |
||
− | <jacobEo> then run |
||
− | <jacobEo> ./apertium-tagger -g ../../lttoolbox-java/testdata/en-es.prob |
||
− | <jacobEo> to run the C code |
||
− | |||
− | <jacobEo> echo "^this/this<det><dem><sg>/this<prn><tn><mf><sg>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$ ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^./.<sent>$" | ./apertium-tagger -g /home/mhc/apertium/javaApertium/lttoolbox-java/testdata/en-es.prob |
||
− | <chy> ^this<prn><tn><mf><sg>$ ^be<vbser><pri><p3><sg>$ ^a<det><ind><sg>$ ^test<n><sg>$^.<sent>$ |
||
− | <chy> sorry,,,, |
||
− | <jacobEo> great, it works. |
||
− | <jacobEo> chy: now, debug it :-) |
||
− | <jacobEo> $ man apertium-tagger |
||
− | <jacobEo> says |
||
− | <jacobEo> apertium-tagger --tagger|-g [--first|-f] PROB [--debug|-d] [INPUT [OUTPUT]] |
||
− | <jacobEo> i.e. input can be in a file |
||
− | <jacobEo> chy: do: |
||
− | <jacobEo> echo "^this/this<det><dem><sg>/this<prn><tn><mf><sg>$ ^is/be<vbser><pri><p3><sg>$ ^a/a<det><ind><sg>$ ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$^./.<sent>$" > input.txt |
||
− | <jacobEo> and then |
||
− | <jacobEo> ./apertium-tagger -g /home/mhc/apertium/javaApertium/lttoolbox-java/testdata/en-es.prob input.txt |
||
− | |||
− | </pre> |
Latest revision as of 08:47, 5 March 2012
Redirect to: