Task ideas for Google Code-in/Apy pipedebug

Revision as of 12:52, 26 March 2015 by Unhammer (talk | contribs)

This task is almost done, see -r59428 and -r57945 in apy SVN.

Still TODO

  • It should translate as you type
    • No caching; just send one request per keypress, with a timeout
  • It should send a real NUL for each \0 instead of escaping it. This is still just one request, so we can send the NUL either with curl or from the HTML UI.
    • We might just want to skip apertium-deshtml and assume users can escape things on their own. Or we could let the user turn deshtml on and off.
  • It should look slightly prettier
    • Tags, lemmas and symbols should have different colours (see apertium-viewer)
  • _e in mode names turns into an underlined e in the dropdown; it should be shown as a literal _e
  • It would be nice to be able to edit each step, as in apertium-viewer, but that is probably less important.


APY endpoint

(This task is mostly done; see -r59428.)

To start on this work, first look at what Apertium-viewer does.

Now have a look at the /stats endpoint, to see a very simple example of an APY endpoint.

We already have an APY endpoint, /translate, that does full translation, but that part of the code is rather complex because it has to keep pipelines open between requests. Our new /pipedebug endpoint should not reuse those pipelines; it should open its own on every request.

Example call, where we've written a \0 in the q parameter to signify a NUL:

   curl 'http://localhost:2737/pipedebug?langpair=isl%7Ceng&q=te\0ost'
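
In an actual request the \0 placeholder has to travel as a percent-encoded NUL (%00). The following is a minimal sketch of building such a URL with Python's standard library. The helper name build_debug_url is hypothetical, and the sketch assumes the endpoint takes the langpair and q parameters shown in the example call.

```python
from urllib.parse import urlencode


def build_debug_url(langpair, text, base="http://localhost:2737/pipedebug"):
    """Return a request URL with the query percent-encoded.

    A backslash-zero typed by the user is first turned into a real NUL,
    so it reaches the server as %00 rather than as two characters.
    """
    text = text.replace("\\0", "\0")
    return base + "?" + urlencode({"langpair": langpair, "q": text})


print(build_debug_url("isl|eng", "te\\0ost"))
# prints http://localhost:2737/pipedebug?langpair=isl%7Ceng&q=te%00ost
```

The base argument can be pointed at another endpoint that accepts the same parameters.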

Example pipedebug output

   {
     "responseDetails": null,
     "responseStatus": 200,
     "responseData": {
       "output": [
         "^te/te<n><m>/te<vblex><inf>$^./.<sent><clb>$[][\n]\0^ost/ose<vblex><pp>/ost<n><m>$^./.<sent><clb>$[][\n]\0",
         "^te<n><m>$^./.<sent><clb>$[][\n]\0^ost<n><m>$^./.<sent><clb>$[][\n]\0\0",
         "^tea<n>$^./.<sent><clb>$[][\n]\0^cheese<n>$^./.<sent><clb>$[][\n]\0\0\0",
         "^Tea<n>$^./.<sent><clb>$[][\n]\0^Cheese<n>$^./.<sent><clb>$[][\n]\0\0\0\0",
         "Tea.[][\n]\0^Cheese.[][\n]\0\0\0\0\0"
       ],
       "pipeline": [
         "lt-proc -z isl-eng.automorf.bin",
         "apertium-tagger -z -g isl-eng.prob",
         "lt-proc -z isl-eng.autobil.bin",
         "apertium-transfer -z apertium-isl-eng.t1x isl-eng.t1x.bin",
         "lt-proc -z -g isl-eng.autogen.bin"
       ]
     }
   }
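
The "pipeline" and "output" arrays are parallel: the n-th output belongs to the n-th command. A small sketch of pairing them up on the client side, assuming the response shape shown above. The helper name pair_steps is hypothetical, and the sample strings are shortened for brevity.

```python
import json


def pair_steps(response_text):
    """Return a list of (command, output) tuples from a pipedebug-style reply."""
    data = json.loads(response_text)["responseData"]
    if len(data["pipeline"]) != len(data["output"]):
        raise ValueError("pipeline and output differ in length")
    return list(zip(data["pipeline"], data["output"]))


# Shortened stand-in for a real response.
sample = json.dumps({
    "responseDetails": None,
    "responseStatus": 200,
    "responseData": {
        "output": ["^te/te<n><m>$", "^te<n><m>$"],
        "pipeline": ["lt-proc -z isl-eng.automorf.bin",
                     "apertium-tagger -z -g isl-eng.prob"],
    },
})

for command, output in pair_steps(sample):
    print(command)
    print(output)
    print()
```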

You can use translate.py's parseModeFile() to get the command lines for a pair, but you can't use startPipeline(), since we need to capture the output after each step.
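
Capturing the output between steps could look roughly like the sketch below. It is not the APY implementation: run_steps is a hypothetical helper, the commands are assumed to arrive as argument lists (what parseModeFile() actually returns may differ), and the blocking subprocess.run call is used only to keep the example short. A handler in an asynchronous server would likely need a non-blocking equivalent.

```python
import subprocess
import sys


def run_steps(commands, text):
    """Run each command in turn, feeding it the previous command's output.

    Returns a list with one output string per command.
    """
    outputs = []
    data = text.encode("utf-8")
    for argv in commands:
        proc = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
        data = proc.stdout
        outputs.append(data.decode("utf-8"))
    return outputs


# Stand-in commands so the sketch runs without Apertium installed.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
exclaim = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read() + '!')"]

print(run_steps([upper, exclaim], "te"))
# prints ['TE', 'TE!']
```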

apertium-viewer.html

(This task is mostly done; see tools/apertium-viewer.html in the apy source.)

This HTML+JS page should have an input box where you can type things like "te\0ost", another input box where you can type "isl|eng", and a button. Clicking the button should make a request to http://localhost:2737/pipedebug?langpair=isl%7Ceng&q=te\0ost and present the output like this:

   lt-proc -z isl-eng.automorf.bin
   ^te/te<n><m>/te<vblex><inf>$^./.<sent><clb>$[][\n]\0^ost/ose<vblex><pp>/ost<n><m>$^./.<sent><clb>$[][\n]\0
   
   apertium-tagger -z -g isl-eng.prob
   ^te<n><m>$^./.<sent><clb>$[][\n]\0^ost<n><m>$^./.<sent><clb>$[][\n]\0\0
   
   lt-proc -z isl-eng.autobil.bin
   ^tea<n>$^./.<sent><clb>$[][\n]\0^cheese<n>$^./.<sent><clb>$[][\n]\0\0\0
   
   apertium-transfer -z apertium-isl-eng.t1x isl-eng.t1x.bin
   ^Tea<n>$^./.<sent><clb>$[][\n]\0^Cheese<n>$^./.<sent><clb>$[][\n]\0\0\0\0
   
   lt-proc -z -g isl-eng.autogen.bin
   Tea.[][\n]\0^Cheese.[][\n]\0\0\0\0\0

with some colours to make it more readable.
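
One way to approach the colouring is to split each output line into tags, symbols and words, and wrap each piece in a styled span. The sketch below shows that logic in Python only to illustrate the idea; the page itself would need the equivalent in JavaScript. The function names and CSS class names are hypothetical. The sketch does not handle backslash-escaped characters, and it does not distinguish surface forms from lemmas.

```python
import html
import re

# Order matters: a tag such as <n> is tried before the catch-all groups.
TOKEN = re.compile(
    r"(?P<tag><[^<>]*>)"        # <n>, <vblex>, ...
    r"|(?P<symbol>[\^$/])"      # stream delimiters ^ $ /
    r"|(?P<word>[^<>^$/]+)"     # surface forms, lemmas, blanks
    r"|(?P<other>.)",           # anything left over, e.g. a stray <
    re.DOTALL,
)


def classify(stream):
    """Split a stream-format string into (kind, text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(stream)]


def to_html(stream):
    """Wrap every piece in a span whose class names its kind."""
    return "".join(
        '<span class="%s">%s</span>' % (kind, html.escape(text))
        for kind, text in classify(stream)
    )


print(classify("^te<n><m>$"))
print(to_html("^te<n>$"))
```

A stylesheet can then assign one colour per class.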

Since /pipedebug is not implemented yet, your request function can call /translate with the same parameters, discard the response, and return a hardcoded string containing the JSON from #Example pipedebug output above.

You can also attach an event listener to the input box so it makes requests on typing (but with a timeout).

This should be done in plain JavaScript and HTML, with no external libraries. It has to work offline, with only a checkout of APY and the language pair in question.