User:SilentFlame/updatedPipeline

From Apertium
Revision as of 10:49, 29 August 2017 by Unhammer (talk | contribs) (Often, the "t1x" / apertium-transfer stage is called "chunker" since it *creates* the chunks for use by interchunk)

This page accompanies the work reported at Progress regarding Automatic_blank_handling.

Input and Output at different stages/modes

Input: "<div><i>Hello</i> <b>world</b></div>" 
This input is run through each stage of the pipeline below.

deformatter stage

Run the make command in the https://github.com/SilentFlame/apertium/tree/master directory.

Command: $ echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml

Output: [<div>][{<i>}]Hello[] [{<b>}]world[][][</div>]
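To make the structure of this output easier to see, the following is a minimal Python sketch that splits a deformatted stream into bracketed blanks and plain text. The function name split_blanks and the labels "inline", "blank" and "text" are hypothetical and only used for illustration. The sketch assumes that a blank of the form [{...}] is an inline blank, as in the output above, and it ignores escaped brackets.

```python
import re

# One bracketed blank: "[" ... "]". Escaped brackets are not handled in this sketch.
BLANK = re.compile(r"\[[^\]]*\]")


def split_blanks(stream):
    """Split a deformatted stream into (kind, value) tokens.

    kind is "inline" for blanks written as [{...}], "blank" for any other
    bracketed blank, and "text" for everything outside brackets.
    """
    tokens = []
    pos = 0
    for match in BLANK.finditer(stream):
        if match.start() > pos:
            tokens.append(("text", stream[pos:match.start()]))
        value = match.group(0)
        is_inline = value.startswith("[{") and value.endswith("}]")
        tokens.append(("inline" if is_inline else "blank", value))
        pos = match.end()
    if pos < len(stream):
        tokens.append(("text", stream[pos:]))
    return tokens


if __name__ == "__main__":
    for token in split_blanks("[<div>][{<i>}]Hello[] [{<b>}]world[][][</div>]"):
        print(token)
```

On the deshtml output above this yields ten tokens, with [{<i>}] and [{<b>}] labelled as inline and "Hello", " " and "world" as text.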

lt-proc(automorph) stage

After running the make install command in the https://github.com/SilentFlame/lttoolbox/tree/lt-proc_testing directory (the updated module):

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin'

Output: [<div>][{<i>}]^Hello/Hello<ij>$[] [{<b>}]^world/world<adj>/world<n><sg>$[][][</div>]
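As a reading aid for this output, here is a small Python sketch that pulls the lexical units (the ^...$ spans) out of the stream and separates the surface form from its analyses. The function name lexical_units is hypothetical, and the sketch assumes no escaped "^", "$" or "/" characters appear in the input.

```python
import re

# One lexical unit: "^" ... "$". Escaped delimiters are not handled in this sketch.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def lexical_units(stream):
    """Return a list of (surface_form, [analyses]) pairs found in the stream."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units


if __name__ == "__main__":
    output = "[<div>][{<i>}]^Hello/Hello<ij>$[] [{<b>}]^world/world<adj>/world<n><sg>$[][][</div>]"
    for surface, analyses in lexical_units(output):
        print(surface, analyses)
```

For the output above, "Hello" has the single analysis Hello<ij>, while "world" is ambiguous between world<adj> and world<n><sg>; the tagger stage that follows picks one of them.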

tagger stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob'

Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]

pretransfer stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer

Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]

transfer(chunker) stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin'

Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]

lt-proc(auto-bilingual) stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin'

Output: [<div>][{<i>}]^Hello<ij>/Hola<ij>$[] [{<b>}]^world<adj>/mundial<adj><mf>$[][][</div>]

lrx-proc(auto-lexical) stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' 

Output: [<div>][{<i>}]^Hello<ij>/Hola<ij>$[] [{<b>}]^world<adj>/mundial<adj><mf>$[][][</div>]

transfer stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' | apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x'  'DIR/apertium-en-es/en-es.t1x.bin'

Output: [<div>]^default<default>{[{<i>}]^Hola<ij>$}$[] ^Adj<SA><mf><ND>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]
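The chunks in this output have the shape ^Name<tag>...{contents}$. The following Python sketch, offered only as a reading aid, extracts the name, tags and contents of each chunk. The function name chunks is hypothetical, and the pattern assumes chunks are not nested and that no escaped delimiters occur.

```python
import re

# One chunk: "^" name, zero or more <tags>, then "{" contents "}$".
# Assumes no nesting of chunks and no escaped delimiters.
CHUNK = re.compile(r"\^([^<{]+)((?:<[^>]*>)*)\{(.*?)\}\$")
TAG = re.compile(r"<([^>]*)>")


def chunks(stream):
    """Return a list of (name, [tags], contents) triples, one per chunk."""
    return [
        (match.group(1), TAG.findall(match.group(2)), match.group(3))
        for match in CHUNK.finditer(stream)
    ]


if __name__ == "__main__":
    output = ("[<div>]^default<default>{[{<i>}]^Hola<ij>$}$[] "
              "^Adj<SA><mf><ND>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]")
    for name, tags, contents in chunks(output):
        print(name, tags, contents)
```

Applied to the output above, this gives a "default" chunk containing [{<i>}]^Hola<ij>$ and an "Adj" chunk with tags SA, mf and ND containing [{<b>}]^mundial<adj><2><3>$, which shows that the inline blanks travel inside the chunk with their word.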

interchunk stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' | apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x'  'DIR/apertium-en-es/en-es.t1x.bin' \
| apertium-interchunk 'DIR/apertium-en-es/apertium-en-es.en-es.t2x'  'DIR/apertium-en-es/en-es.t2x.bin'

Output: [<div>]^default<default>{[{<i>}]^Hola<ij>$}$[] ^Adj<SA><mf><sg>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]
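Each command in the walkthrough above extends the previous one by a single stage. As a convenience, the following Python sketch regenerates those cumulative command lines from one list of stages, so that a stage can be changed in one place. It only builds and prints the command strings and does not run anything. The function name cumulative_commands is hypothetical; DIR is the page's own placeholder, "$2" is copied as written in the tagger command, and the input is quoted with single quotes rather than the double quotes used above.

```python
import shlex

# Stage commands copied from the walkthrough; DIR is a placeholder for the
# directory that contains apertium-en-es, and "$2" is kept exactly as written.
STAGES = [
    "./deshtml",
    "lt-proc 'DIR/apertium-en-es/en-es.automorf.bin'",
    "apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob'",
    "apertium-pretransfer",
    "apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'"
    " 'DIR/apertium-en-es/en-es.genitive.bin'",
    "lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin'",
    "lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin'",
    "apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x'"
    " 'DIR/apertium-en-es/en-es.t1x.bin'",
    "apertium-interchunk 'DIR/apertium-en-es/apertium-en-es.en-es.t2x'"
    " 'DIR/apertium-en-es/en-es.t2x.bin'",
]


def cumulative_commands(text, stages=STAGES):
    """Return one shell command per stage, each piping through all stages so far."""
    source = "echo " + shlex.quote(text)
    return [" | ".join([source] + stages[:n]) for n in range(1, len(stages) + 1)]


if __name__ == "__main__":
    for command in cumulative_commands("<div><i>Hello</i> <b>world</b></div>"):
        print(command)
```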


Tasks done

Pretransfer

All the pretransfer tests pass here.

Transfer(chunker)

All the tests mentioned in https://github.com/SilentFlame/apertium-1/tree/blank-handling/tests/transfer pass with the updated transfer module.

Interchunk

  • Here, removing "pos=1" from a "<b>" still outputs the right blank. This is because a "freeblank" sits between chunks and is not a wordbound/inline blank, so it has to be treated differently. For example, take "^SN<sg>{^cheese<n>$}$🍰^SN<sg>{^sale<n>$}$" as input, with a rule that matches those two chunks and has the action " <out> <chunk pos="1" part="whole"/> <b/> <chunk pos="2" part="whole"/> </out> ". If "<b/>" were treated as just a space, the "🍰" would be lost, which would be a poor result for users. To retain it in the output, the freeblanks between chunks are handled explicitly.
  • Task: Interchunk needed to ignore the "pos" argument to b elements and output each superblank exactly once, preferably where the rule has a b element (if there are not enough b's, the rest are output at the end of the rule). This module does not deal with wordblanks, since interchunk cannot look inside chunks.
  • Category: Code enhancing
  • PR: https://github.com/unhammer/apertium/pull/6
  • Tests: https://github.com/SilentFlame/apertium-1/tree/blank-handling-interchunk/tests/interchunk
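The blank-placement rule described in the list above can be sketched in Python as follows. This is an illustration of the stated behaviour, not the code from the PR: the function name apply_out and the (kind, pos) representation of the rule's out element are hypothetical. One assumption goes beyond what is stated above: when a rule has more b elements than there are blanks to place, the sketch emits a plain space.

```python
from collections import deque


def apply_out(out_items, matched_chunks, free_blanks):
    """Build the output of one interchunk rule.

    out_items      -- list of ("chunk", pos) or ("b", pos) pairs, in rule order
    matched_chunks -- the chunks matched by the rule (pos is 1-based)
    free_blanks    -- the blanks found between the matched chunks, in order
    """
    pending = deque(free_blanks)
    parts = []
    for kind, pos in out_items:
        if kind == "chunk":
            parts.append(matched_chunks[pos - 1])
        elif kind == "b":
            # "pos" is ignored on b: use the next blank not yet output.
            # Assumption: fall back to a plain space when none are left.
            parts.append(pending.popleft() if pending else " ")
    # Not enough b elements: the remaining blanks go at the end of the rule.
    parts.extend(pending)
    return "".join(parts)


if __name__ == "__main__":
    chunks = ["^SN<sg>{^cheese<n>$}$", "^SN<sg>{^sale<n>$}$"]
    rule = [("chunk", 1), ("b", None), ("chunk", 2)]
    print(apply_out(rule, chunks, ["🍰"]))
```

With the cheese and sale example, the "🍰" is emitted exactly once at the position of the b element, whatever "pos" value the b carries. If the rule had no b element at all, the "🍰" would be appended after the second chunk instead of being dropped.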

All tests mentioned in https://github.com/SilentFlame/apertium-1/blob/blank-handling-interchunk/tests/interchunk/__init__.py pass with the updated interchunk module.

Deformatters

All the tests run without failure; the run command is $ python tests/run_test.py inside the apertium folder.

Reformatters

All the tests run without failure; the run command is $ python tests/run_test.py inside the apertium folder.

lttoolbox

All the above tests for lt-proc pass with the updated module.