User:SilentFlame/updatedPipeline
For the work done at Progress regarding Automatic_blank_handling
Contents
Tasks completed
deformatting prototypes
pretransfer
transfer (chunker)
Interchunk
Deformatters
Reformatters
lttoolbox
Tasks left to do
postchunk
hfst
- Make hfst-proc correctly disperse inline blanks onto each lexical unit until the next [
transfer (non-chunking)
Input and Output at different stages/modes
Input: "<div><i>Hello</i> <b>world</b></div>" Testing this input on the entire pipeline.
- DIR/DIRECTORY in the commands below is the path to the directory where your language pair is compiled. The examples here use the apertium-en-es language pair.
deformatter stage
- Run the $ make command in the https://github.com/SilentFlame/apertium/tree/master directory.
Command: $ echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml
Output: [<div>][{<i>}]Hello[] [{<b>}]world[][][</div>]
lt-proc(automorph) stage
- After running the make install command in the https://github.com/SilentFlame/lttoolbox/tree/lt-proc_testing directory (the updated module):
Command: echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml | lt-proc 'DIRECTORY/apertium-en-es/en-es.automorf.bin'
Output: [<div>][{<i>}]^Hello/Hello<ij>$[] [{<b>}]^world/world<adj>/world<n><sg>$[][][</div>]
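The stage outputs on this page mix three kinds of pieces: inline blanks such as [{<i>}], other blanks such as [<div>] or the empty [], and lexical units between ^ and $. As a reading aid, here is a minimal Python sketch that splits a stream into those pieces. The names TOKEN and tokenize are hypothetical and not part of lttoolbox or apertium, and the sketch assumes well-formed input with no escaped brackets.

```python
import re

# Hypothetical tokenizer for the stream format shown on this page.
# Assumption: no escaped "[", "]", "^" or "$" inside the pieces.
TOKEN = re.compile(
    r"(?P<inline>\[\{.*?\}\])"   # inline blank, e.g. [{<i>}]
    r"|(?P<blank>\[.*?\])"       # any other blank, e.g. [<div>] or []
    r"|(?P<lu>\^.*?\$)"          # lexical unit, e.g. ^Hello/Hello<ij>$
    r"|(?P<text>[^\[\^]+)"       # plain text between the pieces
)


def tokenize(stream):
    """Return a list of (kind, text) pairs in stream order."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(stream)]


if __name__ == "__main__":
    sample = ("[<div>][{<i>}]^Hello/Hello<ij>$[] "
              "[{<b>}]^world/world<adj>/world<n><sg>$[][][</div>]")
    for kind, text in tokenize(sample):
        print(kind, repr(text))
```

Listing the pieces this way makes it easy to check that every inline blank still sits directly in front of a lexical unit after each stage.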
tagger stage
Command: echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob'
Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
pretransfer stage
Command: echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
  | apertium-pretransfer
Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
transfer(chunker) stage
Command: echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
  | apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin'
Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
lt-proc(auto-bilingual) stage
Command: echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
  | apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin' \
  | lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin'
Output: [<div>][{<i>}]^Hello<ij>/Hola<ij>$[] [{<b>}]^world<adj>/mundial<adj><mf>$[][][</div>]
lrx-proc(auto-lexical) stage
Command: echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
  | apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin' \
  | lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin'
Output: [<div>][{<i>}]^Hello<ij>/Hola<ij>$[] [{<b>}]^world<adj>/mundial<adj><mf>$[][][</div>]
transfer stage
Command: echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
  | apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin' \
  | lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' \
  | apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x' 'DIR/apertium-en-es/en-es.t1x.bin'
Output: [<div>]^default<default>{[{<i>}]^Hola<ij>$}$[] ^Adj<SA><mf><ND>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]
interchunk stage
Command: echo "<div><i>Hello</i> <b>world</b></div>" | ./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
  | apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin' \
  | lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' \
  | apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x' 'DIR/apertium-en-es/en-es.t1x.bin' \
  | apertium-interchunk 'DIR/apertium-en-es/apertium-en-es.en-es.t2x' 'DIR/apertium-en-es/en-es.t2x.bin'
Output: [<div>]^default<default>{[{<i>}]^Hola<ij>$}$[] ^Adj<SA><mf><sg>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]
Tasks done
Pretransfer
- Task: Making pretransfer disperse tags when splitting lexical units.
- Category: Code cleanup
- PR: https://github.com/unhammer/apertium/pull/4
- Tests: https://github.com/unhammer/apertium/tree/blank-handling/tests
- Personal repo branch for all the work on pretransfer: https://github.com/SilentFlame/apertium-1/tree/blank-handling
All the pretransfer tests pass here.
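To illustrate what "dispersing" means when a lexical unit is split, here is a minimal Python sketch. It assumes that a joined unit is split at "+" and that the inline blank in front of it is repeated before every part. The function name split_lu and the example unit are hypothetical; the real apertium-pretransfer handles more cases than this sketch does.

```python
def split_lu(inline_blank, lu):
    """Split a joined lexical unit at '+' and repeat the inline blank.

    Hypothetical helper for illustration only, not apertium code.
    Assumption: no escaped '+' and no multiword queue handling.
    """
    if not (lu.startswith("^") and lu.endswith("$")):
        raise ValueError("expected a lexical unit of the form ^...$")
    parts = lu[1:-1].split("+")
    # Each part becomes its own lexical unit carrying the same inline blank.
    return " ".join(inline_blank + "^" + part + "$" for part in parts)


if __name__ == "__main__":
    print(split_lu("[{<i>}]", "^can<vaux>+not<adv>$"))
```

Without the repetition, only the first part would keep its formatting once the unit is split.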
Transfer(chunker)
- Task: Fixing a memory bug that arises when apertium/transfer.cc:1259 (// delete[] format;) is uncommented.
- Category: system bug
- PR: https://github.com/unhammer/apertium/pull/5
- Tests: https://github.com/unhammer/apertium/tree/blank-handling/tests
- Personal repo for the work on this module: https://github.com/SilentFlame/apertium-1/tree/blank-handling
All the tests in https://github.com/SilentFlame/apertium-1/tree/blank-handling/tests/transfer pass with the updated transfer module.
Interchunk
- Removing pos="1" from a <b/> element still outputs the right blank. A "freeblank" sits between chunks and is not a wordbound/inline blank, so it needs different treatment. For example, take the input "^SN<sg>{^cheese<n>$}$🍰^SN<sg>{^sale<n>$}$" and a rule that matches those two chunks with the action " <out> <chunk pos="1" part="whole"/> <b/> <chunk pos="2" part="whole"/> </out> ". If "<b/>" were treated as just a space, we would lose "🍰", which would be a poor experience for users. To retain it in the output, interchunk handles the freeblanks between chunks.
- Task: Interchunk needed to ignore the "pos" argument to b elements and output each superblank exactly once, preferably where the rule has a b element (if there are not enough b elements, the remaining superblanks are output at the end of the rule). This module does not deal with wordblanks, since interchunk cannot look inside chunks.
- Category: Code enhancing
- PR: https://github.com/unhammer/apertium/pull/6
- Tests: https://github.com/SilentFlame/apertium-1/tree/blank-handling-interchunk/tests/interchunk
All the tests in https://github.com/SilentFlame/apertium-1/blob/blank-handling-interchunk/tests/interchunk/__init__.py pass with the updated interchunk module.
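The rule described in the task (each superblank output exactly once, at a b element where possible, leftovers at the end of the rule) can be sketched in a few lines of Python. This is an illustrative model, not the interchunk code: apply_out and its argument layout are hypothetical, and emitting a plain space when a b element has no superblank left is an assumption.

```python
def apply_out(chunks, blanks, out_items):
    """Model the <out> part of an interchunk rule.

    chunks:    matched chunks, in input order
    blanks:    superblanks found between the matched chunks, in input order
    out_items: 1-based chunk positions (int) or the string "b" for <b/>
    Hypothetical helper for illustration only.
    """
    pending = list(blanks)
    result = []
    for item in out_items:
        if item == "b":
            # "pos" is ignored: use the next unused superblank.
            # Assumption: fall back to a plain space when none is left.
            result.append(pending.pop(0) if pending else " ")
        else:
            result.append(chunks[item - 1])
    # Superblanks the rule had no <b/> for go at the end of the rule.
    result.extend(pending)
    return "".join(result)


if __name__ == "__main__":
    chunks = ["^SN<sg>{^cheese<n>$}$", "^SN<sg>{^sale<n>$}$"]
    print(apply_out(chunks, ["🍰"], [1, "b", 2]))
    print(apply_out(chunks, ["🍰"], [2, 1]))
```

In both calls the "🍰" survives: in the first it lands at the b element, in the second it is appended after the reordered chunks.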
Deformatters
- Task: Completing the prototype HTML deformatter written during the coding challenge and by the previous contributors.
- Category: Code enhancement
- PR: None yet, because more edge cases need testing; the entire work is at https://github.com/SilentFlame/apertium/blob/master/deformatter.cpp.
- Tests: https://github.com/SilentFlame/apertium/tree/master/tests/deformatter
All the tests pass. To run them, use $ python tests/run_test.py inside the apertium folder.
Reformatters
- Task: Making the reformatter script able to correctly turn inline-blanks into real tags.
- Category: Code enhancement and compatibility
- PR: None yet, because more edge cases need testing; the entire work is at https://github.com/SilentFlame/apertium/blob/master/reformatter.cpp.
- Tests: https://github.com/SilentFlame/apertium/tree/master/tests/reformatter
All the tests pass. To run them, use $ python tests/run_test.py inside the apertium folder.
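As a rough picture of "turning inline blanks into real tags", the Python sketch below rewrites a stream in which the lexical units have already been replaced by plain target-language words. It is a simplification under stated assumptions, not a port of reformatter.cpp: the name reformat is hypothetical, closing tags are derived from the tag names inside the inline blank, and an inline blank is assumed to apply to the text that follows it up to the next blank.

```python
import re

# Hypothetical sketch; assumes well-formed input with no escaped brackets.
PIECE = re.compile(
    r"\[\{(?P<inline>.*?)\}\]"   # inline blank: [{<i>}]
    r"|\[(?P<blank>.*?)\]"       # other blank: [<div>] or []
    r"|(?P<text>[^\[]+)"         # plain text
)
TAG_NAME = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)")


def reformat(stream):
    out = []
    open_tags = ""                       # inline tags currently in force
    for m in PIECE.finditer(stream):
        if m.group("inline") is not None:
            open_tags = m.group("inline")
        elif m.group("blank") is not None:
            out.append(m.group("blank"))  # a blank's content is emitted as-is
            open_tags = ""                # and ends the inline span
        elif open_tags:
            names = TAG_NAME.findall(open_tags)
            closing = "".join("</" + n + ">" for n in reversed(names))
            out.append(open_tags + m.group("text") + closing)
        else:
            out.append(m.group("text"))
    return "".join(out)


if __name__ == "__main__":
    print(reformat("[<div>][{<i>}]Hola[] [{<b>}]mundial[][][</div>]"))
```

Closing the tags in reverse order keeps nested inline tags such as [{<b><i>}] properly nested in the output.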
lttoolbox
- Task: Making lt-proc correctly disperse inline blanks onto each lexical unit until the next [.
- Category: Code enhancement and functionality improvement
- PR: https://github.com/unhammer/lttoolbox/pull/2, waiting for some last-minute edits before merge.
- Tests: A new test file, modelled on the tests in transfer, pretransfer and other modules, at https://github.com/SilentFlame/lttoolbox/tree/lt-proc_testing/tests/lt_proc
All the above tests for lt-proc pass with the updated module.
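The lt-proc task ("disperse inline blanks onto each lexical unit until the next [") can be pictured with the Python sketch below. It is a toy model under stated assumptions: disperse is a hypothetical name, words are simply wrapped in ^...$ as a stand-in for morphological analysis, and the input is assumed to be well formed with no escaped brackets.

```python
import re

# Hypothetical sketch of the dispersal rule; not lttoolbox code.
PIECE = re.compile(r"\[\{.*?\}\]|\[.*?\]|\w+|[^\[\w]+")


def disperse(text):
    out = []
    current = ""                  # inline blank currently in force
    for piece in PIECE.findall(text):
        if piece.startswith("[{"):
            current = piece       # remembered, emitted before each word
        elif piece.startswith("["):
            current = ""          # the next "[" ends the inline span
            out.append(piece)
        elif re.fullmatch(r"\w+", piece):
            # Stand-in for analysis: wrap the word as a lexical unit.
            out.append(current + "^" + piece + "$")
        else:
            out.append(piece)     # spaces and punctuation pass through
    return "".join(out)


if __name__ == "__main__":
    print(disperse("[{<i>}]Hello world[] again"))
```

Here both "Hello" and "world" receive a copy of [{<i>}], while "again" comes after the closing [] and receives none.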