Difference between revisions of "User:SilentFlame/updatedPipeline"

From Apertium
 
(16 intermediate revisions by 2 users not shown)
Line 1: Line 1:
For the work done at [[User:SilentFlame/Progress|Progress regarding Automatic_blank_handling]]
For the work done at [[User:SilentFlame/Progress|Progress regarding Automatic_blank_handling]]

== Tasks completed ==
===deformatting prototypes===
===pretransfer===
===transfer (chunker)===
===Interchunk===
===Deformatters===
===Reformatters===
===lttoolbox===

== Tasks left to do ==

===postchunk===
===hfst===

* Make hfst-proc correctly disperse inline blanks onto each lexical unit until the next <code><nowiki>[</nowiki></code>

===transfer (non-chunking)===



== Input and Output at different stages/modes ==
== Input and Output at different stages/modes ==
Line 8: Line 27:
</pre>
</pre>


* '''DIR''' (or '''DIRECTORY''') in the commands below stands for the directory where your language pair is compiled. These examples use the '''apertium-en-es''' language pair.
===Deformatter stage===

run '''$ make''' command in https://github.com/SilentFlame/apertium/tree/master directory.
===deformatter stage===
*run '''$ make''' command in https://github.com/SilentFlame/apertium/tree/master directory.


<pre>
<pre>
Command: $ echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml
Command: $ echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml

Output: [<div>][{<i>}]Hello[] [{<b>}]world[][][</div>]
Output: [<div>][{<i>}]Hello[] [{<b>}]world[][][</div>]
</pre>
</pre>


===lt-proc stage===
===lt-proc(automorph) stage===
after running the '''make install''' command in https://github.com/SilentFlame/lttoolbox/tree/lt-proc_testing directory (the updated module)
* after running the '''make install''' command in https://github.com/SilentFlame/lttoolbox/tree/lt-proc_testing directory (the updated module)


<pre>
<pre>
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIRECTORY/apertium-en-es/en-es.automorf.bin'
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIRECTORY/apertium-en-es/en-es.automorf.bin'

Output: [<div>][{<i>}]^Hello/Hello<ij>$[] [{<b>}]^world/world<adj>/world<n><sg>$[][][</div>]
Output: [<div>][{<i>}]^Hello/Hello<ij>$[] [{<b>}]^world/world<adj>/world<n><sg>$[][][</div>]
</pre>
</pre>


===Tagger stage===
===tagger stage===
<pre>
<pre>
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc '/home/speedy/FLAME/Apertium/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 '/home/speedy/FLAME/Apertium/apertium-en-es/en-es.prob'
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob'

Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
</pre>
</pre>


===Pretransfer stage===
===pretransfer stage===
<pre>
<pre>
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc '/home/speedy/FLAME/Apertium/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 '/home/speedy/FLAME/Apertium/apertium-en-es/en-es.prob' | apertium-pretransfer
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer

Output:[<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
Output:[<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
</pre>
</pre>


===Transfer(chunker) stage===
===transfer(chunker) stage===
<pre>
<pre>
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc '/home/speedy/FLAME/Apertium/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 '/home/speedy/FLAME/Apertium/apertium-en-es/en-es.prob' | apertium-pretransfer | apertium-transfer -n '/home/speedy/FLAME/Apertium/apertium-en-es/apertium-en-es.en-es.genitive.t1x' '/home/speedy/FLAME/Apertium/apertium-en-es/en-es.genitive.bin' 2> /dev/null
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin'

Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]
</pre>
</pre>


===lt-proc(auto-bilingual) stage===
===
<pre>
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin'


Output: [<div>][{<i>}]^Hello<ij>/Hola<ij>$[] [{<b>}]^world<adj>/mundial<adj><mf>$[][][</div>]
</pre>

===lrx-proc(auto-lexical) stage===
<pre>
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin'

Output: [<div>][{<i>}]^Hello<ij>/Hola<ij>$[] [{<b>}]^world<adj>/mundial<adj><mf>$[][][</div>]
</pre>

===transfer stage===
<pre>
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' \
| apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x' 'DIR/apertium-en-es/en-es.t1x.bin'

Output: [<div>]^default<default>{[{<i>}]^Hola<ij>$}$[] ^Adj<SA><mf><ND>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]
</pre>

===interchunk stage===
<pre>
Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x' 'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' \
| apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x' 'DIR/apertium-en-es/en-es.t1x.bin' \
| apertium-interchunk 'DIR/apertium-en-es/apertium-en-es.en-es.t2x' 'DIR/apertium-en-es/en-es.t2x.bin'

Output: [<div>]^default< default>{[{<i>}]^Hola<ij>$}$[] ^Adj<SA><mf><sg>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]
</pre>




Line 56: Line 120:
All the pretransfer tests pass here.
All the pretransfer tests pass here.


===Taransfer(chunker)===
===Transfer(chunker)===
* Task: Fixing a memory bug which raises due to uncommenting of '''apertium/transfer.cc:1259 // delete[] format;'''
* Task: Fixing a memory bug which raises due to uncommenting of '''apertium/transfer.cc:1259 // delete[] format;'''
* Category: system bug
* Category: system bug
Line 91: Line 155:
* PR: Not made a PR because of need to test some more edge cases, but the entire work is at '''https://github.com/SilentFlame/apertium/blob/master/deformatter.cpp'''.
* PR: Not made a PR because of need to test some more edge cases, but the entire work is at '''https://github.com/SilentFlame/apertium/blob/master/deformatter.cpp'''.
* Tests: https://github.com/SilentFlame/apertium/tree/master/tests/deformatter
* Tests: https://github.com/SilentFlame/apertium/tree/master/tests/deformatter
All the tests run without fail and the run command is '''$pytyhon tests/run_test.py''' inside the apertium folder.
All the tests run without fail and the run command is '''$python tests/run_test.py''' inside the apertium folder.


===Reformatters===
===Reformatters===
Line 98: Line 162:
* PR: Not made a PR because of need to test some more edge cases, but the entire work is at '''https://github.com/SilentFlame/apertium/blob/master/reformatter.cpp'''.
* PR: Not made a PR because of need to test some more edge cases, but the entire work is at '''https://github.com/SilentFlame/apertium/blob/master/reformatter.cpp'''.
* Tests: https://github.com/SilentFlame/apertium/tree/master/tests/reformatter
* Tests: https://github.com/SilentFlame/apertium/tree/master/tests/reformatter
All the tests run without fail and the run command is '''$pytyhon tests/run_test.py''' inside the apertium folder.
All the tests run without fail and the run command is '''$python tests/run_test.py''' inside the apertium folder.


===lttoolbox===
===lttoolbox===
Line 106: Line 170:
* Tests: Made a new file as per the tests present in transfer, pretransfer and other modules at https://github.com/SilentFlame/lttoolbox/tree/lt-proc_testing/tests/lt_proc
* Tests: Made a new file as per the tests present in transfer, pretransfer and other modules at https://github.com/SilentFlame/lttoolbox/tree/lt-proc_testing/tests/lt_proc
All the above tests for lt-proc pass with the updated module.
All the above tests for lt-proc pass with the updated module.

==All commits==
https://apertium.projectjj.com/gsoc2017/silentflame.html

Latest revision as of 08:02, 4 September 2017

For the work done, see Progress regarding Automatic_blank_handling.

Tasks completed

deformatting prototypes

pretransfer

transfer (chunker)

Interchunk

Deformatters

Reformatters

lttoolbox

Tasks left to do

postchunk

hfst

  • Make hfst-proc correctly disperse inline blanks onto each lexical unit until the next [

transfer (non-chunking)

Input and Output at different stages/modes

Input: "<div><i>Hello</i> <b>world</b></div>" 
This input is tested on the entire pipeline.
  • DIR (or DIRECTORY) in the commands below stands for the directory where your language pair is compiled. These examples use the apertium-en-es language pair.

deformatter stage

Command: $ echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml

Output: [<div>][{<i>}]Hello[] [{<b>}]world[][][</div>]
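To make the deformatter output above easier to read, here is a minimal Python sketch that splits such a stream into blanks and plain text. The function name split_blanks and the kind labels ("super", "inline", "empty", "text") are illustrative names chosen for this sketch, not part of Apertium, and escaped brackets are deliberately ignored.

```python
import re

# One blank is "[...]" with no nested square brackets.
# Assumption: backslash-escaped brackets are not handled in this sketch.
BLANK = re.compile(r"\[([^\[\]]*)\]")


def split_blanks(stream):
    """Return (kind, text) pairs; kind is 'super', 'inline', 'empty' or 'text'."""
    tokens = []
    pos = 0
    for match in BLANK.finditer(stream):
        if match.start() > pos:
            # Anything between two blanks is ordinary text.
            tokens.append(("text", stream[pos:match.start()]))
        body = match.group(1)
        if body == "":
            kind = "empty"      # "[]"
        elif body.startswith("{") and body.endswith("}"):
            kind = "inline"     # "[{<i>}]"
        else:
            kind = "super"      # "[<div>]"
        tokens.append((kind, body))
        pos = match.end()
    if pos < len(stream):
        tokens.append(("text", stream[pos:]))
    return tokens


tokens = split_blanks("[<div>][{<i>}]Hello[] [{<b>}]world[][][</div>]")
for kind, text in tokens:
    print(kind, repr(text))
```

Joining the tokens back together (wrapping every non-text token in square brackets) reproduces the original stream, which is a quick way to check that nothing was dropped.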

lt-proc(automorph) stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIRECTORY/apertium-en-es/en-es.automorf.bin'

Output: [<div>][{<i>}]^Hello/Hello<ij>$[] [{<b>}]^world/world<adj>/world<n><sg>$[][][</div>]
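The analyser output above puts each word in a lexical unit of the form ^surface/analysis1/analysis2$. As a rough illustration, the following Python sketch pulls out the surface form and its analyses. The helper name parse_lexical_units is hypothetical, and the sketch assumes no escaped "^", "$" or "/" characters appear in the stream.

```python
import re

# A lexical unit runs from "^" to the next "$".
# Assumption: escaped delimiters are not handled in this sketch.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def parse_lexical_units(stream):
    """Return a list of (surface_form, [analyses]) for each ^...$ unit."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface, *analyses = match.group(1).split("/")
        units.append((surface, analyses))
    return units


sample = ("[<div>][{<i>}]^Hello/Hello<ij>$[] "
          "[{<b>}]^world/world<adj>/world<n><sg>$[][][</div>]")
for surface, analyses in parse_lexical_units(sample):
    print(surface, analyses)
```

Here "world" carries two analyses (adjective and singular noun); the tagger stage that follows keeps only one of them.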

tagger stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob'

Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]

pretransfer stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer

Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]

transfer(chunker) stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin'

Output: [<div>][{<i>}]^Hello<ij>$[] [{<b>}]^world<adj>$[][][</div>]

lt-proc(auto-bilingual) stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin'

Output: [<div>][{<i>}]^Hello<ij>/Hola<ij>$[] [{<b>}]^world<adj>/mundial<adj><mf>$[][][</div>]

lrx-proc(auto-lexical) stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' 

Output: [<div>][{<i>}]^Hello<ij>/Hola<ij>$[] [{<b>}]^world<adj>/mundial<adj><mf>$[][][</div>]

transfer stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' \
| apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x'  'DIR/apertium-en-es/en-es.t1x.bin'

Output: [<div>]^default<default>{[{<i>}]^Hola<ij>$}$[] ^Adj<SA><mf><ND>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]

interchunk stage

Command: echo "<div><i>Hello</i> <b>world</b></div>" |./deshtml | lt-proc 'DIR/apertium-en-es/en-es.automorf.bin' | apertium-tagger -g $2 'DIR/apertium-en-es/en-es.prob' \
| apertium-pretransfer | apertium-transfer -n 'DIR/apertium-en-es/apertium-en-es.en-es.genitive.t1x'  'DIR/apertium-en-es/en-es.genitive.bin' \
| lt-proc -b 'DIR/apertium-en-es/en-es.autobil.bin' | lrx-proc -m 'DIR/apertium-en-es/en-es.autolex.bin' \
| apertium-transfer -b 'DIR/apertium-en-es/apertium-en-es.en-es.t1x'  'DIR/apertium-en-es/en-es.t1x.bin' \
| apertium-interchunk 'DIR/apertium-en-es/apertium-en-es.en-es.t2x'  'DIR/apertium-en-es/en-es.t2x.bin'

Output: [<div>]^default< default>{[{<i>}]^Hola<ij>$}$[] ^Adj<SA><mf><sg>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]
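One way to read the stage outputs on this page is that the sequence of blanks should survive every stage unchanged. The Python sketch below checks that for the deformatter and interchunk outputs copied from this page. The helper name blank_sequence is made up for this illustration, and escaped brackets are not handled.

```python
import re


def blank_sequence(stream):
    """Return every "[...]" blank in the stream, in order of appearance."""
    # Assumption: blanks contain no nested or escaped square brackets.
    return re.findall(r"\[[^\[\]]*\]", stream)


# Both strings are copied verbatim from the outputs shown on this page.
deformatted = "[<div>][{<i>}]Hello[] [{<b>}]world[][][</div>]"
interchunked = ("[<div>]^default< default>{[{<i>}]^Hola<ij>$}$[] "
                "^Adj<SA><mf><sg>{[{<b>}]^mundial<adj><2><3>$}$[][][</div>]")

if blank_sequence(deformatted) == blank_sequence(interchunked):
    print("blanks preserved:", blank_sequence(interchunked))
else:
    print("blank mismatch")
```

The comparison only looks at the blanks themselves, so it does not verify that an inline blank is still attached to the right word.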


Tasks done

Pretransfer

All the pretransfer tests pass here.

Transfer(chunker)

All the tests in https://github.com/SilentFlame/apertium-1/tree/blank-handling/tests/transfer pass with the updated transfer module.

Interchunk

  • Removing pos="1" from a "<b>" still outputs the right blank. A "freeblank" sits between chunks and is not a wordbound/inline blank, so it needs different treatment. For example, take "^SN<sg>{^cheese<n>$}$🍰^SN<sg>{^sale<n>$}$" as input, with a rule that matches those two chunks and has the action " <out> <chunk pos="1" part="whole"/> <b/> <chunk pos="2" part="whole"/> </out> ". If "<b/>" were treated as just a space, the "🍰" would be lost, which would not be a good experience for users. To retain it in the output, freeblanks between chunks are handled explicitly.
  • Task: Interchunk needed to ignore the "pos" attribute on b elements and output each superblank exactly once, preferably where the rule has a b element (if there are not enough b elements, the rest are output at the end of the rule). This module does not deal with wordblanks, since interchunk cannot look inside chunks.
  • Category: Code enhancement
  • PR: https://github.com/unhammer/apertium/pull/6
  • Tests: https://github.com/SilentFlame/apertium-1/tree/blank-handling-interchunk/tests/interchunk

All tests in https://github.com/SilentFlame/apertium-1/blob/blank-handling-interchunk/tests/interchunk/__init__.py pass with the updated interchunk module.
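The blank placement described in the Interchunk task can be sketched in Python as follows. This is only an illustration of the stated behaviour, not the code from the PR: the function apply_out and the tuple encoding of the <out> element are hypothetical, and emitting a plain space when a rule has more b elements than there are blanks is an assumption of this sketch.

```python
def apply_out(chunks, free_blanks, out_items):
    """Build the output of one rule.

    chunks      -- matched chunks, in input order
    free_blanks -- blanks found between those chunks, in input order
    out_items   -- ("chunk", pos) or ("b",) tuples mirroring <out>
    """
    pending = list(free_blanks)
    parts = []
    for item in out_items:
        if item[0] == "chunk":
            parts.append(chunks[item[1] - 1])  # pos is 1-based
        elif pending:
            # Each <b/> takes the next unused blank; any pos is ignored.
            parts.append(pending.pop(0))
        else:
            # Assumption: with no blank left, a <b/> becomes a space.
            parts.append(" ")
    # Blanks not placed by any <b/> go at the end of the rule output.
    parts.extend(pending)
    return "".join(parts)


chunks = ["^SN<sg>{^cheese<n>$}$", "^SN<sg>{^sale<n>$}$"]
with_b = apply_out(chunks, ["🍰"], [("chunk", 1), ("b",), ("chunk", 2)])
without_b = apply_out(chunks, ["🍰"], [("chunk", 2), ("chunk", 1)])
print(with_b)
print(without_b)
```

In the first call the "🍰" lands where the rule has its b element; in the second the rule has no b element, so the blank is appended after the reordered chunks and is still output exactly once.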

Deformatters

All the tests pass; run them with $ python tests/run_test.py from inside the apertium folder.

Reformatters

All the tests pass; run them with $ python tests/run_test.py from inside the apertium folder.

lttoolbox

All the above tests for lt-proc pass with the updated module.

All commits

https://apertium.projectjj.com/gsoc2017/silentflame.html