From Apertium
Revision as of 20:11, 16 July 2017 by SilentFlame (talk | contribs)

Progress on Automatic_blank_handling

Current task




transfer (non-chunking)

  • Test whether current handles non-chunking/single-stage transfer correctly; if not, fix it
  • Task: PR to with tests showing that single-stage/non-chunking transfer works, covering inline vs block-level blank handling, plus a test that rules using misnumbered or missing b elements do not mess up formatting


(Should be done after interchunk is complete)

  • Task: PR to including tests showing that postchunk blank handling works, plus a test that rules using misnumbered or missing b elements do not mess up formatting


  • Ensure all other modules are fine with the new format for inline blanks (e.g. cg-proc)
  • Work on other deformatters (mediawiki? latex?)


(Some of these are from coding challenges)

deformatting prototypes

  1. Make the HTML format handler apertium-deshtml turn "<i>foo <b>bar</b></i>" into "[{<i>}]foo [{<i><b>}]bar"
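
The transformation in item 1 can be illustrated with a small sketch. This is not the apertium-deshtml code; it is a hypothetical Python model built on the standard html.parser module. The set of tags treated as inline, the choice to repeat the inline blank before every word, and the handling of text outside any inline tag (passed through unchanged) are all assumptions made for illustration. Other tags are wrapped in square brackets as ordinary superblanks.

```python
import re
from html.parser import HTMLParser

# Assumption: which tags count as "inline" is chosen here for illustration only.
INLINE_TAGS = {"a", "b", "em", "i", "span", "strong", "u"}


class InlineBlankSketch(HTMLParser):
    """Hypothetical model: turn nested inline tags into [{...}] inline blanks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_inline = []  # (tag name, raw tag text) of currently open inline tags
        self.out = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()
        if tag in INLINE_TAGS:
            self.open_inline.append((tag, raw))
        else:
            # Non-inline tags become ordinary superblanks.
            self.out.append("[" + raw + "]")

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            # Close the innermost open tag with this name, if there is one.
            for i in range(len(self.open_inline) - 1, -1, -1):
                if self.open_inline[i][0] == tag:
                    del self.open_inline[i]
                    break
        else:
            self.out.append("[</" + tag + ">]")

    def handle_data(self, data):
        if not self.open_inline:
            self.out.append(data)  # assumption: no marker outside inline tags
            return
        blank = "[{" + "".join(raw for _, raw in self.open_inline) + "}]"
        # Assumption: the inline blank is repeated in front of every word.
        self.out.append(re.sub(r"\S+", lambda m: blank + m.group(0), data))


def deshtml_inline(html):
    parser = InlineBlankSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(deshtml_inline("<i>foo <b>bar</b></i>"))  # [{<i>}]foo [{<i><b>}]bar
```

Because each word carries the full stack of enclosing inline tags, later stages can reorder or drop words without needing matching open and close tags in the stream.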


transfer (chunker)

  1. Fix a memory bug
    • Uncommenting apertium/ // delete[] format; in the blank handling branch leads to a double free. Find out why, and ensure we're releasing memory correctly
    • Install valgrind from your package manager or, then compile your program with -O0 -g3, then run valgrind -v --leak-check=full apertium/apertium-transfer and read the output


Interchunk needs to ignore the "pos" attribute of b elements and output each superblank exactly once, preferably where the rule has a b element. If the rule does not have enough b elements, the remaining superblanks are output at the end of the rule. Interchunk shouldn't have to deal with wordblanks, since we can't look inside chunks when in interchunk.
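
The placement policy described above can be sketched as follows. This is a hypothetical Python model, not the interchunk implementation: the function name, the ("chunk", text) / ("b", pos) item representation, and the placeholder chunk strings are invented for illustration. What to emit when a rule has more b elements than there are superblanks is not stated above; the sketch assumes a single space.

```python
from collections import deque


def place_superblanks(rule_output, blanks):
    """Lay out the superblanks of one matched rule.

    rule_output: list of ("chunk", text) or ("b", pos) items, in output order.
    blanks: superblanks read between the matched chunks, in input order.
    """
    pending = deque(blanks)
    out = []
    for kind, value in rule_output:
        if kind == "b":
            # The pos value is deliberately ignored: blanks are used in input order.
            # Assumption: a b element with no superblank left yields a plain space.
            out.append(pending.popleft() if pending else " ")
        else:
            out.append(value)
    # Not enough b elements: the remaining superblanks go at the end of the rule.
    out.extend(pending)
    return "".join(out)


# A rule that swaps two chunks and has one b element with a misnumbered pos.
print(place_superblanks(
    [("chunk", "^adj$"), ("b", 7), ("chunk", "^n$")],
    ["[<br/>]"],
))  # ^adj$[<br/>]^n$
```

Consuming the blanks from a queue means each one is output exactly once, whatever the rule says about pos, so a misnumbered or missing b element cannot duplicate or lose formatting.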

  1. Apply changes to to