User:Pmodi/GSOC 2020 proposal: Hindi-Punjabi/progress

From Apertium
Revision as of 20:29, 9 May 2020 by Pmodi

Progress on Automatic_blank_handling

Current task




transfer (non-chunking)

  • Test whether current handles non-chunking/single-stage transfer correctly; if not, fix it
  • Task: PR to with tests showing that single-stage/non-chunking transfer works with both inline and block-level blank handling, and that rules using misnumbered or missing b elements do not break formatting


(Should be done after interchunk is complete)

  • Task: PR to including tests showing that postchunk blank handling works, and that rules using misnumbered or missing b elements do not break formatting


  • Ensure all other modules are fine with the new format for inline blanks (e.g. cg-proc)
  • Work on other deformatters (mediawiki? latex?)


(Some of these are from coding challenges)

deformatting prototypes

  1. Make the HTML format handler apertium-deshtml turn "<i>foo <b>bar</b></i>" into "[{<i>}]foo [{<i><b>}]bar"
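
The transformation above can be sketched in a few lines of Python. This is only an illustration of the idea, not the apertium-deshtml implementation: the class and function names are hypothetical, every tag is treated as inline, attributes are dropped, and the blank is attached once per text run rather than per word.

```python
from html.parser import HTMLParser


class InlineBlankSketch(HTMLParser):
    """Hypothetical helper: prefixes each text run with its open tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack of currently open tag names
        self.out = []        # output fragments

    def handle_starttag(self, tag, attrs):
        # Assumption: every tag is inline; attributes are ignored here.
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened matching tag, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i]
                break

    def handle_data(self, data):
        stripped = data.lstrip()
        if not stripped or not self.open_tags:
            # Whitespace-only or untagged text passes through unchanged.
            self.out.append(data)
            return
        leading = data[:len(data) - len(stripped)]
        tags = "".join(f"<{t}>" for t in self.open_tags)
        # Keep leading whitespace outside the inline blank.
        self.out.append(f"{leading}[{{{tags}}}]{stripped}")


def deshtml_inline_sketch(html):
    parser = InlineBlankSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(deshtml_inline_sketch("<i>foo <b>bar</b></i>"))
```

The closing tags produce no output of their own: the set of open tags is restated in front of each text run, which is why "bar" carries both <i> and <b>.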


transfer (chunker)

  1. Fix a memory bug
    • uncommenting apertium/ // delete[] format; in the blank handling branch leads to a double-free – find out why and ensure we're correctly releasing memory
    • Install valgrind from your package manager, compile your program with -O0 -g3, then run valgrind -v --leak-check=full apertium/apertium-transfer and read the output


Interchunk needs to ignore the "pos" argument to b elements, and output each superblank exactly once, preferably where the rule has a b element (if there are not enough b's, output the rest at the end of the rule). Interchunk shouldn't have to deal with wordblanks, since we can't look inside chunks when in interchunk.
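
The rule described above can be sketched as follows. This is a minimal Python model, not the actual interchunk code: the function name, the ("b", pos) tuple representation of b elements, and the fallback to a single space when a rule has more b's than superblanks are all assumptions made for illustration.

```python
from collections import deque


def emit_rule_output(out_items, superblanks):
    """Hypothetical model of one rule's output.

    out_items:   chunk strings, or ("b", pos) tuples standing in for b elements
    superblanks: the superblanks matched between the rule's input chunks
    """
    pending = deque(superblanks)
    parts = []
    for item in out_items:
        if isinstance(item, tuple) and item[0] == "b":
            # The pos value in item[1] is deliberately ignored: each b
            # element just takes the next superblank not yet output.
            # Assumption: a plain space is emitted once they run out.
            parts.append(pending.popleft() if pending else " ")
        else:
            parts.append(item)
    # Not enough b elements: flush the leftovers at the end of the rule.
    parts.extend(pending)
    return "".join(parts)


# A rule that reorders three chunks but contains only one (misnumbered) b.
print(emit_rule_output(["^C$", ("b", 2), "^A$", "^B$"], ["[<p>]", "[</p>]"]))
```

Because blanks are consumed from a queue and any remainder is flushed at the end, every superblank appears exactly once in the output regardless of how the rule numbers its b elements.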

  1. Apply changes to to