User:SilentFlame/Progress

From Apertium

Progress on Automatic_blank_handling

Current task

lttoolbox


TODO

hfst

transfer (non-chunking)

  • Test whether the current transfer.cc handles non-chunking (single-stage) transfer correctly; if it doesn't, fix it.
  • Task: open a PR to https://github.com/unhammer/apertium/ with tests showing that transfer.cc works for single-stage (non-chunking) transfer. The tests should cover both inline and block-level blank handling, and check that rules with misnumbered or missing b elements don't mess up the formatting.
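One rough way to phrase "doesn't mess up formatting" in such a test: every superblank in the input should appear exactly once in the output, in the same order. A minimal sketch of that check follows; `superblanks` and `superblanksPreserved` are hypothetical test helpers (not part of Apertium), and it assumes superblanks are written `[...]`, wordblanks `[{...}]` as on this page, and that a literal bracket is escaped with a backslash.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical test helper: collect the superblanks "[...]" in a stream,
// skipping wordblanks "[{...}]" and backslash-escaped characters.
std::vector<std::string> superblanks(const std::string& stream) {
    std::vector<std::string> blanks;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        if (stream[i] == '\\') { ++i; continue; }  // skip the escaped character
        if (stream[i] != '[') continue;
        // Find the closing ']' while honouring escapes inside the blank.
        std::size_t j = i + 1;
        while (j < stream.size() && stream[j] != ']') {
            if (stream[j] == '\\') ++j;
            ++j;
        }
        bool isWordblank = i + 1 < stream.size() && stream[i + 1] == '{';
        if (!isWordblank) blanks.push_back(stream.substr(i, j - i + 1));
        i = j;
    }
    return blanks;
}

// Formatting is considered preserved if the output contains the same
// superblanks as the input: none lost, none duplicated, none reordered.
bool superblanksPreserved(const std::string& input, const std::string& output) {
    return superblanks(input) == superblanks(output);
}
```

Plain spaces between words are not bracketed, so this check deliberately ignores them; it only guards the formatting-bearing blanks.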

postchunk

(Should be done after interchunk is complete)

  • Task: open a PR to https://github.com/unhammer/apertium/ with tests showing that blank handling works in postchunk, including tests that rules with misnumbered or missing b elements don't mess up the formatting.
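One possible strategy for misnumbered or missing b elements in postchunk, borrowing the "each blank exactly once" idea from the Interchunk section of this page: honour `pos` when it is valid and not yet used, ignore it otherwise, and flush any unused blanks at the end of the rule. This is only a sketch under those assumptions; `BlankEmitter` is a hypothetical name, and it assumes `pos` is 1-based, with blank N being the one that followed word N inside the chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: hands out the blanks a postchunk rule asks for via
// <b pos="N"/>, so that no blank is duplicated or silently dropped.
class BlankEmitter {
public:
    explicit BlankEmitter(std::vector<std::string> blanks)
        : blanks_(std::move(blanks)), used_(blanks_.size(), false) {}

    // Output for one <b pos="N"/>: out-of-range or already-used positions
    // produce nothing, so a blank is never printed twice.
    std::string at(std::size_t pos) {
        if (pos == 0 || pos > blanks_.size() || used_[pos - 1]) return "";
        used_[pos - 1] = true;
        return blanks_[pos - 1];
    }

    // Called at the end of the rule: output blanks the rule never
    // referenced, in their original order, so no formatting is lost.
    std::string rest() {
        std::string out;
        for (std::size_t k = 0; k < blanks_.size(); ++k) {
            if (!used_[k]) {
                out += blanks_[k];
                used_[k] = true;
            }
        }
        return out;
    }

private:
    std::vector<std::string> blanks_;
    std::vector<bool> used_;  // which blanks have already been output
};
```

Whether a skipped `pos` should produce nothing or a plain space is a design choice; this sketch picks nothing, so a broken rule can only move formatting, never duplicate it.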

etc

  • Ensure all other modules (e.g. cg-proc) handle the new format for inline blanks correctly
  • Work on other deformatters (mediawiki? latex?)

Done

(Some of these are from coding challenges)

deformatting prototypes

  1. Make the HTML format handler apertium-deshtml turn "<i>foo <b>bar</b></i>" into "[{<i>}]foo [{<i><b>}]bar"
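A minimal sketch of the idea behind that transformation: keep a stack of the inline tags that are currently open, and prefix each word with a wordblank listing them. This is not apertium-deshtml's code; `inlineTagsToWordblanks` is a hypothetical helper that only understands inline tags and spaces, and it assumes tags are properly nested (block-level tags, `>` inside attributes, and escaping are out of scope).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turns inline markup into "[{<tags>}]word" wordblanks.
std::string inlineTagsToWordblanks(const std::string& html) {
    std::vector<std::string> open;  // stack of currently open inline tags
    std::string out;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::size_t end = html.find('>', i);
            if (end == std::string::npos) {  // unterminated tag: copy the rest as-is
                out += html.substr(i);
                break;
            }
            std::string tag = html.substr(i, end - i + 1);
            if (tag[1] == '/') {
                assert(!open.empty() && "closing tag without an open tag");
                if (!open.empty()) open.pop_back();  // assumes tags nest properly
            } else {
                open.push_back(tag);
            }
            i = end + 1;
        } else if (html[i] == ' ') {
            out += ' ';
            ++i;
        } else {
            // A word runs up to the next space or tag.
            std::size_t end = html.find_first_of(" <", i);
            if (end == std::string::npos) end = html.size();
            if (!open.empty()) {  // prefix the word with its wordblank
                out += "[{";
                for (const auto& t : open) out += t;
                out += "}]";
            }
            out += html.substr(i, end - i);
            i = end;
        }
    }
    return out;
}
```

Called on "<i>foo <b>bar</b></i>", this produces "[{<i>}]foo [{<i><b>}]bar", matching the target output in the task.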

pretransfer

transfer (chunker)

  1. Fix a memory bug
    • Uncommenting the line "// delete[] format;" at apertium/transfer.cc:1259, in the blank-handling branch, leads to a double free. Find out why, and make sure memory is released correctly.
    • Install valgrind from your package manager or http://valgrind.org/, rebuild with -O0 -g3, then run valgrind -v --leak-check=full apertium/apertium-transfer and read the output.
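For reference, here is one common way a double free arises and how giving the buffer a single owner avoids it. This is an illustration only, not the actual cause in transfer.cc (that code isn't shown here); `FormatHolder`, `makeFormat` and `storeFormat` are hypothetical names, and the sketch needs C++14 for `std::make_unique`.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// The bug pattern: two places both believe they own `format`.
//   char* format = new char[n];
//   holder->buf = format;  // holder frees buf later
//   delete[] format;       // second free of the same memory
//
// With std::unique_ptr the buffer has exactly one owner at a time.
struct FormatHolder {
    std::unique_ptr<char[]> buf;  // sole owner; freed once, in the destructor
};

std::unique_ptr<char[]> makeFormat(const std::string& s) {
    auto p = std::make_unique<char[]>(s.size() + 1);
    std::memcpy(p.get(), s.c_str(), s.size() + 1);  // copies the trailing '\0'
    return p;
}

// Moving hands ownership to the holder and leaves `format` null,
// so nothing else can free the buffer a second time.
void storeFormat(FormatHolder& holder, std::unique_ptr<char[]>& format) {
    holder.buf = std::move(format);
}
```

If the real code has to keep raw pointers, the same reasoning applies: decide which object owns `format`, and only that owner should call delete[].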

Interchunk

Interchunk needs to ignore the "pos" attribute of b elements and output each superblank exactly once, preferably where the rule has a b element; if there are not enough b elements, the remaining superblanks are output at the end of the rule. Interchunk doesn't need to handle wordblanks, since it can't look inside chunks.

  1. Port the blank-handling changes made to transfer.cc over to interchunk.cc
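The interchunk policy described in this section can be sketched as follows, assuming the rule's output has already been evaluated into a flat list of chunk strings and b elements. `OutItem` and `applyRule` are hypothetical names, not interchunk.cc's API. The page doesn't say what a b element should do when no superblanks are left; this sketch assumes it outputs a plain space.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One element of a rule's output: literal chunk text or a <b/> element.
struct OutItem {
    bool isBlank;      // true for <b/>; any "pos" attribute is ignored
    std::string text;  // chunk text when isBlank is false
};

// Each superblank matched by the rule is output exactly once: at the next
// <b/> if there is one, otherwise appended at the end of the rule.
std::string applyRule(const std::vector<OutItem>& items,
                      const std::vector<std::string>& superblanks) {
    std::string out;
    std::size_t next = 0;  // index of the next unused superblank
    for (const auto& item : items) {
        if (!item.isBlank) {
            out += item.text;
        } else if (next < superblanks.size()) {
            out += superblanks[next++];
        } else {
            out += ' ';  // assumption: a surplus <b/> becomes a plain space
        }
    }
    for (; next < superblanks.size(); ++next) out += superblanks[next];
    return out;
}
```

Using a single counter instead of `pos` is what guarantees that reordering chunks can never duplicate or drop a superblank.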

Deformatters

Reformatters