User:Pmodi/GSOC 2020 proposal: Hindi-Punjabi/progress

Revision as of 20:29, 9 May 2020 by Pmodi

Progress on Automatic_blank_handling

Current task

lttoolbox


TODO

hfst

transfer (non-chunking)

  • Test whether the current transfer.cc handles non-chunking (single-stage) transfer correctly; if not, fix it
  • Task: PR to https://github.com/unhammer/apertium/ with tests showing that transfer.cc works for single-stage (non-chunking) transfer, covering inline vs. block-level blank handling, plus a test that rules using misnumbered or missing b-elements do not mess up formatting
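One possible way to state "does not mess up formatting" as a checkable property is sketched below in Python: every superblank in the input stream should appear exactly once in the output stream. This is only an illustration, not the test format used in the apertium repository. The names superblanks and formatting_preserved are hypothetical, the example streams are made up, and the check deliberately ignores the order of blanks; real tests would more likely compare exact expected output.

```python
import re
from collections import Counter

# Matches either an escaped character (skipped) or a [...] blank (captured).
# Escaped brackets such as \[ must not be mistaken for the start of a blank.
TOKEN_RE = re.compile(r"\\.|(\[(?:\\.|[^\]\\])*\])", re.DOTALL)


def superblanks(stream):
    """Return every [...] blank found in an Apertium-style stream, in order."""
    return [m.group(1) for m in TOKEN_RE.finditer(stream) if m.group(1) is not None]


def formatting_preserved(source, target):
    """True if target contains exactly the same blanks as source (order ignored)."""
    return Counter(superblanks(source)) == Counter(superblanks(target))


if __name__ == "__main__":
    src = "[<p>]^a<n>$ [<br/>]^b<adj>$[</p>]"
    reordered = "[<p>]^b<adj>$ [<br/>]^a<n>$[</p>]"
    dropped = "[<p>]^b<adj>$ ^a<n>$[</p>]"
    print(formatting_preserved(src, reordered))
    print(formatting_preserved(src, dropped))
```

A rule with a misnumbered or missing b-element would fail this check if it caused a blank to be dropped or duplicated.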

postchunk

(Should be done after interchunk is complete)

  • Task: PR to https://github.com/unhammer/apertium/ including tests that show working postchunk blank handling, in particular that rules using misnumbered or missing b-elements do not mess up formatting

etc

  • Ensure all other modules are fine with the new format for inline blanks (e.g. cg-proc)
  • Work on other deformatters (mediawiki? latex?)

Done

(Some of these are from coding challenges)

deformatting prototypes

  1. Make the HTML format handler apertium-deshtml turn "<i>foo <b>bar</b></i>" into "[{<i>}]foo [{<i><b>}]bar"
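The transformation can be prototyped in a few lines of Python. The sketch below is not the apertium-deshtml implementation. The names InlineBlankSketch, inline_blanks and INLINE_TAGS are hypothetical, and three behaviours are assumptions of the sketch: the particular set of inline tags, emitting one inline blank per text run, and wrapping all other tags in [..] superblanks. Escaping of reserved stream characters is left out.

```python
from html.parser import HTMLParser

# Hypothetical list of tags treated as inline; a real deformatter has its own.
INLINE_TAGS = {"i", "b", "em", "strong", "u"}


class InlineBlankSketch(HTMLParser):
    """Collects output where each text run is prefixed by its open inline tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack of (tag name, raw start tag text)
        self.out = []

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()
        if tag in INLINE_TAGS:
            self.open_tags.append((tag, raw))
        else:
            # Assumption: anything else becomes an ordinary superblank.
            self.out.append("[" + raw + "]")

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            # Close the innermost open tag with this name, if any.
            for idx in range(len(self.open_tags) - 1, -1, -1):
                if self.open_tags[idx][0] == tag:
                    del self.open_tags[idx]
                    break
        else:
            self.out.append("[</" + tag + ">]")

    def handle_data(self, data):
        if self.open_tags and data.strip():
            tags = "".join(raw for _, raw in self.open_tags)
            self.out.append("[{" + tags + "}]" + data)
        else:
            self.out.append(data)


def inline_blanks(html):
    parser = InlineBlankSketch()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    print(inline_blanks("<i>foo <b>bar</b></i>"))
```

Keeping the full stack of open tags in each inline blank means every word carries its complete formatting, so words can be reordered later without losing track of which tags applied to them.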

pretransfer

transfer (chunker)

  1. Fix a memory bug
    • Uncommenting "// delete[] format;" at apertium/transfer.cc:1259, in the blank handling branch, leads to a double free; find out why and ensure memory is released correctly
    • Install valgrind from your package manager or http://valgrind.org/, compile the program with -O0 -g3, run "valgrind -v --leak-check=full apertium/apertium-transfer", and read the output

Interchunk

Interchunk needs to ignore the "pos" attribute of b elements and output each superblank exactly once, preferably where the rule has a b element (if the rule has too few b elements, output the remaining superblanks at the end of the rule's output). Interchunk shouldn't have to deal with wordblanks, since it cannot look inside chunks.

  1. Apply the transfer.cc changes to interchunk.cc
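The blank-output rule described for interchunk can be sketched in Python as a queue of pending superblanks. This is an illustration of the idea only, not the interchunk.cc code. The function name render_rule_output, the ("b", pos) tuple representation of b elements, and the simplified chunk strings are all hypothetical. Falling back to a single space when a b element finds no pending superblank is also an assumption.

```python
from collections import deque


def render_rule_output(items, blanks):
    """Render a rule's output, emitting each input superblank exactly once.

    items:  chunk strings, or ("b", pos) tuples standing in for b elements.
    blanks: superblanks seen between the matched chunks, in input order.
    """
    pending = deque(blanks)
    out = []
    for item in items:
        if isinstance(item, tuple) and item[0] == "b":
            # The pos value (item[1]) is deliberately ignored: blanks are
            # handed out in input order, one per b element.
            out.append(pending.popleft() if pending else " ")
        else:
            out.append(item)
    # Too few b elements: flush whatever is left at the end of the rule.
    out.extend(pending)
    return "".join(out)


if __name__ == "__main__":
    print(render_rule_output(["^A$", ("b", 2), "^B$", ("b", 1), "^C$"],
                             ["[<br/>]", " "]))
    print(render_rule_output(["^B$", "^A$"], [" [<i>] "]))
```

Because blanks are consumed from a queue rather than looked up by position, a misnumbered pos cannot duplicate or drop a blank.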

Deformatters

Reformatters