User:Pmodi/GSOC 2020 proposal: Hindi-Punjabi/progress
Revision as of 20:29, 9 May 2020 by Pmodi
Progress on Automatic_blank_handling
Current task
lttoolbox
- Make lt-proc correctly disperse inline blanks onto each lexical unit until the next [
- Task: Create a pull request to https://github.com/unhammer/lttoolbox/ with tests in https://github.com/unhammer/lttoolbox/tree/master/tests/lt_proc
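To make the intended behaviour concrete, here is a minimal Python sketch of the dispersal step. It is only an illustration, not the lt-proc implementation (which is C++): the function name `disperse_inline_blanks` is hypothetical, words are split on whitespace instead of being tokenised by a dictionary, and each word is wrapped as a bare `^word$` without real analyses. Words that are not under any inline blank are left untouched here.

```python
import re

# An inline blank such as "[{<i>}]" followed by the run of plain text it
# covers, which extends up to the next "[" (or the end of the input).
INLINE_RUN = re.compile(r"\[\{(.*?)\}\]([^\[]*)")


def disperse_inline_blanks(text):
    """Repeat each inline blank on every word it covers (toy tokenisation)."""

    def expand(match):
        tags, run = match.group(1), match.group(2)
        # Split on whitespace but keep the whitespace pieces, so the
        # original spacing between words is preserved.
        pieces = re.split(r"(\s+)", run)
        return "".join(
            piece if piece == "" or piece.isspace()
            else "[{%s}]^%s$" % (tags, piece)
            for piece in pieces
        )

    return INLINE_RUN.sub(expand, text)


print(disperse_inline_blanks("[{<i>}]foo bar [{<b>}]baz"))
```

A test in tests/lt_proc could then check that a second word following an inline blank carries the same blank as the first one.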
TODO
hfst
- Make hfst-proc correctly disperse inline blanks onto each lexical unit until the next [
- Task: Create a pull request to https://github.com/hfst/hfst/ with tests in https://github.com/hfst/hfst/tree/master/test/tools/
transfer (non-chunking)
- Test whether the current transfer.cc handles non-chunking/single-stage transfer correctly; if not, fix it
- Task: PR to https://github.com/unhammer/apertium/ with tests showing that transfer.cc works for single-stage/non-chunking transfer, covering inline vs block-level blank handling, plus a test that rules using misnumbered or missing b-elements do not mess up formatting
postchunk
(Should be done after interchunk is complete)
- Task: PR to https://github.com/unhammer/apertium/ including tests showing working postchunk blank handling, plus a test that rules using misnumbered or missing b-elements do not mess up formatting
etc
- Ensure all other modules are fine with the new format for inline blanks (e.g. cg-proc)
- Work on other deformatters (mediawiki? latex?)
Done
(Some of these are from coding challenges)
deformatting prototypes
- Make the HTML format handler apertium-deshtml turn "<i>foo <b>bar</b></i>" into "[{<i>}]foo [{<i><b>}]bar"
  - Code at https://github.com/SilentFlame/apertium/blob/master/challenge-1.cpp
- Make apertium-deshtml *not* wrap tags like <p> or <div> in {} (i.e. only for inline tags)
  - Code at https://github.com/SilentFlame/apertium/blob/master/challenge-2.cpp
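The two deformatting challenges can be illustrated with a short Python sketch. This is not the code from the challenge files (those are C++); it is a simplified illustration under stated assumptions: the `INLINE_TAGS` set and the name `deformat_html` are hypothetical, block-level tags are emitted as ordinary `[...]` superblanks, and escaping, entities and malformed nesting are ignored.

```python
import re

# Hypothetical set of tags treated as inline; the real list is a design choice.
INLINE_TAGS = {"i", "b", "em", "strong", "u", "a", "span"}

# A tag, a run of whitespace, or a run of non-space text.
TOKEN = re.compile(r"(<[^>]+>)|(\s+)|([^<\s]+)")
TAG_NAME = re.compile(r"</?\s*([A-Za-z0-9]+)")


def deformat_html(html):
    out = []
    open_inline = []  # stack of currently open inline opening tags
    for tag, space, word in TOKEN.findall(html):
        if tag:
            m = TAG_NAME.match(tag)
            name = m.group(1).lower() if m else ""
            if name not in INLINE_TAGS:
                # Block-level (or unknown) tag: plain superblank, no {}.
                out.append("[" + tag + "]")
            elif tag.startswith("</"):
                # Closing inline tag: emit nothing, just leave its scope.
                if open_inline:
                    open_inline.pop()
            else:
                open_inline.append(tag)
        elif space:
            out.append(space)
        else:
            # Every word carries the full stack of open inline tags.
            prefix = "[{" + "".join(open_inline) + "}]" if open_inline else ""
            out.append(prefix + word)
    return "".join(out)


print(deformat_html("<i>foo <b>bar</b></i>"))
print(deformat_html("<p><i>foo</i> bar</p>"))
```

Keeping the open inline tags on a stack is what lets the second word receive both <i> and <b>, while <p> and <div> never enter the stack at all.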
pretransfer
- Make pretransfer disperse tags when splitting lexical units https://github.com/unhammer/apertium/commit/39bd7d9fa45c64586d3a9b0f1a7df89e7d007c1a , code cleanup:
  - Fork https://github.com/unhammer/apertium and check out and compile the master branch
  - then in a different folder, do git clone -b blank-handling https://github.com/junaidiiith/apertium
  - from junaidiiith/blank-handling, copy over the changes that were made there to apertium_pretransfer.cc into your fork of unhammer/apertium, along with the pretransfer tests
  - ensure tests pass
  - PR at https://github.com/unhammer/apertium/pull/4
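As a rough picture of what "disperse tags when splitting lexical units" means, the following Python sketch splits a lexical unit joined with + and repeats its inline blank on every part. It is an illustration only and not the code in apertium_pretransfer.cc: the name `split_joined_units` is hypothetical, and escaped characters and multiword lemma handling are left out.

```python
import re

# An optional inline blank such as "[{<i>}]", then one lexical unit ^...$.
LU = re.compile(r"(\[\{.*?\}\])?\^(.*?)\$")


def split_joined_units(stream):
    """Split ^a+b$ into ^a$ ^b$, copying the inline blank onto each part."""

    def expand(match):
        blank = match.group(1) or ""  # no inline blank on this unit
        parts = match.group(2).split("+")
        return " ".join("%s^%s$" % (blank, part) for part in parts)

    return LU.sub(expand, stream)


print(split_joined_units("[{<i>}]^foo<n>+bar<v>$"))
```

Units that are not joined, or that carry no inline blank, pass through unchanged.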
transfer (chunker)
- Fix a memory bug
  - uncommenting apertium/transfer.cc:1259 // delete[] format; in the blank handling branch leads to a double-free; find out why and ensure we're correctly releasing memory
  - Install valgrind from your package manager or http://valgrind.org/, then compile your program with -O0 -g3, then run valgrind -v --leak-check=full apertium/apertium-transfer and read the output
Interchunk
Interchunk needs to ignore the "pos" argument to b elements, and output each superblank exactly once, preferably where the rule has a b element (if there are not enough b's, output the rest at the end of the rule). Interchunk shouldn't have to deal with wordblanks, since we can't look inside chunks when in interchunk.
- Apply changes to transfer.cc to interchunk.cc
  - Check out the branch: git clone -b blank-handling https://github.com/unhammer/apertium
  - Apply the git diff 4c7c4f8f1b..2025182991 from transfer.cc to interchunk.cc
  - Try to make it compile and run; report things that didn't seem to have a 1-1 correspondence
  - Write tests for interchunk, like those for transfer at https://github.com/unhammer/apertium/tree/blank-handling/tests
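The superblank placement described for interchunk above can be sketched in Python as follows. This is a toy model, not interchunk.cc: the marker `B`, the function `place_superblanks` and the list-based representation of a rule's output are all hypothetical. One more assumption that the notes above do not settle: when a rule has more b elements than there are superblanks, the extra b's emit a single space.

```python
B = object()  # hypothetical marker for a b element in a rule's output


def place_superblanks(rule_output, superblanks, default_blank=" "):
    """Emit each superblank exactly once, in order, ignoring any pos on b."""
    remaining = list(superblanks)
    out = []
    for item in rule_output:
        if item is B:
            # Take the next unused superblank; fall back to a plain space
            # (an assumption) when the rule has more b's than blanks.
            out.append(remaining.pop(0) if remaining else default_blank)
        else:
            out.append(item)
    # Not enough b's: the leftover superblanks go at the end of the rule.
    out.extend(remaining)
    return "".join(out)


print(place_superblanks(["^X$", B, "^Y$"], ["[1]", "[2]"]))
```

Because blanks are consumed from a queue and never indexed by position, a misnumbered b cannot duplicate or drop a superblank.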
Deformatters
- Complete prototype HTML deformatters
- Current prototype code at https://github.com/junaidiiith/apertium / https://github.com/junaidiiith/Apertium_Code and https://github.com/SilentFlame/apertium/
- Task: Create a clean pull request to https://github.com/unhammer with HTML deformatter and reformatter, including tests
Reformatters
- Make reformat turn inline-blanks back into real tags
- [{<i>}]foo [{<i><b>}]bar should become <i>foo</i> <i><b>bar</b></i>
- prototypes exist for this in https://github.com/junaidiiith/apertium / https://github.com/junaidiiith/Apertium_Code
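The reformatting example above can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions, not the prototype code in the linked repositories: the name `reformat_inline` is hypothetical, each inline blank is assumed to apply to exactly the one word that follows it, and adjacent words sharing the same tags are not merged back into a single element.

```python
import re

# "[{<i><b>}]" followed by the single word it applies to.
INLINE_WORD = re.compile(r"\[\{((?:<[^>]+>)+)\}\]([^\s\[]+)")
TAG_NAME = re.compile(r"<\s*([A-Za-z0-9]+)")


def reformat_inline(text):
    """Turn inline blanks back into opening and closing tags around each word."""

    def restore(match):
        tags, word = match.group(1), match.group(2)
        names = TAG_NAME.findall(tags)
        # Close in reverse order so the restored tags nest properly.
        closing = "".join("</%s>" % name for name in reversed(names))
        return tags + word + closing

    return INLINE_WORD.sub(restore, text)


print(reformat_inline("[{<i>}]foo [{<i><b>}]bar"))
```

Attributes on the opening tags are kept as they are, since only the tag name is needed to build the closing tag.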