User:SilentFlame/Progress
Progress on Automatic blank handling
Current task
Interchunk
- Apply the changes made to transfer.cc to interchunk.cc
- Check out the branch:
git clone -b blank-handling https://github.com/unhammer/apertium
- Apply the diff of transfer.cc (between that branch and master) to interchunk.cc
- Try to make it compile and run, and report anything that doesn't seem to have a 1-1 correspondence between the two files
- Write tests for interchunk, like those for transfer at https://github.com/unhammer/apertium/tree/blank-handling/tests
TODO
Deformatters
- Complete the prototype HTML deformatters
- Current prototype code at https://github.com/junaidiiith/apertium / https://github.com/junaidiiith/Apertium_Code and https://github.com/SilentFlame/apertium/
- Task: Create a clean pull request to https://github.com/unhammer with the HTML deformatter and reformatter, including tests
Reformatters
- Make the reformatter turn inline blanks back into real tags
- e.g. [{<i>}]foo [{<i><b>}]bar should become <i>foo</i> <i><b>bar</b></i>
- Prototypes for this exist in https://github.com/junaidiiith/apertium / https://github.com/junaidiiith/Apertium_Code
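A minimal C++ sketch of the intended reformatter behaviour, not the prototype code. It assumes an inline blank contains only opening tags, that it applies to the text up to the next whitespace or "[", and that Apertium's backslash escaping can be ignored; everything else, including block-level blanks, is copied through unchanged. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: "<i><b class=\"x\">" -> {"<i>", "<b class=\"x\">"}
static std::vector<std::string> splitTags(const std::string& tags) {
  std::vector<std::string> out;
  std::size_t pos = 0;
  while ((pos = tags.find('<', pos)) != std::string::npos) {
    std::size_t end = tags.find('>', pos);
    if (end == std::string::npos) break;
    out.push_back(tags.substr(pos, end - pos + 1));
    pos = end + 1;
  }
  return out;
}

// "<b class=\"x\">" -> "</b>": the tag name ends at the first space or '>'.
static std::string closingTag(const std::string& open) {
  std::size_t end = open.find_first_of(" \t>", 1);
  return "</" + open.substr(1, end - 1) + ">";
}

// Turn each "[{tags}]word" into "<tags>word</tags>" (closing tags reversed).
std::string reformatInlineBlanks(const std::string& in) {
  std::string out;
  std::size_t i = 0;
  while (i < in.size()) {
    if (in.compare(i, 2, "[{") == 0) {
      std::size_t close = in.find("}]", i + 2);
      if (close == std::string::npos) { out += in.substr(i); break; }
      std::vector<std::string> tags = splitTags(in.substr(i + 2, close - i - 2));
      std::size_t wordStart = close + 2;
      std::size_t wordEnd = in.find_first_of(" \t\n[", wordStart);
      if (wordEnd == std::string::npos) wordEnd = in.size();
      for (const auto& t : tags) out += t;
      out += in.substr(wordStart, wordEnd - wordStart);
      for (auto it = tags.rbegin(); it != tags.rend(); ++it) out += closingTag(*it);
      i = wordEnd;
    } else {
      out += in[i++];  // ordinary text and block-level blanks pass through
    }
  }
  return out;
}
```

Emitting the closing tags in reverse order of the opening tags is what keeps <i><b>bar</b></i> well-nested.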
lttoolbox
- Make lt-proc correctly disperse inline blanks onto each lexical unit until the next "["
- Task: Create a pull request to https://github.com/unhammer/lttoolbox/ with tests in https://github.com/unhammer/lttoolbox/tree/master/tests/lt_proc
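A toy model of the dispersal, assuming it operates on a string that already contains analysed lexical units (^...$): the most recent inline blank is repeated before every following LU until the next blank starting with "[". This only sketches the intended output; it does not reflect lt-proc's internals, disperseInlineBlanks is a hypothetical name, and escapes are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model (assumption): repeat the current inline blank "[{...}]" before each
// LU "^...$" until the next blank that starts with '['.
std::string disperseInlineBlanks(const std::string& in) {
  std::string out;
  std::string current;    // inline blank in force ("" if none)
  bool attached = false;  // true if `current` already sits before the next LU
  std::size_t i = 0;
  while (i < in.size()) {
    if (in[i] == '[') {
      bool isInline = in.compare(i, 2, "[{") == 0;
      std::size_t close = isInline ? in.find("}]", i) : in.find(']', i);
      if (close == std::string::npos) { out += in.substr(i); break; }
      std::size_t end = close + (isInline ? 2 : 1);  // one past the final ']'
      out += in.substr(i, end - i);                  // blanks stay where they are
      if (isInline) {
        current = in.substr(i, end - i);
        attached = true;
      } else {
        current.clear();  // a block-level blank ends the span
        attached = false;
      }
      i = end;
    } else if (in[i] == '^') {
      std::size_t dollar = in.find('$', i);
      if (dollar == std::string::npos) { out += in.substr(i); break; }
      if (!current.empty() && !attached) out += current;  // repeat the inline blank
      out += in.substr(i, dollar - i + 1);
      attached = false;
      i = dollar + 1;
    } else {
      out += in[i++];
    }
  }
  return out;
}
```

For example, "[{<i>}]^foo/foo<n>$ ^bar/bar<n>$[<p>]^baz/baz<n>$" becomes "[{<i>}]^foo/foo<n>$ [{<i>}]^bar/bar<n>$[<p>]^baz/baz<n>$".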
transfer (non-chunking)
- Test whether the current transfer.cc handles non-chunking (single-stage) transfer correctly; if not, fix it
- Task: PR to https://github.com/unhammer/apertium/ with tests showing that transfer.cc works for single-stage/non-chunking transfer, covering inline vs. block-level blank handling, and tests that rules using misnumbered or missing b elements do not mess up the formatting
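One way to make "do not mess up the formatting" checkable in a test, under the assumption (not an agreed test format) that transfer should keep every block-level blank exactly once and in its original order, even when a rule reorders LUs or misnumbers its b elements. blockBlanks and preservesBlockBlanks are hypothetical helper names; inline blanks are deliberately skipped and escaped brackets are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect block-level blanks "[...]" in order, skipping inline blanks "[{...}]".
std::vector<std::string> blockBlanks(const std::string& s) {
  std::vector<std::string> out;
  std::size_t i = 0;
  while ((i = s.find('[', i)) != std::string::npos) {
    bool isInline = s.compare(i, 2, "[{") == 0;
    std::size_t close = isInline ? s.find("}]", i) : s.find(']', i);
    if (close == std::string::npos) break;
    std::size_t end = close + (isInline ? 2 : 1);
    if (!isInline) out.push_back(s.substr(i, end - i));
    i = end;
  }
  return out;
}

// Hypothetical test predicate: formatting is preserved if the input and output
// contain the same block-level blanks in the same order.
bool preservesBlockBlanks(const std::string& input, const std::string& output) {
  return blockBlanks(input) == blockBlanks(output);
}
```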
postchunk
(Should be done after interchunk is complete)
- Task: PR to https://github.com/unhammer/apertium/ including tests showing that postchunk blank handling works, and tests that rules using wrong or missing b elements do not mess up the formatting
etc
- Ensure all other modules are fine with the new format for inline blanks
- Work on other deformatters (mediawiki? latex?)
Done
(Some of these are from coding challenges)
deformatting prototypes
- Make the HTML format handler apertium-deshtml turn "<i>foo <b>bar</b></i>" into "[{<i>}]foo [{<i><b>}]bar"
- Code at https://github.com/SilentFlame/apertium/blob/master/challenge-1.cpp
- Make apertium-deshtml *not* wrap tags like <p> or <div> in {} (i.e. only wrap inline tags)
- Code at https://github.com/SilentFlame/apertium/blob/master/challenge-2.cpp
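For reference, a compact C++ sketch covering both challenges together. It is not the code in challenge-1.cpp or challenge-2.cpp, and it makes simplifying assumptions: tags are well-formed, attribute values contain no ">", tags in the middle of a word are not handled, text escaping is skipped, and the set of block-level element names (kBlockTags) is a hypothetical, incomplete list.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, incomplete list of block-level elements: these become plain
// blanks like "[<p>]" and are never wrapped in {}.
static const std::set<std::string> kBlockTags = {"p", "div", "ul", "ol", "li", "table", "h1", "h2"};

// "<b class='x'>" -> "b", "</i>" -> "i", "<br/>" -> "br"
static std::string tagName(const std::string& tag) {
  std::size_t start = (tag.size() > 1 && tag[1] == '/') ? 2 : 1;
  std::size_t end = tag.find_first_of(" \t\n/>", start);
  return tag.substr(start, end - start);
}

std::string deformatHtml(const std::string& in) {
  std::string out;
  std::vector<std::string> open;  // stack of currently open inline tags
  bool atWordStart = true;        // the next text character starts a word
  std::size_t i = 0;
  while (i < in.size()) {
    char c = in[i];
    if (c == '<') {
      std::size_t end = in.find('>', i);
      if (end == std::string::npos) { out += in.substr(i); break; }
      std::string tag = in.substr(i, end - i + 1);
      i = end + 1;
      bool closing = tag[1] == '/';
      bool selfClosing = tag[tag.size() - 2] == '/';
      std::string name = tagName(tag);
      if (kBlockTags.count(name) || selfClosing) {
        out += "[" + tag + "]";  // plain blank, not an inline blank
        atWordStart = true;
      } else if (closing) {
        if (!open.empty() && tagName(open.back()) == name) open.pop_back();
      } else {
        open.push_back(tag);  // inline: repeated before each following word
      }
    } else if (c == ' ' || c == '\t' || c == '\n') {
      out += c;
      atWordStart = true;
      ++i;
    } else {
      if (atWordStart && !open.empty()) {
        out += "[{";
        for (const auto& t : open) out += t;
        out += "}]";
      }
      out += c;
      atWordStart = false;
      ++i;
    }
  }
  return out;
}
```

Closing inline tags are dropped rather than emitted, since the reformatter is expected to rebuild them from the inline blanks.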
pretransfer
- Make pretransfer disperse tags when splitting lexical units (https://github.com/unhammer/apertium/commit/39bd7d9fa45c64586d3a9b0f1a7df89e7d007c1a), plus code cleanup:
- Fork https://github.com/unhammer/apertium, then check out and compile the master branch; then, in a different folder, run
git clone -b blank-handling https://github.com/junaidiiith/apertium
- From junaidiiith/blank-handling, copy the changes made there to apertium_pretransfer.cc into your fork of unhammer/apertium, along with the pretransfer tests
- Ensure the tests pass
- PR at https://github.com/unhammer/apertium/pull/4
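A minimal sketch of what "disperse tags when splitting" means for a +-joined lexical unit, assuming the inline blank directly precedes the LU. It is not the code from the commit: it ignores backslash escapes and the multiword (#) handling, and splitAndDisperse is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch (assumptions: no '\' escapes, no '#' multiword queues):
// "[{<i>}]^a<x>+b<y>$" -> "[{<i>}]^a<x>$ [{<i>}]^b<y>$"
std::string splitAndDisperse(const std::string& in) {
  std::string out;
  std::string lastInline;  // most recent inline blank not yet used by an LU
  std::size_t i = 0;
  while (i < in.size()) {
    if (in.compare(i, 2, "[{") == 0) {
      std::size_t close = in.find("}]", i);
      if (close == std::string::npos) { out += in.substr(i); break; }
      lastInline = in.substr(i, close + 2 - i);
      out += lastInline;
      i = close + 2;
    } else if (in[i] == '^') {
      std::size_t dollar = in.find('$', i);
      if (dollar == std::string::npos) { out += in.substr(i); break; }
      for (std::size_t j = i; j <= dollar; ++j) {
        if (in[j] == '+') out += "$ " + lastInline + "^";  // close one LU, open the next
        else out += in[j];
      }
      lastInline.clear();  // the blank belongs to this LU only
      i = dollar + 1;
    } else {
      out += in[i++];
    }
  }
  return out;
}
```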
transfer (chunker)
- Fix a memory bug
- Uncommenting the line
// delete[] format;
at apertium/transfer.cc:1259 in the blank-handling branch leads to a double free; find out why and ensure we're releasing memory correctly
- Install valgrind from your package manager or http://valgrind.org/, compile the program with -O0 -g3, then run
valgrind -v --leak-check=full apertium/apertium-transfer
and read the output
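Without knowing the actual cause, here is a hypothetical illustration of one common source of double frees in C++ and a fix: a class whose destructor calls delete[] on a raw pointer gets copied implicitly, so two objects free the same buffer. Holding the buffer in std::unique_ptr<char[]> makes such copies a compile error and frees the buffer exactly once. FormatHolder is a made-up name, not a class from transfer.cc.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical example, not code from transfer.cc. With a raw `char* format`
// plus `delete[] format` in the destructor, an implicit copy would leave two
// objects deleting the same buffer. std::unique_ptr<char[]> forbids copies,
// moves transfer ownership, and delete[] runs exactly once.
class FormatHolder {
public:
  explicit FormatHolder(const char* s)
      : format_(new char[std::strlen(s) + 1]) {
    std::strcpy(format_.get(), s);
  }
  const char* get() const { return format_.get(); }

private:
  std::unique_ptr<char[]> format_;  // sole owner of the buffer
};
```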