User:Khannatanmai/GSoC2020Progress
Latest revision as of 04:29, 9 September 2020
Work Plan: http://wiki.apertium.org/wiki/User:Khannatanmai/GSoC2020Proposal_Trimming#Work_Plan
To Do
Phase 3 (July 31 - August 24)
All Done :)
Ongoing
Phase 3 (July 31 - August 24)
All Done :)
Completed
Application Review Period (March 31 - May 3)
- Compile all the discussion about the modification to the stream format (in talk pages)
- Create dedicated page for the development of the new stream format: User:Khannatanmai/New_Apertium_stream_format
- Go through the documentation again and read the wiki pages for each module, to make sure I haven't missed anything about how Apertium works overall, since I've never actually built a language pair.
- Document the modification to the Apertium stream format (see talk pages for relevant discussion): http://wiki.apertium.org/wiki/User:Khannatanmai/New_Apertium_stream_format
- Document how much change is needed in which parsers and what the change is
- Proof of Concept for the new format
- Document the change needed in tokeniser, bidix lookup, and generation to include surface form: User:Khannatanmai/Eliminating_Dictionary_Trimming
- Document all the proposed benefits of including secondary information
Community Bonding Period (May 4 - June 1)
- Create a suitable development and debugging environment for the pipe (Xcode)
- Modify transfer to pass secondary tags ahead. Updates can be found here.
- Modify generator to ignore secondary tags while matching
- Deal with MLUs in generator, and special characters in sectags, etc.
- Analyse the code of the parsers of the modules
- Fix transfer behaviour with LUs with invariable parts and MLUs
- Need to deal with secondary tags appearing before lemq if lemq comes from a variable
- Wiki for all features being implemented for secondary tags here.
- Test case: lemq comes from a variable
- Create test t1x file which covers all test cases.
- Run thorough regression tests on eng-spa (multi-stage transfer) and spa-cat (single-stage transfer)
- Manually insert secondary tags in the stream and test if they reach the generator
- Prepare an alternate proposal to secondary tags: User:Khannatanmai/Alternate_stream_modification
Phase 1 (June 1 - July 4)
- Deal with the community's objections to secondary tags.
- Come up with a method everyone is happy with
- Analyse the needs of Wikimedia's markup handling.
- New page for the development of word bound blanks: User:Khannatanmai/Wordbound_blanks
- Add tests and examples which have merging, splitting, deletions, insertions, etc.
- Changed formalism so that wordbound blanks are now before an LU
- Modify chunker to deal with wordbound blanks
- Write tests for the chunker
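The formalism change listed above (a wordbound blank placed before the LU it belongs to) can be illustrated with a small parsing sketch. It assumes that a wordbound blank is written as `[[...]]` immediately before a `^...$` lexical unit; the function name `parse_units` is hypothetical, and escaped characters and closing blanks are not handled. This is not the parser used in any Apertium module.

```python
import re

# Assumption: an optional [[...]] wordbound blank sits directly before a ^...$ LU.
# Escapes such as \^ or \$ are ignored to keep the sketch short.
LU_PATTERN = re.compile(r"(?:\[\[(?P<wblank>[^\]]*)\]\])?\^(?P<lu>[^$]*)\$")


def parse_units(stream):
    """Return (wblank, lu) pairs; wblank is None when the LU has no wordbound blank."""
    return [(m.group("wblank"), m.group("lu")) for m in LU_PATTERN.finditer(stream)]


if __name__ == "__main__":
    for wblank, lu in parse_units("[[t:b:1]]^house<n>$ ^big<adj>$"):
        print(wblank, lu)
```

Keeping the blank as a separate field next to the LU means later stages can reorder, merge, or delete LUs without losing track of which markup belonged to which word.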
Phase 2 (July 3 - July 27)
- Make sure regression tests show no regression
- Modify interchunk and postchunk to deal with wordbound blanks
- Write tests for chunker, interchunk, postchunk blank handling
- Modify pretransfer to split wordbound blanks
- Write tests for pretransfer blank handling
- Deal with separable and merge blanks when multiwords are formed
- Add tests to -separable for wordbound blank handling
- Make lt-proc parse wordbound blanks as normal blanks correctly for analyser, generator, biltrans, and post generator
- Add tests for lt-proc analysis of wordbound blanks
- Add test for anaphora handling wordbound blanks
- Handle wordbound blanks in apertium streamparser.
- Add feature in transfer and postchunk so that it outputs the wordbound blank automatically if there's only one LU in the matching pattern.
- Wordbound blank handling in postgeneration, as it has many-to-many rules
- Tests for wordbound blank handling in postgeneration
- Changes in separable for wblank handling to work with revautoseq as well.
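The single-LU feature listed above for transfer and postchunk can be sketched as follows. This is only an illustration of the idea: the function `propagate_wblank` and the `(wblank, lu)` pair representation are hypothetical, and the assumption that a multi-LU pattern leaves the blanks unset (for the rule to place explicitly) is a simplification, not the modules' actual behaviour.

```python
def propagate_wblank(matched, output_lus):
    """Attach wordbound blanks to the output of one rule application.

    matched: list of (wblank, lu) pairs that the rule pattern matched.
    output_lus: list of LU strings that the rule produced.
    """
    if len(matched) == 1:
        # Only one input LU, so its blank is unambiguous: copy it to the output.
        wblank = matched[0][0]
        return [(wblank, lu) for lu in output_lus]
    # Assumption: with several input LUs the rule has to place blanks itself.
    return [(None, lu) for lu in output_lus]


if __name__ == "__main__":
    print(propagate_wblank([("t:b:1", "house<n>")], ["casa<n>"]))
```

The point of the automatic case is that rule authors do not have to mention blanks at all for the most common kind of rule.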
Phase 3 (July 31 - August 24)
- Parse wblanks and store with LU in -recursive
- Get wblanks in recursive output using parallel wblank stack that mimics main stack operations
- Add tests in recursive
- Fix wblank handling with XML TRX rule files
- Wordbound blank handling with variables in recursive
- Test wblanks from variables and MLU wblanks in recursive
- Modify apertium-tagger to parse wblanks as normal blanks
- Modify hfst-proc to parse wblanks as normal blanks (analysis, generation)
- Fix this error: http://codepad.org/yU4uaSNX (transfer error in afr-nld). The pair used the old transfer, so convert it to the new one
- Fix wblank printing error in pairs that use t4x
- Test if wordbound blanks go through the pipe properly in all pairs
- Final report for GSoC 2020
- Proper error handling of wordbound blanks
- Use transfuse in apy, wikimedia translations
- Fix wblank printing with null flush
- Need to modify super blank handling in chunker so that user doesn't have to worry about blank position anymore.
- Need to modify super blank handling in interchunk so that user doesn't have to worry about blank position anymore.
- Need to modify super blank handling in postchunk so that user doesn't have to worry about blank position anymore.
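The "parallel wblank stack that mimics main stack operations" item above can be sketched as a pair of stacks kept in lockstep. The class `MirroredStack` and its methods are hypothetical names for illustration; the real operations in -recursive are not described in this page, so only push, pop, and swap are shown here as assumed examples.

```python
class MirroredStack:
    """A value stack with a second stack of wordbound blanks kept in step."""

    def __init__(self):
        self._values = []
        self._wblanks = []

    def push(self, value, wblank=None):
        # Every push on the main stack has a matching push on the wblank stack.
        self._values.append(value)
        self._wblanks.append(wblank)

    def pop(self):
        # Pop both stacks together so each value keeps its own blank.
        return self._values.pop(), self._wblanks.pop()

    def swap(self):
        # Swap the top two entries on both stacks.
        self._values[-1], self._values[-2] = self._values[-2], self._values[-1]
        self._wblanks[-1], self._wblanks[-2] = self._wblanks[-2], self._wblanks[-1]

    def __len__(self):
        return len(self._values)


if __name__ == "__main__":
    stack = MirroredStack()
    stack.push("house<n>", "t:b:1")
    stack.push("big<adj>")
    stack.swap()
    print(stack.pop())
    print(stack.pop())
```

Because every operation touches both stacks, a blank always ends up at the same depth as the LU it was pushed with, whatever order the values are rearranged into.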