Difference between revisions of "User:Khannatanmai/GSoC2020Progress"
Revision as of 16:38, 8 July 2020
Work Plan: http://wiki.apertium.org/wiki/User:Khannatanmai/GSoC2020Proposal_Trimming#Work_Plan
To Do
Phase 2 (July 3 - July 27)
- Proper error handling of wordbound blanks
- Deal with separable and merge blanks when multiwords are formed
Ongoing
Phase 2 (July 3 - July 27)
- Test if wordbound blanks go through the pipe properly.
Completed
Application Review Period (March 31 - May 3)
- Compile all the discussion about the modification to the stream format (in talk pages)
- Create dedicated page for the development of the new stream format: User:Khannatanmai/New_Apertium_stream_format
- Re-read the documentation and the wiki page for each module to make sure I haven't missed anything about how Apertium works overall, since I have never built a language pair myself.
- Document the modification to the Apertium stream format (see talk pages for relevant discussion): http://wiki.apertium.org/wiki/User:Khannatanmai/New_Apertium_stream_format
- Document which parsers need changes, how extensive those changes are, and what they involve
- Proof of Concept for the new format
- Document the change needed in tokeniser, bidix lookup, and generation to include surface form: User:Khannatanmai/Eliminating_Dictionary_Trimming
- Document all the proposed benefits of including secondary information
Community Bonding Period (May 4 - June 1)
- Create a suitable development and debugging environment for the pipe (Xcode)
- Modify transfer to pass secondary tags ahead. Updates can be found at https://wiki.apertium.org/wiki/User:Khannatanmai/New_Apertium_stream_format#Progress
- Modify generator to ignore secondary tags while matching
- Deal with MLUs in generator, and special characters in sectags, etc.
- Analyse the code of the parsers of the modules
- Fix transfer behaviour with LUs with invariable parts and MLUs
- Deal with secondary tags appearing before lemq when lemq comes from a variable
- Wiki for all features being implemented for secondary tags: User:Khannatanmai/Secondary_tags_features
- Test case: lemq comes from a variable
- Create test t1x file which covers all test cases.
- Run thorough regression tests on eng-spa (multi-stage transfer) and spa-cat (single-stage transfer)
- Manually insert secondary tags in the stream and test if they reach the generator
- Prepare an alternate proposal to secondary tags: User:Khannatanmai/Alternate_stream_modification
Phase 1 (June 1 - July 4)
- Deal with the community's objections to secondary tags.
- Come up with a method everyone is happy with
- Analyse the needs of Wikimedia's markup handling.
- New page for the development of wordbound blanks: User:Khannatanmai/Wordbound_blanks
- Add tests and examples which have merging, splitting, deletions, insertions, etc.
- Change the formalism so that wordbound blanks now come before an LU
- Modify chunker to deal with wordbound blanks
- Write tests for the chunker
Phase 2 (July 3 - July 27)
- Make sure the regression tests show no regressions
- Modify interchunk and postchunk to deal with wordbound blanks
- Write tests for chunker, interchunk, postchunk blank handling
- Modify pretransfer to split wordbound blanks
- Write tests for pretransfer blank handling