Latest revision as of 04:29, 9 September 2020

Work Plan: http://wiki.apertium.org/wiki/User:Khannatanmai/GSoC2020Proposal_Trimming#Work_Plan

To Do

Phase 3 (July 31 - August 24)

All Done :)

Ongoing

Phase 3 (July 31 - August 24)

All Done :)

Completed

Application Review Period (March 31 - May 3)

Compile all the discussion about the modification to the stream format (in talk pages)
Create dedicated page for the development of the new stream format: User:Khannatanmai/New_Apertium_stream_format
Go through the documentation again and read the wiki pages for each module, to make sure I haven't missed anything about how Apertium works overall, since I have never built a language pair.
http://wiki.apertium.org/wiki/User:Khannatanmai/New_Apertium_stream_format : Document modification to Apertium stream format (see talk pages for relevant discussion)
Document how much change is needed in which parsers and what the change is
Proof of Concept for the new format
Document the change needed in tokeniser, bidix lookup, and generation to include surface form: User:Khannatanmai/Eliminating_Dictionary_Trimming
Document all the proposed benefits with including secondary information

Community Bonding Period (May 4 - June 1)

Create a suitable development and debugging environment for the pipe (Xcode)
Modify transfer to pass secondary tags ahead. Updates can be found here.
Modify generator to ignore secondary tags while matching
Deal with MLUs in generator, and special characters in sectags, etc.
Analyse the code of the parsers of the modules
Fix transfer behaviour with LUs with invariable parts and MLUs
Deal with secondary tags appearing before lemq when lemq comes from a variable
Wiki for all features being implemented for secondary tags here.
Testcase: lemq comes from variable
Create test t1x file which covers all test cases.
Run thorough regression tests on eng-spa (multi-stage transfer) and spa-cat (single-stage transfer)
Manually insert secondary tags in the stream and test if they reach the generator
Prepare an alternate proposal to secondary tags: User:Khannatanmai/Alternate_stream_modification

Phase 1 (June 1 - July 4)

Deal with the community's objections to secondary tags.
Come up with a method everyone is happy with
Analyse the needs of Wikimedia's markup handling.
New page for the development of wordbound blanks: User:Khannatanmai/Wordbound_blanks
Add tests and examples which have merging, splitting, deletions, insertions, etc.
Changed formalism so that wordbound blanks are now before an LU
Modify chunker to deal with wordbound blanks
Write tests for the chunker

Phase 2 (July 3 - July 27)

Make sure regression tests show no regression
Modify interchunk and postchunk to deal with wordbound blanks
Write tests for chunker, interchunk, postchunk blank handling
Modify pretransfer to split wordbound blanks
Write tests for pretransfer blank handling
Deal with separable and merge blanks when multiwords are formed
Add tests to -separable for wordbound blank handling
Make lt-proc parse wordbound blanks as normal blanks correctly for analyser, generator, biltrans, and post generator
Add tests for lt-proc analysis of wordbound blanks
Add test for anaphora handling wordbound blanks
Handle wordbound blanks in apertium streamparser.
Add feature in transfer and postchunk so that it outputs the wordbound blank automatically if there's only one LU in the matching pattern.
Wordbound blank handling in postgeneration, as it has many-to-many rules
Tests for wordbound blank handling in postgeneration
Changes in separable for wblank handling to work with revautoseq as well.
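
To illustrate what "parsing wordbound blanks" means in the items above, here is a minimal Python sketch. It assumes a wordbound blank is written as `[[...]]` immediately before the `^...$` lexical unit it belongs to; that syntax, the regular expression, and the function name `split_wblanks` are assumptions made for illustration and are not taken from any Apertium module. Escaped characters in the stream are not handled.

```python
import re

# Assumed syntax: an optional [[...]] wordbound blank directly before ^...$.
# Escaped brackets, carets and dollar signs are deliberately ignored here.
LU_PATTERN = re.compile(r"(?:\[\[(?P<wblank>.*?)\]\])?\^(?P<lu>.*?)\$")


def split_wblanks(stream):
    """Return (wordbound blank or None, lexical unit) pairs found in stream."""
    return [(m.group("wblank"), m.group("lu")) for m in LU_PATTERN.finditer(stream)]


# Hypothetical input: the first LU carries a wordbound blank, the second does not.
pairs = split_wblanks("[[t:b:123]]^big<adj>$ ^house<n><sg>$")
```

A module that treats wordbound blanks "as normal blanks" would simply pass the `[[...]]` text through unchanged instead of attaching it to the following lexical unit.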

Phase 3 (July 31 - August 24)

Parse wblanks and store with LU in -recursive
Get wblanks in recursive output using parallel wblank stack that mimics main stack operations
Add tests in recursive
Fix wblank handling with XML TRX rule files
Wordbound blank handling with variables in recursive
Test wblanks from variables and MLU wblanks in recursive
Modify apertium-tagger to parse wblanks as normal blanks
Modify hfst-proc to parse wblanks as normal blanks (analysis, generation)
Fix this error: http://codepad.org/yU4uaSNX (transfer error in afr-nld) - used old transfer, convert to new
Fix wblank printing error in pairs that use t4x
Test if wordbound blanks go through the pipe properly in all pairs
Final report for GSoC 2020: User:Khannatanmai/GSoC2020_Final_Report
Proper error handling of wordbound blanks
Use transfuse in apy, wikimedia translations
Fix wblank printing with null flush
Need to modify super blank handling in chunker so that user doesn't have to worry about blank position anymore.
Need to modify super blank handling in interchunk so that user doesn't have to worry about blank position anymore.
Need to modify super blank handling in postchunk so that user doesn't have to worry about blank position anymore.
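
The "parallel wblank stack that mimics main stack operations" item above can be pictured with a small Python sketch. The class `MirroredStacks` and its methods are hypothetical and only show the general idea of applying every stack operation to two stacks in lockstep; they are not the apertium-recursive implementation.

```python
class MirroredStacks:
    """Main value stack plus a parallel stack holding each value's wordbound blank."""

    def __init__(self):
        self.values = []   # main stack (hypothetical contents: lexical units)
        self.wblanks = []  # parallel stack, always the same height as values

    def push(self, value, wblank=""):
        self.values.append(value)
        self.wblanks.append(wblank)

    def pop(self):
        # Popping both stacks together keeps value and blank paired.
        return self.values.pop(), self.wblanks.pop()

    def swap(self):
        # Any reordering of the main stack is repeated on the parallel stack.
        self.values[-1], self.values[-2] = self.values[-2], self.values[-1]
        self.wblanks[-1], self.wblanks[-2] = self.wblanks[-2], self.wblanks[-1]


stacks = MirroredStacks()
stacks.push("^big<adj>$", "t:b:123")
stacks.push("^house<n>$")
stacks.swap()
top = stacks.pop()
```

Because each operation touches both stacks, a value that is reordered keeps the wordbound blank it was pushed with.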

Difference between revisions of "User:Khannatanmai/GSoC2020Progress"
