Difference between revisions of "User:Khannatanmai/GSoC2020Progress"

(57 intermediate revisions by the same user not shown)

Latest revision as of 04:29, 9 September 2020

Work Plan: http://wiki.apertium.org/wiki/User:Khannatanmai/GSoC2020Proposal_Trimming#Work_Plan

To Do

Phase 3 (July 31 - August 24)

All Done :)

Ongoing

Phase 3 (July 31 - August 24)

All Done :)

Completed

Application Review Period (March 31 - May 3)

  • Compile all the discussion about the modification to the stream format (in talk pages)
  • Create dedicated page for the development of the new stream format: User:Khannatanmai/New_Apertium_stream_format
  • Go through the documentation again and read the wiki page for each module to make sure I haven't missed anything about how Apertium works overall, since I have never built a language pair myself.
  • Document the modification to the Apertium stream format (see talk pages for relevant discussion): http://wiki.apertium.org/wiki/User:Khannatanmai/New_Apertium_stream_format
  • Document how much change is needed in which parsers and what the change is
  • Proof of Concept for the new format
  • Document the change needed in tokeniser, bidix lookup, and generation to include surface form: User:Khannatanmai/Eliminating_Dictionary_Trimming
  • Document all the proposed benefits of including secondary information


Community Bonding Period (May 4 - June 1)

  • Create a suitable development and debugging environment for the pipe (Xcode)
  • Modify transfer to pass secondary tags along. Updates can be found at https://wiki.apertium.org/wiki/User:Khannatanmai/New_Apertium_stream_format#Progress
  • Modify generator to ignore secondary tags while matching
  • Deal with MLUs in generator, and special characters in sectags, etc.
  • Analyse the code of the parsers of the modules
  • Fix transfer behaviour with LUs with invariable parts and MLUs
  • Deal with secondary tags appearing before lemq when lemq comes from a variable
  • Wiki page for all features being implemented for secondary tags: User:Khannatanmai/Secondary_tags_features
  • Test case: lemq comes from a variable
  • Create a test t1x file which covers all test cases.
  • Run thorough regression tests on eng-spa (multi-stage transfer) and spa-cat (single-stage transfer)
  • Manually insert secondary tags in the stream and test if they reach the generator
  • Prepare an alternate proposal to secondary tags: User:Khannatanmai/Alternate_stream_modification

Phase 1 (June 1 - July 4)

  • Deal with the community's objections to secondary tags.
  • Come up with a method everyone is happy with
  • Analyse the needs of Wikimedia's markup handling.
  • New page for the development of wordbound blanks: User:Khannatanmai/Wordbound_blanks
  • Add tests and examples that involve merging, splitting, deletions, insertions, etc.
  • Change the formalism so that wordbound blanks now come before an LU
  • Modify chunker to deal with wordbound blanks
  • Write tests for the chunker
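
To make the "wordbound blank before an LU" formalism concrete, here is a minimal Python sketch. It assumes the notation `[[...]]^lemma<tags>$`, with the blank written directly in front of the LU it belongs to; that notation and the helper name `pair_wblanks` are assumptions made for illustration, and User:Khannatanmai/Wordbound_blanks remains the reference for the actual format. Escaped characters and superblanks are ignored here.

```python
import re

# Assumed notation: an optional wordbound blank "[[...]]" directly before an LU "^...$".
# The tempered pattern stops the blank at the first "]]" so it cannot span two blanks.
LU_PATTERN = re.compile(
    r"(?:\[\[(?P<wblank>(?:(?!\]\]).)*)\]\])?\^(?P<lu>.*?)\$"
)


def pair_wblanks(stream):
    """Return (wblank, lu) tuples; wblank is None when the LU carries no blank."""
    return [(m.group("wblank"), m.group("lu")) for m in LU_PATTERN.finditer(stream)]
```

Because the blank sits in front of the LU, a single left-to-right pass is enough to attach each blank to its word.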

Phase 2 (July 3 - July 27)

  • Make sure regression tests show no regression
  • Modify interchunk and postchunk to deal with wordbound blanks
  • Write tests for chunker, interchunk, postchunk blank handling
  • Modify pretransfer to split wordbound blanks
  • Write tests for pretransfer blank handling
  • Deal with separable and merge blanks when multiwords are formed
  • Add tests to -separable for wordbound blank handling
  • Make lt-proc correctly parse wordbound blanks as normal blanks in the analyser, generator, biltrans, and postgenerator
  • Add tests for lt-proc analysis of wordbound blanks
  • Add test for anaphora handling wordbound blanks
  • Handle wordbound blanks in apertium streamparser.
  • Add a feature to transfer and postchunk so that they output the wordbound blank automatically when there is only one LU in the matching pattern.
  • Handle wordbound blanks in postgeneration, which has many-to-many rules
  • Tests for wordbound blank handling in postgeneration
  • Changes in separable for wblank handling to work with revautoseq as well.
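
The pretransfer item above can be pictured with a small Python sketch. It assumes that when a joined LU is split on `+`, every resulting LU receives a copy of the original wordbound blank; the list does not spell this behaviour out, so treat it as an assumption. The function name `split_lu` is hypothetical, and the sketch ignores multiword queues (`#`) and escaped characters.

```python
def split_lu(wblank, lu):
    """Split a joined LU on '+' and prefix each part with a copy of the blank.

    Assumption: the blank is duplicated onto every part. An empty or None
    blank produces plain LUs with no prefix.
    """
    prefix = f"[[{wblank}]]" if wblank else ""
    return " ".join(f"{prefix}^{part}$" for part in lu.split("+"))
```

For example, `split_lu("t:b:1", "a<vblex>+b<prn>")` yields two LUs, each still tied to the same blank, so later modules can tell that both came from one marked-up word.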

Phase 3 (July 31 - August 24)

  • Parse wblanks and store with LU in -recursive
  • Get wblanks in recursive output using parallel wblank stack that mimics main stack operations
  • Add tests in recursive
  • Fix wblank handling with XML TRX rule files
  • Wordbound blank handling with variables in recursive
  • Test wblanks from variables and MLU wblanks in recursive
  • Modify apertium-tagger to parse wblanks as normal blanks
  • Modify hfst-proc to parse wblanks as normal blanks (analysis, generation)
  • Fix this error: http://codepad.org/yU4uaSNX (transfer error in afr-nld); the pair used the old transfer, so convert it to the new one
  • Fix wblank printing error in pairs that use t4x
  • Test if wordbound blanks go through the pipe properly in all pairs
  • Final report for GSoC 2020
  • Proper error handling of wordbound blanks
  • Use transfuse in APy and Wikimedia translations
  • Fix wblank printing with null flush
  • Modify superblank handling in chunker so that the user no longer has to worry about blank position.
  • Modify superblank handling in interchunk so that the user no longer has to worry about blank position.
  • Modify superblank handling in postchunk so that the user no longer has to worry about blank position.
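
The "parallel wblank stack that mimics main stack operations" item can be sketched generically in Python. This is not the code used in -recursive: the class name `ParallelStacks`, the `concat` operation, and the choice to merge two blanks with "; " are all assumptions made for illustration. The only idea taken from the list is that every operation on the main stack is mirrored on a second stack holding wordbound blanks.

```python
class ParallelStacks:
    """Keep a stack of values and a stack of wordbound blanks in lockstep."""

    def __init__(self):
        self.main = []
        self.wblanks = []

    def push(self, value, wblank=""):
        # Every push on the main stack has a matching push on the blank stack.
        self.main.append(value)
        self.wblanks.append(wblank)

    def pop(self):
        # Popping both together keeps the two stacks the same height.
        return self.main.pop(), self.wblanks.pop()

    def concat(self):
        """Replace the top two entries with their concatenation.

        Assumption: non-empty blanks of the two entries are merged with "; ".
        """
        right, right_wb = self.pop()
        left, left_wb = self.pop()
        merged = "; ".join(wb for wb in (left_wb, right_wb) if wb)
        self.push(left + right, merged)
```

Since both stacks always have the same height, the blank for any value can be found at the same index, without storing blanks inside the values themselves.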