Difference between revisions of "User:Popcorndude/Recursive Transfer/Progress"

From Apertium
Line 1: Line 1:
  +
== Work Plan (from [[User:Popcorndude/Recursive_Transfer | proposal]]) ==
  +
  +
{| class="wikitable" border="1"
  +
|-
  +
! Time Period
  +
! Goal
  +
! Details
  +
! Deliverable
  +
|-
  +
| Community Bonding Period
  +
May 6-26
  +
| Finalize formalism
  +
|
  +
* Read up on GLR parsers
  +
* Decide variable semantics and syntax
  +
* See if there's a good way to handle interpolation (e.g. inserting clitics after first word of phrase)
  +
| Full description of planned formalism
  +
|-
  +
| Week 1
  +
May 27-June 2
  +
| Begin parser
  +
|
  +
* Get input
  +
* Match and build trees based on literal tags and attribute categories
  +
| Minimal parser
  +
|-
  +
| Week 2
  +
June 3-9
  +
| Add variables
  +
|
  +
* Agreement
  +
* Passing variables up the tree
  +
* Setting variables for child nodes
  +
| Minimal parser with agreement
  +
|-
  +
| Week 3
  +
June 10-16
  +
| Test with eng->spa
  +
|
  +
* Noun phrases (this was started in the coding challenge)
  +
* Basic verb phrases (some agreement, if time)
  +
| Simple eng->spa parser
  +
|-
  +
| Week 4
  +
June 17-23
  +
| Continue parser
  +
|
  +
* Weights
  +
* Conditionals
  +
* Multiple output nodes
  +
* Anything else deemed necessary during Community Bonding or testing
  +
| Majority of initial specifications implemented
  +
|-
  +
| '''evaluation 1'''
  +
| Basic parser done
  +
|
  +
| Parser-generator compliant with majority of initial specifications and rudimentary eng->spa instantiation
  +
|-
  +
| Week 5
  +
June 24-30
  +
| Finish parser and continue eng->spa
  +
|
  +
* Finish anything left over from week 4
  +
* Finish verb phrases
  +
| Fully implemented parser and working eng->spa for simple sentences
  +
|-
  +
| Week 6
  +
July 1-7
  +
| Finish eng->spa and write reverser
  +
|
  +
* Convert any remaining eng->spa rules
  +
* Evaluate parser against chunking system
  +
** Metrics: accuracy, speed of parser, compilation speed
  +
* Write script to automatically reverse a ruleset
  +
** All features currently described are at least in principle reversible
  +
| System comparison and rule-reverser
  +
|-
  +
| Week 7
  +
July 8-14
  +
| Evaluation and testing
  +
|
  +
* Evaluate the output of the reverser against current spa->eng system
  +
* Write tests for all features
  +
* Begin adding error messages
  +
| Test suite and report on the general effectiveness of direct rule-reversal
  +
|-
  +
| Week 8
  +
July 15-21
  +
| Optimization and interface
  +
|
  +
* Speed up the parser and compiler where possible
  +
* Build interfaces for compiler, parser, and reverser
  +
* Clean up code
  +
* Re-evaluate speed
  +
| Command-line interfaces and updated system comparison
  +
|-
  +
| '''evaluation 2'''
  +
| Complete program
  +
|
  +
| Optimized and polished parser-generator compliant with initial specifications, and complete eng->spa transfer rules
  +
|-
  +
| Week 9
  +
July 22-28
  +
| Do spa->eng
  +
|
  +
* Identify differences between generated spa->eng and chunking spa->eng
  +
* Fix generated spa->eng rules
  +
* Report on effort required to correct reverser
  +
| Working spa->eng rules and report on the usefulness of rule-reverser
  +
|-
  +
| Week 10
  +
July 29-August 4
  +
| Documentation
  +
|
  +
* Convert initial specifications to full documentation
  +
* Write tutorial
  +
* Write recipe book containing at least minimal examples of everything listed at [[User_talk:Popcorndude/Recursive_Transfer#Linguistic.2Ftransfer_phenomena]]
  +
| Complete documentation of system
  +
|-
  +
| Weeks 11 and 12
  +
August 5-18
  +
| Buffer zone
  +
|
  +
These weeks will be used for one of the following, depending on preceding weeks and discussions with mentors:
  +
* Make up for delays in prior weeks
  +
* Converting another language pair
  +
* Experimenting with automated conversion of chunking rules
  +
* Writing a ruleset composer for generating a preliminary ruleset from two other pairs (e.g. combine eng->spa and spa->cat to get approximate rules for eng->cat)
  +
| TBD
  +
|-
  +
| '''final evaluation'''
  +
| Project done
  +
|
  +
| Complete, fully documented system with full ruleset for at least one language pair
  +
|}
 
== Community Bonding ==
 
=== Todo list ===
* Determine exact semantics of lexical unit tag-matching
+
* <s>Determine exact semantics of lexical unit tag-matching</s>
** Are they ordered?
+
** <s>Are they ordered?</s>
** Are they consecutive?
+
** <s>Are they consecutive?</s>
 
* See if anyone has input on formalism syntax in general
 
* Mechanism for clitic-insertion
Line 11: Line 146:
 
** Is there anything that can be done to make this finite-state? (probably not)
 
** Should we just start with the naive implementation (what the Python script does) as a baseline?
* Conjoined lexical units - just treat as consecutive elements with no blank between?
+
* <s>Conjoined lexical units - just treat as consecutive elements with no blank between?</s>
* Syntax for mapping between sets of tags (e.g. <o3pl> -> <p3><pl>, <o3sg> -> <p3><sg>)
+
* <s>Syntax for mapping between sets of tags (e.g. <o3pl> -> <p3><pl>, <o3sg> -> <p3><sg>)</s>
* Conditional output (e.g. modal verbs in English)
+
* <s>Conditional output (e.g. modal verbs in English)</s>
  +
* Make sure all syntax is written down
  +
* Begin writing tests
  +
* Some way to match absence of a tag
  +
  +
May 25: LU tags are unordered (basically, every tag operation has the same semantics as <clip>, <equal><clip>..., or <let><clip>... in the chunker). Various other things have syntax but that syntax may not be properly documented yet.
   
 
== Week 1 ==

Revision as of 22:04, 25 May 2019

Work Plan (from proposal)

Time Period Goal Details Deliverable
Community Bonding Period

May 6-26

Finalize formalism
  • Read up on GLR parsers
  • Decide variable semantics and syntax
  • See if there's a good way to handle interpolation (e.g. inserting clitics after first word of phrase)
Full description of planned formalism
Week 1

May 27-June 2

Begin parser
  • Get input
  • Match and build trees based on literal tags and attribute categories
Minimal parser
Week 2

June 3-9

Add variables
  • Agreement
  • Passing variables up the tree
  • Setting variables for child nodes
Minimal parser with agreement
Week 3

June 10-16

Test with eng->spa
  • Noun phrases (this was started in the coding challenge)
  • Basic verb phrases (some agreement, if time)
Simple eng->spa parser
Week 4

June 17-23

Continue parser
  • Weights
  • Conditionals
  • Multiple output nodes
  • Anything else deemed necessary during Community Bonding or testing
Majority of initial specifications implemented
evaluation 1

Basic parser done
Parser-generator compliant with majority of initial specifications and rudimentary eng->spa instantiation
Week 5

June 24-30

Finish parser and continue eng->spa
  • Finish anything left over from week 4
  • Finish verb phrases
Fully implemented parser and working eng->spa for simple sentences
Week 6

July 1-7

Finish eng->spa and write reverser
  • Convert any remaining eng->spa rules
  • Evaluate parser against chunking system
    • Metrics: accuracy, speed of parser, compilation speed
  • Write script to automatically reverse a ruleset
    • All features currently described are at least in principle reversible
System comparison and rule-reverser
Week 7

July 8-14

Evaluation and testing
  • Evaluate the output of the reverser against current spa->eng system
  • Write tests for all features
  • Begin adding error messages
Test suite and report on the general effectiveness of direct rule-reversal
Week 8

July 15-21

Optimization and interface
  • Speed up the parser and compiler where possible
  • Build interfaces for compiler, parser, and reverser
  • Clean up code
  • Re-evaluate speed
Command-line interfaces and updated system comparison
evaluation 2

Complete program
Optimized and polished parser-generator compliant with initial specifications, and complete eng->spa transfer rules
Week 9

July 22-28

Do spa->eng
  • Identify differences between generated spa->eng and chunking spa->eng
  • Fix generated spa->eng rules
  • Report on effort required to correct reverser
Working spa->eng rules and report on the usefulness of rule-reverser
Week 10

July 29-August 4

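Documentation
  • Convert initial specifications to full documentation
  • Write tutorial
  • Write recipe book containing at least minimal examples of everything listed at User_talk:Popcorndude/Recursive_Transfer#Linguistic.2Ftransfer_phenomena
Complete documentation of system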
Weeks 11 and 12

August 5-18

Buffer zone

These weeks will be used for one of the following, depending on preceding weeks and discussions with mentors:

  • Make up for delays in prior weeks
  • Converting another language pair
  • Experimenting with automated conversion of chunking rules
  • Writing a ruleset composer for generating a preliminary ruleset from two other pairs (e.g. combine eng->spa and spa->cat to get approximate rules for eng->cat)
TBD
final evaluation

Project done
Complete, fully documented system with full ruleset for at least one language pair

Community Bonding

Todo list

  • Determine exact semantics of lexical unit tag-matching
    • Are they ordered?
    • Are they consecutive?
  • See if anyone has input on formalism syntax in general
  • Mechanism for clitic-insertion
    • e.g. V2, Wackernagel
  • Read about GLR parser algorithms
    • Find reading materials
    • Is there anything that can be done to make this finite-state? (probably not)
    • Should we just start with the naive implementation (what the Python script does) as a baseline?
  • Conjoined lexical units - just treat as consecutive elements with no blank between?
  • Syntax for mapping between sets of tags (e.g. <o3pl> -> <p3><pl>, <o3sg> -> <p3><sg>)
  • Conditional output (e.g. modal verbs in English)
  • Make sure all syntax is written down
  • Begin writing tests
  • Some way to match absence of a tag
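
As a rough illustration of the tag-set mapping item in the list above, the behaviour being asked for could look like the Python sketch below. This shows only the intended effect, not the formalism's syntax (which is still to be decided); the table TAG_MAP and the function expand_tags are hypothetical names, and the two entries are the ones given in the todo item.

```python
# Hypothetical mapping table: each key is a single source tag, each value
# the list of tags it should expand to. Only the two examples from the
# todo list are included.
TAG_MAP = {
    "o3pl": ["p3", "pl"],
    "o3sg": ["p3", "sg"],
}


def expand_tags(tags):
    """Replace every tag that has a mapping with its target tags.

    Tags without a mapping are passed through unchanged.
    """
    out = []
    for tag in tags:
        out.extend(TAG_MAP.get(tag, [tag]))
    return out


print(expand_tags(["vblex", "o3pl"]))  # prints ['vblex', 'p3', 'pl']
```

A plain lookup table like this is trivially reversible only when the target tag lists are unique, which may matter for the planned rule-reverser.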

May 25: LU tags are unordered (basically, every tag operation has the same semantics as <clip>, <equal><clip>..., or <let><clip>... in the chunker). Various other things have syntax but that syntax may not be properly documented yet.
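
One way to picture the unordered semantics noted above: a tag is looked up by the attribute category it belongs to, not by its position in the lexical unit, much as <clip> does in the chunker. The Python sketch below is only an illustration under that reading; the category names, their members, and the functions clip and same_value are hypothetical and not part of the formalism.

```python
# Hypothetical attribute categories; names and members are assumptions
# chosen for illustration.
CATEGORIES = {
    "number": {"sg", "pl"},
    "gender": {"m", "f", "nt"},
}


def clip(tags, category):
    """Return the first tag of the lexical unit that belongs to `category`.

    The position of the tag in the list is irrelevant; None is returned
    when the lexical unit has no tag from that category.
    """
    members = CATEGORIES[category]
    for tag in tags:
        if tag in members:
            return tag
    return None


def same_value(tags_a, tags_b, category):
    """Compare two lexical units on one category (cf. <equal><clip>...).

    Note: two units that both lack the category also compare equal here.
    """
    return clip(tags_a, category) == clip(tags_b, category)


# The same tags in a different order give the same result.
print(clip(["n", "m", "sg"], "number"))  # prints sg
print(clip(["n", "sg", "m"], "number"))  # prints sg
```

Under this reading, a lookup that returns None is also one possible starting point for matching the absence of a tag, though the todo list leaves that question open.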

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12