Difference between revisions of "User:Popcorndude/Recursive Transfer/Progress"

Line 1: Line 1:
== Work Plan (from [[User:Popcorndude/Recursive_Transfer | proposal]]) ==

{| class="wikitable" border="1"
|-
! Time Period
! Goal
! Details
! Deliverable
|-
| Community Bonding Period
May 6-26
| Finalize formalism
|
* Read up on GLR parsers
* Decide variable semantics and syntax
* See if there's a good way to handle interpolation (e.g. inserting clitics after first word of phrase)
| Full description of planned formalism
|-
| Week 1
May 27-June 2
| Begin parser
|
* Get input
* Match and build trees based on literal tags and attribute categories
| Minimal parser
|-
| Week 2
June 3-9
| Add variables
|
* Agreement
* Passing variables up the tree
* Setting variables for child nodes
| Minimal parser with agreement
|-
| Week 3
June 10-16
| Test with eng->spa
|
* Noun phrases (this was started in the coding challenge)
* Basic verb phrases (some agreement, if time)
| Simple eng->spa parser
|-
| Week 4
June 17-23
| Continue parser
|
* Weights
* Conditionals
* Multiple output nodes
* Anything else deemed necessary during Community Bonding or testing
| Majority of initial specifications implemented
|-
| '''evaluation 1'''
| Basic parser done
|
| Parser-generator compliant with majority of initial specifications and rudimentary eng->spa instantiation
|-
| Week 5
June 24-30
| Finish parser and continue eng->spa
|
* Finish anything left over from week 4
* Finish verb phrases
| Fully implemented parser and working eng->spa for simple sentences
|-
| Week 6
July 1-7
| Finish eng->spa and write reverser
|
* Convert any remaining eng->spa rules
* Evaluate parser against chunking system
** Metrics: accuracy, speed of parser, compilation speed
* Write script to automatically reverse a ruleset
** All features currently described are at least in principle reversible
| System comparison and rule-reverser
|-
| Week 7
July 8-14
| Evaluation and testing
|
* Evaluate the output of the reverser against current spa->eng system
* Write tests for all features
* Begin adding error messages
| Test suite and report on the general effectiveness of direct rule-reversal
|-
| Week 8
July 15-21
| Optimization and interface
|
* Speed up the parser and compiler where possible
* Build interfaces for compiler, parser, and reverser
* Clean up code
* Re-evaluate speed
| Command-line interfaces and updated system comparison
|-
| '''evaluation 2'''
| Complete program
|
| Optimized and polished parser-generator compliant with initial specifications, and complete eng->spa transfer rules
|-
| Week 9
July 22-28
| Do spa->eng
|
* Identify differences between generated spa->eng and chunking spa->eng
* Fix generated spa->eng rules
* Report on effort required to correct reverser
| Working spa->eng rules and report on the usefulness of rule-reverser
|-
| Week 10
July 29-August 4
| Documentation
|
* Convert initial specifications to full documentation
* Write tutorial
* Write recipe book containing at least minimal examples of everything listed at [[User_talk:Popcorndude/Recursive_Transfer#Linguistic.2Ftransfer_phenomena]]
| Complete documentation of system
|-
| Weeks 11 and 12
August 5-18
| Buffer zone
|
These weeks will be used for one of the following, depending on preceding weeks and discussions with mentors:
* Make up for delays in prior weeks
* Converting another language pair
* Experimenting with automated conversion of chunking rules
* Writing a ruleset composer for generating a preliminary ruleset from two other pairs (e.g. combine eng->spa and spa->cat to get approximate rules for eng->cat)
| TBD
|-
| '''final evaluation'''
| Project done
|
| Complete, fully documented system with full ruleset for at least one language pair
|}
== Community Bonding ==
=== Todo list ===
* Determine exact semantics of lexical unit tag-matching
* <s>Determine exact semantics of lexical unit tag-matching</s>
** Are they ordered?
** <s>Are they ordered?</s>
** Are they consecutive?
** <s>Are they consecutive?</s>
* See if anyone has input on formalism syntax in general
* Mechanism for clitic-insertion
Line 11: Line 146:
** Is there anything that can be done to make this finite-state? (probably not)
** Should we just start with the naive implementation (what the Python script does) as a baseline?
* Conjoined lexical units - just treat as consecutive elements with no blank between?
* <s>Conjoined lexical units - just treat as consecutive elements with no blank between?</s>
* Syntax for mapping between sets of tags (e.g. <o3pl> -> <p3><pl>, <o3sg> -> <p3><sg>)
* <s>Syntax for mapping between sets of tags (e.g. <o3pl> -> <p3><pl>, <o3sg> -> <p3><sg>)</s>
* Conditional output (e.g. modal verbs in English)
* <s>Conditional output (e.g. modal verbs in English)</s>
* Make sure all syntax is written down
* Begin writing tests
* Some way to match absence of a tag

May 25: LU tags are unordered (basically, every tag operation has the same semantics as <clip>, <equal><clip>..., or <let><clip>... in the chunker). Various other things have syntax but that syntax may not be properly documented yet.


== Week 1 ==

Revision as of 22:04, 25 May 2019

Work Plan (from proposal)

Time Period Goal Details Deliverable
Community Bonding Period

May 6-26

Finalize formalism
  • Read up on GLR parsers
  • Decide variable semantics and syntax
  • See if there's a good way to handle interpolation (e.g. inserting clitics after first word of phrase)
Full description of planned formalism
Week 1

May 27-June 2

Begin parser
  • Get input
  • Match and build trees based on literal tags and attribute categories
Minimal parser
Week 2

June 3-9

Add variables
  • Agreement
  • Passing variables up the tree
  • Setting variables for child nodes
Minimal parser with agreement
Week 3

June 10-16

Test with eng->spa
  • Noun phrases (this was started in the coding challenge)
  • Basic verb phrases (some agreement, if time)
Simple eng->spa parser
Week 4

June 17-23

Continue parser
  • Weights
  • Conditionals
  • Multiple output nodes
  • Anything else deemed necessary during Community Bonding or testing
Majority of initial specifications implemented
evaluation 1
Basic parser done
Parser-generator compliant with majority of initial specifications and rudimentary eng->spa instantiation
Week 5

June 24-30

Finish parser and continue eng->spa
  • Finish anything left over from week 4
  • Finish verb phrases
Fully implemented parser and working eng->spa for simple sentences
Week 6

July 1-7

Finish eng->spa and write reverser
  • Convert any remaining eng->spa rules
  • Evaluate parser against chunking system
    • Metrics: accuracy, speed of parser, compilation speed
  • Write script to automatically reverse a ruleset
    • All features currently described are at least in principle reversible
System comparison and rule-reverser
Week 7

July 8-14

Evaluation and testing
  • Evaluate the output of the reverser against current spa->eng system
  • Write tests for all features
  • Begin adding error messages
Test suite and report on the general effectiveness of direct rule-reversal
Week 8

July 15-21

Optimization and interface
  • Speed up the parser and compiler where possible
  • Build interfaces for compiler, parser, and reverser
  • Clean up code
  • Re-evaluate speed
Command-line interfaces and updated system comparison
evaluation 2
Complete program
Optimized and polished parser-generator compliant with initial specifications, and complete eng->spa transfer rules
Week 9

July 22-28

Do spa->eng
  • Identify differences between generated spa->eng and chunking spa->eng
  • Fix generated spa->eng rules
  • Report on effort required to correct reverser
Working spa->eng rules and report on the usefulness of rule-reverser
Week 10

July 29-August 4

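Documentation
  • Convert initial specifications to full documentation
  • Write tutorial
  • Write recipe book containing at least minimal examples of everything listed at User_talk:Popcorndude/Recursive_Transfer#Linguistic.2Ftransfer_phenomena
Complete documentation of system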
Weeks 11 and 12

August 5-18

Buffer zone

These weeks will be used for one of the following, depending on preceding weeks and discussions with mentors:

  • Make up for delays in prior weeks
  • Converting another language pair
  • Experimenting with automated conversion of chunking rules
  • Writing a ruleset composer for generating a preliminary ruleset from two other pairs (e.g. combine eng->spa and spa->cat to get approximate rules for eng->cat)
TBD
final evaluation
Project done
Complete, fully documented system with full ruleset for at least one language pair

Community Bonding

Todo list

  • Determine exact semantics of lexical unit tag-matching
    • Are they ordered?
    • Are they consecutive?
  • See if anyone has input on formalism syntax in general
  • Mechanism for clitic-insertion
    • e.g. V2, Wackernagel
  • Read about GLR parser algorithms
    • Find reading materials
    • Is there anything that can be done to make this finite-state? (probably not)
    • Should we just start with the naive implementation (what the Python script does) as a baseline?
  • Conjoined lexical units - just treat as consecutive elements with no blank between?
  • Syntax for mapping between sets of tags (e.g. <o3pl> -> <p3><pl>, <o3sg> -> <p3><sg>)
  • Conditional output (e.g. modal verbs in English)
  • Make sure all syntax is written down
  • Begin writing tests
  • Some way to match absence of a tag
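
The tag-set mapping item in the list above (<o3pl> -> <p3><pl>, <o3sg> -> <p3><sg>) can be illustrated with a small Python sketch. This shows only the intended behaviour, not the formalism's syntax, which was still undecided at this point. The names TAG_MAP, map_tags and format_tags are hypothetical, and the vblex tag in the usage line is only an illustrative input.

```python
# Hypothetical mapping table; the two entries are the examples from the todo list.
TAG_MAP = {
    "o3pl": ["p3", "pl"],
    "o3sg": ["p3", "sg"],
}


def map_tags(tags, table=TAG_MAP):
    """Replace every tag that has an entry in the table; keep all others unchanged."""
    out = []
    for tag in tags:
        # A tag without a mapping is passed through as a one-element list.
        out.extend(table.get(tag, [tag]))
    return out


def format_tags(tags):
    """Render a list of tag names in angle-bracket notation."""
    return "".join(f"<{t}>" for t in tags)


print(format_tags(map_tags(["vblex", "o3pl"])))  # <vblex><p3><pl>
```

A plain lookup table keeps the mapping one-directional; reversing a ruleset would need the inverse table, which only exists if no two source tags map to the same target set.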

May 25: LU tags are unordered (basically, every tag operation has the same semantics as <clip>, <equal><clip>..., or <let><clip>... in the chunker). Various other things have syntax but that syntax may not be properly documented yet.
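
A minimal Python sketch of what unordered tag matching could look like, assuming a lexical unit is written as lemma<tag1><tag2>... without the surrounding ^ and $ of the stream format and without escaped characters. The functions parse_lu, matches and clip are hypothetical names for illustration. clip here only approximates the chunker's <clip> by returning the first tag that belongs to a given set of values, wherever it occurs.

```python
def parse_lu(text):
    """Split 'lemma<tag1><tag2>' into (lemma, [tags]). Assumes no escaped '<'."""
    lemma, _, rest = text.partition("<")
    tags = rest.rstrip(">").split("><") if rest else []
    return lemma, tags


def matches(lu_tags, required):
    """True if every required tag occurs somewhere in the LU, in any order."""
    return set(required) <= set(lu_tags)


def clip(lu_tags, category):
    """Return the first LU tag that is a member of the category, else ''."""
    for tag in lu_tags:
        if tag in category:
            return tag
    return ""


lemma, tags = parse_lu("casa<n><f><pl>")
print(matches(tags, ["pl", "n"]))  # True
print(clip(tags, {"sg", "pl"}))    # pl
```

Because the check is a subset test, a pattern asking for <n><pl> and one asking for <pl><n> accept exactly the same lexical units, and neither requires the tags to be adjacent.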

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12