Difference between revisions of "User:Popcorndude/Recursive Transfer/Progress"

From Apertium
Jump to navigation Jump to search
m (→‎Community Bonding: progress update)
Line 156: Line 156:
   
 
== Week 1 ==
 
== Week 1 ==
  +
May 27: As it turns out, I should have accounted for building a compiler/parser-generator in the workplan. We can currently mostly parse the file. Todo for tomorrow: finish parsing reduction rules and add some more error messages.
  +
 
== Week 2 ==
 
== Week 2 ==
 
== Week 3 ==
 
== Week 3 ==

Revision as of 20:08, 27 May 2019

Work Plan (from proposal)

Time Period Goal Details Deliverable
Community Bonding Period

May 6-26

Finalize formalism
  • Read up on GLR parsers
  • Decide variable semantics and syntax
  • See if there's a good way to handle interpolation (e.g. inserting clitics after first word of phrase)
Full description of planned formalism
Week 1

May 27-June 2

Begin parser
  • Get input
  • Match and build trees based on literal tags and attribute categories
Minimal parser
Week 2

June 3-9

Add variables
  • Agreement
  • Passing variables up the tree
  • Setting variables for child nodes
Minimal parser with agreement
Week 3

June 10-16

Test with eng->spa
  • Noun phrases (this was started in the coding challenge)
  • Basic verb phrases (some agreement, if time)
Simple eng->spa parser
Week 4

June 17-23

Continue parser
  • Weights
  • Conditionals
  • Multiple output nodes
  • Anything else deemed necessary during Community Bonding or testing
Majority of initial specifications implemented
evaluation 1 Basic parser done Parser-generator compliant with majority of initial specifications and rudimentary eng->spa instantiation
Week 5

June 24-30

Finish parser and continue eng->spa
  • Finish anything left over from week 4
  • Finish verb phrases
Fully implemented parser and working eng->spa for simple sentences
Week 6

July 1-7

Finish eng->spa and write reverser
  • Convert any remaining eng->spa rules
  • Evaluate parser against chunking system
    • Metrics: accuracy, speed of parser, compilation speed
  • Write script to automatically reverse a ruleset
    • All features currently described are at least in princible reversible
System comparison and rule-reverser
Week 7

July 8-14

Evaluation and testing
  • Evaluate the output of the reverser against current spa->eng system
  • Write tests for all features
  • Begin adding error messages
Test suite and report on the general effectiveness of direct rule-reversal
Week 8

July 15-21

Optimization and interface
  • Speed up the parser and compiler where possible
  • Build interfaces for compiler, parser, and reverser
  • Clean up code
  • Re-evaluate speed
Command-line interfaces and updated system comparison
evaluation 2 Complete program Optimized and polished parser-generator compliant with initial specifications, and complete end->spa transfer rules
Week 9

July 22-28

Do spa->eng
  • Identify differences between generated spa->eng and chunking spa->eng
  • Fix generated spa->eng rules
  • Report on effort required to correct reverser
Working spa->eng rules and report on the usefulness of rule-reverser
Week 10

July 29-August 4

Documentation Complete documentation of system
Weeks 11 and 12

August 5-18

Buffer zone

These weeks will be used for one of the following, depending on preceding weeks and discussions with mentors:

  • Make up for delays in prior weeks
  • Converting another language pair
  • Experimenting with automated conversion of chunking rules
  • Writing a ruleset composer for generating a preliminary ruleset from two other pairs (e.g. combine eng->spa and spa->cat to get approximate rules for eng->cat)
TBD
final evaluation Project done Complete, fully documented system with full ruleset for at least one language pair

Community Bonding

Todo list

  • Determine exact semantics of lexical unit tag-matching
    • Are they ordered?
    • Are they consecutive?
  • See if anyone has input on formalism syntax in general
  • Mechanism for clitic-insertion
    • e.g. V2, Wackernagel
  • Read about GLR parser algorithms
    • Find reading materials
    • Is there anything that can be done to make this finite-state? (probably not)
    • Should we just start with the naive implementation (what the Python script does) as a baseline?
  • Conjoined lexical units - just treat as consecutive elements with no blank between?
  • Syntax for mapping between sets of tags (e.g. <o3pl> -> <p3><pl>, <o3sg> -> <p3><sg>)
  • Conditional output (e.g. modal verbs in English)
  • Make sure all syntax is written down
  • Begin writing tests
  • Some way to match absence of a tag

May 25: LU tags are unordered (basically, every tag operation has the same semantics as <clip>, <equal><clip>..., or <let><clip>... in the chunker). Various other things have syntax but that syntax may not be properly documented yet.

Week 1

May 27: As it turns out, I should have accounted for building a compiler/parser-generator in the workplan. We can currently mostly parse the file. Todo for tomorrow: finish parsing reduction rules and add some more error messages.

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12