User:Popcorndude/Recursive Transfer

Google Summer of Code 2019 proposal draft

Contact

Name: Daniel Swanson

Email: awesomeevildudes@gmail.com

IRC: popcorndude

Timezone: UTC-5

Proposal

I would like to implement an alternative to the current chunking system for structural transfer, as described at Ideas_for_Google_Summer_of_Code/Robust_recursive_transfer. The new system would take a set of recursively defined rules and generate a GLR parser from them. This should make long-distance phrasal reordering much easier to handle and will probably also significantly reduce the size of existing rule sets. A draft of the formalism for these rules can be found at User:Popcorndude/Recursive_Transfer/Formalism.

This project would benefit the community by making it much easier to write transfer rules for syntactically dissimilar languages. To the extent that it makes rule sets smaller, it should also make them easier to maintain.

Background

I am a sophomore at Swarthmore College studying math and linguistics. Last year I took a class in computational linguistics using Apertium and this year I am a course assistant for that class. Last summer I worked on a personal translation project (code here) which involved a lot of structural transfer and writing a recursive descent parser.

I have a lot of experience with Python and a basic knowledge of C++. I am a native speaker of English and can read Spanish and Biblical Hebrew.

I have been interested in rule-based machine translation for several years, particularly as it might be applied to Bible translation. I am interested in Apertium because it already does pretty much everything I was trying to do with the system I was building on my own except for complex syntactic relations, and this GSoC project would fill that gap.

Coding Challenge

All my code is on GitHub at https://github.com/mr-martian/GSoC19-recursive

3/4/19

So far I have reimplemented the Python script from the prototype and added support for attribute categories and parameterized nodes.

Example:

gender = m f;
#noun $gender -> #(n.$gender);
#adj $gender -> #(adj.$gender);
NP $gender -> noun adj { 2 1 } ;

This defines a category "gender" consisting of <m> and <f>, and the lexical categories "noun" and "adj", which match things of the form "word<n><C>" and "word<adj><C>", respectively, where C is <m> or <f>. The last line defines a non-terminal node "NP" which matches a noun followed by an adjective of the same gender, so "carro<n><m>/car<n> rojo<adj><m>/red<adj>" would become "rojo<adj><m>/red<adj> carro<n><m>/car<n>", but "carro<n><m>/car<n> roja<adj><f>" would not be matched because the genders differ.
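
The matching behaviour described above can be sketched in a few lines of Python. This is only an illustration of the agreement check and the { 2 1 } reordering, not the code in the repository: the function names (source_tags, lexical_match, apply_np, transfer) are hypothetical, the rule is hard-coded rather than compiled from the formalism, and it assumes that tags are read from the source side of each "source/target" unit and that a lexical category matches only when the tags are exactly the part of speech followed by a gender.

```python
import re

# Attribute category from the example: gender = m f;
GENDER = {"m", "f"}


def source_tags(token):
    """Return the tags on the source side of 'lemma<t1><t2>/translation<t1>'."""
    source_side = token.split("/", 1)[0]
    return re.findall(r"<([^<>]+)>", source_side)


def lexical_match(token, pos):
    """Mimic '#noun $gender -> #(n.$gender)'.

    Returns the gender if the source tags are exactly [pos, gender],
    otherwise None (an assumption about how strict matching is).
    """
    tags = source_tags(token)
    if len(tags) == 2 and tags[0] == pos and tags[1] in GENDER:
        return tags[1]
    return None


def apply_np(tokens):
    """Mimic 'NP $gender -> noun adj { 2 1 }' with a left-to-right scan."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            noun_gender = lexical_match(tokens[i], "n")
            adj_gender = lexical_match(tokens[i + 1], "adj")
            # Both children must match and share the same $gender value.
            if noun_gender is not None and noun_gender == adj_gender:
                out.extend([tokens[i + 1], tokens[i]])  # output order { 2 1 }
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out


def transfer(line):
    """Apply the NP rule to a whitespace-separated line of lexical units."""
    return " ".join(apply_np(line.split()))
```

Calling transfer on the first example swaps the two units, while the second example is returned unchanged because the noun is <m> and the adjective is <f>.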

3/12/19

I rewrote a portion of the English->Spanish noun phrase rules in a potential transfer formalism. https://github.com/mr-martian/GSoC19-recursive/blob/master/eng-spa.rtx

Could you add comments to the rules that give examples of what they do? —Firespeaker (talk) 04:40, 22 March 2019 (CET)

Done Popcorndude (talk) 15:07, 22 March 2019 (CET)

Work Plan

Time Period	Goal	Details	Deliverable
Community Bonding Period, May 6-26	Finalize formalism	Read up on GLR parsers; decide variable semantics and syntax; see if there's a good way to handle interpolation (e.g. inserting clitics after first word of phrase)	Full description of planned formalism
Week 1, May 27-June 2	Begin parser	Get input; match and build trees based on literal tags and attribute categories	Minimal parser
Week 2, June 3-9	Add variables	Agreement; passing variables up the tree; setting variables for child nodes	Minimal parser with agreement
Week 3, June 10-16	Test with eng->spa	Noun phrases (this was started in the coding challenge); basic verb phrases (some agreement, if time)	Simple eng->spa parser
Week 4, June 17-23	Continue parser	Weights; conditionals; multiple output nodes; anything else deemed necessary during Community Bonding or testing	Majority of initial specifications implemented
Evaluation 1	Basic parser done		Parser-generator compliant with majority of initial specifications and rudimentary eng->spa instantiation
Week 5, June 24-30	Finish parser and continue eng->spa	Finish anything left over from week 4; finish verb phrases	Fully implemented parser and working eng->spa for simple sentences
Week 6, July 1-7	Finish eng->spa and write reverser	Convert any remaining eng->spa rules; evaluate parser against chunking system (metrics: accuracy, speed of parser, compilation speed); write script to automatically reverse a ruleset (all features currently described are at least in principle reversible)	System comparison and rule-reverser
Week 7, July 8-14	Evaluation and testing	Evaluate the output of the reverser against current spa->eng system; write tests for all features; begin adding error messages	Test suite and report on the general effectiveness of direct rule-reversal
Week 8, July 15-21	Optimization and interface	Speed up the parser and compiler where possible; build interfaces for compiler, parser, and reverser; clean up code; re-evaluate speed	Command-line interfaces and updated system comparison
Evaluation 2	Complete program		Optimized and polished parser-generator compliant with initial specifications, and complete eng->spa transfer rules
Week 9, July 22-28	Do spa->eng	Identify differences between generated spa->eng and chunking spa->eng; fix generated spa->eng rules; report on effort required to correct reverser	Working spa->eng rules and report on the usefulness of rule-reverser
Week 10, July 29-August 4	Documentation	Convert initial specifications to full documentation; write tutorial; write recipe book containing at least minimal examples of everything listed at User_talk:Popcorndude/Recursive_Transfer#Linguistic.2Ftransfer_phenomena	Complete documentation of system
Weeks 11 and 12, August 5-18	Buffer zone	These weeks will be used for one of the following, depending on preceding weeks and discussions with mentors: making up for delays in prior weeks; converting another language pair; experimenting with automated conversion of chunking rules; writing a ruleset composer for generating a preliminary ruleset from two other pairs (e.g. combine eng->spa and spa->cat to get approximate rules for eng->cat)	TBD
Final evaluation	Project done		Complete, fully documented system with full ruleset for at least one language pair
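
As a rough illustration of what the rule-reverser planned for week 6 might do in the simplest case, the Python sketch below reverses a rule that only reorders its children. It is a hypothetical sketch, not the planned implementation: the names reverse_rule and apply_output are made up here, a rule is represented as a list of child categories plus a list of 1-based output positions, and agreement, tags, weights, conditionals, and insertion or deletion of nodes are all ignored.

```python
def apply_output(children, output):
    """Reorder children according to 1-based output positions, e.g. { 2 1 }."""
    return [children[pos - 1] for pos in output]


def reverse_rule(pattern, output):
    """Reverse a pure-reordering rule.

    pattern: child categories in source order, e.g. ["noun", "adj"]
    output:  1-based positions in target order, e.g. [2, 1]
    Returns (reversed_pattern, reversed_output) such that applying the
    reversed rule to the target order restores the source order.
    """
    if sorted(output) != list(range(1, len(pattern) + 1)):
        raise ValueError("this sketch only handles pure reorderings")
    # The reversed rule matches what the original rule produced.
    reversed_pattern = apply_output(pattern, output)
    # Its output is the inverse permutation of the original output.
    reversed_output = [0] * len(output)
    for new_index, pos in enumerate(output, start=1):
        reversed_output[pos - 1] = new_index
    return reversed_pattern, reversed_output
```

For the noun phrase rule from the coding challenge, reverse_rule(["noun", "adj"], [2, 1]) gives the pattern ["adj", "noun"] with output [2, 1], i.e. a rule that matches an adjective followed by a noun and swaps them back.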

I have no other commitments this summer and would be able to work on this project full-time.
