User:Popcorndude/Recursive Transfer
Google Summer of Code 2019 proposal draft
Contact
Name: Daniel Swanson
Email: awesomeevildudes@gmail.com
IRC: popcorndude
GitHub: https://github.com/mr-martian
Timezone: UTC-5
Proposal
I would like to implement an alternative to the current chunking system for structural transfer, as described at Ideas_for_Google_Summer_of_Code/Robust_recursive_transfer. The new system would take a set of recursively defined rules and generate a GLR parser from them. This would make long-distance phrasal reordering much easier to handle, and would probably also significantly reduce the size of existing rule sets. A draft of the formalism for these rules can be found at User:Popcorndude/Recursive_Transfer/Formalism.
This project would benefit the community by making it much easier to write transfer rules for syntactically dissimilar languages. To the extent that it makes rule sets smaller, it should also make them easier to maintain.
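A small Python sketch shows why recursive rules help with reordering. The grammar, the `RULES` table, and the `tag_of` and `parse` functions are hypothetical and are not part of the formalism draft. The parser is a plain greedy shift-reduce loop, not a GLR parser. A GLR parser would keep every competing analysis alive when the grammar is ambiguous, whereas this loop commits to the first reduction it finds.

```python
# Toy recursive rules: (lhs, rhs pattern, 1-based output order).
# The output order imitates the "{ 2 1 }" notation; everything else is hypothetical.
RULES = [
    ("NP", ("n",), (1,)),
    ("NP", ("adj", "NP"), (2, 1)),  # recursive: an NP may contain another NP
]


def tag_of(lu):
    """Return the first tag of a lexical unit, e.g. "big<adj>" -> "adj"."""
    return lu.split("<", 1)[1].split(">", 1)[0]


def parse(lus):
    """Greedy shift-reduce parse; returns the final stack of (label, tokens) pairs.

    Each reduction reorders the tokens of its children, so the reordering
    is applied again at every level of the recursion.
    """
    rules = sorted(RULES, key=lambda rule: -len(rule[1]))  # try longest patterns first
    stack = []
    for lu in lus:
        stack.append((tag_of(lu), [lu]))  # shift
        reduced = True
        while reduced:  # keep reducing while some rule matches the top of the stack
            reduced = False
            for lhs, rhs, order in rules:
                n = len(rhs)
                if n <= len(stack) and tuple(label for label, _ in stack[-n:]) == rhs:
                    parts = stack[-n:]
                    del stack[-n:]
                    # Concatenate the children's tokens in the rule's output order.
                    tokens = [tok for i in order for tok in parts[i - 1][1]]
                    stack.append((lhs, tokens))  # reduce
                    reduced = True
                    break
    return stack


print(parse(["big<adj>", "red<adj>", "car<n>"]))
# -> [('NP', ['car<n>', 'red<adj>', 'big<adj>'])]
```

Because the second rule contains an NP, the adjective/noun swap is applied once per level. Two stacked English adjectives therefore come out in mirrored, Spanish-like order without needing a separate rule for each phrase length.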
Background
I am a sophomore at Swarthmore College studying math and linguistics. Last year I took a class in computational linguistics using Apertium and this year I am a course assistant for that class. Last summer I worked on a personal translation project (code here) which involved a lot of structural transfer and writing a recursive descent parser.
I have a lot of experience with Python and a basic knowledge of C++. I am a native speaker of English and can read Spanish and Biblical Hebrew.
I have been interested in rule-based machine translation for several years, particularly as it might be applied to Bible translation. I am interested in Apertium because it already does nearly everything I was trying to do with the system I was building on my own, except for handling complex syntactic relations, and this GSoC project would fill that gap.
Coding Challenge
All my code is on GitHub at https://github.com/mr-martian/GSoC19-recursive
3/4/19
So far I have reimplemented the Python script from the prototype and added support for attribute categories and parameterized nodes.
Example:
gender = m f;
#noun $gender -> #(n.$gender);
#adj $gender -> #(adj.$gender);
NP $gender -> noun adj { 2 1 } ;
This defines a category "gender" consisting of <m> and <f>, along with the lexical categories "noun" and "adj", which match things of the form "word<n><C>" and "word<adj><C>", respectively, where C is <m> or <f>. The last line defines a non-terminal node "NP" that matches a noun followed by an adjective of the same gender. So "carro<n><m>/car<n> rojo<adj><m>/red<adj>" would become "rojo<adj><m>/red<adj> carro<n><m>/car<n>", but "carro<n><m>/car<n> roja<adj><f>" would not be matched.
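The agreement check in this example can be sketched in Python. This is a hypothetical, hard-coded version of the single `NP` rule, not the code in the GSoC19-recursive repository. The helper names `source_tags`, `lexical_gender`, and `apply_np_rule` are illustrative only, and the sketch assumes that the tags to check are the ones on the source side of the `/`.

```python
import re

GENDER = {"m", "f"}  # the attribute category "gender = m f;"


def source_tags(lu):
    """Tags on the source side of a lexical unit, e.g. "carro<n><m>/car<n>" -> ["n", "m"]."""
    return re.findall(r"<([^>]+)>", lu.split("/", 1)[0])


def lexical_gender(lu, pos):
    """Return the gender if lu has the form word<pos><gender>, else None.

    Loosely mirrors "#noun $gender -> #(n.$gender);" for a single part of speech.
    """
    tags = source_tags(lu)
    if len(tags) == 2 and tags[0] == pos and tags[1] in GENDER:
        return tags[1]
    return None


def apply_np_rule(noun, adj):
    """Hard-coded "NP $gender -> noun adj { 2 1 };".

    Returns the two units in output order, or None if the rule does not match.
    """
    g_noun = lexical_gender(noun, "n")
    g_adj = lexical_gender(adj, "adj")
    if g_noun is None or g_noun != g_adj:
        return None  # not a noun + adjective pair that agrees in gender
    parts = [noun, adj]
    return [parts[i - 1] for i in (2, 1)]  # output order { 2 1 }
```

With these definitions, `apply_np_rule("carro<n><m>/car<n>", "rojo<adj><m>/red<adj>")` returns the adjective first and the noun second, while `apply_np_rule("carro<n><m>/car<n>", "roja<adj><f>")` returns `None` because the genders differ.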
3/12/19
I rewrote a portion of the English->Spanish noun phrase rules in a potential transfer formalism. https://github.com/mr-martian/GSoC19-recursive/blob/master/eng-spa.rtx
Could you add comments to the rules that give examples of what they do? —Firespeaker (talk) 04:40, 22 March 2019 (CET)
- Done Popcorndude (talk) 15:07, 22 March 2019 (CET)
Work Plan
Time Period | Work Plan | Deliverable |
---|---|---|
Community Bonding Period and week 1 | Read up on GLR parsers and finalize first draft of formalism | List of operations with syntax |
weeks 2 and 3 | Build parser which implements a subset of the formalism | Parser-generator which can extract attributes from lexical units and build trees with some agreement |
week 4 | Test by writing noun phrase rules for eng->spa | Ruleset which accurately translates basic English noun phrases to Spanish |
evaluation 1 | Basic parser done | |
weeks 5 and 6 | Implement remainder of formalism, testing with eng->spa | Parser-generator with all behavior specified in week 1 |
weeks 7 and 8 | Write the rest of eng->spa, begin working on spa->eng | Transfer system equivalent to current chunking system for eng-spa |
evaluation 2 | Working eng->spa transfer program | |
week 9 | Finish spa->eng. It may be possible to automatically reverse a ruleset and post-edit the result, which would significantly reduce the time needed for spa->eng. | Transfer system equivalent to current chunking system for spa->eng |
week 10 | Documentation and further testing. Specifically, attempt to write minimal examples for all the phenomena listed at User_talk:Popcorndude/Recursive_Transfer#Linguistic.2Ftransfer_phenomena. | Full description of each syntax feature and examples for common and tricky transfer phenomena |
weeks 11 and 12 | Either a buffer for things taking longer than expected or conversion of a second language pair | TBD |
final evaluation | | |
Could you add explicit mention of deliverables? —Firespeaker (talk) 20:01, 30 March 2019 (CET)
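For rules that only reorder their children, the week 9 idea of automatically reversing a ruleset can be sketched in Python. The output sequence of the original rule becomes the pattern of the reversed rule, and the output order is replaced by its inverse permutation. The tuple representation and the function names `invert_order` and `reverse_rule` are assumptions made for illustration. Rules that also change tags, check attributes, or insert and delete words would not reverse this simply, which is presumably where post-editing would come in.

```python
def invert_order(order):
    """Given a 1-based output order such as (2, 3, 1), return the order that undoes it."""
    inverse = [0] * len(order)
    for out_pos, src_index in enumerate(order, start=1):
        # The child at src_index ends up at out_pos, so the inverse maps it back.
        inverse[src_index - 1] = out_pos
    return tuple(inverse)


def reverse_rule(lhs, rhs, order):
    """Hypothetical reversal of a pure reordering rule (lhs, rhs pattern, output order)."""
    new_rhs = tuple(rhs[i - 1] for i in order)  # the old output is the new input pattern
    return lhs, new_rhs, invert_order(order)
```

For example, `reverse_rule("NP", ("adj", "n"), (2, 1))` gives `("NP", ("n", "adj"), (2, 1))`, the spa->eng counterpart of an eng->spa adjective/noun swap.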
I have no other commitments this summer and would be able to work on this project full-time.