Ideas for Google Summer of Code/Robust recursive transfer
Revision as of 15:26, 14 March 2013 by Francis Tyers
The purpose of this task is to create a prototype module to replace the apertium-transfer module(s). The new module will parse its input and allow transfer operations on the parsed result. Currently we have a problem with distantly related languages that require long-distance constituent reordering, because the existing transfer modules can only do finite-state chunking. The module should be designed to work cleanly with partial input, e.g. processing word by word rather than sentence by sentence.
Tasks
- Do a review of the literature on finite-state dependency parsing
- Propose a transfer rule formalism
- Write a number of transfer rules in this formalism for translating between a language pair
- Reimplement an existing language pair in trunk using your new formalism. This will involve rewriting the existing rules to be compatible with your new formalism.
- Integrate your new rules into the existing pair.
- Evaluate the improvement
Coding challenge
- Install Apertium (see Minimal installation from SVN)
- Parse one or more sentences from the story in your language by hand.
- Formalise some rules to show how the parsed representation could be converted to a representation suitable for generation in another language.
- Write a stream processor (see Apertium stream format) that takes the output of the lexical transfer module as input and processes it character by character.
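As a starting point for the last item, here is a minimal sketch of a character-by-character reader in Python. It assumes the usual conventions of the Apertium stream format: lexical units are delimited by `^` and `$`, superblanks are enclosed in `[` and `]`, a backslash escapes the following character, and the lexical transfer output separates the source side from its translation(s) with `/`. The function names `read_stream` and `split_lu` are invented for illustration and are not part of any Apertium tool.

```python
import sys
from typing import Iterable, Iterator, List, Tuple


def read_stream(chars: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("lu", text) for each ^...$ unit and ("blank", text) for the
    material between units, consuming the input one character at a time."""
    buf: List[str] = []   # characters of the token being built
    in_lu = False         # True between an unescaped ^ and $
    in_super = False      # True inside a [ ... ] superblank
    escaped = False       # True if the previous character was a backslash
    for ch in chars:
        if escaped:
            buf.append(ch)          # escaped character is kept literally
            escaped = False
        elif ch == "\\":
            buf.append(ch)          # keep the backslash so output can be re-emitted
            escaped = True
        elif in_super:
            buf.append(ch)
            if ch == "]":
                in_super = False
        elif in_lu:
            if ch == "$":
                yield ("lu", "".join(buf))
                buf = []
                in_lu = False
            else:
                buf.append(ch)
        elif ch == "[":
            buf.append(ch)
            in_super = True
        elif ch == "^":
            if buf:                 # flush any blank seen before this unit
                yield ("blank", "".join(buf))
                buf = []
            in_lu = True
        else:
            buf.append(ch)
    if buf and not in_lu:           # trailing blank; an unfinished unit is dropped
        yield ("blank", "".join(buf))


def split_lu(lu: str) -> List[str]:
    """Split a lexical unit on unescaped '/' (source side first)."""
    parts: List[str] = []
    cur: List[str] = []
    escaped = False
    for ch in lu:
        if escaped:
            cur.append(ch)
            escaped = False
        elif ch == "\\":
            cur.append(ch)
            escaped = True
        elif ch == "/":
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    parts.append("".join(cur))
    return parts


if __name__ == "__main__":
    # Read standard input one character at a time until end of file.
    stdin_chars = iter(lambda: sys.stdin.read(1), "")
    for kind, text in read_stream(stdin_chars):
        if kind == "lu":
            print(split_lu(text))
```

Because `read_stream` is a generator over any iterable of characters, each lexical unit is handed on as soon as its closing `$` arrives, which fits the word-by-word requirement. A real module would replace the `print` call with parsing and transfer logic.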
Frequently asked questions
See also
- (2011) VM for transfer: relevant for understanding how the current transfer implementation works
Further reading
- Elworthy, D. (1999) "A Finite-State Parser with Dependency Structure Output"
- Oflazer, K. (1999) "Dependency Parsing with an Extended Finite State Approach"
- Alshawi, H., Douglas, S., Bangalore, S. (2000) "Learning Dependency Translation Models as Collections of Finite-State Head Transducers". Computational Linguistics 26(1)