Ideas for Google Summer of Code/Robust recursive transfer

From Apertium

The purpose of this task is to create a prototype module to replace the apertium-transfer module(s). The new module should parse its input and allow transfer operations to be performed on the result. Currently we have problems with very distantly related languages that have long-distance constituent reordering, because we can only do finite-state chunking. The module should be designed to work cleanly with partial input, e.g. processing word by word rather than sentence by sentence.
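To make "working cleanly with partial input" concrete, here is a minimal sketch of an incremental shift-reduce parser that consumes one tagged word at a time and keeps its partial analysis on a stack. The toy grammar, the tag names and the names `Node`, `RULES` and `IncrementalParser` are all hypothetical and are not part of Apertium. The parser is deliberately greedy: it reduces whenever a rule matches and uses no lookahead, so unlike an LALR(1) parser it cannot resolve shift/reduce conflicts. A real implementation would consult a parse table at that point.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                                        # tag for leaves, category for phrases
    children: List[Node] = field(default_factory=list)
    word: Optional[str] = None                        # only set on leaves


# Hypothetical toy grammar: (left-hand side, right-hand side).
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("NP", ("det", "n")),
    ("VP", ("v", "NP")),
    ("S", ("NP", "VP")),
]


class IncrementalParser:
    """Greedy shift-reduce parser fed one word at a time.

    After every word the stack holds the partial analysis so far, so later
    stages can inspect it before the sentence is complete.
    """

    def __init__(self, rules: List[Tuple[str, Tuple[str, ...]]]):
        self.rules = rules
        self.stack: List[Node] = []

    def feed(self, tag: str, word: str) -> List[str]:
        self.stack.append(Node(tag, word=word))       # shift
        self._reduce()
        return [node.label for node in self.stack]

    def _reduce(self) -> None:
        # Keep reducing while some rule's right-hand side matches the top of the stack.
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in self.rules:
                n = len(rhs)
                top = self.stack[-n:]
                if len(top) == n and tuple(node.label for node in top) == rhs:
                    del self.stack[-n:]
                    self.stack.append(Node(lhs, children=top))
                    reduced = True
                    break


parser = IncrementalParser(RULES)
for tag, word in [("det", "the"), ("n", "dog"), ("v", "sees"), ("det", "the"), ("n", "cat")]:
    print(word, parser.feed(tag, word))
# the ['det']
# dog ['NP']
# sees ['NP', 'v']
# the ['NP', 'v', 'det']
# cat ['S']
```

Because the analysis is built up as a tree rather than as flat chunks, a constituent of any size ends up as a single node that later rules can move as a unit.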

Tasks

  1. Do a review of the literature on:
    1. finite-state dependency parsing
    2. LALR(1) grammars
  2. Propose a transfer rule formalism
  3. Write a number of transfer rules in this formalism for a language pair.
  4. Reimplement an existing language pair in trunk using your new formalism. This will involve rewriting the existing rules to be compatible with your new formalism.
  5. Integrate your new rules into the existing pair.
  6. Evaluate the improvement
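As an illustration of what a recursive transfer rule formalism might look like, the sketch below uses a hypothetical rule format in which each rule maps a phrase label to a new order for that phrase's children. It is not Apertium's formalism or a proposal from this page. Rules are applied bottom-up at every level of the tree, so an object of any length is moved as one unit. `transfer`, `words` and `SOV_RULES` are made-up names, and the SVO-to-SOV rules are only an example.

```python
from typing import Dict, List, Sequence, Tuple, Union

# A tree is (label, word) for a leaf or (label, [subtrees]) for a phrase.
Tree = Tuple[str, Union[str, list]]

# Hypothetical rules: phrase label -> new order of its children (by index).
# "VP": (1, 0) puts the object before the verb; "PP": (1, 0) turns
# prepositions into postpositions.
SOV_RULES: Dict[str, Sequence[int]] = {"VP": (1, 0), "PP": (1, 0)}


def transfer(tree: Tree, rules: Dict[str, Sequence[int]]) -> Tree:
    label, body = tree
    if isinstance(body, str):                          # leaf: nothing to reorder
        return tree
    children = [transfer(child, rules) for child in body]   # transfer subtrees first
    order = rules.get(label, range(len(children)))     # default: keep source order
    return (label, [children[i] for i in order])


def words(tree: Tree) -> List[str]:
    """Read the words back off the leaves, left to right."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [w for child in body for w in words(child)]


sentence: Tree = ("S", [
    ("NP", [("det", "the"), ("n", "dog")]),
    ("VP", [("v", "sees"),
            ("NP", [("det", "the"), ("n", "cat"),
                    ("PP", [("pr", "on"), ("NP", [("det", "the"), ("n", "mat")])])])]),
])

print(" ".join(words(transfer(sentence, SOV_RULES))))
# the dog the cat the mat on sees
```

A finite-state chunker would need a separate pattern for each possible shape of the object. Here the single `VP` rule covers them all because it operates on whole constituents.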

Coding challenge

  1. Install Apertium (see Minimal installation from SVN)
  2. Parse one or more sentences from the story in your language by hand.
  3. Formalise some rules to show how the parsed representation could be converted to a representation suitable for generation in another language.
  4. Write a stream processor (see Apertium stream format) that takes the output of the lexical transfer module as input and processes it character by character.
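For the stream processor, a minimal sketch is shown below. It assumes the usual shape of the Apertium stream format after lexical transfer: each lexical unit is delimited by `^` and `$`, tags are in angle brackets, the source analysis is separated from its translation(s) by `/`, and text outside lexical units (including `[...]` superblanks) is blank material. Backslash escapes are kept verbatim. The class names `StreamProcessor`, `LexicalUnit` and `Blank` are hypothetical. The sketch ignores other details of the format, such as null-flush characters. Each call to `feed` takes one character and returns any events completed by it, so a unit becomes available as soon as its `$` arrives.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class LexicalUnit:
    source: str            # e.g. "house<n><sg>"
    targets: List[str]     # e.g. ["casa<n><f><sg>"]


@dataclass
class Blank:
    text: str              # whitespace, punctuation or [superblank] material


Event = Union[LexicalUnit, Blank]


class StreamProcessor:
    def __init__(self) -> None:
        self._in_lu = False          # inside ^...$
        self._in_superblank = False  # inside [...] outside a lexical unit
        self._escape = False         # previous character was a backslash
        self._parts = [""]           # source analysis, then each translation
        self._blank = ""

    def _append(self, s: str) -> None:
        if self._in_lu:
            self._parts[-1] += s
        else:
            self._blank += s

    def feed(self, ch: str) -> List[Event]:
        if self._escape:                     # escaped char: keep it literally
            self._escape = False
            self._append("\\" + ch)
            return []
        if ch == "\\":
            self._escape = True
            return []
        if self._in_lu:
            if ch == "$":                    # end of lexical unit: emit it now
                self._in_lu = False
                unit = LexicalUnit(self._parts[0], self._parts[1:])
                self._parts = [""]
                return [unit]
            if ch == "/":                    # start of the next analysis
                self._parts.append("")
            else:
                self._parts[-1] += ch
            return []
        if ch == "[":
            self._in_superblank = True
        elif ch == "]":
            self._in_superblank = False
        if ch == "^" and not self._in_superblank:
            self._in_lu = True               # flush the blank preceding the unit
            if self._blank:
                blank, self._blank = Blank(self._blank), ""
                return [blank]
            return []
        self._blank += ch
        return []

    def flush(self) -> List[Event]:
        """Emit any trailing blank at the end of input."""
        if self._blank:
            blank, self._blank = Blank(self._blank), ""
            return [blank]
        return []


def process(text: str) -> List[Event]:
    sp = StreamProcessor()
    events: List[Event] = []
    for ch in text:
        events.extend(sp.feed(ch))
    return events + sp.flush()


for event in process("[<p>]^house<n><sg>/casa<n><f><sg>$ ^big<adj>/grande<adj>$."):
    print(event)
```

The input string above is a hand-written example in the expected shape, not actual output from an Apertium pair.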

Frequently asked questions

  • none yet, ask us something! :)

See also

  • (2011) VM for transfer: relevant for understanding how the current transfer implementation works

Further reading

  • Elworthy, D. (1999) "A Finite-State Parser with Dependency Structure Output"
  • Öflazer, K. (1999) "Dependency Parsing with an Extended Finite State Approach"
  • Alshawi, H., Douglas, S., Bangalore, S. (2000) "Learning Dependency Translation Models as Collections of Finite-State Head Transducers". Computational Linguistics 26(1)