Ideas for Google Summer of Code/Robust recursive transfer

From Apertium

The purpose of this task is to create a module that parses its input and allows transfer operations on the result, replacing the existing apertium-transfer module(s).

Currently we have a problem with very distantly related languages that have long-distance constituent reordering, because we can only do finite-state chunking. The module should be designed to work cleanly with partial input, e.g. processing word by word rather than sentence by sentence.

It should expect morphologically disambiguated input, and its own output should also be unambiguous (it should create a single parse tree).
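As a rough illustration of what morphologically disambiguated input and word-by-word processing could look like in practice, the sketch below reads lexical units of the form ^lemma<tag><tag>$ (the Apertium stream format after tagging) one at a time from a stream. The names LexicalUnit and read_units are hypothetical, and the reader is deliberately simplified: it assumes no escaped characters, no superblanks, and exactly one analysis per unit.

```python
import io
import re
from typing import Iterator, NamedTuple, TextIO, Tuple


class LexicalUnit(NamedTuple):
    """One disambiguated word: a lemma plus its morphological tags."""
    lemma: str
    tags: Tuple[str, ...]


_TAG = re.compile(r"<([^<>]+)>")


def read_units(stream: TextIO) -> Iterator[LexicalUnit]:
    """Yield each lexical unit as soon as its closing '$' has been read.

    Simplifying assumptions (not the full stream format): no backslash
    escapes, no superblanks, one analysis per unit.
    """
    buf = None  # None while we are outside a ^...$ unit
    while True:
        ch = stream.read(1)
        if ch == "":          # end of input; an unfinished unit is dropped
            break
        if ch == "^":         # start of a new unit
            buf = []
        elif ch == "$" and buf is not None:
            text = "".join(buf)
            lemma = text.split("<", 1)[0]
            yield LexicalUnit(lemma, tuple(_TAG.findall(text)))
            buf = None
        elif buf is not None:
            buf.append(ch)    # text between units is ignored


if __name__ == "__main__":
    demo = io.StringIO("^the<det><def><sp>$ ^cat<n><sg>$")
    for unit in read_units(demo):
        print(unit.lemma, unit.tags)
```

Because read_units is a generator, a downstream parser can consume each unit as it arrives instead of waiting for the end of the sentence.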

Tasks

  1. Do a review of the literature on:
    1. finite-state dependency parsing
    2. LALR(1) grammars
  2. Propose a transfer rule formalism.
  3. Write a number of transfer rules in this formalism for translating between the languages of a pair.
  4. Reimplement an existing language pair in trunk using your new formalism. This will involve rewriting the existing rules in that formalism.
  5. Integrate your new rules into the existing pair.
  6. Evaluate the improvement.
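To give a feel for the kind of parsing the literature review points towards, here is a minimal sketch of a greedy shift-reduce loop that consumes one tagged word at a time and builds a tree. It is not a table-driven LALR(1) parser and not the prototype's formalism: the rule set RULES, the tag names, and the functions parse and reduce_all are all invented for illustration. On incomplete input the stack simply holds the partial trees built so far, which is one possible reading of "robust".

```python
# Hypothetical toy grammar: (left-hand side, right-hand side).
RULES = [
    ("NP", ("det", "n")),
    ("NP", ("det", "adj", "n")),
    ("PP", ("pr", "NP")),
    ("NP", ("NP", "PP")),
]


def reduce_all(stack, rules):
    """Repeatedly reduce the top of the stack, preferring longer rules."""
    by_length = sorted(rules, key=lambda rule: -len(rule[1]))
    changed = True
    while changed:
        changed = False
        for lhs, rhs in by_length:
            n = len(rhs)
            if n <= len(stack) and tuple(node[0] for node in stack[-n:]) == rhs:
                children = stack[-n:]
                del stack[-n:]
                stack.append((lhs, children))  # internal node: (label, [children])
                changed = True
                break


def parse(tokens, rules=RULES):
    """Shift each (tag, word) pair, then reduce as far as possible.

    Returns the stack: a single tree if the grammar covers the input,
    otherwise a list of partial trees and unattached words.
    """
    stack = []
    for tag, word in tokens:
        stack.append((tag, word))  # leaf node: (tag, word)
        reduce_all(stack, rules)
    return stack


if __name__ == "__main__":
    sentence = [("det", "the"), ("n", "cat"), ("pr", "on"),
                ("det", "the"), ("n", "mat")]
    print(parse(sentence))       # one NP tree covering all five words
    print(parse(sentence[:3]))   # partial input: an NP and a dangling preposition
```

A real implementation would use lookahead to decide between shifting and reducing; the greedy strategy here always attaches to the nearest constituent.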

Coding challenge

  1. Install Apertium (see Minimal installation from SVN)
  2. Compile the prototype code at recursive transfer.
  3. Write a transfer grammar to perform word-reordering for this story (other link here) for your chosen language pair.
Optional:
  1. Adjust prototype code to include support for attributes.

Frequently asked questions

  • none yet, ask us something! :)

See also

Further reading

  • Elworthy, D. (1999) "A Finite-State Parser with Dependency Structure Output"
  • Oflazer, K. (1999) "Dependency Parsing with an Extended Finite State Approach"
  • Alshawi, H., Douglas, S., Bangalore, S. (2000) "Learning Dependency Translation Models as Collections of Finite-State Head Transducers". Computational Linguistics 26(1)