Implementation of n-Stage Transfer
A proposal for Google Summer of Code 2009
Pranava Swaroop, March 2009
Partial parsing aims to recover syntactic information reliably and efficiently without performing a deep analysis. The technique is known for its robustness and speed. It is typically driven by a cascade of finite-state automata, organized as a pipeline of recognizers. There have been several implementations of partial parsing, including Cass, a fast, robust partial parser; Mirine 2.2, a Korean grammar checker; and APOLN, a partial parser of unrestricted natural language sentences. These implementations (especially the last) suggest that partial parsing can improve the treatment of less related languages, so implementing it would be quite beneficial for Apertium. This document describes a novel implementation of a partial parser that would improve the treatment of less related languages and allow for more complex verb movement, and it proposes funding the project through the Google Summer of Code 2009 program as part of Apertium.
A partial parser uses 'semi-deterministic' robust parsing algorithms which permit the analysis of unrestricted texts. It works with simple grammars, which are usually defined with regular patterns. The output of the parser is a complete analysis tree. Partial parsers recognize phrase boundaries mainly on the basis of cues provided by the local contexts. Regardless of whether or not abstractions such as phrases occur in the model, most of the relevant information is contained directly in the sequence of words and part-of-speech tags to be processed.
A partial parser is mostly controlled by a cascaded set of finite-state automata and is therefore described in terms of a number of levels. In this finite-state cascade, genuine recursion is not possible. The whole strategy is based on "easy-first parsing": as Steven Abney puts it, we make the easy calls first, whittling away at the harder decisions in the process.
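The cascade of levels described above can be illustrated with a minimal sketch. It assumes a toy tag set (DET, ADJ, NOUN, AUX, VERB, PREP) and two hypothetical levels; the names LEVELS, longest_match, run_level and cascade are invented for illustration and are not part of Cass, APOLN or Apertium. Each level reduces the longest matching run of symbols to a phrase label and passes its output to the next level, so there is no recursion.

```python
import re

# Hypothetical cascade: each level maps a phrase label to a regular pattern
# over the symbols produced by the level below it.
LEVELS = [
    {"NP": r"(DET )?(ADJ )*NOUN", "VG": r"(AUX )*VERB"},  # level 1: easy calls
    {"PP": r"PREP NP"},                                   # level 2: built on level 1
]

def longest_match(symbols, start, patterns):
    """Return (label, end) for the longest match starting at `start`, or None."""
    # Try the longest candidate span first and shrink it one symbol at a time.
    for end in range(len(symbols), start, -1):
        text = " ".join(symbols[start:end])
        for label, pattern in patterns.items():
            if re.fullmatch(pattern, text):
                return label, end
    return None

def run_level(symbols, patterns):
    """Scan left to right, replacing each longest match with its phrase label."""
    out, i = [], 0
    while i < len(symbols):
        hit = longest_match(symbols, i, patterns)
        if hit is None:
            out.append(symbols[i])  # unmatched symbols pass up unchanged
            i += 1
        else:
            label, end = hit
            out.append(label)
            i = end
    return out

def cascade(tags):
    """Apply every level exactly once, in order; no level calls itself."""
    symbols = list(tags)
    for patterns in LEVELS:
        symbols = run_level(symbols, patterns)
    return symbols
```

For the tag sequence DET ADJ NOUN AUX VERB PREP DET NOUN, the first level yields NP VG PREP NP and the second level yields NP VG PP. A real partial parser would also keep the matched words inside each phrase; the sketch keeps only the labels to stay short.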
Present Status of Apertium
Apertium structural transfer uses finite-state pattern matching to detect, in the usual left-to-right, longest-match way, fixed-length patterns of lexical forms and to perform the corresponding transformations. A shallow-transfer rule consists of a sequence of lexical forms to detect and the transformations to apply to them. Apertium currently has between one and three stages of transfer, which may not be sufficient for translation between distant languages. The joining (in other words, merging) of chunks, which also involves verb movement, is restricted. I noticed this limitation while working on the transfer rules and inter-chunk transfer rules.
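The left-to-right, longest-match detection of fixed-length patterns can be sketched as follows. This is only an illustration under stated assumptions: lexical forms are modelled as (lemma, tag) pairs, and the rule table RULES and the function apply_rules are hypothetical. It does not reproduce Apertium's actual rule file format or its transfer engine.

```python
# Hypothetical rules: a fixed-length tag pattern plus an action that
# returns the transformed lexical forms.
RULES = [
    (("ADJ", "NOUN"), lambda f: [f[1], f[0]]),                # swap adjective and noun
    (("DET", "ADJ", "NOUN"), lambda f: [f[0], f[2], f[1]]),   # same swap, keeping the determiner
]

def apply_rules(forms):
    """Apply the longest matching rule at each position, scanning left to right."""
    out, i = [], 0
    while i < len(forms):
        best = None
        for pattern, action in RULES:
            window = forms[i:i + len(pattern)]
            tags = tuple(tag for _, tag in window)
            # Keep the longest pattern that matches at position i.
            if tags == pattern and (best is None or len(pattern) > len(best[0])):
                best = (pattern, action)
        if best is None:
            out.append(forms[i])  # no rule applies: copy the form unchanged
            i += 1
        else:
            pattern, action = best
            out.extend(action(forms[i:i + len(pattern)]))
            i += len(pattern)     # resume after the matched pattern
    return out
```

Given the forms the/DET red/ADJ house/NOUN falls/VERB, the three-form rule wins over the shorter adjective-noun rule and the result is the/DET house/NOUN red/ADJ falls/VERB. Because each pattern has a fixed length and matching resumes after the matched span, a single pass of this kind cannot move a word across an arbitrarily long stretch of text, which is the limitation that additional transfer stages are meant to address.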