User:David Nemeskey/GSOC 2013 literature review

From Apertium

Papers I read about CGs and FSTs.

Hulden, Mans. Constraint Grammar parsing with left and right sequential finite transducers[1]

The paper behind fomacg. Describes a way of translating most of CG-2's functionality to foma transducers.

  1. Rule application seems to be composition-based: the FSTs are run on the sentence string.
  2. Missing functions:
    • IFF
    • MAPPINGS
    • DELIMITERS (I have since implemented it)
  3. Differences between the paper and the implementation:
    • Conversion to left-and-right transducers is missing from fomacg
  4. No performance tests.
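The composition-based scheme can be illustrated with a toy model, with plain Python functions standing in for the foma transducers; the rule and helper names below are made up for illustration and are not fomacg's actual API:

```python
# Toy model of composition-based CG rule application.
# In fomacg each rule is a foma transducer composed with (applied to)
# the sentence; here ordinary Python functions stand in for the
# transducers, and a sentence is a list of cohorts (lists of readings).

def remove_v_after_det(sentence):
    """REMOVE (v) IF (-1 (det)): drop the 'v' reading of a cohort
    whose left neighbour is unambiguously a determiner."""
    out = []
    for i, readings in enumerate(sentence):
        if ('v' in readings and len(readings) > 1
                and i > 0 and sentence[i - 1] == ['det']):
            readings = [r for r in readings if r != 'v']
        out.append(readings)
    return out

def apply_rules(sentence, rules):
    """Composition: feed the output of one rule into the next,
    iterating until a fixed point is reached, as CG-2 does."""
    while True:
        new = sentence
        for rule in rules:
            new = rule(new)
        if new == sentence:
            return new
        sentence = new

# "the dog barks": 'dog' is noun/verb ambiguous.
sentence = [['det'], ['n', 'v'], ['v']]
print(apply_rules(sentence, [remove_v_after_det]))  # [['det'], ['n'], ['v']]
```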

Peltonen, Janne. A Finite State Constraint Grammar Parser[2]

Very similar in spirit to Mans's work, but the compiler is implemented in Python, and it doesn't handle LINKed contexts.

  1. Rule application is intersection-based.
    • The sentence FSAs need not be linear -- the readings in a cohort can be on separate paths. He uses the linear form, though.
  2. No implementation available
  3. "VISL CG-3’s CG-2 compatibility mode is 1 500 times faster."
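In a toy finite-language model (illustrative Python, not Peltonen's code), intersection-based parsing amounts to set intersection: the sentence automaton's language is every possible reading sequence, and each rule automaton filters out the sequences it forbids:

```python
from itertools import product

# Toy model of intersection-based parsing: the sentence automaton
# accepts every sequence of readings, and each rule is an automaton
# accepting only the sequences it permits. With finite languages,
# intersection is just set intersection. All names are illustrative.

def sentence_language(cohorts):
    """All reading sequences the (linear) sentence FSA accepts."""
    return {tuple(seq) for seq in product(*cohorts)}

def rule_no_v_after_det(seq):
    """Accept sequences with no verb reading right after a determiner."""
    return all(not (a == 'det' and b == 'v') for a, b in zip(seq, seq[1:]))

cohorts = [['det'], ['n', 'v'], ['v']]
lang = sentence_language(cohorts)               # 2 sequences
parsed = {s for s in lang if rule_no_v_after_det(s)}
print(parsed)  # {('det', 'n', 'v')}
```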

Tapanainen, Pasi. The Constraint Grammar Parser CG-2[3]

THE CG-2 reference book. Does not contain implementation details (not many, anyway -- the parser ran at about 1,500 tokens/second on a Sun SPARCstation 10, but the number of CPUs, the CPU frequency and the amount of memory are not mentioned).

Tapanainen, Pasi. Applying a Finite-State Intersection Grammar[4]

Describes how the CG-2 parser works? Presents several methods for intersecting the sentence automaton with the rule transducers, complete with performance analysis. Nothing of real practical value, though.

  1. Rule application is intersection-based.
  2. Intersection strategies:
    • Randomized intersection: apply the rules in random order. But what if one rule enables another that has already been applied? Also, the intermediate size of the sentence automaton blows up.
    • "The best next" method: at each step, intersect the sentence with ALL rules individually and keep the best. The intermediate size seems to remain relatively small, but it takes O(n²) intersections.
    • Approximated "best next": intersect the sentence with ALL rules, order them by size, and apply the rules in that order. O(2n).
    • Run-time compression: add some ambiguity -- the proposed method doesn't work, as our input is in that format anyway.
    • Depth-first search: make a tree from the sentence and do a DFS, running all rule automata in parallel at each step; cut a sub-tree as soon as any of the rule automata rejects the string.
    • Hybrid intersection-search: first intersect with the rules that reduce ambiguity the most (cropping the search tree), then do the DFS.
  3. Speed comparison: hybrid > approximation > DFS >> random.
  4. What we can take away from this:
    • Rules can reject readings? Only the rules in the DFS, or is there a way to use fail-fast transducers efficiently?
    • They probably don't use a standard FST library (xfst?), if they can efficiently move between an FSA and a search tree.
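The "best next" strategy above can be sketched on the same kind of toy finite-language model (all names hypothetical; real rule automata would be intersected as FSAs, not filtered as Python sets):

```python
from itertools import product

# Sketch of the "best next" strategy: at each step, try every remaining
# rule and commit to the one whose intersection leaves the smallest
# intermediate language. This costs O(n^2) rule applications for n rules.

def best_next(language, rules):
    remaining = list(rules)
    while remaining:
        # Intersect with ALL remaining rules individually, keep the best.
        results = [(r, {s for s in language if r(s)}) for r in remaining]
        rule, language = min(results, key=lambda rl: len(rl[1]))
        remaining.remove(rule)
    return language

no_v_after_det = lambda s: all(not (a == 'det' and b == 'v')
                               for a, b in zip(s, s[1:]))
needs_final_v = lambda s: s[-1] == 'v'

lang = {tuple(seq) for seq in product(['det'], ['n', 'v'], ['v', 'n'])}
print(best_next(lang, [no_v_after_det, needs_final_v]))  # {('det', 'n', 'v')}
```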

Tapanainen, Pasi. Parsing in two frameworks: finite-state and functional dependency grammar[5]

Haven't read yet.

References

  1. Hulden, Mans. 2011. Constraint Grammar parsing with left and right sequential finite transducers. In: Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pages 39--47.
  2. Peltonen, Janne. 2011. A Finite State Constraint Grammar Parser
  3. Tapanainen, Pasi. 1996. The Constraint Grammar Parser CG-2. University of Helsinki Publications No. 27.
  4. Tapanainen, Pasi. 1997. "Applying a Finite-State Intersection Grammar." In: Finite-State Language Processing, pages 311--327.
  5. Tapanainen, Pasi. 1999. Parsing in two frameworks: finite-state and functional dependency grammar. University of Helsinki, Department of General Linguistics.