User:David Nemeskey/GSOC 2013 literature review
Papers I read about CGs and FSTs.
Hulden, Mans. Constraint Grammar parsing with left and right sequential finite transducers[1]
The paper behind fomacg. Describes a way of translating most of CG-2's functionality to foma transducers.
- Rule application seems to be composition-based: the FSTs are run on the sentence string.
- Missing functions: IFF, MAPPINGS, DELIMITERS (I have since implemented it).
- Differences between the paper and the implementation:
- Conversion to left-and-right transducers is missing from fomacg
- No performance tests.
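To make "composition-based rule application" concrete: the sentence is kept as one flat string of cohorts and each compiled rule rewrites that string. Below is a minimal sketch of the idea only; the cohort format, the function name and the rule are all invented for illustration and have nothing to do with fomacg's actual encoding, which uses real transducers instead of a regex.

```python
import re

# Toy stand-in for composition-based rule application: the sentence is a
# flat string of cohorts ("<wordform>" followed by its readings), and a
# rule "transducer" is simulated by a string rewrite. Invented format.
sentence = '"<the>" [det] | "<run>" [n] [v] | "<ends>" [v]'

def remove_v_after_det(s):
    # REMOVE (v) IF (-1 (det)): drop a trailing [v] reading from a cohort
    # that directly follows a det cohort and still has another reading.
    return re.sub(r'(\[det\] \| "<[^>]+>" \[\w+\]) \[v\]', r'\1', s)

result = remove_v_after_det(sentence)
# The ambiguous "run" cohort loses its verb reading.
```

A real composition-based implementation would compose (or repeatedly apply) the rule FST on the sentence string until a fixpoint is reached.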
Peltonen, Janne. A Finite State Constraint Grammar Parser[2]
Very similar in spirit to Mans's work, but the compiler is implemented in Python, and it doesn't handle LINKed contexts.
- Rule application is intersection-based.
- The sentence FSAs need not be linear -- the readings in a cohort can be on separate paths. He uses the linear form, though.
- No implementation available
- "VISL CG-3’s CG-2 compatibility mode is 1 500 times faster."
Tapanainen, Pasi. The Constraint Grammar Parser CG-2[3]
THE CG-2 reference book. Does not contain implementation details (not many, anyway -- the parser ran at about 1500 tokens/second on a Sun SPARCstation 10, but the number of CPUs, the CPU frequency and the amount of memory are not mentioned).
Tapanainen, Pasi. Applying a Finite-State Intersection Grammar[4]
Possibly describes how the CG-2 parser works internally. Contains several methods for intersecting the sentence automaton with the rule transducers, complete with performance analysis. Nothing of much practical value, though.
- Rule application is intersection-based.
- Intersection strategies:
- Randomized intersection: apply the rules in random order. But what if one rule enables another, and that other rule has already been applied? Also, the intermediate size of the sentence automaton blows up.
- "The best next" method: at each step, intersect the sentence with ALL remaining rules individually, and keep the best result. The intermediate size seems to remain relatively small, but this requires O(n²) intersections.
- Approximated "best next": intersect the sentence with ALL rules once, order the rules by the size of the result, and apply them in that order -- only 2n intersections.
- Run-time compression: add some ambiguity -- the proposed method doesn't work, as our input is in that format anyway.
- Depth-first search: parallel. Make a tree from the sentence and do a DFS, applying all rules in parallel at each step; cut the sub-tree if any of the rule automata rejects the string.
- Hybrid intersection-search: first intersect with the rules that reduce ambiguity the most (to crop the search tree), and then do the DFS.
- Speed comparison: hybrid > approximation > DFS >> random.
- What we can take away from this:
- Rules can reject readings? Only the rules in the DFS, or is there a way to use fail-fast transducers efficiently?
- They probably don't use a standard FST library (xfst?), if they can efficiently move between an FSA and a search tree.
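The greedy "best next" strategy above can be sketched with a finite set of reading sequences standing in for the sentence automaton (set size playing the role of automaton size) and predicates standing in for the rule automata. Everything here is an invented illustration of the strategy, not Tapanainen's implementation:

```python
# "Best next" intersection, sketched on finite sets: at each step, try
# every remaining rule, keep the result with the fewest surviving paths,
# and skip rules that would reject the whole sentence.
def best_next(sentence, rules):
    rules = list(rules)
    while rules:
        candidates = [
            (frozenset(p for p in sentence if rule(p)), i)
            for i, rule in enumerate(rules)
        ]
        # Drop rules whose intersection would be empty (reject everything).
        candidates = [(s, i) for s, i in candidates if s]
        if not candidates:
            break
        best, i = min(candidates, key=lambda c: len(c[0]))
        sentence = best
        del rules[i]
    return sentence

# Three cohorts; each path picks one reading per cohort.
paths = {('det', 'n', 'v'), ('det', 'v', 'v'), ('pron', 'n', 'v')}
rules = [
    lambda p: p[0] == 'det',   # first cohort must be a determiner
    lambda p: p[1] != 'v',     # no verb reading in the second cohort
]
result = best_next(paths, rules)   # only ('det', 'n', 'v') survives
```

The O(n²) cost shows up as the inner loop over all remaining rules at every step; the approximated variant would rank the rules once and then apply them in that fixed order.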
Karttunen, Lauri. The Proper Treatment of Optimality in Computational Phonology[5]
Describes a way to handle Optimality Theory with the FST calculus. Basically, this is the paper that introduces lenient composition (.O.). Like anything else with "composition" in its name, it is not really relevant for us: while it might work well for the toy problems of phonology, where the rule FSTs are in the 1--4 state range, it would choke on our CG rules.
Actually, his handling of X[]'s in section 5 might not be entirely correct. If I define the same rules in foma, the result of applying down bebop and abracadabra is different: no unparsed elements are accepted. I first thought this was because Parse0 .O. Parse1 .O. ... .O. ParseN (which is again erroneous: it should be Parsed) is a stronger constraint than FillNuc and FillOns, so it deletes all X[]'s before the latter two have a chance to do their work; but apparently the order doesn't matter: regardless of the order of the constraints, Parsed0 discards all readings with X['s in them.
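The semantics of lenient composition is easy to state on a finite candidate set: apply the constraint if at least one candidate survives it, otherwise pass everything through unchanged. This is a sketch of that semantics only (the example words are made up), not Karttunen's actual FST construction, which uses priority union: R .O. C = (R .o. C) .P. R.

```python
# Lenient composition over a finite candidate set: enforce the
# constraint only if it leaves at least one survivor; otherwise the
# constraint is ignored rather than rejecting every candidate.
def leniently_compose(candidates, constraint):
    survivors = {c for c in candidates if constraint(c)}
    return survivors if survivors else candidates

# Invented example: syllabified candidates, one with an unparsed X[] slot.
words = {'ab.ra.ca.dab.ra', 'ab.ra.ca.dabX[].ra'}
no_unparsed = lambda w: 'X[' not in w

filtered = leniently_compose(words, no_unparsed)        # constraint holds for one
unchanged = leniently_compose(words, lambda w: False)   # unsatisfiable: pass-through
```

This pass-through behaviour is exactly why constraint ordering can matter in general: an earlier, stronger constraint may leave no candidates for a later one to distinguish -- which is the effect described above.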
Tapanainen, Pasi. Parsing in two frameworks: finite-state and functional dependency grammar[6]
Haven't read it yet.
References
- ↑ Hulden, Mans. 2011. Constraint Grammar parsing with left and right sequential finite transducers. In: Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pages 39--47.
- ↑ Peltonen, Janne. 2011. A Finite State Constraint Grammar Parser
- ↑ Tapanainen, Pasi. 1996. The Constraint Grammar Parser CG-2. University of Helsinki Publications No. 27.
- ↑ Tapanainen, Pasi. 1997. Applying a Finite-State Intersection Grammar. In: Finite-State Language Processing, pages 311--327.
- ↑ Karttunen, Lauri. 1998. The Proper Treatment of Optimality in Computational Phonology. In: Proceedings of the International Workshop on Finite State Methods in Natural Language Processing. Association for Computational Linguistics.
- ↑ Tapanainen, Pasi. 1999. Parsing in two frameworks: finite-state and functional dependency grammar. University of Helsinki, Department of General Linguistics.