Difference between revisions of "User:David Nemeskey/GSOC 2013 literature review"
Jump to navigation
Jump to search
(Created page with 'Papers I read about CGs and FSTs. ==== Hulden, Mans. Constraint Grammar parsing with left and right sequential finite transducers<ref>Hulden, Mans. 2011. Constraint Grammar pars…') |
|||
Line 18: | Line 18: | ||
# Rule application is intersection-based. |
# Rule application is intersection-based. |
||
#* The sentence FSAs need no be linear -- the readings in a cohort can be on separate paths. He uses the linear form, though. |
|||
# No implementation available |
# No implementation available |
||
# "VISL CG-3’s CG-2 compatibility mode is 1 500 times faster." |
# "VISL CG-3’s CG-2 compatibility mode is 1 500 times faster." |
||
==== Tapanainen, Pasi. The Constraint Grammar Parser CG-2<ref>Tapanainen, Pasi. 1996. The Constraint Grammar Parser CG-2. University of Helsinki Publications No. 27.</ref> ==== |
|||
THE CG-2 reference book. Does not contain implementation details (not much, anyway -- ran at about 1500 tokens/second on Sun Sparcstation 10, but the number of CPUs, CPU frequency or the amount of memory is not mentioned). |
|||
==== Tapanainen, Pasi. 10 Applying a Finite-State Intersection Grammar<ref>Tapanainen, Pasi. "10 Applying a Finite-State Intersection Grammar." In: Finite-state language processing (1997), pages 311--237.</ref> ==== |
|||
Describes how the CG-2 parser works? Contains several methods of how to intersect the sentence automaton with rule transducers, complete with performance analysis. Nothing of real practical value, though. |
|||
# Rule application is intersection-based. |
|||
# Intersection strategies: |
|||
#* ''Randomized intersection'': apply the rules in random order. But what if one rule enables another, and the "another" has already been applied? Also, the intermediate size of the sentence blows up. |
|||
#* ''"The best next" method'': at each step, intersect the sentence with ALL rules individually. Keep the best. The intermediate size seems to remain relatively small, but ''O(n<sup>2</sup>)'' intersections. |
|||
#* ''Approximated "best next"'': intersect the sentence with ALL rules, order them by size, and apply the rules in that order. ''O(2n)''. |
|||
#* ''Run-time compression'': add some ambiguity -- the proposed method doesn't work, as our input is in that format anyway. |
|||
#* ''Depth-first search'': parallel. Make a tree from the sentence and DPS, apply all rules in parallel in each step, cut the sub-tree if any of the rule automatons reject the string. |
|||
#* ''Hybrid intersection-search'': intersect the rules the reduce ambiguity the most (crop the search tree), and then do the DPS. |
|||
# Speed comparison: hybrix > approximation > DPS >> random. |
|||
# What we can take away from this: |
|||
#* Rules can reject readings? Only the rules in DPS, or is there a way to use fail-fast transducers efficiently? |
|||
#* They probably don't use a standard FST library (xfst?), if they can efficiently move between an FSA and a search tree. |
|||
==== Tapanainen, Pasi. Parsing in two frameworks: finite-state and functional dependency grammar<ref>Tapanainen, Pasi. 1999. Parsing in two frameworks: finite-state and functional dependency grammar. University of Helsinki, Department of General Linguistics.</ref> ==== |
|||
Haven't read yet. |
|||
== References == |
== References == |
Revision as of 07:45, 26 July 2013
Papers I read about CGs and FSTs.
Contents
- 1 Hulden, Mans. Constraint Grammar parsing with left and right sequential finite transducers[1]
- 2 Peltonen, Janne. A Finite State Constraint Grammar Parser[2]
- 3 Tapanainen, Pasi. The Constraint Grammar Parser CG-2[3]
- 4 Tapanainen, Pasi. 10 Applying a Finite-State Intersection Grammar[4]
- 5 Tapanainen, Pasi. Parsing in two frameworks: finite-state and functional dependency grammar[5]
- 6 References
Hulden, Mans. Constraint Grammar parsing with left and right sequential finite transducers[1]
The paper behind fomacg. Describes a way of translating most of CG-2's functionality to foma transducers.
- Rule application seems to be composition-based, the FSTs are run on the sentence string.
- Missing functions:
IFF
MAPPINGS
DELIMITERS
(I have implemented it since)
- Differences between the paper and the implementation:
- Conversion to left-and-right transducers is missing from fomacg
- No performance tests.
Peltonen, Janne. A Finite State Constraint Grammar Parser[2]
Very similar in vein to Mans's work, but the compiler is implemented in Python, and it doesn't handle LINKed contexts.
- Rule application is intersection-based.
- The sentence FSAs need no be linear -- the readings in a cohort can be on separate paths. He uses the linear form, though.
- No implementation available
- "VISL CG-3’s CG-2 compatibility mode is 1 500 times faster."
Tapanainen, Pasi. The Constraint Grammar Parser CG-2[3]
THE CG-2 reference book. Does not contain implementation details (not much, anyway -- ran at about 1500 tokens/second on Sun Sparcstation 10, but the number of CPUs, CPU frequency or the amount of memory is not mentioned).
Tapanainen, Pasi. 10 Applying a Finite-State Intersection Grammar[4]
Describes how the CG-2 parser works? Contains several methods of how to intersect the sentence automaton with rule transducers, complete with performance analysis. Nothing of real practical value, though.
- Rule application is intersection-based.
- Intersection strategies:
- Randomized intersection: apply the rules in random order. But what if one rule enables another, and the "another" has already been applied? Also, the intermediate size of the sentence blows up.
- "The best next" method: at each step, intersect the sentence with ALL rules individually. Keep the best. The intermediate size seems to remain relatively small, but O(n2) intersections.
- Approximated "best next": intersect the sentence with ALL rules, order them by size, and apply the rules in that order. O(2n).
- Run-time compression: add some ambiguity -- the proposed method doesn't work, as our input is in that format anyway.
- Depth-first search: parallel. Make a tree from the sentence and DPS, apply all rules in parallel in each step, cut the sub-tree if any of the rule automatons reject the string.
- Hybrid intersection-search: intersect the rules the reduce ambiguity the most (crop the search tree), and then do the DPS.
- Speed comparison: hybrix > approximation > DPS >> random.
- What we can take away from this:
- Rules can reject readings? Only the rules in DPS, or is there a way to use fail-fast transducers efficiently?
- They probably don't use a standard FST library (xfst?), if they can efficiently move between an FSA and a search tree.
Tapanainen, Pasi. Parsing in two frameworks: finite-state and functional dependency grammar[5]
Haven't read yet.
References
- ↑ Hulden, Mans. 2011. Constraint Grammar parsing with left and right sequential finite transducers. In: Proceedings of the 9th International Workshop on Finite State Methods and Natural Language Processing, pages 39--47.
- ↑ Peltonen, Janne. 2011. A Finite State Constraint Grammar Parser
- ↑ Tapanainen, Pasi. 1996. The Constraint Grammar Parser CG-2. University of Helsinki Publications No. 27.
- ↑ Tapanainen, Pasi. "10 Applying a Finite-State Intersection Grammar." In: Finite-state language processing (1997), pages 311--237.
- ↑ Tapanainen, Pasi. 1999. Parsing in two frameworks: finite-state and functional dependency grammar. University of Helsinki, Department of General Linguistics.