Apertium-recursive/Parser
Pseudocode
def checkForReduce(branch): result = [] if it would be possible to shift another token to this branch: result.append(copy(branch)) rule = get best rule from current transducer state (longest pattern, highest weight) while rule != -1: take the appropriate number of chunks off branch and pass them to the virtual machine if the rule is rejected: get the next best rule else: add the output to the output to the branch update the transducer state call checkForReduce() append updated branch(es) to result break if result is empty: result.append(branch) return result parseGraph = the parse stack (actually a directed graph in this case) next = readToken() while true: add next to each branch of parseGraph, calling checkForReduce() for each branch in parseGraph: if the number of transducer states is 0: discard branch if parseGraph is empty: output everything in the branch that had the fewest nodes if there is a tie, choose the branch with the highest weight if there is still a tie, choose the one that happens to be listed first next = readToken() if the input stream is now empty: output next (it's a blank) end
Notes
The premise of this algorithm is that the information contained in an LR parse table is also contained in the transducers generated by apertium-preprocess-transfer
.
Since we allow incomplete parses, ACCEPT
and ERROR
are equivalent and both are represented by the transducer having no live states.
SHIFT
is represented by an outgoing connection from the current state, and REDUCE
is represented by the current node being final.
Since there are rules which generate different output in different situations as well as rules which generate multiple outputs, the simplest way do deal with them is to take the output of a reduction and act as if it were normal input. The result is that pattern matching is linear in the number of nodes in the tree, rather than being linear in just the leaves, but this increase appears to be cancelled out by the decrease in the amount of checking and rejecting that would have to be done in the virtual machine otherwise.
This algorithm currently branches on shift-reduce conflicts but not reduce-reduce conflicts. The reason for this is that it seemed to me like we had enough information to decide reduce-reduce conflicts by applying LRLM and it also limits the number of times we need to fork the stack, which helps with speed. If it is desired, I can change this, however.
The nice thing about checking for shift-reduce conflicts, is that if there is a possible reduction, the token on the top of the stack is definitely an LU or a chunk and if there is a conflicting shift, then the token to be shifted is definitely a blank, so we can check whether any shifts are possible by stepping the transducer forward by a space and checking whether there are any live states.