Difference between revisions of "Apertium-recursive/Bytecode"
Popcorndude (talk | contribs) (add distag) |
Popcorndude (talk | contribs) (→How it works: no longer true - removing) |
== How it works ==

There is an object called <code>parseTower</code>, which is an array of arrays (which I call "layers"). When tokens are read from the input stream, they are added to layer 0. <code>longestPattern</code> is the length of the longest pattern of any rule, and <code>MAXLAYERS</code> is an optional user-defined limit on the recursion (currently 1).

 def do_pass():
   if any layer contains more tokens than longestPattern, use the highest one
   else if there is more input return and wait for it to be read in
   else use the lowest layer that contains tokens
   for the layer chosen, attempt to match as in apertium-interchunk
   if any rules match, apply the longest one
   else move the first token in this layer to the next layer

 def interchunk():
   while parseTower and the input stream are not both empty:
     if there is input, read 1 token
     do_pass()
     if the number of layers has reached MAXLAYERS: output everything in the top layer
     if longestPattern tokens have been shifted to the top layer without matching, output the first one
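
The layer-selection step at the start of <code>do_pass</code> can be sketched in Python as follows. This is only an illustration of the three cases described above, not the actual implementation. The function name <code>choose_layer</code>, the <code>more_input</code> flag, and the use of <code>None</code> to mean "wait for more input" are assumptions made for the example.

```python
def choose_layer(parse_tower, longest_pattern, more_input):
    """Return the index of the layer to work on, or None to wait for input.

    parse_tower is a list of lists of tokens (layer 0 first).
    choose_layer, more_input and the None return value are hypothetical
    names and conventions, chosen only for this sketch.
    """
    # Case 1: some layer holds more tokens than the longest pattern;
    # use the highest such layer.
    overfull = [i for i, layer in enumerate(parse_tower)
                if len(layer) > longest_pattern]
    if overfull:
        return overfull[-1]
    # Case 2: more input is available, so wait for it to be read in.
    if more_input:
        return None
    # Case 3: use the lowest layer that contains tokens.
    for i, layer in enumerate(parse_tower):
        if layer:
            return i
    return None  # nothing left to process
```

For example, with <code>longest_pattern = 2</code>, the tower <code>[["a", "b", "c"], ["x"]]</code> selects layer 0 because it is the only overfull layer, while <code>[[], ["x"]]</code> with no remaining input selects layer 1.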
Revision as of 18:43, 26 June 2019
The first 2 characters of the file are the length of the longest pattern and the number of rules. Each rule begins with a byte specifying the length of the rule.
[int] after the name indicates that this instruction is two characters long and the second is to be interpreted as an integer.
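
As a rough illustration of this layout, the following Python sketch splits a bytecode string into its header fields and rule bodies. It assumes, without confirmation from the text above, that each integer is stored as the code point of a single character and that a rule's length counts the characters following its length character. The function name <code>split_rules</code> is hypothetical.

```python
def split_rules(bytecode):
    """Split a bytecode string into (longest_pattern, rule_count, rules).

    Assumptions made for this sketch only: integers are stored as the
    code point of one character, and a rule's length counts the
    characters that follow its length character.
    """
    longest_pattern = ord(bytecode[0])  # first header character
    rule_count = ord(bytecode[1])       # second header character
    rules = []
    pos = 2
    for _ in range(rule_count):
        length = ord(bytecode[pos])     # length prefix of this rule
        pos += 1
        rules.append(bytecode[pos:pos + length])
        pos += length
    return longest_pattern, rule_count, rules
```

Under those assumptions, <code>split_rules(chr(3) + chr(2) + chr(2) + "ab" + chr(1) + "c")</code> returns <code>(3, 2, ["ab", "c"])</code>.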
Name | Action |
---|---|
drop | pop the top of the stack |
dup | push a copy of the top element |
over | push a copy of the second element |
swap | exchange the first and second elements |
string [int] | pushes the next [int] characters onto the stack as a literal string |
int [int] | pushes [int] onto the stack |
pushfalse | pushes false onto the stack |
pushtrue | pushes true onto the stack |
jump [int] | increments the instruction pointer by [int] |
jumpontrue [int] | pops a bool off the stack and increments the instruction pointer by [int] if it is true |
jumponfalse [int] | pops a bool off the stack and increments the instruction pointer by [int] if it is false |
and | pops 2 bools off the stack and pushes whether both of them are true |
or | pops 2 bools off the stack and pushes whether either of them is true |
not | logically negates top of stack |
equal | push whether the first two strings popped are the same |
isprefix | push whether the first string popped occurs at the beginning of the second |
issuffix | push whether the first string popped occurs at the end of the second |
issubstring | pushes whether the first string popped appears anywhere in the second |
equalcl | equal, but ignores case |
isprefixcl | isprefix, but ignores case |
issuffixcl | issuffix, but ignores case |
issubstringcl | issubstring, but ignores case |
hasprefix | push whether the second string popped begins with any member of the list named by the first string popped |
hassuffix | push whether the second string popped ends with any member of the list named by the first string popped |
in | push whether the second string popped is a member of the list named by the first |
hasprefixcl | hasprefix, but ignores case |
hassuffixcl | hassuffix, but ignores case |
incl | in, but ignores case |
getcase | pushes "aa", "Aa", or "AA", depending on the case of the first string popped |
setcase | pops two strings, copies the case of the first to the second and pushes the result |
fetchvar | pops a string and pushes the value of the variable with that name |
setvar | pops two strings and sets the second as the value of the variable named by the first |
sourceclip | pops an int and a string, pushes the value of the source-side clip identified by them |
targetclip | pops an int and a string, pushes the value of the target-side clip identified by them |
referenceclip | pops an int and a string, pushes the value of the reference-side clip identified by them |
setclip | pops an int and two strings, sets the second string as the value of the target-side clip identified by the int and the first string |
chunk | creates an empty chunk and pushes it |
appendchild | pops a chunk and appends it as a child to the chunk underneath it (which remains on the stack) |
appendsurface | pops a string and appends it to the target-side surface chunk underneath it (which remains on the stack) |
appendallchildren | pops a chunk and appends all of its children as children to the chunk underneath it (which remains on the stack) |
output | pops a chunk and appends it to the output queue |
blank | pops an int and pushes the corresponding blank (or a single space if the int is 0) |
concat | pops two strings, concatenates them, and pushes the result |
rejectrule | abort evaluation of current rule and attempt to match a different one |
distag | removes initial < and final > from the string on top of the stack (this makes compiling comparisons easier) |
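
To make the stack behaviour concrete, here is a small Python sketch of an evaluator for a subset of the instructions above. It is not the real interpreter: instructions are written as Python tuples such as <code>("string", "n")</code> instead of the actual character encoding, so <code>string</code> carries its literal directly rather than reading the next [int] characters. It also assumes that jump offsets are counted from the instruction after the jump, and the function name <code>run</code> is hypothetical.

```python
def run(program):
    """Evaluate a list of (name, *args) tuples and return the final stack.

    Sketch only: the tuple representation and the jump convention
    (offset relative to the following instruction) are assumptions.
    """
    stack = []
    ip = 0
    while ip < len(program):
        name, *args = program[ip]
        ip += 1  # ip now points at the following instruction
        if name == "drop":
            stack.pop()
        elif name == "dup":
            stack.append(stack[-1])
        elif name == "over":
            stack.append(stack[-2])
        elif name == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif name in ("string", "int"):
            stack.append(args[0])
        elif name == "pushtrue":
            stack.append(True)
        elif name == "pushfalse":
            stack.append(False)
        elif name == "jump":
            ip += args[0]
        elif name == "jumpontrue":
            if stack.pop():
                ip += args[0]
        elif name == "jumponfalse":
            if not stack.pop():
                ip += args[0]
        elif name == "and":
            b, a = stack.pop(), stack.pop()
            stack.append(a and b)
        elif name == "or":
            b, a = stack.pop(), stack.pop()
            stack.append(a or b)
        elif name == "not":
            stack.append(not stack.pop())
        elif name == "equal":
            first, second = stack.pop(), stack.pop()
            stack.append(first == second)
        elif name == "isprefix":
            # first string popped must occur at the start of the second
            first, second = stack.pop(), stack.pop()
            stack.append(second.startswith(first))
        elif name == "distag":
            s = stack.pop()
            if s.startswith("<") and s.endswith(">"):
                s = s[1:-1]  # strip the surrounding angle brackets
            stack.append(s)
        else:
            raise ValueError("instruction not covered by this sketch: " + name)
    return stack


# Compare a tag against "n" after stripping its angle brackets.
program = [
    ("string", "<n>"), ("distag",), ("string", "n"), ("equal",),
    ("jumponfalse", 1),      # skip the next instruction on a mismatch
    ("string", "noun"),
    ("string", "done"),
]
```

With this program, <code>run(program)</code> returns <code>["noun", "done"]</code>. Replacing <code>"<n>"</code> with <code>"<v>"</code> makes the comparison false, so the jump skips one instruction and the result is <code>["done"]</code>.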