Apertium-recursive/Bytecode
Revision as of 13:59, 5 June 2019 by Popcorndude (talk | contribs)
The first 2 characters of the file are the length of the longest pattern and the number of rules.
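As a rough illustration, the header could be decoded along the following lines. This sketch assumes that each of the two values is stored as a single character whose code point is the number; the page does not spell out the encoding, and `read_header` is a hypothetical helper name, not part of apertium-recursive.

```python
def read_header(bytecode: str) -> tuple[int, int, str]:
    """Split a bytecode string into (longest_pattern, rule_count, body)."""
    # Assumption: each header value is one character whose code point is the number.
    if len(bytecode) < 2:
        raise ValueError("bytecode is too short to contain a header")
    longest_pattern = ord(bytecode[0])
    rule_count = ord(bytecode[1])
    return longest_pattern, rule_count, bytecode[2:]


# A made-up file: longest pattern 3, 2 rules, then an arbitrary body.
example = chr(3) + chr(2) + "body"
print(read_header(example))  # (3, 2, 'body')
```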
Code | Name | Action |
---|---|---|
R [int] | rule | marks the start of a new rule composed of the next [int] characters |
s [int] | string | pushes the next [int] characters onto the stack as a literal string |
j [int] | jump | increments the instruction pointer by [int] |
? [int] | jump if not | pops a bool off the stack and increments the instruction pointer by [int] if it is false |
& [int] | and | pops [int] bools off the stack and pushes whether all of them are true |
\| [int] | or | pops [int] bools off the stack and pushes whether any of them are true |
! | not | logically negates top of stack |
= / =# | equal | push whether the first two strings popped are the same (=# ignores case) |
( / (# | begins with | push whether the first string popped occurs at the beginning of the second ((# ignores case) |
) / )# | ends with | push whether the first string popped occurs at the end of the second ()# ignores case) |
[ / [# | begins with list | push whether the second string popped begins with any member of the list named by the first string popped ([# ignores case) |
] / ]# | ends with list | push whether the second string popped ends with any member of the list named by the first string popped (]# ignores case) |
c / c# | contains | push whether the first string popped appears anywhere in the second (c# ignores case) |
n / n# | in | push whether the second string popped is a member of the list named by the first (n# ignores case) |
> | begin let | indicates that the next clip or var statement should not be evaluated |
* / *# | end let clip | pops a value and an unevaluated clip and sets the clip to the value (*# copies the case of the value to the clip) |
4 / 4# | end let var | pops a value and a variable name and sets the variable to the value (4# copies the case of the value to the variable) |
< [int] | out | pops [int] chunks off the stack and appends them to the output queue in the order that they were pushed (in recursive mode, the output queue is later passed back to the rule applier) |
. [int] | clip | if preceded by >, pushes [int] onto the stack, otherwise pops a string off the stack and retrieves that property of the position indicated by [int] |
$ | var | if preceded by >, do nothing, otherwise pops a string off the stack and pushes the value of the variable with that name |
G | get case | pops a string off the stack, pushes "AA", "Aa", or "aa" depending on its case |
A | copy case | pops a string off the stack and copies its case onto the next string on the stack |
+ [int] | concat | pops [int] strings off the stack, concatenates them, and pushes the result |
{ [int] | chunk | pops [int] items off the stack and puts them into a chunk (there are currently problems with this command) |
p | pseudolemma | pop a chunk off the stack and push its pseudolemma |
(space) | space | push a blank containing a single space onto the stack |
_ [int] | blank | push the superblank after position [int] onto the stack |
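To make the stack discipline concrete, here is a minimal sketch of an interpreter for a handful of the opcodes in the table (string, jump, jump if not, and, or, not, equal and concat). It is not the apertium-recursive implementation. It assumes that every [int] operand is a single character whose code point is the value, that jumps are relative to the position just after the operand, that the or opcode is the character `|` (mirroring `&`), and that concat joins strings in the order they were pushed. The names `run`, `push` and `choose` are hypothetical.

```python
def run(code: str) -> list:
    """Evaluate a small subset of the opcodes and return the final stack."""
    stack = []
    ip = 0

    def operand() -> int:
        # Assumption: an [int] operand is one character; its code point is the value.
        nonlocal ip
        value = ord(code[ip])
        ip += 1
        return value

    while ip < len(code):
        op = code[ip]
        ip += 1
        if op == "s":  # string: push the next n characters as a literal
            n = operand()
            stack.append(code[ip:ip + n])
            ip += n
        elif op == "j":  # jump: skip the next n characters
            n = operand()
            ip += n
        elif op == "?":  # jump if not: skip n characters when the popped bool is false
            n = operand()
            if not stack.pop():
                ip += n
        elif op in ("&", "|"):  # and / or over the top n bools
            n = operand()
            bools = [stack.pop() for _ in range(n)]
            stack.append(all(bools) if op == "&" else any(bools))
        elif op == "!":  # not
            stack.append(not stack.pop())
        elif op == "=":  # equal (case-sensitive variant only)
            first, second = stack.pop(), stack.pop()
            stack.append(first == second)
        elif op == "+":  # concat, assumed to preserve push order
            n = operand()
            parts = [stack.pop() for _ in range(n)]
            stack.append("".join(reversed(parts)))
        else:
            raise ValueError(f"opcode {op!r} is not handled in this sketch")
    return stack


def push(text: str) -> str:
    """Encode a string opcode for the given literal."""
    return "s" + chr(len(text)) + text


def choose(a: str, b: str) -> str:
    """Build a program that leaves "yes" on the stack if a == b, else "no"."""
    else_branch = push("no")
    then_branch = push("yes") + "j" + chr(len(else_branch))
    return push(a) + push(b) + "=" + "?" + chr(len(then_branch)) + then_branch + else_branch


print(run(choose("det", "det")))  # ['yes']
print(run(choose("det", "adj")))  # ['no']
```

The `choose` helper shows how jump if not and jump combine into an if/else: the conditional jump skips the "then" branch, and the unconditional jump at the end of the "then" branch skips the "else" branch.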
Features of .t?x that aren't covered yet:
- reject-current-rule (add skip_rules list as input to interchunk_do_pass)
- mlu
- lu-count
- clip side (also add anaphora as an option)
How it works
There is an object called parseTower, which is an array of arrays (which I call "layers"). When tokens are read from the input stream they are added to layer 0. longestPattern is the length of the longest pattern of any rule, and MAXLAYERS is an optional user-defined limit on the recursion (currently 1).
def do_pass():
    if any layer contains more tokens than longestPattern, use the highest one
    else if there is more input, return and wait for it to be read in
    else use the lowest layer that contains tokens
    for the layer chosen, attempt to match as in apertium-interchunk
    if any rules match, apply the longest one
    else move the first token in this layer to the next layer

def interchunk():
    while parseTower and the input stream are not both empty:
        if there is input, read 1 token
        do_pass()
        if the number of layers has reached MAXLAYERS:
            output everything in the top layer
        if longestPattern tokens have been shifted to the top layer without matching, output the first one
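The loop described above can be approximated in runnable form. The following is only a sketch under several assumptions that the page does not confirm: tokens are bare tag strings, rules are a hypothetical `RULES` table mapping a tag pattern to the tag of the resulting chunk, the result of a rule is placed on the next layer up, and the top layer is treated purely as an output buffer (so anything that reaches it is emitted immediately, which simplifies the last two conditions of the pseudocode). `max_layers` plays the role of MAXLAYERS.

```python
from collections import deque

# Hypothetical toy rules: a pattern of tags and the tag of the chunk it produces.
RULES = {
    ("det", "adj", "n"): "NP",
    ("det", "n"): "NP",
    ("NP", "v"): "S",
}
LONGEST_PATTERN = max(len(pattern) for pattern in RULES)


def do_pass(tower: list[list[str]], more_input: bool) -> None:
    """Run one matching step on the layer chosen as in the pseudocode."""
    working = range(len(tower) - 1)  # the top layer only collects output
    crowded = [i for i in working if len(tower[i]) > LONGEST_PATTERN]
    if crowded:
        index = crowded[-1]  # highest layer holding more than LONGEST_PATTERN tokens
    elif more_input:
        return  # wait until more input has been read in
    else:
        nonempty = [i for i in working if tower[i]]
        if not nonempty:
            return
        index = nonempty[0]  # lowest layer that contains tokens
    layer = tower[index]
    # Try the longest pattern first, matching at the start of the layer.
    for length in range(min(LONGEST_PATTERN, len(layer)), 0, -1):
        pattern = tuple(layer[:length])
        if pattern in RULES:
            del layer[:length]
            tower[index + 1].append(RULES[pattern])  # assumption: result moves up
            return
    # No rule matched: shift the first token to the next layer unchanged.
    tower[index + 1].append(layer.pop(0))


def interchunk(tokens: list[str], max_layers: int = 1) -> list[str]:
    """Feed tokens through the tower and return everything that was output."""
    stream = deque(tokens)
    tower = [[] for _ in range(max_layers + 1)]
    output = []
    while stream or any(tower):
        if stream:
            tower[0].append(stream.popleft())
        do_pass(tower, bool(stream))
        # Simplification: whatever reaches the top layer is output right away.
        output.extend(tower[-1])
        tower[-1].clear()
    return output


print(interchunk("det n v".split(), max_layers=1))  # ['NP', 'v']
print(interchunk("det n v".split(), max_layers=2))  # ['S']
```

With one level of recursion the rules are applied once, so the noun phrase is built but the verb passes through untouched. With two levels the output of the first pass becomes input to a second pass, which lets the `("NP", "v")` rule fire.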