
Apertium-recursive/Bytecode


The first 2 characters of the file are the length of the longest pattern and the number of rules. Each rule begins with a byte specifying the length of the rule.
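The layout described above can be illustrated with a small sketch. It is not the actual loader: split_rules is a hypothetical name, and it assumes that each header or length character is read as its code point and that a rule's length counts the characters following the length marker.

```python
def split_rules(bytecode: str):
    """Split a bytecode string into (longest_pattern, [rule bodies]).

    Assumptions (not confirmed by this page): integers are stored as
    character code points, and a rule's length excludes its own
    length marker.
    """
    longest_pattern = ord(bytecode[0])  # first character of the file
    rule_count = ord(bytecode[1])       # second character of the file
    rules = []
    pos = 2
    for _ in range(rule_count):
        length = ord(bytecode[pos])     # marker at the start of each rule
        pos += 1
        rules.append(bytecode[pos:pos + length])
        pos += length
    return longest_pattern, rules


# Two rules, "ab" and "c", with a longest pattern of 3 tokens.
example = chr(3) + chr(2) + chr(2) + "ab" + chr(1) + "c"
print(split_rules(example))  # (3, ['ab', 'c'])
```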

An [int] after an instruction's name indicates that the instruction is two characters long and that the second character is to be interpreted as an integer.

Name Action
drop pop the top of the stack
dup push a copy of the top element
over push a copy of the second element
swap exchange the first and second elements
string [int] pushes the next [int] characters onto the stack as a literal string
int [int] pushes [int] onto the stack
pushfalse pushes false onto the stack
pushtrue pushes true onto the stack
jump [int] increments the instruction pointer by [int]
jumpontrue [int] pops a bool off the stack and increments the instruction pointer by [int] if it is true
jumponfalse [int] pops a bool off the stack and increments the instruction pointer by [int] if it is false
and pops 2 bools off the stack and pushes whether both of them are true
or pops 2 bools off the stack and pushes whether either of them is true
not logically negates top of stack
equal push whether the first two strings popped are the same
isprefix push whether the first string popped occurs at the beginning of the second
issuffix push whether the first string popped occurs at the end of the second
issubstring pushes whether the first string popped appears anywhere in the second
equalcl equal, but ignores case
isprefixcl isprefix, but ignores case
issuffixcl issuffix, but ignores case
issubstringcl issubstring, but ignores case
hasprefix push whether the second string popped begins with any member of the list named by the first string popped
hassuffix push whether the second string popped ends with any member of the list named by the first string popped
in push whether the second string popped is a member of the list named by the first
hasprefixcl hasprefix, but ignores case
hassuffixcl hassuffix, but ignores case
incl in, but ignores case
getcase pushes "aa", "Aa", or "AA", depending on the case of the first string popped
setcase pops two strings, copies the case of the first to the second and pushes the result
fetchvar pops a string and pushes the value of the variable with that name
setvar pops two strings and sets the second as the value of the variable named by the first
sourceclip pops an int and a string, pushes the value of the source-side clip identified by them
targetclip pops an int and a string, pushes the value of the target-side clip identified by them
referenceclip pops an int and a string, pushes the value of the reference-side clip identified by them
setclip pops an int and two strings, sets the second string as the value of the target-side clip identified by the int and the first string
chunk creates an empty chunk and pushes it
appendchild pops a chunk and appends it as a child to the chunk underneath it (which remains on the stack)
appendsurface pops a string and appends it to the target-side surface chunk underneath it (which remains on the stack)
appendallchildren pops a chunk and appends all of its children as children to the chunk underneath it (which remains on the stack)
output pops a chunk and appends it to the output queue
blank pops an int and pushes the corresponding blank (or a single space if the int is 0)
concat pops two strings, concatenates them, and pushes the result
rejectrule abort evaluation of current rule and attempt to match a different one
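
To make the stack behaviour in the table above concrete, here is a minimal sketch of an interpreter for a subset of the instructions. It is an illustration, not the real implementation. The program is a Python list in which instruction names stand in for the opcode characters (this page does not give the actual character values), integers stand in for [int] operands, and single-character strings are literal text. It also assumes that jump offsets are applied after the pointer has moved past the operand, and that concat puts the second string popped before the first.

```python
def run(program):
    """Run a list-encoded program and return the final stack."""
    stack = []
    ip = 0
    while ip < len(program):
        op = program[ip]
        ip += 1
        if op in ("string", "int", "jump", "jumpontrue", "jumponfalse"):
            arg = program[ip]  # [int] operand occupies the next slot
            ip += 1
        if op == "drop":
            stack.pop()
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "over":
            stack.append(stack[-2])
        elif op == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "string":
            stack.append("".join(program[ip:ip + arg]))
            ip += arg  # skip over the literal characters
        elif op == "int":
            stack.append(arg)
        elif op == "pushfalse":
            stack.append(False)
        elif op == "pushtrue":
            stack.append(True)
        elif op == "jump":
            ip += arg
        elif op == "jumpontrue":
            if stack.pop():
                ip += arg
        elif op == "jumponfalse":
            if not stack.pop():
                ip += arg
        elif op == "not":
            stack.append(not stack.pop())
        elif op in ("and", "or"):
            first, second = stack.pop(), stack.pop()
            stack.append((first and second) if op == "and" else (first or second))
        elif op in ("equal", "isprefix", "issuffix", "issubstring", "concat"):
            first, second = stack.pop(), stack.pop()  # first = top of stack
            if op == "equal":
                stack.append(first == second)
            elif op == "isprefix":
                stack.append(second.startswith(first))
            elif op == "issuffix":
                stack.append(second.endswith(first))
            elif op == "issubstring":
                stack.append(first in second)
            else:
                stack.append(second + first)  # assumed operand order
        else:
            raise ValueError(f"unsupported instruction: {op!r}")
    return stack


program = [
    "string", 4, *"undo",
    "string", 2, *"un",
    "isprefix",           # "un" is a prefix of "undo", so this pushes True
    "jumponfalse", 4,     # not taken; otherwise it would skip the next 4 slots
    "string", 2, *"ok",
]
print(run(program))  # ['ok']
```

The list-, variable-, clip- and chunk-related instructions are left out because they depend on state that this sketch does not model.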

How it works

There is an object called parseTower, which is an array of arrays (which I call "layers"). When tokens are read from the input stream, they are added to layer 0. longestPattern is the length of the longest pattern of any rule, and MAXLAYERS is an optional user-defined limit on the recursion (currently 1).

def do_pass():
  if any layer contains more tokens than longestPattern, use the highest one
  else if there is more input return and wait for it to be read in
  else use the lowest layer that contains tokens
  
  for the layer chosen, attempt to match as in apertium-interchunk
  if any rules match, apply the longest one
  else move the first token in this layer to the next layer

def interchunk():
  while parseTower and the input stream are not both empty:
    if there is input, read 1 token
    do_pass()
    if the number of layers has reached MAXLAYERS: output everything in the top layer
    if longestPattern tokens have been shifted to the top layer without matching, output the first one
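
The layer selection at the start of do_pass() can be sketched in runnable form. This covers only the first three lines of the pseudocode; rule matching and output are not modelled. choose_layer and more_input are hypothetical names, and the sketch assumes "the highest one" means the highest layer among those holding more than longestPattern tokens.

```python
def choose_layer(parse_tower, longest_pattern, more_input):
    """Return the index of the layer to work on, or None to wait for input.

    parse_tower is a list of layers, each a list of tokens, with layer 0
    first. None is also returned when every layer is empty.
    """
    # Layers holding more tokens than the longest pattern take priority.
    overfull = [i for i, layer in enumerate(parse_tower)
                if len(layer) > longest_pattern]
    if overfull:
        return max(overfull)
    # Nothing is overfull: wait if more input can still arrive.
    if more_input:
        return None
    # Input is exhausted: drain from the lowest layer that has tokens.
    nonempty = [i for i, layer in enumerate(parse_tower) if layer]
    return min(nonempty) if nonempty else None


print(choose_layer([["a", "b", "c"], ["x"]], 2, True))  # 0
print(choose_layer([["a"], ["x"]], 2, True))            # None
print(choose_layer([[], ["x"]], 2, False))              # 1
```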