VM for transfer
Instruction Sets
Mnemonic | Opcode (in hex) | Other operands | Stack [before]→[after] | Description |
---|---|---|---|---|
push | - | value | [empty] → value | Pushes a value onto the stack |
pushv | - | var | [empty] → value | Evaluates the variable and pushes its value onto the stack |
pusht | - | var | [empty] → <value> | Evaluates the variable and pushes its value onto the stack as a tag, i.e. wrapped in '<' and '>' |
pushbl | - | N/A | [empty] → blank | Pushes a blank onto the stack |
pushsb | - | pos | [empty] → superblank | Pushes the superblank at 'pos' onto the stack |
pushz | - | N/A | [empty] → [zero_flag] | Pushes the current value of the zero flag onto the stack |
pushnz | - | N/A | [empty] → [not_zero_flag] | Takes the logical NOT of the current value of the zero flag and pushes the result onto the stack |
cliptl | - | N/A | pos, regex → value | Matches 'regex' against the target-language word at 'pos' and pushes the matched value onto the stack |
clipsl | - | N/A | pos, regex → value | Matches 'regex' against the source-language word at 'pos' and pushes the matched value onto the stack |
storetl | - | N/A | pos, regex, data → value | Replaces the match of 'regex' in the target-language word at 'pos' with 'data' |
addtrie | - | address | pattern, pattern, ..., no_of_patterns → [empty] | Pops 'no_of_patterns' and then that many patterns from the stack, combines the patterns and adds the result to the trie, pointing to the given 'address' |
lu | - | num | lemma, tag1, ..., tagn → ^(lexical_unit)$ | Pops 'num' items from the stack, builds a lexical unit ^...$ from them and pushes it back onto the stack |
brace | - | num | lu1, blank1, lu2, blank2, ..., lun → {... ...} | Pops 'num' items from the stack, builds the braced version {... ... ...} and pushes it back onto the stack |
chunk | - | num | chunk_name, tag1, tag2, ... , {^... ...$} → ^chunk_name<tag1>...<tagn>{^... ...$}$ | Pops 'num' items from the stack, builds the chunk and pushes it back onto the stack |
out | - | num | chunk1, chunk2, ... → [empty] | Pops 'num' items from the stack and writes them to standard output |
cmpi | - | N/A | data1, data2 → [empty] | Pops data1 and data2 and compares them as strings, ignoring case; if they match, sets the zero flag to 1 (i.e. the comparison yields zero) |
cmp | - | N/A | data1, data2 → [empty] | Pops data1 and data2 and compares them as strings, case-sensitively; if they match, sets the zero flag to 1 |
match | - | N/A | string, regex → [empty] | Pops 'string' and 'regex' and matches the string against the regex; if it matches, sets the zero flag to 1 |
jmp | - | label | [empty] → [empty] | Jumps to the label (unconditional jump) |
jz | - | label | [empty] → [empty] | Jumps to the label if zero flag is 1 |
jnz | - | label | [empty] → [empty] | Jumps to the label if zero flag is 0 |
hlt | - | N/A | [empty] → [empty] | Halts the program |
call | - | label | arg(n), ..., arg2, arg1, npar → [empty] | Calls a macro (subroutine); see Example 6 for details |
ret | - | N/A | PC → [empty] | Returns from a macro. The call instruction places the PC on the stack, so there is no need to push it manually |
nop | - | N/A | [empty] → [empty] | No operation |
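To make the flag and jump semantics in the table concrete, here is a minimal Python sketch of an interpreter for a small subset of the instructions (push, cmp, cmpi, pushz, pushnz, jmp, jz, jnz, hlt, nop). It is only an illustration, not the actual VM: the (mnemonic, operand) tuple encoding and the ("label", name) pseudo-instruction are assumptions made for the example, and so is resetting the zero flag to 0 when a comparison fails (the table only states what happens on success).

```python
def run(program):
    """Execute a list of (mnemonic, operand) pairs; return (stack, zero_flag)."""
    # Labels are modelled as ("label", name) pseudo-instructions (an assumption).
    labels = {op: i for i, (mn, op) in enumerate(program) if mn == "label"}
    stack, zero_flag, pc = [], 0, 0
    while pc < len(program):
        mnemonic, operand = program[pc]
        pc += 1
        if mnemonic == "push":
            stack.append(operand)
        elif mnemonic in ("cmp", "cmpi"):
            data2, data1 = stack.pop(), stack.pop()
            if mnemonic == "cmpi":  # caseless comparison
                data1, data2 = data1.lower(), data2.lower()
            # Assumption: a failed comparison resets the flag to 0.
            zero_flag = 1 if data1 == data2 else 0
        elif mnemonic == "pushz":
            stack.append(zero_flag)
        elif mnemonic == "pushnz":
            stack.append(1 - zero_flag)
        elif mnemonic == "jmp":
            pc = labels[operand]
        elif mnemonic == "jz":
            if zero_flag == 1:
                pc = labels[operand]
        elif mnemonic == "jnz":
            if zero_flag == 0:
                pc = labels[operand]
        elif mnemonic == "hlt":
            break
        elif mnemonic not in ("nop", "label"):
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return stack, zero_flag


program = [
    ("push", "Como"),
    ("push", "como"),
    ("cmpi", None),       # caseless match, so the zero flag becomes 1
    ("jnz", "end"),       # not taken, because the zero flag is 1
    ("push", "matched"),
    ("label", "end"),
    ("hlt", None),
]
print(run(program))  # (['matched'], 1)
```

Replacing cmpi with cmp in the program above makes the comparison fail, so jnz skips the final push and the stack stays empty.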
Sample compilation of XML code fragments
Example 1
XML t1x Code: chunking
<out>
<chunk name="det_det_nom_adj" case="caseFirstWord">
<tags>
<tag><lit-tag v="SN"/></tag>
<tag></tag>
<tag></tag>
<tag></tag>
</tags>
<lu>
<clip pos="1" side="tl" part="lem"/>
<clip pos="1" side="tl" part="a_det"/>
<clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
<clip pos="1" side="tl" part="gen_mf"/>
<clip pos="1" side="tl" part="nbr_sense_sp" link-to="4"/>
<clip pos="1" side="tl" part="nbr_sp"/>
</lu>
<lu>
<lit v="el"/>
<lit-tag v="det.def"/>
<clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
<lit-tag v="pl"/>
</lu>
<lu>
<clip pos="3" side="tl" part="lemh"/>
<clip pos="3" side="tl" part="a_nom"/>
<clip pos="3" side="tl" part="gen_sense_mf" link-to="3"/>
<clip pos="3" side="tl" part="gen_mf"/>
<clip pos="3" side="tl" part="nbr_sense_sp" link-to="4"/>
<clip pos="3" side="tl" part="nbr_sp"/>
<clip pos="3" side="tl" part="lemq"/>
</lu>
<lu>
<clip pos="2" side="tl" part="lemh"/>
<clip pos="2" side="tl" part="a_adj"/>
<clip pos="2" side="tl" part="gen_sense_mf" link-to="3"/>
<clip pos="2" side="tl" part="gen_mf"/>
<clip pos="2" side="tl" part="nbr_sense_sp" link-to="4"/>
<clip pos="2" side="tl" part="nbr_sp" link-to="4"/>
<clip pos="2" side="tl" part="lemq"/>
</lu>
</chunk>
</out>
Compiled Code
push "det_det_nom_adj"
push "<SN>"
pusht tipus_det ; evaluate the variable, wrap its value in '<' and '>', then push it onto the stack
pusht gen_chunk
pusht nbr_chunk
push 1
push "^\w+" ; lem
cliptl
push 1
push [regex] ; a_det
cliptl
push "<3>" ; since link-to overrides everything else, we do not need any dedicated instruction
; for that
push 1
push [regex] ; gen_mf
cliptl
push "<4>"
push 1
push [regex] ; nbr_sp
cliptl
lu 6 ; pop 6 items, concat, create lexical unit ^...$ and push back in stack
pushbl ; push a blank
push "el"
push "<det><def>"
push "<3>"
push "<pl>"
lu 4 ; pop 4 items from the stack, create a lexical unit ^...$ and then
; push in the stack
pushsb 1
push 3
push [regex] ; lemh
cliptl
push 3
push [regex] ; a_nom
cliptl
push "<3>"
push 3
push [regex] ; gen_mf
cliptl
push "<4>"
push 3
push [regex] ; nbr_sp
cliptl
push 3
push [regex] ; lemq
cliptl
lu 7
pushbl
pushsb 2
pushv adjectiu1 ; it is a variable, so evaluate it and push the value
push 2
push [regex] ; lemh
cliptl
push 2
push [regex] ; a_adj
cliptl
push "<3>"
push 2
push [regex] ; gen_mf
cliptl
push "<4>"
push "<4>" ; the XML has link-to="4" on both nbr_sense_sp and nbr_sp, hence two "<4>"
push 2
push [regex] ; lemq
cliptl
lu 7
brace 7 ; number of blanks + lexical units = 7
; pop 7 items, concat, wrap the result in { and }, then push it back
chunk 6 ; create the chunk, ^...{^...$}$, and push it back onto the stack
out 1 ; write the output (number of chunks = 1)
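The string assembly done by lu, brace and chunk can be sketched in Python as follows. This is a simplified illustration under several assumptions: the stack is a plain list of strings, tags already arrive in their '<...>' form, and the helper name pop_n is made up for the example. The example words ("el", "gat") and tags are illustrative too.

```python
def pop_n(stack, num):
    """Pop 'num' items and return them in the order they were pushed."""
    items = [stack.pop() for _ in range(num)]
    items.reverse()
    return items


def lu(stack, num):
    # lemma, tag1, ..., tagn  ->  ^lemma<tag1>...<tagn>$
    stack.append("^" + "".join(pop_n(stack, num)) + "$")


def brace(stack, num):
    # lu1, blank1, lu2, ...  ->  {lu1 blank1 lu2 ...}
    stack.append("{" + "".join(pop_n(stack, num)) + "}")


def chunk(stack, num):
    # chunk_name, tag1, ..., {...}  ->  ^chunk_name<tag1>...{...}$
    stack.append("^" + "".join(pop_n(stack, num)) + "$")


stack = ["det_nom", "<SN>"]                  # chunk name and chunk tag
stack += ["el", "<det><def>", "<m>", "<sg>"]
lu(stack, 4)
stack.append(" ")                            # what pushbl would push
stack += ["gat", "<n>", "<m>", "<sg>"]
lu(stack, 4)
brace(stack, 3)                              # lu + blank + lu
chunk(stack, 3)                              # name + tag + braced part
print(stack[0])
```

After these calls the stack holds a single string, the complete chunk, which is what out 1 would then write.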
Example 2
XML t1x Code
<section-def-cats>
<def-cat n="nom">
<cat-item tags="n.*"/>
</def-cat>
<def-cat n="det">
<cat-item tags="det.*"/>
<cat-item tags="predet.*"/>
</def-cat>
</section-def-cats>
<section-rules>
<rule>
<pattern>
<pattern-item n="det"/>
</pattern>
</rule>
<rule>
<pattern>
<pattern-item n="nom"/>
</pattern>
<action/>
</rule>
<rule>
<pattern>
<pattern-item n="det"/>
<pattern-item n="nom"/>
</pattern>
<action/>
</rule>
</section-rules>
Compiled Code
;first rule: def-cat has two equivalent cat-items
push "\w<det>\t" ;load pattern into stack
push 1
addtrie [address1] ;define a trie pattern with value 1 (the first rule)
push "\w<predet>\t" ;same with the second cat-item
push 1
addtrie [address1]
;second rule: straightforward, a single cat-item
push "\w<n>\t"
push 1
addtrie [address2]
;third rule (here is the trick: multiple cat-items in one of the words)
push "\w<det>\t"
push "\w<n>\t"
push 2 ; we have 'det' followed by a 'nom', so addtrie has to pop two elements
addtrie [address3]
push "\w<predet>\t"
push "\w<n>\t"
push 2
addtrie [address3]
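A rough Python sketch of what addtrie does with the stack is given below. It only illustrates the bookkeeping: the nested-dict trie, the PatternTrie class and its find method are assumptions for the example, and find looks a rule up by the exact pattern strings rather than matching the regexes against real input words, which the actual VM would have to do.

```python
class PatternTrie:
    """Trie keyed by pattern strings; each stored sequence ends in a rule address."""

    _END = object()  # sentinel key marking "a rule ends here"

    def __init__(self):
        self.root = {}

    def add(self, patterns, address):
        node = self.root
        for pattern in patterns:
            node = node.setdefault(pattern, {})
        node[self._END] = address

    def find(self, patterns):
        node = self.root
        for pattern in patterns:
            if pattern not in node:
                return None
            node = node[pattern]
        return node.get(self._END)


def addtrie(stack, trie, address):
    # The top of the stack holds the number of patterns to pop.
    count = stack.pop()
    patterns = [stack.pop() for _ in range(count)]
    patterns.reverse()  # restore the order in which they were pushed
    trie.add(patterns, address)


trie = PatternTrie()
stack = [r"\w<det>\t", 1]
addtrie(stack, trie, "address1")
stack += [r"\w<det>\t", r"\w<n>\t", 2]
addtrie(stack, trie, "address3")
print(trie.find([r"\w<det>\t", r"\w<n>\t"]))  # address3
```

Because the first and third rules share the 'det' prefix, both end up on the same branch of the trie, with a rule address stored at depth one and another at depth two.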
Example 3
XML t1x Code
<def-macro n="f_coma" npar="1">
<choose>
<when>
<test>
<equal caseless="yes">
<clip pos="1" side="sl" part="lem"/>
<lit v="como"/>
</equal>
</test>
<let>
<clip pos="1" side="tl" part="lem"/>
<get-case-from pos="1">
<lit v="com a"/>
</get-case-from>
</let>
</when>
</choose>
</def-macro>
Compiled code
f_coma: push 1 ; "pos" of "clip"
push "^\w+" ; "lem"
clipsl ; clips the value and leaves it on top of the stack;
; the "sl" side is implied by the name of the instruction
push "como"
cmpi ; caseless comparison; pops both operands and sets the zero flag
jnz end ; if the comparison does not succeed, go to end
; semantics: j = jump, n = not, z = zero flag is set
; the zero flag is set when a comparison succeeds
; or an arithmetic operation gives 0
push 1 ; "pos" of "clip"
push "^\w+"
push "com a"
storetl ; stores the value on top of the stack ("com a")
; at position 1, "tl" side, part "lem"
end: ...
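The clip and store steps used by f_coma can be approximated in Python with the re module. This is a sketch under assumptions: each word is represented as a plain string of the form lemma<tag>..., the words are kept in dicts indexed by position, and the function names clip and store as well as the tag "cnjadv" are made up for the example. The get-case-from part of the XML is not modelled, just as it is not in the compiled code above.

```python
import re


def clip(word, regex):
    """Return the part of 'word' matched by 'regex' (empty string if no match)."""
    match = re.search(regex, word)
    return match.group(0) if match else ""


def store(word, regex, data):
    """Replace the first match of 'regex' in 'word' with the literal 'data'."""
    # A function is used as replacement so that 'data' is never
    # interpreted as a regex replacement template.
    return re.sub(regex, lambda _match: data, word, count=1)


def f_coma(source_words, target_words, pos):
    # clipsl + cmpi: caseless comparison of the source lemma with "como"
    if clip(source_words[pos], r"^\w+").lower() == "como":
        # storetl: overwrite the target lemma at the same position
        target_words[pos] = store(target_words[pos], r"^\w+", "com a")


source_words = {1: "Como<cnjadv>"}
target_words = {1: "com<cnjadv>"}
f_coma(source_words, target_words, 1)
print(target_words[1])  # com a<cnjadv>
```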
Example 4
XML t1x Code
<test>
<or>
<not>
<equal>
<clip pos="1" side="sl" part="gen"/>
<clip pos="3" side="sl" part="gen"/>
</equal>
</not>
<not>
<equal>
<clip pos="2" side="sl" part="gen"/>
<clip pos="3" side="sl" part="gen"/>
</equal>
</not>
</or>
</test>
Compiled code
start: push 1
push [regex] ; part="gen"
clipsl
push 3
push [regex] ; part="gen"
clipsl
cmp ; compare (case sensitive)
pushnz ; NOT zero flag and push in stack
push 2
push [regex] ; part="gen"
clipsl
push 3
push [regex] ; part="gen"
clipsl
cmp ; compare (case sensitive)
pushnz
or ; pop 2 items and OR, push result in stack
jnz end ; jump if zero flag is 0 (we did not get ZERO as the result)
... ... ...
(code for successful test)
... ... ...
end: ...
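The value that the sequence cmp, pushnz, cmp, pushnz, or leaves on the stack can be traced with a few lines of Python. This sketch only follows the stack; how the or instruction updates the zero flag before jnz is evaluated is not specified in the instruction table, so it is left out here. The function name evaluate_test and the gender values are illustrative.

```python
def evaluate_test(gen1, gen2, gen3):
    """Trace the stack for: NOT(gen1 == gen3) OR NOT(gen2 == gen3)."""
    stack = []

    # First <not><equal>: clip pos 1, clip pos 3, cmp, pushnz
    stack += [gen1, gen3]
    data2, data1 = stack.pop(), stack.pop()
    zero_flag = 1 if data1 == data2 else 0
    stack.append(1 - zero_flag)

    # Second <not><equal>: clip pos 2, clip pos 3, cmp, pushnz
    stack += [gen2, gen3]
    data2, data1 = stack.pop(), stack.pop()
    zero_flag = 1 if data1 == data2 else 0
    stack.append(1 - zero_flag)

    # or: pop two items, OR them, push the result
    right, left = stack.pop(), stack.pop()
    stack.append(left | right)
    return stack.pop()


print(evaluate_test("m", "m", "m"))  # 0: all genders agree, the test fails
print(evaluate_test("f", "m", "m"))  # 1: position 1 disagrees with position 3
```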
Example 5
XML t1x Code
<def-list n="verbos_est">
<list-item v="actuar"/>
<list-item v="buscar"/>
<list-item v="estudiar"/>
<list-item v="existir"/>
<list-item v="ingressar"/>
<list-item v="introduir"/>
<list-item v="penetrar"/>
<list-item v="publicar"/>
<list-item v="treballar"/>
<list-item v="viure"/>
</def-list>
<rule>
<pattern>
<pattern-item n="verb"/>
<pattern-item n="a"/>
</pattern>
<action>
<choose>
<when>
<test>
<in caseless="yes">
<clip pos="1" side="sl" part="lem"/>
<list n="verbos_est"/>
</in>
</test>
<let>
<clip pos="2" side="tl" part="lem"/>
<lit v="en"/>
</let>
</when>
</choose>
</action>
</rule>
Compiled code
push "actuar"
push "buscar"
push "estudiar"
push "existir"
push "ingressar"
push "introduir"
push "penetrar"
push "publicar"
push "treballar"
push "viure"
push 10 ; number of elements in the list
mklist verbos_est ; make a list variable named 'verbos_est' and put the last 10 items
; from the stack in the list
rule1: push [regex_verb]
push [regex_a]
push 2
addtrie rule1_action
... ... ...
... ... ...
rule1_action: push 1
push "^\w+" ; lem
clipsl ; we have the lemma on the stack now
incini verbos_est ; if in verbos_est (ignore case), set ZF = 1, else ZF = 0
jnz rule1_end
push 2
push "^\w+"
push "en"
storetl
rule1_end: ...
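The mklist and incini instructions appear only in this example and not in the instruction table, so the following Python sketch is a guess at their behaviour based on the comments in the compiled code. The dict used to hold named lists and the function signatures are assumptions made for the example.

```python
def mklist(stack, lists, name):
    """Pop a count, then that many items, and store them as the list 'name'."""
    count = stack.pop()
    items = [stack.pop() for _ in range(count)]
    items.reverse()  # keep the order in which the items were pushed
    lists[name] = items


def incini(stack, lists, name):
    """Pop a value; return the zero flag: 1 if it is in the list (ignoring case), else 0."""
    value = stack.pop().lower()
    return 1 if value in {item.lower() for item in lists[name]} else 0


lists = {}
stack = ["actuar", "buscar", "viure", 3]
mklist(stack, lists, "verbos_est")

stack.append("Buscar")                      # the clipped source lemma
print(incini(stack, lists, "verbos_est"))   # 1

stack.append("cantar")
print(incini(stack, lists, "verbos_est"))   # 0
```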
Example 6
XML t1x Code
<def-macro n="firstWord" npar="1">
<choose>
<when>
<test>
<equal>
<clip pos="1" side="sl" part="a_np_acr"/>
<lit v=""/>
</equal>
</test>
<choose>
<when>
<test>
<equal>
<lit v="true"/>
</equal>
</test>
<modify-case>
<clip pos="1" side="tl" part="lem"/>
<lit v="aa"/>
</modify-case>
<let>
<lit v="Aa"/>
</let>
</when>
<otherwise>
<let>
<lit v="aa"/>
</let>
</otherwise>
</choose>
</when>
<otherwise>
<let>
<lit v="aa"/>
</let>
</otherwise>
</choose>
<let>
<lit v="false"/>
</let>
</def-macro>
<rule comment="REGLA: DET DET ADJ NOM (your many beautiful cats)">
... ...
<action>
<call-macro n="firstWord">
<with-param pos="1"/>
</call-macro>
<call-macro n="f_concord4">
<with-param pos="4"/>
<with-param pos="3"/>
<with-param pos="2"/>
<with-param pos="1"/>
</call-macro>
...
<out>
<chunk name="det_det_nom_adj" case="caseFirstWord">
... ...
</chunk>
</out>
</action>
</rule>
Compiled code
firstWord: ... ...
; normal translation of instructions, all the variables are assumed global
... ...
ret ; the ret instruction does a number of things:
; pops the 'frame stack'; the current 'local variable frame' is reset with the popped
; values (actually it is more of a pointer assignment), the C++ version will also
; do the necessary deallocations
; pops the global stack and updates PC with the popped value
... ...
... ...
rule_ddan_action: push 1 ; pos = 1
push 1 ; number of parameters = 1
call firstWord ; macro label
; the call instruction does a number of things:
; 1. temppc = PC + 1, set PC = firstWord
; 2. pushes the current 'local variable frame' onto the 'frame stack'
; 3. creates a new 'local variable frame'
; 4. pops the arguments from the stack and places them in the 'local
; variable frame'
; 5. pushes temppc onto the global stack (it will be used by the return
; statement)
; 6. continues (the instruction at firstWord will be evaluated next)
push 1 ; notice that the arguments are pushed in reverse order;
; when popped, they will be in the right order
push 2
push 3
push 4
push 4 ; number of parameters = 4
call f_concord4
... ...
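The six steps of call and the two steps of ret described in the comments can be sketched in Python as follows. This is an illustration under assumptions, not the actual implementation: the class name MacroVM, the use of integer addresses instead of labels, and the representation of the local variable frame as a list are all choices made for the example.

```python
class MacroVM:
    def __init__(self):
        self.stack = []        # global stack (data and return addresses)
        self.frame_stack = []  # saved local variable frames
        self.frame = []        # current local variable frame
        self.pc = 0            # address of the instruction being executed

    def call(self, target):
        temppc = self.pc + 1                   # 1. remember the return address
        self.frame_stack.append(self.frame)    # 2. save the current frame
        npar = self.stack.pop()                #    number of parameters
        # 3. + 4. new frame, filled with the arguments popped from the stack
        self.frame = [self.stack.pop() for _ in range(npar)]
        self.stack.append(temppc)              # 5. return address for ret
        self.pc = target                       # 6. continue at the macro

    def ret(self):
        self.frame = self.frame_stack.pop()    # restore the caller's frame
        self.pc = self.stack.pop()             # jump back to the caller


vm = MacroVM()
vm.pc = 10                      # pretend the call instruction sits at address 10
vm.stack += [1, 2, 3, 4, 4]     # arguments in reverse order, then npar = 4
vm.call(100)                    # 100 stands in for the label f_concord4
print(vm.frame, vm.pc)          # [4, 3, 2, 1] 100
vm.ret()
print(vm.pc)                    # 11
```

Popping the arguments reverses them again, so the frame ends up in the same order as the with-param elements in the XML (4, 3, 2, 1).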
Development Notes
- Neither macros nor actions need to return anything (unlike conventional functions), so no provision for returning a value (via the stack) is needed.
- The local variable frame is actually a queue with a maximum length equal to the maximum pattern length in the trie.
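A bounded queue of this kind can be sketched in Python with collections.deque. Two things here are assumptions rather than statements from the notes above: that the oldest entry is discarded once the queue is full, and the example value 3 for the maximum pattern length. The function name make_frame is made up for the example.

```python
from collections import deque


def make_frame(max_pattern_length):
    """Create a local variable frame that holds at most 'max_pattern_length' words."""
    # Assumption: once the frame is full, appending drops the oldest entry.
    return deque(maxlen=max_pattern_length)


frame = make_frame(3)  # 3 is a hypothetical maximum pattern length
for word in ["^el<det>$", "^gat<n>$", "^negre<adj>$", "^i<cnjcoo>$"]:
    frame.append(word)
print(list(frame))     # the last three words only
```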