Difference between revisions of "VM for transfer"

From Apertium
Revision as of 01:22, 31 May 2010

Instruction Set

Mnemonic | Opcode (in hex) | Other operands | Stack [before] → [after] | Description
push | - | value | [empty] → value | Pushes a value onto the stack
pushv | - | var | [empty] → value | Evaluates the variable and pushes its value onto the stack
pusht | - | var | [empty] → <value> | Evaluates the variable and pushes its value onto the stack as a tag
pushbl | - | N/A | [empty] → <blank> | Pushes a blank onto the stack
pushsb | - | N/A | [pos] → superblank | Pushes the superblank at 'pos' onto the stack
cliptl | - | N/A | [pos, regex] → value | Matches 'regex' against the target-language word at 'pos' and pushes the matched value onto the stack
clipsl | - | N/A | [pos, regex] → value | Matches 'regex' against the source-language word at 'pos' and pushes the matched value onto the stack
storetl | - | N/A | [pos, regex, data] → value | Replaces the match of 'regex' in the target-language word at 'pos' with 'data'
addtrie | - | address | [pattern, pattern, ..., no_of_patterns] → [empty] | Pops 'no_of_patterns' items from the stack, combines these patterns, and adds the result to the trie, pointing to the given 'address'
lu | - | N/A | [lemma, tag1, ..., num] → ^(lexical_unit)$ | Pops 'num' items from the stack, builds a lexical unit ^... ...$ out of them, and pushes the lexical unit back onto the stack
brace | - | N/A | [lu1, blank1, lu2, blank2, ..., num] → {... ...} | Pops 'num' items from the stack, builds the braced version {... ... ...}, and pushes it back onto the stack
chunk | - | N/A | [chunk_name, tag1, tag2, ... , {^... ...$}, num] → ^chunk_name<tag1>...<tagn>{^... ...$}$ | Pops 'num' items from the stack, builds the chunk, and pushes it back onto the stack
out | - | N/A | [chunk1, chunk2, ..., num] → [empty] | Pops 'num' items from the stack and writes them to standard output
cmpi | - | N/A | [data1, data2] → [empty] | Pops data1 and data2 and compares them as strings, ignoring case; if they match (the comparison succeeds), the zero flag is set
jmp | - | N/A | label → [empty] | Jumps to the label (unconditional jump)
jnz | - | N/A | label → [empty] | Jumps to the label if the zero flag is not set (the last comparison did not succeed)
hlt | - | N/A | | Halts the program
return | - | N/A | PC → [empty] | Returns from a subroutine
nop | - | N/A | [empty] → [empty] | No operation
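The stack-building instructions (push, pushbl, lu, brace, chunk, out) can be illustrated with a minimal Python sketch. It is not the Apertium implementation: the function name `run`, the (mnemonic, operand) tuple encoding, and the sample lemmas are all assumptions made for illustration. It only follows the stack behaviour described in the table above, where the item count sits on top of the stack.

```python
# Minimal sketch of the stack-building instructions; names are hypothetical.

def run(program):
    """Execute (mnemonic, operand) pairs and return the items emitted by 'out'."""
    stack, output = [], []

    def pop_n():
        # The item count is on top of the stack; the items sit below it.
        n = stack.pop()
        start = len(stack) - n
        items = stack[start:]
        del stack[start:]
        return items

    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "pushbl":
            stack.append(" ")
        elif op == "lu":
            stack.append("^" + "".join(pop_n()) + "$")
        elif op == "brace":
            stack.append("{" + "".join(pop_n()) + "}")
        elif op == "chunk":
            stack.append("^" + "".join(pop_n()) + "$")
        elif op == "out":
            output.extend(pop_n())
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return output


# A two-word chunk built from made-up sample data.
program = [
    ("push", "det_nom"), ("push", "<SN>"),
    ("push", "el"), ("push", "<det>"), ("push", 2), ("lu", None),
    ("pushbl", None),
    ("push", "gat"), ("push", "<n>"), ("push", 2), ("lu", None),
    ("push", 3), ("brace", None),   # lu + blank + lu
    ("push", 3), ("chunk", None),   # name + tag + braced part
    ("push", 1), ("out", None),
]
result = run(program)
```

Keeping the count on the stack, rather than as an operand, means the same instruction can build units of any length.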

Sample Translation from XML to byte-code

example 1

t1x code

<out>
  <chunk name="det_det_nom_adj" case="caseFirstWord">
    <tags>
      <tag><lit-tag v="SN"/></tag>
      <tag></tag>
      <tag></tag>
      <tag></tag>
    </tags>
    <lu>
      <clip pos="1" side="tl" part="lem"/>
      <clip pos="1" side="tl" part="a_det"/>
      <clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
      <clip pos="1" side="tl" part="gen_mf"/>
      <clip pos="1" side="tl" part="nbr_sense_sp" link-to="4"/>
      <clip pos="1" side="tl" part="nbr_sp"/>
    </lu>
    
    <lu>
      <lit v="el"/>
      <lit-tag v="det.def"/>
      <clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
      <lit-tag v="pl"/>
    </lu>
    
    <lu>
      <clip pos="3" side="tl" part="lemh"/>
      <clip pos="3" side="tl" part="a_nom"/>
      <clip pos="3" side="tl" part="gen_sense_mf" link-to="3"/>
      <clip pos="3" side="tl" part="gen_mf"/>
      <clip pos="3" side="tl" part="nbr_sense_sp" link-to="4"/>
      <clip pos="3" side="tl" part="nbr_sp"/>
      <clip pos="3" side="tl" part="lemq"/>
    </lu>
    
    
    <lu>
      
      <clip pos="2" side="tl" part="lemh"/>
      <clip pos="2" side="tl" part="a_adj"/>
      <clip pos="2" side="tl" part="gen_sense_mf" link-to="3"/>
      <clip pos="2" side="tl" part="gen_mf"/>
      <clip pos="2" side="tl" part="nbr_sense_sp" link-to="4"/>
      <clip pos="2" side="tl" part="nbr_sp" link-to="4"/>
      <clip pos="2" side="tl" part="lemq"/>
    </lu>
  </chunk>
</out>

compiled code

push    "det_det_nom_adj"
push    "<SN>"
pusht   tipus_det          ; evaluate the variable, wrap its value in '<' and '>', then push it onto the stack
pusht   gen_chunk
pusht   nbr_chunk

push    1
push    "^\w+"             ; lem
cliptl
push    1
push    [regex]            ; a_det
cliptl
push    "<3>"              ; since link-to overrides everything else, we do not need any dedicated instruction
                           ; for that
push    1
push    [regex]            ; gen_mf
cliptl
push    "<4>"
push    1
push    [regex]            ; nbr_sp
cliptl
push    6
lu                         ; pop 6 items, concat, create lexical unit ^...$ and push back in stack

pushbl                     ; push a blank

push    "el"
push    "<det><def>"
push    "<3>"
push    "<pl>"
push    4
lu                         ; pop 4 items from the stack, create a lexical unit ^...$ and then
                           ; push in the stack

push   1
pushsb

push   3
push   [regex]             ; lemh
cliptl
push   3
push   [regex]             ; a_nom
cliptl
push   "<3>"
push   3
push   [regex]             ; gen_mf
cliptl
push   "<4>"
push   3
push   [regex]             ; nbr_sp
cliptl
push   3
push   [regex]             ; lemq
cliptl
push   7
lu

pushbl
push   2
pushsb

pushv  adjectiu1           ; it is a variable, so evaluate it and push the value
push   2                   ; the adjective is at pos="2" in the XML
push   [regex]             ; lemh
cliptl
push   2
push   [regex]             ; a_adj
cliptl
push   "<3>"
push   2
push   [regex]             ; gen_mf
cliptl
push   "<4>"               ; nbr_sense_sp has link-to="4"
push   "<4>"               ; nbr_sp also has link-to="4" in the XML, so "<4>" is pushed twice
push   2
push   [regex]             ; lemq
cliptl
push   7
lu

push   7                   ; no of blank + lexical unit = 7
brace                      ; pop 7 items, concat, prepend and append {, } then push back

push   6
chunk                      ; create the chunk, ^...{^...$}$, and push back in stack

push   1                   ; number of chunks
out                        ; give output
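The translation of a single <lu> element can be sketched in Python as follows. This is only an illustration of the mapping used in example 1, not an actual compiler: `compile_lu` is a hypothetical name, and `[regex:part]` is a placeholder standing in for the regular expression of each part, which this page does not specify. A clip with link-to becomes a literal tag push, as in the listing above.

```python
import xml.etree.ElementTree as ET


def compile_lu(lu_xml):
    """Translate one <lu> element into assembly lines (illustrative sketch)."""
    lines = []
    children = list(ET.fromstring(lu_xml))
    for child in children:
        if child.tag == "lit":
            lines.append(f'push    "{child.get("v")}"')
        elif child.tag == "lit-tag":
            # "det.def" becomes "<det><def>"
            tags = "".join(f"<{t}>" for t in child.get("v").split("."))
            lines.append(f'push    "{tags}"')
        elif child.tag == "clip":
            if child.get("link-to") is not None:
                # link-to overrides the clip, so only the tag is pushed
                lines.append(f'push    "<{child.get("link-to")}>"')
            else:
                lines.append(f'push    {child.get("pos")}')
                # placeholder: the real regex for each part is not given here
                lines.append(f'push    [regex:{child.get("part")}]')
                lines.append("cliptl" if child.get("side") == "tl" else "clipsl")
        else:
            raise ValueError(f"unsupported element: {child.tag}")
    # Each child leaves exactly one item on the stack.
    lines.append(f"push    {len(children)}")
    lines.append("lu")
    return lines


SAMPLE_LU = """
<lu>
  <lit v="el"/>
  <lit-tag v="det.def"/>
  <clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
  <lit-tag v="pl"/>
</lu>
"""
compiled = compile_lu(SAMPLE_LU)
```

For the second <lu> of example 1 this yields the same six instructions as the listing: four pushes, the count 4, and lu.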

example 2

t1x code

<section-def-cats>
  <def-cat n="nom">
    <cat-item tags="n.*"/>
  </def-cat>
 
  <def-cat n="det">
    <cat-item tags="det.*"/>
    <cat-item tags="predet.*"/>
  </def-cat>
</section-def-cats>

<section-rules>
  <rule>
    <pattern>
      <pattern-item n="det"/>
    </pattern>
  </rule>
  <rule>
    <pattern>
      <pattern-item n="nom"/>
    </pattern>
    <action/>
  </rule>
  <rule>
    <pattern>
      <pattern-item n="det"/>
      <pattern-item n="nom"/>
    </pattern>
    <action/>
  </rule>
</section-rules>

compiled code

                         ;first rule: def-cat has two equivalent cat-items
push      "\w<det>\t"    ;load pattern into stack
push      1
addtrie   [address1]     ;define a trie pattern with value 1 (the first rule)

push      "\w<predet>\t" ;same with the second cat-item
push      1
addtrie   [address1]
                         ;second rule (and so on) very simple, unique cat-item
push      "\w<n>\t"
push      1
addtrie   [address2]
                         ;third rule (here is the trick: 'det' has two cat-items, so one addtrie is emitted per combination)
push      "\w<det>\t"
push      "\w<n>\t"
push      2              ; we have 'det' followed by a 'nom', so addtrie has to pop two elements
addtrie   [address3]

push      "\w<predet>\t"
push      "\w<n>\t"
push      2
addtrie   [address3]
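The expansion in example 2 amounts to taking the Cartesian product of the cat-items of each pattern-item. A short Python sketch under simplifying assumptions: `compile_patterns` is a hypothetical name, each cat-item is reduced to its first tag (so "det.*" is passed as "det"), and rule addresses are simply numbered in order.

```python
from itertools import product


def compile_patterns(def_cats, rules):
    """Emit one addtrie sequence per combination of cat-items (sketch)."""
    lines = []
    for rule_no, pattern in enumerate(rules, start=1):
        # One list of alternative tags per pattern-item in the rule.
        alternatives = [def_cats[name] for name in pattern]
        for combo in product(*alternatives):
            for tag in combo:
                lines.append(f'push      "\\w<{tag}>\\t"')
            lines.append(f"push      {len(combo)}")
            lines.append(f"addtrie   [address{rule_no}]")
    return lines


# The categories and rules of example 2, with tags simplified as noted above.
DEF_CATS = {"nom": ["n"], "det": ["det", "predet"]}
RULES = [["det"], ["nom"], ["det", "nom"]]
listing = compile_patterns(DEF_CATS, RULES)
```

A rule whose pattern-items have 2 and 1 alternatives therefore produces 2 x 1 = 2 addtrie sequences, both pointing at the same address.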

example 3

t1x code

<def-macro n="f_coma" npar="1">
  <choose>
    <when>
      <test>
        <equal caseless="yes">
          <clip pos="1" side="sl" part="lem"/>
          <lit v="como"/>
        </equal>
      </test>
      <let>
        <clip pos="1" side="tl" part="lem"/>
        <get-case-from pos="1">
          <lit v="com a"/>
        </get-case-from>
      </let>
    </when>
  </choose>
</def-macro>

compiled code

f_coma:  push      1        ; "pos" of "clip"
         push      "^\w+"   ; "lem"
         clipsl             ; clips the value and leaves it on top of the stack;
                            ; the "sl" side is implied by the name of the instruction
         push      "como"
         cmpi               ; compares the two values and removes them from the stack; the "i" means caseless
         jnz       end      ; if the comparison does not succeed, go to end
                            ; semantics: j = jump, n = not, z = zero flag is activated
                            ; the zero flag is activated when a comparison succeeds
                            ; or an arithmetical operation gives 0
         push      1        ; "pos" of "clip"
         push      "^\w+"
         push      "com a"
         storetl            ; stores the value on top of the stack into the "lem" part
                            ; of position 1 on the "tl" side

end:     ...
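The control flow of example 3 can be traced with a small Python sketch. It assumes the semantics given in the comments of the listing (the zero flag is set when the comparison succeeds, and jnz jumps when it is not set). The function `run_macro`, the tuple encoding, the label table, and the sample words are hypothetical, and, like the listing, the sketch ignores get-case-from.

```python
import re


def run_macro(program, labels, words):
    """Run a tiny subset of the VM: push, clipsl, cmpi, jnz, storetl, hlt."""
    stack = []
    zero_flag = False
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "clipsl":
            regex, pos = stack.pop(), stack.pop()
            match = re.search(regex, words[pos]["sl"])
            stack.append(match.group(0) if match else "")
        elif op == "cmpi":
            right, left = stack.pop(), stack.pop()
            zero_flag = left.lower() == right.lower()  # set on a caseless match
        elif op == "jnz":
            if not zero_flag:  # jump when the comparison did not succeed
                pc = labels[arg]
        elif op == "storetl":
            data, regex, pos = stack.pop(), stack.pop(), stack.pop()
            words[pos]["tl"] = re.sub(regex, data, words[pos]["tl"], count=1)
        elif op == "hlt":
            break
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return words


F_COMA = [
    ("push", 1), ("push", r"^\w+"), ("clipsl", None),
    ("push", "como"), ("cmpi", None),
    ("jnz", "end"),
    ("push", 1), ("push", r"^\w+"), ("push", "com a"), ("storetl", None),
    ("hlt", None),  # index 10, the target of the "end" label
]
LABELS = {"end": 10}

# Made-up sample words: position -> source-language and target-language forms.
matched = run_macro(F_COMA, LABELS, {1: {"sl": "Como<adv>", "tl": "com<adv>"}})
skipped = run_macro(F_COMA, LABELS, {1: {"sl": "casa<n>", "tl": "casa<n>"}})
```

In the first call the lemma matches "como" regardless of case, so the target lemma is replaced; in the second call jnz skips the store and the word is left unchanged.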

Development Notes

  • Neither macros nor actions need to return a value (unlike conventional functions), so no provision for returning a value through the stack is needed