VM for transfer

From Apertium

Revision as of 00:46, 5 June 2010

Instruction set

| Mnemonic | Opcode (hex) | Other operands | Stack: [before] → [after] | Description |
|---|---|---|---|---|
| push | - | value | [empty] → value | Pushes a value onto the stack |
| pushv | - | var | [empty] → value | Evaluates the variable and pushes its value onto the stack |
| pusht | - | var | [empty] → <value> | Evaluates the variable and pushes its value onto the stack as a tag |
| pushbl | - | N/A | [empty] → blank | Pushes a blank onto the stack |
| pushsb | - | pos | [empty] → superblank | Pushes the superblank at 'pos' onto the stack |
| pushz | - | N/A | [empty] → [zero_flag] | Pushes the current value of the zero flag onto the stack |
| pushnz | - | N/A | [empty] → [not_zero_flag] | Takes the NOT of the current zero flag and pushes it onto the stack |
| cliptl | - | N/A | pos, regex → value | Pops 'pos' and 'regex', matches 'regex' against the target-language lexical unit at 'pos', and pushes the matched value |
| clipsl | - | N/A | pos, regex → value | Same as cliptl, but on the source-language lexical unit at 'pos' |
| storetl | - | N/A | pos, regex, data → value | Replaces the part matched by 'regex' in the target-language lexical unit at 'pos' with 'data' |
| addtrie | - | address | pattern, pattern, ..., no_of_patterns → [empty] | Pops 'no_of_patterns', then pops that many patterns, combines them into one sequence and adds it to the trie, pointing to 'address' |
| lu | - | num | lemma, tag1, ..., tagn → ^(lexical_unit)$ | Pops 'num' items, creates a lexical unit ^... ...$ from them, and pushes it back onto the stack |
| brace | - | num | lu1, blank1, lu2, blank2, ..., lun → {... ...} | Pops 'num' items, creates the braced sequence {... ... ...}, and pushes it back onto the stack |
| chunk | - | num | chunk_name, tag1, tag2, ..., {^... ...$} → ^chunk_name<tag1>...<tagn>{^... ...$}$ | Pops 'num' items, creates the chunk, and pushes it back onto the stack |
| out | - | num | chunk1, chunk2, ... → [empty] | Pops 'num' items and writes them to standard output |
| cmpi | - | N/A | data1, data2 → [empty] | Pops data1 and data2 and compares them as strings, ignoring case; if they match, sets the zero flag to 1 (a match counts as a zero result) |
| cmp | - | N/A | data1, data2 → [empty] | Same as cmpi, but case-sensitive |
| jmp | - | label | [empty] → [empty] | Jumps to 'label' unconditionally |
| jz | - | label | [empty] → [empty] | Jumps to 'label' if the zero flag is 1 |
| jnz | - | label | [empty] → [empty] | Jumps to 'label' if the zero flag is 0 |
| hlt | - | N/A | | Halts the program |
| return | - | N/A | PC → [empty] | Returns from a subroutine |
| nop | - | N/A | | No operation |
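To make the stack and zero-flag conventions of the table concrete, here is a minimal Python sketch of an interpreter for a handful of the instructions (push, pushz, pushnz, cmp, cmpi, jmp, jz, jnz, hlt, nop). It is not the Apertium implementation: the program representation (a list of (mnemonic, operand) tuples plus a dict mapping labels to instruction indices) and the function name `run` are assumptions made for illustration.

```python
# Minimal sketch of the stack/zero-flag semantics from the instruction table.
# Assumptions (not from the page): a program is a list of (mnemonic, operand)
# tuples, and labels are a dict mapping label -> instruction index.

def run(program, labels):
    stack = []
    zf = 0   # zero flag
    pc = 0   # program counter
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "pushz":
            stack.append(zf)
        elif op == "pushnz":
            stack.append(1 - zf)            # NOT of the zero flag
        elif op in ("cmp", "cmpi"):
            data2 = stack.pop()             # data2 was pushed last
            data1 = stack.pop()
            if op == "cmpi":                # caseless comparison
                data1, data2 = data1.lower(), data2.lower()
            zf = 1 if data1 == data2 else 0 # a successful comparison sets ZF = 1
        elif op == "jmp":
            pc = labels[arg]
        elif op == "jz":
            if zf == 1:
                pc = labels[arg]
        elif op == "jnz":
            if zf == 0:
                pc = labels[arg]
        elif op == "hlt":
            break
        elif op == "nop":
            pass
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack, zf

program = [
    ("push", "Como"),
    ("push", "como"),
    ("cmpi", None),
    ("jnz", "end"),        # not taken: the caseless comparison succeeded
    ("push", "matched"),
    ("nop", None),         # 'end' points here
]
print(run(program, {"end": 5}))  # (['matched'], 1)
```

Because jnz falls through only when ZF = 1, a `cmp`/`cmpi` followed by `jnz skip_label` reads as "run the next block only if the comparison succeeded", which is the pattern used in the examples below.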

Sample compilation of XML code fragments

Example 1

XML t1x Code: chunking

<out>
  <chunk name="det_det_nom_adj" case="caseFirstWord">
    <tags>
      <tag><lit-tag v="SN"/></tag>
      <tag><var n="tipus_det"/></tag>
      <tag><var n="gen_chunk"/></tag>
      <tag><var n="nbr_chunk"/></tag>
    </tags>
    <lu>
      <clip pos="1" side="tl" part="lem"/>
      <clip pos="1" side="tl" part="a_det"/>
      <clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
      <clip pos="1" side="tl" part="gen_mf"/>
      <clip pos="1" side="tl" part="nbr_sense_sp" link-to="4"/>
      <clip pos="1" side="tl" part="nbr_sp"/>
    </lu>
    <b/>
    <lu>
      <lit v="el"/>
      <lit-tag v="det.def"/>
      <clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
      <lit-tag v="pl"/>
    </lu>
    <b pos="1"/>
    <lu>
      <clip pos="3" side="tl" part="lemh"/>
      <clip pos="3" side="tl" part="a_nom"/>
      <clip pos="3" side="tl" part="gen_sense_mf" link-to="3"/>
      <clip pos="3" side="tl" part="gen_mf"/>
      <clip pos="3" side="tl" part="nbr_sense_sp" link-to="4"/>
      <clip pos="3" side="tl" part="nbr_sp"/>
      <clip pos="3" side="tl" part="lemq"/>
    </lu>
    <b/>
    <b pos="2"/>
    <lu>
      <var n="adjectiu1"/>
      <clip pos="2" side="tl" part="lemh"/>
      <clip pos="2" side="tl" part="a_adj"/>
      <clip pos="2" side="tl" part="gen_sense_mf" link-to="3"/>
      <clip pos="2" side="tl" part="gen_mf"/>
      <clip pos="2" side="tl" part="nbr_sense_sp" link-to="4"/>
      <clip pos="2" side="tl" part="nbr_sp" link-to="4"/>
      <clip pos="2" side="tl" part="lemq"/>
    </lu>
  </chunk>
</out>

Compiled Code

push    "det_det_nom_adj"
push    "<SN>"
pusht   tipus_det          ; evaluate the variable, wrap its value in '<>', then push it onto the stack
pusht   gen_chunk
pusht   nbr_chunk

push    1
push    "^\w+"             ; lem
cliptl
push    1
push    [regex]            ; a_det
cliptl
push    "<3>"              ; since link-to overrides everything else, we do not need any dedicated instruction
                           ; for that
push    1
push    [regex]            ; gen_mf
cliptl
push    "<4>"
push    1
push    [regex]            ; nbr_sp
cliptl
lu      6                  ; pop 6 items, concatenate, create a lexical unit ^...$ and push it back

pushbl                     ; push a blank

push    "el"
push    "<det><def>"
push    "<3>"
push    "<pl>"
lu      4                  ; pop 4 items, create a lexical unit ^...$ and push it back

pushsb 1

push   3
push   [regex]             ; lemh
cliptl
push   3
push   [regex]             ; a_nom
cliptl
push   "<3>"
push   3
push   [regex]             ; gen_mf
cliptl
push   "<4>"
push   3
push   [regex]             ; nbr_sp
cliptl
push   3
push   [regex]             ; lemq
cliptl
lu     7

pushbl
pushsb 2

pushv  adjectiu1           ; it's a variable, so evaluate it and push its value
push   2
push   [regex]             ; lemh
cliptl
push   2
push   [regex]             ; a_adj
cliptl
push   "<3>"
push   2
push   [regex]             ; gen_mf
cliptl
push   "<4>"               ; nbr_sense_sp, link-to="4"
push   "<4>"               ; nbr_sp, which also has link-to="4" in the XML
push   2
push   [regex]             ; lemq
cliptl
lu     8                   ; 8 items: the variable plus the 7 parts above

brace  8                   ; 4 lexical units + 4 blanks/superblanks = 8
                           ; pop 8 items, concatenate, wrap in { } and push back

chunk  6                   ; create the chunk ^...{^...$}$ and push it back onto the stack

out    1                   ; give output (number of chunks = 1)
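The lu, brace and chunk instructions in this example share one shape: pop 'num' items, concatenate them in push order, wrap the result, and push it back. Here is a minimal Python sketch under that reading. The helper names and the two-word sample ("el gat", with its tags) are hypothetical and not taken from the page; tags are assumed to arrive already in `<...>` form, as `push "<SN>"` and `pusht` produce them above.

```python
# Hypothetical helpers sketching lu, brace and chunk: each pops 'num' items,
# concatenates them in push order, wraps the result and pushes it back.

def pop_n(stack, n):
    """Pop n items and return them bottom-most first (push order)."""
    if n < 0 or n > len(stack):
        raise IndexError("stack underflow")
    items = stack[len(stack) - n:]
    del stack[len(stack) - n:]
    return items

def lu(stack, n):
    stack.append("^" + "".join(pop_n(stack, n)) + "$")

def brace(stack, n):
    stack.append("{" + "".join(pop_n(stack, n)) + "}")

def chunk(stack, n):
    # chunk name and tags (already in <...> form) followed by the braced body
    stack.append("^" + "".join(pop_n(stack, n)) + "$")

# Hypothetical two-word chunk "el gat"
stack = ["det_nom", "<SN>"]
stack += ["el", "<det><def>", "<m>", "<sg>"]
lu(stack, 4)
stack.append(" ")                       # pushbl
stack += ["gat", "<n>", "<m>", "<sg>"]
lu(stack, 4)
brace(stack, 3)                         # lu, blank, lu
chunk(stack, 3)                         # name, tag, braced body
print(stack[-1])  # ^det_nom<SN>{^el<det><def><m><sg>$ ^gat<n><m><sg>$}$
```

Since each instruction carries its own count, the compiler has to count exactly how many pushes each `<lu>`, `<chunk>` and blank produces; an off-by-one count silently pulls a neighbouring item into the wrong unit.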

Example 2

XML t1x Code

<section-def-cats>
  <def-cat n="nom">
    <cat-item tags="n.*"/>
  </def-cat>
 
  <def-cat n="det">
    <cat-item tags="det.*"/>
    <cat-item tags="predet.*"/>
  </def-cat>
</section-def-cats>

<section-rules>
  <rule>
    <pattern>
      <pattern-item n="det"/>
    </pattern>
    <action/>
  </rule>
  <rule>
    <pattern>
      <pattern-item n="nom"/>
    </pattern>
    <action/>
  </rule>
  <rule>
    <pattern>
      <pattern-item n="det"/>
      <pattern-item n="nom"/>
    </pattern>
    <action/>
  </rule>
</section-rules>

Compiled Code

                         ;first rule: the 'det' def-cat has two cat-items
push      "\w<det>\t"    ;load pattern into stack
push      1              ;number of patterns in the sequence
addtrie   [address1]     ;add the sequence to the trie, pointing to the first rule

push      "\w<predet>\t" ;same with the second cat-item
push      1
addtrie   [address1]
                         ;second rule: a single cat-item, so a single pattern
push      "\w<n>\t"
push      1
addtrie   [address2]
                         ;third rule: 'det' has two cat-items, so two sequences point to [address3]
push      "\w<det>\t"
push      "\w<n>\t"
push      2              ; we have 'det' followed by a 'nom', so addtrie has to pop two elements
addtrie   [address3]

push      "\w<predet>\t"
push      "\w<n>\t"
push      2
addtrie   [address3]
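The addtrie calls above build one trie path per pattern sequence, and several paths (one per combination of cat-items) can end at the same rule address. Here is a minimal Python sketch of such a trie with a longest-match lookup. Assumptions, not from the page: each pattern is a regular expression that must fully match one input lexical unit; the class and method names are hypothetical; and the sample regexes and lexical units are illustrative stand-ins for the page's `"\w<det>\t"`-style patterns.

```python
import re

class Node:
    def __init__(self):
        self.children = {}    # pattern (regex string) -> Node
        self.address = None   # rule address when a pattern sequence ends here

class PatternTrie:
    """Hypothetical sketch of the trie filled by 'addtrie' (not Apertium's code)."""

    def __init__(self):
        self.root = Node()

    def addtrie(self, stack, address):
        n = stack.pop()                      # no_of_patterns is on top
        if n <= 0 or n > len(stack):
            raise ValueError("bad pattern count")
        patterns = stack[-n:]                # patterns in push order
        del stack[-n:]
        node = self.root
        for pattern in patterns:             # one trie level per word
            node = node.children.setdefault(pattern, Node())
        node.address = address

    def longest_match(self, units):
        """Return (address, length) of the longest sequence matching a prefix of units."""
        best = (None, 0)
        frontier = [self.root]
        for i, unit in enumerate(units):
            frontier = [child
                        for node in frontier
                        for pattern, child in node.children.items()
                        if re.fullmatch(pattern, unit)]
            if not frontier:
                break
            for node in frontier:
                if node.address is not None:
                    best = (node.address, i + 1)
                    break
        return best

DET, PREDET, NOM = r"\w+<det>.*", r"\w+<predet>.*", r"\w+<n>.*"  # hypothetical regexes
trie, stack = PatternTrie(), []
for address, patterns in [("address1", [DET]), ("address1", [PREDET]),
                          ("address2", [NOM]),
                          ("address3", [DET, NOM]), ("address3", [PREDET, NOM])]:
    stack.extend(patterns)
    stack.append(len(patterns))
    trie.addtrie(stack, address)

print(trie.longest_match(["el<det><def><m><sg>", "gat<n><m><sg>"]))  # ('address3', 2)
```

Sharing prefixes in the trie means the 'det' and 'det nom' rules are tested in a single left-to-right pass, and the longest complete sequence wins.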

Example 3

XML t1x Code

<def-macro n="f_coma" npar="1">
  <choose>
    <when>
      <test>
        <equal caseless="yes">
          <clip pos="1" side="sl" part="lem"/>
          <lit v="como"/>
        </equal>
      </test>
      <let>
        <clip pos="1" side="tl" part="lem"/>
        <get-case-from pos="1">
          <lit v="com a"/>
        </get-case-from>
      </let>
    </when>
  </choose>
</def-macro>

Compiled code

f_coma:  push      1        ; "pos" of "clip"
         push      "^\w+"   ; "lem"
         clipsl             ; pops pos and regex, pushes the clipped value;
                            ; the "sl" side is implied by the instruction name
         push      "como"
         cmpi               ; caseless comparison; pops both operands
         jnz       end      ; if the comparison does not succeed, go to end
                            ; mnemonic: j = jump, n = not, z = zero flag set
                            ; the zero flag is set when a comparison succeeds
                            ; or an arithmetical operation gives 0
         push      1        ; "pos" of "clip"
         push      "^\w+"   ; "lem"
         push      "com a"
         storetl            ; store the value on top of the stack in the "lem"
                            ; part of position 1 on the "tl" side

end:     ...
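The compiled macro can be traced with a minimal Python sketch of clipsl and storetl. Assumptions, not from the page: lexical units are plain strings such as `"como<cnjadv>"` (a hypothetical tag), 'pos' is 1-based as in the t1x, `clip` returns an empty string when the regex does not match, and the function names are hypothetical. The `get-case-from` part of the XML is not modelled, and neither is the value the instruction table shows storetl leaving on the stack.

```python
import re

# Hypothetical helpers sketching clipsl/cliptl and storetl.
# Lexical units are plain strings; 'pos' is 1-based as in the t1x.

def clip(stack, units):
    regex = stack.pop()
    pos = stack.pop()
    m = re.search(regex, units[pos - 1])
    stack.append(m.group(0) if m else "")

def storetl(stack, tl_units):
    data = stack.pop()
    regex = stack.pop()
    pos = stack.pop()
    # Replace the first match; a function replacement keeps 'data' literal.
    # Not modelled: any value storetl may leave on the stack.
    tl_units[pos - 1] = re.sub(regex, lambda _m: data, tl_units[pos - 1], count=1)

def f_coma(sl_units, tl_units):
    """Hand translation of the compiled f_coma macro (ignores get-case-from)."""
    stack = [1, r"^\w+"]
    clip(stack, sl_units)                  # clipsl: source-language lemma of word 1
    stack.append("como")
    data2, data1 = stack.pop(), stack.pop()
    zf = 1 if data1.lower() == data2.lower() else 0   # cmpi
    if zf == 0:                            # jnz end
        return tl_units
    stack += [1, r"^\w+", "com a"]
    storetl(stack, tl_units)               # TL lemma of word 1 becomes "com a"
    return tl_units

print(f_coma(["Como<cnjadv>"], ["com<cnjadv>"]))  # ['com a<cnjadv>']
```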

Example 4

XML t1x Code

<test>
 <or>
   <not>
     <equal>
       <clip pos="1" side="sl" part="gen"/>
       <clip pos="3" side="sl" part="gen"/>
     </equal>
   </not>
   <not>
     <equal>
       <clip pos="2" side="sl" part="gen"/>
       <clip pos="3" side="sl" part="gen"/>
     </equal>
   </not>
 </or>
</test>

Compiled code

start:      push       1
            push       [regex]           ; part="gen"
            clipsl
            push       3
            push       [regex]           ; part="gen"
            clipsl
            cmp                          ; compare (case sensitive)
            pushnz                       ; push the NOT of the zero flag
            
            push       2
            push       [regex]           ; part="gen"
            clipsl
            push       3
            push       [regex]           ; part="gen"
            clipsl
            cmp                          ; compare (case sensitive)
            pushnz
            
            or                           ; pop 2 items, OR them, push the result
            jz         end               ; jump if the zero flag is 1 (the OR gave 0, so the test failed)
                        
            ... ... ...
            (code for successful test)
            ... ... ...
end:        ...
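The success code has to run exactly when the OR gives 1, so the branch to `end` must fire when the OR gives 0. Here is a minimal Python sketch of that data flow. Assumption (not in the instruction table, which does not define `or`): following the rule stated in Example 3, `or` behaves like an arithmetical operation and sets ZF = 1 when its result is 0. The function names and the sample gender values are hypothetical.

```python
# Sketch of Example 4's control flow.
# Assumption: 'or' sets ZF = 1 when its result is 0, like an arithmetical op.

def cmp_zf(a, b):
    """Case-sensitive comparison: a match sets the zero flag to 1."""
    return 1 if a == b else 0

def example4(gen1, gen2, gen3):
    stack = []
    stack.append(1 - cmp_zf(gen1, gen3))  # clipsl x2, cmp, pushnz
    stack.append(1 - cmp_zf(gen2, gen3))  # clipsl x2, cmp, pushnz
    result = stack.pop() | stack.pop()    # or: pop 2 items and OR them
    stack.append(result)                  # ... then push the result
    zf = 1 if result == 0 else 0          # assumed flag update
    return result, zf

print(example4("<m>", "<m>", "<m>"))  # (0, 1): all genders agree, test fails
print(example4("<f>", "<m>", "<m>"))  # (1, 0): gen1 differs, test succeeds
```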

Example 5

XML t1x Code

<def-list n="verbos_est">
  <list-item v="actuar"/>
  <list-item v="buscar"/>
  <list-item v="estudiar"/>
  <list-item v="existir"/>
  <list-item v="ingressar"/>
  <list-item v="introduir"/>
  <list-item v="penetrar"/>
  <list-item v="publicar"/>
  <list-item v="treballar"/>
  <list-item v="viure"/>
</def-list>

<rule>
  <pattern>
    <pattern-item n="verb"/>
    <pattern-item n="a"/>
  </pattern>
  <action>
    <choose>
      <when>
        <test>
          <in caseless="yes">
            <clip pos="1" side="sl" part="lem"/>
            <list n="verbos_est"/>
          </in>
        </test>
        <let>
          <clip pos="2" side="tl" part="lem"/>
          <lit v="en"/>
        </let>
      </when>
    </choose>
  </action>
</rule>

Compiled code

                push       "actuar"
                push       "buscar"
                push       "estudiar"
                push       "existir"
                push       "ingressar"
                push       "introduir"
                push       "penetrar"
                push       "publicar"
                push       "treballar"
                push       "viure"
                push       10                ; number of elements in the list
                mklist     verbos_est        ; create a list variable named 'verbos_est' from the
                                             ; 10 items below the count on the stack
                                         
rule1:          push      [regex_verb]
                push      [regex_a]
                push      2
                addtrie   rule1_action
                ... ... ...
                ... ... ...
                         
rule1_action:   push      1
                push      "^\w+"            ; lem
                clipsl                      ; the lemma is on the stack now
                incin     verbos_est        ; if in verbos_est (caseless), set ZF = 1, else ZF = 0
                jnz       rule1_end

                push      2
                push      "^\w+"
                push      "en"
                storetl
rule1_end:      ...
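mklist and incin are used here but are not in the instruction table, so this Python sketch rests on assumptions read off the comments above: mklist pops a count and then that many items into a named list, and incin pops a value and returns the zero flag for a caseless membership test (from `<in caseless="yes">`). The `variables` dict and the function names are hypothetical.

```python
# Hypothetical sketch of mklist and incin (neither is in the instruction table).
# Assumption: incin is a caseless membership test returning the zero flag.

def mklist(stack, variables, name):
    n = stack.pop()                          # element count is on top
    if n < 0 or n > len(stack):
        raise ValueError("bad element count")
    items = stack[len(stack) - n:]           # the n items below the count
    del stack[len(stack) - n:]
    variables[name] = items

def incin(stack, variables, name):
    value = stack.pop()                      # the clipped lemma
    folded = {item.casefold() for item in variables[name]}
    return 1 if value.casefold() in folded else 0

variables = {}
stack = ["actuar", "buscar", "estudiar", "existir", "ingressar",
         "introduir", "penetrar", "publicar", "treballar", "viure", 10]
mklist(stack, variables, "verbos_est")
stack.append("Estudiar")
print(incin(stack, variables, "verbos_est"))  # 1, so jnz falls through to storetl
```

Folding the list once per lookup keeps the sketch short; a real VM would more likely fold the items once when mklist builds the list.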

Development Notes

  • None of the macros or actions need to return a value (unlike conventional functions), so no provision for returning values on the stack is needed.