Difference between revisions of "VM for transfer"

From Apertium
(Link to French page)
 
(51 intermediate revisions by 7 users not shown)
Line 1: Line 1:
[[Machine virtuelle pour le transfert|En français]]
== Instruction Sets ==

== Instruction Set ==


{| class="wikitable" border="1"
{| class="wikitable" border="1"
! Mnemonic !! Opcode<br>''(in hex)'' !! Other operands !! Stack<br>[before]&rarr;[after] !! Description
! Mnemonic !! Opcode<br>''(in hex)'' !! Other operands !! Stack<br>[before]&rarr;[after] (top, top<sub>-1</sub>, ...) !! Description
|-
|-
| push || - || value || [empty] &rarr; value || Pushes a value in stack
| push || - || value || [empty] &rarr; value || Pushes a string or a variable's value onto the stack. Strings are written between quotes ("string"); variable names are not.
|-
|-
| pushv || - || var || [empty] &rarr; value || Evaluates the var and pushes its value in stack
| pushbl || - || N/A || [empty] &rarr; blank || Pushes a blank onto the stack
|-
|-
| pusht || - || var || [empty] &rarr; <value> || Evaluates the var and pushes its value as a tag in stack
| pushsb || - || pos || [empty] &rarr; superblank || Pushes the superblank at 'pos' onto the stack
|-
|-
| pushbl || - || N/A || [empty] &rarr; blank || pushes a blank in the stack
| append || - || N || value<sub>N</sub>, ..., value<sub>1</sub>, varName &rarr; [empty] || Pops 'N' elements and appends them to a variable or clip
|-
|-
| concat || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; value<sub>1</sub>...value<sub>N</sub> || Pops 'N' elements and pushes them back concatenated
| pushsb || - || pos || [empty] &rarr; superblank || pushes the superblank at 'pos' in stack
|-
|-
| pushz || - || N/A || [empty] &rarr; [zero_flag] || pushes the current value of zero_flag in stack
| clip || - || N/A || part &rarr; value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack
|-
|-
| pushnz || - || N/A || [empty] &rarr; [not_zero_flag] || first takes the NOT of the current value of zero_flag, then pushes the value in stack
| clipsl || - || link-to || part, pos &rarr; value || Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operand is used for clips with link-to tags, e.g. "clipsl <3>".
|-
|-
| cliptl || - || N/A || pos, regex &rarr; value || Matches 'regex' in target language 'pos' and pushes the value in stack
| cliptl || - || link-to || part, pos &rarr; value || Obtains the 'part' in target language in position 'pos' and pushes the 'value' onto the stack. An optional operand is used for clips with link-to tags, e.g. "cliptl <3>".
|-
|-
| clipsl || - || N/A || pos, regex &rarr; value || Matches 'regex' in source language 'pos' and pushes the value in stack
| storecl || - || N/A || value, part &rarr; [empty] || Stores 'value' in the only language there is (inter/post-chunk)
|-
|-
| storetl || - || N/A || pos, regex, data &rarr; value || Replace 'regex' in source language 'pos' with 'data'
| storesl || - || N/A || value, part, pos &rarr; [empty] || Stores 'value' as the 'part' of the source language in position 'pos'
|-
|-
| addtrie || - || address || pattern, pattern, ..., no_of_patterns &rarr; [empty] || Pops 'no_of_pattern' amount of data from the stack, combine these patterns, add that to the trie pointing to given 'address'
| storetl || - || N/A || value, part, pos &rarr; [empty] || Stores 'value' as the 'part' of the target language in position 'pos'
|-
|-
| lu || - || num || lemma, tag1, ..., tagn &rarr; ^(lexical_unit)$ || Pops 'num' amount of data from the stack and creates a lexical unit ^... ...$ with them, pushes the lu back in the stack
| storev || - || N/A || value, varName &rarr; [empty] || Stores 'value' in the variable with name 'varName'
|-
|-
| brace || - || num || lu1, blank1, lu2, blank2, ..., lun &rarr; {... ...} || Pops 'num' amount of data from the stack and creates the braced version {... ... ...}, pushes it back in the stack
| addtrie || - || address || N, pattern<sub>N</sub>, ..., pattern<sub>1</sub> &rarr; [empty] || Pops 'N' patterns and creates a trie entry pointing to 'address'
|-
|-
| chunk || - || num || chunk_name, tag1, tag2, ... , {^... ...$} &rarr; ^chunk_name<tag1>...<tagn>{^... ...$}$ || Pops 'num' amount of data from the stack and creates the chunk, pushes back in the stack
| lu || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; ^(lexical_unit)$ || Pops 'N' values from the stack, creates a lexical unit ^...$ with them and pushes the lu back onto the stack
|-
|-
| out || - || num || chunk1, chunk2, ... &rarr; [empty] || Pops 'num' items from the stack and writes them to standard output
| mlu || - || N || lu<sub>N</sub>, ..., lu<sub>1</sub> &rarr; multiword || Pops 'N' lu from the stack, creates a multiword with them and pushes the multiword back onto the stack
|-
|-
| cmpi || - || N/A || data1, data2 &rarr; [empty] || Pops data1 and data2, string compares them (ignorecase), if matches (successful), set zero flag to 1 (it means we have a zero)
| lu-count || - || N/A || [empty] &rarr; number || Pushes the number of lexical units (words inside the chunk) in the rule onto the stack
|-
|-
| chunk || - || N || elem<sub>N-2</sub>, ... , elem<sub>1</sub>, <tags>, name &rarr; ^name<tags>{elem<sub>1</sub>...elem<sub>N-2</sub>}$ || Pops 'N' amount of data from the stack, creates the chunk and pushes it back onto the stack
| cmp || - || N/A || data1, data2 &rarr; [empty] || Pops data1 and data2, string compares them (case sensitive), if matches (successful), set zero flag to 1
|-
|-
| jmp || - || label || [empty] &rarr; [empty] || Jumps to the label (unconditional jump)
| out || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; [empty] || Pops 'N' values from the stack and outputs them
|-
|-
| cmp || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Pops 'value<sub>1</sub>' and 'value<sub>2</sub>', compares them, if they are equal pushes a 1 (true), if they aren't pushes a 0 (false)
| jz || - || label || [empty] &rarr; [empty] || Jumps to the label if zero flag is 1
|-
|-
| cmpi || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Pops 'value<sub>1</sub>' and 'value<sub>2</sub>', compares them (ignoring case for each string), if they are equal pushes a 1 (true), if they aren't pushes a 0 (false)
| jnz || - || label || [empty] &rarr; [empty] || Jumps to the label if zero flag is 0 (non zero)
|-
|-
| cmp-substr || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Tests if 'value<sub>1</sub>' contains the substring 'value<sub>2</sub>', result can be 1 (true) or 0 (false).
| hlt || - || N/A || || Halts the program
|-
|-
| cmpi-substr || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Tests if 'value<sub>1</sub>' contains the substring 'value<sub>2</sub>' (ignoring case for each string), result can be 1 (true) or 0 (false).
| return || - || N/A || PC &rarr; [empty] || Returns from a subroutine
|-
|-
| nop || - || N/A || [empty] &rarr; [empty] || No operation
| not || - || N || value &rarr; result || Negates the value on top of the stack, 0 -> 1 or 1 -> 0
|-
| and || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; result || And operation of 'N' values, result can be 1 (true) or 0 (false)
|-
| or || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; result || Or operation of 'N' values, result can be 1 (true) or 0 (false)
|-
| in || - || N/A || list, value &rarr; result || Performs a search of a 'value' in a 'list'
|-
| inig || - || N/A || list, value &rarr; result || Performs a search (ignoring case) of a 'value' in a 'list'
|-
| jmp || - || label || [empty] &rarr; [empty] || Jumps to the label, unconditionally
|-
| jz || - || label || top &rarr; [empty] || Jumps to the label if stack.top == 0
|-
| jnz || - || label || top &rarr; [empty] || Jumps to the label if stack.top == 1
|-
| call || - || label || N, arg<sub>N</sub>, ..., arg<sub>1</sub> &rarr; [empty] || Calls a macro with the arguments on the stack
|-
| ret || - || N/A || [empty] &rarr; [empty] || Returns from a macro, PC will be handled automatically by the VM.
|-
| nop || - || N/A || [empty] &rarr; [empty] || No operation
|-
| case-of || - || N/A || container &rarr; case || Gets the case from the container in the stack. The container would usually be the result of a clip instruction but can be any string.
|-
| get-case-from || - || N/A || pos &rarr; case || Gets the case from the lexical unit in position 'pos'
|-
| modify-case || - || N/A || case, container &rarr; modifiedContainer || Modifies the case of the 'container' to 'case' and leaves the modified container on the stack
|-
| begins-with || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Checks if 'value<sub>1</sub>' begins with 'value<sub>2</sub>' and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list
|-
| begins-with-ig || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Checks if 'value<sub>1</sub>' begins with 'value<sub>2</sub>' (ignoring the case) and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list
|-
| ends-with || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Checks if 'value<sub>1</sub>' ends with 'value<sub>2</sub>' and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list
|-
| ends-with-ig || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Checks if 'value<sub>1</sub>' ends with 'value<sub>2</sub>' (ignoring the case) and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list
|}
|}


* Lists are represented as a concatenation of items separated by '|', e.g. uno|otro|poco|cuánto|menos|mucho|tanto|demasiado
== Sample compilation of XML code fragments ==
* The case is represented as "aa" (all lowercase), "Aa" (first letter uppercase) or "AA" (all uppercase).

== Code generation ==

=== Code sections ===

The code generated by the compiler is divided into several sections. In addition, the VM reads and stores the code in its own sections.
{| class="wikitable" border="1"
! Section !! Code !! VM's section !! Information
|-
| align=center | Header || <code>#<assembly> <br /> #<transfer default="chunk"></code> || align=center | ---- || This section establishes the type of code generated and the transfer stage.
|-
| align=center | Initialisation || <code>push "genere"<br /> push "<m>"<br /> storev <br /> ... <br /> jmp section_rules_start</code> || align=center | Code section || This section initialises the variables with their default values and runs any other initialisation code. <br /> It ends with a jmp to the rules section: rules only execute when a pattern is matched,<br /> but all the patterns in the rules section have to be processed first.
|-
| align=center | Macros || <code>macro_firstWord_start: <br /> ... <br /> macro_firstWord_end: <br />...</code> || align=center | Macros code section || This section contains the code of every macro, delimited by labels.<br /> Each macro can be called with the 'call' instruction.
|-
| align=center | Patterns || <code>section_rules_start:<br /> patterns_start:<br /> push "all<predet><sp>"<br /> push "<n><pl>"<br /> push 2<br /> addtrie action_0_start<br /> ... <br />patterns_end:</code> || align=center | Preprocess code section || In this section all the patterns will be added to the system trie. <br />In this example you can see that two patterns are pushed, then the number of patterns is pushed and finally<br /> the addtrie instruction pops them and adds an entry in the trie to the rule 0.
|-
| align=center | Rules || <code>action_0_start:<br /> ...<br /> action_0_end:<br /> ...<br /> section_rules_end:</code> || align=center | Rules code section || Finally the rules section contains every rule delimited by its labels and all its code.
|}

*Single-line comments start with the '#' symbol at the beginning of the line.

=== Code examples ===

==== Macro example ====

===== Transfer file (.t1x) =====


=== Example 1 ===
==== XML t1x Code: chunking ====
<code>
<code>

<out>
<source lang="xml">
<chunk name="det_det_nom_adj" case="caseFirstWord">

<tags>
<transfer default="chunk">
<tag><lit-tag v="SN"/></tag>

<tag><var n="tipus_det"/></tag>
<section-def-attrs>
<tag><var n="gen_chunk"/></tag>
<tag><var n="nbr_chunk"/></tag>
<def-attr n="nbr">
</tags>
<attr-item tags="sg"/>
<lu>
<attr-item tags="pl"/>
<clip pos="1" side="tl" part="lem"/>
<attr-item tags="sp"/>
<clip pos="1" side="tl" part="a_det"/>
<attr-item tags="ND"/>
</def-attr>
<clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
</section-def-attrs>
<clip pos="1" side="tl" part="gen_mf"/>

<clip pos="1" side="tl" part="nbr_sense_sp" link-to="4"/>
<section-def-vars>
<clip pos="1" side="tl" part="nbr_sp"/>
<def-var n="nombre" v="&amp;lt;sg&amp;gt;"/>
</lu>
<def-var n="genere" v="&amp;lt;m&amp;gt;"/>
<b/>
</section-def-vars>
<lu>

<lit v="el"/>
<lit-tag v="det.def"/>
<section-def-macros>
<def-macro n="nombre_nom" npar="1">
<clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/>
<lit-tag v="pl"/>
<let>
</lu>
<var n="nombre"/>
<b pos="1"/>
<lit v=""/>
<lu>
</let>
<choose>
<clip pos="3" side="tl" part="lemh"/>
<when>
<clip pos="3" side="tl" part="a_nom"/>
<test>
<clip pos="3" side="tl" part="gen_sense_mf" link-to="3"/>
<clip pos="3" side="tl" part="gen_mf"/>
<and>
<equal>
<clip pos="3" side="tl" part="nbr_sense_sp" link-to="4"/>
<clip pos="3" side="tl" part="nbr_sp"/>
<clip pos="1" side="sl" part="nbr"/>
<clip pos="3" side="tl" part="lemq"/>
<lit-tag v="sg"/>
</lu>
</equal>
<b/>
<equal>
<b pos="2"/>
<clip pos="1" side="tl" part="nbr"/>
<lu>
<lit-tag v="pl"/>
<var n="adjectiu1"/>
</equal>
<clip pos="2" side="tl" part="lemh"/>
</and>
<clip pos="2" side="tl" part="a_adj"/>
</test>
<let>
<clip pos="2" side="tl" part="gen_sense_mf" link-to="3"/>
<clip pos="2" side="tl" part="gen_mf"/>
<var n="nombre"/>
<clip pos="2" side="tl" part="nbr_sense_sp" link-to="4"/>
<lit-tag v="pl_slsg"/>
</let>
<clip pos="2" side="tl" part="nbr_sp" link-to="4"/>
<clip pos="2" side="tl" part="lemq"/>
</when>
</lu>
<when>
</chunk>
<test>
<and>
</out>
<equal>
<clip pos="1" side="sl" part="nbr"/>
<lit-tag v="pl"/>
</equal>
<equal>
<clip pos="1" side="tl" part="nbr"/>
<lit-tag v="sg"/>
</equal>
</and>
</test>
<let>
<var n="nombre"/>
<lit-tag v="sg_slpl"/>
</let>
</when>
<otherwise>
<let>
<var n="nombre"/>
<clip pos="1" side="tl" part="nbr"/>
</let>
</otherwise>
</choose>
</def-macro>
</section-def-macros>

</transfer>

</source>

</code>
</code>


==== Compiled Code ====
===== Code generated =====


<source lang="bash">
<code>
#<assembly>
push "det_det_nom_adj"
#<transfer default="chunk">
push "<SN>"
#<def-var v="&lt;sg&gt;" n="nombre">
pusht tipus_det ; first evaluate the variable, append/prepend '<>', then push in the stack
push "nombre"
pusht gen_chunk
push "<sg>"
pusht nbr_chunk
storev
#<def-var v="&lt;m&gt;" n="genere">
push 1
push "^\w+" ; lem
push "genere"
push "<m>"
storev
jmp section_rules_start
#<def-macro npar="1" n="nombre_nom">
macro_nombre_nom_start:
#<var n="nombre">
push "nombre"
#<lit v="">
push ""
storev
#<clip part="nbr" pos="1" side="sl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
clipsl
#<lit-tag v="sg">
push "<sg>"
cmp
#<clip part="nbr" pos="1" side="tl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
cliptl
cliptl
#<lit-tag v="pl">
push 1
push "<pl>"
push [regex] ; a_det
cmp
and 2
jz when_0_end
#<var n="nombre">
push "nombre"
#<lit-tag v="pl_slsg">
push "<pl_slsg>"
storev
jmp choose_0_end
when_0_end:
#<clip part="nbr" pos="1" side="sl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
clipsl
#<lit-tag v="pl">
push "<pl>"
cmp
#<clip part="nbr" pos="1" side="tl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
cliptl
cliptl
#<lit-tag v="sg">
push "<3>" ; since link-to overrides everything else, we do not need any dedicated instruction
push "<sg>"
; for that
cmp
push 1
and 2
push [regex] ; gen_mf
jz when_1_end
#<var n="nombre">
push "nombre"
#<lit-tag v="sg_slpl">
push "<sg_slpl>"
storev
jmp choose_0_end
when_1_end:
#<otherwise>
#<var n="nombre">
push "nombre"
#<clip part="nbr" pos="1" side="tl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
cliptl
cliptl
storev
push "<4>"
choose_0_end:
push 1
macro_nombre_nom_end: ret
push [regex] ; nbr_sp
#<section-rules>
cliptl
section_rules_start:
lu 6 ; pop 6 items, concat, create lexical unit ^...$ and push back in stack
section_rules_end:
</source>
pushbl ; push a blank
push "el"
push "<det><def>"
push "<3>"
push "<pl>"
lu 4 ; pop 4 items from the stack, create a lexical unit ^...$ and then
; push in the stack
pushsb 1
push 3
push [regex] ; lemh
cliptl
push 3
push [regex] ; a_nom
cliptl
push "<3>"
push 3
push [regex] ; gen_mf
cliptl
push "<4>"
push 3
push [regex] ; nbr_sp
cliptl
push 3
push [regex] ; lemq
cliptl
lu 7
pushbl
pushsb 2
pushv adjectiu1 ; it's a var, so eval and push the value
push 3
push [regex] ; lemh
cliptl
push 3
push [regex] ; a_adj
cliptl
push "<3>"
push 3
push [regex] ; gen_mf
cliptl
push "<4>"
push "<4>" ; a bit confused, there are two link-to in the XML
push 3
push [regex] ; lemq
cliptl
lu 7
brace 7 ; no of blank + lexical unit = 7
; pop 7 items, concat, prepend and append {, } then push back
chunk 6 ; create the chunk, ^...{^...$}$, and push back in stack
out 1 ; give output (number of chunks = 1)
</code>


----
=== Example 2 ===


==== XML t1x Code ====
==== Rule's patterns example ====
<code>
<section-def-cats>
<def-cat n="nom">
<cat-item tags="n.*"/>
</def-cat>
<def-cat n="det">
<cat-item tags="det.*"/>
<cat-item tags="predet.*"/>
</def-cat>
</section-def-cats>
<section-rules>
<rule>
<pattern>
<pattern-item n="det"/>
</pattern>
</rule>
<rule>
<pattern>
<pattern-item n="nom"/>
</pattern>
<action/>
</rule>
<rule>
<pattern>
<pattern-item n="det"/>
<pattern-item n="nom"/>
</pattern>
<action/>
</rule>
</section-rules>
</code>


==== Compiled Code ====
===== Transfer file (.t1x) =====
<source lang="xml">
<transfer default="chunk">


<section-def-cats>
<code>
<def-cat n="all">
;first rule: def-cat has two equivalent cat-items
<cat-item lemma="all" tags="predet.sp"/>
push "\w<det>\t" ;load pattern into stack
push 1
</def-cat>
addtrie [address1] ;define a trie pattern with value 1 (the first rule)
push "\w<predet>\t" ;same with the second cat-item
push 1
addtrie [address1]
;second rule (and so on) very simple, unique cat-item
push "\w<n>\t"
push 1
addtrie [address2]
;third rule (here is the trick: multiple cat-items in one of the words)
push "\w<det>\t"
push "\w<n>\t"
push 2 ; we have 'det' followed by a 'nom', so addtrie has to pop two elements
addtrie [address3]
push "\w<predet>\t"
push "\w<n>\t"
push 2
addtrie [address3]
</code>


<def-cat n="adj2">
=== Example 3 ===
<cat-item tags="adj"/>
==== XML t1x Code ====
<cat-item tags="adj.*"/>
<cat-item tags="adj.sint"/>
<cat-item tags="adj.sint.*"/>
<cat-item tags="adj.comp"/>
<cat-item tags="adj.sup"/>
</def-cat>


<def-cat n="nomcomu">
<code>
<cat-item tags="n.*"/>
<def-macro n="f_coma" npar="1">
<choose>
</def-cat>
<when>
<test>
<def-cat n="nompropi">
<equal caseless="yes">
<cat-item tags="np.*"/>
</def-cat>
<clip pos="1" side="sl" part="lem"/>
<lit v="como"/>
</equal>
</test>
<let>
<clip pos="1" side="tl" part="lem"/>
<get-case-from pos="1">
<lit v="com a"/>
</get-case-from>
</let>
</when>
</choose>
</def-macro>
</code>


<def-cat n="nploc">
==== Compiled code ====
<cat-item tags="np.loc.*"/>
</def-cat>
</section-def-cats>


<rule>
<code>
<pattern>
f_coma: push 1 ; "pos" of "clip"
<pattern-item n="all"/>
push "^\w+" ; "lem"
<pattern-item n="adj2"/>
clipsl ; gets the value clips on the top of the stack.
</pattern>
; "sl" side is implied in the name of the instruction
push "como"
<action>
...
cmpi ; does the comparison and cleans the stack, it means caseless
</action>
jnz end ; if the comparison does not succeed, go to end
</rule>
; semantics: j = jump n = not z = zero flag is activated
; zero flag is activated when a comparison succeeds
; or an arithmetical operation gives 0
push 1 ; "pos" of "clip"
push "^\w+"
push "com a"
storetl ; store the value provided in the top of the stack
; given position 1, "tl" side and "lem"
end: ...
</code>


<rule>
=== Example 4 ===
<pattern>
==== XML t1x Code ====
<pattern-item n="nomcomu"/>
<code>
<pattern-item n="nompropi"/>
<test>
<pattern-item n="nploc"/>
<or>
<not>
</pattern>
<equal>
<action>
...
<clip pos="1" side="sl" part="gen"/>
</action>
<clip pos="3" side="sl" part="gen"/>
</equal>
</rule>
</not>
</section-rules>
<not>
<equal>
<clip pos="2" side="sl" part="gen"/>
<clip pos="3" side="sl" part="gen"/>
</equal>
</not>
</or>
</test>
</code>
==== Compiled code ====


</transfer>
<code>

start: push 1
</source>
push [regex] ; part="gen"

clipsl
===== Code generated =====
push 3

push [regex] ; part="gen"
<source lang="bash">
clipsl
#<assembly>
cmp ; compare (case sensitive)
#<transfer default="chunk">
pushnz ; NOT zero flag and push in stack
jmp section_rules_start
#<section-rules>
push 2
section_rules_start:
push [regex] ; part="gen"
patterns_start:
clipsl
push "all<predet><sp>"
push 3
push "<adj>|<adj><*>|<adj><sint>|<adj><sint><*>|<adj><comp>|<adj><sup>"
push [regex] ; part="gen"
push 2
clipsl
addtrie action_0_start
cmp ; compare (case sensitive)
push "<n><*>"
pushnz
push "<np><*>"
push "<np><loc><*>"
or ; pop 2 items and OR, push result in stack
push 3
jnz end ; jump if zero flag is 0 (we did not get ZERO as the result)
addtrie action_1_start
patterns_end:
... ... ...
action_0_start:
(code for successful test)
...
... ... ...
action_0_end:
end: ...
action_1_start:
</code>
...
action_1_end:
section_rules_end:
</source>

----


==Wishlist==

* Make the compiler's command-line options work like those of lt-comp/cg-comp, so that -i/-o are not needed to name the input and output files: <code>apertium-compiler [options] <infile> <outfile></code>

==External links==


* [https://github.com/ggm/vm-for-transfer-cpp Github: vm-for-transfer-cpp]
== Development Notes ==


[[Category:Development]]
* None of the macro and actions need to return anything (unlike conventional functions), so provision for returning a value (using stack) is unnecessary
[[Category:Documentation in English]]
[[Category:Transfer]]

Latest revision as of 13:58, 7 October 2014

En français

Instruction Set

{| class="wikitable" border="1"
! Mnemonic !! Opcode<br>''(in hex)'' !! Other operands !! Stack<br>[before]&rarr;[after] (top, top<sub>-1</sub>, ...) !! Description
|-
| push || - || value || [empty] &rarr; value || Pushes a string or a variable's value onto the stack. Strings are written between quotes ("string"); variable names are not.
|-
| pushbl || - || N/A || [empty] &rarr; blank || Pushes a blank onto the stack
|-
| pushsb || - || pos || [empty] &rarr; superblank || Pushes the superblank at 'pos' onto the stack
|-
| append || - || N || value<sub>N</sub>, ..., value<sub>1</sub>, varName &rarr; [empty] || Pops 'N' elements and appends them to a variable or clip
|-
| concat || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; value<sub>1</sub>...value<sub>N</sub> || Pops 'N' elements and pushes them back concatenated
|-
| clip || - || N/A || part &rarr; value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack
|-
| clipsl || - || link-to || part, pos &rarr; value || Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operand is used for clips with link-to tags, e.g. "clipsl <3>".
|-
| cliptl || - || link-to || part, pos &rarr; value || Obtains the 'part' in target language in position 'pos' and pushes the 'value' onto the stack. An optional operand is used for clips with link-to tags, e.g. "cliptl <3>".
|-
| storecl || - || N/A || value, part &rarr; [empty] || Stores 'value' in the only language there is (inter/post-chunk)
|-
| storesl || - || N/A || value, part, pos &rarr; [empty] || Stores 'value' as the 'part' of the source language in position 'pos'
|-
| storetl || - || N/A || value, part, pos &rarr; [empty] || Stores 'value' as the 'part' of the target language in position 'pos'
|-
| storev || - || N/A || value, varName &rarr; [empty] || Stores 'value' in the variable with name 'varName'
|-
| addtrie || - || address || N, pattern<sub>N</sub>, ..., pattern<sub>1</sub> &rarr; [empty] || Pops 'N' patterns and creates a trie entry pointing to 'address'
|-
| lu || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; ^(lexical_unit)$ || Pops 'N' values from the stack, creates a lexical unit ^...$ with them and pushes the lu back onto the stack
|-
| mlu || - || N || lu<sub>N</sub>, ..., lu<sub>1</sub> &rarr; multiword || Pops 'N' lexical units from the stack, creates a multiword with them and pushes the multiword back onto the stack
|-
| lu-count || - || N/A || [empty] &rarr; number || Pushes the number of lexical units (words inside the chunk) in the rule onto the stack
|-
| chunk || - || N || elem<sub>N-2</sub>, ... , elem<sub>1</sub>, <tags>, name &rarr; ^name<tags>{elem<sub>1</sub>...elem<sub>N-2</sub>}$ || Pops 'N' elements from the stack, creates the chunk and pushes it back onto the stack
|-
| out || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; [empty] || Pops 'N' values from the stack and outputs them
|-
| cmp || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Pops 'value<sub>1</sub>' and 'value<sub>2</sub>' and compares them; pushes 1 (true) if they are equal, 0 (false) if they are not
|-
| cmpi || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Pops 'value<sub>1</sub>' and 'value<sub>2</sub>' and compares them (ignoring case for each string); pushes 1 (true) if they are equal, 0 (false) if they are not
|-
| cmp-substr || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Tests if 'value<sub>1</sub>' contains the substring 'value<sub>2</sub>', result can be 1 (true) or 0 (false).
|-
| cmpi-substr || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Tests if 'value<sub>1</sub>' contains the substring 'value<sub>2</sub>' (ignoring case for each string), result can be 1 (true) or 0 (false).
|-
| not || - || N || value &rarr; result || Negates the value on top of the stack, 0 -> 1 or 1 -> 0
|-
| and || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; result || And operation of 'N' values, result can be 1 (true) or 0 (false)
|-
| or || - || N || value<sub>N</sub>, ..., value<sub>1</sub> &rarr; result || Or operation of 'N' values, result can be 1 (true) or 0 (false)
|-
| in || - || N/A || list, value &rarr; result || Performs a search of a 'value' in a 'list'
|-
| inig || - || N/A || list, value &rarr; result || Performs a search (ignoring case) of a 'value' in a 'list'
|-
| jmp || - || label || [empty] &rarr; [empty] || Jumps to the label, unconditionally
|-
| jz || - || label || top &rarr; [empty] || Jumps to the label if stack.top == 0
|-
| jnz || - || label || top &rarr; [empty] || Jumps to the label if stack.top == 1
|-
| call || - || label || N, arg<sub>N</sub>, ..., arg<sub>1</sub> &rarr; [empty] || Calls a macro with the arguments on the stack
|-
| ret || - || N/A || [empty] &rarr; [empty] || Returns from a macro, PC will be handled automatically by the VM.
|-
| nop || - || N/A || [empty] &rarr; [empty] || No operation
|-
| case-of || - || N/A || container &rarr; case || Gets the case from the container in the stack. The container would usually be the result of a clip instruction but can be any string.
|-
| get-case-from || - || N/A || pos &rarr; case || Gets the case from the lexical unit in position 'pos'
|-
| modify-case || - || N/A || case, container &rarr; modifiedContainer || Modifies the case of the 'container' to 'case' and leaves the modified container on the stack
|-
| begins-with || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Checks if 'value<sub>1</sub>' begins with 'value<sub>2</sub>' and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list
|-
| begins-with-ig || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Checks if 'value<sub>1</sub>' begins with 'value<sub>2</sub>' (ignoring the case) and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list
|-
| ends-with || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Checks if 'value<sub>1</sub>' ends with 'value<sub>2</sub>' and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list
|-
| ends-with-ig || - || N/A || value<sub>2</sub>, value<sub>1</sub> &rarr; result || Checks if 'value<sub>1</sub>' ends with 'value<sub>2</sub>' (ignoring the case) and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list
|}
  • Lists are represented as a concatenation of items separated by '|', e.g. uno|otro|poco|cuánto|menos|mucho|tanto|demasiado
  • The case is represented as "aa" (all lowercase), "Aa" (first letter uppercase) or "AA" (all uppercase).
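
The two notes above can be illustrated with a short Python sketch. It is not taken from the VM's source: the function names `in_list` and `case_of` are hypothetical, and the rule used to tell "Aa" from "AA" is an assumption made only for this example.

```python
def in_list(value, items, ignore_case=False):
    """Sketch of 'in' / 'inig': look 'value' up in a '|'-separated list."""
    candidates = items.split("|")
    if ignore_case:
        value = value.lower()
        candidates = [c.lower() for c in candidates]
    return 1 if value in candidates else 0


def case_of(container):
    """Sketch of 'case-of': classify a string as "aa", "Aa" or "AA".

    Assumption: more than one character and all of them uppercase means "AA";
    otherwise an uppercase first character means "Aa".
    """
    if len(container) > 1 and container.isupper():
        return "AA"
    if container[:1].isupper():
        return "Aa"
    return "aa"
```

With the list from the first note, `in_list("poco", "uno|otro|poco|cuánto")` finds the item, while `in_list("Poco", ...)` only succeeds when `ignore_case=True`, which mirrors the difference between 'in' and 'inig'.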

Code generation

Code sections

The code generated by the compiler is divided into several sections. In addition, the VM reads and stores the code in its own sections.

{| class="wikitable" border="1"
! Section !! Code !! VM's section !! Information
|-
| align=center | Header || <code>#<assembly> <br /> #<transfer default="chunk"></code> || align=center | ---- || This section establishes the type of code generated and the transfer stage.
|-
| align=center | Initialisation || <code>push "genere"<br /> push "<m>"<br /> storev <br /> ... <br /> jmp section_rules_start</code> || align=center | Code section || This section initialises the variables with their default values and runs any other initialisation code. <br /> It ends with a jmp to the rules section: rules only execute when a pattern is matched,<br /> but all the patterns in the rules section have to be processed first.
|-
| align=center | Macros || <code>macro_firstWord_start: <br /> ... <br /> macro_firstWord_end: <br />...</code> || align=center | Macros code section || This section contains the code of every macro, delimited by labels.<br /> Each macro can be called with the 'call' instruction.
|-
| align=center | Patterns || <code>section_rules_start:<br /> patterns_start:<br /> push "all<predet><sp>"<br /> push "<n><pl>"<br /> push 2<br /> addtrie action_0_start<br /> ... <br />patterns_end:</code> || align=center | Preprocess code section || In this section all the patterns are added to the system trie. <br />In this example two patterns are pushed, then the number of patterns is pushed, and finally<br /> the addtrie instruction pops them and adds a trie entry pointing to rule 0.
|-
| align=center | Rules || <code>action_0_start:<br /> ...<br /> action_0_end:<br /> ...<br /> section_rules_end:</code> || align=center | Rules code section || Finally, the rules section contains the code of every rule, delimited by its labels.
|}
  • Single-line comments start with the '#' symbol at the beginning of the line.
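
As a rough illustration of this layout, the Python sketch below shows how a loader could skip '#' comment lines and record which instruction each label points to. The helper `index_labels` and its label-detection rule are assumptions made for this example; they are not the VM's actual loader.

```python
def index_labels(listing):
    """Return (instructions, labels) for an assembly listing.

    Hypothetical helper: lines starting with '#' are dropped and each label
    is mapped to the index of the instruction that follows it.
    """
    instructions, labels = [], {}
    for raw in listing.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # A label may stand alone or prefix an instruction ("macro_x_end: ret").
        head, sep, rest = line.partition(":")
        if sep and " " not in head and '"' not in head:
            labels[head] = len(instructions)
            line = rest.strip()
            if not line:
                continue
        instructions.append(line)
    return instructions, labels


LISTING = '''
#<assembly>
#<transfer default="chunk">
push "genere"
push "<m>"
storev
jmp section_rules_start
macro_firstWord_start:
nop
macro_firstWord_end: ret
section_rules_start:
section_rules_end:
'''

instructions, labels = index_labels(LISTING)
```

In this sketch both `section_rules_start` and `section_rules_end` map to the same index because the rules section of the sample listing is empty.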

Code examples

Macro example

Transfer file (.t1x)

 <transfer default="chunk">

  <section-def-attrs>
    <def-attr n="nbr">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
      <attr-item tags="sp"/>
      <attr-item tags="ND"/>
    </def-attr>
  </section-def-attrs>

  <section-def-vars>
    <def-var n="nombre" v="&amp;lt;sg&amp;gt;"/>
    <def-var n="genere" v="&amp;lt;m&amp;gt;"/>
  </section-def-vars>

  <section-def-macros>
    <def-macro n="nombre_nom" npar="1">
      <let>
        <var n="nombre"/>
        <lit v=""/>
      </let>
      <choose>
        <when>
          <test>
            <and>
              <equal>         
                <clip pos="1" side="sl" part="nbr"/>
                <lit-tag v="sg"/>
              </equal>
              <equal>
                <clip pos="1" side="tl" part="nbr"/>
                <lit-tag v="pl"/>
              </equal>
            </and>
          </test>
          <let>
            <var n="nombre"/>
            <lit-tag v="pl_slsg"/>
          </let>
         </when>
         <when>
           <test>
             <and>
               <equal>
                 <clip pos="1" side="sl" part="nbr"/>
                 <lit-tag v="pl"/>
               </equal>
               <equal>
                 <clip pos="1" side="tl" part="nbr"/>
                 <lit-tag v="sg"/>
               </equal>
             </and>
           </test>
           <let>
             <var n="nombre"/>
             <lit-tag v="sg_slpl"/>
           </let>
         </when>
         <otherwise>
           <let>
             <var n="nombre"/>
             <clip pos="1" side="tl" part="nbr"/>
           </let>
         </otherwise>
       </choose> 
     </def-macro>
   </section-def-macros>

 </transfer>

Code generated
 #<assembly>
 #<transfer default="chunk">
 #<def-var v="&lt;sg&gt;" n="nombre">
 push "nombre"
 push "<sg>"
 storev
 #<def-var v="&lt;m&gt;" n="genere">
 push "genere"
 push "<m>"
 storev
 jmp section_rules_start
 #<def-macro npar="1" n="nombre_nom">
 macro_nombre_nom_start:
 #<var n="nombre">
 push "nombre"
 #<lit v="">
 push ""
 storev
 #<clip part="nbr" pos="1" side="sl">
 push 1
 push "<sg>|<pl>|<sp>|<ND>"
 clipsl
 #<lit-tag v="sg">
 push "<sg>"
 cmp
 #<clip part="nbr" pos="1" side="tl">
 push 1
 push "<sg>|<pl>|<sp>|<ND>"
 cliptl
 #<lit-tag v="pl">
 push "<pl>"
 cmp
 and 2
 jz when_0_end
 #<var n="nombre">
 push "nombre"
 #<lit-tag v="pl_slsg">
 push "<pl_slsg>"
 storev
 jmp choose_0_end
 when_0_end:
 #<clip part="nbr" pos="1" side="sl">
 push 1
 push "<sg>|<pl>|<sp>|<ND>"
 clipsl
 #<lit-tag v="pl">
 push "<pl>"
 cmp
 #<clip part="nbr" pos="1" side="tl">
 push 1
 push "<sg>|<pl>|<sp>|<ND>"
 cliptl
 #<lit-tag v="sg">
 push "<sg>"
 cmp
 and 2
 jz when_1_end
 #<var n="nombre">
 push "nombre"
 #<lit-tag v="sg_slpl">
 push "<sg_slpl>"
 storev
 jmp choose_0_end
 when_1_end:
 #<otherwise>
 #<var n="nombre">
 push "nombre"
 #<clip part="nbr" pos="1" side="tl">
 push 1
 push "<sg>|<pl>|<sp>|<ND>"
 cliptl
 storev
 choose_0_end:
 macro_nombre_nom_end: ret
 #<section-rules>
 section_rules_start:
 section_rules_end:
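
To make the control flow of the generated macro easier to follow, here is a minimal Python sketch of a stack machine that runs a shortened version of the listing above (the second `<when>` branch is left out). It follows the stack conventions of the instruction table, but everything else is an assumption: the `run` function, the way `clipsl`/`cliptl` are stubbed with plain tag lookups in hand-written word strings, and the sample words themselves are hypothetical.

```python
MACRO = '''
push "nombre"
push ""
storev
push 1
push "<sg>|<pl>|<sp>|<ND>"
clipsl
push "<sg>"
cmp
push 1
push "<sg>|<pl>|<sp>|<ND>"
cliptl
push "<pl>"
cmp
and 2
jz when_0_end
push "nombre"
push "<pl_slsg>"
storev
jmp choose_0_end
when_0_end:
push "nombre"
push 1
push "<sg>|<pl>|<sp>|<ND>"
cliptl
storev
choose_0_end:
ret
'''


def run(listing, sl_words, tl_words):
    # First pass: collect instructions and remember where each label points.
    code, labels = [], {}
    for line in listing.strip().splitlines():
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = len(code)
        elif line and not line.startswith("#"):
            code.append(line)

    stack, variables, pc = [], {}, 0

    def clip(words):
        # Stub: 'part' is a '|'-separated list of tags; return the first one
        # found in the word at 1-based position 'pos', or "" if none matches.
        part = stack.pop()
        pos = stack.pop()
        word = words[pos - 1]
        stack.append(next((t for t in part.split("|") if t in word), ""))

    while pc < len(code):
        op, _, arg = code[pc].partition(" ")
        pc += 1
        if op == "push":
            if arg.startswith('"'):
                stack.append(arg[1:-1])        # quoted string literal
            elif arg.isdigit():
                stack.append(int(arg))         # numeric operand
            else:
                stack.append(variables[arg])   # variable name: push its value
        elif op == "storev":
            value = stack.pop()
            variables[stack.pop()] = value
        elif op == "clipsl":
            clip(sl_words)
        elif op == "cliptl":
            clip(tl_words)
        elif op == "cmp":
            value1, value2 = stack.pop(), stack.pop()
            stack.append(1 if value1 == value2 else 0)
        elif op == "and":
            values = [stack.pop() for _ in range(int(arg))]
            stack.append(1 if all(v == 1 for v in values) else 0)
        elif op == "jz":
            if stack.pop() == 0:               # jump when the test failed
                pc = labels[arg]
        elif op == "jmp":
            pc = labels[arg]
        elif op == "ret":
            break
        else:
            raise ValueError("unsupported instruction: " + op)
    return variables
```

Calling `run(MACRO, ["casa<n><f><sg>"], ["houses<n><pl>"])` takes the first branch and leaves `nombre` set to `<pl_slsg>`; with a singular target word the `jz` jumps to `when_0_end` and `nombre` receives the clipped target tag instead.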

Rule's patterns example

Transfer file (.t1x)
<transfer default="chunk">

  <section-def-cats>
    <def-cat n="all">
      <cat-item lemma="all" tags="predet.sp"/>
    </def-cat>

    <def-cat n="adj2">
      <cat-item tags="adj"/>
      <cat-item tags="adj.*"/>
      <cat-item tags="adj.sint"/>
      <cat-item tags="adj.sint.*"/>
      <cat-item tags="adj.comp"/>
      <cat-item tags="adj.sup"/>
    </def-cat>

    <def-cat n="nomcomu">
      <cat-item tags="n.*"/>
    </def-cat>
    
    <def-cat n="nompropi">
      <cat-item tags="np.*"/>
    </def-cat>

    <def-cat n="nploc">
      <cat-item tags="np.loc.*"/>
    </def-cat>
  </section-def-cats>

  <section-rules>
    <rule> 
      <pattern>
	<pattern-item n="all"/>
	<pattern-item n="adj2"/>
      </pattern>
      <action>
	...
      </action>
    </rule>

    <rule> 
      <pattern>
	<pattern-item n="nomcomu"/>
	<pattern-item n="nompropi"/>
	<pattern-item n="nploc"/>
      </pattern>
      <action>
	...
      </action>
    </rule>
  </section-rules>

</transfer>
Code generated
#<assembly>
#<transfer default="chunk">
jmp section_rules_start
#<section-rules>
section_rules_start:
patterns_start:
push "all<predet><sp>"
push "<adj>|<adj><*>|<adj><sint>|<adj><sint><*>|<adj><comp>|<adj><sup>"
push 2
addtrie action_0_start
push "<n><*>"
push "<np><*>"
push "<np><loc><*>"
push 3
addtrie action_1_start
patterns_end:
action_0_start:
...
action_0_end:
action_1_start:
...
action_1_end:
section_rules_end:
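
The stack effect of `addtrie` in the listing above can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested-dictionary trie, the `"@rule"` key and the function name `addtrie` as a Python helper are hypothetical, and no wildcard matching (e.g. `<*>`) is attempted.

```python
def addtrie(stack, trie, address):
    """Pop N, then N patterns, and store 'address' under that pattern sequence."""
    count = stack.pop()
    patterns = [stack.pop() for _ in range(count)]
    patterns.reverse()  # restore first-word-first order
    nodes = [trie]
    for pattern in patterns:
        # A pattern may hold several alternatives separated by '|'.
        nodes = [node.setdefault(alt, {})
                 for node in nodes
                 for alt in pattern.split("|")]
    for node in nodes:
        node["@rule"] = address


# Usage: the same pushes as the first rule, with a shortened second pattern.
stack = ["all<predet><sp>", "<adj>|<adj><*>", 2]
trie = {}
addtrie(stack, trie, "action_0_start")
```

After the call the stack is empty and the trie contains one path per alternative of the second pattern, each ending in the label of rule 0.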


Wishlist

  • Make the compiler's command-line options work like those of lt-comp/cg-comp, so that -i/-o are not needed to name the input and output files: apertium-compiler [options] <infile> <outfile>

External links