Difference between revisions of "VM for transfer"
Jump to navigation
Jump to search
(Link to French page) |
|||
(18 intermediate revisions by 5 users not shown) | |||
Line 1: | Line 1: | ||
[[Machine virtuelle pour le transfert|En français]] |
|||
== Instruction Set == |
== Instruction Set == |
||
Line 10: | Line 12: | ||
| pushsb || - || pos || [empty] → superblank || Pushes the superblank at 'pos' onto the stack |
| pushsb || - || pos || [empty] → superblank || Pushes the superblank at 'pos' onto the stack |
||
|- |
|- |
||
| append || - || N || value<sub>N</sub>, ..., value<sub>1</sub> → [empty] || Pops 'N' elements and appends them to a variable or clip |
| append || - || N || value<sub>N</sub>, ..., value<sub>1</sub>, varName → [empty] || Pops 'N' elements and appends them to a variable or clip |
||
|- |
|- |
||
| concat || - || N || value<sub>N</sub>, ..., value<sub>1</sub> → value<sub> |
| concat || - || N || value<sub>N</sub>, ..., value<sub>1</sub> → value<sub>1</sub>...value<sub>N</sub> || Pops 'N' elements and pushes them back concatenated |
||
|- |
|- |
||
| clip || - || N/A || part → value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack |
| clip || - || N/A || part → value || Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack |
||
|- |
|- |
||
| clipsl || - || |
| clipsl || - || link-to || part, pos → value || Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operand is used for clips with link-to tags, e.g. "clipsl <3>". |
||
|- |
|- |
||
| cliptl || - || |
| cliptl || - || link-to || part, pos → value || Obtains the 'part' in target language in position 'pos' and pushes the 'value' onto the stack. An optional operand is used for clips with link-to tags, e.g. "cliptl <3>". |
||
|- |
|- |
||
| storecl || - || N/A || value, part → [empty] || Stores 'value' in the only language there is (inter/post-chunk) |
| storecl || - || N/A || value, part → [empty] || Stores 'value' in the only language there is (inter/post-chunk) |
||
Line 36: | Line 38: | ||
| lu-count || - || N/A || [empty] → number || Pushes the number of lexical units (words inside the chunk) in the rule onto the stack |
| lu-count || - || N/A || [empty] → number || Pushes the number of lexical units (words inside the chunk) in the rule onto the stack |
||
|- |
|- |
||
| chunk || - || N || |
| chunk || - || N || elem<sub>N-2</sub>, ... , elem<sub>1</sub>, <tags>, name → ^name<tags>{elem<sub>1</sub>...elem<sub>N-2</sub>}$ || Pops 'N' amount of data from the stack, creates the chunk and pushes it back onto the stack |
||
|- |
|- |
||
| out || - || N || value<sub>N</sub>, ..., value<sub>1</sub> → [empty] || Pops 'N' values from the stack and outputs them |
| out || - || N || value<sub>N</sub>, ..., value<sub>1</sub> → [empty] || Pops 'N' values from the stack and outputs them |
||
Line 70: | Line 72: | ||
| nop || - || N/A || [empty] → [empty] || No operation |
| nop || - || N/A || [empty] → [empty] || No operation |
||
|- |
|- |
||
| |
| case-of || - || N/A || container → case || Gets the case from the container in the stack. The container would usually be the result of a clip instruction but can be any string. |
||
|- |
|- |
||
| |
| get-case-from || - || N/A || pos → case || Gets the case from the lexical unit in position 'pos' |
||
|- |
|||
| modify-case || - || N/A || case, container → modifiedContainer || Modifies the case of the 'container' to 'case' and leaves the modified container on the stack |
|||
|- |
|- |
||
| begins-with || - || N/A || value<sub>2</sub>, value<sub>1</sub> → result || Checks if 'value<sub>1</sub>' begins with 'value<sub>2</sub>' and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list |
| begins-with || - || N/A || value<sub>2</sub>, value<sub>1</sub> → result || Checks if 'value<sub>1</sub>' begins with 'value<sub>2</sub>' and pushes 1 (true) or 0 (false), 'value<sub>2</sub>' can be a list |
||
Line 86: | Line 90: | ||
* The case is represented as "aa" (all lowercase), "Aa" (first uppercase) and "AA", (all uppercase). |
* The case is represented as "aa" (all lowercase), "Aa" (first uppercase) and "AA", (all uppercase). |
||
== Code generation == |
|||
== Sample compilation of XML code fragments == |
|||
=== |
=== Code sections === |
||
==== XML t1x Code: chunking ==== |
|||
<code> |
|||
<out> |
|||
<chunk name="det_det_nom_adj" case="caseFirstWord"> |
|||
<tags> |
|||
<tag><lit-tag v="SN"/></tag> |
|||
<tag><var n="tipus_det"/></tag> |
|||
<tag><var n="gen_chunk"/></tag> |
|||
<tag><var n="nbr_chunk"/></tag> |
|||
</tags> |
|||
<lu> |
|||
<clip pos="1" side="tl" part="lem"/> |
|||
<clip pos="1" side="tl" part="a_det"/> |
|||
<clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/> |
|||
<clip pos="1" side="tl" part="gen_mf"/> |
|||
<clip pos="1" side="tl" part="nbr_sense_sp" link-to="4"/> |
|||
<clip pos="1" side="tl" part="nbr_sp"/> |
|||
</lu> |
|||
<b/> |
|||
<lu> |
|||
<lit v="el"/> |
|||
<lit-tag v="det.def"/> |
|||
<clip pos="1" side="tl" part="gen_sense_mf" link-to="3"/> |
|||
<lit-tag v="pl"/> |
|||
</lu> |
|||
<b pos="1"/> |
|||
<lu> |
|||
<clip pos="3" side="tl" part="lemh"/> |
|||
<clip pos="3" side="tl" part="a_nom"/> |
|||
<clip pos="3" side="tl" part="gen_sense_mf" link-to="3"/> |
|||
<clip pos="3" side="tl" part="gen_mf"/> |
|||
<clip pos="3" side="tl" part="nbr_sense_sp" link-to="4"/> |
|||
<clip pos="3" side="tl" part="nbr_sp"/> |
|||
<clip pos="3" side="tl" part="lemq"/> |
|||
</lu> |
|||
<b/> |
|||
<b pos="2"/> |
|||
<lu> |
|||
<var n="adjectiu1"/> |
|||
<clip pos="2" side="tl" part="lemh"/> |
|||
<clip pos="2" side="tl" part="a_adj"/> |
|||
<clip pos="2" side="tl" part="gen_sense_mf" link-to="3"/> |
|||
<clip pos="2" side="tl" part="gen_mf"/> |
|||
<clip pos="2" side="tl" part="nbr_sense_sp" link-to="4"/> |
|||
<clip pos="2" side="tl" part="nbr_sp" link-to="4"/> |
|||
<clip pos="2" side="tl" part="lemq"/> |
|||
</lu> |
|||
</chunk> |
|||
</out> |
|||
</code> |
|||
The code generated by the compiler is divided in several sections. In addition, the VM reads and stores the code in its own sections. |
|||
==== Compiled Code ==== |
|||
{| class="wikitable" border="1" |
|||
! Section !! Code !! VM's section !! Information |
|||
|- |
|||
| align=center | Header || <code>#<assembly> <br /> #<transfer default="chunk"></code> || align=center | ---- || This section establishes the type of code generated and the transfer stage. |
|||
|- |
|||
| align=center | Initialisation || <code>push "genere"<br /> push "<m>"<br /> storev <br /> ... <br /> jmp rules_section_start</code> || align=center | Code section || In this section we initialize the variables with their default value and execute other initialisation code. <br /> At the end we jmp to the section rules section, although rules will only execute when a pattern is matched,<br /> we need to process all the patterns which are in the rules section. |
|||
|- |
|||
| align=center | Macros || <code>macro_firstWord_start: <br /> ... <br /> macro_firstWord_end: <br />...</code> || align=center | Macros code section || This section contains all the macro's code delimited by labels.<br /> Each macro can be called with the 'call' instruction. |
|||
|- |
|||
| align=center | Patterns || <code>section_rules_start:<br /> patterns_start:<br /> push "all<predet><sp>"<br /> push "<n><pl>"<br /> push 2<br /> addtrie action_0_start<br /> ... <br />patterns_end:</code> || align=center | Preprocess code section || In this section all the patterns will be added to the system trie. <br />In this example you can see that two patterns are pushed, then the number of patterns is pushed and finally<br /> the addtrie instruction pops them and adds an entry in the trie to the rule 0. |
|||
|- |
|||
| align=center | Rules || <code>action_0_start:<br /> ...<br /> action_0_end:<br /> ...<br /> section_rules_end:</code> || align=center | Rules code section || Finally the rules section contains every rule delimited by its labels and all its code. |
|||
|} |
|||
*One line comments can be made by using the '#' symbol at the start of the line. |
|||
<code> |
|||
push "det_det_nom_adj" |
|||
push "<SN>" |
|||
pusht tipus_det ; first evaluate the variable, append/prepend '<>', then push in the stack |
|||
pusht gen_chunk |
|||
pusht nbr_chunk |
|||
push 1 |
|||
push "^\w+" ; lem |
|||
cliptl |
|||
push 1 |
|||
push [regex] ; a_det |
|||
cliptl |
|||
push "<3>" ; since link-to overrides everything else, we do not need any dedicated instruction |
|||
; for that |
|||
push 1 |
|||
push [regex] ; gen_mf |
|||
cliptl |
|||
push "<4>" |
|||
push 1 |
|||
push [regex] ; nbr_sp |
|||
cliptl |
|||
lu 6 ; pop 6 items, concat, create lexical unit ^...$ and push back in stack |
|||
pushbl ; push a blank |
|||
push "el" |
|||
push "<det><def>" |
|||
push "<3>" |
|||
push "<pl>" |
|||
lu 4 ; pop 4 items from the stack, create a lexical unit ^...$ and then |
|||
; push in the stack |
|||
pushsb 1 |
|||
push 3 |
|||
push [regex] ; lemh |
|||
cliptl |
|||
push 3 |
|||
push [regex] ; a_nom |
|||
cliptl |
|||
push "<3>" |
|||
push 3 |
|||
push [regex] ; gen_mf |
|||
cliptl |
|||
push "<4>" |
|||
push 3 |
|||
push [regex] ; nbr_sp |
|||
cliptl |
|||
push 3 |
|||
push [regex] ; lemq |
|||
cliptl |
|||
lu 7 |
|||
pushbl |
|||
pushsb 2 |
|||
pushv adjectiu1 ; its a var, so eval and push the value |
|||
push 3 |
|||
push [regex] ; lemh |
|||
cliptl |
|||
push 3 |
|||
push [regex] ; a_adj |
|||
cliptl |
|||
push "<3>" |
|||
push 3 |
|||
push [regex] ; gen_mf |
|||
cliptl |
|||
push "<4>" |
|||
push "<4>" ; a bit confused, there are two link-to in the XML |
|||
push 3 |
|||
push [regex] ; lemq |
|||
cliptl |
|||
lu 7 |
|||
brace 7 ; no of blank + lexical unit = 7 |
|||
; pop 7 items, concat, prepend and append {, } then push back |
|||
chunk 6 ; create the chunk, ^...{^...$}$, and push back in stack |
|||
out 1 ; give output (number of chunks = 1) |
|||
</code> |
|||
=== |
=== Code examples === |
||
==== |
==== Macro example ==== |
||
<code> |
|||
<section-def-cats> |
|||
<def-cat n="nom"> |
|||
<cat-item tags="n.*"/> |
|||
</def-cat> |
|||
<def-cat n="det"> |
|||
<cat-item tags="det.*"/> |
|||
<cat-item tags="predet.*"/> |
|||
</def-cat> |
|||
</section-def-cats> |
|||
<section-rules> |
|||
<rule> |
|||
<pattern> |
|||
<pattern-item n="det"/> |
|||
</pattern> |
|||
</rule> |
|||
<rule> |
|||
<pattern> |
|||
<pattern-item n="nom"/> |
|||
</pattern> |
|||
<action/> |
|||
</rule> |
|||
<rule> |
|||
<pattern> |
|||
<pattern-item n="det"/> |
|||
<pattern-item n="nom"/> |
|||
</pattern> |
|||
<action/> |
|||
</rule> |
|||
</section-rules> |
|||
</code> |
|||
==== |
===== Transfer file (.t1x) ===== |
||
<code> |
<code> |
||
;first rule: def-cat has two equivalent cat-items |
|||
push "\w<det>\t" ;load pattern into stack |
|||
push 1 |
|||
addtrie [address1] ;define a trie pattern with value 1 (the first rule) |
|||
push "\w<predet>\t" ;same with the second cat-item |
|||
push 1 |
|||
addtrie [address1] |
|||
;second rule (and so on) very simple, unique cat-item |
|||
push "\w<n>\t" |
|||
push 1 |
|||
addtrie [address2] |
|||
;third rule (here is the trick: multiple cat-items in one of the words) |
|||
push "\w<det>\t" |
|||
push "\w<n>\t" |
|||
push 2 ; we have 'det' followed by a 'nom', so addtrie has to pop two elements |
|||
addtrie [address3] |
|||
push "\w<predet>\t" |
|||
push "\w<n>\t" |
|||
push 2 |
|||
addtrie [address3] |
|||
</code> |
|||
<source lang="xml"> |
|||
=== Example 3 === |
|||
==== XML t1x Code ==== |
|||
<transfer default="chunk"> |
|||
<code> |
|||
<def-macro n="f_coma" npar="1"> |
|||
<choose> |
|||
<when> |
|||
<test> |
|||
<equal caseless="yes"> |
|||
<clip pos="1" side="sl" part="lem"/> |
|||
<lit v="como"/> |
|||
</equal> |
|||
</test> |
|||
<let> |
|||
<clip pos="1" side="tl" part="lem"/> |
|||
<get-case-from pos="1"> |
|||
<lit v="com a"/> |
|||
</get-case-from> |
|||
</let> |
|||
</when> |
|||
</choose> |
|||
</def-macro> |
|||
</code> |
|||
<section-def-attrs> |
|||
==== Compiled code ==== |
|||
<def-attr n="nbr"> |
|||
<attr-item tags="sg"/> |
|||
<attr-item tags="pl"/> |
|||
<attr-item tags="sp"/> |
|||
<attr-item tags="ND"/> |
|||
</def-attr> |
|||
</section-def-attrs> |
|||
<section-def-vars> |
|||
<code> |
|||
<def-var n="nombre" v="&lt;sg&gt;"/> |
|||
f_coma: push 1 ; "pos" of "clip" |
|||
<def-var n="genere" v="&lt;m&gt;"/> |
|||
push "^\w+" ; "lem" |
|||
</section-def-vars> |
|||
clipsl ; gets the value clips on the top of the stack. |
|||
; "sl" side is implied in the name of the instruction |
|||
push "como" |
|||
cmpi ; does the comparison and cleans the stack, it means caseless |
|||
jnz end ; if the comparison does not succeeds, go to end |
|||
; semantics: j = jump n = not z = zero flag is activated |
|||
; zero flag is activated when a comparison succeeds |
|||
; or an arithmetical operation gives 0 |
|||
push 1 ; "pos" of "clip" |
|||
push "^\w+" |
|||
push "com a" |
|||
storetl ; store the value provided in the top of the stack |
|||
; given position 1, "tl" side and "lem" |
|||
end: ... |
|||
</code> |
|||
<section-def-macros> |
|||
=== Example 4 === |
|||
<def-macro n="nombre_nom" npar="1"> |
|||
==== XML t1x Code ==== |
|||
<let> |
|||
<code> |
|||
<var n="nombre"/> |
|||
<test> |
|||
< |
<lit v=""/> |
||
< |
</let> |
||
< |
<choose> |
||
< |
<when> |
||
<test> |
|||
< |
<and> |
||
<equal> |
|||
</not> |
|||
<clip pos="1" side="sl" part="nbr"/> |
|||
<not> |
|||
< |
<lit-tag v="sg"/> |
||
</equal> |
|||
<equal> |
|||
<clip pos="1" side="tl" part="nbr"/> |
|||
</equal> |
|||
<lit-tag v="pl"/> |
|||
</not> |
|||
</ |
</equal> |
||
</and> |
|||
</test> |
|||
</test> |
|||
</code> |
|||
<let> |
|||
==== Compiled code ==== |
|||
<var n="nombre"/> |
|||
<lit-tag v="pl_slsg"/> |
|||
<code> |
|||
</let> |
|||
</when> |
|||
push [regex] ; part="gen" |
|||
clipsl |
|||
push 3 |
|||
push [regex] ; part="gen" |
|||
clipsl |
|||
cmp ; compare (case sensitive) |
|||
pushnz ; NOT zero flag and push in stack |
|||
push 2 |
|||
push [regex] ; part="gen" |
|||
clipsl |
|||
push 3 |
|||
push [regex] ; part="gen" |
|||
clipsl |
|||
cmp ; compare (case sensitive) |
|||
pushnz |
|||
or ; pop 2 items and OR, push result in stack |
|||
jnz end ; jump if zero flag is 0 (we did not get ZERO as the result) |
|||
... ... ... |
|||
(code for successful test) |
|||
... ... ... |
|||
end: ... |
|||
</code> |
|||
=== Example 5 === |
|||
==== XML t1x Code ==== |
|||
<def-list n="verbos_est"> |
|||
<list-item v="actuar"/> |
|||
<list-item v="buscar"/> |
|||
<list-item v="estudiar"/> |
|||
<list-item v="existir"/> |
|||
<list-item v="ingressar"/> |
|||
<list-item v="introduir"/> |
|||
<list-item v="penetrar"/> |
|||
<list-item v="publicar"/> |
|||
<list-item v="treballar"/> |
|||
<list-item v="viure"/> |
|||
</def-list> |
|||
<rule> |
|||
<pattern> |
|||
<pattern-item n="verb"/> |
|||
<pattern-item n="a"/> |
|||
</pattern> |
|||
<action> |
|||
<choose> |
|||
<when> |
|||
<test> |
|||
<in caseless="yes"/> |
|||
<clip pos="1" side="sl" part="lem"/> |
|||
<list n="verbos_est"/> |
|||
</in> |
|||
</test> |
|||
<let> |
|||
<clip pos="2" side="tl" part="lem"/> |
|||
<lit v="en"/> |
|||
</let> |
|||
</when> |
|||
</choose> |
|||
</rule> |
|||
==== Compiled code ==== |
|||
push "actuar" |
|||
push "buscar" |
|||
push "estudiar" |
|||
push "existir" |
|||
push "ingressar" |
|||
push "introduir" |
|||
push "penetrar" |
|||
push "publicar" |
|||
push "treballar" |
|||
push "viure" |
|||
push 10 ; number of elements in the list |
|||
mklist verbos_est ; make a list variable named 'verbos_est' and put the last 10 data |
|||
; from the stack in the list |
|||
rule1: push [regex_verb] |
|||
push [regex_a] |
|||
push 2 |
|||
addtrie rule1_action |
|||
... ... ... |
|||
... ... ... |
|||
rule1_action: push 1 |
|||
push "^\w+" ; lem |
|||
clipsl ; we have lemmma in stack now |
|||
incini verbox_est ; if in verbos_est (ignore case), set ZF = 1, else ZF = 0 |
|||
jnz rule1_end |
|||
push 2 |
|||
push "^\w+" |
|||
push "en" |
|||
storetl |
|||
rule1_end: ... |
|||
=== Example 6 === |
|||
==== XML t1x Code ==== |
|||
<def-macro n="firstWord" npar="1"> |
|||
<choose> |
|||
<when> |
|||
<test> |
|||
<equal> |
|||
<clip pos="1" side="sl" part="a_np_acr"/> |
|||
<lit v=""/> |
|||
</equal> |
|||
</test> |
|||
<choose> |
|||
<when> |
<when> |
||
<test> |
<test> |
||
< |
<and> |
||
<equal> |
|||
< |
<clip pos="1" side="sl" part="nbr"/> |
||
</ |
<lit-tag v="pl"/> |
||
</equal> |
|||
<equal> |
|||
<clip pos="1" side="tl" part="nbr"/> |
|||
<lit-tag v="sg"/> |
|||
</equal> |
|||
</and> |
|||
</test> |
</test> |
||
<modify-case> |
|||
<clip pos="1" side="tl" part="lem"/> |
|||
<lit v="aa"/> |
|||
</modify-case> |
|||
<let> |
<let> |
||
<var n=" |
<var n="nombre"/> |
||
<lit v=" |
<lit-tag v="sg_slpl"/> |
||
</let> |
</let> |
||
</when> |
</when> |
||
<otherwise> |
<otherwise> |
||
<let> |
<let> |
||
<var n=" |
<var n="nombre"/> |
||
< |
<clip pos="1" side="tl" part="nbr"/> |
||
</let> |
</let> |
||
</otherwise> |
</otherwise> |
||
</choose> |
</choose> |
||
</ |
</def-macro> |
||
</section-def-macros> |
|||
<otherwise> |
|||
<let> |
|||
</transfer> |
|||
<var n="caseFirstWord"/> |
|||
<lit v="aa"/> |
|||
</source> |
|||
</let> |
|||
</otherwise> |
|||
</code> |
|||
</choose> |
|||
<let> |
|||
===== Code generated ===== |
|||
<var n="EOS"/> |
|||
<lit v="false"/> |
|||
<source lang="bash"> |
|||
</let> |
|||
< |
#<assembly> |
||
#<transfer default="chunk"> |
|||
#<def-var v="<sg>" n="nombre"> |
|||
push "nombre" |
|||
<rule comment="REGLA: DET DET ADJ NOM (your many beautiful cats)"> |
|||
push "<sg>" |
|||
... ... |
|||
storev |
|||
<action> |
|||
#<def-var v="<m>" n="genere"> |
|||
<call-macro n="firstWord"> |
|||
push "genere" |
|||
<with-param pos="1"/> |
|||
push "<m>" |
|||
</call-macro> |
|||
storev |
|||
<call-macro n="f_concord4"> |
|||
jmp section_rules_start |
|||
<with-param pos="4"/> |
|||
#<def-macro npar="1" n="nombre_nom"> |
|||
<with-param pos="3"/> |
|||
macro_nombre_nom_start: |
|||
<with-param pos="2"/> |
|||
#<var n="nombre"> |
|||
<with-param pos="1"/> |
|||
push "nombre" |
|||
</call-macro> |
|||
#<lit v=""> |
|||
... |
|||
push "" |
|||
<out> |
|||
storev |
|||
<chunk name="det_det_nom_adj" case="caseFirstWord"> |
|||
#<clip part="nbr" pos="1" side="sl"> |
|||
... ... |
|||
push 1 |
|||
</chunk> |
|||
push "<sg>|<pl>|<sp>|<ND>" |
|||
</out> |
|||
clipsl |
|||
</action> |
|||
#<lit-tag v="sg"> |
|||
</rule> |
|||
push "<sg>" |
|||
cmp |
|||
#<clip part="nbr" pos="1" side="tl"> |
|||
push 1 |
|||
push "<sg>|<pl>|<sp>|<ND>" |
|||
cliptl |
|||
#<lit-tag v="pl"> |
|||
push "<pl>" |
|||
cmp |
|||
and 2 |
|||
jz when_0_end |
|||
#<var n="nombre"> |
|||
push "nombre" |
|||
#<lit-tag v="pl_slsg"> |
|||
push "<pl_slsg>" |
|||
storev |
|||
jmp choose_0_end |
|||
when_0_end: |
|||
#<clip part="nbr" pos="1" side="sl"> |
|||
push 1 |
|||
push "<sg>|<pl>|<sp>|<ND>" |
|||
clipsl |
|||
#<lit-tag v="pl"> |
|||
push "<pl>" |
|||
cmp |
|||
#<clip part="nbr" pos="1" side="tl"> |
|||
push 1 |
|||
push "<sg>|<pl>|<sp>|<ND>" |
|||
cliptl |
|||
#<lit-tag v="sg"> |
|||
push "<sg>" |
|||
cmp |
|||
and 2 |
|||
jz when_1_end |
|||
#<var n="nombre"> |
|||
push "nombre" |
|||
#<lit-tag v="sg_slpl"> |
|||
push "<sg_slpl>" |
|||
storev |
|||
jmp choose_0_end |
|||
when_1_end: |
|||
#<otherwise> |
|||
#<var n="nombre"> |
|||
push "nombre" |
|||
#<clip part="nbr" pos="1" side="tl"> |
|||
push 1 |
|||
push "<sg>|<pl>|<sp>|<ND>" |
|||
cliptl |
|||
storev |
|||
choose_0_end: |
|||
macro_nombre_nom_end: ret |
|||
#<section-rules> |
|||
section_rules_start: |
|||
section_rules_end: |
|||
</source> |
|||
---- |
|||
==== Rule's patterns example ==== |
|||
===== Transfer file (.t1x) ===== |
|||
<source lang="xml"> |
|||
<transfer default="chunk"> |
|||
<section-def-cats> |
|||
<def-cat n="all"> |
|||
<cat-item lemma="all" tags="predet.sp"/> |
|||
</def-cat> |
|||
<def-cat n="adj2"> |
|||
<cat-item tags="adj"/> |
|||
<cat-item tags="adj.*"/> |
|||
<cat-item tags="adj.sint"/> |
|||
<cat-item tags="adj.sint.*"/> |
|||
<cat-item tags="adj.comp"/> |
|||
<cat-item tags="adj.sup"/> |
|||
</def-cat> |
|||
<def-cat n="nomcomu"> |
|||
<cat-item tags="n.*"/> |
|||
</def-cat> |
|||
<def-cat n="nompropi"> |
|||
<cat-item tags="np.*"/> |
|||
</def-cat> |
|||
<def-cat n="nploc"> |
|||
<cat-item tags="np.loc.*"/> |
|||
</def-cat> |
|||
</section-def-cats> |
|||
<rule> |
|||
<pattern> |
|||
<pattern-item n="all"/> |
|||
<pattern-item n="adj2"/> |
|||
</pattern> |
|||
<action> |
|||
... |
|||
</action> |
|||
</rule> |
|||
<rule> |
|||
<pattern> |
|||
<pattern-item n="nomcomu"/> |
|||
<pattern-item n="nompropi"/> |
|||
<pattern-item n="nploc"/> |
|||
</pattern> |
|||
<action> |
|||
... |
|||
</action> |
|||
</rule> |
|||
</section-rules> |
|||
</transfer> |
|||
</source> |
|||
===== Code generated ===== |
|||
<source lang="bash"> |
|||
#<assembly> |
|||
#<transfer default="chunk"> |
|||
jmp section_rules_start |
|||
#<section-rules> |
|||
section_rules_start: |
|||
patterns_start: |
|||
push "all<predet><sp>" |
|||
push "<adj>|<adj><*>|<adj><sint>|<adj><sint><*>|<adj><comp>|<adj><sup>" |
|||
push 2 |
|||
addtrie action_0_start |
|||
push "<n><*>" |
|||
push "<np><*>" |
|||
push "<np><loc><*>" |
|||
push 3 |
|||
addtrie action_1_start |
|||
patterns_end: |
|||
action_0_start: |
|||
... |
|||
action_0_end: |
|||
action_1_start: |
|||
... |
|||
action_1_end: |
|||
section_rules_end: |
|||
</source> |
|||
---- |
|||
==Wishlist== |
|||
==== Compiled code ==== |
|||
* Make the compile options like with lt-comp/cg-comp (e.g. you don't need -i/-o to say what the input/output files are), just <code>apertium-compiler [options] <infile> <outfile></code> |
|||
firstWord: |
|||
... ... ; normal translation of instructions, all the variables are assumed global |
|||
... ... |
|||
ret ; ret instruction does a number of things |
|||
; pops 'frame stack', current 'local variable frame' is reset with popped |
|||
; values (actually its more pointer assignment), C++ version will also |
|||
; do the necessary deallocations |
|||
; pops global stack, update PC with the popped value |
|||
... ... |
|||
... ... |
|||
rule_ddan_action: push 1 ; pos = 1 |
|||
push 1 ; number of parameters 1 |
|||
call firstWord ; macro label |
|||
; call statement does a number of things |
|||
; 1. temppc = PC + 1, set PC = firstWord |
|||
; 2. pushes the current 'local variable frame' into 'frame stack' |
|||
; 3. create a new 'local variable frame' |
|||
; 4. pops the arguments from the stack and places then in the 'local |
|||
; variable frame' |
|||
; 5. pushes temppc in global stack (it will be used by the return |
|||
; statement) |
|||
; 6. continue (instruction at firstWord will be evaluated next) |
|||
push 1 ; notice that the arguments are pushed in reverse order |
|||
; when popped, they will be in the right order |
|||
push 2 |
|||
push 3 |
|||
push 4 |
|||
push 4 |
|||
call f_concord4 |
|||
... ... |
|||
==External links== |
|||
== Development Notes == |
|||
* [https://github.com/ggm/vm-for-transfer-cpp Github: vm-for-transfer-cpp] |
|||
* None of the macro and actions need to return anything (unlike conventional functions), so provision for returning a value (using stack) is unnecessary |
|||
[[Category:Development]] |
|||
* The local variable frame is actually a queue with a maximum length equal to the maximum pattern length in the trie. |
|||
[[Category:Documentation in English]] |
|||
[[Category:Transfer]] |
Latest revision as of 13:58, 7 October 2014
Contents
Instruction Set[edit]
Mnemonic | Opcode (in hex) |
Other operands | Stack [before]→[after] (top, top-1, ...) |
Description |
---|---|---|---|---|
push | - | value | [empty] → value | Pushes a string or a variable value onto the stack. Strings go between quotes ("string") but variable's names not |
pushbl | - | N/A | [empty] → blank | Pushes a blank onto the stack |
pushsb | - | pos | [empty] → superblank | Pushes the superblank at 'pos' onto the stack |
append | - | N | valueN, ..., value1, varName → [empty] | Pops 'N' elements and appends them to a variable or clip |
concat | - | N | valueN, ..., value1 → value1...valueN | Pops 'N' elements and pushes them back concatenated |
clip | - | N/A | part → value | Obtains the part in the only language there is (inter/post-chunk) and pushes the value onto the stack |
clipsl | - | link-to | part, pos → value | Obtains the 'part' in source language in position 'pos' and pushes the 'value' onto the stack. An optional operand is used for clips with link-to tags, e.g. "clipsl <3>". |
cliptl | - | link-to | part, pos → value | Obtains the 'part' in target language in position 'pos' and pushes the 'value' onto the stack. An optional operand is used for clips with link-to tags, e.g. "cliptl <3>". |
storecl | - | N/A | value, part → [empty] | Stores 'value' in the only language there is (inter/post-chunk) |
storesl | - | N/A | value, part, pos → [empty] | Stores 'value' as the 'part' of the source language in position 'pos' |
storetl | - | N/A | value, part, pos → [empty] | Stores 'value' as the 'part' of the target language in position 'pos' |
storev | - | N/A | value, varName → [empty] | Stores 'value' in the variable with name 'varName' |
addtrie | - | address | N, patternN, ..., pattern1 → [empty] | Pops 'N' patterns and creates a trie entry pointing to 'address' |
lu | - | N | valueN, ..., value1 → ^(lexical_unit)$ | Pops 'N' values from the stack, creates a lexical unit ^...$ with them and pushes the lu back onto the stack |
mlu | - | N | luN, ..., lu1 → multiword | Pops 'N' lu from the stack, creates a multiword with them and pushes the multiword back onto the stack |
lu-count | - | N/A | [empty] → number | Pushes the number of lexical units (words inside the chunk) in the rule onto the stack |
chunk | - | N | elemN-2, ... , elem1, <tags>, name → ^name<tags>{elem1...elemN-2}$ | Pops 'N' amount of data from the stack, creates the chunk and pushes it back onto the stack |
out | - | N | valueN, ..., value1 → [empty] | Pops 'N' values from the stack and outputs them |
cmp | - | N/A | value2, value1 → result | Pops 'value1' and 'value2', compares them, if they are equal pushes a 1 (true), if they aren't pushes a 0 (false) |
cmpi | - | N/A | value2, value1 → result | Pops 'value1' and 'value2', compares them (ignoring case for each string), if they are equal pushes a 1 (true), if they aren't pushes a 0 (false) |
cmp-substr | - | N/A | value2, value1 → result | Tests if 'value1' contains the substring 'value2', result can be 1 (true) or 0 (false). |
cmpi-substr | - | N/A | value2, value1 → result | Tests if 'value1' contains the substring 'value2' (ignoring case for each string), result can be 1 (true) or 0 (false). |
not | - | N | value → result | Negates the value on top of the stack, 0 -> 1 or 1 -> 0 |
and | - | N | valueN, ..., value1 → result | And operation of 'N' values, result can be 1 (true) or 0 (false) |
or | - | N | valueN, ..., value1 → result | Or operation of 'N' values, result can be 1 (true) or 0 (false) |
in | - | N/A | list, value → result | Performs a search of a 'value' in a 'list' |
inig | - | N/A | list, value → result | Performs a search (ignoring case) of a 'value' in a 'list' |
jmp | - | label | [empty] → [empty] | Jumps to the label, unconditionally |
jz | - | label | top → [empty] | Jumps to the label if stack.top == 0 |
jnz | - | label | top → [empty] | Jumps to the label if stack.top == 1 |
call | - | label | N, argN, ..., arg1 → [empty] | Calls a macro with the arguments on the stack |
ret | - | N/A | [empty] → [empty] | Returns from a macro, PC will be handled automatically by the VM. |
nop | - | N/A | [empty] → [empty] | No operation |
case-of | - | N/A | container → case | Gets the case from the container in the stack. The container would usually be the result of a clip instruction but can be any string. |
get-case-from | - | N/A | pos → case | Gets the case from the lexical unit in position 'pos' |
modify-case | - | N/A | case, container → modifiedContainer | Modifies the case of the 'container' to 'case' and leaves the modified container on the stack |
begins-with | - | N/A | value2, value1 → result | Checks if 'value1' begins with 'value2' and pushes 1 (true) or 0 (false), 'value2' can be a list |
begins-with-ig | - | N/A | value2, value1 → result | Checks if 'value1' begins with 'value2' (ignoring the case) and pushes 1 (true) or 0 (false), 'value2' can be a list |
ends-with | - | N/A | value2, value1 → result | Checks if 'value1' ends with 'value2' and pushes 1 (true) or 0 (false), 'value2' can be a list |
ends-with-ig | - | N/A | value2, value1 → result | Checks if 'value1' ends with 'value2' (ignoring the case) and pushes 1 (true) or 0 (false), 'value2' can be a list |
- Lists are represented as a concatenation of items separated by '|', e.g. uno|otro|poco|cuánto|menos|mucho|tanto|demasiado
- The case is represented as "aa" (all lowercase), "Aa" (first uppercase) and "AA", (all uppercase).
Code generation[edit]
Code sections[edit]
The code generated by the compiler is divided in several sections. In addition, the VM reads and stores the code in its own sections.
Section | Code | VM's section | Information |
---|---|---|---|
Header | #<assembly> |
---- | This section establishes the type of code generated and the transfer stage. |
Initialisation | push "genere" |
Code section | In this section we initialize the variables with their default value and execute other initialisation code. At the end we jmp to the section rules section, although rules will only execute when a pattern is matched, we need to process all the patterns which are in the rules section. |
Macros | macro_firstWord_start: |
Macros code section | This section contains all the macro's code delimited by labels. Each macro can be called with the 'call' instruction. |
Patterns | section_rules_start: |
Preprocess code section | In this section all the patterns will be added to the system trie. In this example you can see that two patterns are pushed, then the number of patterns is pushed and finally the addtrie instruction pops them and adds an entry in the trie to the rule 0. |
Rules | action_0_start: |
Rules code section | Finally the rules section contains every rule delimited by its labels and all its code. |
- One line comments can be made by using the '#' symbol at the start of the line.
Code examples[edit]
Macro example[edit]
Transfer file (.t1x)[edit]
<transfer default="chunk">
<section-def-attrs>
<def-attr n="nbr">
<attr-item tags="sg"/>
<attr-item tags="pl"/>
<attr-item tags="sp"/>
<attr-item tags="ND"/>
</def-attr>
</section-def-attrs>
<section-def-vars>
<def-var n="nombre" v="&lt;sg&gt;"/>
<def-var n="genere" v="&lt;m&gt;"/>
</section-def-vars>
<section-def-macros>
<def-macro n="nombre_nom" npar="1">
<let>
<var n="nombre"/>
<lit v=""/>
</let>
<choose>
<when>
<test>
<and>
<equal>
<clip pos="1" side="sl" part="nbr"/>
<lit-tag v="sg"/>
</equal>
<equal>
<clip pos="1" side="tl" part="nbr"/>
<lit-tag v="pl"/>
</equal>
</and>
</test>
<let>
<var n="nombre"/>
<lit-tag v="pl_slsg"/>
</let>
</when>
<when>
<test>
<and>
<equal>
<clip pos="1" side="sl" part="nbr"/>
<lit-tag v="pl"/>
</equal>
<equal>
<clip pos="1" side="tl" part="nbr"/>
<lit-tag v="sg"/>
</equal>
</and>
</test>
<let>
<var n="nombre"/>
<lit-tag v="sg_slpl"/>
</let>
</when>
<otherwise>
<let>
<var n="nombre"/>
<clip pos="1" side="tl" part="nbr"/>
</let>
</otherwise>
</choose>
</def-macro>
</section-def-macros>
</transfer>
Code generated[edit]
#<assembly>
#<transfer default="chunk">
#<def-var v="<sg>" n="nombre">
push "nombre"
push "<sg>"
storev
#<def-var v="<m>" n="genere">
push "genere"
push "<m>"
storev
jmp section_rules_start
#<def-macro npar="1" n="nombre_nom">
macro_nombre_nom_start:
#<var n="nombre">
push "nombre"
#<lit v="">
push ""
storev
#<clip part="nbr" pos="1" side="sl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
clipsl
#<lit-tag v="sg">
push "<sg>"
cmp
#<clip part="nbr" pos="1" side="tl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
cliptl
#<lit-tag v="pl">
push "<pl>"
cmp
and 2
jz when_0_end
#<var n="nombre">
push "nombre"
#<lit-tag v="pl_slsg">
push "<pl_slsg>"
storev
jmp choose_0_end
when_0_end:
#<clip part="nbr" pos="1" side="sl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
clipsl
#<lit-tag v="pl">
push "<pl>"
cmp
#<clip part="nbr" pos="1" side="tl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
cliptl
#<lit-tag v="sg">
push "<sg>"
cmp
and 2
jz when_1_end
#<var n="nombre">
push "nombre"
#<lit-tag v="sg_slpl">
push "<sg_slpl>"
storev
jmp choose_0_end
when_1_end:
#<otherwise>
#<var n="nombre">
push "nombre"
#<clip part="nbr" pos="1" side="tl">
push 1
push "<sg>|<pl>|<sp>|<ND>"
cliptl
storev
choose_0_end:
macro_nombre_nom_end: ret
#<section-rules>
section_rules_start:
section_rules_end:
Rule's patterns example[edit]
Transfer file (.t1x)[edit]
<transfer default="chunk">
<section-def-cats>
<def-cat n="all">
<cat-item lemma="all" tags="predet.sp"/>
</def-cat>
<def-cat n="adj2">
<cat-item tags="adj"/>
<cat-item tags="adj.*"/>
<cat-item tags="adj.sint"/>
<cat-item tags="adj.sint.*"/>
<cat-item tags="adj.comp"/>
<cat-item tags="adj.sup"/>
</def-cat>
<def-cat n="nomcomu">
<cat-item tags="n.*"/>
</def-cat>
<def-cat n="nompropi">
<cat-item tags="np.*"/>
</def-cat>
<def-cat n="nploc">
<cat-item tags="np.loc.*"/>
</def-cat>
</section-def-cats>
<rule>
<pattern>
<pattern-item n="all"/>
<pattern-item n="adj2"/>
</pattern>
<action>
...
</action>
</rule>
<rule>
<pattern>
<pattern-item n="nomcomu"/>
<pattern-item n="nompropi"/>
<pattern-item n="nploc"/>
</pattern>
<action>
...
</action>
</rule>
</section-rules>
</transfer>
Code generated[edit]
#<assembly>
#<transfer default="chunk">
jmp section_rules_start
#<section-rules>
section_rules_start:
patterns_start:
push "all<predet><sp>"
push "<adj>|<adj><*>|<adj><sint>|<adj><sint><*>|<adj><comp>|<adj><sup>"
push 2
addtrie action_0_start
push "<n><*>"
push "<np><*>"
push "<np><loc><*>"
push 3
addtrie action_1_start
patterns_end:
action_0_start:
...
action_0_end:
action_1_start:
...
action_1_end:
section_rules_end:
Wishlist[edit]
- Make the compile options like with lt-comp/cg-comp (e.g. you don't need -i/-o to say what the input/output files are), just
apertium-compiler [options] <infile> <outfile>