Difference between revisions of "Apertium-recursive/Formalism"
Popcorndude (talk | contribs) |
Popcorndude (talk | contribs) |
||
(23 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
=== File Structure === |
|||
A proposal for a recursive transfer rule formalism. |
|||
A <code>.rtx</code> contains attribute categories, tag-order rules, tag-rewrite rules, and reduction rules. |
|||
=== Basic Rule Syntax === |
|||
Comments begin with exclamation points (<code>!</code>) and end at the end of the line. To include spaces in any name, either escape the space with a backslash (<code>\</code>) or enclose the name in double quotes (<code>"</code>). |
|||
Rules consist of a node type, an optional weight, a pattern, an optional condition, an optional variable setting, and an output, in that order. |
|||
=== Attribute Categories === |
|||
NP -> det n {2 _1 1}; |
|||
An attribute category can be defined like this: |
|||
This matches a determiner followed by a noun, combines them into an NP chunk, and at output time produces "noun determiner". |
|||
NP -> 1: n {1} | |
|||
2: n.*.def {the@det.def.sg _ 1}; |
|||
Here the first rule will match any noun, while the second will match a noun with a <code><def></code> tag. Since the second rule has a higher weight, the first rule will not be applied if they both match. |
|||
NP -> NP and@cnjcoo NP [$number=pl] {1 _1 2 _2 3}; |
|||
Here the rule specifies that the resulting chunk will be marked with a <code><pl></code> tag. |
|||
AP -> adj and@cnjcoo adj ?(1.gender/sl = 3.gender/sl) {1 _1 2 _2 3}; |
|||
This rule will not apply if the two adjectives have different genders. |
|||
The arrow can be written as either <code>-></code> or <code>→</code>. |
|||
The process by which rules are selected is described [[User:Popcorndude/Recursive_Transfer/Parser | here]]. |
|||
=== Attribute Lists === |
|||
A list of attributes can be defined like this: |
|||
gender = m f GD ; |
gender = m f GD ; |
||
number = sg pl ND ; |
number = sg pl ND ; |
||
An attribute |
An attribute category can also specify undefined and default values: |
||
gender = (GD m) m f GD; |
gender = (GD m) m f GD; |
||
This defines the <code>gender</code> category as before, but with the addition that if any rule tries to read the gender of a node that doesn't have a gender tag, the result will be <code><GD></code> rather than the empty string. It also states that any remaining <code><GD></code> tags will be replaced with <code><m></code> tags in the output step. |
This defines the <code>gender</code> category as before, but with the addition that if any rule tries to read the gender of a node that doesn't have a gender tag, the result will be <code><GD></code> rather than the empty string. It also states that any remaining <code><GD></code> tags will be replaced with <code><m></code> tags in the output step. |
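The undefined and default values described above can be modelled with a small Python sketch. This is only an illustration of the behaviour the text describes, not the compiler's implementation; the `GENDER` dictionary and the `read_attr` and `finalize` helpers are hypothetical names introduced here.

```python
# Hypothetical model of the category definition:  gender = (GD m) m f GD;
GENDER = {
    "values": ["m", "f", "GD"],
    "undefined": "GD",  # returned when a node has no gender tag
    "default": "m",     # replaces leftover undefined tags in the output step
}


def read_attr(tags, category):
    """Return the first tag that belongs to the category, else its undefined value."""
    for tag in tags:
        if tag in category["values"]:
            return tag
    # Categories without an undefined value fall back to the empty string.
    return category.get("undefined", "")


def finalize(tags, category):
    """In the output step, replace any remaining undefined tag with the default."""
    undefined = category.get("undefined")
    if undefined is None:
        return list(tags)
    return [category["default"] if tag == undefined else tag for tag in tags]
```

Under these assumptions, reading the gender of a noun tagged only `<n><sg>` yields `GD`, and a determiner that still carries `<GD>` at output time ends up with `<m>`.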
||
An attribute category can also specify certain values as non-overwritable. |
|||
gender = m f @mf; |
|||
This states that if a lexical unit has a target-language <code><mf></code> tag and a rule tries to replace that tag with something else, the <code><mf></code> tag will be used instead of the replacement. |
|||
An attribute category can include another: |
An attribute category can include another: |
||
Line 46: | Line 31: | ||
det_type = dem [definite] pos; |
det_type = dem [definite] pos; |
||
det_type = dem def ind pos; |
det_type = dem def ind pos; |
||
The name of an attribute category cannot be any of the following: |
|||
{| class=wikitable |
|||
|- |
|||
! Name |
|||
! Meaning |
|||
|- |
|||
| <code>lem</code> |
|||
| The lemma of an LU or chunk |
|||
|- |
|||
| <code>lemh</code> and <code>lemq</code> |
|||
| The first and second parts of a multiword with inner inflection. See [[Multiwords]], where lemh corresponds to the inflected portion and lemq is the portion in <g> |
|||
|- |
|||
| <code>tags</code> |
|||
| All of the tags of an LU or chunk |
|||
|- |
|||
| <code>pos_tag</code> |
|||
| The first tag of an LU or chunk |
|||
|- |
|||
| <code>whole</code>, <code>chname</code>, <code>chcontent</code>, and <code>content</code> |
|||
| Internal names, some of which are included primarily for compatibility with t*x rules. Do not use these directly |
|||
|} |
|||
=== Tag Order === |
=== Tag Order === |
||
Line 55: | Line 63: | ||
NP: _.number; |
NP: _.number; |
||
Where <code>_</code> represents |
Where <code>_</code> represents the part of speech tag. If the part of speech tag is different between the source and target languages, the target language one will be used. The lemma head is automatically appended at the beginning of the pattern and the lemma queue is automatically attached to the end. |
||
To specify a literal tag in a pattern, put it in angle brackets: |
To specify a literal tag in a pattern, put it in angle brackets: |
||
det: _.<def>.number; |
det: _.<def>.number; |
||
Which tag order to use is determined solely by the first tag in the pattern of the reduction rule. See the output section below for how to override this choice. See also the macro section for a more powerful version of these rules. |
|||
The underscore is not mandatory and is merely a shorthand for <code>pos_tag</code>. Thus a literal part of speech tag can be used instead. |
|||
=== Tag Rewrite Rules === |
|||
This is a way to convert certain sets of tags, either between two languages that have different sets of tenses, or between something like object agreement and number marking. |
|||
object_agr = o1sg o1pl o2sg o2pl o3sg o3pl ; |
|||
number = sg pl ; |
|||
person = p1 p2 p3 ; |
|||
object_agr > person: o1sg p1, o1pl p1, o2sg p2, o2pl p2, o3sg p3, o3pl p3 ; |
|||
object_agr > number: o1sg sg, o1pl pl, o2sg sg, o2pl pl, o3sg sg, o3pl pl ; |
|||
VP -> @v NP {2[number=1.object_agr] _1 1} ; |
|||
In this example, if the verb had <code><o2sg></code>, it would be converted to <code><sg></code> when it was set as the <code>number</code> attribute of the noun. |
|||
tense = farpst nearpst pst prs fut nonpst ; |
|||
tense > tense: farpst pst, nearpst pst, prs nonpst, fut nonpst ; |
|||
In this example, no explicit assignment needs to take place and the 4 tenses of the source language (<code>farpst, nearpst, prs, fut</code>) would be automatically converted to the 2 of the target language (<code>pst, nonpst</code>). |
|||
Converting from 4 to 3 with something like <pre>tense > tense: farpst pst, nearpst pst ;</pre> |
|||
will also work, the unchanged tags not needing to be explicitly mentioned. |
|||
Tag rewrite rules apply in the output step. When building the parse tree, only unconverted tags are used to create chunks. This makes conversions like the following one potentially dangerous: |
|||
tense > tense: midpst pst, pst pri; |
|||
If tense is being propagated down through multiple chunks, any <code><midpst></code> tags will get converted to <code><pst></code> and then converted again to <code><pri></code>. |
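The chained conversion can be illustrated with a short Python sketch. It assumes a rewrite rule behaves like a lookup table applied once per output step, with unlisted tags passing through unchanged; the `rewrite` function and the `SAFE` and `RISKY` tables are hypothetical names used only for this illustration.

```python
def rewrite(tag, rules):
    """Apply one pass of a rewrite rule; tags not listed pass through unchanged."""
    return rules.get(tag, tag)


# tense > tense: farpst pst, nearpst pst ;
SAFE = {"farpst": "pst", "nearpst": "pst"}

# tense > tense: midpst pst, pst pri;
RISKY = {"midpst": "pst", "pst": "pri"}

# One pass is what the rule author probably intended.
once = rewrite("midpst", RISKY)

# A second pass (for example, in a nested chunk) converts the result again.
twice = rewrite(once, RISKY)
```

Here `once` is `pst` and `twice` is `pri`. With `SAFE`, applying the table repeatedly is harmless because no output value also appears as an input value.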
|||
It is also possible to explicitly convert a value, for example when doing comparisons: |
|||
1.object_agr>number |
|||
1.object_agr>person |
|||
These will clip <code>object_agr</code> and convert it to <code>number</code> and <code>person</code> immediately, regardless of where they are evaluated. |
|||
Like attribute category definitions, tag rewrite rules can refer to entire output categories by enclosing them in square brackets. Since the replacement must result in a single value, this can only be done on the source side. |
|||
pasts = farpst midpst nearpst; |
|||
tense = farpst midpst nearpst past pres fut; |
|||
! These are equivalent: |
|||
tense > tense : [pasts] past; |
|||
tense > tense : farpst past, midpst past, nearpst past; |
|||
=== Reduction Rules === |
|||
Reduction rules consist of a node type, an optional name, an optional weight, a pattern, an optional condition, an optional variable setting, and an output, in that order. |
|||
NP -> det n {2 _1 1}; |
|||
This matches a determiner followed by a noun, combines them into an NP chunk, and at output time produces "noun determiner". |
|||
NP -> 1: n {1} | |
|||
2: n.*.def {the@det.def.sg _ 1}; |
|||
Here the first rule will match any noun, while the second will match a noun with a <code><def></code> tag. Since the second rule has a higher weight, the first rule will not be applied if they both match. |
|||
NP -> NP and@cnjcoo NP [$number=pl] {1 _1 2 _2 3}; |
|||
Here the rule specifies that the resulting chunk will be marked with a <code><pl></code> tag. |
|||
AP -> adj and@cnjcoo adj ?(1.gender/sl = 3.gender/sl) {1 _1 2 _2 3}; |
|||
This rule will not apply if the two adjectives have different genders. |
|||
The arrow can be written as either <code>-></code> or <code>→</code>. |
|||
A name is written in double quotes before the weight. |
|||
DP -> "de > gen" NP de@pr NP { 3 + 's@gen _1 1 } ; |
|||
If the first lemma is quoted and no weight is present, it may be interpreted as a name. This issue can be dealt with by adding an explicit 0 weight. |
|||
! Don't write this: |
|||
VP -> "be# taller than"@v NP { 2 _1 1 } ; |
|||
! Write this instead: |
|||
VP -> 0: "be# taller than"@v NP { 2 _1 1 } ; |
|||
The process by which rules are selected is described [[User:Popcorndude/Recursive_Transfer/Parser | here]]. |
|||
==== Multiple Outputs ==== |
|||
A rule can have multiple outputs, but any non-chunk output cannot be conditioned. A rule with multiple outputs is useful for treating certain tokens as if they occurred in a different order. To write such a rule, list multiple nodes before the arrow and wrap multiple outputs in another set of curly braces (<code>{}</code>). |
|||
DP clitic -> det clitic NP { { 1[number=3.number] _1 3 } _2 2 } ; |
|||
DP -> det NP { 1[number=2.number] _1 2 } ; |
|||
Here the first rule is essentially equivalent to moving the clitic later in the input stream and then applying the second rule. |
|||
Multi-output rules are closely related to [[#Interpolation|interpolation]], which is described below. |
|||
=== Patterns === |
=== Patterns === |
||
Line 114: | Line 220: | ||
That is, treat the gender of the phrase as masculine unless both elements are feminine and the number as singular unless the conjunction is "or" and both elements are singular. |
That is, treat the gender of the phrase as masculine unless both elements are feminine and the number as singular unless the conjunction is "or" and both elements are singular. |
||
These values can also be conditioned, condensing the above rules to: |
|||
NP -> NP cnjcoo NP [$gender=(if (1.gender = f and 3.gender = f) f else m), |
|||
$number=(if (2.lem =cl "or" and 1.number = sg and 3.number = sg) sg else pl)] |
|||
{1 _1 2 _2 3} ; |
|||
The pattern only looks at the source language, but it is possible to add constraints: |
The pattern only looks at the source language, but it is possible to add constraints: |
||
Line 136: | Line 248: | ||
| <pre>.$x</pre> |
| <pre>.$x</pre> |
||
| When building the output chunk for this rule, the value of the <code>x</code> attribute should come from this element |
| When building the output chunk for this rule, the value of the <code>x</code> attribute should come from this element |
||
|- |
|||
| <pre>.*</pre> |
|||
| 0 or more arbitrary tags. Note: this contrasts with other places in the pipeline where <code>*</code> must match at least 1 tag. A final <code>.*</code> is automatically appended to every pattern |
|||
|} |
|} |
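As a rough illustration of how `.*` and the automatically appended final `.*` behave, the following Python sketch translates a tag pattern into a regular expression over a tag sequence. It is a simplified model that handles only literal tags and `*`; lemmas, `.$x` and bracketed tag sets are deliberately left out, and `pattern_to_regex` and `matches` are hypothetical helper names.

```python
import re


def pattern_to_regex(pattern):
    """Translate a dotted tag pattern such as 'n.*.def' into a compiled regex."""
    parts = pattern.split(".")
    # A final .* is appended to every pattern automatically.
    if parts[-1] != "*":
        parts.append("*")
    pieces = []
    for part in parts:
        if part == "*":
            # Zero or more arbitrary tags.
            pieces.append(r"(?:<[^<>]+>)*")
        else:
            # A literal tag.
            pieces.append("<" + re.escape(part) + ">")
    return re.compile("".join(pieces))


def matches(pattern, tags):
    """Check whether a list of tags satisfies the pattern from its first tag on."""
    tag_string = "".join("<%s>" % tag for tag in tags)
    return pattern_to_regex(pattern).fullmatch(tag_string) is not None
```

With this model, `n.*.def` accepts `<n><def>` (zero tags in between) as well as `<n><m><def><sg>`, but rejects `<n><m><sg>`.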
||
Line 153: | Line 268: | ||
! the first input element |
! the first input element |
||
1 |
1[gender=f] |
||
! the first input element with the gender tag <f> |
! the first input element with the gender tag <f> |
||
1 |
1[gender=2.gender/ref] |
||
! the first input element with the gender tag of the reference side of the second input element |
! the first input element with the gender tag of the reference side of the second input element |
||
1 |
1[gender=$gender] |
||
! the first input element with the gender tag set to a placeholder to be filled on output with the gender tag of its parent chunk |
! the first input element with the gender tag set to a placeholder to be filled on output with the gender tag of its parent chunk |
||
Line 166: | Line 281: | ||
These elements can be conjoined using +: |
These elements can be conjoined using +: |
||
1 |
1[gender=f] + 2 |
||
This will generate something like <code>^blah<n><f>+bloop<adj>$</code>. |
This will generate something like <code>^blah<n><f>+bloop<adj>$</code>. |
||
Conjoining is currently disallowed if one side is in an if statement and the other is not. It is thus also disallowed if the tag-order rule for either element is a macro. |
|||
By default, the order of the output tags is based on the output pattern corresponding to the part of speech tag in the pattern. However, it is possible to override this using square brackets: |
|||
By default, the order of the output tags is based on the output pattern corresponding to the part of speech tag in the pattern. However, it is possible to override this using parentheses: |
|||
vblex: _.tense.person.number; |
vblex: _.tense.person.number; |
||
Line 178: | Line 295: | ||
! result: ^whatever<vblex><inf><{person}><{number}>$ |
! result: ^whatever<vblex><inf><{person}><{number}>$ |
||
V -> vblex.inf {1 |
V -> vblex.inf {1(vbinf)}; |
||
! result: ^whatever<vblex><inf>$ |
! result: ^whatever<vblex><inf>$ |
||
Note that the part of speech tag of the output is in all cases the part of speech tag of the input. To avoid this behavior (for example, if you want to change the part of speech tag), write an output rule like the following: |
|||
adj: lemh.<adj>.number; |
|||
==== Literal Lexical Units ==== |
==== Literal Lexical Units ==== |
||
Line 198: | Line 311: | ||
the@det.def.[2.gender].[3.number/sl] |
the@det.def.[2.gender].[3.number/sl] |
||
Literal lexical units can also be constructed via the same syntax as matched elements, but with a lemma rather than a number. |
|||
the(det)[gender=2.gender, number=2.number] |
|||
When constructed in this way, the tag-order specification is mandatory. |
|||
==== Output Conditionals ==== |
==== Output Conditionals ==== |
||
Line 301: | Line 420: | ||
((1.number = du) and (1.lem/tl in_caseless footwear)) |
((1.number = du) and (1.lem/tl in_caseless footwear)) |
||
! note that "in-case-less", "incl", "IN-cl", and "__IN_CASE_LESS__" would all also work here. |
! note that "in-case-less", "incl", "IN-cl", and "__IN_CASE_LESS__" would all also work here. |
||
=== Tag Rewrite Rules === |
|||
This is a way to convert certain sets of tags, either between two languages that have different sets of tenses, or between something like object agreement and number marking. |
|||
object_agr = o1sg o1pl o2sg o2pl o3sg o3pl ; |
|||
number = sg pl ; |
|||
person = p1 p2 p3 ; |
|||
object_agr > person: o1sg p1, o1pl p1, o2sg p2, o2pl p2, o3sg p3, o3pl p3 ; |
|||
object_agr > number: o1sg sg, o1pl pl, o2sg sg, o2pl pl, o3sg sg, o3pl pl ; |
|||
VP -> @v NP {2(number=1.object_agr) _1 1} ; |
|||
In this example, if the verb had <code><o2sg></code>, it would be converted to <code><sg></code> when it was set as the <code>number</code> attribute of the noun. |
|||
tense = farpst nearpst pst prs fut nonpst ; |
|||
tense > tense: farpst pst, nearpst pst, prs nonpst, fut nonpst ; |
|||
In this example, no explicit assignment needs to take place and the 4 tenses of the source language (<code>farpst, nearpst, prs, fut</code>) would be automatically converted to the 2 of the target language (<code>pst, nonpst</code>). |
|||
Converting from 4 to 3 with something like <pre>tense > tense: farpst pst, nearpst pst ;</pre> |
|||
will also work, the unchanged tags not needing to be explicitly mentioned. |
|||
When an attribute category is being mapped to itself, such as in the tense example above, the replacement is always performed. As a result, if a tag appears on the left side of a change and the right side of another, the results may be incorrect. For example: |
|||
tense > tense: midpst pst, pst pri; |
|||
This rule might convert <code><midpst></code> to either <code><pst></code> or <code><pri></code> in different situations. |
|||
However, when a rule maps between different categories, as in the object agreement example, the transformation will not happen invisibly. That is, if you have <code>1</code> in the output, a <code>tense > tense</code> conversion will happen, but an <code>object_agr > number</code> one won't. This is because the compiler does not have enough information to know what attributes that node has which can be clipped and thus does not know what it is converting from. |
|||
In order for this to be fully automatic, the <code>number</code> element in the relevant output pattern would have to compile to something which checked <code>number</code> and then every attribute that could map to <code>number</code> until it found one. While this behavior could be added if desired, I initially deemed it too complicated and simply required that in such situations the rule author has to write <code>1(number=1.object_agr)</code> to trigger the <code>object_agr > number</code> conversion. |
|||
It is also possible to explicitly convert a value, for example when doing comparisons: |
|||
1.object_agr>number |
|||
1.object_agr>person |
|||
Like attribute category definitions, tag rewrite rules can refer to entire output categories by enclosing them in square brackets. Since the replacement must result in a single value, this can only be done on the source side. |
|||
pasts = farpst midpst nearpst; |
|||
tense = farpst midpst nearpst past pres fut; |
|||
! These are equivalent: |
|||
tense > tense : [pasts] past; |
|||
tense > tense : farpst past, midpst past, nearpst past; |
|||
=== Macros === |
=== Macros === |
||
Line 360: | Line 431: | ||
det: (if (1.det_type = dem) |
det: (if (1.det_type = dem) |
||
1 |
1(det_dem) |
||
else |
else |
||
1 |
1(det_def) |
||
); |
); |
||
Here we define a "det" pattern which will apply the "det_dem" pattern or the "det_def" pattern to its argument based on whether that argument has a <code><dem></code> tag. Since this is a tag order pattern it will be applied to all <code><det></code>s by default and can also be manually applied to other things with the <code>3 |
Here we define a "det" pattern which will apply the "det_dem" pattern or the "det_def" pattern to its argument based on whether that argument has a <code><dem></code> tag. Since this is a tag order pattern it will be applied to all <code><det></code>s by default and can also be manually applied to other things with the <code>3(det)</code> syntax. |
||
Macros are only allowed to clip from the input node (referred to as <code>1</code>), including any values passed in. |
Macros are only allowed to clip from the input node (referred to as <code>1</code>), including any values passed in. |
||
If a macro specifies a value for an attribute, it will override anything that is passed in. Thus if the above example had <code>1 |
If a macro specifies a value for an attribute, it will override anything that is passed in. Thus if the above example had <code>1(det_dem)[distance=prx]</code> rather than <code>1(det_dem)</code>, invoking it as <code>2(det)[distance=dist]</code> and as <code>2(det)[distance=med]</code> would make no difference and the output would have <code><prx></code> regardless. |
||
A macro is not required to function as a tag order pattern and may output anything or nothing, so long as it only accesses attributes of the input node. |
A macro is not required to function as a tag order pattern and may output anything or nothing, so long as it only accesses attributes of the input node. |
||
Line 380: | Line 451: | ||
[a@det.ind.sp _] |
[a@det.ind.sp _] |
||
else [] ); |
else [] ); |
||
DP -> n { * |
DP -> n { *(maybe_det)[number=1.number, definite=$definite] 1 }; |
||
This will insert "the" if the DP is definite, "a" if it's indefinite and singular, and will output only the noun otherwise. |
This will insert "the" if the DP is definite, "a" if it's indefinite and singular, and will output only the noun otherwise. |
||
Line 386: | Line 457: | ||
Since a macro needs to be contained in a conditional but not all macros are conditional, they permit the keyword <code>always</code> in addition to <code>if</code>, <code>else</code>, etc. |
Since a macro needs to be contained in a conditional but not all macros are conditional, they permit the keyword <code>always</code> in addition to <code>if</code>, <code>else</code>, etc. |
||
vaux: (always 1 |
vaux: (always 1(vblex)); |
||
This macro is essentially an alias of <code>vblex</code>. |
This macro is essentially an alias of <code>vblex</code>. |
||
=== Interpolation |
=== Interpolation === |
||
Sometimes it is necessary to insert words into existing nodes, such as when generating certain clitics. |
|||
Parsing clitics, such as [[User_talk:Popcorndude/Recursive_Transfer#Serbo-Croatian_clitics]] can be done using multiple output units |
|||
NP -> adj n { 2 _1 1 } ; |
|||
DP -> det NP (if ($lu-count = "2") |
|||
{ 1 _1 2 } |
|||
! should be able to handle "noun clitic determiner" |
|||
else |
|||
{ 1 _ >3 _1 2 } ) ; |
|||
VP -> DP v.pprs { 1 < be(vaux) _1 2 } ; |
|||
These rules represent a scenario where target language present progressive is marked with a clitic which is placed between a determiner and noun phrase. At the <code>VP</code> level, the clitic is created and inserted into the <code>DP</code> with a less-than sign (<code><</code>). Then at the <code>DP</code> level, the rule checks whether anything has been inserted by checking whether the value of <code>$lu-count</code> is 2, which it would be if nothing had been inserted. If <code>$lu-count</code> is not 2, then the inserted item is output in the appropriate place. |
|||
Outputting them, however, is more difficult. My current idea is to do something like this: |
|||
The inserted value is referred to as <code>>3</code> rather than <code>3</code> to tell the compiler that it is an inserted value so as to prevent error messages about trying to access a node that doesn't exist. |
|||
NP -> @det @n {2 _1 1}; |
|||
VP -> NP @vbser {(_1 2)>1}; |
|||
Some possible input and output from these rules (written monolingually for simplicity): |
|||
Where <code>(_1 2)>1</code> means "put the space between the elements and element 2 after the first word of element 1". The corresponding syntax for a right-aligned clitic would be <code>1<(2 _1)</code>. New lexical units could also be put in the parentheses (even if there's only one thing being inserted, the parentheses should, I think, be mandatory for clarity). |
|||
^the<det>$ ^green<adj>$ ^frog<n>$ ^speak<v><pprs>$ |
|||
I'm not sure whether this will cover all cases, but it should at least cover a lot of them. |
|||
the NP[green frog] speak |
|||
DP[the NP[green frog]] speak |
|||
VP[DP[the NP[green frog]] speak] |
|||
DP[the NP[green frog] be] speak |
|||
the be NP[green frog] speak |
|||
the be frog green speak |
|||
^the<det>$ ^be<vaux>$ ^frog<n>$ ^green<adj>$ ^speak<v>$ |
|||
This functions as the inverse of [[#Multiple_Outputs|rules with multiple outputs]]. The reverse of the above rules could be something like this: |
|||
NP -> n adj { 2 _1 1 } ; |
|||
DP -> det NP { 1 _1 2 } ; |
|||
DP vaux -> det vaux NP { { 1 _1 3 } _2 2 } ; |
|||
VP -> DP vaux v { 1 _1 3[tense=pprs] } ; |
|||
Multi-output rules take things inside and move them out, while interpolation takes things outside and moves them in. |
|||
=== Global Variables === |
|||
For passing nodes up and down a tree, an alternative to multi-output rules and interpolation is global variables. Global variables are referred to with double dollar signs and are set in the attribute literal section of a rule. |
|||
VP -> %vblex DP.$itg [$$wh_word=(if (2.itg = itg) 2)] { 1 (if (2.itg not = itg) [ _1 2 ]) } ; |
|||
The value of this variable can then be included in the output step of any rule. |
|||
S -> DP.nom VP { (if (2.itg = itg) [ $$wh_word _ ] ) 1 _1 2 } ; |
|||
If a rule attempts to output an unset variable, the result will be no output. All variables are reset at the end of the output step. |
|||
=== Clips === |
|||
When clipping a tag or lemma from a lexical unit in the input stream, <code>/sl</code> refers to source language, <code>/tl</code> refers to target language, and <code>/ref</code> refers to the output of apertium-anaphora. Chunks, meanwhile, have only 1 side, which is <code>/tl</code>. If the side is left unspecified, then <code>/tl</code> will be clipped. However, if an LU is being clipped from and the value for <code>/tl</code> is empty or is the unspecified value for that attribute category, it will try again with <code>/ref</code> and then with <code>/sl</code>. |
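The fallback order for unspecified sides can be sketched in Python as follows. This is a hedged model of the paragraph above, not the actual implementation: the `clip` function, the dictionary representation of a lexical unit, and the choice to return the undefined value when every side fails are all assumptions made for illustration.

```python
def clip(lu, attr_values, side=None, undefined=""):
    """Clip an attribute from a lexical unit.

    lu          -- dict mapping a side name ("sl", "tl", "ref") to a list of tags
    attr_values -- the set of tags that belong to the attribute category
    side        -- explicit side, or None to use the fallback order
    undefined   -- the category's undefined value ("" if it has none)
    """
    def read(side_name):
        for tag in lu.get(side_name, []):
            if tag in attr_values:
                return tag
        return ""

    if side is not None:
        # An explicit side is read directly, with no fallback.
        return read(side) or undefined

    # Unspecified side: try /tl first, then /ref, then /sl.
    for side_name in ("tl", "ref", "sl"):
        value = read(side_name)
        if value and value != undefined:
            return value
    # Assumption: if nothing usable is found, report the undefined value.
    return undefined
```

For example, clipping `number` from `^the<det><def><sp>/el<det><def><GD><ND>$` with `ND` as the undefined value would skip the target side, find no reference side, and return `sp` from the source side.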
|||
Almost anywhere that a clip or a literal value is used as a value, it can be replaced with an if statement using the same syntax as output conditionals and macros. |
|||
VP -> v vaux [$negative=(if (1.negative = neg or 2.negative = neg) neg else pos)] |
|||
{ 2 _1 1[tense=(if (2.lem in verb_ing) pprs else inf)] } ; |
|||
The one exception is embedding an if statement inside a conditional: <code>(x in (if ...))</code>. |
|||
=== Brackets === |
|||
A summary of what each bracket type means in each context: |
|||
{| class="wikitable" |
|||
|- |
|||
! Bracket |
|||
! General Meaning |
|||
! Uses |
|||
! Examples |
|||
! Comments |
|||
|- |
|||
|rowspan="6"| <code>()</code> |
|||
|rowspan="6"| Condition |
|||
| If statement |
|||
| <code>(if ...)</code> |
|||
| |
|||
|- |
|||
| Condition |
|||
| <code>(a in b)</code> |
|||
| in an if statement |
|||
|- |
|||
| Pattern condition |
|||
| <code>NP NP ?(1.case = 2.case)</code> |
|||
| |
|||
|- |
|||
| Pattern override |
|||
| <code>1(vb_impers)</code> |
|||
| use the vb_impers output rule rather than the output rule chosen based on the pattern |
|||
|- |
|||
| Macro invocation |
|||
| <code>*(maybe_det)</code> |
|||
| |
|||
|- |
|||
| Attribute defaults |
|||
| <code>gender = (GD m) m f mf GD;</code> |
|||
| |
|||
|- |
|||
|rowspan="6"| <code>[]</code> |
|||
|rowspan="6"| List |
|||
| Rule variable setting |
|||
| <code>[$tense=past, $number=sg]</code> |
|||
| |
|||
|- |
|||
| Node variable setting |
|||
| <code>1[tense=2.tense, number=2.number]</code> |
|||
| |
|||
|- |
|||
| Set inclusion |
|||
| <code>tense = [finite] inf ger;</code> |
|||
| tense is composed of <inf>, <ger> and everything in finite |
|||
|- |
|||
| Group tag rewriting |
|||
| <code>poss > number : [poss_sg] sg, [poss_pl] pl;</code> |
|||
| |
|||
|- |
|||
| Grouping in if statements |
|||
| <code>(if (whatever) [1 _ 2])</code> |
|||
| |
|||
|- |
|||
| Pattern tag sets |
|||
| <code>NP.*.[case_not_nomacc]</code> |
|||
| |
|||
|- |
|||
|rowspan="2"| <code>{}</code> |
|||
|rowspan="2"| Chunk |
|||
| Output |
|||
| <code>NP -> n { 1 } ;</code> |
|||
| |
|||
|- |
|||
| Chunk |
|||
| <code>NP clitic -> det clitic NP { { 1 _ 3 } _ 2 } ;</code> |
|||
| |
|||
|} |
|||
== Correspondence with t*x == |
== Correspondence with t*x == |
||
Line 419: | Line 609: | ||
NP -> adj n.$number ?(1.number = 2.number) |
NP -> adj n.$number ?(1.number = 2.number) |
||
(if (1.lem/tl incl pre_adj) |
(if (1.lem/tl incl pre_adj) |
||
{1 |
{1[gender=2.gender] _1 2} |
||
else |
else |
||
{2 _1 1 |
{2 _1 1[gender=2.gender]} |
||
) ; |
) ; |
||
Line 630: | Line 820: | ||
| |
| |
||
|- |
|- |
||
| <pre>1 |
| <pre>1[gender=2.gender]</pre> |
||
| <pre> |
| <pre> |
||
<lu> |
<lu> |
||
Line 650: | Line 840: | ||
|} |
|} |
||
Technically this would compile to a rule which |
Technically this would compile to a rule which outputs an <code>NP</code> chunk containing the input unchanged and also a separate postchunk rule that would do the actual rearranging so that the conditionals can depend on changed values of the chunk tags. |
||
== Example == |
|||
A version of this example with pictures can be found at https://github.com/apertium/apertium-recursive/blob/master/docs/Hobbit_Example.pdf |
|||
=== Initial Sentence === |
|||
In a hole in the ground there lived a Hobbit. |
|||
=== Output of eng-spa-lex === |
|||
^In<pr>/En<pr>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^hole<n><sg>/agujero<n><m><sg>$ ^in<pr>/en<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^ground<n><sg>/tierra<n><f><sg>$ ^there<adv>/allí<adv>$ ^live<vblex><past>/vivir<vblex><past>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^Hobbit<n><sg>/Hobbit<n><m><sg>$^.<sent>/.<sent>$^.<sent>/.<sent>$ |
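To make the stream format above easier to read, here is a minimal Python sketch that splits such a stream into source-side and target-side lemmas and tags. It is an assumption-laden illustration: it ignores escaped characters, reference sides and chunks, and `parse_side` and `parse_stream` are hypothetical helpers rather than part of any Apertium tool.

```python
import re

# One lexical unit is delimited by ^ and $.
LU_RE = re.compile(r"\^(.*?)\$")
# One tag is delimited by < and >.
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_side(text):
    """Split 'lemma<tag1><tag2>' into (lemma, [tag1, tag2])."""
    lemma = text.split("<", 1)[0]
    return lemma, TAG_RE.findall(text)


def parse_stream(stream):
    """Return a list of ((sl_lemma, sl_tags), (tl_lemma, tl_tags)) pairs."""
    units = []
    for body in LU_RE.findall(stream):
        # The source side comes before the first slash, the target side after it.
        source, _, target = body.partition("/")
        units.append((parse_side(source), parse_side(target)))
    return units
```

Applied to the first two tokens of the sentence, this would give `In`/`En` with the tag `pr` on both sides, and `a`/`uno` where only the target side carries `GD`.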
|||
=== A Simple Set of Rules === |
|||
gender = m f; |
|||
number = (ND sg) sg pl ND; |
|||
definite = def ind; |
|||
tense = past pres ifi; |
|||
person = (PD p3) p1 p2 p3 PD; |
|||
tense > tense : past ifi; |
|||
n: _.gender.number; |
|||
det: _.definite.gender.number; |
|||
pr: _; |
|||
vblex: _.tense.person.number; |
|||
adv: _; |
|||
NP: _.gender.number; |
|||
DP: _.gender.number; |
|||
PP: _; |
|||
VP: _.tense.person.number; |
|||
NP -> %n { 1 } | |
|||
10: %n PP { 1 _1 2 } ; |
|||
PP -> pr DP { 1 _1 2 } ; |
|||
DP -> det %NP { 1[gender=2.gender, number=2.number] _1 2 } ; |
|||
VP -> %vblex DP { 1[tense=$tense, person=$person, number=$number] _1 2 } | |
|||
adv %VP (if (1.lem/sl = there) |
|||
{ %2 } |
|||
else |
|||
{ 1 _1 %2 } ) | |
|||
PP %VP { 1 _1 %2 } ; |
|||
=== Process === |
|||
{| class="wikitable" |
|||
|- |
|||
! Action |
|||
! Result |
|||
! Comments |
|||
|- |
|||
| Read token |
|||
| <ol> |
|||
<li>^In<pr>/En<pr>$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li></ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol> |
|||
| |
|||
|- |
|||
| Split |
|||
| <ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol> |
|||
| Rule 1 (<code>NP -> n</code>) could apply, but it's possible that reading more of the input would make it so rule 2 (<code>NP -> n PP</code>) could apply, so we do both. |
|||
|- |
|||
| Apply rule 1 (<code>NP -> n</code>) in the first branch |
|||
| <ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$</li></ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol> |
|||
| Since the rule says <code>%n</code>, the required NP tags (gender and number) are filled in with the values of the noun tags. |
|||
|- |
|||
| Apply rule 4 (<code>DP -> det NP</code>) in the first branch |
|||
| <ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$</li></ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol> |
|||
| Note that the determiner still has GD as its gender. Child tags are not modified until the output step. |
|||
|- |
|||
| Apply rule 3 (<code>PP -> pr DP</code>) in the first branch |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li></ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
</ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
<li>^the<det><def><sp>/el<det><def><GD><ND>$</li> |
|||
</ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
<li>^the<det><def><sp>/el<det><def><GD><ND>$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
<li>^the<det><def><sp>/el<det><def><GD><ND>$</li> |
|||
<li>^ground<n><sg>/tierra<n><f><sg>$</li> |
|||
</ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
<li>^the<det><def><sp>/el<det><def><GD><ND>$</li> |
|||
<li>^ground<n><sg>/tierra<n><f><sg>$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 1 (<code>NP -> n</code>) in both branches |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
<li>^the<det><def><sp>/el<det><def><GD><ND>$</li> |
|||
<li>^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
<li>^the<det><def><sp>/el<det><def><GD><ND>$</li> |
|||
<li>^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$</li> |
|||
</ol> |
|||
| This time the next word is an adverb, rather than a preposition, so no splitting occurs and the rule is applied in each branch. |
|||
|- |
|||
| Apply rule 4 (<code>DP -> det NP</code>) in both branches |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
<li>^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li> |
|||
<li>^in<pr>/en<pr>$</li> |
|||
<li>^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 3 (<code>PP -> pr DP</code>) in both branches |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^hole<n><sg>/agujero<n><m><sg>$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 2 (<code>NP -> n PP</code>) in the second branch |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| Note that rule 2 has a weight attached to it, so now the second branch is weighted. |
|||
|- |
|||
| Apply rule 4 (<code>DP -> det NP</code>) in the second branch |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^In<pr>/En<pr>$</li> |
|||
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 3 (<code>PP -> pr DP</code>) in the second branch |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^Hobbit<n><sg>/Hobbit<n><m><sg>$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^Hobbit<n><sg>/Hobbit<n><m><sg>$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 1 (<code>NP -> n</code>) in both branches |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li> |
|||
<li>^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 4 (<code>DP -> det NP</code>) in both branches |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^live<vblex><past>/vivir<vblex><past>$</li> |
|||
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 5 (<code>VP -> vblex DP</code>) in both branches |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^there<adv>/allí<adv>$</li> |
|||
<li>^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$</li> |
|||
</ol> |
|||
| VP wants tense, person, and number tags. The verb supplies tense, but it doesn't have person or number tags, so the defaults are used instead. |
|||
|- |
|||
| Apply rule 6 (<code>VP -> adv VP</code>) in both branches |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 7 (<code>VP -> PP VP</code>) in the first branch |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li> |
|||
<li>^unknown<VP><past><PD><ND>{^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$ ^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 7 (<code>VP -> PP VP</code>) in the first branch |
|||
| <ol><li>^unknown<VP><past><PD><ND>{^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$ ^unknown<VP><past><PD><ND>{^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$ ^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply rule 7 (<code>VP -> PP VP</code>) in the second branch |
|||
| <ol><li>^unknown<VP><past><PD><ND>{^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$ ^unknown<VP><past><PD><ND>{^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$ ^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$}$}$</li> |
|||
</ol> |
|||
<hr/> |
|||
Weight: 10 |
|||
<ol><li>^unknown<VP><past><PD><ND>{^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$ ^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Prune branches |
|||
| Weight: 10 |
|||
<ol><li>^unknown<VP><past><PD><ND>{^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$ ^unknown<VP><past><PD><ND>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$}$</li> |
|||
</ol> |
|||
| No rules begin with VP, so it's time to output. Both branches have the same number of trees (1), but the second one has the higher weight (10), so the first one is discarded and we output the second one. |
|||
|- |
|||
| Apply output side of rule 7 (<code>VP -> PP VP</code>) |
|||
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| At output, the unspecified tags PD and ND are replaced with the defaults p3 and sg. |
|||
|- |
|||
| Apply output side of rule 3 (<code>PP -> pr DP</code>) |
|||
| <ol><li>^En<pr>$</li> |
|||
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Output first word |
|||
| <ol> |
|||
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| The preposition wasn't built by a rule, so we just write it to the output stream. |
|||
|- |
|||
| Apply output side of rule 4 (<code>DP -> det NP</code>) |
|||
| <ol> |
|||
<li>^uno<det><ind><m><sg>$</li> |
|||
<li>^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| Here the gender and the number of NP are copied to the determiner. |
|||
|- |
|||
| Output first word |
|||
| <ol> |
|||
<li>^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$ ^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply output side of rule 2 (<code>NP -> n PP</code>) |
|||
| <ol> |
|||
<li>^agujero<n><m><sg>$</li> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Output first word |
|||
| <ol> |
|||
<li>^unknown<PP>{^in<pr>/en<pr>$ ^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply output side of rule 3 (<code>PP -> pr DP</code>) |
|||
| <ol> |
|||
<li>^en<pr>$</li> |
|||
<li>^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Output first word |
|||
| <ol> |
|||
<li>^unknown<DP><f><sg>{^the<det><def><sp>/el<det><def><GD><ND>$ ^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply output side of rule 4 (<code>DP -> det NP</code>) |
|||
| <ol> |
|||
<li>^el<det><def><f><sg>$</li> |
|||
<li>^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| Once again we copy the gender and number of the NP to the determiner. |
|||
|- |
|||
| Output first word |
|||
| <ol> |
|||
<li>^unknown<NP><f><sg>{^ground<n><sg>/tierra<n><f><sg>$}$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply output side of rule 1 (<code>NP -> n</code>) |
|||
| <ol> |
|||
<li>^tierra<n><f><sg>$</li> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Output first word |
|||
| <ol> |
|||
<li>^unknown<VP><past><p3><sg>{^there<adv>/allí<adv>$ ^unknown<VP><past><PD><ND>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply output side of rule 6 (<code>VP -> adv VP</code>) |
|||
| <ol> |
|||
<li>^unknown<VP><past><p3><sg>{^live<vblex><past>/vivir<vblex><past>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$}$</li> |
|||
</ol> |
|||
| Since the source language lemma of the adverb is "there", we take the first clause of the if statement and only output the VP, which takes all its tags from the parent chunk. |
|||
|- |
|||
| Apply output side of rule 5 (<code>VP -> vblex DP</code>) |
|||
| <ol> |
|||
<li>^vivir<vblex><past><p3><sg>$</li> |
|||
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$</li> |
|||
</ol> |
|||
| As with the previous line, the verb gets all its tags from the parent chunk, but in this rule we've explicitly listed them. |
|||
|- |
|||
| Output first word |
|||
| <ol> |
|||
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply output side of rule 4 (<code>DP -> det NP</code>) |
|||
| <ol> |
|||
<li>^uno<det><ind><m><sg>$</li> |
|||
<li>^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Output first word |
|||
| <ol> |
|||
<li>^unknown<NP><m><sg>{^Hobbit<n><sg>/Hobbit<n><m><sg>$}$</li> |
|||
</ol> |
|||
| |
|||
|- |
|||
| Apply output side of rule 1 (<code>NP -> n</code>) |
|||
| <ol><li>^Hobbit<n><m><sg>$</li></ol> |
|||
| |
|||
|- |
|||
| Output first word |
|||
| |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^.<sent>/.<sent>$</li></ol> |
|||
| |
|||
|- |
|||
| Output first word |
|||
| |
|||
| No rules apply to punctuation in this example, so we just immediately output it when we see it. |
|||
| |
|||
|- |
|||
| Read token |
|||
| <ol><li>^.<sent>/.<sent>$</li></ol> |
|||
| |
|||
|- |
|||
| Output first word |
|||
| |
|||
| |
|||
|} |
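The "Prune branches" step of the trace can be sketched in Python. This is only an illustration, not the actual apertium-recursive code: <code>Branch</code> and <code>prune</code> are hypothetical names, treating an unweighted branch as weight 0 is an assumption, and so is preferring fewer trees before comparing weights (the trace only shows the tie-break on weight).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Branch:
    """One parse branch: its top-level trees and its accumulated weight."""
    trees: List[str]
    weight: float = 0.0  # assumption: a branch with no weighted rule counts as 0


def prune(branches: List[Branch]) -> Branch:
    """Keep the branch with the fewest trees; break ties by higher weight.

    Assumption: fewer top-level trees means a more complete parse.
    """
    return min(branches, key=lambda b: (len(b.trees), -b.weight))


# The two final branches of the trace, abbreviated to their bracket structure.
first = Branch(trees=["VP{PP VP{PP VP{adv VP}}}"])
second = Branch(trees=["VP{PP{pr DP{det NP{n PP}}} VP{adv VP}}"], weight=10.0)
best = prune([first, second])
```

Here <code>best</code> is the second branch, matching the trace: both branches hold one tree, so the weight of 10 decides.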
|||
=== Output of Transfer === |
|||
^En<pr>$ ^uno<det><ind><m><sg>$ ^agujero<n><m><sg>$ ^en<pr>$ ^el<det><def><f><sg>$ ^tierra<n><f><sg>$ ^vivir<vblex><ifi><p3><sg>$ ^uno<det><ind><m><sg>$ ^Hobbit<n><m><sg>$^.<sent>$^.<sent>$ |
|||
=== Overall Output === |
|||
En un agujero en la tierra vivió un Hobbit. |
Revision as of 20:19, 27 August 2019
File Structure
A .rtx file contains attribute categories, tag-order rules, tag-rewrite rules, and reduction rules.
Comments begin with exclamation points (!) and end at the end of the line. To include spaces in any name, either escape the space with a backslash (\) or enclose the name in double quotes (").
Attribute Categories
An attribute category can be defined like this:
gender = m f GD ;
number = sg pl ND ;
An attribute category can also specify undefined and default values:
gender = (GD m) m f GD;
This defines the gender category as before, but with the addition that if any rule tries to read the gender of a node that doesn't have a gender tag, the result will be <GD> rather than the empty string. It also states that any remaining <GD> tags will be replaced with <m> tags in the output step.
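The undefined and default values can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not the compiler's real data structure: the <code>AttributeCategory</code> class and its method names are hypothetical.

```python
from typing import List, Optional


class AttributeCategory:
    """Hypothetical model of a category such as `gender = (GD m) m f GD;`."""

    def __init__(self, values: List[str],
                 undefined: str = "", default: Optional[str] = None) -> None:
        self.values = values
        self.undefined = undefined  # result when a node lacks the tag
        self.default = default      # replaces `undefined` in the output step

    def read(self, tags: List[str]) -> str:
        """Return the first tag that belongs to this category, else `undefined`."""
        for tag in tags:
            if tag in self.values:
                return tag
        return self.undefined

    def finalize(self, tag: str) -> str:
        """Output step: swap a leftover undefined value for the default."""
        if self.default is not None and tag == self.undefined:
            return self.default
        return tag


gender = AttributeCategory(["m", "f", "GD"], undefined="GD", default="m")
plain = AttributeCategory(["m", "f", "GD"])  # gender = m f GD ;
```

With these definitions, reading the gender of a preposition gives <GD> for <code>gender</code> but the empty string for <code>plain</code>, and <code>gender.finalize("GD")</code> gives <m>.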
An attribute category can also specify certain values as non-overwritable.
gender = m f @mf;
This states that if a lexical unit has a target-language <mf> tag and a rule tries to replace that tag with something else, the <mf> tag will be used instead of the replacement.
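A minimal Python sketch of this behaviour follows. The function <code>replace_tag</code> and the two sets are hypothetical names chosen for illustration; the real implementation may represent categories differently.

```python
from typing import List, Set


def replace_tag(tags: List[str], category: Set[str],
                protected: Set[str], new_value: str) -> List[str]:
    """Replace the tag belonging to `category` with `new_value`.

    Tags listed in `protected` (the ones marked with @) are left untouched.
    """
    result = []
    for tag in tags:
        if tag in category and tag not in protected:
            result.append(new_value)
        else:
            result.append(tag)
    return result


# gender = m f @mf;
GENDER = {"m", "f", "mf"}
PROTECTED = {"mf"}
```

Calling <code>replace_tag(["adj", "mf", "sg"], GENDER, PROTECTED, "f")</code> leaves <mf> in place, while the same call on <code>["adj", "m", "sg"]</code> replaces <m> with <f>.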
An attribute category can include another:
definite = def ind;
! The following are equivalent:
det_type = dem [definite] pos;
det_type = dem def ind pos;
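The inclusion of one category in another amounts to a recursive expansion, which can be sketched in Python as follows. The <code>expand</code> function and the dictionary layout are assumptions made for this example, and the sketch does not guard against categories that include each other in a cycle.

```python
from typing import Dict, List


def expand(name: str, raw: Dict[str, List[str]]) -> List[str]:
    """Expand `[other]` references in a category into their member values."""
    values: List[str] = []
    for item in raw[name]:
        if item.startswith("[") and item.endswith("]"):
            # A bracketed item names another category: splice in its values.
            values.extend(expand(item[1:-1], raw))
        else:
            values.append(item)
    return values


categories = {
    "definite": ["def", "ind"],
    "det_type": ["dem", "[definite]", "pos"],
}
det_type = expand("det_type", categories)
```

The resulting <code>det_type</code> list is <code>dem def ind pos</code>, the same as the fully written-out definition.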
The name of an attribute category cannot be any of the following:
Name | Meaning |
---|---|
lem | The lemma of an LU or chunk |
lemh and lemq | The first and second parts of a multiword with inner inflection. See Multiwords, where lemh corresponds to the inflected portion and lemq is the portion in <g> |
tags | All of the tags of an LU or chunk |
pos_tag | The first tag of an LU or chunk |
whole, chname, chcontent, and content | Internal names, some of which are included primarily for compatibility with t*x rules. Do not use these directly |
Tag Order
The order of tags for each type of node must be defined like this:
n: _.gender.number;
adj: _.gender;
NP: _.number;
Where _ represents the part of speech tag. If the part of speech tag is different between the source and target languages, the target language one will be used. The lemma head is automatically appended at the beginning of the pattern and the lemma queue is automatically attached to the end.
To specify a literal tag in a pattern, put it in angle brackets:
det: _.<def>.number;
Which tag order to use is determined solely by the first tag in the pattern of the reduction rule. See the output section below for how to override this choice. See also the macro section for a more powerful version of these rules.
The underscore is not mandatory and is merely a shorthand for pos_tag. Thus a literal part of speech tag can be used instead.
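How a tag-order pattern turns into a tag sequence can be sketched in Python. This is a simplified illustration: <code>apply_tag_order</code> is a hypothetical helper, the lemma head and lemma queue are not modelled, and skipping attributes with no value is an assumption of this sketch.

```python
from typing import Dict, List


def apply_tag_order(pattern: str, pos_tag: str, attrs: Dict[str, str]) -> List[str]:
    """Build a node's tag list from a pattern such as `_.<def>.number`."""
    tags: List[str] = []
    for part in pattern.split("."):
        if part == "_":
            tags.append(pos_tag)         # underscore is shorthand for pos_tag
        elif part.startswith("<") and part.endswith(">"):
            tags.append(part[1:-1])      # literal tag in angle brackets
        else:
            value = attrs.get(part, "")  # value of an attribute category
            if value:                    # assumption: empty values are skipped
                tags.append(value)
    return tags


noun_tags = apply_tag_order("_.gender.number", "n", {"gender": "m", "number": "sg"})
det_tags = apply_tag_order("_.<def>.number", "det", {"number": "pl"})
```

Here <code>noun_tags</code> is <code>n m sg</code> and <code>det_tags</code> is <code>det def pl</code>.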
Tag Rewrite Rules
This is a way to convert certain sets of tags, either between two languages that have different sets of tenses, or between something like object agreement and number marking.
object_agr = o1sg o1pl o2sg o2pl o3sg o3pl ;
number = sg pl ;
person = p1 p2 p3 ;
object_agr > person: o1sg p1, o1pl p1, o2sg p2, o2pl p2, o3sg p3, o3pl p3 ;
object_agr > number: o1sg sg, o1pl pl, o2sg sg, o2pl pl, o3sg sg, o3pl pl ;
VP -> @v NP {2[number=1.object_agr] _1 1} ;
In this example, if the verb had <o2sg>, it would be converted to <sg> when it was set as the number attribute of the noun.
tense = farpst nearpst pst prs fut nonpst ;
tense > tense: farpst pst, nearpst pst, prs nonpst, fut nonpst ;
In this example, no explicit assignment needs to take place and the 4 tenses of the source language (farpst, nearpst, prs, fut) would be automatically converted to the 2 of the target language (pst, nonpst).
Converting from 4 to 3 with something like
tense > tense: farpst pst, nearpst pst ;
will also work, the unchanged tags not needing to be explicitly mentioned.
Tag rewrite rules apply in the output step. When building the parse tree, only unconverted tags are used to create chunks. This makes conversions like the following one potentially dangerous:
tense > tense: midpst pst, pst pri;
If tense is being propagated down through multiple chunks, any <midpst> tags will get converted to <pst> and then converted again to <pri>.
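The double conversion can be seen in a small Python model. This is only a sketch: it assumes, as the paragraph above suggests, that the rewrite is applied once each time the tag is passed down through a chunk level, and the names `REWRITE`, `rewrite_once`, and `propagate` are hypothetical.

```python
# Model of `tense > tense: midpst pst, pst pri;`
REWRITE = {"midpst": "pst", "pst": "pri"}


def rewrite_once(tag, rules=REWRITE):
    """Apply the rewrite rule a single time; unlisted tags are unchanged."""
    return rules.get(tag, tag)


def propagate(tag, levels):
    """Pass a tag down through `levels` chunks, rewriting at each level
    (an assumption of this sketch, not a statement about the real engine)."""
    for _ in range(levels):
        tag = rewrite_once(tag)
    return tag


print(propagate("midpst", 1))  # one level: the intended conversion
print(propagate("midpst", 2))  # two levels: converted a second time
```

A rule set whose outputs never appear as inputs of the same rule, such as `farpst pst, nearpst pst`, does not have this problem, because a second application leaves the tag unchanged.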
It is also possible to explicitly convert a value, for example when doing comparisons:
1.object_agr>number
1.object_agr>person
These will clip object_agr and convert it to number and person immediately, regardless of where they are evaluated.
Like attribute category definitions, tag rewrite rules can refer to entire output categories by enclosing them in square brackets. Since the replacement must result in a single value, this can only be done on the source side.
pasts = farpst midpst nearpst;
tense = farpst midpst nearpst past pres fut;
! These are equivalent:
tense > tense : [pasts] past;
tense > tense : farpst past, midpst past, nearpst past;
Reduction Rules
Reduction rules consist of a node type, an optional name, an optional weight, a pattern, an optional condition, an optional variable setting, and an output, in that order.
NP -> det n {2 _1 1};
This matches a determiner followed by a noun, combines them into an NP chunk, and at output time produces "noun determiner".
NP -> 1: n {1} |
      2: n.*.def {the@det.def.sg _ 1};
Here the first rule will match any noun, while the second will match a noun with a <def> tag. Since the second rule has a higher weight, the first rule will not be applied if they both match.
NP -> NP and@cnjcoo NP [$number=pl] {1 _1 2 _2 3};
Here the rule specifies that the resulting chunk will be marked with a <pl> tag.
AP -> adj and@cnjcoo adj ?(1.gender/sl = 3.gender/sl) {1 _1 2 _2 3};
This rule will not apply if the two adjectives have different genders.
The arrow can be written as either -> or →.
A name is written in double quotes before the weight.
DP -> "de > gen" NP de@pr NP { 3 + 's@gen _1 1 } ;
If the first lemma is quoted and no weight is present, it may be interpreted as a name. This issue can be dealt with by adding an explicit 0 weight.
! Don't write this:
VP -> "be# taller than"@v NP { 2 _1 1 } ;
! Write this instead:
VP -> 0: "be# taller than"@v NP { 2 _1 1 } ;
The process by which rules are selected is described here.
Multiple Outputs
A rule can have multiple outputs, but any non-chunk output cannot be conditioned. A rule with multiple outputs is useful for treating certain tokens as if they occurred in a different order. To write such a rule, list multiple nodes before the arrow and wrap multiple outputs in another set of curly braces ({}).
DP clitic -> det clitic NP { { 1[number=3.number] _1 3 } _2 2 } ;
DP -> det NP { 1[number=2.number] _1 2 } ;
Here the first rule is essentially equivalent to moving the clitic later in the input stream and then applying the second rule.
Multi-output rules are closely related to interpolation, which is described below.
Patterns
An element of a pattern must match a single, literal part of speech tag. In order to match multiple part of speech tags, create a separate rule which matches each of them:
NOM -> n {1} | np {1};
To match a lemma or pseudolemma, place it before the part of speech tag, separated by @:
NP -> the@det n {2 _1 1};
It is also possible to match a category of lemmas:
days = sunday monday tuesday wednesday thursday friday saturday;
date -> $days@n the@det num.ord {2 _2 3 _1 1};
Tags besides part of speech can be matched like this:
VP -> vbser vblex.pp {1 _1 2};
To match a set of tags, enclose the category name in square brackets:
non_finite = pp ger;
VP -> vbser vblex.[non_finite] {1 _1 2};
Pattern elements can also specify values for the tags of the chunk being output by the rule.
number = (ND sg) sg pl sp ND;
NP: _.number;
NP -> n.$number adj {1};
This rule specifies that the number tag of the NP chunk should be copied from the noun. It will use the target language side if that is available. If not, it will proceed to the reference side, and then the source side. If all three of these are empty, it will use the default value <ND>. To require that a particular variable be taken from a particular side, put the side after a slash:
NP: number;
NP -> det.$number/ref n {1 _1 2};
/sl refers to the source language, /tl to the target language, and /ref to anything added by anaphora resolution.
If a pattern element is contributing several tags to the chunk, the following shortcut is available:
NP: _.number.gender;
NP -> %n adj {2 _1 1};
The % indicates the noun is the source of all chunk tags not elsewhere specified.
To specify a literal value for a chunk tag, put it in square brackets after the pattern like this:
NP: _.gender.number;
NP -> 0: NP cnjcoo NP [$gender=m, $number=pl] {1 _1 2 _2 3} |
      1: NP.f cnjcoo NP.f [$gender=f, $number=pl] {1 _1 2 _2 3} |
      2: NP.*.sg or@cnjcoo NP.*.sg [$gender=m, $number=sg] {1 _1 2 _2 3} |
      3: NP.f.sg or@cnjcoo NP.f.sg [$gender=f, $number=sg] {1 _1 2 _2 3} ;
That is, treat the gender of the phrase as masculine unless both elements are feminine, and the number as plural unless the conjunction is "or" and both elements are singular.
These values can also be conditioned, condensing the above rules to:
NP -> NP cnjcoo NP
      [$gender=(if (1.gender = f and 3.gender = f) f else m),
       $number=(if (2.lem =cl "or" and 1.number = sg and 3.number = sg) sg else pl)]
      {1 _1 2 _2 3} ;
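To make the logic of the two conditioned values easier to check, here is the same decision written as a plain Python function. It is only a restatement of the conditions in the rule above; the function name and its argument names are hypothetical.

```python
def conjoined_np_tags(gender1, number1, conj_lemma, gender3, number3):
    """Return (gender, number) for an `NP cnjcoo NP` chunk.

    Mirrors the conditioned values in the rule:
    - gender is f only if both NPs are f, otherwise m
    - number is sg only if the conjunction is "or" (compared caselessly,
      like `=cl`) and both NPs are sg, otherwise pl
    """
    gender = "f" if gender1 == "f" and gender3 == "f" else "m"
    both_singular = number1 == "sg" and number3 == "sg"
    number = "sg" if conj_lemma.lower() == "or" and both_singular else "pl"
    return gender, number


print(conjoined_np_tags("f", "sg", "and", "f", "sg"))
print(conjoined_np_tags("f", "sg", "Or", "m", "sg"))
```

Each of the four weighted rules in the longer version corresponds to one combination of the two results.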
The pattern only looks at the source language, but it is possible to add constraints:
conj_list = and or;
NP: _.gender.number;
NP -> %NP cnjcoo NP ?((2.lem/tl in conj_list) and ~(3.gender = 1.gender)) {1 _1 2 _2 3};
This will only match the pattern if it is also the case that the target language lemma of the conjunction is "and" or "or" and the two NPs have different genders. See below for the syntax of conditions.
Piece of pattern | Meaning |
---|---|
.x | A literal tag <x> |
.[x] | Any tag in the category x |
.$x | When building the output chunk for this rule, the value of the x attribute should come from this element |
.* | 0 or more arbitrary tags. Note: this contrasts with other places in the pipeline where * must match at least 1 tag. A final .* is automatically appended to every pattern |
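The following Python sketch models how the tag pieces in the table could be matched against the tags of a lexical unit. It is an illustration under assumptions, not the actual matcher: the function `match_tags` is hypothetical, it treats non-wildcard pieces as matching adjacent tags in order, and it leaves out the `.$x` form, which concerns chunk construction rather than matching.

```python
def match_tags(pattern, tags, categories=None):
    """Return True if `tags` satisfies `pattern`.

    pattern:    list of pieces, e.g. ["n", "*", "def"] for `n.*.def`
    tags:       list of tags on the lexical unit, e.g. ["n", "m", "sg", "def"]
    categories: attribute category name -> set of tags, used by "[x]" pieces
    """
    categories = categories or {}
    pieces = list(pattern) + ["*"]  # a final .* is appended to every pattern

    def go(p, t):
        if p == len(pieces):
            return t == len(tags)
        piece = pieces[p]
        if piece == "*":
            # .* may consume zero or more tags
            return any(go(p + 1, k) for k in range(t, len(tags) + 1))
        if t == len(tags):
            return False
        if piece.startswith("[") and piece.endswith("]"):
            ok = tags[t] in categories.get(piece[1:-1], ())
        else:
            ok = tags[t] == piece
        return ok and go(p + 1, t + 1)

    return go(0, 0)


non_finite = {"non_finite": {"pp", "ger"}}
print(match_tags(["n", "*", "def"], ["n", "m", "sg", "def"]))
print(match_tags(["n", "*", "def"], ["n", "def"]))
print(match_tags(["vblex", "[non_finite]"], ["vblex", "inf"], non_finite))
```

The second call shows the point made in the table: `.*` is allowed to match no tags at all.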
Outputs
Output elements are written between curly braces and may be any of the following:
Blanks
An underscore represents a single space. An underscore followed by a number represents the superblank after that position, so 1 _ 2 is elements 1 and 2 separated by a space while 1 _1 2 is elements 1 and 2 separated by whatever separated them in the input.
Matched Elements
A number represents the input element in that position with its tags arranged according to the defined output pattern for its part of speech tag. It can be followed by a specification of where those tags should come from.
1                       ! the first input element
1[gender=f]             ! the first input element with the gender tag <f>
1[gender=2.gender/ref]  ! the first input element with the gender tag of the reference side of the second input element
1[gender=$gender]       ! the first input element with the gender tag set to a placeholder to be filled on output with the gender tag of its parent chunk
These elements can also be prefixed with % to specify that as many tags as possible should be placeholders for tags of the parent chunk.
These elements can be conjoined using +:
1[gender=f] + 2
This will generate something like ^blah<n><f>+bloop<adj>$.
Conjoining is currently disallowed if one side is in an if statement and the other is not. It is thus also disallowed if the tag-order rule for either element is a macro.
By default, the order of the output tags is based on the output pattern corresponding to the part of speech tag in the pattern. However, it is possible to override this using parentheses:
vblex: _.tense.person.number;
vbinf: _.<inf>;
V -> vblex.inf {1};
! result: ^whatever<vblex><inf><{person}><{number}>$
V -> vblex.inf {1(vbinf)};
! result: ^whatever<vblex><inf>$
Literal Lexical Units
A new lexical unit can be inserted like this:
the@det.def.mf.sp
Placeholders can be included using $:
the@det.def.$gender.sp
And clips from other elements can be placed in square brackets:
the@det.def.[2.gender].[3.number/sl]
Literal lexical units can also be constructed via the same syntax as matched elements, but with a lemma rather than a number.
the(det)[gender=2.gender, number=2.number]
When constructed in this way, the tag-order specification is mandatory.
Output Conditionals
An output conditional evaluates a sequence of conditions and outputs the element corresponding to the first one that evaluates to true. The element to be output can be any of the possibilities listed above, the entire chunk, or another conditional.
NP -> NP cnjcoo NP (if (2.lem/sl = and) { 1 _1 3 } else { 1 _1 2 _2 3 } );
Here the rule determines what the final output will be based on the lemma of the conjunction.
PP -> DP ?(1.case in might_get_pr)
      (if (1.prep_flag = none)
          { 1 }
       else
          { (if (1.prep_flag = to) to@pr
             else-if (1.prep_flag = at) at@pr
             else-if (1.prep_flag = in) in@pr
             else-if (1.prep_flag = on) on@pr
             else for@pr
            ) _ 1 }
      );
Here the rule determines first whether to add a preposition. If it is going to add a preposition, it creates a chunk and within that chunk, has another if statement to determine which preposition to add.
The first clause is labeled "if", the last can be "else" or "otherwise", and intermediate ones can be "if", "else-if", or "elif". These labels follow the same rules as logical operators - that is, capitalization, "-", and "_" are all ignored.
For the output of an if statement to have multiple elements, surround those elements with square brackets. Thus the conjunction rule above can be rewritten as follows:
NP -> NP cnjcoo NP { 1 _1 (if (2.lem/sl = and) [] else [ 2 _2 ] ) 3 };
Conditions
Conditions are written in parentheses. A condition is a value, an operator, and another value. If the operator is "and" or "or" these values are other conditions, otherwise they are clips or strings. A condition can be negated by writing "not" before the operator.
(1.case = 2.case)      ! true if the first and second elements have the same case, otherwise false
(1.case not = 2.case)  ! the reverse of the previous line
The full list of operations is as follows:
Name | Description | Alternate Spellings |
---|---|---|
And | Evaluates to true if both arguments evaluate to true, otherwise false | & |
Or | Evaluates to true if either argument evaluates to true, otherwise false | | |
Equal | Evaluates to true if the arguments are identical strings | = |
IsPrefix | Evaluates to true if the right argument occurs at the beginning of the left argument | StartsWith, BeginsWith |
IsSuffix | Evaluates to true if the right argument occurs at the end of the left argument | EndsWith |
IsSubstring | Evaluates to true if the right argument occurs anywhere in the left argument | Contains |
HasPrefix | Evaluates to true if the left argument begins with anything in the list named by the right argument | StartsWithList, BeginsWithList |
HasSuffix | Evaluates to true if the left argument ends with anything in the list named by the right argument | EndsWithList |
In | Evaluates to true if the left argument is a member of the list named by the right argument | ∈ |
Any of these operators (besides And and Or) can be made to ignore case by adding one of "cl", "caseless", "fold", "foldcase".
maybe_get_pr = dat obj;
(1.case in maybe_get_pr)
footwear = boot sock shoe sandal;
((1.number = du) and (1.lem/tl in_caseless footwear))
! note that "in-case-less", "incl", "IN-cl", and "__IN_CASE_LESS__" would all also work here.
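The way operator names are spelled can be modelled as a normalisation step: ignore capitalisation, hyphens, and underscores, then check for one of the caseless suffixes. The Python sketch below shows that idea only; the function `parse_operator` and the constant `CASELESS_SUFFIXES` are hypothetical, the real compiler may normalise names differently, and alternate spellings such as StartsWith are not mapped to their canonical names here.

```python
# Longer suffixes come first so that "foldcase" is not mistaken for "fold".
CASELESS_SUFFIXES = ("foldcase", "caseless", "fold", "cl")


def parse_operator(name):
    """Return (base_operator, ignore_case) for an operator as written.

    Capitalisation, "-" and "_" are ignored, so "IN-cl" and
    "__IN_CASE_LESS__" are treated the same as "incl".
    """
    key = name.lower().replace("-", "").replace("_", "")
    for suffix in CASELESS_SUFFIXES:
        if key.endswith(suffix) and len(key) > len(suffix):
            return key[: -len(suffix)], True
    return key, False


for spelling in ("in", "in_caseless", "in-case-less", "incl", "IN-cl", "__IN_CASE_LESS__", "=cl"):
    print(spelling, parse_operator(spelling))
```

All the spellings listed in the comment above reduce to the base operator `in` with the caseless flag set.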
Macros
The macro facility is a combination of tag order rules and output conditionals.
det_type = dem def ind;
det_dem: _.<dem>.distance;
det_def: _.definite.number;
det: (if (1.det_type = dem) 1(det_dem)
      else 1(det_def) );
Here we define a "det" pattern which will apply the "det_dem" pattern or the "det_def" pattern to its argument based on whether that argument has a <dem> tag. Since this is a tag order pattern it will be applied to all <det>s by default and can also be manually applied to other things with the 3(det) syntax.
Macros are only allowed to clip from the input node (referred to as 1), including any values passed in.
If a macro specifies a value for an attribute, it will override anything that is passed in. Thus if the above example had 1(det_dem)[distance=prx] rather than 1(det_dem), invoking it as 2(det)[distance=dist] and as 2(det)[distance=med] would make no difference and the output would have <prx> regardless.
A macro is not required to function as a tag order pattern and may output anything or nothing, so long as it only accesses attributes of the input node.
If you want to call a macro and what node gets passed in doesn't matter, you can use the symbol * to represent an empty node.
maybe_det: (if (1.definite = def) [the@det.def.sp _]
            elif (1.number = sg) [a@det.ind.sp _]
            else [] );
DP -> n { *(maybe_det)[number=1.number, definite=$definite] 1 };
This will insert "the" if the DP is definite, "a" if it's indefinite and singular, and will output only the noun otherwise.
Since a macro needs to be contained in a conditional but not all macros are conditional, they permit the keyword always in addition to if, else, etc.
vaux: (always 1(vblex));
This macro is essentially an alias of vblex.
Interpolation
Sometimes it is necessary to insert words into existing nodes, such as when generating certain clitics.
NP -> adj n { 2 _1 1 } ;
DP -> det NP (if ($lu-count = "2") { 1 _1 2 }
              else { 1 _ >3 _1 2 } ) ;
VP -> DP v.pprs { 1 < be(vaux) _1 2 } ;
These rules represent a scenario where target language present progressive is marked with a clitic which is placed between a determiner and noun phrase. At the VP level, the clitic is created and inserted into the DP with a less-than sign (<). Then at the DP level, the rule checks whether anything has been inserted by checking whether the value of $lu-count is 2, which it would be if nothing had been inserted. If $lu-count is not 2, then the inserted item is output in the appropriate place.
The inserted value is referred to as >3 rather than 3 to tell the compiler that it is an inserted value so as to prevent error messages about trying to access a node that doesn't exist.
Some possible input and output from these rules (written monolingually for simplicity):
^the<det>$ ^green<adj>$ ^frog<n>$ ^speak<v><pprs>$
the NP[green frog] speak
DP[the NP[green frog]] speak
VP[DP[the NP[green frog]] speak]
DP[the NP[green frog] be] speak
the be NP[green frog] speak
the be frog green speak
^the<det>$ ^be<vaux>$ ^frog<n>$ ^green<adj>$ ^speak<v>$
This functions as the inverse of rules with multiple outputs. The reverse of the above rules could be something like this:
NP -> n adj { 2 _1 1 } ;
DP -> det NP { 1 _1 2 } ;
DP vaux -> det vaux NP { { 1 _1 3 } _2 2 } ;
VP -> DP vaux v { 1 _1 3[tense=pprs] } ;
Multi-output rules take things inside and move them out, while interpolation takes things outside and moves them in.
Global Variables
For passing nodes up and down a tree, an alternative to multi-output rules and interpolation is global variables. Global variables are referred to with double dollar signs and are set in the attribute literal section of a rule.
VP -> %vblex DP.$itg [$$wh_word=(if (2.itg = itg) 2)] { 1 (if (2.itg not = itg) [ _1 2 ]) } ;
The value of this variable can then be included in the output step of any rule.
S -> DP.nom VP { (if (2.itg = itg) [ $$wh_word _ ] ) 1 _1 2 } ;
If a rule attempts to output an unset variable, the result will be no output. All variables are reset at the end of the output step.
Clips
When clipping a tag or lemma from a lexical unit in the input stream, /sl refers to source language, /tl refers to target language, and /ref refers to the output of apertium-anaphora. Chunks, meanwhile, have only 1 side, which is /tl. If the side is left unspecified, then /tl will be clipped. However, if an LU is being clipped from and the value for /tl is empty or is the unspecified value for that attribute category, it will try again with /ref and then with /sl.
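The fallback order for an unspecified side can be sketched in Python as follows. This is a model of the description above rather than the real implementation: the dictionary layout of `lu`, the function name `clip`, and the `unspecified` parameter are hypothetical, and returning an empty string when no side has a usable value is an assumption of this sketch.

```python
def clip(lu, attr, side=None, unspecified=None):
    """Model of clipping attribute `attr` from a lexical unit.

    lu:          {"tl": {...}, "ref": {...}, "sl": {...}} (hypothetical layout)
    side:        "sl", "tl", or "ref" to force one side; None to fall back
    unspecified: the "unspecified" value of the category (e.g. "GD"), if any
    """
    if side is not None:
        # An explicit side is used as-is, with no fallback.
        return lu.get(side, {}).get(attr, "")
    for candidate in ("tl", "ref", "sl"):
        value = lu.get(candidate, {}).get(attr, "")
        if value and value != unspecified:
            return value
    return ""  # assumption: nothing usable on any side


lu = {"tl": {"gender": "GD"}, "ref": {}, "sl": {"gender": "f"}}
print(clip(lu, "gender", unspecified="GD"))  # falls back past /tl and /ref
print(clip(lu, "gender", side="tl"))         # explicit side, no fallback
```

Writing the side explicitly, as in `1.gender/tl`, corresponds to the second call and skips the fallback.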
Almost anywhere that a clip or a literal value is used as a value, it can be replaced with an if statement using the same syntax as output conditionals and macros.
VP -> v vaux [$negative=(if (1.negative = neg or 2.negative = neg) neg else pos)]
      { 2 _1 1[tense=(if (2.lem in verb_ing) pprs else inf)] } ;
The one exception is embedding an if statement inside a conditional: (x in (if ...)).
Brackets
A summary of which means what where:
Bracket | General Meaning | Uses | Examples | Comments |
---|---|---|---|---|
() | Condition | If statement | (if ...) | |
() | Condition | Condition | (a in b) | in an if statement |
() | Condition | Pattern condition | NP NP ?(1.case = 2.case) | |
() | Condition | Pattern override | 1(vb_impers) | use the vb_impers output rule rather than the output rule chosen based on the pattern |
() | Condition | Macro invocation | *(maybe_det) | |
() | Condition | Attribute defaults | gender = (GD m) m f mf GD; | |
[] | List | Rule variable setting | [$tense=past, $number=sg] | |
[] | List | Node variable setting | 1[tense=2.tense, number=2.number] | |
[] | List | Set inclusion | tense = [finite] inf ger; | tense is composed of <inf>, <ger> and everything in finite |
[] | List | Group tag rewriting | poss > number : [poss_sg] sg, [poss_pl] pl; | |
[] | List | Grouping in if statements | (if (whatever) [1 _ 2]) | |
[] | List | Pattern tag sets | NP.*.[case_not_nomacc] | |
{} | Chunk | Output | NP -> n { 1 } ; | |
{} | Chunk | Chunk | NP clitic -> det clitic NP { { 1 _ 3 } _ 2 } ; | |
Correspondence with t*x
number = sg pl;
gender = m f;
pre_adj = gran buen;
n: _.gender.number;
adj: _.gender;
NP: _.number;
NP -> adj n.$number ?(1.number = 2.number)
      (if (1.lem/tl incl pre_adj)
          {1[gender=2.gender] _1 2}
       else
          {2 _1 1[gender=2.gender]} ) ;
<transfer>
  <section-def-cats>
    <def-cat n="n">
      <cat-item tags="n"/>
      <cat-item tags="n.*"/>
    </def-cat>
    <def-cat n="adj">
      <cat-item tags="adj"/>
      <cat-item tags="adj.*"/>
    </def-cat>
  </section-def-cats>
  <section-def-attrs>
    <def-attr n="number">
      <attr-item tags="sg"/>
      <attr-item tags="pl"/>
    </def-attr>
    <def-attr n="gender">
      <attr-item tags="m"/>
      <attr-item tags="f"/>
    </def-attr>
  </section-def-attrs>
  <section-def-lists>
    <def-list n="pre_adj">
      <list-item v="gran"/>
      <list-item v="buen"/>
    </def-list>
  </section-def-lists>
  <section-rules>
    <rule comment="adj n">
      <pattern>
        <pattern-item n="adj"/>
        <pattern-item n="n"/>
      </pattern>
      <action>
        <choose>
          <when>
            <test>
              <not>
                <equal>
                  <clip pos="1" side="tl" part="number"/>
                  <clip pos="2" side="tl" part="number"/>
                </equal>
              </not>
            </test>
            <reject-current-rule/>
          </when>
        </choose>
        <choose>
          <when>
            <test>
              <in caseless="yes">
                <clip pos="1" side="tl" part="lem"/>
                <list n="pre_adj"/>
              </in>
            </test>
            <out>
              <chunk name="default">
                <tags>
                  <tag><lit-tag v="NP"/></tag>
                </tags>
                <lu>
                  <clip pos="1" side="tl" part="lemh"/>
                  <lit-tag v="adj"/>
                  <clip pos="2" side="tl" part="gender"/>
                </lu>
                <lu>
                  <clip pos="2" side="tl" part="lemh"/>
                  <lit-tag v="n"/>
                  <clip pos="2" side="tl" part="gender"/>
                  <clip pos="2" side="tl" part="number"/>
                </lu>
              </chunk>
            </out>
          </when>
          <otherwise>
            <out>
              <chunk name="default">
                <tags>
                  <tag><lit-tag v="NP"/></tag>
                </tags>
                <lu>
                  <clip pos="1" side="tl" part="lemh"/>
                  <lit-tag v="adj"/>
                  <clip pos="2" side="tl" part="gender"/>
                  <clip pos="1" side="tl" part="lemq"/>
                </lu>
                <lu>
                  <clip pos="2" side="tl" part="lemh"/>
                  <lit-tag v="n"/>
                  <clip pos="2" side="tl" part="gender"/>
                  <clip pos="2" side="tl" part="number"/>
                  <clip pos="2" side="tl" part="lemq"/>
                </lu>
              </chunk>
            </out>
          </otherwise>
        </choose>
      </action>
    </rule>
  </section-rules>
</transfer>
rtx | t*x | Notes |
---|---|---|
number = sg pl; | <def-attr n="number"> <attr-item tags="sg"/> <attr-item tags="pl"/> </def-attr> <def-list n="number"> <list-item v="sg"/> <list-item v="pl"/> </def-list> | It isn't shown in the above example, but each list simultaneously defines an attribute category and a list. |
n: _.gender.number; | (no direct equivalent) | |
NP -> | <tags> <tag><lit-tag v="NP"/></tag> ... </tags> | The further contents of <tags> is determined by NP: _.number; , which indicates that those contents will be a number tag, probably clipped from one of the inputs. |
n | <def-cat n="some_unique_name"> <cat-item tags="n"/> <cat-item tags="n.*"/> </def-cat> ... <pattern-item n="some_unique_name"/> | |
.$number | <clip pos="2" side="tl" part="number"/> | This determines the contents of <tags> in the output chunk. |
? | <choose> <when> <test> <not> ... </not> </test> <reject-current-rule/> </when> </choose> | There is no functionality equivalent to <reject-current-rule shifting="yes"/>. |
1.number | <clip pos="1" part="number"/> | In the example rule, the clips are written as being side="tl", but an unspecified clip will actually check all three sides (target, then reference, then source) until it finds a value. |
(if (...) ... else ... ) | <choose> <when> <test> ... </test> <out> ... </out> </when> <otherwise> <out> ... </out> </otherwise> </choose> | |
(... incl ...) | <in caseless="yes"> ... <list n="..."/> </in> | |
_1 | <b pos="1"/> | |
1[gender=2.gender] | <lu> <clip pos="1" side="tl" part="lemh"/> <lit-tag v="adj"/> <clip pos="2" side="tl" part="gender"/> <clip pos="1" side="tl" part="lemq"/> </lu> | <lit-tag v="adj"/> should actually be <clip pos="1" side="tl" part="pos_tag"/> where pos_tag is a special attribute that returns whatever the first tag is. |
{ ... } | <chunk name="default"> ... </chunk> | It is possible to make the name be something other than default, for example with n.$lem/sl in the pattern. |
Technically this would compile to a rule which outputs an NP chunk containing the input unchanged and also a separate postchunk rule that would do the actual rearranging so that the conditionals can depend on changed values of the chunk tags.