Difference between revisions of "Apertium-recursive/Formalism"

From Apertium
(21 intermediate revisions by 2 users not shown)
Line 3: Line 3:
 
=== Basic Rule Syntax ===
 
=== Basic Rule Syntax ===
   
Rules consist of a node type, a weight (optional?), a pattern, and an output.
+
Rules consist of a node type, an optional weight, a pattern, an optional condition, an optional variable setting, and an output, in that order.
   
NP -> 2: @det @adj @n {3 2 1} ;
+
NP -> det n {2 _1 1};
   
  +
This matches a determiner followed by a noun, combines them into an NP chunk, and at output time produces "noun determiner".
This gathers a det node, an adj node, and an n node and produces an NP node. Once all rules have been applied, the nodes they have gathered will be output according to their patterns, in this case in the order n adj det (the 3rd, the 2nd, the 1st).
 
   
  +
NP -> 1: n {1} |
The weight of a parse is the sum (?) of all the rules involved in producing it and the parse with the lowest weight is output. There should probably be an additional factor of how many unconsolidated pieces a parse has so we prefer more complete parses (that is, "NP cnj NP" as 3 separate nodes has a lower weight than the consolidated version, but we want the consolidated one).
 
  +
2: n.*.def {the@det.def.sg _ 1};
   
  +
Here the first rule will match any noun, while the second will match a noun with a <code><def></code> tag. Since the second rule has a higher weight, the first rule will not be applied if they both match.
Multiple rules which produce the same node type can be joined with pipes:
 
   
NP -> 1: @det @n {2 1} |
+
NP -> NP and@cnjcoo NP [$number=pl] {1 _1 2 _2 3};
2: @num @n {1 2} ;
 
   
  +
Here the rule specifies that the resulting chunk will be marked with a <code><pl></code> tag.
   
  +
AP -> adj and@cnjcoo adj ?(1.gender/sl = 3.gender/sl) {1 _1 2 _2 3};
; Comments:
 
   
  +
This rule will not apply if the two adjectives have different genders.
* When you say "output" do you mean immediately? or do you mean the AST will be built with that order in mind? - [[User:Francis Tyers|Francis Tyers]] ([[User talk:Francis Tyers|talk]]) 05:48, 13 March 2019 (CET)
 
  +
** It uses the patterns to build the tree bottom-up and then when that's done it applies the output sections top-down (that way the verb phrase can set case on the noun phrase which can then set case on the noun). [[User:Popcorndude|Popcorndude]] ([[User talk:Popcorndude|talk]]) 14:59, 13 March 2019 (CET)
 
  +
The arrow can be written as either <code>-></code> or <code>→</code>.
* I guess the weights should also be lexicalised, but a priori rule weights are probably also a good idea(?) - [[User:Francis Tyers|Francis Tyers]] ([[User talk:Francis Tyers|talk]]) 05:48, 13 March 2019 (CET)
 
  +
** Would something like this be a reasonable way of lexicalising the weights? (from the [[User_talk:Popcorndude/Recursive_Transfer#Ambiguous_rules | ambiguous rules]] example) [[User:Popcorndude|Popcorndude]] ([[User talk:Popcorndude|talk]]) 19:17, 14 March 2019 (CET)
 
  +
The process by which rules are selected is described [[User:Popcorndude/Recursive_Transfer/Parser | here]].
de_nn1 = memoría ;
 
de_nn2 = traducción ;
 
de_nsn1 = hermana madre ;
 
DE-S -> @det.pos @n {1 2} ;
 
de_nofn1 = constitución guerra ;
 
NP -> 1: $de_nn1@n de@pr $de_nn2@n {3 1} |
 
1: $de_nsn1@n de@pr DE-S {3 's@gen 1} |
 
1: $de_nofn1@n de@pr @num {1 2 3} |
 
3: @n de@pr @n {1 2 3} ;
 
* Why isn't it <code>@det @adj @n</code> etc. (per below)? —[[User:Firespeaker|Firespeaker]] ([[User talk:Firespeaker|talk]]) 05:53, 13 March 2019 (CET)
 
** Because I changed it partway through writing this page and forgot to fix this part. [[User:Popcorndude|Popcorndude]] ([[User talk:Popcorndude|talk]]) 14:59, 13 March 2019 (CET)
 
   
 
=== Attribute Lists ===
 
=== Attribute Lists ===
Line 42: Line 33:
 
number = sg pl ND ;
 
number = sg pl ND ;
   
  +
An attribute list can also specify undefined and default values:
=== Lexical Units ===
 
   
  +
gender = (GD m) m f GD;
Lexical units are matched like this:
 
   
  +
This defines the <code>gender</code> category as before, but with the addition that if any rule tries to read the gender of a node that doesn't have a gender tag, the result will be <code><GD></code> rather than the empty string. It also states that any remaining <code><GD></code> tags will be replaced with <code><m></code> tags in the output step.
potato@n.sg ! matches "potato" with tags <n> and <sg>, possibly with others
 
@n ! matches any noun
 
   
  +
An attribute category can include another:
Any of these literals can be replaced with an attribute category using $
 
   
  +
definite = def ind;
potato@n.$number ! matches potato<n><sg> and potato<n><pl>
 
  +
vegetable = potato carrot radish ;
 
  +
! The following are equivalent:
$vegetable@n.sg ! matches potato<n><sg>, carrot<n><sg>, and radish<n><sg>
 
  +
det_type = dem [definite] pos;
  +
det_type = dem def ind pos;
   
  +
=== Tag Order ===
The last one would probably also match potato<n><m><sg>, potato<sg><n>, and potato<x><sg><bloop><n>
 
   
  +
The order of tags for each type of node must be defined like this:
=== Variables ===
 
   
  +
n: _.gender.number;
A node can have variables attached to it. Lexical units have variables corresponding to any attribute categories that match their tags.
 
  +
adj: _.gender;
  +
NP: _.number;
   
  +
Where <code>_</code> represents the lemma and the part of speech tag. Note that it is currently only possible to specify single tags as patterns. However, it is possible to specify that a different pattern should be used (see the output section below). Note also that the lemma queue is automatically appended to the pattern.
NP.number./case.gender/ -> @adj.$number @n.$number.$gender {2(case=$case) 1(case=$case)} ;
 
   
  +
To specify a literal tag in a pattern, put it in angle brackets:
This rule specifies that NP has 3 variables associated with it. The / before case indicates it only appears in the target language and the / after gender indicates it only occurs in the source language. Since number has neither, it appears in both. The NP will initially have a value for $number which will be the number marking of the adjective and noun (which must match) and for $gender, which will be the gender tag of the noun. $case will initially be empty. If the value of $case is set by some other rule further up the tree, then the case tag will be set on both lexical units in the output phase, otherwise they will keep their default marking.
 
   
  +
det: _.<def>.number;
Values can also be transferred between nodes in the output phase:
 
   
  +
=== Patterns ===
VP -> NP @v {2(number=1.number, gender=1.gender) 1(case=nom)} ;
 
   
  +
An element of a pattern must match a single, literal part of speech tag. In order to match multiple part of speech tags, create a separate rule which matches each of them:
This makes the verb agree with the subject in number and gender and sets the subject's case to <nom>.
 
   
  +
NOM -> n {1} | np {1};
The 3 possible assignments are "attr=literal", "attr=index.attr", and "attr=$var".
 
   
  +
To match a lemma or pseudolemma, place it before the part of speech tag, separated by <code>@</code>:
Similar patterns can be used if the output is a literal lexical unit with agreement:
 
   
  +
NP -> the@det n {2 _1 1};
el@det.def.[1.gender].sg
 
el@det.def.$gender.sg
 
   
  +
It is also possible to match a category of lemmas:
=== Variable Conflicts ===
 
   
  +
days = sunday monday tuesday wednesday thursday friday saturday;
I have yet to work out how to handle a rule with multiple variables that all reference the same attribute category. I have 3 potential ways of handling this:
 
  +
date -> $days@n the@det num.ord {2 _2 3 _1 1};
   
  +
Tags besides part of speech can be matched as shown above.
! Desired pattern: adj1 adj2 n1 n2
 
  +
! adj1 and n2 have the same case, as do adj2 and n1
 
  +
Pattern elements can also specify values for the tags of the chunk being output by the rule.
  +
  +
number = (ND sg) sg pl sp ND;
  +
NP: _.number;
  +
NP -> n.$number adj {1};
  +
  +
This rule specifies that the number tag of the NP chunk should be copied from the noun. It will use the target language side if that is available. If not, it will proceed to the reference side, and then the source side. If all three of these are empty, it will use the default value <code><ND></code>. To require that a particular variable be taken from a particular side, put the side after a slash:
  +
  +
NP: number;
  +
NP -> det.$number/ref n {1 _1 2};
  +
  +
<code>/sl</code> refers to the source language, <code>/tl</code> to the target language, and <code>/ref</code> to anything added by anaphora resolution.
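The lookup order described above (target side, then reference side, then source side, then the default value) can be sketched in Python. This is only an illustration under stated assumptions: the <code>Node</code> class, its <code>tl</code>/<code>ref</code>/<code>sl</code> dictionaries, and the <code>clip</code> function are hypothetical names invented for this sketch, and applying the default when an explicitly requested side is empty is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical node: one attribute -> tag dictionary per side."""
    tl: Dict[str, str] = field(default_factory=dict)   # target language
    ref: Dict[str, str] = field(default_factory=dict)  # anaphora resolution
    sl: Dict[str, str] = field(default_factory=dict)   # source language


def clip(node: Node, attr: str, side: Optional[str] = None, default: str = "") -> str:
    """Read attr from node, optionally restricted to a single side."""
    if side is not None:
        # Explicit side, as in det.$number/ref (default fallback is an assumption).
        return getattr(node, side).get(attr, "") or default
    # No side given: try target, then reference, then source.
    for name in ("tl", "ref", "sl"):
        value = getattr(node, name).get(attr, "")
        if value:
            return value
    # All three sides are empty: use the default, e.g. ND for number.
    return default
```

With <code>number = (ND sg) sg pl sp ND;</code>, a node that has no number tag on any side would yield <code>ND</code> in this model.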
  +
  +
If a pattern element is contributing several tags to the chunk, the following shortcut is available:
  +
  +
NP: _.number.gender;
  +
NP -> %n adj {2 _1 1};
  +
  +
The <code>%</code> indicates the noun is the source of all chunk tags not elsewhere specified.
  +
  +
To specify a literal value for a chunk tag, put it in square brackets after the pattern like this:
  +
  +
NP: _.gender.number;
  +
NP -> 0: NP cnjcoo NP [$gender=m, $number=pl] {1 _1 2 _2 3} |
  +
1: NP.f cnjcoo NP.f [$gender=f, $number=pl] {1 _1 2 _2 3} |
  +
2: NP.*.sg or@cnjcoo NP.*.sg [$gender=m, $number=sg] {1 _1 2 _2 3} |
  +
3: NP.f.sg or@cnjcoo NP.f.sg [$gender=f, $number=sg] {1 _1 2 _2 3} ;
  +
  +
That is, treat the gender of the phrase as masculine unless both elements are feminine, and the number as plural unless the conjunction is "or" and both elements are singular.
  +
  +
The pattern only looks at the source language, but it is possible to add constraints:
  +
  +
conj_list = and or;
  +
NP: _.gender.number;
  +
NP -> %NP cnjcoo NP ?((2.lem/tl in conj_list) and ~(3.gender = 1.gender)) {1 _1 2 _2 3};
  +
  +
This will only match the pattern if it is also the case that the target language lemma of the conjunction is "and" or "or" and the two NPs have different genders. See below for the syntax of conditions.
  +
  +
=== Outputs ===
  +
  +
Output elements are written between curly braces and may be any of the following:
  +
  +
==== Blanks ====
  +
  +
An underscore represents a single space. An underscore followed by a number represents the superblank after that position, so <code>1 _ 2</code> is elements 1 and 2 separated by a space while <code>1 _1 2</code> is elements 1 and 2 separated by whatever separated them in the input.
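The difference between <code>_</code> and <code>_1</code> can be illustrated with a small Python sketch. The <code>render</code> function and its list-based inputs are hypothetical and exist only for this example; the superblank string used in the usage note is made up.

```python
from typing import List


def render(output: List[str], elements: List[str], superblanks: List[str]) -> str:
    """Assemble an output string from output-section tokens.

    "_"  -> a single space
    "_N" -> the superblank that followed input position N
    "N"  -> the input element at position N
    """
    parts = []
    for token in output:
        if token == "_":
            parts.append(" ")
        elif token.startswith("_"):
            parts.append(superblanks[int(token[1:]) - 1])
        else:
            parts.append(elements[int(token) - 1])
    return "".join(parts)
```

For example, with elements <code>["^the<det>$", "^cat<n>$"]</code> and superblanks <code>[" [<i>] "]</code>, the tokens <code>1 _ 2</code> give a plain space between the two elements, while <code>2 _1 1</code> swaps the elements and keeps the original blank between them.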
  +
  +
==== Matched Elements ====
  +
  +
A number represents the input element in that position with its tags arranged according to the defined output pattern for its part of speech tag. It can be followed by a specification of where those tags should come from.
  +
  +
1
  +
! the first input element
 
 
  +
1(gender=f)
! Option 1: variable subscripts
 
  +
! the first input element with the gender tag <f>
@adj.$case#a @adj.$case#b @n.$case#b @n.$case#a
 
 
 
  +
1(gender=2.gender/ref)
! Option 2: require multiple attribute lists
 
  +
! the first input element with the gender tag of the reference side of the second input element
case = nom acc dat ;
 
case2 = nom acc dat ;
 
@adj.$case @adj.$case2 @n.$case2 @n.$case
 
 
 
  +
1(gender=$gender)
! Option 3: conditionals and variables that aren't attribute names
 
  +
! the first input element with the gender tag set to a placeholder to be filled on output with the gender tag of its parent chunk
@adj.$case @adj(case=$othercase) @n(case=$othercase) @n.$case
 
! I say "conditionals" because this syntax makes it easy to have things like
 
@n(case=$othercase not $case)
 
@adj(case=$othercase not nom)
 
   
  +
These elements can also be prefixed with <code>%</code> to specify that as many tags as possible should be placeholders for tags of the parent chunk.
I find it difficult to come up with situations in which this would be needed, but in the ones where it is, the conditionals are also wanted, so maybe Option 3 is best, or maybe there should be a way to specify restrictions outside of just the pattern, such as
 
   
  +
These elements can be conjoined using +:
! using Option 1
 
  +
XP -> 3: @adj.$case#a @adj.$case#b @n.$case#b @n.$case#a ($case#a != $case#b) {1 4 2 3} ;
 
  +
1(gender=f) + 2
! note: this particular syntax conflicts with the current use of ! for comments
 
  +
  +
This will generate something like <code>^blah<n><f>+bloop<adj>$</code>.
  +
  +
By default, the order of the output tags is based on the output pattern corresponding to the part of speech tag in the pattern. However, it is possible to override this using square brackets:
  +
  +
vblex: _.tense.person.number;
  +
vbinf: _.<inf>;
  +
  +
V -> vblex.inf {1};
  +
! result: ^whatever<vblex><inf><{person}><{number}>$
  +
  +
V -> vblex.inf {1[vbinf]};
  +
! result: ^whatever<vblex><inf>$
  +
  +
Note that the part of speech tag of the output is in all cases the part of speech tag of the input. To avoid this behavior (for example, if you want to change the part of speech tag), write an output rule like the following:
  +
  +
adj: lemh.<adj>.number;
  +
  +
==== Literal Lexical Units ====
  +
  +
A new lexical unit can be inserted like this:
  +
  +
the@det.def.mf.sp
  +
  +
Placeholders can be included using <code>$</code>:
  +
  +
the@det.def.$gender.sp
  +
  +
And clips from other elements can be placed in square brackets:
  +
  +
the@det.def.[2.gender].[3.number/sl]
  +
  +
==== Output Conditionals ====
  +
  +
An output conditional evaluates a sequence of conditions and outputs the element corresponding to the first one that evaluates to true. The element to be output can be any of the possibilities listed above, the entire chunk, or another conditional.
  +
  +
NP -> NP cnjcoo NP
  +
(if (2.lem/sl = and)
  +
{ 1 _1 3 }
  +
else
  +
{ 1 _1 2 _2 3 } );
  +
  +
Here the rule determines what the final output will be based on the lemma of the conjunction.
  +
  +
PP -> DP ?(1.case in might_get_pr)
  +
(if (1.prep_flag = none)
  +
{ 1 }
  +
else
  +
{ (if (1.prep_flag = to)
  +
to@pr
  +
else-if (1.prep_flag = at)
  +
at@pr
  +
else-if (1.prep_flag = in)
  +
in@pr
  +
else-if (1.prep_flag = on)
  +
on@pr
  +
else
  +
for@pr
  +
)
  +
_ 1 } );
  +
  +
Here the rule determines first whether to add a preposition. If it is going to add a preposition, it creates a chunk and within that chunk, has another if statement to determine which preposition to add.
  +
  +
The first clause is labeled "if", the last can be "else" or "otherwise", and intermediate ones can be "if", "else-if", or "elif". These labels follow the same rules as logical operators - that is, capitalization, "-", and "_" are all ignored.
  +
  +
For the output of an if statement to have multiple elements, surround those elements with square brackets. Thus the conjunction rule above can be rewritten as follows:
  +
  +
NP -> NP cnjcoo NP
  +
{ 1 _1
  +
(if (2.lem/sl = and)
  +
[ 2 _2 ]
  +
else [] )
  +
3 };
  +
  +
=== Conditions ===
  +
  +
Conditions are written in parentheses. A condition is a value, an operator, and another value. If the operator is "and" or "or" these values are other conditions, otherwise they are clips or strings. A condition can be negated by writing "not" before the operator.
  +
  +
(1.case = 2.case) ! true if the first and second elements have the same case, otherwise false
  +
(1.case not = 2.case) ! the reverse of the previous line
  +
  +
The full list of operations is as follows:
  +
  +
{| class="wikitable" border="1"
  +
|-
  +
! Name
  +
! Description
  +
! Alternate Spellings
  +
|-
  +
| And
  +
| Evaluates to true if both arguments evaluate to true, otherwise false
  +
| &
  +
|-
  +
| Or
  +
| Evaluates to true if either argument evaluates to true, otherwise false
  +
| <nowiki>|</nowiki>
  +
|-
  +
| Equal
  +
| Evaluates to true if the arguments are identical strings
  +
| =
  +
|-
  +
| IsPrefix
  +
| Evaluates to true if the right argument occurs at the beginning of the left argument
  +
| StartsWith, BeginsWith
  +
|-
  +
| IsSuffix
  +
| Evaluates to true if the right argument occurs at the end of the left argument
  +
| EndsWith
  +
|-
  +
| IsSubstring
  +
| Evaluates to true if the right argument occurs anywhere in the left argument
  +
| Contains
  +
|-
  +
| HasPrefix
  +
| Evaluates to true if the left argument begins with anything in the list named by the right argument
  +
| StartsWithList, BeginsWithList
  +
|-
  +
| HasSuffix
  +
| Evaluates to true if the left argument ends with anything in the list named by the right argument
  +
| EndsWithList
  +
|-
  +
| In
  +
| Evaluates to true if the left argument is a member of the list named by the right argument
  +
| ∈
  +
|}
  +
  +
Any of these operators (besides And and Or) can be made to ignore case by adding one of "cl", "caseless", "fold", "foldcase".
  +
  +
maybe_get_pr = dat obj;
  +
(1.case in maybe_get_pr)
  +
  +
footwear = boot sock shoe sandal;
  +
((1.number = du) and (1.lem/tl in_caseless footwear))
  +
! note that "in-case-less", "incl", "IN-cl", and "__IN_CASE_LESS__" would all also work here.
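The spelling rules for operators can be sketched in Python as follows. This is a rough model, not the compiler's actual code: the <code>parse_operator</code> function and the lookup table are hypothetical, and treating the case-insensitivity marker as a suffix is an assumption based on the examples above.

```python
from typing import Tuple

# Canonical operator names and the alternate spellings from the table.
CANONICAL = {
    "and": "And", "&": "And",
    "or": "Or", "|": "Or",
    "equal": "Equal", "=": "Equal",
    "isprefix": "IsPrefix", "startswith": "IsPrefix", "beginswith": "IsPrefix",
    "issuffix": "IsSuffix", "endswith": "IsSuffix",
    "issubstring": "IsSubstring", "contains": "IsSubstring",
    "hasprefix": "HasPrefix", "startswithlist": "HasPrefix",
    "beginswithlist": "HasPrefix",
    "hassuffix": "HasSuffix", "endswithlist": "HasSuffix",
    "in": "In", "∈": "In",
}
CASELESS_SUFFIXES = ("foldcase", "caseless", "fold", "cl")


def parse_operator(name: str) -> Tuple[str, bool]:
    """Return (canonical operator, ignore_case) for a written operator."""
    # Capitalization, "-", and "_" are all ignored.
    key = name.lower().replace("-", "").replace("_", "")
    if key in CANONICAL:
        return CANONICAL[key], False
    for suffix in CASELESS_SUFFIXES:
        base = key[: -len(suffix)]
        # And and Or cannot be made caseless.
        if key.endswith(suffix) and CANONICAL.get(base) not in (None, "And", "Or"):
            return CANONICAL[base], True
    raise ValueError(f"unknown operator: {name!r}")
```

Under this model, <code>in_caseless</code>, <code>in-case-less</code>, <code>incl</code>, <code>IN-cl</code>, and <code>__IN_CASE_LESS__</code> all resolve to a case-insensitive <code>In</code>.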
  +
  +
=== Tag Rewrite Rules ===
  +
  +
This is a way to convert certain sets of tags, either between two languages that have different sets of tenses, or between something like object agreement and number marking.
  +
  +
object_agr = o1sg o1pl o2sg o2pl o3sg o3pl ;
  +
number = sg pl ;
  +
person = p1 p2 p3 ;
  +
  +
object_agr > person: o1sg p1, o1pl p1, o2sg p2, o2pl p2, o3sg p3, o3pl p3 ;
  +
object_agr > number: o1sg sg, o1pl pl, o2sg sg, o2pl pl, o3sg sg, o3pl pl ;
  +
  +
VP -> @v NP {2(number=1.object_agr) _1 1} ;
  +
  +
In this example, if the verb had <code><o2sg></code>, it would be converted to <code><sg></code> when it was set as the <code>number</code> attribute of the noun.
  +
  +
tense = farpst nearpst pst prs fut nonpst ;
  +
  +
tense > tense: farpst pst, nearpst pst, prs nonpst, fut nonpst ;
  +
  +
In this example, no explicit assignment needs to take place and the 4 tenses of the source language (<code>farpst, nearpst, prs, fut</code>) would be automatically converted to the 2 of the target language (<code>pst, nonpst</code>).
  +
  +
Converting from 4 to 3 with something like <pre>tense > tense: farpst pst, nearpst pst ;</pre>
  +
will also work, the unchanged tags not needing to be explicitly mentioned.
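As a rough Python model of the behavior just described, a rewrite rule acts like a lookup table in which any tag not listed maps to itself. The function names <code>parse_rewrite</code> and <code>rewrite</code> are hypothetical and only cover the part of a rule after the colon.

```python
from typing import Dict


def parse_rewrite(pairs: str) -> Dict[str, str]:
    """Parse the part after the colon, e.g. 'farpst pst, nearpst pst ;'."""
    mapping = {}
    for pair in pairs.strip().rstrip(";").split(","):
        source, target = pair.split()
        mapping[source] = target
    return mapping


def rewrite(tag: str, mapping: Dict[str, str]) -> str:
    """Replace tag if the rule mentions it, otherwise leave it unchanged."""
    return mapping.get(tag, tag)
```

With the rule <code>farpst pst, nearpst pst</code>, both <code>farpst</code> and <code>nearpst</code> become <code>pst</code>, while <code>prs</code> and <code>fut</code> pass through untouched.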
  +
  +
When an attribute category is being mapped to itself, such as in the tense example above, the replacement is always performed. As a result, if a tag appears on the left side of a change and the right side of another, the results may be incorrect. For example:
  +
  +
tense > tense: midpst pst, pst pri;
  +
  +
This rule might convert <code><midpst></code> to either <code><pst></code> or <code><pri></code> in different situations.
  +
  +
However, when a rule maps between different categories, as in the object agreement example, the transformation will not happen invisibly. That is, if you have <code>1</code> in the output, a <code>tense > tense</code> conversion will happen, but an <code>object_agr > number</code> one won't. This is because the compiler does not have enough information to know which clippable attributes that node has and thus does not know what it is converting from.
  +
  +
In order for this to be fully automatic, the <code>number</code> element in the relevant output pattern would have to compile to something which checked <code>number</code> and then every attribute that could map to <code>number</code> until it found one. While this behavior could be added if desired, I initially deemed it too complicated and simply required that in such situations the rule author has to write <code>1(number=1.object_agr)</code> to trigger the <code>object_agr > number</code> conversion.
  +
  +
It is also possible to explicitly convert a value, for example when doing comparisons:
  +
  +
1.object_agr>number
  +
1.object_agr>person
  +
  +
Like attribute category definitions, tag rewrite rules can refer to entire output categories by enclosing them in square brackets. Since the replacement must result in a single value, this can only be done on the source side.
  +
  +
pasts = farpst midpst nearpst;
  +
tense = farpst midpst nearpst past pres fut;
  +
  +
! These are equivalent:
  +
tense > tense : [pasts] past;
  +
tense > tense : farpst past, midpst past, nearpst past;
  +
  +
=== Macros ===
  +
  +
The macro facility is a combination of tag order rules and output conditionals.
  +
  +
det_type = dem def ind;
  +
  +
det_dem: _.<dem>.distance;
  +
det_def: _.definite.number;
  +
  +
det: (if (1.det_type = dem)
  +
1[det_dem]
  +
else
  +
1[det_def]
  +
);
  +
  +
Here we define a "det" pattern which will apply the "det_dem" pattern or the "det_def" pattern to its argument based on whether that argument has a <code><dem></code> tag. Since this is a tag order pattern it will be applied to all <code><det></code>s by default and can also be manually applied to other things with the <code>3[det]</code> syntax.
  +
  +
Macros are only allowed to clip from the input node (referred to as <code>1</code>), including any values passed in.
  +
  +
If a macro specifies a value for an attribute, it will override anything that is passed in. Thus if the above example had <code>1[det_dem](distance=prx)</code> rather than <code>1[det_dem]</code>, invoking it as <code>2[det](distance=dist)</code> and as <code>2[det](distance=med)</code> would make no difference and the output would have <code><prx></code> regardless.
  +
  +
A macro is not required to function as a tag order pattern and may output anything or nothing, so long as it only accesses attributes of the input node.
  +
  +
If you want to call a macro and what node gets passed in doesn't matter, you can use the symbol <code>*</code> to represent an empty node.
  +
  +
maybe_det: (if (1.definite = def)
  +
[the@det.def.sp _]
  +
elif (1.number = sg)
  +
[a@det.ind.sp _]
  +
else [] );
  +
DP -> n { *[maybe_det](number=1.number, definite=$definite) 1 };
  +
  +
This will insert "the" if the DP is definite, "a" if it's indefinite and singular, and will output only the noun otherwise.
  +
  +
Since a macro needs to be contained in a conditional but not all macros are conditional, they permit the keyword <code>always</code> in addition to <code>if</code>, <code>else</code>, etc.
  +
  +
vaux: (always 1[vblex]);
  +
  +
This macro is essentially an alias of <code>vblex</code>.
  +
  +
=== Interpolation (not yet implemented) ===
  +
  +
Parsing clitics, such as [[User_talk:Popcorndude/Recursive_Transfer#Serbo-Croatian_clitics]], can be done using multiple output units:
  +
  +
vbser n -> @n @vbser {2} {1} ;
  +
NP -> @n @det {2 _1 1} ;
  +
! should be able to handle "noun clitic determiner"
  +
  +
Outputting them, however, is more difficult. My current idea is to do something like this:
  +
  +
NP -> @det @n {2 _1 1};
  +
VP -> NP @vbser {(_1 2)>1};
  +
  +
Where <code>(_1 2)>1</code> means "put the space between the elements and element 2 after the first word of element 1". The corresponding syntax for a right-aligned clitic would be <code>1<(2 _1)</code>. New lexical units could also be put in the parentheses (even if there's only one thing being inserted, the parentheses should, I think, be mandatory for clarity).
  +
  +
I'm not sure whether this will cover all cases, but it should at least cover a lot of them.
  +
  +
== Correspondence with t*x ==
  +
  +
number = sg pl;
  +
gender = m f;
  +
pre_adj = gran buen;
  +
  +
n: _.gender.number;
  +
adj: _.gender;
  +
NP: _.number;
  +
  +
NP -> adj n.$number ?(1.number = 2.number)
  +
(if (1.lem/tl incl pre_adj)
  +
{1(gender=2.gender) _1 2}
  +
else
  +
{2 _1 1(gender=2.gender)}
  +
) ;
  +
  +
<transfer>
  +
<section-def-cats>
  +
<def-cat n="n">
  +
<cat-item tags="n"/>
  +
<cat-item tags="n.*"/>
  +
</def-cat>
  +
<def-cat n="adj">
  +
<cat-item tags="adj"/>
  +
<cat-item tags="adj.*"/>
  +
</def-cat>
  +
</section-def-cats>
  +
<section-def-attrs>
  +
<def-attr n="number">
  +
<attr-item tags="sg"/>
  +
<attr-item tags="pl"/>
  +
</def-attr>
  +
<def-attr n="gender">
  +
<attr-item tags="m"/>
  +
<attr-item tags="f"/>
  +
</def-attr>
  +
</section-def-attrs>
  +
<section-def-lists>
  +
<def-list n="pre_adj">
  +
<list-item v="gran"/>
  +
<list-item v="buen"/>
  +
</def-list>
  +
</section-def-lists>
  +
<section-rules>
  +
<rule comment="adj n">
  +
<pattern>
  +
<pattern-item n="adj"/>
  +
<pattern-item n="n"/>
  +
</pattern>
  +
<action>
  +
<choose>
  +
<when>
  +
<test>
  +
<not>
  +
<equal>
  +
<clip pos="1" side="tl" part="number"/>
  +
<clip pos="2" side="tl" part="number"/>
  +
</equal>
  +
</not>
  +
</test>
  +
<reject-current-rule/>
  +
</when>
  +
</choose>
  +
<choose>
  +
<when>
  +
<test>
  +
<in caseless="yes">
  +
<clip pos="1" side="tl" part="lem"/>
  +
<list n="pre_adj"/>
  +
</in>
  +
</test>
  +
<out>
  +
<chunk name="default">
  +
<tags>
  +
<tag><lit-tag v="NP"/></tag>
  +
</tags>
  +
<lu>
  +
<clip pos="1" side="tl" part="lemh"/>
  +
<lit-tag v="adj"/>
  +
<clip pos="2" side="tl" part="gender"/>
  +
</lu>
  +
<b pos="1"/>
  +
<lu>
  +
<clip pos="2" side="tl" part="lemh"/>
  +
<lit-tag v="n"/>
  +
<clip pos="2" side="tl" part="gender"/>
  +
<clip pos="2" side="tl" part="number"/>
  +
</lu>
  +
</chunk>
  +
</out>
  +
</when>
  +
<otherwise>
  +
<out>
  +
<chunk name="default">
  +
<tags>
  +
<tag><lit-tag v="NP"/></tag>
  +
</tags>
  +
<lu>
  +
<clip pos="1" side="tl" part="lemh"/>
  +
<lit-tag v="adj"/>
  +
<clip pos="2" side="tl" part="gender"/>
  +
<clip pos="1" side="tl" part="lemq"/>
  +
</lu>
  +
<b pos="1"/>
  +
<lu>
  +
<clip pos="2" side="tl" part="lemh"/>
  +
<lit-tag v="n"/>
  +
<clip pos="2" side="tl" part="gender"/>
  +
<clip pos="2" side="tl" part="number"/>
  +
<clip pos="2" side="tl" part="lemq"/>
  +
</lu>
  +
</chunk>
  +
</out>
  +
</otherwise>
  +
</choose>
  +
</action>
  +
</rule>
  +
</section-rules>
  +
</transfer>
  +
  +
{| class="wikitable" border="1"
  +
|-
  +
| <pre>
  +
number = sg pl;
  +
</pre>
  +
| <pre>
  +
<def-attr n="number">
  +
<attr-item tags="sg"/>
  +
<attr-item tags="pl"/>
  +
</def-attr>
  +
<def-list n="number">
  +
<list-item v="sg"/>
  +
<list-item v="pl"/>
  +
</def-list>
  +
</pre>
  +
| It isn't shown in the above example, but each list simultaneously defines an attribute category and a list.
  +
|-
  +
| <pre>n: _.gender.number;</pre>
  +
| (no direct equivalent)
  +
|
  +
|-
  +
| <pre>NP -> </pre>
  +
| <pre>
  +
<tags>
  +
<tag><lit-tag v="NP"/></tag>
  +
...
  +
</tags>
  +
</pre>
  +
| The further contents of <code><tags></code> are determined by <code>NP: _.number;</code>, which indicates that those contents will be a <code>number</code> tag, probably clipped from one of the inputs.
  +
|-
  +
| <pre>n</pre>
  +
| <pre><def-cat n="some_unique_name">
  +
<cat-item tags="n"/>
  +
<cat-item tags="n.*"/>
  +
</def-cat>
  +
...
  +
<pattern-item n="some_unique_name"/>
  +
</pre>
  +
|
  +
|-
  +
| <pre>.$number</pre>
  +
| <pre><clip pos="2" side="tl" part="number"/></pre>
  +
| This determines the contents of <code><tags></code> in the output chunk.
  +
|-
  +
| <pre>?</pre>
  +
| <pre>
  +
<choose>
  +
<when>
  +
<test>
  +
<not>
  +
...
  +
</not>
  +
</test>
  +
<reject-current-rule/>
  +
</when>
  +
</choose>
  +
</pre>
  +
| There is no functionality equivalent to <code><reject-current-rule shifting="yes"/></code>.
  +
|-
  +
| <pre>1.number</pre>
  +
| <pre><clip pos="1" part="number"/></pre>
  +
| In the example rule, the clips are written as being <code>side="tl"</code>, but an unspecified clip will actually check all three sides (target, then reference, then source) until it finds a value.
  +
|-
  +
| <pre>
  +
(if (...)
  +
...
  +
else
  +
...
  +
)
  +
</pre>
  +
| <pre>
  +
<choose>
  +
<when>
  +
<test>
  +
...
  +
</test>
  +
<out>
  +
...
  +
</out>
  +
</when>
  +
<otherwise>
  +
<out>
  +
...
  +
</out>
  +
</otherwise>
  +
</choose>
  +
</pre>
  +
|
  +
|-
  +
| <pre>(... incl ...)</pre>
  +
| <pre>
  +
<in caseless="yes">
  +
...
  +
<list n="..."/>
  +
</in>
  +
</pre>
  +
|
  +
|-
  +
| <pre>_1</pre>
  +
| <pre><b pos="1"/></pre>
  +
|
  +
|-
  +
| <pre>1(gender=2.gender)</pre>
  +
| <pre>
  +
<lu>
  +
<clip pos="1" side="tl" part="lemh"/>
  +
<lit-tag v="adj"/>
  +
<clip pos="2" side="tl" part="gender"/>
  +
<clip pos="1" side="tl" part="lemq"/>
  +
</lu>
  +
</pre>
  +
| <code><lit-tag v="adj"/></code> should actually be <code><clip pos="1" side="tl" part="pos_tag"/></code> where <code>pos_tag</code> is a special attribute that returns whatever the first tag is.
  +
|-
  +
| <pre>{ ... }</pre>
  +
| <pre>
  +
<chunk name="default">
  +
...
  +
</chunk>
  +
</pre>
  +
| It is possible to make the name be something other than default, for example with <code>n.$lem/sl</code> in the pattern.
  +
|}
  +
  +
Technically this would compile to a rule which output an <code>NP</code> chunk containing the input unchanged and also a separate postchunk rule that would do the actual rearranging so that the conditionals can depend on changed values of the chunk tags.
  +
  +
== Example ==
  +
  +
=== Initial Sentence ===
  +
  +
In a hole in the ground there lived a Hobbit.
  +
  +
=== Output of eng-spa-lex ===
  +
  +
^In<pr>/En<pr>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^hole<n><sg>/agujero<n><m><sg>$ ^in<pr>/en<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^ground<n><sg>/tierra<n><f><sg>$ ^there<adv>/allí<adv>$ ^live<vblex><past>/vivir<vblex><past>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^Hobbit<n><sg>/Hobbit<n><m><sg>$^.<sent>/.<sent>$^.<sent>/.<sent>$
  +
  +
=== A Simple Set of Rules ===
  +
  +
gender = m f;
  +
number = (ND sg) sg pl ND;
  +
definite = def ind;
  +
tense = past pres ifi;
  +
person = (PD p3) p1 p2 p3 PD;
  +
  +
tense > tense : past ifi;
  +
  +
n: _.gender.number;
  +
det: _.definite.gender.number;
  +
pr: _;
  +
vblex: _.tense.person.number;
  +
adv: _;
  +
  +
NP: _.gender.number;
  +
DP: _.gender.number;
  +
PP: _;
  +
VP: _.tense.person.number;
  +
  +
NP -> %n { 1 } |
  +
10: %n PP { 1 _1 2 } ;
  +
  +
PP -> pr DP { 1 _1 2 } ;
  +
  +
DP -> det %NP { 1(gender=2.gender, number=2.number) _1 2 } ;
  +
  +
VP -> %vblex DP { 1 _1 2 } |
  +
adv %VP { 1 _1 2 } |
  +
PP %VP { 1 _1 2 } ;
   
=== Blanks ===
+
=== Process ===
   
  +
{| class="wikitable"
The current transfer system deals with blanks, so in the output section "_n" is the formatting after node "n", and {1 _1 2} means "change nothing". Adding blanks could be done either with "_", corresponding to the current system, or automatically. Alternatively, the transfer module could ignore blanks.
 
  +
|-
  +
! Action
  +
! Result
  +
! Comments
  +
|-
  +
| Read token
  +
| <ol>
  +
<li>^In<pr>/En<pr>$</li>
  +
</ol>
  +
|
  +
|-
  +
| Read token
  +
| <ol><li>^In<pr>/En<pr>$</li>
  +
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li></ol>
  +
|
  +
|-
  +
| Read token
  +
| <ol><li>^In<pr>/En<pr>$</li>
  +
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
  +
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
  +
|
  +
|-
  +
| Split
  +
| <ol><li>^In<pr>/En<pr>$</li>
  +
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
  +
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
  +
<hr/>
  +
<ol><li>^In<pr>/En<pr>$</li>
  +
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
  +
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
  +
| Rule 1 (<code>NP -> n</code>) could apply, but it's possible that reading more of the input would make it so rule 2 (<code>NP -> n PP</code>) could apply, so we do both.
  +
|-
  +
| Apply rule 1 (<code>NP -> n</code>)
  +
| <ol><li>^In<pr>/En<pr>$</li>
  +
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
  +
<li>^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$</li></ol>
  +
<hr/>
  +
<ol><li>^In<pr>/En<pr>$</li>
  +
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
  +
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
  +
| Since the rule says <code>%n</code>, the required NP tags (gender and number) are filled in with the values of the noun tags.
  +
|-
  +
| Apply rule 4 (<code>DP -> det NP</code>)
  +
| <ol><li>^In<pr>/En<pr>$</li>
  +
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$</li></ol>
  +
<hr/>
  +
<ol><li>^In<pr>/En<pr>$</li>
  +
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
  +
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
  +
| Note that the determiner still has GD as it's gender. Child tags are not modified until the output step.
  +
|-
  +
| Apply rule 3 (<code>PP -> pr DP</code>)
  +
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li></ol>
  +
<hr/>
  +
<ol><li>^In<pr>/En<pr>$</li>
  +
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
  +
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
  +
|
  +
|}

Revision as of 22:20, 18 July 2019

A proposal for a recursive transfer rule formalism.

Basic Rule Syntax

Rules consist of a node type, an optional weight, a pattern, an optional condition, an optional variable setting, and an output, in that order.

NP -> det n {2 _1 1};

This matches a determiner followed by a noun, combines them into an NP chunk, and at output time produces "noun determiner".

NP -> 1: n {1} |
      2: n.*.def {the@det.def.sg _ 1};

Here the first rule will match any noun, while the second will match a noun with a <def> tag. Since the second rule has a higher weight, the first rule will not be applied if they both match.

NP -> NP and@cnjcoo NP [$number=pl] {1 _1 2 _2 3};

Here the rule specifies that the resulting chunk will be marked with a <pl> tag.

AP -> adj and@cnjcoo adj ?(1.gender/sl = 3.gender/sl) {1 _1 2 _2 3};

This rule will not apply if the two adjectives have different genders.

The arrow can be written as either -> or →.

The process by which rules are selected is described here.

Attribute Lists

A list of attributes can be defined like this:

gender = m f GD ;
number = sg pl ND ;

An attribute list can also specify undefined and default values:

gender = (GD m) m f GD;

This defines the gender category as before, but with the addition that if any rule tries to read the gender of a node that doesn't have a gender tag, the result will be <GD> rather than the empty string. It also states that any remaining <GD> tags will be replaced with <m> tags in the output step.
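
The two roles of the parenthesised pair can be modelled in a few lines of Python. This is only an illustrative sketch of the behaviour described above, not the actual implementation; the dictionary layout and the function names `read_attr` and `finalize` are assumptions made for the example.

```python
# Hypothetical model of: gender = (GD m) m f GD;
GENDER = {"values": ["m", "f", "GD"], "undefined": "GD", "default": "m"}

def read_attr(tags, category):
    """Return the first tag in the category, or the undefined value if none is present."""
    for tag in tags:
        if tag in category["values"]:
            return tag
    return category["undefined"]

def finalize(tag, category):
    """At the output step, replace a leftover undefined value with the default."""
    return category["default"] if tag == category["undefined"] else tag

clipped = read_attr(["n", "sg"], GENDER)   # the node has no gender tag
print(clipped, finalize(clipped, GENDER))
```

Reading a node that already carries `<f>` returns `f` from both functions, so the default only ever affects values that were never resolved.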

An attribute category can include another:

definite = def ind;

! The following are equivalent:
det_type = dem [definite] pos;
det_type = dem def ind pos;

Tag Order

The order of tags for each type of node must be defined like this:

n: _.gender.number;
adj: _.gender;
NP: _.number;

Where _ represents the lemma and the part of speech tag. Note that it is currently only possible to specify single tags as patterns. However, it is possible to specify that a different pattern should be used (see the output section below). Note also that the lemma queue is automatically appended to the pattern.

To specify a literal tag in a pattern, put it in angle brackets:

det: _.<def>.number;

Patterns

An element of a pattern must match a single, literal part of speech tag. In order to match multiple part of speech tags, create a separate rule which matches each of them:

NOM -> n {1} | np {1};

To match a lemma or pseudolemma, place it before the part of speech tag, separated by @:

NP -> the@det n {2 _1 1};

It is also possible to match a category of lemmas:

days = sunday monday tuesday wednesday thursday friday saturday;
date -> $days@n the@det num.ord {2 _2 3 _1 1};

Tags besides part of speech can be matched as shown above.

Pattern elements can also specify values for the tags of the chunk being output by the rule.

number = (ND sg) sg pl sp ND;
NP: _.number;
NP -> n.$number adj {1};

This rule specifies that the number tag of the NP chunk should be copied from the noun. It will use the target language side if that is available. If not, it will proceed to the reference side, and then the source side. If all three of these are empty, it will use the default value <ND>. To require that a particular variable be taken from a particular side, put the side after a slash:

NP: _.number;
NP -> det.$number/ref n {1 _1 2};

/sl refers to the source language, /tl to the target language, and /ref to anything added by anaphora resolution.
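
The lookup order can be sketched as follows. This is a rough Python model under stated assumptions: a node is represented as a dictionary from side name to tag list, and the function name `clip` and its parameters are hypothetical. Returning the default when an explicitly requested side has no matching tag is also an assumption.

```python
NUMBER = ["sg", "pl", "sp", "ND"]

def clip(node, attr_values, default="", side=None):
    """Find a tag of the given category, checking tl, then ref, then sl.

    If `side` is given (as in n.$number/ref), only that side is checked.
    """
    sides = [side] if side else ["tl", "ref", "sl"]
    for name in sides:
        for tag in node.get(name, []):
            if tag in attr_values:
                return tag
    return default

noun = {"sl": ["n", "pl"], "ref": [], "tl": ["n", "m", "sg"]}
print(clip(noun, NUMBER, default="ND"))             # target side is tried first
print(clip(noun, NUMBER, default="ND", side="sl"))  # forced to the source side
```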

If a pattern element is contributing several tags to the chunk, the following shortcut is available:

NP: _.number.gender;
NP -> %n adj {2 _1 1};

The % indicates the noun is the source of all chunk tags not elsewhere specified.

To specify a literal value for a chunk tag, put it in square brackets after the pattern like this:

NP: _.gender.number;
NP -> 0: NP cnjcoo NP [$gender=m, $number=pl] {1 _1 2 _2 3} |
      1: NP.f cnjcoo NP.f [$gender=f, $number=pl] {1 _1 2 _2 3} |
      2: NP.*.sg or@cnjcoo NP.*.sg [$gender=m, $number=sg] {1 _1 2 _2 3} |
      3: NP.f.sg or@cnjcoo NP.f.sg [$gender=f, $number=sg] {1 _1 2 _2 3} ;

That is, treat the gender of the phrase as masculine unless both elements are feminine, and treat the number as plural unless the conjunction is "or" and both elements are singular.
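
The effect of the four weighted alternatives can be checked with a small Python model. It assumes, as stated in the basic syntax section, that the matching rule with the highest weight is the one applied; the function name `coordinate` and the dictionary representation of an NP are hypothetical.

```python
def coordinate(left, conj_lemma, right):
    """Return (gender, number) for 'NP cnjcoo NP' under the four rules above."""
    both_f = left["gender"] == "f" and right["gender"] == "f"
    both_sg = left["number"] == "sg" and right["number"] == "sg"
    is_or = conj_lemma == "or"
    rules = [
        (0, True, ("m", "pl")),                        # NP cnjcoo NP
        (1, both_f, ("f", "pl")),                      # NP.f cnjcoo NP.f
        (2, is_or and both_sg, ("m", "sg")),           # NP.*.sg or@cnjcoo NP.*.sg
        (3, is_or and both_f and both_sg, ("f", "sg")),
    ]
    matching = [(weight, tags) for weight, ok, tags in rules if ok]
    return max(matching)[1]  # weights are unique, so the highest weight wins

print(coordinate({"gender": "f", "number": "sg"}, "or",
                 {"gender": "f", "number": "sg"}))
```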

The pattern only looks at the source language, but it is possible to add constraints:

conj_list = and or;
NP: _.gender.number;
NP -> %NP cnjcoo NP ?((2.lem/tl in conj_list) and ~(3.gender = 1.gender)) {1 _1 2 _2 3};

This will only match the pattern if it is also the case that the target language lemma of the conjunction is "and" or "or" and the two NPs have different genders. See below for the syntax of conditions.

Outputs

Output elements are written between curly braces and may be any of the following:

Blanks

An underscore represents a single space. An underscore followed by a number represents the superblank after that position, so 1 _ 2 is elements 1 and 2 separated by a space while 1 _1 2 is elements 1 and 2 separated by whatever separated them in the input.
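
As a rough illustration of the difference between `_` and `_1`, the following Python sketch renders an output pattern from a list of already formatted elements and the blanks that followed them in the input. The token representation and the function name `render` are assumptions made for this example.

```python
def render(output, elements, blanks):
    """Render an output pattern.

    output:   list of tokens; an int N is element N, "_" is a single space,
              and "_N" is the blank that followed element N in the input.
    elements: formatted lexical units, in input order.
    blanks:   blanks[i] is whatever followed element i+1 in the input.
    """
    parts = []
    for token in output:
        if isinstance(token, int):
            parts.append(elements[token - 1])
        elif token == "_":
            parts.append(" ")
        else:
            parts.append(blanks[int(token[1:]) - 1])
    return "".join(parts)

units = ["^a<det>$", "^hole<n>$"]
print(render([1, "_1", 2], units, ["[<b>]"]))  # keeps the input formatting
print(render([2, "_", 1], units, ["[<b>]"]))   # plain space, reordered
```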

Matched Elements

A number represents the input element in that position with its tags arranged according to the defined output pattern for its part of speech tag. It can be followed by a specification of where those tags should come from.

1
! the first input element

1(gender=f)
! the first input element with the gender tag <f>

1(gender=2.gender/ref)
! the first input element with the gender tag of the reference side of the second input element

1(gender=$gender)
! the first input element with the gender tag set to a placeholder to be filled on output with the gender tag of its parent chunk

These elements can also be prefixed with % to specify that as many tags as possible should be placeholders for tags of the parent chunk.

These elements can be conjoined using +:

1(gender=f) + 2

This will generate something like ^blah<n><f>+bloop<adj>$.

By default, the order of the output tags is based on the output pattern corresponding to the part of speech tag in the pattern. However, it is possible to override this using square brackets:

vblex: _.tense.person.number;
vbinf: _.<inf>;

V -> vblex.inf {1};
  ! result: ^whatever<vblex><inf><{person}><{number}>$

V -> vblex.inf {1[vbinf]};
  ! result: ^whatever<vblex><inf>$

Note that the part of speech tag of the output is in all cases the part of speech tag of the input. To avoid this behavior (for example, if you want to change the part of speech tag), write an output rule like the following:

adj: lemh.<adj>.number;

Literal Lexical Units

A new lexical unit can be inserted like this:

the@det.def.mf.sp

Placeholders can be included using $:

the@det.def.$gender.sp

And clips from other elements can be placed in square brackets:

the@det.def.[2.gender].[3.number/sl]

Output Conditionals

An output conditional evaluates a sequence of conditions and outputs the element corresponding to the first one that evaluates to true. The element to be output can be any of the possibilities listed above, the entire chunk, or another conditional.

NP -> NP cnjcoo NP
         (if (2.lem/sl = and)
             { 1 _1 3 }
          else
             { 1 _1 2 _2 3 } );

Here the rule determines what the final output will be based on the lemma of the conjunction.

PP -> DP ?(1.case in might_get_pr)
         (if (1.prep_flag = none)
             { 1 }
          else
             { (if (1.prep_flag = to)
                   to@pr
                else-if (1.prep_flag = at)
                   at@pr
                else-if (1.prep_flag = in)
                   in@pr
                else-if (1.prep_flag = on)
                   on@pr
                else
                   for@pr
                )
                _ 1 } );

Here the rule determines first whether to add a preposition. If it is going to add a preposition, it creates a chunk and within that chunk, has another if statement to determine which preposition to add.

The first clause is labeled "if", the last can be "else" or "otherwise", and intermediate ones can be "if", "else-if", or "elif". These labels follow the same rules as logical operators - that is, capitalization, "-", and "_" are all ignored.
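
The label matching can be pictured with a short Python sketch. It only models the normalization described above (ignoring capitalization, "-", and "_"); the names `LABELS` and `normalize_label` are hypothetical and do not come from the compiler.

```python
# Accepted spellings after normalization, mapped to a canonical clause kind.
LABELS = {
    "if": "if",
    "elseif": "elif",
    "elif": "elif",
    "else": "else",
    "otherwise": "else",
}

def normalize_label(word):
    """Ignore capitalization, '-' and '_' and return the canonical clause kind."""
    key = word.lower().replace("-", "").replace("_", "")
    return LABELS[key]

for spelling in ["IF", "Else-If", "el_if", "OTHERWISE"]:
    print(spelling, "->", normalize_label(spelling))
```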

For the output of an if statement to have multiple elements, surround those elements with square brackets. Thus the conjunction rule above can be rewritten as follows:

NP -> NP cnjcoo NP
         { 1 _1
           (if (2.lem/sl = and)
               [ 2 _2 ]
            else [] )
          3 };

Conditions

Conditions are written in parentheses. A condition is a value, an operator, and another value. If the operator is "and" or "or" these values are other conditions, otherwise they are clips or strings. A condition can be negated by writing "not" before the operator.

(1.case = 2.case)     ! true if the first and second elements have the same case, otherwise false
(1.case not = 2.case) ! the reverse of the previous line

The full list of operations is as follows:

{| class="wikitable"
|-
! Name
! Description
! Alternate Spellings
|-
| And
| Evaluates to true if both arguments evaluate to true, otherwise false
| &
|-
| Or
| Evaluates to true if either argument evaluates to true, otherwise false
| <nowiki>|</nowiki>
|-
| Equal
| Evaluates to true if the arguments are identical strings
| =
|-
| IsPrefix
| Evaluates to true if the right argument occurs at the beginning of the left argument
| StartsWith, BeginsWith
|-
| IsSuffix
| Evaluates to true if the right argument occurs at the end of the left argument
| EndsWith
|-
| IsSubstring
| Evaluates to true if the right argument occurs anywhere in the left argument
| Contains
|-
| HasPrefix
| Evaluates to true if the left argument begins with anything in the list named by the right argument
| StartsWithList, BeginsWithList
|-
| HasSuffix
| Evaluates to true if the left argument ends with anything in the list named by the right argument
| EndsWithList
|-
| In
| Evaluates to true if the left argument is a member of the list named by the right argument
|
|}

Any of these operators (besides And and Or) can be made to ignore case by adding one of "cl", "caseless", "fold", "foldcase".

maybe_get_pr = dat obj;
(1.case in maybe_get_pr)

footwear = boot sock shoe sandal;
((1.number = du) and (1.lem/tl in_caseless footwear))
! note that "in-case-less", "incl", "IN-cl", and "__IN_CASE_LESS__" would all also work here.
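
The following Python sketch shows one way the string operators and their case-insensitive variants could be resolved and evaluated. It is a model of the description above rather than the real compiler; the names `ALIASES`, `parse_operator`, and `evaluate` are hypothetical, and And/Or are left out because they combine conditions rather than strings.

```python
# Normalized spellings of the string operators, mapped to canonical names.
ALIASES = {
    "equal": "equal", "=": "equal",
    "isprefix": "isprefix", "startswith": "isprefix", "beginswith": "isprefix",
    "issuffix": "issuffix", "endswith": "issuffix",
    "issubstring": "issubstring", "contains": "issubstring",
    "hasprefix": "hasprefix", "startswithlist": "hasprefix", "beginswithlist": "hasprefix",
    "hassuffix": "hassuffix", "endswithlist": "hassuffix",
    "in": "in",
}
CASELESS_SUFFIXES = ("foldcase", "caseless", "fold", "cl")

def parse_operator(name):
    """Return (canonical operator, caseless flag), ignoring case, '-' and '_'."""
    key = name.lower().replace("-", "").replace("_", "")
    if key in ALIASES:
        return ALIASES[key], False
    for suffix in CASELESS_SUFFIXES:
        base = key[:-len(suffix)]
        if key.endswith(suffix) and base in ALIASES:
            return ALIASES[base], True
    raise ValueError(f"unknown operator: {name}")

def evaluate(left, op_name, right, lists=None):
    """Evaluate a string condition; list operators look `right` up in `lists`."""
    op, caseless = parse_operator(op_name)
    if op in ("hasprefix", "hassuffix", "in"):
        candidates = list((lists or {})[right])
    else:
        candidates = [right]
    if caseless:
        left = left.lower()
        candidates = [c.lower() for c in candidates]
    if op in ("equal", "in"):
        return left in candidates
    if op in ("isprefix", "hasprefix"):
        return any(left.startswith(c) for c in candidates)
    if op in ("issuffix", "hassuffix"):
        return any(left.endswith(c) for c in candidates)
    return any(c in left for c in candidates)  # issubstring

footwear = {"footwear": ["boot", "sock", "shoe", "sandal"]}
print(evaluate("Boot", "in_caseless", "footwear", footwear))
print(evaluate("Boot", "in", "footwear", footwear))
```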

Tag Rewrite Rules

This is a way to convert certain sets of tags, either between two languages that have different sets of tenses, or between something like object agreement and number marking.

object_agr = o1sg o1pl o2sg o2pl o3sg o3pl ;
number = sg pl ;
person = p1 p2 p3 ;

object_agr > person: o1sg p1, o1pl p1, o2sg p2, o2pl p2, o3sg p3, o3pl p3 ;
object_agr > number: o1sg sg, o1pl pl, o2sg sg, o2pl pl, o3sg sg, o3pl pl ;

VP -> @v NP {2(number=1.object_agr) _1 1} ;

In this example, if the verb had <o2sg>, it would be converted to <sg> when it was set as the number attribute of the noun.

tense = farpst nearpst pst prs fut nonpst ;

tense > tense: farpst pst, nearpst pst, prs nonpst, fut nonpst ;

In this example, no explicit assignment needs to take place and the 4 tenses of the source language (farpst, nearpst, prs, fut) would be automatically converted to the 2 of the target language (pst, nonpst).

Converting from 4 to 3 with something like

tense > tense: farpst pst, nearpst pst ;

will also work, the unchanged tags not needing to be explicitly mentioned.
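
In effect, each rewrite rule behaves like a lookup table in which missing entries pass through unchanged. A minimal Python sketch of that reading follows; the dictionary names and the function `rewrite` are hypothetical.

```python
# tense > tense: farpst pst, nearpst pst ;
TENSE_TO_TENSE = {"farpst": "pst", "nearpst": "pst"}

# object_agr > number: o1sg sg, o1pl pl, o2sg sg, o2pl pl, o3sg sg, o3pl pl ;
OBJECT_AGR_TO_NUMBER = {
    "o1sg": "sg", "o1pl": "pl",
    "o2sg": "sg", "o2pl": "pl",
    "o3sg": "sg", "o3pl": "pl",
}

def rewrite(tag, mapping):
    """Convert a tag if the rule mentions it, otherwise leave it unchanged."""
    return mapping.get(tag, tag)

print(rewrite("farpst", TENSE_TO_TENSE))       # converted
print(rewrite("fut", TENSE_TO_TENSE))          # not mentioned, kept as-is
print(rewrite("o2sg", OBJECT_AGR_TO_NUMBER))   # as in 2(number=1.object_agr)
```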

When an attribute category is being mapped to itself, such as in the tense example above, the replacement is always performed. As a result, if a tag appears on the left side of a change and the right side of another, the results may be incorrect. For example:

tense > tense: midpst pst, pst pri;

This rule might convert <midpst> to either <pst> or <pri> in different situations.

However, when a rule maps between different categories, as in the object agreement example, the transformation will not happen invisibly. That is, if you have 1 in the output, a tense > tense conversion will happen, but an object_agr > number one won't. This is because the compiler does not have enough information to know which clippable attributes that node has and thus does not know what it is converting from.

In order for this to be fully automatic, the number element in the relevant output pattern would have to compile to something which checked number and then every attribute that could map to number until it found one. While this behavior could be added if desired, I initially deemed it too complicated and simply required that in such situations the rule author has to write 1(number=1.object_agr) to trigger the object_agr > number conversion.

It is also possible to explicitly convert a value, for example when doing comparisons:

1.object_agr>number
1.object_agr>person

Like attribute category definitions, tag rewrite rules can refer to entire output categories by enclosing them in square brackets. Since the replacement must result in a single value, this can only be done on the source side.

pasts = farpst midpst nearpst;
tense = farpst midpst nearpst past pres fut;

! These are equivalent:
tense > tense : [pasts] past;
tense > tense : farpst past, midpst past, nearpst past;
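
The equivalence of the two forms can be demonstrated with a small Python sketch that expands bracketed categories on the source side. The parsing here is deliberately simplified and the function name `parse_rewrite` is an assumption; it handles only the part of the rule after the colon.

```python
def parse_rewrite(rule_body, categories):
    """Turn 'src dst, src dst, ...' into a dict, expanding [category] sources."""
    mapping = {}
    for pair in rule_body.split(","):
        source, target = pair.split()
        if source.startswith("[") and source.endswith("]"):
            sources = categories[source[1:-1]]
        else:
            sources = [source]
        for tag in sources:
            mapping[tag] = target
    return mapping

categories = {"pasts": ["farpst", "midpst", "nearpst"]}
short_form = parse_rewrite("[pasts] past", categories)
long_form = parse_rewrite("farpst past, midpst past, nearpst past", categories)
print(short_form == long_form)
```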

Macros

The macro facility is a combination of tag order rules and output conditionals.

det_type = dem def ind;

det_dem: _.<dem>.distance;
det_def: _.definite.number;

det: (if (1.det_type = dem)
         1[det_dem]
      else
         1[det_def]
     );

Here we define a "det" pattern which will apply the "det_dem" pattern or the "det_def" pattern to its argument based on whether that argument has a <dem> tag. Since this is a tag order pattern it will be applied to all <det>s by default and can also be manually applied to other things with the 3[det] syntax.

Macros are only allowed to clip from the input node (referred to as 1), including any values passed in.

If a macro specifies a value for an attribute, it will override anything that is passed in. Thus if the above example had 1[det_dem](distance=prx) rather than 1[det_dem], invoking it as 2[det](distance=dist) and as 2[det](distance=med) would make no difference and the output would have <prx> regardless.

A macro is not required to function as a tag order pattern and may output anything or nothing, so long as it only accesses attributes of the input node.

If you want to call a macro and it doesn't matter which node gets passed in, you can use the symbol * to represent an empty node.

maybe_det: (if (1.definite = def)
               [the@det.def.sp _]
            elif (1.number = sg)
               [a@det.ind.sp _]
            else [] );
DP -> n { *[maybe_det](number=1.number, definite=$definite) 1 };

This will insert "the" if the DP is definite, "a" if it's indefinite and singular, and will output only the noun otherwise.
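
The decision made by maybe_det can be written out as an ordinary function. This Python sketch only mirrors the three branches of the macro; rendering the@det.def.sp as ^the<det><def><sp>$ and the function signature are assumptions made for the illustration.

```python
def maybe_det(number, definite):
    """Return the determiner (plus a following space) that the macro would emit."""
    if definite == "def":
        return "^the<det><def><sp>$ "
    elif number == "sg":
        return "^a<det><ind><sp>$ "
    return ""  # corresponds to the empty output []

noun = "^hole<n><sg>$"
print(maybe_det("sg", "def") + noun)
print(maybe_det("sg", "ind") + noun)
print(maybe_det("pl", "ind") + "^hole<n><pl>$")
```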

Since a macro needs to be contained in a conditional but not all macros are conditional, they permit the keyword always in addition to if, else, etc.

vaux: (always 1[vblex]);

This macro is essentially an alias of vblex.

Interpolation (not yet implemented)

Parsing clitics, such as those described at User_talk:Popcorndude/Recursive_Transfer#Serbo-Croatian_clitics, can be done using multiple output units:

vbser n -> @n @vbser {2} {1} ;
NP -> @n @det {2 _1 1} ;
! should be able to handle "noun clitic determiner"

Outputting them, however, is more difficult. My current idea is to do something like this:

NP -> @det @n {2 _1 1};
VP -> NP @vbser {(_1 2)>1};

Where (_1 2)>1 means "put the space between the elements and element 2 after the first word of element 1". The corresponding syntax for a right-aligned clitic would be 1<(2 _1). New lexical units could also be put in the parentheses (even if there's only one thing being inserted, the parentheses should, I think, be mandatory for clarity).
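
Since this feature is only proposed, the following Python sketch is purely speculative. It models an element as a list of words and shows what "after the first word" could mean for (_1 2)>1; reading the right-aligned form 1<(2 _1) as "before the last word" is an assumption, as are both function names.

```python
def insert_after_first(words, inserted):
    """Model of (X)>1: place the inserted items after the first word of element 1."""
    return words[:1] + inserted + words[1:]

def insert_before_last(words, inserted):
    """Assumed model of 1<(X): place the inserted items before the last word."""
    return words[:-1] + inserted + words[-1:]

np_words = ["det", "adj", "n"]          # words of element 1, in output order
print(insert_after_first(np_words, ["vbser"]))
print(insert_before_last(np_words, ["vbser"]))
```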

I'm not sure whether this will cover all cases, but it should at least cover a lot of them.

Correspondence with t*x

number = sg pl;
gender = m f;
pre_adj = gran buen;

n: _.gender.number;
adj: _.gender;
NP: _.number;

NP -> adj n.$number ?(1.number = 2.number)
          (if (1.lem/tl incl pre_adj)
              {1(gender=2.gender) _1 2}
           else
              {2 _1 1(gender=2.gender)}
          ) ;

<transfer>
 <section-def-cats>
  <def-cat n="n">
   <cat-item tags="n"/>
   <cat-item tags="n.*"/>
  </def-cat>
  <def-cat n="adj">
   <cat-item tags="adj"/>
   <cat-item tags="adj.*"/>
  </def-cat>
 </section-def-cats>
 <section-def-attrs>
  <def-attr n="number">
   <attr-item tags="sg"/>
   <attr-item tags="pl"/>
  </def-attr>
  <def-attr n="gender">
   <attr-item tags="m"/>
   <attr-item tags="f"/>
  </def-attr>
 </section-def-attrs>
 <section-def-lists>
  <def-list n="pre_adj">
   <list-item v="gran"/>
   <list-item v="buen"/>
  </def-list>
 </section-def-lists>
 <section-rules>
  <rule comment="adj n">
   <pattern>
    <pattern-item n="adj"/>
    <pattern-item n="n"/>
   </pattern>
   <action>
    <choose>
     <when>
      <test>
       <not>
        <equal>
         <clip pos="1" side="tl" part="number"/>
         <clip pos="2" side="tl" part="number"/>
        </equal>
       </not>
      </test>
      <reject-current-rule/>
     </when>
    </choose>
    <choose>
     <when>
      <test>
       <in caseless="yes">
        <clip pos="1" side="tl" part="lem"/>
        <list n="pre_adj"/>
       </in>
      </test>
      <out>
       <chunk name="default">
        <tags>
         <tag><lit-tag v="NP"/></tag>
        </tags>
        <lu>
         <clip pos="1" side="tl" part="lemh"/>
         <lit-tag v="adj"/>
         <clip pos="2" side="tl" part="gender"/>
        </lu>
        
        <lu>
         <clip pos="2" side="tl" part="lemh"/>
         <lit-tag v="n"/>
         <clip pos="2" side="tl" part="gender"/>
         <clip pos="2" side="tl" part="number"/>
        </lu>
       </chunk>
      </out>
     </when>
     <otherwise>
      <out>
       <chunk name="default">
        <tags>
         <tag><lit-tag v="NP"/></tag>
        </tags>
        <lu>
         <clip pos="1" side="tl" part="lemh"/>
         <lit-tag v="adj"/>
         <clip pos="2" side="tl" part="gender"/>
         <clip pos="1" side="tl" part="lemq"/>
        </lu>
        
        <lu>
         <clip pos="2" side="tl" part="lemh"/>
         <lit-tag v="n"/>
         <clip pos="2" side="tl" part="gender"/>
         <clip pos="2" side="tl" part="number"/>
         <clip pos="2" side="tl" part="lemq"/>
        </lu>
       </chunk>
      </out>
     </otherwise>
    </choose>
   </action>
  </rule>
 </section-rules>
</transfer>

{| class="wikitable"
|-
! rtx
! t*x
! Notes
|-
| <pre>number = sg pl;</pre>
| <pre>
<def-attr n="number">
 <attr-item tags="sg"/>
 <attr-item tags="pl"/>
</def-attr>
<def-list n="number">
 <list-item v="sg"/>
 <list-item v="pl"/>
</def-list>
</pre>
| It isn't shown in the above example, but each list simultaneously defines an attribute category and a list.
|-
| <pre>n: _.gender.number;</pre>
| (no direct equivalent)
|
|-
| <pre>NP -> </pre>
| <pre>
<tags>
 <tag><lit-tag v="NP"/></tag>
 ...
</tags>
</pre>
| The further contents of <code><tags></code> are determined by <code>NP: _.number;</code>, which indicates that those contents will be a number tag, probably clipped from one of the inputs.
|-
| <pre>n</pre>
| <pre>
<def-cat n="some_unique_name">
 <cat-item tags="n"/>
 <cat-item tags="n.*"/>
</def-cat>
...
<pattern-item n="some_unique_name"/>
</pre>
|
|-
| <pre>.$number</pre>
| <pre><clip pos="2" side="tl" part="number"/></pre>
| This determines the contents of <code><tags></code> in the output chunk.
|-
| <pre>?</pre>
| <pre>
<choose>
 <when>
  <test>
   <not>
    ...
   </not>
  </test>
  <reject-current-rule/>
 </when>
</choose>
</pre>
| There is no functionality equivalent to <code><reject-current-rule shifting="yes"/></code>.
|-
| <pre>1.number</pre>
| <pre><clip pos="1" part="number"/></pre>
| In the example rule, the clips are written as being <code>side="tl"</code>, but an unspecified clip will actually check all three sides (target, then reference, then source) until it finds a value.
|-
| <pre>
(if (...)
    ...
 else
    ...
)
</pre>
| <pre>
<choose>
 <when>
  <test>
   ...
  </test>
  <out>
   ...
  </out>
 </when>
 <otherwise>
  <out>
   ...
  </out>
 </otherwise>
</choose>
</pre>
|
|-
| <pre>(... incl ...)</pre>
| <pre>
<in caseless="yes">
 ...
 <list n="..."/>
</in>
</pre>
|
|-
| <pre>_1</pre>
| <pre><b pos="1"/></pre>
|
|-
| <pre>1(gender=2.gender)</pre>
| <pre>
<lu>
 <clip pos="1" side="tl" part="lemh"/>
 <lit-tag v="adj"/>
 <clip pos="2" side="tl" part="gender"/>
 <clip pos="1" side="tl" part="lemq"/>
</lu>
</pre>
| <code><lit-tag v="adj"/></code> should actually be <code><clip pos="1" side="tl" part="pos_tag"/></code> where pos_tag is a special attribute that returns whatever the first tag is.
|-
| <pre>{ ... }</pre>
| <pre>
<chunk name="default">
 ...
</chunk>
</pre>
| It is possible to make the name be something other than default, for example with <code>n.$lem/sl</code> in the pattern.
|}

Technically this would compile to a rule which outputs an NP chunk containing the input unchanged, plus a separate postchunk rule that does the actual rearranging, so that the conditionals can depend on changed values of the chunk tags.

Example

Initial Sentence

In a hole in the ground there lived a Hobbit.

Output of eng-spa-lex

^In<pr>/En<pr>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^hole<n><sg>/agujero<n><m><sg>$ ^in<pr>/en<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^ground<n><sg>/tierra<n><f><sg>$ ^there<adv>/allí<adv>$ ^live<vblex><past>/vivir<vblex><past>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^Hobbit<n><sg>/Hobbit<n><m><sg>$^.<sent>/.<sent>$^.<sent>/.<sent>$
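
To make the stream format easier to follow, here is a small Python sketch that splits such a line into source and target lexical units. It is a simplified reader written for this example (the regular expressions and function names are assumptions) and it ignores escaped characters and superblanks.

```python
import re

# ^source/target$ : everything up to the first "/" is the source side.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")

def parse_side(text):
    """Split 'agujero<n><m><sg>' into ('agujero', ['n', 'm', 'sg'])."""
    lemma = text.split("<", 1)[0]
    return lemma, TAG.findall(text)

def parse_stream(stream):
    """Return a list of (source, target) pairs for each ^...$ unit."""
    return [(parse_side(sl), parse_side(tl)) for sl, tl in LEXICAL_UNIT.findall(stream)]

line = "^In<pr>/En<pr>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^hole<n><sg>/agujero<n><m><sg>$"
for source, target in parse_stream(line):
    print(source, "->", target)
```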

A Simple Set of Rules

gender = m f;
number = (ND sg) sg pl ND;
definite = def ind;
tense = past pres ifi;
person = (PD p3) p1 p2 p3 PD;

tense > tense : past ifi;

n: _.gender.number;
det: _.definite.gender.number;
pr: _;
vblex: _.tense.person.number;
adv: _;

NP: _.gender.number;
DP: _.gender.number;
PP: _;
VP: _.tense.person.number;

NP -> %n { 1 } |
      10: %n PP { 1 _1 2 } ;

PP -> pr DP { 1 _1 2 } ;

DP -> det %NP { 1(gender=2.gender, number=2.number) _1 2 } ;

VP -> %vblex DP { 1 _1 2 } |
      adv %VP { 1 _1 2 } |
      PP %VP { 1 _1 2 } ;

Process

{| class="wikitable"
|-
! Action
! Result
! Comments
|-
| Read token
| <ol>
<li>^In<pr>/En<pr>$</li>
</ol>
|
|-
| Read token
| <ol><li>^In<pr>/En<pr>$</li>
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li></ol>
|
|-
| Read token
| <ol><li>^In<pr>/En<pr>$</li>
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
|
|-
| Split
| <ol><li>^In<pr>/En<pr>$</li>
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
<hr/>
<ol><li>^In<pr>/En<pr>$</li>
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
| Rule 1 (<code>NP -> n</code>) could apply, but reading more of the input might allow rule 2 (<code>NP -> n PP</code>) to apply instead, so we pursue both possibilities.
|-
| Apply rule 1 (<code>NP -> n</code>)
| <ol><li>^In<pr>/En<pr>$</li>
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
<li>^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$</li></ol>
<hr/>
<ol><li>^In<pr>/En<pr>$</li>
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
| Since the rule says <code>%n</code>, the required NP tags (gender and number) are filled in with the values of the noun tags.
|-
| Apply rule 4 (<code>DP -> det NP</code>)
| <ol><li>^In<pr>/En<pr>$</li>
<li>^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$</li></ol>
<hr/>
<ol><li>^In<pr>/En<pr>$</li>
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
| Note that the determiner still has GD as its gender. Child tags are not modified until the output step.
|-
| Apply rule 3 (<code>PP -> pr DP</code>)
| <ol><li>^unknown<PP>{^In<pr>/En<pr>$ ^unknown<DP><m><sg>{^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^unknown<NP><m><sg>{^hole<n><sg>/agujero<n><m><sg>$}$}$}$</li></ol>
<hr/>
<ol><li>^In<pr>/En<pr>$</li>
<li>^a<det><ind><sg>/uno<det><ind><GD><sg>$</li>
<li>^hole<n><sg>/agujero<n><m><sg>$</li></ol>
|
|}
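
The nested chunks shown in the first branch of each step can be reproduced with a few lines of Python. This is only a string-building sketch of the notation used in the table; the function name `make_chunk` is hypothetical, and the chunk tags are passed in by hand rather than computed from the rules.

```python
def make_chunk(name, tags, children):
    """Build '^unknown<NAME><tags>{child child ...}$' from already formatted children."""
    tag_string = "".join(f"<{tag}>" for tag in tags)
    return f"^unknown<{name}>{tag_string}{{{' '.join(children)}}}$"

prep = "^In<pr>/En<pr>$"
det = "^a<det><ind><sg>/uno<det><ind><GD><sg>$"
noun = "^hole<n><sg>/agujero<n><m><sg>$"

np = make_chunk("NP", ["m", "sg"], [noun])      # rule 1: NP -> %n
dp = make_chunk("DP", ["m", "sg"], [det, np])   # rule 4: DP -> det %NP
pp = make_chunk("PP", [], [prep, dp])           # rule 3: PP -> pr DP
print(pp)
```

Note that the determiner inside the DP still carries <GD>, matching the comment in the table that child tags are not modified until the output step.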