Apertium-recursive/Example
Popcorndude (→A Simple Set of Rules: add some comments)
Popcorndude (→A Simple Set of Rules: better explanation of weights)
Revision as of 16:59, 9 January 2020
A version of this example with pictures can be found at https://www.overleaf.com/read/pkjjgzjczhzh (if that link doesn't work, the source is here).
Initial Sentence
In a hole in the ground there lived a Hobbit.
Output of eng-spa-lex
^In<pr>/En<pr>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^hole<n><sg>/agujero<n><m><sg>$ ^in<pr>/en<pr>$ ^the<det><def><sp>/el<det><def><GD><ND>$ ^ground<n><sg>/tierra<n><f><sg>$ ^there<adv>/allí<adv>$ ^live<vblex><past>/vivir<vblex><past>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ ^Hobbit<n><sg>/Hobbit<n><m><sg>$^.<sent>/.<sent>$^.<sent>/.<sent>$
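Each `^…$` unit above pairs a source-language analysis with its bilingual-dictionary translation, separated by `/`, with tags written in angle brackets. The Python sketch below is a simplified reader for lines like this one. It assumes no escaped characters, no superblanks and exactly one translation per unit, none of which real Apertium streams guarantee; `LexicalUnit`, `split_analysis` and `parse_stream` are hypothetical names, not part of Apertium.

```python
import re
from dataclasses import dataclass


@dataclass
class LexicalUnit:
    sl_lemma: str
    sl_tags: list[str]
    tl_lemma: str
    tl_tags: list[str]


# One "lemma<tag><tag>..." analysis: the lemma is everything before the first "<".
ANALYSIS = re.compile(r"([^<]*)((?:<[^>]+>)*)")


def split_analysis(text: str) -> tuple[str, list[str]]:
    """Split 'uno<det><ind><GD><sg>' into ('uno', ['det', 'ind', 'GD', 'sg'])."""
    m = ANALYSIS.fullmatch(text)
    if m is None:
        raise ValueError(f"not an analysis: {text!r}")
    return m.group(1), re.findall(r"<([^>]+)>", m.group(2))


def parse_stream(line: str) -> list[LexicalUnit]:
    """Extract every ^source/target$ unit; the blanks between units are ignored."""
    units = []
    for body in re.findall(r"\^(.*?)\$", line):
        source, target = body.split("/", 1)
        units.append(LexicalUnit(*split_analysis(source), *split_analysis(target)))
    return units


lex = ("^In<pr>/En<pr>$ ^a<det><ind><sg>/uno<det><ind><GD><sg>$ "
       "^hole<n><sg>/agujero<n><m><sg>$")
for lu in parse_stream(lex):
    print(lu.sl_lemma, lu.sl_tags, "->", lu.tl_lemma, lu.tl_tags)
```

Reading the units this way makes the placeholders visible: `uno` arrives with the gender tag `GD`, which only gets a real value later, inside the transfer rules.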
A Simple Set of Rules
```
!!! Attribute categories
gender = m f;                  ! A "gender" tag is either <m> or <f>
definite = def ind;
tense = past pres ifi;
number = (ND sg) sg pl ND;     ! A "number" tag is <sg>, <pl>, or <ND>
                               ! if a node has <ND> at output time, it will
                               ! be replaced with <sg>
person = (PD p3) p1 p2 p3 PD;

!!! Tag rewrite rules
tense > tense : past ifi;      ! Nodes with tense tag <past> will be output
                               ! with tense tag <ifi>

!!! Output patterns
n: _.gender.number;            ! When outputting a noun, output the part of
                               ! speech tag, then gender, then number
det: _.definite.gender.number;
pr: _;
vblex: _.tense.person.number;
adv: _;
NP: _.gender.number;           ! Nodes in the tree also need tag orders defined
DP: _.gender.number;           ! These tags are the primary way of passing
PP: _;                         ! information to different parts of the tree
VP: _.tense.person.number;

!!! Reduction rules
NP -> %n { 1 } |               ! "n" this rule matches a noun
                               ! "NP ->" this rule outputs an NP (noun phrase)
                               ! "%" any tags that NP needs that aren't specified
                               !     elsewhere should come from the n
                               ! "{ 1 }" output the first item in the pattern (n)
      10: %n PP { 1 _1 2 } ;   ! "10:" this rule has a weight of 10
                               ! the weight of a tree is the sum of the weights
                               ! of the rules that were applied to produce it
                               ! and trees with higher weights are preferred
                               ! In this example, this will prefer
                               !   [in a hole [in the ground]]
                               ! rather than
                               !   [in a hole] [in the ground]
                               ! which would also be a syntactically valid parse
                               ! according to these rules

PP -> pr DP { 1 _1 2 } ;       ! "_1" output the non-word material following
                               ! pattern element 1, that is, the space between
                               ! words 1 and 2
                               ! You can also just write "_" for a space, but
                               ! there might be formatting information in there
                               ! that we don't want to lose

DP -> det %NP { 1[gender=2.gender, number=2.number] _1 2 } ;
                               ! "1[gender=2.gender, number=2.number]"
                               ! replace whatever gender and number tags the det
                               ! had before with the ones from the NP

VP -> %vblex DP { 1[tense=$tense, person=$person, number=$number] _1 2 } |
                               ! "tense=$tense" replace the tense tag of the verb
                               ! with the tense tag of the VP (which may have been
                               ! changed farther up the tree)
      adv %VP (if (1.lem/sl = there) { %2 } else { 1 _1 %2 } ) |
                               ! "(if ... else ...)" this rule has different
                               ! output in different situations
                               ! "(1.lem/sl = there)" do the first thing if the
                               ! source language lemma is "there"
                               ! "%2" shorthand for
                               ! "2[tense=$tense, number=$number, ...]"
                               ! does this for all the tags that the word and the
                               ! parent node have in common
      PP %VP { 1 _1 %2 } ;
```
Process
| Action | Result | Comments |
|---|---|---|
| Read token | | |
| Read token | | |
| Read token | | |
| Split | | Rule 1 (NP -> n) could apply, but reading more of the input might make rule 2 (NP -> n PP) applicable, so we do both. |
| Apply rule 1 (NP -> n) in the first branch | | Since the rule says %n, the required NP tags (gender and number) are filled in with the values of the noun tags. |
| Apply rule 4 (DP -> det NP) in the first branch | | Note that the determiner still has GD as its gender. Child tags are not modified until the output step. |
| Apply rule 3 (PP -> pr DP) in the first branch | | |
| Read token | | |
| Read token | | |
| Read token | | |
| Apply rule 1 (NP -> n) in both branches | | This time the next word is an adverb rather than a preposition, so no splitting occurs and the rule is applied in each branch. |
| Apply rule 4 (DP -> det NP) in both branches | | |
| Apply rule 3 (PP -> pr DP) in both branches | | |
| Apply rule 2 (NP -> n PP) in the second branch | Weight: 10 | Rule 2 has a weight attached to it, so the second branch is now weighted. |
| Apply rule 4 (DP -> det NP) in the second branch | Weight: 10 | |
| Apply rule 3 (PP -> pr DP) in the second branch | Weight: 10 | |
| Read token | Weight: 10 | |
| Read token | Weight: 10 | |
| Read token | Weight: 10 | |
| Read token | Weight: 10 | |
| Apply rule 1 (NP -> n) in both branches | Weight: 10 | |
| Apply rule 4 (DP -> det NP) in both branches | Weight: 10 | |
| Apply rule 5 (VP -> vblex DP) in both branches | Weight: 10 | VP wants tense, person, and number tags. The verb supplies tense, but it has no person or number tags, so the defaults (PD and ND) are used instead. |
| Apply rule 6 (VP -> adv VP) in both branches | Weight: 10 | |
| Apply rule 7 (VP -> PP VP) in the first branch | Weight: 10 | |
| Apply rule 7 (VP -> PP VP) in the first branch | Weight: 10 | |
| Apply rule 7 (VP -> PP VP) in the second branch | Weight: 10 | |
| Prune branches | Weight: 10 | No rules begin with VP, so it's time to output. Both branches have the same number of trees (1), but the second has the higher weight (10), so the first is discarded and we output the second. |
| Apply output side of rule 7 (VP -> PP VP) | | At output, the unspecified tags PD and ND are replaced with the defaults p3 and sg. |
| Apply output side of rule 3 (PP -> pr DP) | | |
| Output first word | | The preposition wasn't built by a rule, so we just write it to the output stream. |
| Apply output side of rule 4 (DP -> det NP) | | Here the gender and number of the NP are copied to the determiner. |
| Output first word | | |
| Apply output side of rule 2 (NP -> n PP) | | |
| Output first word | | |
| Apply output side of rule 3 (PP -> pr DP) | | |
| Output first word | | |
| Apply output side of rule 4 (DP -> det NP) | | Once again we copy the gender and number of the NP to the determiner. |
| Output first word | | |
| Apply output side of rule 1 (NP -> n) | | |
| Output first word | | |
| Apply output side of rule 6 (VP -> adv VP) | | Since the source-language lemma of the adverb is "there", we take the first clause of the if statement and only output the VP, which takes all its tags from the parent chunk. |
| Apply output side of rule 5 (VP -> vblex DP) | | As in the previous row, the verb gets all its tags from the parent chunk, but this rule lists them explicitly. |
| Output first word | | |
| Apply output side of rule 4 (DP -> det NP) | | |
| Output first word | | |
| Apply output side of rule 1 (NP -> n) | | |
| Output first word | | |
| Read token | | |
| Output first word | | No rules apply to punctuation in this example, so it is output as soon as it is read. |
| Read token | | |
| Output first word | | |
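The weighting and pruning steps in the table can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the apertium-recursive parser: the hypothetical `Branch` class records the rules applied so far and how many top-level trees the branch holds, a branch's weight is the sum of its rule weights (as the comment on rule 2 in the rules file says), and the hypothetical `prune` function keeps the branch with the fewest trees, using the higher weight to break a tie. That ordering is an assumption based on the "Prune branches" row, where the tree counts are equal and the weight decides.

```python
from dataclasses import dataclass

# Rule weights from the rules file; only rule 2 (NP -> n PP) is weighted.
RULE_WEIGHTS = {1: 0, 2: 10, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0}


@dataclass
class Branch:
    name: str
    rules_applied: list[int]
    tree_count: int  # number of top-level trees left in this branch

    @property
    def weight(self) -> int:
        # The weight of a branch is the sum of the weights of its rules.
        return sum(RULE_WEIGHTS[r] for r in self.rules_applied)


def prune(branches: list[Branch]) -> Branch:
    # Assumed preference: fewer trees first, then higher weight.
    return min(branches, key=lambda b: (b.tree_count, -b.weight))


# Rule sequences taken from the parsing rows of the table above.
first = Branch("[in a hole] [in the ground]",
               [1, 4, 3, 1, 4, 3, 1, 4, 5, 6, 7, 7], tree_count=1)
second = Branch("[in a hole [in the ground]]",
                [1, 4, 3, 2, 4, 3, 1, 4, 5, 6, 7], tree_count=1)

print(first.weight, second.weight)  # 0 10
print(prune([first, second]).name)  # [in a hole [in the ground]]
```

Because weights are summed over every rule in a branch, a single weighted rule is enough here to separate two parses that are otherwise equally valid.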
Output of Transfer
^En<pr>$ ^uno<det><ind><m><sg>$ ^agujero<n><m><sg>$ ^en<pr>$ ^el<det><def><f><sg>$ ^tierra<n><f><sg>$ ^vivir<vblex><ifi><p3><sg>$ ^uno<det><ind><m><sg>$ ^Hobbit<n><m><sg>$^.<sent>$^.<sent>$
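Comparing this with the output of eng-spa-lex shows the DP rule at work: `el` went in as `<det><def><GD><ND>` and came out as `<det><def><f><sg>`, matching `tierra`, and `uno` took `<m>` from `agujero`. The Python sketch below mimics only the `1[gender=2.gender, number=2.number]` clause on a flat tag list. The functions `agree` and `format_lu` are hypothetical, and the sketch assumes the determiner's tags are already in the order of its output pattern `det: _.definite.gender.number`.

```python
def agree(det_tags: list[str], np_gender: str, np_number: str) -> list[str]:
    """Mimic 1[gender=2.gender, number=2.number]: the det keeps its part of
    speech and definiteness but takes gender and number from the NP.
    Assumes det_tags follow the pattern det: _.definite.gender.number."""
    pos, definite, _old_gender, _old_number = det_tags
    return [pos, definite, np_gender, np_number]


def format_lu(lemma: str, tags: list[str]) -> str:
    """Write a lemma and its tags as an Apertium-style ^...$ unit."""
    return "^" + lemma + "".join(f"<{t}>" for t in tags) + "$"


# After lexical transfer: ^el<det><def><GD><ND>$; the NP (tierra) is <f><sg>.
print(format_lu("el", agree(["det", "def", "GD", "ND"], "f", "sg")))
# After lexical transfer: ^uno<det><ind><GD><sg>$; the NP (agujero) is <m><sg>.
print(format_lu("uno", agree(["det", "ind", "GD", "sg"], "m", "sg")))
```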
Overall Output
En un agujero en la tierra vivió un Hobbit.