Difference between revisions of "Limited rule-based lexical selection"
Revision as of 09:30, 21 January 2009
While Apertium still needs a separate module for lexical selection that works on statistics and rules, using the Apertium transfer language or a constraint grammar it is possible to do limited rule-based lexical selection.
The general idea is to run an extra transfer stage before the main transfer stage. This extra stage adds a tag after the part-of-speech tag to distinguish between translation senses. Think of it as crude semantic annotation.
Example
In this case, we are translating from Spanish to English and have a word which we want to translate in two ways depending on context. If it is followed by the name of a person, or family member, we want to translate "recoger" as "pick up", but the default translation should be "collect". It isn't wrong to say, for example "I'm going to collect Alan from the airport", but "I'm going to pick up Alan from the airport" sounds more fluent.
- Voy a recoger Alan del aeropuerto.
- "I'm going to pick up Alan from the airport"
- Voy a recoger mi coche del aeropuerto.
- "I'm going to collect my car from the airport"
(Note: strictly it should be "recoger a Alan" rather than "recoger Alan", because Spanish normally uses the preposition "a" before a direct object that is a person. This does not detract from the principle.)
Substitution
Apertium transfer
So we have a normal t1x file, and specify the following categories. This is just one way to do it; it could also be done with a list, for example.
<def-cat n="nom-human">
  <cat-item tags="np.ant.*"/>
  <cat-item lemma="madre" tags="n.*"/>
  <cat-item lemma="padre" tags="n.*"/>
  <cat-item lemma="hermano" tags="n.*"/>
</def-cat>
<def-cat n="nom">
  <cat-item tags="n.*"/>
  <cat-item tags="np.*"/>
</def-cat>
<def-cat n="verb">
  <cat-item tags="vblex.*"/>
</def-cat>
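The list-based alternative mentioned above might look roughly like the following. This is a sketch, not part of the original rule set: the list name human-nouns is hypothetical, and it assumes the rule pattern is changed to match verb followed by nom, with the membership check placed inside the rule's test element alongside the check on "recoger". Proper names tagged np.ant have no fixed lemma, so they would still need a category or tag-based check.

```xml
<!-- Hypothetical list of human nouns; the name "human-nouns" is an assumption. -->
<section-def-lists>
  <def-list n="human-nouns">
    <list-item v="madre"/>
    <list-item v="padre"/>
    <list-item v="hermano"/>
  </def-list>
</section-def-lists>

<!-- Condition fragment: true when the lemma of the second matched word
     is in the list. It belongs inside the test element of a rule whose
     pattern is verb followed by nom. -->
<in caseless="yes">
  <clip pos="2" side="tl" part="lem"/>
  <list n="human-nouns"/>
</in>
```

The trade-off is that a list keeps the lemmas in one place and avoids a dedicated category, at the cost of a slightly longer condition in each rule that uses it.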
We then make a rule which matches verb followed by nom-human and adds a tag (S1, although the name could be anything):
<rule comment="REGLA: VERB NOM-HUMAN">
  <pattern>
    <pattern-item n="verb"/>
    <pattern-item n="nom-human"/>
  </pattern>
  <action>
    <choose>
      <when>
        <test>
          <equal caseless="yes">
            <clip pos="1" side="tl" part="lem"/>
            <lit v="recoger"/>
          </equal>
        </test>
        <out>
          <!-- "recoger" before a human noun: insert the sense tag S1 after the part-of-speech -->
          <lu>
            <clip pos="1" side="tl" part="lemh"/>
            <clip pos="1" side="tl" part="a_verb"/>
            <lit-tag v="S1"/>
            <clip pos="1" side="tl" part="temps"/>
            <clip pos="1" side="tl" part="pers"/>
            <clip pos="1" side="tl" part="nbr"/>
            <clip pos="1" side="tl" part="lemq"/>
          </lu>
          <b pos="1"/>
          <lu>
            <clip pos="2" side="tl" part="whole"/>
          </lu>
        </out>
      </when>
      <otherwise>
        <out>
          <!-- Any other verb: pass both words through unchanged -->
          <lu>
            <clip pos="1" side="tl" part="whole"/>
          </lu>
          <b pos="1"/>
          <lu>
            <clip pos="2" side="tl" part="whole"/>
          </lu>
        </out>
      </otherwise>
    </choose>
  </action>
</rule>
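The rule clips the attributes a_verb, temps, pers and nbr, which this page does not define (lem, lemh, lemq and whole are built in). A minimal sketch of the definitions it would need is given below. The tag values are assumptions: vblex, inf, pri, p1 and sg appear in the output on this page, while the remaining values are illustrative and should be taken from the actual language pair.

```xml
<!-- Sketch of the attribute definitions the rule relies on.
     Only vblex, inf, pri, p1 and sg are attested on this page;
     the other values are assumed for illustration. -->
<section-def-attrs>
  <def-attr n="a_verb">
    <attr-item tags="vblex"/>
  </def-attr>
  <def-attr n="temps">
    <attr-item tags="inf"/>
    <attr-item tags="pri"/>
    <attr-item tags="ger"/>
    <attr-item tags="pp"/>
  </def-attr>
  <def-attr n="pers">
    <attr-item tags="p1"/>
    <attr-item tags="p2"/>
    <attr-item tags="p3"/>
  </def-attr>
  <def-attr n="nbr">
    <attr-item tags="sg"/>
    <attr-item tags="pl"/>
  </def-attr>
</section-def-attrs>
```

Any tag that is not covered by one of these attributes is dropped when the verb is rebuilt in the first branch, so the attribute lists need to cover every tag that can follow vblex in the analyses.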
Here is the output with the two phrases:
$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en-tagger | apertium-pretransfer | \
  apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><S1><inf>$ ^Alan<np><ant><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^aeropuerto<n><m><sg>$^.<sent>$

$ echo "Voy a recoger mi coche del aeropuerto" | apertium -d . es-en-tagger | apertium-pretransfer | \
  apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><inf>$ ^mío<det><pos><mf><sg>$ ^coche<n><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^aeropuerto<n><m><sg>$^.<sent>$
The next thing to do is to add bidix entries, one for the default sense and one for the sense tagged S1:
<e><p><l>collect<s n="vblex"/></l><r>recoger<s n="vblex"/></r></p></e>
<e r="RL"><p><l>pick<g><b/>up</g><s n="vblex"/></l><r>recoger<s n="vblex"/><s n="S1"/></r></p></e>
We can then recompile, and we get two different translations (admittedly not perfect, but it illustrates the idea):
$ echo "Voy a recoger mi coche del aeropuerto" | apertium -d . es-en+lexfer
I go to collect my car of the airport.

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en+lexfer
I go to pick up Alan of the airport.
Constraint grammar
The same effect can be achieved with a constraint grammar.
LIST Human = "madre" "padre" "hijo" "hermano" (np ant);
LIST Verb = (vblex);
LIST Prep = (pr);

SECTION

# Select S1 ("pick up") for "recoger" if a human noun is found between the verb and the next preposition.
SUBSTITUTE ("recoger" vblex) ("recoger" vblex S1) Verb (1* Human BARRIER Prep);
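Giving the following (note that in the first output S1 is also attached to the reading of "Voy", which the rule's comment does not describe):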
$ echo "Voy a recoger Alan del aeropuerto" | lt-proc es-en.automorf.bin | \
  apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^Alan/Alan<np><ant><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$

$ echo "Voy a recoger mi coche del aeropuerto" | lt-proc es-en.automorf.bin | \
  apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><inf>$ ^mi/mío<det><pos><mf><sg>$ ^coche/coche<n><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$
The benefit of the constraint grammar over the apertium-transfer approach is that you don't need a separate rule for each pattern. For example, the rule above also works with "mi madre":
$ echo "Voy a recoger mi madre del aeropuerto" | lt-proc es-en.automorf.bin | \
  apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^mi/mío<det><pos><mf><sg>$ ^madre/madre<n><f><sg>$ ^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$