Limited rule-based lexical selection

This module deals with lexical selection. For more information on the topic, see the main page.

  This discussion page is deprecated as the functionality now exists.

While Apertium still needs a separate module for lexical selection that works on statistics and rules, limited rule-based lexical selection is already possible using either the Apertium transfer language or a constraint grammar.

The general idea is to run an extra transfer stage before the main transfer stage. This extra stage adds a tag after the part-of-speech tag to distinguish between translation senses. Think of it as crude semantic annotation.

Example

In this case, we are translating from Spanish to English and have a word that we want to translate in two ways depending on context. If "recoger" is followed by the name of a person or by a family-member noun, we want to translate it as "pick up"; otherwise the default translation should be "collect". It isn't wrong to say, for example, "I'm going to collect Alan from the airport", but "I'm going to pick up Alan from the airport" sounds more fluent.

Voy a recoger Alan del aeropuerto.
"I'm going to pick up Alan from the airport"
Voy a recoger mi coche del aeropuerto.
"I'm going to collect my car from the airport"

(Note: strictly speaking it should be "recoger a Alan" rather than "recoger Alan", because Spanish normally puts the preposition "a" before a direct object that is a person. This does not detract from the principle.)

Substitution

We have two alternative techniques, one using only the Apertium modules, and one using Constraint Grammar.

Apertium transfer

We start with a normal t1x file and define the following categories. This is just one way to do it; it could also be done with a list, for example.

    <def-cat n="nom-human">
      <cat-item tags="np.ant.*"/>

      <cat-item lemma="madre" tags="n.*"/>
      <cat-item lemma="padre" tags="n.*"/>
      <cat-item lemma="hermano" tags="n.*"/>
    </def-cat>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
      <cat-item tags="np.*"/>
    </def-cat>
    <def-cat n="verb">
      <cat-item tags="vblex.*"/>
    </def-cat>
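
As a rough sketch of the list-based alternative mentioned above, the family-member lemmas could be kept in a def-list and checked with an <in> test inside a rule. The list name human-nouns is hypothetical, and proper names (np.ant) would still need a category, since they cannot be enumerated by lemma.

    <section-def-lists>
      <!-- hypothetical list name; lemmas copied from the nom-human category -->
      <def-list n="human-nouns">
        <list-item v="madre"/>
        <list-item v="padre"/>
        <list-item v="hermano"/>
      </def-list>
    </section-def-lists>

    <!-- inside a rule whose second pattern item is "nom" -->
    

With this layout, adding a new family-member noun only means adding one list-item, at the cost of an extra test in the rule.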

We then make a rule which matches verb followed by nom-human and adds a tag (S1 here, but the name could be anything):

    <rule comment="REGLA: VERB NOM-HUMAN">
      <pattern>
        <pattern-item n="verb"/>
        <pattern-item n="nom-human"/>
      </pattern>
      <action>
        <choose>
          <when>
            <test>
              <equal caseless="yes">
                <clip pos="1" side="tl" part="lem"/>
                <lit v="recoger"/>
              </equal>
            </test>
            <out>
              <lu>
                <clip pos="1" side="tl" part="lemh"/>
                <clip pos="1" side="tl" part="a_verb"/>
                <lit-tag v="S1"/>
                <clip pos="1" side="tl" part="temps"/>
                <clip pos="1" side="tl" part="pers"/>
                <clip pos="1" side="tl" part="nbr"/>
                <clip pos="1" side="tl" part="lemq"/>
              </lu>
              <b pos="1"/>
              <lu>
                <clip pos="2" side="tl" part="whole"/>
              </lu>
            </out>
          </when>
          <otherwise>
            <!-- Any other verb: output both words unchanged -->
            <out>
              <lu>
                <clip pos="1" side="tl" part="whole"/>
              </lu>
              <b pos="1"/>
              <lu>
                <clip pos="2" side="tl" part="whole"/>
              </lu>
            </out>
          </otherwise>
        </choose>
      </action>
    </rule>
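
The rule clips the attributes a_verb, temps, pers and nbr, so the t1x file needs matching attribute definitions. A minimal sketch might look like the following. The tag inventory is an assumption based on the tags visible in the examples on this page; the definitions in a real language pair are considerably longer.

    <section-def-attrs>
      <!-- verb part-of-speech tag; S1 is written right after it -->
      <def-attr n="a_verb">
        <attr-item tags="vblex"/>
      </def-attr>
      <!-- tense/mood tags (only the two seen in the examples) -->
      <def-attr n="temps">
        <attr-item tags="inf"/>
        <attr-item tags="pri"/>
      </def-attr>
      <def-attr n="pers">
        <attr-item tags="p1"/>
        <attr-item tags="p2"/>
        <attr-item tags="p3"/>
      </def-attr>
      <def-attr n="nbr">
        <attr-item tags="sg"/>
        <attr-item tags="pl"/>
      </def-attr>
    </section-def-attrs>

Any tag not covered by one of these attributes is dropped when the rule rebuilds the verb, so the lists should cover every tag the verb can carry.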

Here is the output with the two phrases:

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en-tagger | apertium-pretransfer | \
 apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><S1><inf>$ ^Alan<np><ant><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^aeropuerto<n><m><sg>$^.<sent>$
$ echo "Voy a recoger mi coche del aeropuerto"  | apertium -d . es-en-tagger | apertium-pretransfer | \
 apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><inf>$ ^mío<det><pos><mf><sg>$ ^coche<n><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^aeropuerto<n><m><sg>$^.<sent>$

The next step is to add the bidix entries:

    <e><p><l>collect<s n="vblex"/></l><r>recoger<s n="vblex"/></r></p></e>
    <e r="RL"><p><l>pick<g><b/>up</g><s n="vblex"/></l><r>recoger<s n="vblex"/><s n="S1"/></r></p></e>

Note: You will need to add an <sdef n="S1"/> to the <sdefs> section at the top of the file.
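
As an illustration, the symbol definitions in the bidix might then look like this. It is only a fragment; the existing definitions in the real file stay as they are.

    <sdefs>
      <!-- existing symbol definitions are left untouched -->
      <sdef n="vblex"/>
      <sdef n="S1"/>  <!-- sense tag added by the lexical-selection rule -->
    </sdefs>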

We can then recompile, and we get two different translations (admittedly not perfect, but it illustrates the idea):

$ echo "Voy a recoger mi coche del aeropuerto" | apertium -d . es-en+lexfer
I go to collect my car of the airport.

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en+lexfer
I go to pick up Alan of the airport.

Constraint grammar

The same effect can be achieved with a constraint grammar:

LIST Human = "madre" "padre" "hijo" "hermano" (np ant);
LIST Verb = (vblex);
LIST Prep = (pr);

SECTION

# Select S1 ("pick up") for "recoger" if a human noun is found between the verb and the next preposition.
SUBSTITUTE ("recoger" vblex) ("recoger" vblex S1) Verb (1* Human BARRIER Prep);

Giving:

$ echo "Voy a recoger Alan del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^Alan/Alan<np><ant><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$
$ echo "Voy a recoger mi coche del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><inf>$ ^mi/mío<det><pos><mf><sg>$ ^coche/coche<n><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$

The benefit of the constraint grammar over the apertium-transfer approach is that you don't need a separate rule for each pattern. For example, the rule above also works with "mi madre", where a determiner comes between the verb and the noun:

$ echo "Voy a recoger mi madre del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^mi/mío<det><pos><mf><sg>$ ^madre/madre<n><f><sg>$ ^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$

This technique is used in apertium-sme-nob and apertium-is-en; see the files ending in .lex
