Limited rule-based lexical selection

From Apertium
Revision as of 09:22, 21 January 2009 by Francis Tyers (talk | contribs)

Apertium still needs a dedicated lexical selection module that works on both statistics and rules. In the meantime, limited rule-based lexical selection is possible using the Apertium transfer language.

The general idea is to run an extra transfer stage before the main transfer stage. This extra stage adds a tag after the part-of-speech tag to distinguish between translation senses. Think of it as crude semantic annotation.

Example

In this case, we are translating from Spanish to English and have a word which we want to translate in two ways depending on context. If it is followed by the name of a person, or family member, we want to translate "recoger" as "pick up", but the default translation should be "collect". It isn't wrong to say, for example "I'm going to collect Alan from the airport", but "I'm going to pick up Alan from the airport" sounds more fluent.

Voy a recoger Alan del aeropuerto.
"I'm going to pick up Alan from the airport"
Voy a recoger mi coche del aeropuerto.
"I'm going to collect my car from the airport"

(Note: Actually, I think that it should be "recoger a Alan" and not "recoger Alan", because often there is a preposition when using verbs with people -- nevertheless, this does not detract from the principle)

Substitution

Apertium transfer

We start with a normal t1x file and specify the following categories. This is just one way to do it; it could also be done with a list, for example.

    <def-cat n="nom-human">
      <cat-item tags="np.ant.*"/>

      <cat-item lemma="madre" tags="n.*"/>
      <cat-item lemma="padre" tags="n.*"/>
      <cat-item lemma="hermano" tags="n.*"/>
    </def-cat>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
      <cat-item tags="np.*"/>
    </def-cat>
    <def-cat n="verb">
      <cat-item tags="vblex.*"/>
    </def-cat>
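
The list-based alternative mentioned above could be sketched roughly as follows. This is only an illustration: the list name human-nouns is hypothetical, and the test fragment assumes a rule whose pattern is "verb" followed by the general "nom" category, with the clips on the same side as in the rule shown in this section.

```xml
<!-- Goes in <section-def-lists>: lemmas of nouns denoting people.
     The name "human-nouns" is made up for this example. -->
<def-list n="human-nouns">
  <list-item v="madre"/>
  <list-item v="padre"/>
  <list-item v="hermano"/>
</def-list>

<!-- Goes inside the <test> of a rule matching "verb" + "nom":
     true when the verb is "recoger" and the noun lemma is in the list. -->
<and>
  <equal caseless="yes">
    <clip pos="1" side="tl" part="lem"/>
    <lit v="recoger"/>
  </equal>
  <in caseless="yes">
    <clip pos="2" side="tl" part="lem"/>
    <list n="human-nouns"/>
  </in>
</and>
```

A lemma list cannot cover open-ended proper names, so the np.ant case would still need a category or a tag test of its own.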

We then make a rule which matches a verb followed by nom-human:

    <rule comment="REGLA: VERB NOM-HUMAN">
      <pattern>
        <pattern-item n="verb"/>
        <pattern-item n="nom-human"/>
      </pattern>
      <action>
        <choose>
          <when>
            <test>
              <equal caseless="yes">
                <clip pos="1" side="tl" part="lem"/>
                <lit v="recoger"/>
              </equal>
            </test>
            <out>
              <lu>
                <clip pos="1" side="tl" part="lemh"/>
                <clip pos="1" side="tl" part="a_verb"/>
                <lit-tag v="S1"/>
                <clip pos="1" side="tl" part="temps"/>
                <clip pos="1" side="tl" part="pers"/>
                <clip pos="1" side="tl" part="nbr"/>
                <clip pos="1" side="tl" part="lemq"/>
              </lu>
              <b pos="1"/>
              <lu>
                <clip pos="2" side="tl" part="whole"/>
              </lu>
            </out>
          </when>
          <otherwise>
            <out>
              <lu>
                <clip pos="1" side="tl" part="whole"/>
              </lu>
              <b pos="1"/>
              <lu>
                <clip pos="2" side="tl" part="whole"/>
              </lu>
            </out>
          </otherwise>
        </choose>
      </action>
    </rule>
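
The clips in the rule rely on attribute definitions that are not shown here. The parts lem, lemh, lemq and whole are predefined by the transfer module, but a_verb, temps, pers and nbr have to be declared in the t1x file. A minimal sketch, assuming only the tags needed for this example plus a few standard Apertium ones (a real language pair will list many more), might be:

```xml
<!-- Assumed attribute definitions for the rule above; the tag
     inventories are deliberately incomplete. -->
<section-def-attrs>
  <def-attr n="a_verb">
    <attr-item tags="vblex"/>
  </def-attr>
  <def-attr n="temps">
    <attr-item tags="inf"/>
    <attr-item tags="pri"/>
    <attr-item tags="ger"/>
    <attr-item tags="pp"/>
  </def-attr>
  <def-attr n="pers">
    <attr-item tags="p1"/>
    <attr-item tags="p2"/>
    <attr-item tags="p3"/>
  </def-attr>
  <def-attr n="nbr">
    <attr-item tags="sg"/>
    <attr-item tags="pl"/>
  </def-attr>
</section-def-attrs>
```

With these definitions, clipping temps from recoger<vblex><inf> yields <inf>, while pers and nbr come back empty, so the output is recoger<vblex><S1><inf>.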

Here is the output with the two phrases:

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en-tagger | apertium-pretransfer | \
 apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><S1><inf>$ ^Alan<np><ant><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^aeropuerto<n><m><sg>$^.<sent>$
$ echo "Voy a recoger mi coche del aeropuerto"  | apertium -d . es-en-tagger | apertium-pretransfer | \
 apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><inf>$ ^mío<det><pos><mf><sg>$ ^coche<n><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^aeropuerto<n><m><sg>$^.<sent>$

The next thing to do is to add the bidix entries:

    <e><p><l>collect<s n="vblex"/></l><r>recoger<s n="vblex"/></r></p></e>
    <e r="RL"><p><l>pick<g><b/>up</g><s n="vblex"/></l><r>recoger<s n="vblex"/><s n="S1"/></r></p></e>

We can then recompile, and we get two different translations (admittedly not perfect, but it illustrates the idea):

$ echo "Voy a recoger mi coche del aeropuerto" | apertium -d . es-en+lexfer
I go to collect my car of the airport.

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en+lexfer
I go to pick up Alan of the airport.

Constraint grammar

The same effect can be achieved with a constraint grammar.

LIST Human = "madre" "padre" "hijo" "hermano" (np ant);
LIST Verb = (vblex);

SECTION

SUBSTITUTE (vblex) (vblex S1) Verb (1* Human);

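This gives the output below. Because the context position 1* scans rightwards for a Human reading instead of checking only the next word, the first verb ("Voy") also receives the S1 tag in the first sentence. Using 1 instead of 1* would require the human noun to follow the verb immediately.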

$ echo "Voy a recoger Alan del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^Alan/Alan<np><ant><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$
$ echo "Voy a recoger mi coche del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><inf>$ ^mi/mío<det><pos><mf><sg>$ ^coche/coche<n><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$

See also