Difference between revisions of "Limited rule-based lexical selection"

Latest revision as of 21:07, 1 December 2013

This module deals with lexical selection; for more information on the topic, see the main page.

  This discussion page is deprecated as the functionality now exists.

While Apertium still needs a separate module for lexical selection that works on statistics and rules, it is possible to do limited rule-based lexical selection using the Apertium transfer language or a constraint grammar.

The general idea is to run an extra transfer stage before the main transfer stage; this extra stage adds a tag after the part-of-speech tag to distinguish between translation senses. Think of it as crude semantic annotation.

Example

In this case, we are translating from Spanish to English and have a word which we want to translate in two ways depending on context. If it is followed by the name of a person, or family member, we want to translate "recoger" as "pick up", but the default translation should be "collect". It isn't wrong to say, for example "I'm going to collect Alan from the airport", but "I'm going to pick up Alan from the airport" sounds more fluent.

Voy a recoger Alan del aeropuerto.
"I'm going to pick up Alan from the airport"
Voy a recoger mi coche del aeropuerto.
"I'm going to collect my car from the airport"

(Note: Actually, I think that it should be "recoger a Alan" and not "recoger Alan", because often there is a preposition when using verbs with people -- nevertheless, this does not detract from the principle)

Substitution

We have two alternative techniques, one using only the Apertium modules, and one using Constraint Grammar.

Apertium transfer

We start with a normal t1x file and define the following categories. This is just one way to do it; it could also be done with a list, for example.

    <def-cat n="nom-human">
      <cat-item tags="np.ant.*"/>

      <cat-item lemma="madre" tags="n.*"/>
      <cat-item lemma="padre" tags="n.*"/>
      <cat-item lemma="hermano" tags="n.*"/>
    </def-cat>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
      <cat-item tags="np.*"/>
    </def-cat>
    <def-cat n="verb">
      <cat-item tags="vblex.*"/>
    </def-cat>
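
As a rough sketch of the list-based alternative mentioned above: the human lemmas could be kept in a `def-list` and checked inside the rule with `<in>`, so that the pattern only needs the generic `nom` category. The list name `human-nouns` is made up for illustration, and `side="tl"` simply mirrors the rule shown on this page. Note that a list only covers lemmas, so proper names (`np.ant.*`) would still need a category or a separate test.

```xml
<!-- Hypothetical list of human nouns; the name "human-nouns" is an assumption -->
<section-def-lists>
  <def-list n="human-nouns">
    <list-item v="madre"/>
    <list-item v="padre"/>
    <list-item v="hermano"/>
  </def-list>
</section-def-lists>

<!-- Inside a rule whose pattern is "verb" followed by "nom":
     true when the lemma of the second word is in the list -->
<test>
  <in caseless="yes">
    <clip pos="2" side="tl" part="lem"/>
    <list n="human-nouns"/>
  </in>
</test>
```

With this layout, adding a new human noun means adding one `list-item` rather than another `cat-item`.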

We then make a rule which matches verb followed by nom-human and adds a tag (S1 -- which could be anything):

    <rule comment="REGLA: VERB NOM-HUMAN">
      <pattern>
        <pattern-item n="verb"/>
        <pattern-item n="nom-human"/>
      </pattern>
      <action>
        <choose>
          <when>
            <test>
              <equal caseless="yes">
                <clip pos="1" side="tl" part="lem"/>
                <lit v="recoger"/>
              </equal>
            </test>
            <out>
              <lu>
                <clip pos="1" side="tl" part="lemh"/>
                <clip pos="1" side="tl" part="a_verb"/>
                <lit-tag v="S1"/>
                <clip pos="1" side="tl" part="temps"/>
                <clip pos="1" side="tl" part="pers"/>
                <clip pos="1" side="tl" part="nbr"/>
                <clip pos="1" side="tl" part="lemq"/>
              </lu>
              <b pos="1"/>
              <lu>
                <clip pos="2" side="tl" part="whole"/>
              </lu>
            </out>
          </when>
          <otherwise>
            <out>
              <lu>
                <clip pos="1" side="tl" part="whole"/>
              </lu>
              <b pos="1"/>
              <lu>
                <clip pos="2" side="tl" part="whole"/>
              </lu>
            </out>
          </otherwise>
        </choose>
      </action>
    </rule>
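
The rule clips the attributes `a_verb`, `temps`, `pers` and `nbr`, which have to be defined in the same t1x file (`lem`, `lemh`, `lemq` and `whole` are predefined). The page does not show those definitions, so the following is only a minimal sketch that assumes the tags visible in the example output; a real language pair would list many more values.

```xml
<!-- Assumed attribute definitions; the tag inventories are illustrative only -->
<section-def-attrs>
  <def-attr n="a_verb">
    <attr-item tags="vblex"/>
  </def-attr>
  <def-attr n="temps">
    <attr-item tags="inf"/>
    <attr-item tags="pri"/>
  </def-attr>
  <def-attr n="pers">
    <attr-item tags="p1"/>
    <attr-item tags="p2"/>
    <attr-item tags="p3"/>
  </def-attr>
  <def-attr n="nbr">
    <attr-item tags="sg"/>
    <attr-item tags="pl"/>
  </def-attr>
</section-def-attrs>
```

Because the infinitive carries no person or number tags, the `pers` and `nbr` clips come out empty for "recoger", which matches the `<vblex><S1><inf>` output on this page.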

Here is the output with the two phrases:

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en-tagger | apertium-pretransfer | \
 apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><S1><inf>$ ^Alan<np><ant><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^aeropuerto<n><m><sg>$^.<sent>$
$ echo "Voy a recoger mi coche del aeropuerto"  | apertium -d . es-en-tagger | apertium-pretransfer | \
 apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><inf>$ ^mío<det><pos><mf><sg>$ ^coche<n><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ 
^aeropuerto<n><m><sg>$^.<sent>$

The next step is to add the bidix entries:

    <e><p><l>collect<s n="vblex"/></l><r>recoger<s n="vblex"/></r></p></e>
    <e r="RL"><p><l>pick<g><b/>up</g><s n="vblex"/></l><r>recoger<s n="vblex"/><s n="S1"/></r></p></e>

Note: You will need to add an <sdef n="S1"/> to the <sdefs> section at the top of the file.
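
For instance, the symbol definitions section might end up looking roughly like this. This is a sketch only: the other `sdef` entries stand in for whatever the dictionary already defines.

```xml
<sdefs>
  <!-- existing symbol definitions stay as they are -->
  <sdef n="vblex"/>
  <!-- new sense tag used by the "pick up" entry -->
  <sdef n="S1"/>
</sdefs>
```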

We can then recompile, and we get two different translations (admittedly not perfect, but it illustrates the idea):

$ echo "Voy a recoger mi coche del aeropuerto" | apertium -d . es-en+lexfer
I go to collect my car of the airport.

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en+lexfer
I go to pick up Alan of the airport.

Constraint grammar

The same effect can be achieved with a constraint grammar.

LIST Human = "madre" "padre" "hijo" "hermano" (np ant);
LIST Verb = (vblex);
LIST Prep = (pr);

SECTION

# Select S1 ("pick up") for "recoger" if a human noun is found between the verb and the next preposition.
SUBSTITUTE ("recoger" vblex) ("recoger" vblex S1) Verb (1* Human BARRIER Prep);

Giving:

$ echo "Voy a recoger Alan del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^Alan/Alan<np><ant><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ 
^aeropuerto/aeropuerto<n><m><sg>$
$ echo "Voy a recoger mi coche del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><inf>$ ^mi/mío<det><pos><mf><sg>$ ^coche/coche<n><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ 
^aeropuerto/aeropuerto<n><m><sg>$

The benefit of the constraint grammar over the apertium-transfer approach is that you don't need to have a separate rule for each pattern, e.g. the above would even work with "mi madre":

$ echo "Voy a recoger mi madre del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^mi/mío<det><pos><mf><sg>$ ^madre/madre<n><f><sg>$ 
^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$

This technique is used in apertium-sme-nob and apertium-is-en; see the files ending in .lex.

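See also

* Lexical selection

External links

* apertium-stuff: Losing sense information from surface form (https://sourceforge.net/mailarchive/forum.php?thread_name=1221565388.3537.24.camel%40eki.prompsit.com&forum_name=apertium-stuff)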