Difference between revisions of "Limited rule-based lexical selection"

Latest revision as of 21:07, 1 December 2013

This module deals with lexical selection; for more information on the topic, see the main page.

This discussion page is deprecated as the functionality now exists.

Example

In this case, we are translating from Spanish to English and have a verb, "recoger", which we want to translate in two ways depending on context. If it is followed by the name of a person or by a family member, we want to translate it as "pick up"; otherwise the default translation should be "collect". It isn't wrong to say, for example, "I'm going to collect Alan from the airport", but "I'm going to pick up Alan from the airport" sounds more fluent.

Voy a recoger Alan del aeropuerto.

"I'm going to pick up Alan from the airport"

Voy a recoger mi coche del aeropuerto.

"I'm going to collect my car from the airport"

(Note: it should probably be "recoger a Alan" rather than "recoger Alan", because Spanish normally takes the preposition "a" before a human direct object. Nevertheless, this does not detract from the principle.)

Substitution

We have two alternative techniques, one using only the Apertium modules, and one using Constraint Grammar.

Apertium transfer

We start with a normal t1x file and define the following categories. This is just one way to do it; it could also be done with a list, for example.

    <def-cat n="nom-human">
      <cat-item tags="np.ant.*"/>

      <cat-item lemma="madre" tags="n.*"/>
      <cat-item lemma="padre" tags="n.*"/>
      <cat-item lemma="hermano" tags="n.*"/>
    </def-cat>
    <def-cat n="nom">
      <cat-item tags="n.*"/>
      <cat-item tags="np.*"/>
    </def-cat>
    <def-cat n="verb">
      <cat-item tags="vblex.*"/>
    </def-cat>

We then make a rule which matches verb followed by nom-human and adds a tag (S1 -- which could be anything):

    <rule comment="REGLA: VERB NOM-HUMAN">
      <pattern>
        <pattern-item n="verb"/>
        <pattern-item n="nom-human"/>
      </pattern>
      <action>
        <choose>
          <when>
            <test>
              <equal caseless="yes">
                <clip pos="1" side="tl" part="lem"/>
                <lit v="recoger"/>
              </equal>
            </test>
            <out>
              <lu>
                <clip pos="1" side="tl" part="lemh"/>
                <clip pos="1" side="tl" part="a_verb"/>
                <lit-tag v="S1"/>
                <clip pos="1" side="tl" part="temps"/>
                <clip pos="1" side="tl" part="pers"/>
                <clip pos="1" side="tl" part="nbr"/>
                <clip pos="1" side="tl" part="lemq"/>
              </lu>
              <b pos="1"/>
              <lu>
                <clip pos="2" side="tl" part="whole"/>
              </lu>
            </out>
          </when>
          <otherwise>
            <out>
              <lu>
                <clip pos="1" side="tl" part="whole"/>
              </lu>
              <b pos="1"/>
              <lu>
                <clip pos="2" side="tl" part="whole"/>
              </lu>
            </out>
          </otherwise>
        </choose>
      </action>
    </rule>
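The effect of the rule above can be sketched outside Apertium in a few lines of Python. This is only an illustration under simplifying assumptions, not how apertium-transfer is implemented: the helper names (`parse`, `tag_senses`, `render`) are hypothetical, the stream is assumed to contain only simple `^lemma<tag>...$` units, and category matching is reduced to checking the leading tags.

```python
import re

# Lemmas treated as human nouns; mirrors the nom-human category above.
HUMAN_LEMMAS = {"madre", "padre", "hermano"}

# One lexical unit in the simplified form ^lemma<tag1><tag2>...$
LU_RE = re.compile(r"\^([^<$]+)((?:<[^>]+>)*)\$")


def parse(stream):
    """Split a stream into (lemma, [tags]) pairs."""
    return [(m.group(1), re.findall(r"<([^>]+)>", m.group(2)))
            for m in LU_RE.finditer(stream)]


def is_verb(tags):
    # Simplified stand-in for the "verb" category (vblex.*).
    return tags[:1] == ["vblex"]


def is_nom_human(lemma, tags):
    # Simplified stand-in for "nom-human": np.ant.* or a listed noun lemma.
    if tags[:2] == ["np", "ant"]:
        return True
    return tags[:1] == ["n"] and lemma in HUMAN_LEMMAS


def tag_senses(units):
    """Insert S1 after the part-of-speech tag of 'recoger' when the
    immediately following unit is nom-human; leave everything else as is."""
    out = []
    for i, (lemma, tags) in enumerate(units):
        nxt = units[i + 1] if i + 1 < len(units) else None
        if (lemma.lower() == "recoger" and is_verb(tags)
                and nxt is not None and is_nom_human(*nxt)):
            tags = tags[:1] + ["S1"] + tags[1:]
        out.append((lemma, tags))
    return out


def render(units):
    """Serialise (lemma, [tags]) pairs back into the simplified stream form."""
    return " ".join("^" + lemma + "".join(f"<{t}>" for t in tags) + "$"
                    for lemma, tags in units)
```

Like the transfer rule, the sketch only looks at the unit directly after the verb, so a determiner between "recoger" and the noun prevents the tag from being added.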

Here is the output with the two phrases:

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en-tagger | apertium-pretransfer | \
 apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><S1><inf>$ ^Alan<np><ant><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ ^aeropuerto<n><m><sg>$^.<sent>$

$ echo "Voy a recoger mi coche del aeropuerto"  | apertium -d . es-en-tagger | apertium-pretransfer | \
 apertium-transfer -n apertium-en-es.es-en.lexfer.t1x es-en.lexfer.t1x.bin
^Ir<vblex><pri><p1><sg>$ ^a<pr>$ ^recoger<vblex><inf>$ ^mío<det><pos><mf><sg>$ ^coche<n><m><sg>$ ^de<pr>$ ^el<det><def><m><sg>$ 
^aeropuerto<n><m><sg>$^.<sent>$

The next thing to do is to make the bidix entries:

    <e><p><l>collect<s n="vblex"/></l><r>recoger<s n="vblex"/></r></p></e>
    <e r="RL"><p><l>pick<g><b/>up</g><s n="vblex"/></l><r>recoger<s n="vblex"/><s n="S1"/></r></p></e>

Note: You will need to add an <sdef n="S1"/> to the <sdefs> section at the top of the bidix file.
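To make the role of the S1 tag concrete, here is a minimal Python sketch of the lookup the two entries are meant to produce. The `BIDIX` table and `translate_lemma` function are hypothetical stand-ins, and the "most specific tag prefix wins" policy is an assumption made for the illustration; the real dictionary is a compiled transducer and is not looked up this way.

```python
# Hypothetical in-memory stand-in for the two bidix entries above,
# keyed by (source lemma, tags that must begin the analysis).
BIDIX = {
    ("recoger", ("vblex", "S1")): "pick up",
    ("recoger", ("vblex",)): "collect",
}


def translate_lemma(lemma, tags):
    """Return the target lemma for the longest matching tag prefix, or None."""
    best = None
    best_len = -1
    for (src, prefix), target in BIDIX.items():
        matches = src == lemma and tuple(tags[:len(prefix)]) == prefix
        if matches and len(prefix) > best_len:
            best, best_len = target, len(prefix)
    return best
```

An analysis that carries S1 selects "pick up", while a plain vblex analysis falls back to the default "collect".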

We can then recompile, and we get two different translations (admittedly not perfect, but it illustrates the idea):

$ echo "Voy a recoger mi coche del aeropuerto" | apertium -d . es-en+lexfer
I go to collect my car of the airport.

$ echo "Voy a recoger Alan del aeropuerto" | apertium -d . es-en+lexfer
I go to pick up Alan of the airport.

Constraint grammar

The same effect can be achieved with a constraint grammar.

LIST Human = "madre" "padre" "hijo" "hermano" (np ant);
LIST Verb = (vblex);
LIST Prep = (pr);

SECTION

# Select S1 ("pick up") for "recoger" if a human noun is found between the verb and the next preposition.
SUBSTITUTE ("recoger" vblex) ("recoger" vblex S1) Verb (1* Human BARRIER Prep);

Giving:

$ echo "Voy a recoger Alan del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^Alan/Alan<np><ant><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ 
^aeropuerto/aeropuerto<n><m><sg>$

$ echo "Voy a recoger mi coche del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><inf>$ ^mi/mío<det><pos><mf><sg>$ ^coche/coche<n><m><sg>$ ^del/de<pr>+el<det><def><m><sg>$ 
^aeropuerto/aeropuerto<n><m><sg>$

The benefit of the constraint grammar over the apertium-transfer approach is that you don't need a separate rule for each pattern. For example, the rule above also works with "mi madre", where a determiner comes between the verb and the noun:

$ echo "Voy a recoger mi madre del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^mi/mío<det><pos><mf><sg>$ ^madre/madre<n><f><sg>$ 
^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$

This technique is used in apertium-sme-nob and apertium-is-en; see the files ending in .lex.
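The scanning context of the constraint grammar rule, `1* Human BARRIER Prep`, can be approximated in Python as follows. This is a rough sketch and not a CG implementation: the function names are hypothetical, each cohort is assumed to have a single reading given as a (lemma, tags) pair, and the contraction "del" is assumed to be represented simply as a preposition.

```python
# Mirrors LIST Human from the grammar above.
HUMAN_LEMMAS = {"madre", "padre", "hijo", "hermano"}


def is_human(lemma, tags):
    # One of the listed lemmas, or a reading carrying both np and ant.
    return lemma in HUMAN_LEMMAS or ("np" in tags and "ant" in tags)


def human_before_prep(units, start):
    """Scan rightwards from the unit after `start`; succeed on the first
    Human unit, give up as soon as a preposition (the barrier) is met."""
    for lemma, tags in units[start + 1:]:
        if is_human(lemma, tags):
            return True
        if "pr" in tags:
            return False
    return False


def substitute(units):
    """Add S1 after vblex on 'recoger' when the scan above succeeds."""
    out = []
    for i, (lemma, tags) in enumerate(units):
        if lemma == "recoger" and "vblex" in tags and human_before_prep(units, i):
            pos = tags.index("vblex")
            tags = tags[:pos + 1] + ["S1"] + tags[pos + 1:]
        out.append((lemma, tags))
    return out
```

Because the scan skips over any number of intervening words, one rule covers both "recoger Alan" and "recoger mi madre", while the barrier stops it from reaching a human noun inside a later prepositional phrase.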

External links

apertium-stuff: Losing sense information from surface form

@@ Line 1: / Line 1: @@
+:''This module deals with [[lexical selection]], for more information on the topic, see the [[lexical selection|main page]].''
-While Apertium still needs a separate module for lexical selection that works on statistics and rules, using the Apertium transfer language it is possible to do '''limited rule-based lexical selection'''.
+{{deprecated}}
+{{TOCD}}
+While Apertium still needs a separate module for lexical selection that works on statistics and rules, using the Apertium transfer language or a constraint grammar it is possible to do '''limited rule-based lexical selection'''.
 The general idea is that you use a transfer before transfer to add tags after the part-of-speech to differentiate between different translation senses. Think of it as ''crude'' semantic annotation.
@@ Line 16: / Line 19: @@
 ==Substitution==
+We have two alternative techniques, one using only the Apertium modules, and one using Constraint Grammar.
 ===Apertium transfer===
@@ Line 38: / Line 42: @@
 </pre>
-We then make a rule which matches <code>verb</code> followed by <code>nom-human</code>:
+We then make a rule which matches <code>verb</code> followed by <code>nom-human</code> and adds a tag (<code>S1</code> -- which could be anything):
 <pre>
@@ Line 107: / Line 111: @@
     <e r="RL"><p><l>pick<g><b/>up</g><s n="vblex"/></l><r>recoger<s n="vblex"/><s n="S1"/></r></p></e>
 </pre>
+Note: You will need to add an <code><sdef n="S1"/></code> to the <code><sdefs></code> section at the top of the file.
 We can then recompile, and we get two different translations (admittedly not perfect, but it illustrates the idea):
@@ Line 120: / Line 126: @@
 ===Constraint grammar===
-The same effect could be done with a [[constraint grammar]].
+The same effect can be done with a [[constraint grammar]].
 <pre>
 LIST Human = "madre" "padre" "hijo" "hermano" (np ant);
 LIST Verb = (vblex);
+LIST Prep = (pr);
 SECTION
+# Select S1 ("pick up") for "recoger" if a human noun is found between the verb and the next preposition.
-SUBSTITUTE (vblex) (vblex S1) Verb (1* Human);
+SUBSTITUTE ("recoger" vblex) ("recoger" vblex S1) Verb (1* Human BARRIER Prep);
 </pre>
@@ Line 144: / Line 152: @@
 ^aeropuerto/aeropuerto<n><m><sg>$
 </pre>
+The benefit of the constraint grammar over the <code>apertium-transfer</code> approach is that you don't need to have a separate rule for each pattern, e.g. the above would even work with "mi madre":
+<pre>
+$ echo "Voy a recoger mi madre del aeropuerto" | lt-proc es-en.automorf.bin  | apertium-tagger -p -g es-en.prob | cg-proc es-en.lexfer.rlx.bin
+^Voy/Ir<vblex><S1><pri><p1><sg>$ ^a/a<pr>$ ^recoger/recoger<vblex><S1><inf>$ ^mi/mío<det><pos><mf><sg>$ ^madre/madre<n><f><sg>$
+^del/de<pr>+el<det><def><m><sg>$ ^aeropuerto/aeropuerto<n><m><sg>$
+</pre>
+This technique is used in [[apertium-sme-nob]] and [[apertium-is-en]], see files ending in <code>.lex</code>
 ==See also==
@@ Line 149: / Line 167: @@
 * [[Lexical selection]]
+==External links==
+* [https://sourceforge.net/mailarchive/forum.php?thread_name=1221565388.3537.24.camel%40eki.prompsit.com&forum_name=apertium-stuff apertium-stuff: Losing sense information from surface form]
 [[Category:Development]]
+[[Category:Lexical selection]]
+[[Category:Documentation in English]]
