Difference between revisions of "Constraint-based lexical selection module"

From Apertium
Jump to navigation Jump to search
Line 43: Line 43:
 
s ("carn" n) ("flesh" n) (1 "i") (2 "os")
 
s ("carn" n) ("flesh" n) (1 "i") (2 "os")
 
s ("sobre" pr) ("over" n) (-1 "victòria")
 
s ("sobre" pr) ("over" n) (-1 "victòria")
  +
s ("dona" n) ("wife" n) (-1 "*" det pos)
  +
s ("dona" n) ("wife" n) (-1 "el") (1 "de")
  +
s ("dona" n) ("woman" n) (1 "de") (2 "*" det pos) (3 "somni")
  +
r ("patró n) ("pattern" n) (1 "*" np ant)
 
</pre>
 
</pre>
   

Revision as of 11:16, 3 October 2011

Lexical transfer

This is the output of lt-proc -b on an ambiguous bilingual dictionary.

[74306] ^El<det><def><f><sg>/The<det><def><f><sg>$ 
^estació<n><f><sg>/season<n><sg>/station<n><sg>$ ^més<preadv>/more<preadv>$ ^plujós<adj><f><sg>/rainy<adj><sint><f><sg>$ 
^ser<vbser><pri><p3><sg>/be<vbser><pri><p3><sg>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ 
^tardor<n><f><sg>/autumn<n><sg>/fall<n><sg>$^,<cm>/,<cm>$ ^i<cnjcoo>/and<cnjcoo>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ 
^més<preadv>/more<preadv>$ ^sec<adj><f><sg>/dry<adj><sint><f><sg>$ ^el<det><def><m><sg>/the<det><def><m><sg>$ 
^estiu<n><m><sg>/summer<n><sg>$^.<sent>/.<sent>$

The module requires VM for transfer, or another apertium transfer implementation without lexical transfer in order to work.

Rule format

A rule is made up of:

  • An action (select, remove)
  • A "centre" (the source language token that will be treated)
  • A target language pattern on which the action takes place
  • A source language context

Text

s	("estació" n)	("season" n)	(1 "plujós")
s	("estació" n)	("season" n)	(2 "plujós")
s	("estació" n)	("season" n)	(1 "de") (3 "any")
s	("estació" n)	("station" n)	(1 "de") (3 "Línia")
s	("prova" n)	("evidence" n)	(1 "arqueològic")
s	("prova" n)	("test" n)	(1 "estadístic")
s	("prova" n)	("event" n)	(-3 "guanyador") (-2 "de") 
s	("prova" n)	("testing" n)	(-2 "tècnica") (-1 "de") 
s	("joc" n)	("game" n)	(1 "olímpic")
s	("joc" n)	("set" n)	(1 "de") (2 "caràcter")
r	("pista" n)	("hint" n)	(1 "més") (2 "llarg")
r	("pista" n)	("clue" n)	(1 "més") (2 "llarg")
r	("motiu" n)	("motif" n)	(-1 "aquest") (-2 "per")
s	("carn" n)	("flesh" n)	(1 "i") (2 "os")
s	("sobre" pr)	("over" n)	(-1 "victòria")
s       ("dona" n)      ("wife" n)      (-1 "*" det pos)
s       ("dona" n)      ("wife" n)      (-1 "el") (1 "de")
s       ("dona" n)      ("woman" n)     (1 "de") (2 "*" det pos) (3 "somni")
r       ("patró n)      ("pattern" n)   (1 "*" np ant)

Usage

$ cat /tmp/test | python apertium-lex-rules.py rules.txt 2>/dev/null
^El<det><def><f><sg>/The<det><def><f><sg>$ 
^estació<n><f><sg>/season<n><sg>$ ^més<preadv>/more<preadv>$ ^plujós<adj><f><sg>/rainy<adj><sint><f><sg>$ 
^ser<vbser><pri><p3><sg>/be<vbser><pri><p3><sg>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ 
^tardor<n><f><sg>/autumn<n><sg>/fall<n><sg>$^,<cm>/,<cm>$ ^i<cnjcoo>/and<cnjcoo>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ 
^més<preadv>/more<preadv>$ ^sec<adj><f><sg>/dry<adj><sint><f><sg>$ ^el<det><def><m><sg>/the<det><def><m><sg>$ 
^estiu<n><m><sg>/summer<n><sg>$ ^.<sent>/.<sent>$ 
With rules
$ cat /tmp/test | python apertium-lex-rules.py rules.txt | apertium-vm -c ca-en.t1x.vmb | apertium-vm -c ca-en.t2x.vmb |\
   apertium-vm -c ca-en.t3x.vmb | lt-proc -g ca-en.autogen.bin

The 
rainiest season 
is the 
autumn, and the 
driest the 
summer. 
With bilingual dictionary defaults
$ cat /tmp/test | apertium-lex-defaults ca-en.autoldx.bin | apertium-vm -c ca-en.t1x.vmb | apertium-vm -c ca-en.t2x.vmb |\
   apertium-vm -c ca-en.t3x.vmb | lt-proc -g ca-en.autogen.bin

The 
rainiest station 
is the 
autumn, and the 
driest the 
summer.

XML

Rule application process

The following is an inefficient implementation of the rule application process:


rule_table 
i = 0

FOREACH pair(sl, tl) IN sentence:
   
    FOREACH centre IN rule_table: 

        IF centre IN sl: 

            FOREACH rule IN rule_table[centre]: 

                matched = False   

                FOREACH context_item IN rule_table[centre][rule]: 

                  IF context_item in sentence: 
                      matched = True
                  ELSE:
                      matched = False
                
                # If all of the context items have matched, and none of them have not matched
                IF matched == True:
 
                      sentence[i] = ApplyRule(rule_table[centre][rule], sentence[i])

    i = i + 1

A more efficient one would match LRLM based on the SL contexts.

Writing and generating rules

See also