Constraint-based lexical selection module

Lexical transfer

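This is the output of lt-proc -b with an ambiguous bilingual dictionary. Each lexical unit has the form ^source/target1/target2$, so a source word with several possible translations, such as estació (season or station), carries all of them: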

[74306] ^El<det><def><f><sg>/The<det><def><f><sg>$ 
^estació<n><f><sg>/season<n><sg>/station<n><sg>$ ^més<preadv>/more<preadv>$ ^plujós<adj><f><sg>/rainy<adj><sint><f><sg>$ 
^ser<vbser><pri><p3><sg>/be<vbser><pri><p3><sg>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ 
^tardor<n><f><sg>/autumn<n><sg>/fall<n><sg>$^,<cm>/,<cm>$ ^i<cnjcoo>/and<cnjcoo>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ 
^més<preadv>/more<preadv>$ ^sec<adj><f><sg>/dry<adj><sint><f><sg>$ ^el<det><def><m><sg>/the<det><def><m><sg>$ 
^estiu<n><m><sg>/summer<n><sg>$^.<sent>/.<sent>$
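
The stream shown above can be split into source/target pairs with a few lines of Python. This is a minimal sketch, not part of the module: the function name parse_lexical_units is hypothetical, and the code assumes the stream contains no escaped ^, / or $ characters and simply ignores any text between lexical units.

```python
import re

# One lexical unit is everything between ^ and the next $.
# Assumption: no escaped ^, / or $ inside a unit.
LU_RE = re.compile(r"\^(.*?)\$")

def parse_lexical_units(stream):
    """Return a list of (source, [targets]) pairs from lt-proc -b style text."""
    units = []
    for body in LU_RE.findall(stream):
        # The first field is the source analysis, the rest are translations.
        source, *targets = body.split("/")
        units.append((source, targets))
    return units

line = "^estació<n><f><sg>/season<n><sg>/station<n><sg>$ ^més<preadv>/more<preadv>$"
for source, targets in parse_lexical_units(line):
    print(source, "->", targets)
```

A unit with more than one entry in its target list is the kind of ambiguity that the selection rules are meant to resolve.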

To work, the module requires apertium-vm for transfer, or another Apertium transfer implementation that does not perform lexical transfer itself.

Rule format

A rule is made up of:

An action (select, remove)
A "centre" (the source language token that will be treated)
A target language pattern on which the action takes place
A source language context
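
The four parts listed above map naturally onto a small record type. The following is only an illustrative sketch: the class name Rule and its field names are assumptions made for this example, not the module's actual data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    action: str    # "select" or "remove"
    centre: tuple  # source language token to treat, e.g. ("estació", "n")
    target: tuple  # target language pattern, e.g. ("season", "n")
    # Source language context: relative position -> (lemma, *tags)
    context: dict = field(default_factory=dict)

# "Select season for estació when the next word is plujós."
rule = Rule("select", ("estació", "n"), ("season", "n"), {1: ("plujós",)})
print(rule)
```

Keying the context by relative position keeps the offsets explicit, so a context to the left of the centre is simply a negative key.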

Text

s	("estació" n)	("season" n)	(1 "plujós")
s	("estació" n)	("season" n)	(2 "plujós")
s	("estació" n)	("season" n)	(1 "de") (3 "any")
s	("estació" n)	("station" n)	(1 "de") (3 "Línia")
s	("prova" n)	("evidence" n)	(1 "arqueològic")
s	("prova" n)	("test" n)	(1 "estadístic")
s	("prova" n)	("event" n)	(-3 "guanyador") (-2 "de") 
s	("prova" n)	("testing" n)	(-2 "tècnica") (-1 "de") 
s	("joc" n)	("game" n)	(1 "olímpic")
s	("joc" n)	("set" n)	(1 "de") (2 "caràcter")
r	("pista" n)	("hint" n)	(1 "més") (2 "llarg")
r	("pista" n)	("clue" n)	(1 "més") (2 "llarg")
r	("motiu" n)	("motif" n)	(-1 "aquest") (-2 "per")
s	("carn" n)	("flesh" n)	(1 "i") (2 "os")
s	("sobre" pr)	("over" pr)	(-1 "victòria")
s	("dona" n)	("wife" n)	(-1 "*" det pos)
s	("dona" n)	("wife" n)	(-1 "el") (1 "de")
s	("dona" n)	("woman" n)	(1 "de") (2 "*" det pos) (3 "somni")
r	("patró" n)	("pattern" n)	(1 "*" np ant)
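
A line in this text format can be read with the standard library alone. The sketch below is not the parser used by apertium-lex-rules.py: the function name parse_rule is hypothetical, and it assumes that s stands for select and r for remove, that the first two parenthesised groups are the centre and the target pattern, and that every later group starts with a relative position.

```python
import re
import shlex

# A parenthesised group such as ("estació" n) or (-1 "*" det pos).
GROUP_RE = re.compile(r"\(([^()]*)\)")

ACTIONS = {"s": "select", "r": "remove"}  # assumed meaning of the first column

def parse_rule(line):
    """Return (action, centre, target, context) for one rule line."""
    action = ACTIONS[line.split(None, 1)[0]]
    # shlex.split strips the double quotes around lemmas.
    groups = [shlex.split(g) for g in GROUP_RE.findall(line)]
    centre, target, *context_items = groups
    # Context: relative position -> (lemma, *tags)
    context = {int(item[0]): tuple(item[1:]) for item in context_items}
    return action, tuple(centre), tuple(target), context

print(parse_rule('s\t("prova" n)\t("event" n)\t(-3 "guanyador") (-2 "de")'))
```

Because the groups are found with a regular expression rather than by splitting on tabs, lines that separate columns with spaces parse the same way. A lemma with an unbalanced quote makes shlex.split raise ValueError, which surfaces malformed rules early.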

Usage

$ cat /tmp/test | python apertium-lex-rules.py rules.txt 2>/dev/null
^El<det><def><f><sg>/The<det><def><f><sg>$ 
^estació<n><f><sg>/season<n><sg>$ ^més<preadv>/more<preadv>$ ^plujós<adj><f><sg>/rainy<adj><sint><f><sg>$ 
^ser<vbser><pri><p3><sg>/be<vbser><pri><p3><sg>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ 
^tardor<n><f><sg>/autumn<n><sg>/fall<n><sg>$^,<cm>/,<cm>$ ^i<cnjcoo>/and<cnjcoo>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ 
^més<preadv>/more<preadv>$ ^sec<adj><f><sg>/dry<adj><sint><f><sg>$ ^el<det><def><m><sg>/the<det><def><m><sg>$ 
^estiu<n><m><sg>/summer<n><sg>$ ^.<sent>/.<sent>$

With rules

$ cat /tmp/test | python apertium-lex-rules.py rules.txt | apertium-vm -c ca-en.t1x.vmb | apertium-vm -c ca-en.t2x.vmb |\
   apertium-vm -c ca-en.t3x.vmb | lt-proc -g ca-en.autogen.bin

The 
rainiest season 
is the 
autumn, and the 
driest the 
summer.

With bilingual dictionary defaults

$ cat /tmp/test | apertium-lex-defaults ca-en.autoldx.bin | apertium-vm -c ca-en.t1x.vmb | apertium-vm -c ca-en.t2x.vmb |\
   apertium-vm -c ca-en.t3x.vmb | lt-proc -g ca-en.autogen.bin

The 
rainiest station 
is the 
autumn, and the 
driest the 
summer.

XML

Rule application process

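The following Python sketch is a simple, inefficient implementation of the rule application process:

import re

# s	("prova" n)	("event" n)	(-3 "guanyador") (-2 "de")
#
# tipus = "select"
# centre = "^prova<n>.*"
# tl_patro = ["^event<n>.*"]
# sl_patro = {-3: "^guanyador<", -2: "^de<"}

class Rule:
    def __init__(self, tipus, centre, tl_patro, sl_patro):
        self.tipus = tipus        # "select" or "remove"
        self.centre = centre      # regex matched against the SL side of a pair
        self.tl_patro = tl_patro  # regexes matched against each TL translation
        self.sl_patro = sl_patro  # relative position -> regex on the SL side there

# Rules indexed by centre lemma,
# e.g. rule_table["estació"] = [rule1, rule2, rule3]
rule_table = {}

def apply_rule(rule, targets):
    """Return the translations that survive the rule."""
    def in_pattern(target):
        return any(re.match(p, target) for p in rule.tl_patro)
    if rule.tipus == "select":
        # Keep only the translations named by the rule.
        return [t for t in targets if in_pattern(t)]
    # "remove": drop the translations named by the rule.
    return [t for t in targets if not in_pattern(t)]

def context_matches(rule, sentence, i):
    """True only if every context item matches at its offset from position i."""
    for offset, pattern in rule.sl_patro.items():
        j = i + offset
        if j < 0 or j >= len(sentence) or not re.match(pattern, sentence[j][0]):
            return False
    return True

def first_matching_rule(sentence, i):
    """Return the first rule whose centre and context match at position i."""
    sl = sentence[i][0]
    for lemma, rules in rule_table.items():
        if not sl.startswith(lemma + "<"):
            continue
        for rule in rules:
            if re.match(rule.centre, sl) and context_matches(rule, sentence, i):
                return rule
    return None

def apply_rules(sentence):
    """sentence is a list of (sl, [tl, ...]) pairs, updated in place."""
    for i, (sl, tl) in enumerate(sentence):
        # Once a rule matches, apply it and move on to the next pair.
        rule = first_matching_rule(sentence, i)
        if rule is not None:
            sentence[i] = (sl, apply_rule(rule, tl))
    return sentence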

A more efficient implementation would match rules left-to-right, longest-match (LRLM), based on the SL contexts.
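
The longest-match preference can be illustrated in isolation. The sketch below is one possible reading, not the module's implementation: it assumes that "longest" means the rule whose context covers the widest span of tokens around the centre, it works on bare lemmas rather than full analyses, and the helper names and the short (1 "de") rule are hypothetical. It shows only the selection criterion, not an efficient matcher.

```python
def span(context):
    """Number of tokens a rule covers, counting the centre at offset 0."""
    offsets = [0, *context]
    return max(offsets) - min(offsets) + 1

def matches(context, lemmas, i):
    """True if every context lemma is found at its offset from position i."""
    return all(
        0 <= i + off < len(lemmas) and lemmas[i + off] == lemma
        for off, lemma in context.items()
    )

def longest_match(rules, lemmas, i):
    """Return the matching rule with the widest context, or None."""
    candidates = [r for r in rules if matches(r["context"], lemmas, i)]
    return max(candidates, key=lambda r: span(r["context"]), default=None)

rules = [
    {"target": "station", "context": {1: "de"}},          # hypothetical short rule
    {"target": "season", "context": {1: "de", 3: "any"}},  # from the rule list
]

lemmas = ["el", "estació", "de", "el", "any"]
print(longest_match(rules, lemmas, 1)["target"])  # prints "season"
```

Both rules match at estació, but the second covers four tokens instead of two, so it wins. With this criterion, more specific rules override general ones regardless of their order in the rule file.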
