Constraint-based lexical selection module
Revision as of 09:52, 5 October 2011 by Francis Tyers (talk | contribs) (→Rule application process)
Lexical transfer
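This is the output of lt-proc -b run with an ambiguous bilingual dictionary. Each source-language lexical unit is followed by all of its possible translations, separated by "/".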
[74306] ^El<det><def><f><sg>/The<det><def><f><sg>$ ^estació<n><f><sg>/season<n><sg>/station<n><sg>$ ^més<preadv>/more<preadv>$ ^plujós<adj><f><sg>/rainy<adj><sint><f><sg>$ ^ser<vbser><pri><p3><sg>/be<vbser><pri><p3><sg>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ ^tardor<n><f><sg>/autumn<n><sg>/fall<n><sg>$^,<cm>/,<cm>$ ^i<cnjcoo>/and<cnjcoo>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ ^més<preadv>/more<preadv>$ ^sec<adj><f><sg>/dry<adj><sint><f><sg>$ ^el<det><def><m><sg>/the<det><def><m><sg>$ ^estiu<n><m><sg>/summer<n><sg>$^.<sent>/.<sent>$
To work, the module requires the VM (apertium-vm) for transfer, or another Apertium transfer implementation that does not perform lexical transfer itself.
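As a rough illustration of the stream shown above, the following sketch splits one line of lt-proc -b output into source/target pairs. The function name parse_lexical_units is hypothetical, and the sketch assumes that "^", "/" and "$" never appear escaped inside a lexical unit, which real Apertium streams do allow.

```python
import re

# One lexical unit in the stream looks like ^source/target1/target2$.
# Assumption: no escaped ^, / or $ characters inside a unit.
LU_RE = re.compile(r"\^(.*?)\$")


def parse_lexical_units(line):
    """Return a list of (source, [targets]) pairs found in one line."""
    units = []
    for body in LU_RE.findall(line):
        source, *targets = body.split("/")
        units.append((source, targets))
    return units


sample = ("^estació<n><f><sg>/season<n><sg>/station<n><sg>$ "
          "^més<preadv>/more<preadv>$")

for source, targets in parse_lexical_units(sample):
    # A unit with more than one target is still ambiguous.
    status = "ambiguous" if len(targets) > 1 else "unambiguous"
    print(source, "->", targets, status)
```

In this representation, the job of the lexical selection module is to reduce each list of targets, ideally to a single entry, before the stream reaches transfer.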
Rule format
A rule is made up of:
- An action (select, remove)
- A "centre" (the source language token that will be treated)
- A target language pattern on which the action takes place
- A source language context
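The four parts listed above could be held in a small record type. This is only a sketch: the class and field names are illustrative and are not taken from the module's source.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    # All names here are hypothetical, chosen to mirror the list above.
    action: str    # "select" or "remove"
    centre: tuple  # (lemma, tags) of the source-language token being treated
    target: tuple  # (lemma, tags) of the target-language translation acted on
    # Source-language context: relative position -> (lemma, tags).
    context: dict = field(default_factory=dict)


# Select "season" for "estació" when "plujós" is the next token.
rule = Rule(
    action="select",
    centre=("estació", ("n",)),
    target=("season", ("n",)),
    context={1: ("plujós", ())},
)

print(rule.action, rule.centre[0], "->", rule.target[0], "context:", rule.context)
```

Context positions are relative to the centre, so negative keys refer to tokens on the left and positive keys to tokens on the right.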
Text
s ("estació" n) ("season" n) (1 "plujós")
s ("estació" n) ("season" n) (2 "plujós")
s ("estació" n) ("season" n) (1 "de") (3 "any")
s ("estació" n) ("station" n) (1 "de") (3 "Línia")
s ("prova" n) ("evidence" n) (1 "arqueològic")
s ("prova" n) ("test" n) (1 "estadístic")
s ("prova" n) ("event" n) (-3 "guanyador") (-2 "de")
s ("prova" n) ("testing" n) (-2 "tècnica") (-1 "de")
s ("joc" n) ("game" n) (1 "olímpic")
s ("joc" n) ("set" n) (1 "de") (2 "caràcter")
r ("pista" n) ("hint" n) (1 "més") (2 "llarg")
r ("pista" n) ("clue" n) (1 "més") (2 "llarg")
r ("motiu" n) ("motif" n) (-1 "aquest") (-2 "per")
s ("carn" n) ("flesh" n) (1 "i") (2 "os")
s ("sobre" pr) ("over" n) (-1 "victòria")
s ("dona" n) ("wife" n) (-1 "*" det pos)
s ("dona" n) ("wife" n) (-1 "el") (1 "de")
s ("dona" n) ("woman" n) (1 "de") (2 "*" det pos) (3 "somni")
r ("patró" n) ("pattern" n) (1 "*" np ant)
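A minimal sketch of how one line of this text format might be read. It assumes that "s" stands for select and "r" for remove, that every pattern is a quoted lemma followed by optional tags, and that lemmas contain no parentheses or double quotes. The function names are hypothetical, and this is not the parser used by apertium-lex-rules.py.

```python
import re

# Each parenthesised group: ("lemma" tag ...) or (offset "lemma" tag ...).
GROUP_RE = re.compile(r"\(([^()]*)\)")
ACTIONS = {"s": "select", "r": "remove"}  # assumed meaning of the prefixes


def parse_pattern(text):
    """Split '"lemma" tag1 tag2' into (lemma, [tag1, tag2])."""
    m = re.match(r'\s*"([^"]*)"\s*(.*)', text)
    if m is None:
        raise ValueError("malformed pattern: " + text)
    return m.group(1), m.group(2).split()


def parse_rule(line):
    """Parse one rule line into a dict with action, centre, target, context."""
    code, rest = line.split(None, 1)
    groups = GROUP_RE.findall(rest)
    if len(groups) < 2:
        raise ValueError("a rule needs a centre and a target: " + line)
    context = {}
    for group in groups[2:]:
        offset, pattern = group.split(None, 1)
        context[int(offset)] = parse_pattern(pattern)
    return {
        "action": ACTIONS[code],
        "centre": parse_pattern(groups[0]),
        "target": parse_pattern(groups[1]),
        "context": context,
    }


parsed = parse_rule('s ("dona" n) ("woman" n) (1 "de") (2 "*" det pos) (3 "somni")')
print(parsed)
```

A line with an unbalanced quote does not match the pattern expression and raises ValueError, which makes typos in a rule file easy to spot.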
Usage
$ cat /tmp/test | python apertium-lex-rules.py rules.txt 2>/dev/null
^El<det><def><f><sg>/The<det><def><f><sg>$ ^estació<n><f><sg>/season<n><sg>$ ^més<preadv>/more<preadv>$ ^plujós<adj><f><sg>/rainy<adj><sint><f><sg>$ ^ser<vbser><pri><p3><sg>/be<vbser><pri><p3><sg>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ ^tardor<n><f><sg>/autumn<n><sg>/fall<n><sg>$^,<cm>/,<cm>$ ^i<cnjcoo>/and<cnjcoo>$ ^el<det><def><f><sg>/the<det><def><f><sg>$ ^més<preadv>/more<preadv>$ ^sec<adj><f><sg>/dry<adj><sint><f><sg>$ ^el<det><def><m><sg>/the<det><def><m><sg>$ ^estiu<n><m><sg>/summer<n><sg>$ ^.<sent>/.<sent>$
- With rules
$ cat /tmp/test | python apertium-lex-rules.py rules.txt | apertium-vm -c ca-en.t1x.vmb | apertium-vm -c ca-en.t2x.vmb |\
  apertium-vm -c ca-en.t3x.vmb | lt-proc -g ca-en.autogen.bin
The rainiest season is the autumn, and the driest the summer.
- With bilingual dictionary defaults
$ cat /tmp/test | apertium-lex-defaults ca-en.autoldx.bin | apertium-vm -c ca-en.t1x.vmb | apertium-vm -c ca-en.t2x.vmb |\
  apertium-vm -c ca-en.t3x.vmb | lt-proc -g ca-en.autogen.bin
The rainiest station is the autumn, and the driest the summer.
XML
Rule application process
The following is an inefficient implementation of the rule application process:
# s ("prova" n) ("event" n) (-3 "guanyador") (-2 "de")
#
# tipus = "select"
# centre = "^prova<n>.*"
# tl_patro = ["^event<n>.*"]
# sl_patro = {-3: "^guanyador<", -2: "^de<"}
CLASS Rule:
    tipus = enum('select', 'remove')
    centre = ''
    tl_patro = []
    sl_patro = {}

rule_table = {}  # e.g. rule_table["estació"] = [rule1, rule2, rule3]

DEFINE ApplyRule(rule, lu):
    FOREACH target IN lu.tl:
        SWITCH rule.tipus:
            'select':
                IF target NOT IN rule.tl_patro:
                    DELETE target
            'remove':
                IF target IN rule.tl_patro:
                    DELETE target
    RETURN lu

i = 0
FOREACH pair(sl, tl) IN sentence:
    FOREACH centre IN rule_table:
        IF centre IN sl:
            FOREACH rule IN rule_table[centre]:
                # A rule matches only if every one of its context items matches
                # at its own position relative to the centre.
                matched = True
                FOREACH (offset, context_item) IN rule.sl_patro:
                    IF i + offset IS OUTSIDE sentence OR context_item NOT IN sentence[i + offset].sl:
                        matched = False
                # If a rule matches, apply it and continue to the next pair.
                IF matched == True:
                    sentence[i] = ApplyRule(rule, sentence[i])
                    BREAK
    i = i + 1
A more efficient implementation would match the rules left-to-right, longest match (LRLM), based on the source-language (SL) contexts.
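The pseudocode above can be turned into a small runnable sketch. It keeps the same inefficient strategy of trying every rule at every position and does not attempt LRLM matching. Several choices here are assumptions rather than the module's behaviour: rules are kept in a flat list instead of a table keyed by lemma, patterns are regular expressions anchored at the start of the lexical form, and a rule that would delete every translation leaves the unit unchanged.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    tipus: str      # "select" or "remove"
    centre: str     # regex matched against the source-language form
    tl_patro: list  # regexes matched against target-language forms
    sl_patro: dict = field(default_factory=dict)  # offset -> source-side regex


def context_matches(rule, sentence, i):
    """True only if every context item matches at its offset from position i."""
    for offset, pattern in rule.sl_patro.items():
        j = i + offset
        if j < 0 or j >= len(sentence):
            return False
        if not re.match(pattern, sentence[j][0]):
            return False
    return True


def apply_rule(rule, targets):
    """Filter the translations; keep them all if none would survive (assumption)."""
    def hit(target):
        return any(re.match(p, target) for p in rule.tl_patro)

    if rule.tipus == "select":
        kept = [t for t in targets if hit(t)]
    else:  # "remove"
        kept = [t for t in targets if not hit(t)]
    return kept or targets


def apply_rules(sentence, rules):
    """sentence is a list of (source, [targets]); returns a filtered copy."""
    out = []
    for i, (sl, tl) in enumerate(sentence):
        for rule in rules:
            if re.match(rule.centre, sl) and context_matches(rule, sentence, i):
                tl = apply_rule(rule, tl)
                break  # the first matching rule wins; move on to the next pair
        out.append((sl, tl))
    return out


sentence = [
    ("el<det><def><f><sg>", ["the<det><def><f><sg>"]),
    ("estació<n><f><sg>", ["season<n><sg>", "station<n><sg>"]),
    ("més<preadv>", ["more<preadv>"]),
    ("plujós<adj><f><sg>", ["rainy<adj><sint><f><sg>"]),
]
# s ("estació" n) ("season" n) (2 "plujós")
rules = [Rule("select", r"estació<n>", [r"season<n>"], {2: r"plujós<"})]

result = apply_rules(sentence, rules)
print(result[1])
```

Checking each context item at its own offset, and failing as soon as one does not match, avoids the situation where only the last context item decides whether the rule fires.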