Postgenerator
Sometimes you want to merge two tokens in the output, for example for contractions such as de + el = del.
You can do this using the postgenerator.
First, make sure you add the postgenerator wakeup symbol (<a/>) to the relevant entries in your monolingual dictionary, e.g. apertium-aaa.aaa.dix.
apertium-aaa.aaa.dix:
<pardef n="/de__pr">
  <e r="LR"><p><l>de</l><r>de<s n="pr"/></r></p></e>
  <e r="RL"><p><l><a/>de</l><r>de<s n="pr"/></r></p></e>
</pardef>
...
<e lm="de"><i></i><par n="/de__pr"/></e>
...
You should get entries like the following from lt-expand apertium-aaa.aaa.dix:
de:>:de<pr>
~de:<:de<pr>
Next, create the postgeneration dictionary.
apertium-aaa.post-aaa.dix:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="test"/>
  </sdefs>
  <section id="main" type="standard">
    <e><p><l><a/>de<b/>el</l><r>del</r></p></e>
  </section>
</dictionary>
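As a sketch of how further contractions could be added, the section below extends the dictionary with a second entry. The contraction a + el = al is an assumption made for illustration (it follows the Spanish pattern and is not part of the example above). The preposition a would also need its own RL entry with <a/> in the monolingual dictionary, in the same way as de.

```xml
<section id="main" type="standard">
  <!-- existing entry from above: ~de el -> del -->
  <e><p><l><a/>de<b/>el</l><r>del</r></p></e>
  <!-- hypothetical extra entry: ~a el -> al -->
  <e><p><l><a/>a<b/>el</l><r>al</r></p></e>
</section>
```

Each entry has the same shape: the wakeup symbol, the first word, a blank (<b/>), and the second word on the left, and the merged form on the right.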
You can compile the postgeneration dictionary like this:
$ lt-comp lr apertium-aaa.post-aaa.dix aaa.autopgen.bin
main@standard 7 6
And use it like:
$ echo "~de el" | lt-proc -p aaa.autopgen.bin
del
In your modes file:
...
<program name="lt-proc $1">
  <file name="aaa-bbb.autogen.bin"/>
</program>
<program name="lt-proc -p">
  <file name="aaa-bbb.autopgen.bin"/>
</program>
...
Postgeneration Using apertium-separable
If you have at least version 0.7.0 of apertium-separable, you can accomplish the same as above using lsx-proc.
This allows you to write postgeneration rules conditioned on lemmas and tags rather than needing multiple copies of each relevant dictionary entry.
For the de + el → del rule above, we can write the following:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <alphabet/>
  <sdefs>
    <sdef n="test"/>
  </sdefs>
  <section id="main" type="standard">
    <e>
      <i>de<s n="pr"/>/d</i>
      <p><l>e</l><r></r></p>
      <i><d space="no"/>el<t/>/el</i>
    </e>
  </section>
</dictionary>