Apertium separable

Installing

Prerequisites and compilation are the same as for lttoolbox and apertium; see Installation. On Debian/Ubuntu derivatives, the module is part of the nightly repo and can be installed with apt-get install apertium-separable.

The code can be found at https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable. To compile the module, run:

./autogen.sh
./configure
make
make install

You'll need an up-to-date version of lttoolbox and associated libraries, and zlib (Debian: zlib1g-dev).

The module is available via the nightly repositories as apertium-separable.

Lexical transfer in the pipeline

Note: this page previously said that lsx-proc runs between apertium-tagger and apertium-pretransfer. It has since been determined that it should run after apertium-pretransfer.

lsx-proc runs directly after apertium-tagger and apertium-pretransfer:

… | apertium-tagger -g en-es.prob | apertium-pretransfer | lsx-proc en-es.autoseq.bin | …
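
The placement rule can be sketched in a few lines of Python. This is only an illustration: the helper name add_lsx_stage is hypothetical, and the stage strings are copied from the pipeline line above rather than taken from any real modes file.

```python
def add_lsx_stage(stages, lsx_binary):
    """Return a copy of stages with lsx-proc placed right after apertium-pretransfer."""
    result = list(stages)
    for index, stage in enumerate(result):
        # Compare only the program name, ignoring any arguments.
        if stage.split()[:1] == ["apertium-pretransfer"]:
            result.insert(index + 1, f"lsx-proc {lsx_binary}")
            return result
    raise ValueError("apertium-pretransfer stage not found")


if __name__ == "__main__":
    pipeline = ["apertium-tagger -g en-es.prob", "apertium-pretransfer"]
    print(" | ".join(add_lsx_stage(pipeline, "en-es.autoseq.bin")))
```

The helper refuses to guess a position when apertium-pretransfer is missing, since the page only defines where lsx-proc goes relative to that stage.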

Usage

Creating the lsx-dictionary

Make a dictionary file:

<dictionary type="separable">
    <alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
    <sdefs>
        <sdef n="adj"/>
        <sdef n="adv"/>
        <sdef n="n"/>
        <sdef n="sep"/>
        <sdef n="vblex"/>
        <sdef n="vbser"/>
    </sdefs>
    <pardefs>
        <pardef n="adj">
            <e><i><w/><s n="adj"/><j/></i></e>
            <e><i><w/><s n="adj"/><t/><j/></i></e>
        </pardef>
        <pardef n="n">
            <e><i><w/><s n="n"/><t/><j/></i></e>
        </pardef>
        <pardef n="SN">
            <e><par n="n"/></e>
            <e><par n="adj"/><par n="n"/></e>
            <e><par n="adj"/><par n="adj"/><par n="n"/></e>
        </pardef>
        <pardef n="freq-adv">
            <e><i>always<s n="adv"/><j/></i></e>
            <e><i>annually<s n="adv"/><j/></i></e>
            <e><i>biannually<s n="adv"/><j/></i></e>
        </pardef>
    </pardefs>
    <section id="main" type="standard">
        <e lm="be late" c="llegar tarde">
            <p><l>be<s n="vbser"/></l><r>be<g><b/>late</g><s n="vbser"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="SAdv"/><p><l>late<t/><j/></l><r></r></p>
        </e>
        <e lm="take away" c="sacar, quitar">
            <p><l>take<s n="vblex"/></l><r>take<g><b/>away</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="SN"/><p><l>away<t/><j/></l><r></r></p>
        </e>
    </section>
</dictionary>

Note:

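<w/> stands for one or more alphabetic symbols.
<t/> stands for one or more tags (multicharacter symbols).

For example, the two entries of the adj pardef above match as follows:

<e><i><w/><s n="adj"/><t/><j/></i></e> matches one or more characters, the <adj> tag, at least one further tag, and then the end of the lexical unit ($)
- ^tall<adj><sint>$
<e><i><w/><s n="adj"/><j/></i></e> matches one or more characters, the <adj> tag, and then the end of the lexical unit ($)
- ^tall<adj>$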
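
As a rough illustration of what the two adj entries accept, the following Python sketch approximates them with regular expressions. The regexes and the function name matches_adj_pardef are assumptions made for this example; lsx-proc itself works on compiled transducers, not on regexes.

```python
import re

# Approximates <w/><s n="adj"/><j/>: letters, then exactly the <adj> tag.
ADJ_ONLY = re.compile(r"[A-Za-z]+<adj>")

# Approximates <w/><s n="adj"/><t/><j/>: letters, <adj>, then one or more further tags.
ADJ_PLUS_TAGS = re.compile(r"[A-Za-z]+<adj>(?:<[^<>]+>)+")


def matches_adj_pardef(lexical_unit):
    """Return True if a ^...$ lexical unit fits either adj entry."""
    if not (lexical_unit.startswith("^") and lexical_unit.endswith("$")):
        return False
    body = lexical_unit[1:-1]
    return bool(ADJ_ONLY.fullmatch(body) or ADJ_PLUS_TAGS.fullmatch(body))


if __name__ == "__main__":
    for unit in ["^tall<adj>$", "^tall<adj><sint>$", "^take<vblex><imp>$"]:
        print(unit, matches_adj_pardef(unit))
```

The character class mirrors the alphabet declared in the example dictionary, so lemmas containing spaces or other symbols are rejected here.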

A larger example dictionary can be found at https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable/examples/apertium-eng-spa.eng-spa.lsx

Compilation

Compilation into the binary format is achieved by means of the lsx-comp program.

$ lsx-comp apertium-eng-spa.eng-spa.lsx eng-spa.autoseq.bin
main@standard 61 73

Processing

Processing can be done using the lsx-proc program.

The input to lsx-proc is the output of apertium-tagger followed by apertium-pretransfer:

$ echo '^take<vblex><imp>$ ^prpers<prn><obj><p3><nt><sg>$ ^out of<pr>$ ^there<adv>$^.<sent>$' | lsx-proc eng-spa.autoseq.bin
^take# out<vblex><sep><imp>$ ^prpers<prn><obj><p3><nt><sg>$ ^of<pr>$ ^there<adv>$^.<sent>$

Example usages

Example #1: a sentence in plain text:

The Aragonese took Ramiro out of a monastery and made him king.

This is the output of feeding the sentence through apertium-tagger and then apertium-pretransfer:

^the<det><def><sp>$ ^Aragonese<n><sg>$ ^take<vblex><past>$ ^Ramiro<np><ant><m><sg>$ ^out of<pr>$ ^a<det><ind><sg>$ ^monastery<n><sg>$ ^and<cnjcoo>$ ^make<vblex><pp>$ ^prpers<prn><obj><p3><m><sg>$ ^king<n><sg>$^.<sent>$

This is the output of feeding the output above through lsx-proc with apertium-eng-spa.eng-spa.lsx:

^the<det><def><sp>$ ^Aragonese<n><sg>$ ^take# out<vblex><sep><past>$ ^Ramiro<np><ant><m><sg>$ ^of<pr>$ ^a<det><ind><sg>$ ^monastery<n><sg>$ ^and<cnjcoo>$ ^make<vblex><pp>$ ^prpers<prn><obj><p3><m><sg>$ ^king<n><sg>$^.<sent>$
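
To see which lexical units lsx-proc changed in examples like these, the input and output can be compared unit by unit. The Python sketch below is a simplification under two assumptions: escaped ^ and $ characters are not handled, and both lines contain the same number of units (true for the examples on this page). The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Everything between a ^ and the next $; escaped characters are ignored here.
LU_PATTERN = re.compile(r"\^(.*?)\$")


def lexical_units(stream: str) -> List[str]:
    """Return the contents of each ^...$ unit in one line of stream format."""
    return LU_PATTERN.findall(stream)


def changed_units(before: str, after: str) -> List[Tuple[str, str]]:
    """Pair units by position and keep only the pairs that differ."""
    pairs = zip(lexical_units(before), lexical_units(after))
    return [(old, new) for old, new in pairs if old != new]


if __name__ == "__main__":
    before = "^take<vblex><imp>$ ^prpers<prn><obj><p3><nt><sg>$ ^out of<pr>$ ^there<adv>$^.<sent>$"
    after = "^take# out<vblex><sep><imp>$ ^prpers<prn><obj><p3><nt><sg>$ ^of<pr>$ ^there<adv>$^.<sent>$"
    for old, new in changed_units(before, after):
        print(old, "->", new)
```

On the short example from the Processing section this reports two changes: the verb gains "# out" and the <sep> tag, and "out of" is reduced to "of".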

Naming Convention

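The dictionary source file and the compiled binary are both named after the language pair, e.g. apertium-eng-cat.eng-cat.lsx and eng-cat.autoseq.bin.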
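
A small Python sketch of this convention follows. The pattern is inferred from the eng-cat and eng-spa file names on this page, so treat it as an assumption; the helper name lsx_file_names is hypothetical.

```python
def lsx_file_names(source, target):
    """Return (dictionary source name, compiled binary name) for a language pair."""
    pair = f"{source}-{target}"
    return (f"apertium-{pair}.{pair}.lsx", f"{pair}.autoseq.bin")


if __name__ == "__main__":
    print(lsx_file_names("eng", "cat"))
```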

Troubleshooting

Segmentation fault

A segmentation fault can occur on compilation or on usage. The lsx-dictionary compiles fine with zero entries but gives a segmentation fault once entries are added.

No solution has been found yet. Something may be out of date, or there may be a problem in the makefile.

make sure that the makefile ...

Complaints about step_override()

Run svn update in lttoolbox. You'll need an up-to-date version of lttoolbox and associated libraries, and zlib (Debian: zlib1g-dev).

Undefined symbol

In your dictionary you are probably using a symbol that you didn't define in the sdefs. Add the symbol to the sdefs.
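
One way to find such symbols before compiling is to compare the names used in <s n="..."/> elements against those declared in <sdefs>. The Python sketch below does this with the standard library XML parser; the function name undefined_symbols and the sample dictionary are made up for the example, and the check is not part of lsx-comp.

```python
import xml.etree.ElementTree as ET


def undefined_symbols(lsx_xml):
    """Return the symbols used in <s n="..."/> that have no matching <sdef>."""
    root = ET.fromstring(lsx_xml)
    defined = {sdef.get("n") for sdef in root.iter("sdef")}
    used = {s.get("n") for s in root.iter("s")}
    return used - defined


if __name__ == "__main__":
    sample = """<dictionary type="separable">
  <sdefs><sdef n="sep"/><sdef n="vblex"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>be<s n="vbser"/></l><r>be<g><b/>late</g><s n="vbser"/><s n="sep"/></r></p></e>
  </section>
</dictionary>"""
    print(sorted(undefined_symbols(sample)))
```

For a dictionary on disk, the file contents can be read into a string and passed in the same way.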

Future work

In theory we're offloading multiwords from the transducers to lsx. This leaves open some questions:
- How do we do N N compounds with lsx?
- How does translation to a multiword work? In theory it's possible to invert the transducer, but an attempt to do this (Firespeaker, 1 September 2017) resulted in a transducer that looks right but does not seem to be processed correctly.
- Recycling dictionaries and/or paradigms: lsx-dictionaries are packaged in language pairs, and most of the eng-spa lsx-dictionary could be reused by eng-cat. Could we make use of the similarity?

Support for language pairs: there has not been much beta testing so far. The following language pairs have packaged the lsx module:
- eng-cat
- eng-deu (?)
- kaz-kir
