Apertium separable

Installing

Prerequisites and compilation are the same as for lttoolbox and apertium. See Installation.

The code can be found at ... and compiled by ... It is not currently part of the distributed Apertium binaries.

Lexical transfer in the pipeline

lsx-proc runs between apertium-tagger and apertium-pretransfer:

… | apertium-tagger -g eng.prob | lsx-proc english.bin | apertium-pretransfer | …
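From a script, adding the module means splicing one extra stage into the command list right after the tagger. The Python sketch below does this for a pipeline held as a list of argument vectors, and can then run the stages one after another. It is only an illustration: the helpers `insert_lsx_stage`, `to_shell` and `run_stages` are hypothetical, the elided stages of the pipeline are left out, and running it for real assumes the Apertium programs are on `PATH` and that `eng.prob` and `english.bin` exist.

```python
import subprocess
from typing import List

Stage = List[str]  # one command as an argv list, e.g. ["apertium-tagger", "-g", "eng.prob"]


def insert_lsx_stage(stages: List[Stage], lsx_bin: str) -> List[Stage]:
    """Return a copy of `stages` with `lsx-proc <lsx_bin>` placed directly after
    apertium-tagger (hypothetical helper; raises ValueError if there is no tagger)."""
    names = [stage[0] for stage in stages]
    pos = names.index("apertium-tagger")  # ValueError if the tagger stage is missing
    return stages[: pos + 1] + [["lsx-proc", lsx_bin]] + stages[pos + 1 :]


def to_shell(stages: List[Stage]) -> str:
    """Render the stages as a shell pipeline string (no quoting; for display only)."""
    return " | ".join(" ".join(stage) for stage in stages)


def run_stages(stages: List[Stage], text: str) -> str:
    """Feed `text` through each stage in turn, passing stdout on as the next stdin.
    check=True raises CalledProcessError if any stage exits with an error."""
    for stage in stages:
        text = subprocess.run(stage, input=text, capture_output=True,
                              text=True, check=True).stdout
    return text


if __name__ == "__main__":
    pipeline = [["apertium-tagger", "-g", "eng.prob"], ["apertium-pretransfer"]]
    print(to_shell(insert_lsx_stage(pipeline, "english.bin")))
    # apertium-tagger -g eng.prob | lsx-proc english.bin | apertium-pretransfer
```

`run_stages` waits for each program to finish before starting the next one, which is simpler than a real shell pipe but keeps the whole text in memory.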

Example

A sentence in plain text:

Thus, it was asserted that a tax on foreign workers would reduce the numbers coming in and “taking jobs away” from American citizens.

This is the output of feeding the sentence through apertium-tagger:

^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$

This is the output of feeding the output above through lsx-proc:

^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take# away<vblex><sep><ger>$ ^job<n><pl>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$
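The only change lsx-proc made is easy to miss in the long lines above. The Python sketch below parses both streams into lexical units and aligns them with the standard `difflib` module so that the changed units stand out. It assumes a simplified reading of the stream format: each `^...$` is one lexical unit, the tags are the `<...>` parts, the lemma is everything else (so the queue in `^come<vblex><ger># in$` stays in the lemma as `come# in`), and escaped characters are not handled. Only the relevant excerpt of each line is used; pasting in the full lines should give the same result.

```python
import difflib
import re
from typing import List, Tuple

LU = Tuple[str, List[str]]  # (lemma, tags)


def parse_stream(text: str) -> List[LU]:
    """Split an Apertium stream into (lemma, tags) pairs.
    Simplification: text between units (blanks, quotes) is ignored, and
    backslash-escaped ^ or $ characters are not supported."""
    units = []
    for body in re.findall(r"\^([^$]*)\$", text):
        tags = re.findall(r"<([^>]+)>", body)
        lemma = re.sub(r"<[^>]+>", "", body)  # keeps queues such as "# in" or "# away"
        units.append((lemma, tags))
    return units


def show(unit: LU) -> str:
    lemma, tags = unit
    return lemma + "".join(f"<{t}>" for t in tags)


def changed_units(before: str, after: str) -> Tuple[List[str], List[str]]:
    """Align two streams unit by unit and return (removed, added)."""
    a = [show(u) for u in parse_stream(before)]
    b = [show(u) for u in parse_stream(after)]
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            removed.extend(a[i1:i2])
            added.extend(b[j1:j2])
    return removed, added


tagger_out = "^and<cnjcoo>$ “^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$” ^from<pr>$"
lsx_out = "^and<cnjcoo>$ “^take# away<vblex><sep><ger>$ ^job<n><pl>$” ^from<pr>$"

removed, added = changed_units(tagger_out, lsx_out)
print("removed:", removed)  # removed: ['take<vblex><ger>', 'away<adv>']
print("added:  ", added)    # added:   ['take# away<vblex><sep><ger>']
```

In other words, the verb and its particle are merged into one unit tagged `<sep>`, the object `job` is kept, and the separate `away` unit is dropped.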

Usage

Make a dictionary file:

<dictionary type="separable">
    <alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
    <sdefs>
        <sdef n="adj"/>
        <sdef n="adv"/>
        <sdef n="n"/>
        <sdef n="sep"/>
        <sdef n="vblex"/>
    </sdefs>
    <pardefs>
        <pardef n="adj">
            <e><i><w/><s n="adj"/><j/></i></e>
            <e><i><w/><s n="adj"/><t/><j/></i></e>
        </pardef>
        <pardef n="n">
            <e><i><w/><s n="n"/><t/><j/></i></e>
        </pardef>
        <pardef n="SN">
            <e><par n="n"/></e>
            <e><par n="adj"/><par n="n"/></e>
            <e><par n="adj"/><par n="adj"/><par n="n"/></e>
        </pardef>
        <pardef n="freq-adv">
            <e><i>always<s n="adv"/><j/></i></e>
            <e><i>annually<s n="adv"/><j/></i></e>
            <e><i>biannually<s n="adv"/><j/></i></e>
        </pardef>
    </pardefs>
    <section id="main" type="standard">
        <e lm="be late" c="llegar tarde">
            <p><l>be<s n="vblex"/></l><r>be<g><b/>late</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="freq-adv"/><p><l>late<t/></l><r></r></p>
        </e>
        <e lm="take away" c="sacar, quitar">
            <p><l>take<s n="vblex"/></l><r>take<g><b/>away</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="SN"/><p><l>away<t/></l><r></r></p>
        </e>
    </section>
</dictionary>

Note:

- <w/> stands for one or more alphabetic symbols.
- <t/> stands for one or more tags (multicharacter symbols).
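To show how the wildcards and paradigms in the `take away` entry combine, here is a toy re-implementation of that single entry over already-parsed lexical units. It is a Python sketch under stated assumptions, not how lsx-proc works internally (lsx-comp compiles the dictionary and lsx-proc applies the compiled file): a `Slot` with `lemma=None` plays the role of `<w/>`, `min_extra=1` plays the role of a trailing `<t/>`, the `SN` list mirrors the `SN` paradigm, and all class and function names are hypothetical. Matching is greedy and tries the `SN` alternatives in dictionary order.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LU = Tuple[str, List[str]]  # (lemma, tags), e.g. ("take", ["vblex", "ger"])


@dataclass
class Slot:
    """Matches one lexical unit (hypothetical helper)."""
    lemma: Optional[str]  # None acts like <w/>: any lemma
    prefix: List[str]     # tags that must come first, e.g. ["vblex"]
    min_extra: int = 0    # 1 acts like a trailing <t/>: at least one more tag

    def matches(self, unit: LU) -> bool:
        lemma, tags = unit
        if self.lemma is not None and lemma != self.lemma:
            return False
        n = len(self.prefix)
        return tags[:n] == self.prefix and len(tags) - n >= self.min_extra


ADJ = Slot(None, ["adj"])      # both "adj" paradigm entries: adj plus zero or more tags
NOUN = Slot(None, ["n"], 1)    # <w/><s n="n"/><t/>
SN = [[NOUN], [ADJ, NOUN], [ADJ, ADJ, NOUN]]  # the SN paradigm

TAKE = Slot("take", ["vblex"], 1)  # take<s n="vblex"/> followed by <i><t/><j/></i>
AWAY = Slot("away", [], 1)         # away<t/>, rewritten to nothing (<r></r>)


def apply_take_away(units: List[LU]) -> List[LU]:
    """Rewrite take + SN + away into 'take# away' + SN, adding a <sep> tag."""
    out: List[LU] = []
    i = 0
    while i < len(units):
        if TAKE.matches(units[i]):
            for middle in SN:
                tail = i + 1 + len(middle)  # index where "away" must be
                if tail < len(units) and all(
                    s.matches(u) for s, u in zip(middle, units[i + 1 : tail])
                ) and AWAY.matches(units[tail]):
                    _, tags = units[i]
                    n = len(TAKE.prefix)
                    # the lemma absorbs the particle; <sep> follows the matched prefix tags
                    out.append(("take# away", tags[:n] + ["sep"] + tags[n:]))
                    out.extend(units[i + 1 : tail])  # the SN is copied unchanged
                    i = tail + 1                     # "away" is dropped
                    break
            else:  # no SN alternative matched: keep "take" as it is
                out.append(units[i])
                i += 1
        else:
            out.append(units[i])
            i += 1
    return out


sentence = [("and", ["cnjcoo"]), ("take", ["vblex", "ger"]),
            ("job", ["n", "pl"]), ("away", ["adv"]), ("from", ["pr"])]
print(apply_take_away(sentence))
# [('and', ['cnjcoo']), ('take# away', ['vblex', 'sep', 'ger']), ('job', ['n', 'pl']), ('from', ['pr'])]
```

The result agrees with the lsx-proc output shown in the example section; the `be late` entry would need the same treatment with the `freq-adv` paradigm in place of `SN`.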

Then compile it:

$ lsx-comp dictionary.xml english.bin
main@standard 61 73

The input to lsx-proc is the output of apertium-tagger, so the tagged sentence above can be processed with:

$ echo '^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$' | lsx-proc english.bin

Dictionary format

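A paradigm (`<pardef>`) is made up of one or more entries (`<e>`). In the example above, each paradigm entry contains either an identity block (`<i>`) or one or more references to other paradigms (`<par/>`).

A dictionary entry (`<e>` inside a `<section>`) has a lemma attribute (`lm`) and a comment attribute (`c`), which in the example holds a Spanish gloss. Its body is a sequence of:
- pairs (`<p>`), whose left side (`<l>`) is matched in the input and replaced by the right side (`<r>`) in the output; an empty `<r></r>` deletes the matched material;
- identity blocks (`<i>`), which are matched and copied unchanged;
- paradigm references (`<par/>`), which match any one of the entries of the named paradigm.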
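One way to check that a dictionary has this shape is to load it and list, for each paradigm, how many entries it has and which paradigms it refers to, and for each section entry, the sequence of blocks it is made of. A minimal Python sketch using the standard `xml.etree.ElementTree` module; the function name `summarise` is hypothetical, and the embedded XML is a trimmed copy of the example dictionary (the `be late` entry and the `freq-adv` paradigm are left out).

```python
import xml.etree.ElementTree as ET
from typing import List

DIX = """<dictionary type="separable">
  <pardefs>
    <pardef n="adj">
      <e><i><w/><s n="adj"/><j/></i></e>
      <e><i><w/><s n="adj"/><t/><j/></i></e>
    </pardef>
    <pardef n="n"><e><i><w/><s n="n"/><t/><j/></i></e></pardef>
    <pardef n="SN">
      <e><par n="n"/></e>
      <e><par n="adj"/><par n="n"/></e>
      <e><par n="adj"/><par n="adj"/><par n="n"/></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="take away" c="sacar, quitar">
      <p><l>take<s n="vblex"/></l><r>take<g><b/>away</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
      <par n="SN"/><p><l>away<t/></l><r></r></p>
    </e>
  </section>
</dictionary>"""


def summarise(root: ET.Element) -> List[str]:
    """Return one line per paradigm and per section entry (hypothetical helper)."""
    lines = []
    for pardef in root.iter("pardef"):
        refs = sorted({p.get("n") for p in pardef.iter("par")})
        lines.append(f"pardef {pardef.get('n')}: entries={len(pardef.findall('e'))} refs={refs}")
    for section in root.iter("section"):
        for entry in section.findall("e"):
            parts = [child.tag for child in entry]  # the p / i / par blocks, in order
            lines.append(f"{section.get('id')}: {entry.get('lm')!r} = {' + '.join(parts)}")
    return lines


for line in summarise(ET.fromstring(DIX)):
    print(line)
# pardef adj: entries=2 refs=[]
# pardef n: entries=1 refs=[]
# pardef SN: entries=3 refs=['adj', 'n']
# main: 'take away' = p + i + par + p
```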

Preparedness of languages

Language	entries
`apertium-eng`	18,563
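The page does not say how the entry count in this table was obtained. As one possible way to produce such a figure, the sketch below counts the `<e>` elements that sit directly inside each `<section>` of a dictionary file, which is the level at which the example dictionary defines its entries (paradigm entries are not counted). Whether this matches how the table was compiled is an assumption, and the function name `count_entries` is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def count_entries(root: ET.Element) -> int:
    """Count <e> elements that are direct children of a <section> (hypothetical helper)."""
    return sum(len(section.findall("e")) for section in root.iter("section"))


if __name__ == "__main__":
    # usage: python count_entries.py file1.xml [file2.xml ...]
    for path in sys.argv[1:]:
        print(f"{path}\t{count_entries(ET.parse(path).getroot()):,}")
```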

Todo and bugs

Decide whether the lsx module is part of monolingual modules, language pairs, either, or both.
Instead of dictionary.xml and english.bin and the like, we should have standardised naming conventions. Some options/proposals:
- eng-cat.autolsx.xml, eng-cat.autolsx.bin
- eng-cat.autosep.lsx, eng-cat.autosep.bin
- ...

Troubleshooting

References
