Difference between revisions of "Apertium separable"

Revision as of 21:51, 8 August 2017

Lttoolbox provides a module for reordering separable (discontiguous) multiwords and processing them in the pipeline. Multiwords are written by hand in an additional XML-format dictionary.


Prerequisites and compilation are the same as for lttoolbox and apertium. See Installation.

The code can be found at ... and compiled by ... It is not currently part of distributed Apertium binaries.

Lexical transfer in the pipeline

lsx-proc runs between apertium-tagger and apertium-pretransfer:

… | apertium-tagger -g eng.prob | lsx-proc english.bin | apertium-pretransfer | …


Take the following sentence in plain text:

Thus, it was asserted that a tax on foreign workers would reduce the numbers coming in and “taking jobs away” from American citizens.

This is the output of feeding the sentence through apertium-tagger:

^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$

This is the output of feeding the output above through lsx-proc:

^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take# away<vblex><sep><ger>$ ^job<n><pl>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$
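To see what changed between the two outputs, it can help to split the stream into lexical units. The following is a minimal Python sketch, not part of Apertium: the helper name lexical_units is hypothetical, and it assumes that no escaped ^ or $ characters occur in the stream.

```python
import re

# One lexical unit in the stream format: everything between ^ and $.
# Assumption: the stream contains no escaped ^ or $ characters.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def lexical_units(stream):
    """Return the contents of every ^...$ lexical unit, in order."""
    return LEXICAL_UNIT.findall(stream)


# The quoted span from the two outputs above.
before = "“^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$”"
after = "“^take# away<vblex><sep><ger>$ ^job<n><pl>$”"

print(lexical_units(before))
print(lexical_units(after))
```

In the tagger output the span holds three units (take, job, away); after lsx-proc it holds two, with away joined onto take and the tag <sep> added.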


Make a dictionary file:

<dictionary type="separable">
    <sdefs>
        <sdef n="adj"/>
        <sdef n="adv"/>
        <sdef n="n"/>
        <sdef n="sep"/>
        <sdef n="vblex"/>
    </sdefs>
    <pardefs>
        <pardef n="adj">
            <e><i><w/><s n="adj"/><j/></i></e>
            <e><i><w/><s n="adj"/><t/><j/></i></e>
        </pardef>
        <pardef n="n">
            <e><i><w/><s n="n"/><t/><j/></i></e>
        </pardef>
        <pardef n="SN">
            <e><par n="n"/></e>
            <e><par n="adj"/><par n="n"/></e>
            <e><par n="adj"/><par n="adj"/><par n="n"/></e>
        </pardef>
        <pardef n="freq-adv">
            <e><i>always<s n="adv"/><j/></i></e>
            <e><i>annually<s n="adv"/><j/></i></e>
            <e><i>biannually<s n="adv"/><j/></i></e>
        </pardef>
    </pardefs>
    <section id="main" type="standard">
        <e lm="be late" c="llegar tarde">
            <p><l>be<s n="vblex"/></l><r>be<g><b/>late</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="freq-adv"/><p><l>late<t/></l><r></r></p>
        </e>
        <e lm="take away" c="sacar, quitar">
            <p><l>take<s n="vblex"/></l><r>take<g><b/>away</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="SN"/><p><l>away<t/></l><r></r></p>
        </e>
    </section>
</dictionary>


  • <w/> stands for one or more alphabetic symbols.
  • <t/> stands for one or more tags (multicharacter symbols).
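
As a rough illustration of these two wildcards, the sketch below approximates them with Python regular expressions and applies them to the contents of a single lexical unit. The regexes and the helper is_noun_unit are assumptions made for illustration; they are not how lsx-comp is implemented.

```python
import re

# Rough regex stand-ins for the two wildcards (assumptions, not lsx-comp code):
W = r"[^\W\d_]+"      # <w/>: one or more alphabetic symbols
T = r"(?:<[^<>]+>)+"  # <t/>: one or more tags, e.g. <pl> or <pri><p3><sg>

# Approximates <w/><s n="n"/><t/>: any lemma, the tag <n>, then further tags.
NOUN = re.compile(W + r"<n>" + T)


def is_noun_unit(unit):
    """True if the contents of one lexical unit fit the pattern above."""
    return NOUN.fullmatch(unit) is not None


print(is_noun_unit("job<n><pl>"))  # lemma, <n>, one further tag
print(is_noun_unit("away<adv>"))   # first tag is not <n>
print(is_noun_unit("job<n>"))      # nothing left for <t/> to match
```

Only the first call fits: <t/> needs at least one tag after <n>, which is why the adj paradigm in the dictionary lists one entry with <t/> and one without.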

Then compile it:

$ lsx-comp dictionary.xml english.bin
main@standard 61 73

The input to lsx-proc is the output of apertium-tagger:

$ echo '^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$' | lsx-proc english.bin
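
For readers without lsx-proc installed, the effect of the "take away" entry on this input can be imitated with a toy Python sketch. It is a simplification under stated assumptions: the function join_take_away is hypothetical, it uses a regular expression rather than a compiled transducer, and it handles only the simplest SN case (a single noun) between take and away.

```python
import re

TAGS = r"(?:<[^<>]+>)+"  # one or more tags, as <t/> is described

# take<vblex> + remaining tags, then one noun unit, then the away unit.
TAKE_AWAY = re.compile(
    r"\^take<vblex>(" + TAGS + r")\$"     # group 1: tags after <vblex>
    r"( \^[^\W\d_]+<n>" + TAGS + r"\$)"   # group 2: the noun unit in between
    r" \^away" + TAGS + r"\$"             # the particle, dropped from its slot
)


def join_take_away(stream):
    """Toy imitation of the 'take away' entry; not the lsx-proc algorithm."""
    return TAKE_AWAY.sub(r"^take# away<vblex><sep>\1$\2", stream)


span = "“^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$”"
print(join_take_away(span))
```

Run on the quoted span, the sketch moves away next to take and inserts <sep> after <vblex>, which is the same change seen in the lsx-proc output shown earlier on this page.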

Dictionary format

A paradigm is made up of:
A dictionary entry is made up of:

Preparedness of languages

Language        Entries
apertium-eng    18,563

Todo and bugs


See also
