Difference between revisions of "Apertium separable"

Revision as of 18:28, 12 August 2017

Installing

Prerequisites and compilation are the same as for lttoolbox and apertium; see Installation. The code is available at https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable. To compile the module, run:

./autogen.sh
./configure
make

The module is not currently part of the distributed Apertium binaries.

Lexical transfer in the pipeline

lsx-proc runs between apertium-tagger and apertium-pretransfer:

… | apertium-tagger -g eng.prob | lsx-proc english.bin | apertium-pretransfer | …

Example

A sentence in plain text:

Thus, it was asserted that a tax on foreign workers would reduce the numbers coming in and “taking jobs away” from American citizens.

This is the output of feeding the sentence through apertium-tagger:

^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$

This is the result of feeding the output above through lsx-proc:

^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take# away<vblex><sep><ger>$ ^job<n><pl>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$
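The change between the two outputs above can be illustrated with a small Python sketch. This is not how lsx-proc is implemented (it runs a compiled transducer); it is only a rough model of the visible effect, under several assumptions: the helper names `parse`, `is_noun_phrase`, `join_separable` and `render` are hypothetical, escaped characters and superblanks are ignored, invariant parts after the tags (such as `# in`) are dropped, and units are re-joined with single spaces.

```python
import re

# A lexical unit in the Apertium stream format is delimited by ^ and $.
LU = re.compile(r"\^(.*?)\$")


def parse(stream):
    """Return (lemma, tags) pairs for every ^...$ lexical unit."""
    units = []
    for body in LU.findall(stream):
        lemma, _, rest = body.partition("<")
        units.append((lemma, re.findall(r"<([^<>]+)>", "<" + rest)))
    return units


def is_noun_phrase(units):
    """Rough model of a noun phrase: up to two adjectives, then a noun."""
    if not 1 <= len(units) <= 3:
        return False
    *adjectives, noun = units
    return all(t[:1] == ["adj"] for _, t in adjectives) and noun[1][:1] == ["n"]


def join_separable(units, verb, particle):
    """Merge `verb <noun phrase> particle` into `verb# particle<vblex><sep>...`."""
    for i, (lemma, tags) in enumerate(units):
        if lemma != verb or tags[:1] != ["vblex"]:
            continue
        # The particle may follow a noun phrase of one to three units.
        for j in range(i + 2, min(i + 5, len(units))):
            if units[j][0] == particle and is_noun_phrase(units[i + 1:j]):
                merged = (f"{verb}# {particle}", ["vblex", "sep"] + tags[1:])
                return units[:i] + [merged] + units[i + 1:j] + units[j + 1:]
    return units  # nothing matched: leave the stream unchanged


def render(units):
    """Serialise (lemma, tags) pairs back into stream format."""
    return " ".join(
        "^" + lemma + "".join(f"<{t}>" for t in tags) + "$" for lemma, tags in units
    )


tagged = "^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$"
print(render(join_separable(parse(tagged), "take", "away")))
# prints: ^take# away<vblex><sep><ger>$ ^job<n><pl>$
```

The verb keeps its remaining tags (`<ger>` here), gains `<sep>`, and the particle unit disappears from its original position, which is the same pattern seen in the lsx-proc output above.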

Compilation and Usage

Make a dictionary file:

<dictionary type="separable">
    <alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
    <sdefs>
        <sdef n="adj"/>
        <sdef n="adv"/>
        <sdef n="n"/>
        <sdef n="sep"/>
        <sdef n="vblex"/>
    </sdefs>
    <pardefs>
        <pardef n="adj">
            <e><i><w/><s n="adj"/><j/></i></e>
            <e><i><w/><s n="adj"/><t/><j/></i></e>
        </pardef>
        <pardef n="n">
            <e><i><w/><s n="n"/><t/><j/></i></e>
        </pardef>
        <pardef n="SN">
            <e><par n="n"/></e>
            <e><par n="adj"/><par n="n"/></e>
            <e><par n="adj"/><par n="adj"/><par n="n"/></e>
        </pardef>
        <pardef n="freq-adv">
            <e><i>always<s n="adv"/><j/></i></e>
            <e><i>annually<s n="adv"/><j/></i></e>
            <e><i>biannually<s n="adv"/><j/></i></e>
        </pardef>
    </pardefs>
    <section id="main" type="standard">
        <e lm="be late" c="llegar tarde">
            <p><l>be<s n="vblex"/></l><r>be<g><b/>late</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="freq-adv"/><p><l>late<t/></l><r></r></p>
        </e>
        <e lm="take away" c="sacar, quitar">
            <p><l>take<s n="vblex"/></l><r>take<g><b/>away</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="SN"/><p><l>away<t/></l><r></r></p>
        </e>
    </section>
</dictionary>

Note:

<w/> stands for one or more alphabetic symbols.
<t/> stands for one or more tags (multicharacter symbols).

For example:

<e><w/><s n="adj"/><t/><j/></e> is equivalent to any-one-or-more-chars<adj><required-anytag><optional-anytag><...>
<e><w/><s n="adj"/><j/></e> is equivalent to any-one-or-more-chars<adj><optional-anytag><...>
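As a rough intuition for `<w/>` and `<t/>`, they can be approximated with regular expressions. The sketch below is an illustration only, not the way lsx-comp compiles these elements: the function names are hypothetical, the character class simply copies the `<alphabet>` of the example dictionary, and `<j/>` is deliberately not modelled.

```python
import re

W = r"[A-Za-z]+"       # <w/>: one or more symbols from the example <alphabet>
T = r"(?:<[^<>]+>)+"   # <t/>: one or more tags of the form <name>


def is_word(text):
    """True if text could be consumed by a single <w/>."""
    return re.fullmatch(W, text) is not None


def is_tag_sequence(text):
    """True if text could be consumed by a single <t/>."""
    return re.fullmatch(T, text) is not None


def adj_then_tags(unit):
    """Approximate <w/><s n="adj"/><t/>: a word, <adj>, then at least one more tag."""
    return re.fullmatch(W + r"<adj>" + T, unit) is not None


print(is_word("foreign"))                   # True
print(is_tag_sequence("<n><pl>"))           # True
print(adj_then_tags("foreign<adj><sint>"))  # True
print(adj_then_tags("foreign<adj>"))        # False: <t/> needs at least one tag
```

Because `<w/>` is limited to the declared alphabet in this approximation, a lemma containing a blank or a non-ASCII letter would not be matched unless the alphabet is extended.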

Then compile it:

$ lsx-comp dictionary.xml english.bin
main@standard 61 73

The input to lsx-proc is the output of apertium-tagger:

$ echo '^thus<adv>$^,<cm>$ ^prpers<prn><subj><p3><nt><sg>$ ^be<vbser><past><p3><sg>$ ^assert<vblex><pp>$ ^that<prn><tn><mf><sg>$ ^a<det><ind><sg>$ ^tax<n><sg>$ ^on<pr>$ ^foreign<adj>$ ^worker<n><pl>$ ^would<vaux><inf>$ ^reduce<vblex><inf>$ ^the<det><def><sp>$ ^number<vblex><pri><p3><sg>$ ^come<vblex><ger># in$ ^and<cnjcoo>$ “^take<vblex><ger>$ ^job<n><pl>$ ^away<adv>$” ^from<pr>$ ^american<adj>$ ^citizen<n><pl>$^.<sent>$^.<sent>$' | lsx-proc english.bin

A larger example dictionary can be found at https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable/examples/new-example.dix

Dictionary format

A paradigm is made up of:

A dictionary entry is made up of:

Preparedness of languages

Languages that are beta-testing the module:

eng

Todo and bugs

Decide whether the lsx module is part of monolingual modules, language pairs, either, or both.
Instead of dictionary.xml and english.bin and the like, we should have standardised naming conventions. Some options/proposals:
- eng-cat.autolsx.xml, eng-cat.autolsx.bin
- eng-cat.autosep.lsx, eng-cat.autosep.bin
- apertium-eng-cat.eng-cat.lsx, eng-cat.autoseq.bin

kaz-eng

$ echo "хабар еткен" | apertium-destxt | apertium -f none -d . kaz-eng-tagger | ~/source/apertium/branches/apertium-separable/src/lsx-proc kaz-eng.autoseq.bin 
 ^хабарет<v><tv>$ ^хабарет<v><tv><past>$^хабарет<v><tv><past><p3>$^хабарет<v><tv><past><p3><sg>$^.<sent>$[][


* /p/apertium/svn/incubator/apertium-fao-nor/apertium-fao-nor.fao-nor.dix
** input: ^snjúgva<vblex><ind><pres><p3><sg>$ ^seg<prn><ref><acc>$ ^um<pr>$, should output: ^snjúgva# seg<vblex><ind><pres><p3><sg>$ ^um<pr>$
** input: ^at<cnjsub>$ ^*leidningarnir$ ^halda<vblex><inf>$ ^fram<adv>$^,<cm>$ ^at<cnjsub>$, output: ^at<cnjsub>$ ^*leidningarnir$ ^halda# fram<vblex><adv>$^,<cm>$  ^at<cnjsub>$
*** notice the extra space and the fact that you get <vblex><adv> not <vblex><inf>

* blow# out of the water

* 
wolfgangth Hi, I tested the new module for reordering separable multiwords and I have a problem if one of the entries (the last) has more than one word
wolfgangth before lsx-proc : ^heute Nachmittag<adv>$
wolfgangth after lsx-proc : ^heuteNachmittag<adv>$
wolfgangth the blank was lost if it was part of a rule that was executed

Troubleshooting

References

https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable
project proposal and workplan

