Difference between revisions of "Apertium separable"

Revision as of 03:10, 29 August 2017

Installing

Prerequisites and compilation are the same as for lttoolbox and apertium; see Installation. On Debian/Ubuntu derivatives, the module is available from the nightly repository via apt-get install apertium-separable.

The code can be found at https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable. To compile and install the module:

./autogen.sh
./configure
make
make install

It is not currently part of distributed Apertium binaries for other distros/OSs.

Lexical transfer in the pipeline

lsx-proc runs directly after apertium-tagger and apertium-pretransfer. Note: this page previously said that lsx-proc runs between apertium-tagger and apertium-pretransfer; it has since been determined that it should run after apertium-pretransfer.

… | apertium-tagger -g en-es.prob | apertium-pretransfer | lsx-proc en-es.autoseq.bin | …

Usage

Creating the lsx-dictionary

Make a dictionary file:

<dictionary type="separable">
    <alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
    <sdefs>
        <sdef n="adj"/>
        <sdef n="adv"/>
        <sdef n="n"/>
        <sdef n="sep"/>
        <sdef n="vblex"/>
        <sdef n="vbser"/>
    </sdefs>
    <pardefs>
        <pardef n="adj">
            <e><i><w/><s n="adj"/><j/></i></e>
            <e><i><w/><s n="adj"/><t/><j/></i></e>
        </pardef>
        <pardef n="n">
            <e><i><w/><s n="n"/><t/><j/></i></e>
        </pardef>
        <pardef n="SN">
            <e><par n="n"/></e>
            <e><par n="adj"/><par n="n"/></e>
            <e><par n="adj"/><par n="adj"/><par n="n"/></e>
        </pardef>
        <pardef n="freq-adv">
            <e><i>always<s n="adv"/><j/></i></e>
            <e><i>annually<s n="adv"/><j/></i></e>
            <e><i>biannually<s n="adv"/><j/></i></e>
        </pardef>
    </pardefs>
    <section id="main" type="standard">
        <e lm="be late" c="llegar tarde">
            <p><l>be<s n="vbser"/></l><r>be<g><b/>late</g><s n="vbser"/><s n="sep"/></r></p><i><t/><j/></i>
            <!-- SAdv must be defined under <pardefs>; it is not defined in this excerpt -->
            <par n="SAdv"/><p><l>late<t/><j/></l><r></r></p>
        </e>
        <e lm="take away" c="sacar, quitar">
            <p><l>take<s n="vblex"/></l><r>take<g><b/>away</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
            <par n="SN"/><p><l>away<t/><j/></l><r></r></p>
        </e>
    </section>
</dictionary>

Note:

<w/> stands for one or more alphabetic symbols.
<t/> stands for one or more tags (multicharacter symbols).
<j/> stands for the lexical-unit boundary, shown as <$> below.

For example:

<e><i><w/><s n="adj"/><t/><j/></i></e> is equivalent to any-one-or-more-chars<adj><required-anytag><...optional-anytag...><$>
- ^tall<adj><sint><...>$
<e><i><w/><s n="adj"/><j/></i></e> is equivalent to any-one-or-more-chars<adj><$>
- ^tall<adj>$
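
The wildcard semantics in the note above can be approximated with regular expressions. The following is only a sketch for building intuition: the names W, T, ADJ_WITH_TAGS, ADJ_BARE and matches are hypothetical, and the regexes are an assumption about how the two adj paradigm entries behave on a single lexical unit, not the logic lsx-comp actually compiles.

```python
import re

# Rough regex analogues of the lsx wildcards (an approximation, not lsx-comp's logic).
W = r"[A-Za-z]+"      # <w/>: one or more alphabetic symbols
T = r"(?:<[^<>]+>)+"  # <t/>: one or more tags

# <w/><s n="adj"/><t/><j/>: an adjective followed by at least one more tag
ADJ_WITH_TAGS = re.compile(rf"\^{W}<adj>{T}\$")
# <w/><s n="adj"/><j/>: an adjective with no further tags
ADJ_BARE = re.compile(rf"\^{W}<adj>\$")


def matches(pattern, lexical_unit):
    """Return True if the whole lexical unit (including ^ and $) fits the pattern."""
    return pattern.fullmatch(lexical_unit) is not None


print(matches(ADJ_WITH_TAGS, "^tall<adj><sint>$"))  # True
print(matches(ADJ_BARE, "^tall<adj>$"))             # True
print(matches(ADJ_WITH_TAGS, "^tall<adj>$"))        # False: <t/> needs at least one tag
```

This is why the adj paradigm in the dictionary lists both entries: one covers adjectives with extra tags, the other covers a bare <adj>.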

A larger example dictionary can be found at https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable/examples/apertium-eng-spa.eng-spa.lsx

Compilation

Compilation into the binary format is achieved by means of the lsx-comp program.

$ lsx-comp apertium-eng-spa.eng-spa.lsx eng-spa.autoseq.bin
main@standard 61 73

Processing

Processing can be done using the lsx-proc program.

The input to lsx-proc is the output of apertium-tagger and apertium-pretransfer:

$ echo '^take<vblex><imp>$ ^prpers<prn><obj><p3><nt><sg>$ ^out of<pr>$ ^there<adv>$^.<sent>$' | lsx-proc eng-spa.autoseq.bin
^take# out<vblex><sep><imp>$ ^prpers<prn><obj><p3><nt><sg>$ ^of<pr>$ ^there<adv>$^.<sent>$
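
To check which lexical units lsx-proc has merged, the output can be scanned for the <sep> tag. The following is a minimal sketch, assuming simple unambiguous units of the form ^lemma<tag>...$ as in the example above; LU and lexical_units are hypothetical helper names, and the regex does not handle escaped characters, ambiguous analyses separated by /, or units joined with +.

```python
import re

# One unit: ^ lemma (no '<' or '$'), zero or more <tags>, then $.
LU = re.compile(r"\^([^<$]*)((?:<[^<>]+>)*)\$")


def lexical_units(stream):
    """Return (lemma, [tags]) for each simple ^...$ unit found in the stream."""
    return [
        (m.group(1), re.findall(r"<([^<>]+)>", m.group(2)))
        for m in LU.finditer(stream)
    ]


# Output of lsx-proc copied from the example above.
output = (
    "^take# out<vblex><sep><imp>$ ^prpers<prn><obj><p3><nt><sg>$ "
    "^of<pr>$ ^there<adv>$^.<sent>$"
)

separable = [lemma for lemma, tags in lexical_units(output) if "sep" in tags]
print(separable)  # ['take# out']
```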

Example usages

Example #1: a sentence in plain text:

The Aragonese took Ramiro out of a monastery and made him king.

This is the output of feeding the sentence through apertium-tagger:

^the<det><def><sp>$ ^Aragonese<n><sg>$ ^take<vblex><past>$ ^Ramiro<np><ant><m><sg>$ ^out of<pr>$ ^a<det><ind><sg>$ ^monastery<n><sg>$ ^and<cnjcoo>$ ^make<vblex><pp>$ ^prpers<prn><obj><p3><m><sg>$ ^king<n><sg>$^.<sent>$

This is the output of feeding the output above through lsx-proc with apertium-eng-spa.eng-spa.lsx:

^the<det><def><sp>$ ^Aragonese<n><sg>$ ^take# out<vblex><sep><past>$ ^Ramiro<np><ant><m><sg>$ ^of<pr>$ ^a<det><ind><sg>$ ^monastery<n><sg>$ ^and<cnjcoo>$ ^make<vblex><pp>$ ^prpers<prn><obj><p3><m><sg>$ ^king<n><sg>$^.<sent>$

Naming Convention

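Files are named after the language pair: the dictionary source is, for example, apertium-eng-cat.eng-cat.lsx, and the compiled binary is eng-cat.autoseq.bin.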
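
The pattern can be written down as a small helper. This is a sketch that generalises from the eng-cat and eng-spa file names on this page; the function name lsx_filenames is hypothetical, and the assumption that every pair follows the same pattern is not confirmed by the page.

```python
def lsx_filenames(src, trg):
    """Return (source dictionary name, compiled binary name) for a language pair."""
    pair = f"{src}-{trg}"
    return f"apertium-{pair}.{pair}.lsx", f"{pair}.autoseq.bin"


print(lsx_filenames("eng", "cat"))
# ('apertium-eng-cat.eng-cat.lsx', 'eng-cat.autoseq.bin')
```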

Troubleshooting

Segmentation fault

The lsx-dictionary compiles fine with zero entries but causes a segmentation fault once entries are added. The error was reported on what may have been a Linux machine; no solution has been found yet.

References

https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable
project proposal and workplan

@@ Line 122: / Line 122: @@
 ==Naming Convention==
 <code>apertium-eng-cat.eng-cat.lsx</code>, <code>eng-cat.autoseq.bin</code>
-==Resolved issues==
-* kaz-eng
-<pre>
-$ echo "хабар еткен" | apertium-destxt | apertium -f none -d . kaz-eng-tagger | ~/source/apertium/branches/apertium-separable/src/lsx-proc kaz-eng.autoseq.bin
- ^хабарет<v><tv>$ ^хабарет<v><tv><past>$^хабарет<v><tv><past><p3>$^хабарет<v><tv><past><p3><sg>$^.<sent>$[][
-</pre>
-* kaz-kir
-:35 firespeaker: http://svn.code.sf.net/p/apertium/svn/nursery/apertium-kaz-kir/apertium-kaz-kir.kaz-kir.lsx <br/>
-:35 firespeaker: with input ^абай<adj>$ ^бол<v><iv><imp><p2><sg>$
-*deu
-wolfgangth Hi, I tested the new module for reordering separable multiwords and I have a problem if one of the entries (the last) has more then one word <br/>
-wolfgangth before lsx-proc : ^heute Nachmittag<adv>$ wolfgangth after lsx-proc : ^heuteNachmittag<adv>$ <br/>
-wolfgangth the blank was lost if it was part of a rule that was executed <br/>
-* /p/apertium/svn/incubator/apertium-fao-nor/apertium-fao-nor.fao-nor.dix
-<pre>
-input:  ^snjúgva<vblex><ind><pres><p3><sg>$ ^seg<prn><ref><acc>$ ^um<pr>$   should output: snjúgva# seg<vblex><ind><pres><p3><sg>$ ^um<pr>$
-input: ^at<cnjsub>$ ^*leidningarnir$ ^halda<vblex><inf>$ ^fram<adv>$^,<cm>$ ^at<cnjsub>$, output: ^at<cnjsub>$ ^*leidningarnir$ ^halda# fram<vblex><adv>$^,<cm>$  ^at<cnjsub>$
-notice the extra space and the fact that you get <vblex><adv> not <vblex><inf>
-</pre>
-* +
-<pre>
-:35 firespeaker: $ echo "абай болмайсың ба" | apertium -d . kaz-kir-autoseq
-:35 firespeaker: ^абай<adj>$ ^бол<v><iv><neg><aor><p2><sg>+ма<qst>$^.<sent>$
-:35 firespeaker: oh, it's probably the +
-:35 firespeaker: seems to be okay with everything else
-:36 firespeaker: we'll need to ask spectie how we want to be dealing with this
-:38 irene_: what's the expected output?
-:39 begiak: apertium: jonorthwash * 81610: /nursery/apertium-kaz-kir/: Makefile.am, apertium-kaz-kir.kaz-kir.dix and 2 other files: kaz-kir-autoseq mode
-:39 irene_: of ^абай<adj>$ ^бол<v><iv><neg><aor><p2><sg>+ма<qst>$^.<sent>$
-:39 firespeaker: ^абай бол<v><iv><neg><aor><p2><sg>+ма<qst>$^.<sent>$  I guess
-</pre>
-* append <j/> with <t/>? => no
-* append <j/> with every </e> in lsx-comp, instead of writing the final <j/> in the dictionary => no, having lsx-comp append <j/> messes with paradigms
-* have the language-data writer write it explicitly in the .lsx file.
-* lsx-comp doesn't register loop for ANY_TAG when in pair, only when in identity  => fixed in matchTransduction()
-** blow# out of the water, be# oppose to
-** fao-nor
-<pre>
-        <e lm="snjúgva seg um" c="">
-           <p><l>snjúgva<s n="vblex"/></l><r>snjúgva<g><b/>seg</g><s n="vblex"/></r></p>
-           <i><t/><j/></i>
-           <p><l>seg<s n="prn"/><t/><j/>um<s n="pr"/></l><r>um<s n="pr"/></r></p>
-           <i><j/></i>
-        </e>
-$ lt-print fao-nob.autoseq.bin
-	1	s	s
-	2	n	n
-	3	j	j
-	4	ú	ú
-	5	g	g
-	6	v	v
-	7	a	a
-	8	<vblex>	#
-	9	ε
-	10	ε	s
-	11	ε	e
-	12	ε	g
-	13	ε	<vblex>
-	14	<ANY_TAG>	<ANY_TAG>
-	14	<ANY_TAG>	<ANY_TAG>
-	15	<$>	<$>
-	16	s	u
-	17	e	m
-	18	g	<pr>
-	19	<prn>	ε
-	20	<ANY_TAG>	ε
-	21	<$>	ε
-	22	u	ε
-	23	m	ε
-	24	<pr>	ε
-	25	<$>	<$>
-$ echo "^snjúgva<vblex><ind><pres><p3><sg>$ ^seg<prn><ref><acc>$ ^um<pr>$" | ~/source/apertium/branches/apertium-separable/src/lsx-proc fao-nob.autoseq.bin
-^snjúgva<vblex><ind><pres><p3><sg>$ ^seg<prn><ref><acc>$ ^um<pr>$
-        <e lm="halda fram, at" c="">
-           <p><l>halda<s n="vblex"/></l><r>halda<g><b/>fram</g><s n="vblex"/></r></p>
-           <i><t/><j/></i>
-           <p><l>fram<s n="adv"/><j/>,<s n="cm"/><j/>at<s n="cnjsub"/></l><r>,<s n="cm"/><j/>at<s n="cnjsub"/><j/></r></p>
-           <i><j/></i>
-        </e>
-$ echo "^at<cnjsub>$ ^*leidningarnir$ ^halda<vblex><inf>$ ^fram<adv>$^,<cm>$ ^at<cnjsub>$" | ~/source/apertium/branches/apertium-separable/src/lsx-proc fao-nob.autoseq.bin
-^at<cnjsub>$ ^*leidningarnir$ ^halda# fram<vblex><inf>$^,<cm>$ ^at<cnjsub>$ ^$
-</pre>
 ==Troubleshooting==

