Difference between revisions of "Apertium separable"
==See also==
* https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable
* [[Apertium system architecture]]
* project [[User:Irene/proposal | proposal]], [[User:Irene/workplan | workplan]], [[Lsx_module_report | report]]
[[Category:Documentation in English]]
Revision as of 03:12, 29 August 2017
Lttoolbox provides a module for reordering separable/discontiguous multiwords and processing them in the pipeline. Multiwords are written manually in an additional XML-format dictionary.
Installing
Prerequisites and compilation are the same as for lttoolbox and apertium. See Installation. On Debian/Ubuntu derivatives, the module is available from the nightly repo and can be installed with apt-get install apertium-separable.
The code can be found at https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable and instructions for compiling the module are:
./autogen.sh
./configure
make
make install
It is not currently part of distributed Apertium binaries for other distros/OSs.
Lexical transfer in the pipeline
lsx-proc runs directly after apertium-pretransfer, which in turn follows apertium-tagger. Note: this page previously said that lsx-proc runs between apertium-tagger and apertium-pretransfer; it has since been determined that it should run after apertium-pretransfer.
… | apertium-tagger -g en-es.prob | apertium-pretransfer | lsx-proc en-es.autoseq.bin | …
Usage
Creating the lsx-dictionary
Make a dictionary file:
<dictionary type="separable">
<alphabet>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</alphabet>
<sdefs>
<sdef n="adj"/>
<sdef n="adv"/>
<sdef n="n"/>
<sdef n="sep"/>
<sdef n="vblex"/>
<sdef n="vbser"/>
</sdefs>
<pardefs>
<pardef n="adj">
<e><i><w/><s n="adj"/><j/></i></e>
<e><i><w/><s n="adj"/><t/><j/></i></e>
</pardef>
<pardef n="n">
<e><i><w/><s n="n"/><t/><j/></i></e>
</pardef>
<pardef n="SN">
<e><par n="n"/></e>
<e><par n="adj"/><par n="n"/></e>
<e><par n="adj"/><par n="adj"/><par n="n"/></e>
</pardef>
<pardef n="freq-adv">
<e><i>always<s n="adv"/><j/></i></e>
<e><i>annually<s n="adv"/><j/></i></e>
<e><i>biannually<s n="adv"/><j/></i></e>
</pardef>
</pardefs>
<section id="main" type="standard">
<e lm="be late" c="llegar tarde">
<p><l>be<s n="vbser"/></l><r>be<g><b/>late</g><s n="vbser"/><s n="sep"/></r></p><i><t/><j/></i>
<par n="SAdv"/><p><l>late<t/><j/></l><r></r></p>
</e>
<e lm="take away" c="sacar, quitar">
<p><l>take<s n="vblex"/></l><r>take<g><b/>away</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
<par n="SN"/><p><l>away<t/><j/></l><r></r></p>
</e>
</section>
</dictionary>
Note:
* <w/> stands for one or more alphabetic symbols
* <t/> stands for one or more tags (multicharacter symbols)
For example:
* <e><w/><s n="adj"/><t/><j/></e> is equivalent to any-one-or-more-chars<adj><required-anytag><...optional-anytag...><$>, e.g. ^tall<adj><sint><...>$
* <e><w/><s n="adj"/><j/></e> is equivalent to any-one-or-more-chars<adj><$>, e.g. ^tall<adj>$
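The two entries of the adj paradigm in the dictionary above can be approximated with regular expressions. This is only an illustrative sketch: the names WORD, ANY_TAG, ADJ_BARE, ADJ_TAGGED and matches_adj are invented here, and the sketch assumes that the alphabetic symbols covered by <w/> are exactly the ASCII letters listed in <alphabet>. It is not how lsx-proc matches input internally.

```python
import re

# Assumption: <w/> covers the letters listed in the dictionary's <alphabet>.
WORD = r"[A-Za-z]+"
# One tag (multicharacter symbol) such as <sint>; <t/> is one or more of these.
ANY_TAG = r"<[^<>]+>"

# <w/><s n="adj"/><j/>      : word, <adj>, end of lexical unit
ADJ_BARE = re.compile(rf"\^{WORD}<adj>\$")
# <w/><s n="adj"/><t/><j/>  : word, <adj>, one or more further tags, end
ADJ_TAGGED = re.compile(rf"\^{WORD}<adj>(?:{ANY_TAG})+\$")


def matches_adj(unit: str) -> bool:
    """True if a single ^...$ lexical unit fits either adj entry."""
    return bool(ADJ_BARE.fullmatch(unit) or ADJ_TAGGED.fullmatch(unit))


print(matches_adj("^tall<adj>$"))        # True
print(matches_adj("^tall<adj><sint>$"))  # True
print(matches_adj("^tall<n><sg>$"))      # False
```

The entry without <t/> accepts only the bare ^tall<adj>$ form, which is why the adj paradigm lists both variants.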
A larger example dictionary can be found at https://svn.code.sf.net/p/apertium/svn/branches/apertium-separable/examples/apertium-eng-spa.eng-spa.lsx
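Before compiling, it can help to check that every symbol and paradigm used in the entries is actually declared. The sketch below assumes that lsx dictionaries follow the lttoolbox convention that each <s n="..."/> needs a matching <sdef> and each <par n="..."/> a matching <pardef>. The function undefined_names and the SAMPLE dictionary are made up for illustration and are not part of apertium-separable.

```python
import xml.etree.ElementTree as ET


def undefined_names(lsx_source: str) -> dict:
    """Report symbols and paradigms that are used but never declared."""
    root = ET.fromstring(lsx_source)
    defined_syms = {e.get("n") for e in root.iter("sdef")}
    defined_pars = {e.get("n") for e in root.iter("pardef")}
    used_syms = {e.get("n") for e in root.iter("s")}
    used_pars = {e.get("n") for e in root.iter("par")}
    return {
        "symbols": sorted(used_syms - defined_syms),
        "paradigms": sorted(used_pars - defined_pars),
    }


# Hypothetical cut-down dictionary: the SN paradigm is used but not declared.
SAMPLE = """<dictionary type="separable">
  <alphabet>abc</alphabet>
  <sdefs><sdef n="n"/><sdef n="vblex"/><sdef n="sep"/></sdefs>
  <pardefs>
    <pardef n="n"><e><i><w/><s n="n"/><t/><j/></i></e></pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="take away">
      <p><l>take<s n="vblex"/></l><r>take<g><b/>away</g><s n="vblex"/><s n="sep"/></r></p><i><t/><j/></i>
      <par n="SN"/><p><l>away<t/><j/></l><r></r></p>
    </e>
  </section>
</dictionary>"""

print(undefined_names(SAMPLE))
# {'symbols': [], 'paradigms': ['SN']}
```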
Compilation
Compilation into the binary format is achieved by means of the lsx-comp program.
$ lsx-comp apertium-eng-spa.eng-spa.lsx eng-spa.autoseq.bin
main@standard 61 73
Processing
Processing can be done using the lsx-proc program.
The input to lsx-proc is the output of apertium-tagger followed by apertium-pretransfer:
$ echo '^take<vblex><imp>$ ^prpers<prn><obj><p3><nt><sg>$ ^out of<pr>$ ^there<adv>$^.<sent>$' | lsx-proc eng-spa.autoseq.bin
^take# out<vblex><sep><imp>$ ^prpers<prn><obj><p3><nt><sg>$ ^of<pr>$ ^there<adv>$^.<sent>$
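To make the reordering concrete, the effect of the take away entry from the dictionary above can be imitated in plain Python. This is a toy sketch under stated assumptions, not the lsx-proc implementation (which works on the compiled binary dictionary): all function names are invented, the input sentence is a hypothetical example, and whitespace between lexical units is normalised to single spaces.

```python
import re

UNIT = re.compile(r"\^([^<$]+)((?:<[^<>]+>)*)\$")
WORD = re.compile(r"[A-Za-z]+")  # assumption: <w/> means ASCII letters only


def parse(stream):
    """Return a list of (lemma, [tags]) for each ^...$ lexical unit."""
    return [(m.group(1), re.findall(r"<([^<>]+)>", m.group(2)))
            for m in UNIT.finditer(stream)]


def render(units):
    return " ".join("^" + lemma + "".join(f"<{t}>" for t in tags) + "$"
                    for lemma, tags in units)


def is_noun(unit):  # pardef "n": <w/><s n="n"/><t/><j/>
    lemma, tags = unit
    return bool(WORD.fullmatch(lemma)) and len(tags) >= 2 and tags[0] == "n"


def is_adj(unit):   # pardef "adj": <w/><s n="adj"/>, optionally more tags
    lemma, tags = unit
    return bool(WORD.fullmatch(lemma)) and len(tags) >= 1 and tags[0] == "adj"


def sn_length(units, start):
    """Length of an SN (n | adj n | adj adj n) starting at start, or 0."""
    for n_adj in (0, 1, 2):
        end = start + n_adj
        if (end < len(units)
                and all(is_adj(u) for u in units[start:end])
                and is_noun(units[end])):
            return n_adj + 1
    return 0


def apply_take_away(stream):
    """Rewrite 'take<vblex>... SN away...' as 'take# away<vblex><sep>... SN'."""
    units = parse(stream)
    out, i = [], 0
    while i < len(units):
        lemma, tags = units[i]
        if lemma == "take" and len(tags) >= 2 and tags[0] == "vblex":
            n = sn_length(units, i + 1)
            j = i + 1 + n
            if n and j < len(units) and units[j][0] == "away" and units[j][1]:
                # <g><b/>away</g> shows up as "# away" in the output lemma.
                out.append(("take# away", ["vblex", "sep"] + tags[1:]))
                out.extend(units[i + 1:j])  # the SN is kept unchanged
                i = j + 1                   # the separate "away" unit is dropped
                continue
        out.append(units[i])
        i += 1
    return render(out)


print(apply_take_away("^take<vblex><past>$ ^book<n><sg>$ ^away<adv>$"))
# ^take# away<vblex><sep><past>$ ^book<n><sg>$
```

The particle moves next to the verb and the verb gains the sep tag, mirroring the take# out<vblex><sep><imp> unit in the lsx-proc output shown above.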
Example usages
Example #1: A sentence in plain text:
The Aragonese took Ramiro out of a monastery and made him king.
This is the output of feeding the sentence through apertium-tagger:
^the<det><def><sp>$ ^Aragonese<n><sg>$ ^take<vblex><past>$ ^Ramiro<np><ant><m><sg>$ ^out of<pr>$ ^a<det><ind><sg>$ ^monastery<n><sg>$ ^and<cnjcoo>$ ^make<vblex><pp>$ ^prpers<prn><obj><p3><m><sg>$ ^king<n><sg>$^.<sent>$
This is the output of feeding the output above through lsx-proc with apertium-eng-spa.eng-spa.lsx:
^the<det><def><sp>$ ^Aragonese<n><sg>$ ^take# out<vblex><sep><past>$ ^Ramiro<np><ant><m><sg>$ ^of<pr>$ ^a<det><ind><sg>$ ^monastery<n><sg>$ ^and<cnjcoo>$ ^make<vblex><pp>$ ^prpers<prn><obj><p3><m><sg>$ ^king<n><sg>$^.<sent>$
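When inspecting lsx-proc output, the reordered multiwords can be picked out by looking for lexical units that carry the sep tag. The following is a small sketch for that purpose; the helper name separable_units and the regular expression are assumptions made for illustration and are not part of the Apertium tools.

```python
import re

# One lexical unit: ^lemma<tag1><tag2>...$
UNIT = re.compile(r"\^([^<$]+)((?:<[^<>]+>)*)\$")


def separable_units(stream):
    """Return the lemmas of all lexical units tagged <sep>."""
    found = []
    for m in UNIT.finditer(stream):
        tags = re.findall(r"<([^<>]+)>", m.group(2))
        if "sep" in tags:
            found.append(m.group(1))
    return found


OUTPUT = ("^the<det><def><sp>$ ^Aragonese<n><sg>$ ^take# out<vblex><sep><past>$ "
          "^Ramiro<np><ant><m><sg>$ ^of<pr>$ ^a<det><ind><sg>$ ^monastery<n><sg>$ "
          "^and<cnjcoo>$ ^make<vblex><pp>$ ^prpers<prn><obj><p3><m><sg>$ "
          "^king<n><sg>$^.<sent>$")

print(separable_units(OUTPUT))
# ['take# out']
```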
Naming Convention
The source dictionary is named apertium-eng-cat.eng-cat.lsx and the compiled binary is named eng-cat.autoseq.bin.
Troubleshooting
Segmentation fault
The lsx-dictionary compiles fine with zero entries but causes a segmentation fault once entries are added.
The error appears to occur on Linux machines (unconfirmed). No solution has been found yet.