Difference between revisions of "Apertium-recursive"
Popcorndude (talk | contribs) (Created page with "Apertium-recursive is an alternative to apertium-transfer, apertium-interchunk, and apertium-postchunk. It uses a GLR parser rather than chunking and so can apply rules recurs...") |
Popcorndude (talk | contribs) (add links) |
Revision as of 20:20, 31 July 2019
Apertium-recursive is an alternative to apertium-transfer, apertium-interchunk, and apertium-postchunk. It uses a GLR parser rather than chunking and so can apply rules recursively. Rules can be written in a format almost identical to that of apertium-transfer or in a somewhat Yacc-like format created for this purpose.
Installing
Download from https://github.com/apertium/apertium-recursive
./autogen.sh
make
make install
Incorporating Into a Pair
The following instructions are for the Yacc-like syntax. To use XML, replace references to rtx-comp with trx-comp.
Makefile.am
Add $(PREFIX1).rtx.bin and $(PREFIX2).rtx.bin to TARGETS_COMMON.
$(PREFIX1).rtx.bin: $(BASENAME).$(PREFIX1).rtx
	rtx-comp $< $@

$(PREFIX2).rtx.bin: $(BASENAME).$(PREFIX2).rtx
	rtx-comp $< $@
modes.xml
Replace
<program name="apertium-transfer -b">
  <file name="apertium-eng-kir.eng-kir.t1x"/>
  <file name="eng-kir.t1x.bin"/>
</program>
<program name="apertium-interchunk">
  <file name="apertium-eng-kir.eng-kir.t2x"/>
  <file name="eng-kir.t2x.bin"/>
</program>
<program name="apertium-postchunk">
  <file name="apertium-eng-kir.eng-kir.t3x"/>
  <file name="eng-kir.t3x.bin"/>
</program>
with
<program name="rtx-proc">
  <file name="eng-kir.rtx.bin"/>
</program>
If the pair uses apertium-anaphora, use rtx-proc -a rather than rtx-proc.
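When a pair defines many modes, the replacement described above can be scripted. The sketch below is only an illustration and is not part of apertium-recursive: the helper names is_old_stage and use_rtx_proc and their parameters are hypothetical, and it assumes each pipeline lists the three transfer programs as direct <program> children, one after another.

```python
import xml.etree.ElementTree as ET

# Program names (ignoring flags such as -b) that rtx-proc takes over from.
OLD_STAGES = {"apertium-transfer", "apertium-interchunk", "apertium-postchunk"}


def is_old_stage(program):
    """True for <program> elements that run one of the chunking stages."""
    words = program.get("name", "").split()
    return program.tag == "program" and bool(words) and words[0] in OLD_STAGES


def use_rtx_proc(pipeline, rtx_bin, anaphora=False):
    """Swap the old stages in one <pipeline> element for a single rtx-proc.

    `rtx_bin` is the compiled rule file, e.g. "eng-kir.rtx.bin".
    Returns how many <program> elements were removed.
    """
    children = list(pipeline)
    old = [child for child in children if is_old_stage(child)]
    if not old:
        return 0
    # The new program goes where the first old stage used to be.
    position = children.index(old[0])
    for child in old:
        pipeline.remove(child)
    name = "rtx-proc -a" if anaphora else "rtx-proc"
    new = ET.Element("program", {"name": name})
    ET.SubElement(new, "file", {"name": rtx_bin})
    pipeline.insert(position, new)
    return len(old)
```

Passing anaphora=True writes rtx-proc -a instead, for pairs that use apertium-anaphora.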
configure.ac
AC_PATH_PROG([RTXCOMP], [rtx-comp], [false], [$PATH$PATH_SEPARATOR$with_rtx_comp/bin])
AS_IF([test x$RTXCOMP = xfalse], [AC_MSG_ERROR([You don't have rtx-comp installed])])
AC_PATH_PROG([RTXPROC], [rtx-proc], [false], [$PATH$PATH_SEPARATOR$with_rtx_proc/bin])
AS_IF([test x$RTXPROC = xfalse], [AC_MSG_ERROR([You don't have rtx-proc installed])])
Differences From .t*x
The following aspects of standard transfer files are unsupported or have potentially unpredictable results and should be avoided.
Chunks With the Same Tags as LUs
A number of pairs have rules that match punctuation and reset variables. Unfortunately, these rules often produce chunks with the same part of speech tag (e.g. match <sent> and output <sent>), which can lead to infinite recursion since the rule can match its own output.
trx-comp accepts rules like this, but they should be avoided.
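The looping behaviour can be pictured with a toy model. This is not how rtx-proc is implemented; it is a small sketch in which each rule maps the tag it matches to the tag of the chunk it outputs. The names rewrite_until_stable, rules and max_steps, and the tag "SENT", are invented for the illustration.

```python
def rewrite_until_stable(nodes, rules, max_steps=20):
    """Apply single-node rewrite rules until none of them matches.

    `nodes` is a list of tag strings and `rules` maps a matched tag to the
    tag of the output chunk. Returns (nodes, passes). Raises RuntimeError
    if some rule still matches after `max_steps` passes.
    """
    passes = 0
    while any(tag in rules for tag in nodes):
        if passes == max_steps:
            raise RuntimeError("rules still matching after %d passes" % max_steps)
        # Rewrite every matching node; leave the others untouched.
        nodes = [rules.get(tag, tag) for tag in nodes]
        passes += 1
    return nodes, passes


# A rule whose output carries a different tag settles after one pass.
print(rewrite_until_stable(["n", "sent"], {"sent": "SENT"}))

# A rule that outputs the tag it matches never settles.
try:
    rewrite_until_stable(["n", "sent"], {"sent": "sent"})
except RuntimeError as err:
    print("loop detected:", err)
```

The second call only stops because of the artificial pass limit, which is the situation the advice above is meant to prevent.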
Literal Chunks
In interchunk, new chunks are sometimes inserted like this:
<chunk>
  <lit v="det"/>
  <lit-tag v="DET.def"/>
  <lit v="{^the"/>
  <lit-tag v="det.def.mf.sp"/>
  <lit v="$}"/>
</chunk>
trx-comp will make some effort to deal with this, but results are not guaranteed and the curly braces may show up in the output. Instead write the above as:
<chunk name="det">
  <tags>
    <tag><lit-tag v="DET"/></tag>
    <tag><lit-tag v="def"/></tag>
  </tags>
  <lit v="the"/>
  <lit-tag v="det.def.mf.sp"/>
</chunk>
or even
<lu>
  <lit v="the"/>
  <lit-tag v="det.def.mf.sp"/>
</lu>
depending on what you're doing with it. (Note that trx-comp doesn't check the syntactic distinctions between .t1x and .t2x files.)
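Existing rule files can be scanned for literal chunks before they are handed to trx-comp. The following sketch is a hypothetical helper, not a tool shipped with apertium-recursive; it assumes the rule file is well-formed XML and simply reports every <chunk> that has a <lit> value containing a curly brace.

```python
import xml.etree.ElementTree as ET


def find_literal_chunks(rule_xml):
    """Return the <chunk> elements that spell out chunk braces in <lit>."""
    root = ET.fromstring(rule_xml)
    flagged = []
    for chunk in root.iter("chunk"):
        # Collect the v="..." value of every <lit> inside this chunk.
        values = [lit.get("v", "") for lit in chunk.iter("lit")]
        if any("{" in value or "}" in value for value in values):
            flagged.append(chunk)
    return flagged


literal = """<out><chunk>
  <lit v="det"/> <lit-tag v="DET.def"/>
  <lit v="{^the"/> <lit-tag v="det.def.mf.sp"/> <lit v="$}"/>
</chunk></out>"""
print(len(find_literal_chunks(literal)))  # prints 1
```

Each reported chunk is a candidate for rewriting in one of the two forms shown above.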
Chunks Not Containing Blanks
If the contents of a chunk are not alternating LUs/chunks and blanks, postchunk rules may not be able to handle them properly. So if there is a postchunk rule involved, always put
<chunk>
  <lu>...</lu>
  <b/>
  <lu>...</lu>
</chunk>
rather than
<chunk>
  <lu>...</lu>
  <lu>...</lu>
</chunk>
even if you don't intend to output the blank.
Similarly, if the order of elements inside an <out> isn't "LU/chunk blank LU/chunk ... blank LU/chunk", later rules may fail to match properly.
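The expected ordering can be checked mechanically. The sketch below is an illustration under stated assumptions rather than anything trx-comp does itself: the function name alternates_with_blanks is hypothetical, it treats <lu>, <mlu> and <chunk> as output units and <b> as a blank, and it skips other children such as <tags>.

```python
import xml.etree.ElementTree as ET

CONTENT_TAGS = {"lu", "mlu", "chunk"}  # elements treated as output units
BLANK_TAG = "b"                        # blank between two units


def alternates_with_blanks(parent):
    """Check for the order: unit, blank, unit, ..., blank, unit."""
    # Keep only units and blanks; <tags>, <lit>, etc. are skipped (assumption).
    kinds = [child.tag for child in parent
             if child.tag in CONTENT_TAGS or child.tag == BLANK_TAG]
    if len(kinds) % 2 == 0:
        return False  # empty, or does not both start and end with a unit
    # Even positions must hold units, odd positions must hold blanks.
    return all((kind == BLANK_TAG) == (i % 2 == 1)
               for i, kind in enumerate(kinds))


good = ET.fromstring("<chunk><tags/><lu/><b/><lu/></chunk>")
bad = ET.fromstring("<chunk><tags/><lu/><lu/></chunk>")
print(alternates_with_blanks(good), alternates_with_blanks(bad))  # True False
```

The same function can be applied to an <out> element as well as to a <chunk>, since it only looks at the direct children.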
See Also
- Apertium-recursive/Formalism: Rule-file format
- Apertium-recursive/Parser: Explanation of how the parser works
- Apertium-recursive/Example: Example of the parser in action
- Apertium-recursive/Bytecode: Documentation of binary file format
- User:Popcorndude/Recursive_Transfer: GSoC 2019 project proposal