Apertium-recursive

From Apertium

Revision as of 20:20, 31 July 2019

Apertium-recursive is an alternative to apertium-transfer, apertium-interchunk, and apertium-postchunk. It uses a GLR parser rather than chunking and so can apply rules recursively. Rules can be written in a format almost identical to that of apertium-transfer or in a somewhat Yacc-like format created for this purpose.

Installing

Download the source from https://github.com/apertium/apertium-recursive and then build and install it:

./autogen.sh
make
make install

Incorporating Into a Pair

The following instructions are for the Yacc-like syntax. To use XML, replace references to rtx-comp with trx-comp.

Makefile.am

Add $(PREFIX1).rtx.bin and $(PREFIX2).rtx.bin to TARGETS_COMMON, then add a rule to build each of them:

$(PREFIX1).rtx.bin: $(BASENAME).$(PREFIX1).rtx
	rtx-comp $< $@

$(PREFIX2).rtx.bin: $(BASENAME).$(PREFIX2).rtx
	rtx-comp $< $@

modes.xml

Replace

     <program name="apertium-transfer -b">
       <file name="apertium-eng-kir.eng-kir.t1x"/>
       <file name="eng-kir.t1x.bin"/>
     </program>
     <program name="apertium-interchunk">
       <file name="apertium-eng-kir.eng-kir.t2x"/>
       <file name="eng-kir.t2x.bin"/>
     </program>
     <program name="apertium-postchunk">
       <file name="apertium-eng-kir.eng-kir.t3x"/>
       <file name="eng-kir.t3x.bin"/>
     </program>

with

     <program name="rtx-proc">
       <file name="eng-kir.rtx.bin"/>
     </program>

If the pair uses apertium-anaphora, use rtx-proc -a rather than rtx-proc.
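
The replacement described above can also be scripted. The following is a minimal Python sketch, not part of apertium-recursive: the helper name use_rtx_proc and the sample pipeline (including its lt-proc stage) are made up for illustration, and it assumes the three chunking stages are direct children of a single pipeline element.

```python
import xml.etree.ElementTree as ET

# Stages that rtx-proc replaces, compared on the command name only
# (so "apertium-transfer -b" counts as "apertium-transfer").
OLD_PROGRAMS = {"apertium-transfer", "apertium-interchunk", "apertium-postchunk"}

# Illustrative pipeline; the lt-proc stage is a made-up neighbour.
SAMPLE = """<pipeline>
  <program name="lt-proc -b"><file name="eng-kir.autobil.bin"/></program>
  <program name="apertium-transfer -b">
    <file name="apertium-eng-kir.eng-kir.t1x"/>
    <file name="eng-kir.t1x.bin"/>
  </program>
  <program name="apertium-interchunk">
    <file name="apertium-eng-kir.eng-kir.t2x"/>
    <file name="eng-kir.t2x.bin"/>
  </program>
  <program name="apertium-postchunk">
    <file name="apertium-eng-kir.eng-kir.t3x"/>
    <file name="eng-kir.t3x.bin"/>
  </program>
</pipeline>"""


def command(program):
    """Return the command name of a <program> element, without its flags."""
    parts = program.get("name", "").split()
    return parts[0] if parts else ""


def use_rtx_proc(pipeline, bin_file, anaphora=False):
    """Swap the chunking stages of a <pipeline> element for one rtx-proc stage."""
    old = [p for p in pipeline if command(p) in OLD_PROGRAMS]
    if not old:
        return pipeline  # nothing to replace
    position = list(pipeline).index(old[0])
    for p in old:
        pipeline.remove(p)
    # Pairs that use apertium-anaphora need "rtx-proc -a".
    name = "rtx-proc -a" if anaphora else "rtx-proc"
    new = ET.Element("program", {"name": name})
    ET.SubElement(new, "file", {"name": bin_file})
    pipeline.insert(position, new)
    return pipeline
```

Calling use_rtx_proc(ET.fromstring(SAMPLE), "eng-kir.rtx.bin") leaves the lt-proc stage in place and puts a single rtx-proc stage where apertium-transfer used to be.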

configure.ac

AC_PATH_PROG([RTXCOMP], [rtx-comp], [false], [$PATH$PATH_SEPARATOR$with_rtx_comp/bin])
AS_IF([test x$RTXCOMP = xfalse], [AC_MSG_ERROR([You don't have rtx-comp installed])])

AC_PATH_PROG([RTXPROC], [rtx-proc], [false], [$PATH$PATH_SEPARATOR$with_rtx_proc/bin])
AS_IF([test x$RTXPROC = xfalse], [AC_MSG_ERROR([You don't have rtx-proc installed])])

Differences From .t*x

The following aspects of standard transfer files are unsupported or have potentially unpredictable results and should be avoided.

Chunks With the Same Tags as LUs

A number of pairs have rules that match punctuation and reset variables. Unfortunately, these rules often output a chunk with the same part-of-speech tag as the LU they matched (e.g. match <sent> and output <sent>). Because rules are applied recursively, such a rule can match its own output, which can lead to infinite recursion.

With trx-comp this is possible but should be avoided.
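
The problem can be seen in a toy model. The Python sketch below is not how rtx-proc is implemented: it reduces each rule to "matches tag X, outputs a chunk tagged Y" and counts how often rules fire on a single node. The function name, the step limit, and the punct tag are all hypothetical.

```python
def count_applications(rules, tag, limit=20):
    """Apply rules to one node for as long as some rule matches its tag.

    `rules` maps the tag a rule matches to the tag of the chunk it outputs.
    Returns the number of rule applications, or None if `limit` applications
    were made and a rule still matches (i.e. the rules would keep firing).
    """
    steps = 0
    while tag in rules:
        if steps == limit:
            return None  # gave up: a rule keeps matching its own output
        tag = rules[tag]
        steps += 1
    return steps


# A rule that outputs the tag it matches never stops firing.
looping = count_applications({"sent": "sent"}, "sent")

# Giving the output chunk a different tag lets the process finish.
finite = count_applications({"sent": "punct"}, "sent")
```

Here looping is None because the rule is still applicable after every step, while finite is 1 because nothing matches the retagged chunk.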

Literal Chunks

In interchunk, new chunks are sometimes inserted like this:

<chunk>
 <lit v="det"/>
 <lit-tag v="DET.def"/>
 <lit v="{^the"/>
 <lit-tag v="det.def.mf.sp"/>
 <lit v="$}"/>
</chunk>

trx-comp will make some effort to deal with this, but results are not guaranteed and the curly braces may show up in the output. Instead, write the above as:

<chunk name="det">
 <tags>
  <tag><lit-tag v="DET"/></tag>
  <tag><lit-tag v="def"/></tag>
 </tags>
 <lit v="the"/>
 <lit-tag v="det.def.mf.sp"/>
</chunk>

or even

<lu>
 <lit v="the"/>
 <lit-tag v="det.def.mf.sp"/>
</lu>

depending on what you're doing with it. (Note that trx-comp doesn't check the syntactic distinctions between .t1x and .t2x files.)

Chunks Not Containing Blanks

If the contents of a chunk are not alternating LUs/chunks and blanks, postchunk rules may not be able to handle them properly. So if there is a postchunk rule involved, always put

<chunk> <lu>...</lu> <b/> <lu>...</lu> </chunk>

rather than

<chunk> <lu>...</lu> <lu>...</lu> </chunk>

even if you don't intend to output the blank.

Similarly, if the order of elements inside an <out> isn't "LU/chunk blank LU/chunk ... blank LU/chunk", later rules may fail to match properly.
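
The expected ordering can be checked mechanically. The Python sketch below is a simplified illustration and not part of trx-comp: it looks only at a list of element names (lu, chunk, b), ignores other elements such as <tags>, and the function name alternates is made up.

```python
# Element names that count as content; "b" is the blank element.
CONTENT = {"lu", "chunk"}


def alternates(kinds):
    """Return True if `kinds` reads LU/chunk, blank, LU/chunk, ..., LU/chunk.

    `kinds` is a list of element names in document order. An empty list,
    or one that starts or ends with a blank, is rejected.
    """
    if len(kinds) % 2 == 0:
        return False  # must start and end with an LU or chunk
    for i, kind in enumerate(kinds):
        if i % 2 == 0 and kind not in CONTENT:
            return False  # even positions hold LUs or chunks
        if i % 2 == 1 and kind != "b":
            return False  # odd positions hold blanks
    return True


good = alternates(["lu", "b", "chunk", "b", "lu"])
bad = alternates(["lu", "lu"])
```

With these inputs good is True and bad is False, since the second list has two LUs with no blank between them.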

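See Also

Apertium-recursive/Formalism: Rule-file format
Apertium-recursive/Parser: Explanation of how the parser works
Apertium-recursive/Example: Example of the parser in action
Apertium-recursive/Bytecode: Documentation of binary file format
User:Popcorndude/Recursive_Transfer: GSoC 2019 project proposal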