Matxin

Matxin is a free software machine translation engine related to Apertium. It allows for deeper transfer than Apertium does. The linguistic data available under a free licence is only a fraction of the data used in the papers and descriptions on the subject, so translations from the pair will naturally be worse than the results reported in the papers.

Prerequisites

  • lttoolbox (version 2.0; yes, this is ancient)

If you're installing into a prefix, you'll need to set two environment variables, CPPFLAGS and LDFLAGS, when running configure:

$ CPPFLAGS=-I<prefix>/include LDFLAGS=-L<prefix>/lib ./configure --prefix=<prefix>
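
When installing into a prefix, it can help to keep the flag settings in one place. The following is a minimal sketch: the function name matxin_prefix_env is hypothetical, and the PKG_CONFIG_PATH line is an assumption (it presumes lttoolbox installed its .pc file under <prefix>/lib/pkgconfig), not something this page states.

```shell
#!/bin/sh
# Hypothetical helper: print the environment settings for a prefix install.
matxin_prefix_env() {
    prefix="$1"
    # The two variables named on this page:
    printf 'CPPFLAGS=-I%s/include\n' "$prefix"
    printf 'LDFLAGS=-L%s/lib\n' "$prefix"
    # Assumption: lttoolbox-2.0.pc lives here, so pkg-config can find it.
    printf 'PKG_CONFIG_PATH=%s/lib/pkgconfig\n' "$prefix"
}

# Example prefix; substitute your own.
matxin_prefix_env /opt/matxin
```

For a prefix without spaces, the output can be passed straight to env, for example: env $(matxin_prefix_env /opt/matxin) ./configure --prefix=/opt/matxin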


Building

Checkout
$ svn co http://matxin.svn.sourceforge.net/svnroot/matxin

First, comment out the deformatters and reFormat in src/Makefile.am, as they don't build properly.

bin_PROGRAMS = Analyzer LT ST_intra ST_inter ST_prep ST_verb SG_inter SG_intra \
               MG 

#reFormat txt-deformat html-deformat rtf-deformat



#
#reFormat_SOURCES = reFormat.C data_manager.C XML_reader.C simpleregex.C
#reFormat_LDADD = -lpcre -lxml2
#
#txt_deformat_SOURCES = txt-deformat.C
#html_deformat_SOURCES = html-deformat.C
#rtf_deformat_SOURCES = rtf-deformat.C
#
#txt-deformat.C: txt-format.xml Makefile.am deformat.xsl
#	$(XSLTPROC) --stringparam mode matxin deformat.xsl txt-format.xml | $(FLEX) -Cfer -t >$@
#
#html-deformat.C: html-format.xml Makefile.am deformat.xsl
#	$(XSLTPROC) --stringparam mode matxin deformat.xsl html-format.xml | $(FLEX) -Cfer -t >$@
#
#rtf-deformat.C: rtf-format.xml Makefile.am deformat.xsl
#	$(XSLTPROC) --stringparam mode matxin deformat.xsl rtf-format.xml | $(FLEX) -Cfer -t >$@
#


Then, in configure.ac, replace

AC_CHECK_LIB(lttoolbox-2.0, main, [], [AC_MSG_ERROR([library 'lttoolbox' is required for Matxin])])

with

PKG_CHECK_MODULES(MATXIN, [lttoolbox-2.0 >= 2.0.0],,exit)

CFLAGS="$CFLAGS $MATXIN_CFLAGS"
CPPFLAGS="$CPPFLAGS $MATXIN_CFLAGS"
LIBS="$LIBS $MATXIN_LIBS"
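
The PKG_CHECK_MODULES line makes configure ask pkg-config for the lttoolbox-2.0 module, so the same query can be tried by hand before running configure. A minimal sketch: the function name check_lttoolbox is hypothetical, and the hint about PKG_CONFIG_PATH assumes the .pc file sits under <prefix>/lib/pkgconfig.

```shell
#!/bin/sh
# Hypothetical helper: report whether pkg-config can see lttoolbox-2.0 >= 2.0.0.
check_lttoolbox() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config not found"
        return 2
    fi
    if pkg-config --exists "lttoolbox-2.0 >= 2.0.0"; then
        # These are the values configure would place in MATXIN_CFLAGS / MATXIN_LIBS.
        echo "found lttoolbox-2.0 $(pkg-config --modversion lttoolbox-2.0)"
        echo "cflags: $(pkg-config --cflags lttoolbox-2.0)"
        echo "libs:   $(pkg-config --libs lttoolbox-2.0)"
    else
        # Assumption: a prefix install needs PKG_CONFIG_PATH pointed at it.
        echo "lttoolbox-2.0 not found; try PKG_CONFIG_PATH=<prefix>/lib/pkgconfig"
        return 1
    fi
}

check_lttoolbox || true
```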

Then comment out the GCC version check in configure.ac:

#
# GCC version check
#
#AC_MSG_CHECKING(for GCC version)
#gccver=`$CXX -dumpversion`
#AC_MSG_RESULT($gccver);
#case "$gccver" in
#  4.3*)
#    AC_MSG_ERROR([GCC version must be <= 4.2])
#esac
#
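
For reference, the check being disabled reads the compiler version and aborts when it starts with 4.3; other versions pass, whatever the error message suggests. The following shell sketch mirrors that case statement to show which versions would have tripped it. The function name gcc_check_would_fail is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the case statement in configure.ac:
# returns 0 when the version string would trigger AC_MSG_ERROR.
gcc_check_would_fail() {
    case "$1" in
        4.3*) return 0 ;;
        *)    return 1 ;;
    esac
}

for v in 4.2.4 4.3.2 4.4.0; do
    if gcc_check_would_fail "$v"; then
        echo "$v: configure would stop"
    else
        echo "$v: configure would continue"
    fi
done
```

To test the local compiler, pass it the same string configure uses: gcc_check_would_fail "$(g++ -dumpversion)"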

Then do:

$ aclocal; automake -a; autoconf
$ LTTOOLBOX_DIR=/home/fran/svnroot/local/unstable/ LDFLAGS="-L/home/fran/local/lib -L/usr/lib" \
  CPPFLAGS="-I/home/fran/local/include" ./configure --prefix=/home/fran/local/

You will probably need to fix a few things in src/Makefile to get it to build. For example, add:

DEFAULT_INCLUDES = -I. -I$(srcdir) -I/usr/include/libxml2
CPPFLAGS = -I<prefix>/include/lttoolbox-2.0 -I<prefix>/include

Executing

$ export MATXIN_DIR=<prefix>
$ echo "Esto es una prueba" |  \
./Analyzer -f $MATXIN_DIR/share/matxin/config.cfg | \
./LT -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_prep -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_verb   -f $MATXIN_DIR/share/matxin/config.cfg  | \
./ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config.cfg | \
./SG_inter -f $MATXIN_DIR/share/matxin/config.cfg | \
./SG_intra -f $MATXIN_DIR/share/matxin/config.cfg | \
./MG -f $MATXIN_DIR/share/matxin/config.cfg | \
./reFormat

Hau prueba bat da
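
Since every stage reads standard input, writes standard output, and takes the same -f option, the command above can also be assembled from a list of stages. This is a minimal sketch: the function name matxin_pipeline and the /usr/local fallback for MATXIN_DIR are assumptions, while the stage names and their order are copied from the command above.

```shell
#!/bin/sh
# Assumption: fall back to /usr/local when MATXIN_DIR is not set.
MATXIN_DIR="${MATXIN_DIR:-/usr/local}"
CFG="$MATXIN_DIR/share/matxin/config.cfg"

# Hypothetical helper: print the full pipeline as one command string.
matxin_pipeline() {
    cmd=""
    # Stages in the order used on this page; each one gets "-f <config>".
    for stage in "Analyzer" "LT" "ST_inter --inter 1" "ST_prep" \
                 "ST_inter --inter 2" "ST_verb" "ST_inter --inter 3" \
                 "SG_inter" "SG_intra" "MG"; do
        cmd="${cmd}./${stage} -f ${CFG} | "
    done
    # reFormat is the final stage and takes no config option here.
    printf '%s\n' "${cmd}./reFormat"
}

matxin_pipeline
```

Run from the directory containing the binaries, the generated string can be executed with: echo "Esto es una prueba" | eval "$(matxin_pipeline)"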

Speed

Between 25 and 30 words per second.
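
As a rough worked example of what that rate means for a whole document, the sketch below estimates the translation time for a given word count. It assumes the rate stays constant regardless of sentence length, which this page does not claim, and the function name estimate_seconds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimated seconds to translate N words,
# printed as "fastest-slowest" using 30 and 25 words per second.
estimate_seconds() {
    words="$1"
    fast=$(( (words + 29) / 30 ))   # rounded up, at 30 words per second
    slow=$(( (words + 24) / 25 ))   # rounded up, at 25 words per second
    echo "${fast}-${slow}"
}

estimate_seconds 1000   # prints 34-40
```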

Documentation