Difference between revisions of "Matxin"

From Apertium
 
(85 intermediate revisions by 6 users not shown)
{{TOCD}}
'''Matxin''' is a free software machine translation engine related to [[Apertium]]. It allows for deeper transfer than can be found in Apertium. The linguistic data available under a free licence is only a fraction of the data used in the papers and descriptions of the system. As a result, translations from the freely licensed pair will naturally be worse than the results reported in those papers.


==Prerequisites==


* libcfg+: http://platon.sk/upload/_projects/00003/libcfg+-0.6.2.tar.gz
* Berkeley DB: sudo apt-get install libdb4.6++-dev
* libomlet: https://lafarga.cpl.upc.edu/frs/download.php/130/libomlet-0.97.tar.gz
* libfries: https://lafarga.cpl.upc.edu/frs/download.php/129/libfries-0.95.tar.gz
* FreeLing (version 1.5): https://lafarga.cpl.upc.edu/frs/download.php/90/FreeLing-1.5.tar.gz
:If you're installing into a prefix, pass the include and library directories to <code>configure</code>: <code>CPPFLAGS=-I<prefix>/include LDFLAGS=-L<prefix>/lib ./configure --prefix=<prefix></code>
* [[lttoolbox]] (version 2.0; yes, this is ancient)

==Troubleshooting==

;libfries

If you get the error to do with "strlen was not declared in this scope", add <code>#include <string.h></code> to the file <code>src/include/fries/language.h</code>.

If you get the error to do with "set_union was not declared in this scope", add <code>#include <algorithm></code> to the file <code>src/libfries/RGF.cc</code>.

;libomlet

If you get the error to do with "exit was not declared in this scope", add <code>#include <stdlib.h></code> to the file <code>src/libomlet/adaboost.cc</code>.

;FreeLing

If you get the error to do with "exit was not declared in this scope", add <code>#include <stdlib.h></code> to the files <code>src/utilities/indexdict.cc</code>, <code>src/libmorfo/accents.cc</code>, <code>src/libmorfo/accents_modules.cc</code>, <code>src/libmorfo/dictionary.cc</code>, <code>src/libmorfo/tagger.cc</code>, <code>src/libmorfo/punts.cc</code>, <code>src/libmorfo/maco_options.cc</code>, <code>src/libmorfo/splitter.cc</code>, <code>src/libmorfo/suffixes.cc</code>, <code>src/libmorfo/senses.cc</code> and <code>src/libmorfo/hmm_tagger.cc</code>.

If you get the error to do with "strlen was not declared in this scope", add <code>#include <string.h></code> to the files <code>src/libmorfo/automat.cc</code>, <code>src/libmorfo/dates.cc</code>, <code>src/libmorfo/locutions.cc</code>, <code>src/libmorfo/maco.cc</code>, <code>src/libmorfo/np.cc</code>, <code>src/libmorfo/nec.cc</code>, <code>src/libmorfo/numbers.cc</code>, <code>src/libmorfo/numbers_modules.cc</code>, <code>src/libmorfo/quantities.cc</code>, <code>src/libmorfo/tokenizer.cc</code> and <code>src/libmorfo/dates_modules.cc</code>.

If you get the error to do with "memcpy was not declared in this scope", add <code>#include <string.h></code> to the files <code>src/include/traces.h</code>, <code>src/libmorfo/dictionary.cc</code>, <code>src/libmorfo/traces.cc</code>, <code>src/libmorfo/senses.cc</code> and <code>src/libmorfo/feature_extractor/fex.cc</code>.

If you get the error to do with "set_union was not declared in this scope", add <code>#include <algorithm></code> to the file <code>src/libmorfo/feature_extractor/RGF.cc</code>.
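Adding the same include line to many files by hand is repetitive. The following is a minimal sketch, assuming a POSIX shell and that you run it from the top of the unpacked source tree. The function name <code>add_include</code> is hypothetical and is not part of libfries, libomlet or FreeLing. It prepends the include to each file that does not already contain it:

```shell
#!/bin/sh
# Hypothetical helper, not shipped with libfries, libomlet or FreeLing.
# Usage: add_include HEADER FILE...
# Prepends "#include <HEADER>" to each FILE that does not already contain it.
add_include() {
  header="$1"
  shift
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "add_include: no such file: $f" >&2
      continue
    fi
    # Leave files that already have the exact include line untouched.
    if grep -qF "#include <$header>" "$f"; then
      continue
    fi
    # Write the include line followed by the original contents to a temp
    # file, then replace the original (portable; avoids GNU-only sed -i).
    tmp="$f.tmp.$$"
    { printf '#include <%s>\n' "$header"; cat "$f"; } > "$tmp" && mv "$tmp" "$f"
  done
}
```

For example, the libomlet fix above becomes <code>add_include stdlib.h src/libomlet/adaboost.cc</code>. Putting the include on the first line is harmless for these files, including headers with include guards.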

If you get the error:


<pre>
In file included from analyzer.cc:72:
config.h:32:18: error: cfg+.h: No such file or directory
</pre>


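This means the libcfg+ header <code>cfg+.h</code> is not on the include path. Run <code>make</code> with the include directory of the prefix where libcfg+ is installed, for example <code>make CXXFLAGS=-I/home/fran/local/include</code>.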
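If you are not sure where libcfg+ put its headers, a small sketch like the following can locate <code>cfg+.h</code> under an install prefix and print the matching <code>-I</code> flag. The function name <code>cfg_include_flag</code> is hypothetical, and the prefix in the usage comment is only an example:

```shell
#!/bin/sh
# Hypothetical helper: print "-I<dir>" for the first directory under
# PREFIX ($1) that contains cfg+.h; return 1 if no such header is found.
cfg_include_flag() {
  hdr=$(find "$1" -name 'cfg+.h' 2>/dev/null | head -n 1)
  [ -n "$hdr" ] || return 1
  printf '%s\n' "-I$(dirname "$hdr")"
}

# Usage (example prefix):
#   make CXXFLAGS="$(cfg_include_flag "$HOME/local")"
```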


;Matxin


<pre>
Analyzer.C: In function ‘int main(int, char**)’:
Analyzer.C:226: error: no matching function for call to ‘hmm_tagger::hmm_tagger(std::string, char*&, int&, int&)’
/home/fran/local/include/hmm_tagger.h:108: note: candidates are: hmm_tagger::hmm_tagger(const std::string&, const std::string&, bool)
/home/fran/local/include/hmm_tagger.h:84: note: hmm_tagger::hmm_tagger(const hmm_tagger&)
Analyzer.C:230: error: no matching function for call to ‘relax_tagger::relax_tagger(char*&, int&, double&, double&, int&, int&)’
/home/fran/local/include/relax_tagger.h:74: note: candidates are: relax_tagger::relax_tagger(const std::string&, int, double, double, bool)
/home/fran/local/include/relax_tagger.h:51: note: relax_tagger::relax_tagger(const relax_tagger&)
Analyzer.C:236: error: no matching function for call to ‘senses::senses(char*&, int&)’
/home/fran/local/include/senses.h:52: note: candidates are: senses::senses(const std::string&)
/home/fran/local/include/senses.h:45: note: senses::senses(const senses&)
</pre>


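These errors mean that the constructors declared in the installed FreeLing headers take different arguments from the ones <code>Analyzer.C</code> passes (compare the candidates listed in the compiler notes). Make the following changes in the file <code>src/Analyzer.C</code>: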


<pre>
if (cfg.TAGGER_which == HMM)
- tagger = new hmm_tagger(cfg.Lang, cfg.TAGGER_HMMFile, cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
+ tagger = new hmm_tagger(string(cfg.Lang), string(cfg.TAGGER_HMMFile), false);
else if (cfg.TAGGER_which == RELAX)
- tagger = new relax_tagger(cfg.TAGGER_RelaxFile, cfg.TAGGER_RelaxMaxIter,
+ tagger = new relax_tagger(string(cfg.TAGGER_RelaxFile), cfg.TAGGER_RelaxMaxIter,
cfg.TAGGER_RelaxScaleFactor, cfg.TAGGER_RelaxEpsilon,
- cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
+ false);
if (cfg.NEC_NEClassification)
neclass = new nec("NP", cfg.NEC_FilePrefix);
if (cfg.SENSE_SenseAnnotation!=NONE)
- sens = new senses(cfg.SENSE_SenseFile, cfg.SENSE_DuplicateAnalysis);
+ sens = new senses(string(cfg.SENSE_SenseFile)); //, cfg.SENSE_DuplicateAnalysis);
</pre>


==Building==


;Checkout


<pre>
$ svn co http://matxin.svn.sourceforge.net/svnroot/matxin
</pre>


First comment out the deformatters in <code>src/Makefile.am</code>, as they don't build properly:

<pre>
bin_PROGRAMS = Analyzer LT ST_intra ST_inter ST_prep ST_verb SG_inter SG_intra \
MG
#reFormat txt-deformat html-deformat rtf-deformat
#
#reFormat_SOURCES = reFormat.C data_manager.C XML_reader.C simpleregex.C
#reFormat_LDADD = -lpcre -lxml2
#
#txt_deformat_SOURCES = txt-deformat.C
#html_deformat_SOURCES = html-deformat.C
#rtf_deformat_SOURCES = rtf-deformat.C
#
#txt-deformat.C: txt-format.xml Makefile.am deformat.xsl
# $(XSLTPROC) --stringparam mode matxin deformat.xsl txt-format.xml | $(FLEX) -Cfer -t >$@
#
#html-deformat.C: html-format.xml Makefile.am deformat.xsl
# $(XSLTPROC) --stringparam mode matxin deformat.xsl html-format.xml | $(FLEX) -Cfer -t >$@
#
#rtf-deformat.C: rtf-format.xml Makefile.am deformat.xsl
# $(XSLTPROC) --stringparam mode matxin deformat.xsl rtf-format.xml | $(FLEX) -Cfer -t >$@
#


</pre>

Then in the file <code>configure.ac</code>, change

<pre>
AC_CHECK_LIB(lttoolbox-2.0, main, [], [AC_MSG_ERROR([library 'lttoolbox' is required for Matxin])])
</pre>

to

<pre>
PKG_CHECK_MODULES(MATXIN, [lttoolbox-2.0 >= 2.0.0],,exit)

CFLAGS="$CFLAGS $MATXIN_CFLAGS"
CPPFLAGS="$CPPFLAGS $MATXIN_CPPFLAGS"
LIBS="$LIBS $MATXIN_LIBS"
</pre>

Then comment out the GCC version check in <code>configure.ac</code>:

<pre>
#
# GCC version check
#
#AC_MSG_CHECKING(for GCC version)
#gccver=`$CXX -dumpversion`
#AC_MSG_RESULT($gccver);
#case "$gccver" in
# 4.3*)
# AC_MSG_ERROR([GCC version must be <= 4.2])
#esac
#
</pre>

Then do:

<pre>
$ aclocal; automake -a; autoconf
$ LTTOOLBOX_DIR=/home/fran/svnroot/local/unstable/ LDFLAGS="-L/home/fran/local/lib -L/usr/lib" \
CPPFLAGS="-I/home/fran/local/include" ./configure --prefix=/home/fran/local/
</pre>

You will probably need to fix a few things in <code>src/Makefile</code> to get it to build. For example, add:

<pre>
DEFAULT_INCLUDES = -I. -I$(srcdir) -I/usr/include/libxml2
CPPFLAGS = -I<prefix>/include/lttoolbox-2.0 -I<prefix>/include
</pre>

==Executing==

<pre>
$ export MATXIN_DIR=<prefix>
$ echo "Esto es una prueba" | \
./Analyzer -f $MATXIN_DIR/share/matxin/config.cfg | \
./LT -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_prep -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_verb -f $MATXIN_DIR/share/matxin/config.cfg | \
./ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config.cfg | \
./SG_inter -f $MATXIN_DIR/share/matxin/config.cfg | \
./SG_intra -f $MATXIN_DIR/share/matxin/config.cfg | \
./MG -f $MATXIN_DIR/share/matxin/config.cfg | \
./reFormat

Hau prueba bat da
</pre>
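The pipeline above can be wrapped in a shell function so that the configuration path is written only once. This is a sketch, assuming a POSIX shell. The name <code>matxin_translate</code> and its two arguments (the directory holding the built executables and the path to <code>config.cfg</code>) are hypothetical and are not part of Matxin. The function runs the same stages, with the same options and in the same order, as the command above:

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with Matxin.
#   $1: directory holding the built executables (assumed, e.g. src/)
#   $2: path to config.cfg
# Reads source text on stdin and writes the translation to stdout.
matxin_translate() {
  bin="$1"
  cfg="$2"
  # Same stages, options and order as the pipeline above.
  "$bin/Analyzer" -f "$cfg" |
    "$bin/LT" -f "$cfg" |
    "$bin/ST_inter" --inter 1 -f "$cfg" |
    "$bin/ST_prep" -f "$cfg" |
    "$bin/ST_inter" --inter 2 -f "$cfg" |
    "$bin/ST_verb" -f "$cfg" |
    "$bin/ST_inter" --inter 3 -f "$cfg" |
    "$bin/SG_inter" -f "$cfg" |
    "$bin/SG_intra" -f "$cfg" |
    "$bin/MG" -f "$cfg" |
    "$bin/reFormat"
}

# Usage:
#   echo "Esto es una prueba" | matxin_translate ./src "$MATXIN_DIR/share/matxin/config.cfg"
```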

==Speed==

Between 25 and 30 words per second.
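As a rough worked estimate, assuming throughput stays within this range regardless of input length, the time <math>t</math> to translate <math>N</math> words at <math>r</math> words per second is:

```latex
t = \frac{N}{r}
\qquad\Rightarrow\qquad
N = 1000:\quad \frac{1000}{30} \approx 33\ \mathrm{s} \;\le\; t \;\le\; \frac{1000}{25} = 40\ \mathrm{s}
```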

==Documentation==

* [http://matxin.svn.sourceforge.net/viewvc/matxin/trunk/doc/documentation-es.pdf Descripción del sistema de traducción es-eu Matxin] (in Spanish)


[[Category:Tools]]
[[Category:Matxin|*]]

Latest revision as of 20:29, 7 May 2016

'''Matxin''' is a free software machine translation engine related to [[Apertium]]. It allows for deeper transfer than can be found in Apertium.

This page describes how to install the system; see [[Matxin#Documentation]] below for how to create or maintain language pairs.

==Installation==

<pre>
$ git clone https://github.com/matxin/matxin.git
$ cd matxin/
$ ./autogen.sh
$ make
# make install
</pre>

==Language pairs==

* [[matxin-spa-eus]]
* [[matxin-eng-eus]]

==Troubleshooting==

;Can't find AP_MKINCLUDE
Set your <code>ACLOCAL_PATH</code> to include the directory that contains <code>matxin.m4</code>.
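<code>ACLOCAL_PATH</code> is a colon-separated list of extra directories that recent versions of <code>aclocal</code> search for macro files. The following sketch assumes a POSIX shell and that <code>matxin.m4</code> was installed somewhere under your install prefix. The function name <code>add_matxin_m4_dir</code> is hypothetical. It finds the file and prepends its directory to <code>ACLOCAL_PATH</code> without adding duplicates:

```shell
#!/bin/sh
# Hypothetical helper: find matxin.m4 under PREFIX ($1) and prepend its
# directory to ACLOCAL_PATH (skipped if already listed), then export it.
add_matxin_m4_dir() {
  m4=$(find "$1" -name matxin.m4 2>/dev/null | head -n 1)
  if [ -z "$m4" ]; then
    echo "matxin.m4 not found under $1" >&2
    return 1
  fi
  dir=$(dirname "$m4")
  case ":$ACLOCAL_PATH:" in
    *":$dir:"*) ;;  # already present, leave unchanged
    *) ACLOCAL_PATH="$dir${ACLOCAL_PATH:+:$ACLOCAL_PATH}" ;;
  esac
  export ACLOCAL_PATH
}

# Usage (example prefix), then rerun ./autogen.sh:
#   add_matxin_m4_dir "$HOME/local"
```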

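==Documentation==

* [http://matxin.svn.sourceforge.net/viewvc/matxin/trunk/doc/documentation-es.pdf Descripción del sistema de traducción es-eu Matxin] (in Spanish)
* [[Documentation of Matxin]] (in English)
* [[Matxin New Language Pair HOWTO]]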

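==Contact==

Questions and comments about Matxin can be sent to the [https://lists.sourceforge.net/lists/listinfo/matxin-devel matxin-devel] mailing list or to the [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff] list.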

==External links==

*[http://ixa.si.ehu.es/Ixa IXA Research Group]