Difference between revisions of "Matxin"
(64 intermediate revisions by 6 users not shown) | |||
Line 1: | Line 1: | ||
{{TOCD}} |
{{TOCD}} |
||
'''Matxin''' is a free software machine translation engine related to [[Apertium]]. It allows for deeper transfer than can be found in Apertium. |
'''Matxin''' is a free software machine translation engine related to [[Apertium]]. It allows for deeper transfer than can be found in Apertium. |
||
This page describes how to install the system; see [[Matxin#Documentation]] below for how to create or maintain language pairs. |
|||
==Prerequisites== |
|||
==Installation== |
|||
* Berkeley DB — sudo apt-get install libdb4.6++-dev |
|||
* libpcre3 — sudo apt-get install libpcre3-dev |
|||
Install the following libraries in <prefix>, |
|||
* libcfg+ — http://platon.sk/upload/_projects/00003/libcfg+-0.6.2.tar.gz |
|||
* libomlet — https://lafarga.cpl.upc.edu/frs/download.php/130/libomlet-0.97.tar.gz |
|||
* libfries — https://lafarga.cpl.upc.edu/frs/download.php/129/libfries-0.95.tar.gz |
|||
* FreeLing (version 1.5) — https://lafarga.cpl.upc.edu/frs/download.php/90/FreeLing-1.5.tar.gz |
|||
:If you're installing into a prefix, set two environment variables when you run <code>configure</code>: <code>CPPFLAGS=-I<prefix>/include LDFLAGS=-L<prefix>/lib ./configure --prefix=<prefix></code> |
|||
* [[lttoolbox]] (version 2.0 -- yes, this is ancient) — http://fastbull.dl.sourceforge.net/sourceforge/apertium/lttoolbox-2.0.3.tar.gz |
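The prefix note above can be sketched as two small shell functions. The names <code>prefix_flags</code> and <code>build_into_prefix</code> are hypothetical helpers introduced here for illustration and are not part of any of the listed packages; the <code>configure</code> invocation is the one given in the note, and the <code>make</code> and <code>make install</code> steps are assumed to be the usual ones for these tarballs.

```shell
# Hypothetical helper: print the two variable assignments for a prefix.
prefix_flags() {
    printf 'CPPFLAGS=-I%s/include LDFLAGS=-L%s/lib\n' "$1" "$1"
}

# Hypothetical helper: configure, build and install the source tree in the
# current directory into the prefix given as the first argument.
# Each step runs only if the previous one succeeded.
build_into_prefix() {
    CPPFLAGS="-I$1/include" LDFLAGS="-L$1/lib" \
        ./configure --prefix="$1" &&
    make &&
    make install
}
```

For example, after unpacking one of the tarballs and changing into its source directory, <code>build_into_prefix "$HOME/local"</code> would install it under <code>$HOME/local</code> (an example prefix, not a required one).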
|||
==Building== |
|||
;Checkout |
|||
<pre> |
<pre> |
||
$ |
$ git clone https://github.com/matxin/matxin.git |
||
$ cd matxin/ |
|||
$ ./autogen.sh |
|||
$ make |
|||
# make install |
|||
</pre> |
</pre> |
||
==Language pairs== |
|||
First comment out the deformatters in the <code>src/Makefile.am</code> as they don't build properly. |
|||
* [[matxin-spa-eus]] |
|||
<pre> |
|||
* [[matxin-eng-eus]] |
|||
bin_PROGRAMS = Analyzer LT ST_intra ST_inter ST_prep ST_verb SG_inter SG_intra \ |
|||
MG |
|||
#reFormat txt-deformat html-deformat rtf-deformat |
|||
# |
|||
#reFormat_SOURCES = reFormat.C data_manager.C XML_reader.C simpleregex.C |
|||
#reFormat_LDADD = -lpcre -lxml2 |
|||
# |
|||
#txt_deformat_SOURCES = txt-deformat.C |
|||
#html_deformat_SOURCES = html-deformat.C |
|||
#rtf_deformat_SOURCES = rtf-deformat.C |
|||
# |
|||
#txt-deformat.C: txt-format.xml Makefile.am deformat.xsl |
|||
# $(XSLTPROC) --stringparam mode matxin deformat.xsl txt-format.xml | $(FLEX) -Cfer -t >$@ |
|||
# |
|||
#html-deformat.C: html-format.xml Makefile.am deformat.xsl |
|||
# $(XSLTPROC) --stringparam mode matxin deformat.xsl html-format.xml | $(FLEX) -Cfer -t >$@ |
|||
# |
|||
#rtf-deformat.C: rtf-format.xml Makefile.am deformat.xsl |
|||
# $(XSLTPROC) --stringparam mode matxin deformat.xsl rtf-format.xml | $(FLEX) -Cfer -t >$@ |
|||
# |
|||
</pre> |
|||
Then in the file <code>configure.ac</code>, change |
|||
<pre> |
|||
AC_CHECK_LIB(lttoolbox-2.0, main, [], [AC_MSG_ERROR([library 'lttoolbox' is required for Matxin])]) |
|||
</pre> |
|||
for |
|||
<pre> |
|||
PKG_CHECK_MODULES(MATXIN, [lttoolbox-2.0 >= 2.0.0],,exit) |
|||
CFLAGS="$CFLAGS $MATXIN_CFLAGS" |
|||
CPPFLAGS="$CPPFLAGS $MATXIN_CPPFLAGS" |
|||
LIBS="$LIBS $MATXIN_LIBS" |
|||
</pre> |
|||
Then you need to comment out the check for gcc version in <code>configure.ac</code> |
|||
<pre> |
|||
# |
|||
# GCC version check |
|||
# |
|||
#AC_MSG_CHECKING(for GCC version) |
|||
#gccver=`$CXX -dumpversion` |
|||
#AC_MSG_RESULT($gccver); |
|||
#case "$gccver" in |
|||
# 4.3*) |
|||
# AC_MSG_ERROR([GCC version must be <= 4.2]) |
|||
#esac |
|||
# |
|||
</pre> |
|||
Then do: |
|||
<pre> |
|||
$ aclocal; automake -a; autoconf |
|||
$ LTTOOLBOX_DIR=<prefix> LDFLAGS="-L<prefix>/lib -L/usr/lib" \ |
|||
  CPPFLAGS="-I<prefix>/include" ./configure --prefix=<prefix> |
|||
</pre> |
|||
You will probably need to fix a few things in the file <code>src/Makefile</code> to get it to build, for example by adding: |
|||
<pre> |
|||
DEFAULT_INCLUDES = -I. -I$(srcdir) -I/usr/include/libxml2 |
|||
CPPFLAGS = -I<prefix>/include/lttoolbox-2.0 -I<prefix>/include |
|||
</pre> |
|||
After you've got it built, do: |
|||
<pre> |
|||
$ make install |
|||
</pre> |
|||
==Executing== |
|||
<pre> |
|||
$ export MATXIN_DIR=<prefix> |
|||
$ echo "Esto es una prueba" | \ |
|||
./Analyzer -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./LT -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./ST_prep -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./ST_verb -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./SG_inter -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./SG_intra -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./MG -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \ |
|||
./reFormat |
|||
Da proba bat hau |
|||
</pre> |
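Since every stage in the pipeline above takes the same configuration file, the repetition can be folded into a function. This is only a sketch: <code>matxin_es_eu</code> is a hypothetical name, and it assumes that <code>MATXIN_DIR</code> is set and that the function is called from the directory containing the Matxin binaries, exactly as in the command above.

```shell
# Hypothetical wrapper around the es-eu pipeline shown above.
# Reads source text on standard input, writes the translation to standard output.
matxin_es_eu() {
    cfg="$MATXIN_DIR/share/matxin/config/es-eu.cfg"
    ./Analyzer -f "$cfg" |
    ./LT -f "$cfg" |
    ./ST_inter --inter 1 -f "$cfg" |
    ./ST_prep -f "$cfg" |
    ./ST_inter --inter 2 -f "$cfg" |
    ./ST_verb -f "$cfg" |
    ./ST_inter --inter 3 -f "$cfg" |
    ./SG_inter -f "$cfg" |
    ./SG_intra -f "$cfg" |
    ./MG -f "$cfg" |
    ./reFormat
}
```

It would be used as <code>echo "Esto es una prueba" | matxin_es_eu</code>.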
|||
==Speed== |
|||
Between 25 and 30 words per second. |
|||
==Troubleshooting== |
==Troubleshooting== |
||
;Can't find AP_MKINCLUDE |
|||
;libfries |
|||
If you get an error such as "strlen was not declared in this scope", add <code>#include <string.h></code> to the file <code>src/include/fries/language.h</code>. |
|||
If you get an error such as "set_union was not declared in this scope", add <code>#include <algorithm></code> to the file <code>src/libfries/RGF.cc</code>. |
|||
Sometimes, people get the error: |
|||
set your <code>ACLOCAL_PATH</code> to include the path to <code>matxin.m4</code> |
|||
<pre> |
|||
configure:2668: error: C++ compiler cannot create executables |
|||
</pre> |
|||
==Documentation== |
|||
Try installing libpcre3-dev and trying again. |
|||
* [http://matxin.svn.sourceforge.net/viewvc/matxin/trunk/doc/documentation-es.pdf Descripción del sistema de traducción es-eu Matxin] (in Spanish) |
|||
;libomlet |
|||
* [[Documentation of Matxin]] (in English) |
|||
* [[Matxin New Language Pair HOWTO]] |
|||
==Contact== |
|||
If you get an error such as "exit was not declared in this scope", add <code>#include <stdlib.h></code> to the file <code>src/libomlet/adaboost.cc</code>. |
|||
Questions and comments about Matxin can be sent to their mailing list [https://lists.sourceforge.net/lists/listinfo/matxin-devel matxin-devel], or to the [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff] list. |
|||
;FreeLing |
|||
==External links== |
|||
If you get an error such as "exit was not declared in this scope", add <code>#include <stdlib.h></code> to the files <code>src/utilities/indexdict.cc</code>, <code>src/libmorfo/accents.cc</code>, <code>src/libmorfo/accents_modules.cc</code>, <code>src/libmorfo/dictionary.cc</code>, <code>src/libmorfo/tagger.cc</code>, <code>src/libmorfo/punts.cc</code>, <code>src/libmorfo/maco_options.cc</code>, <code>src/libmorfo/splitter.cc</code>, <code>src/libmorfo/suffixes.cc</code>, <code>src/libmorfo/senses.cc</code> and <code>src/libmorfo/hmm_tagger.cc</code>. |
|||
*[http://ixa.si.ehu.es/Ixa IXA Research Group] |
|||
If you get an error such as "strlen was not declared in this scope", add <code>#include <string.h></code> to the files <code>src/libmorfo/automat.cc</code>, <code>src/libmorfo/dates.cc</code>, <code>src/libmorfo/locutions.cc</code>, <code>src/libmorfo/maco.cc</code>, <code>src/libmorfo/np.cc</code>, <code>src/libmorfo/nec.cc</code>, <code>src/libmorfo/numbers.cc</code>, <code>src/libmorfo/numbers_modules.cc</code>, <code>src/libmorfo/quantities.cc</code>, <code>src/libmorfo/tokenizer.cc</code> and <code>src/libmorfo/dates_modules.cc</code>. |
|||
If you get an error such as "memcpy was not declared in this scope", add <code>#include <string.h></code> to the files <code>src/include/traces.h</code>, <code>src/libmorfo/dictionary.cc</code>, <code>src/libmorfo/traces.cc</code>, <code>src/libmorfo/senses.cc</code> and <code>src/libmorfo/feature_extractor/fex.cc</code>. |
|||
If you get an error such as "set_union was not declared in this scope", add <code>#include <algorithm></code> to the file <code>src/libmorfo/feature_extractor/RGF.cc</code>. |
|||
To add the following strings |
|||
<pre> |
|||
#include <stdlib.h> |
|||
#include <string.h> |
|||
#include <algorithm> |
|||
</pre> |
|||
at the top of every .cc file in the FreeLing-1.5 directory, you can use the following commands: |
|||
<pre> |
|||
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ ./configure |
|||
.. |
|||
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ find . -type f -name "*.cc" | awk '{ print "echo \"#include <stdlib.h>\n#include <string.h>\n\ |
|||
#include <algorithm>\n\" > " $1 ".new && cat " $1 " >> " $1 ".new && mv " $1 ".new " $1 }' > k |
|||
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ sh k |
|||
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ make |
|||
.. |
|||
</pre> |
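The same header insertion can also be written as a small function instead of a generated script. This is a sketch rather than the method used above: <code>prepend_includes</code> is a hypothetical name, and it uses <code>printf</code> because not every shell's <code>echo</code> expands <code>\n</code>.

```shell
# Hypothetical helper: put the three #include lines at the top of one file.
# The original is replaced only if the new file was written successfully.
prepend_includes() {
    {
        printf '#include <stdlib.h>\n'
        printf '#include <string.h>\n'
        printf '#include <algorithm>\n'
        cat "$1"
    } > "$1.new" && mv "$1.new" "$1"
}

# To apply it to every .cc file below the current directory
# (left commented out so that loading this sketch changes nothing):
# find . -type f -name '*.cc' | while IFS= read -r f; do prepend_includes "$f"; done
```

Running it twice on the same file adds the lines twice, so it is best applied once to a freshly unpacked source tree.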
|||
If you get the error: |
|||
<pre> |
|||
In file included from analyzer.cc:72: |
|||
config.h:32:18: error: cfg+.h: No such file or directory |
|||
</pre> |
|||
Run <code>make</code> as <code>make CXXFLAGS=-I<prefix>/include</code> |
|||
;Matxin |
|||
If you get the error: |
|||
<pre> |
|||
g++ -DHAVE_CONFIG_H -I. -I.. -I/usr/local/include -I/usr/local/include/lttoolbox-2.0 -I/usr/include/libxml2 -g -O2 -ansi -march=i686 -O3 |
|||
-fno-pic -fomit-frame-pointer -MT Analyzer.o -MD -MP -MF .deps/Analyzer.Tpo -c -o Analyzer.o Analyzer.C |
|||
--->Analyzer.C:10:22: error: freeling.h: Datei oder Verzeichnis nicht gefunden |
|||
In file included from Analyzer.C:9: |
|||
config.h: In constructor 'config::config(char**)': |
|||
config.h:413: warning: deprecated conversion from string constant to 'char*' |
|||
Analyzer.C: In function 'void PrintResults(std::list<sentence, std::allocator<sentence> >&, const config&, int&)': |
|||
Analyzer.C:123: error: aggregate 'std::ofstream log_file' has incomplete type and cannot be defined |
|||
Analyzer.C:126: error: incomplete type 'std::ofstream' used in nested name s... |
|||
</pre> |
|||
Then change the header files in <code>src/Analyzer.C</code> to: |
|||
<pre> |
|||
//#include "freeling.h" |
|||
#include "util.h" |
|||
#include "tokenizer.h" |
|||
#include "splitter.h" |
|||
#include "maco.h" |
|||
#include "nec.h" |
|||
#include "senses.h" |
|||
#include "tagger.h" |
|||
#include "hmm_tagger.h" |
|||
#include "relax_tagger.h" |
|||
#include "chart_parser.h" |
|||
#include "maco_options.h" |
|||
#include "dependencies.h" |
|||
</pre> |
|||
Upon finding yourself battling the following compile problem, |
|||
<pre> |
|||
Analyzer.C: In function ‘int main(int, char**)’: |
|||
Analyzer.C:226: error: no matching function for call to ‘hmm_tagger::hmm_tagger(std::string, char*&, int&, int&)’ |
|||
/home/fran/local/include/hmm_tagger.h:108: note: candidates are: hmm_tagger::hmm_tagger(const std::string&, const std::string&, bool) |
|||
/home/fran/local/include/hmm_tagger.h:84: note: hmm_tagger::hmm_tagger(const hmm_tagger&) |
|||
Analyzer.C:230: error: no matching function for call to ‘relax_tagger::relax_tagger(char*&, int&, double&, double&, int&, int&)’ |
|||
/home/fran/local/include/relax_tagger.h:74: note: candidates are: relax_tagger::relax_tagger(const std::string&, int, double, double, bool) |
|||
/home/fran/local/include/relax_tagger.h:51: note: relax_tagger::relax_tagger(const relax_tagger&) |
|||
Analyzer.C:236: error: no matching function for call to ‘senses::senses(char*&, int&)’ |
|||
/home/fran/local/include/senses.h:52: note: candidates are: senses::senses(const std::string&) |
|||
/home/fran/local/include/senses.h:45: note: senses::senses(const senses&) |
|||
</pre> |
|||
Make the following changes in the file <code>src/Analyzer.C</code>: |
|||
<pre> |
|||
if (cfg.TAGGER_which == HMM) |
|||
- tagger = new hmm_tagger(cfg.Lang, cfg.TAGGER_HMMFile, cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect); |
|||
+ tagger = new hmm_tagger(string(cfg.Lang), string(cfg.TAGGER_HMMFile), false); |
|||
else if (cfg.TAGGER_which == RELAX) |
|||
- tagger = new relax_tagger(cfg.TAGGER_RelaxFile, cfg.TAGGER_RelaxMaxIter, |
|||
+ tagger = new relax_tagger(string(cfg.TAGGER_RelaxFile), cfg.TAGGER_RelaxMaxIter, |
|||
cfg.TAGGER_RelaxScaleFactor, cfg.TAGGER_RelaxEpsilon, |
|||
- cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect); |
|||
+ false); |
|||
if (cfg.NEC_NEClassification) |
|||
neclass = new nec("NP", cfg.NEC_FilePrefix); |
|||
if (cfg.SENSE_SenseAnnotation!=NONE) |
|||
- sens = new senses(cfg.SENSE_SenseFile, cfg.SENSE_DuplicateAnalysis); |
|||
+ sens = new senses(string(cfg.SENSE_SenseFile)); //, cfg.SENSE_DuplicateAnalysis); |
|||
</pre> |
|||
After that, there will probably be issues with actually running Matxin. |
|||
If you get this error: |
|||
<pre> |
|||
$ echo "Esto es una prueba" | ./Analyzer -f /home/fran/local/share/matxin/config/es-eu.cfg |
|||
Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 2. Syntax error: Unexpected 'SETS' found. |
|||
Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 7. Syntax error: Unexpected 'DetFem' found. |
|||
Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 10. Syntax error: Unexpected 'VerbPron' found. |
|||
</pre> |
|||
You can change the tagger from RelaxCG to HMM: edit the file <code><prefix>/share/matxin/config/es-eu.cfg</code> and change: |
|||
<pre> |
|||
#### Tagger options |
|||
#Tagger=relax |
|||
Tagger=hmm |
|||
</pre> |
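The edit above can be scripted. The sketch below assumes that the configuration file contains a line reading exactly <code>Tagger=relax</code>; if the line in your file has extra spaces or a trailing comment, the pattern needs adjusting. <code>use_hmm_tagger</code> is a hypothetical name.

```shell
# Hypothetical helper: comment out "Tagger=relax" and add "Tagger=hmm" below it.
# Assumes the line is exactly "Tagger=relax"; other lines are left untouched.
use_hmm_tagger() {
    sed 's/^Tagger=relax$/#Tagger=relax\
Tagger=hmm/' "$1" > "$1.new" && mv "$1.new" "$1"
}
```

It would be called with the path to the configuration file, for example <code>use_hmm_tagger "$MATXIN_DIR/share/matxin/config/es-eu.cfg"</code>.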
|||
Then there might be a problem in the dependency grammar: |
|||
<pre> |
|||
$ echo "Esto es una prueba" | ./Analyzer -f /home/fran/local/share/matxin/config/es-eu.cfg |
|||
DEPENDENCIES: Error reading dependencies from '/home/fran/local//share/matxin/freeling/es/dep/dependences.dat'. Unregistered function d:sn.tonto |
|||
</pre> |
|||
The easiest thing to do here is to just remove references to the stuff it complains about: |
|||
<pre> |
|||
cat <prefix>/share/matxin/freeling/es/dep/dependences.dat | grep -v d:grup-sp.lemma > newdep |
|||
cat newdep | grep -v d\.class > newdep2 |
|||
cat newdep2 | grep -v d:sn.tonto > <prefix>/share/matxin/freeling/es/dep/dependences.dat |
|||
</pre> |
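The three filtering steps above can be generalised into one function that takes the file and any number of strings to drop. This is a sketch with a hypothetical name, <code>drop_lines_containing</code>; it differs from the commands above in that it matches fixed strings (<code>grep -F</code>), so the dot in <code>d.class</code> matches only a literal dot.

```shell
# Hypothetical helper: remove every line of a file that contains any of the
# given fixed strings. Usage: drop_lines_containing FILE STRING...
drop_lines_containing() {
    file=$1
    shift
    cp "$file" "$file.tmp" || return 1
    for pattern in "$@"; do
        # grep -v exits non-zero when nothing is left; that is not an error here.
        grep -v -F -e "$pattern" "$file.tmp" > "$file.new"
        mv "$file.new" "$file.tmp"
    done
    mv "$file.tmp" "$file"
}
```

Applied to the case above, the call would be <code>drop_lines_containing <prefix>/share/matxin/freeling/es/dep/dependences.dat d:grup-sp.lemma d.class d:sn.tonto</code>. Keeping a copy of the original file first is advisable, since the function overwrites it.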
|||
==Documentation== |
|||
* [http://matxin.svn.sourceforge.net/viewvc/matxin/trunk/doc/documentation-es.pdf Descripción del sistema de traducción es-eu Matxin] (in Spanish) |
|||
[[Category: |
[[Category:Matxin|*]] |
Latest revision as of 20:29, 7 May 2016
Matxin is a free software machine translation engine related to Apertium. It allows for deeper transfer than can be found in Apertium.
This page describes how to install the system; see Matxin#Documentation below for how to create or maintain language pairs.
Installation
$ git clone https://github.com/matxin/matxin.git
$ cd matxin/
$ ./autogen.sh
$ make
# make install
Language pairs
Troubleshooting
- Can't find AP_MKINCLUDE
Set your ACLOCAL_PATH to include the path to matxin.m4.
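A minimal sketch of that setting, assuming a POSIX shell. The function name add_aclocal_dir is hypothetical, and the directory you pass to it must be wherever matxin.m4 actually lives on your system; this page does not say where that is.

```shell
# Hypothetical helper: prepend a directory to ACLOCAL_PATH and export it,
# so that aclocal also searches that directory for .m4 files.
add_aclocal_dir() {
    if [ -n "${ACLOCAL_PATH:-}" ]; then
        ACLOCAL_PATH="$1:$ACLOCAL_PATH"
    else
        ACLOCAL_PATH="$1"
    fi
    export ACLOCAL_PATH
}
```

For instance, if matxin.m4 were in /usr/local/share/aclocal (an assumed location), you would run add_aclocal_dir /usr/local/share/aclocal and then run ./autogen.sh again.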
Documentation
- Descripción del sistema de traducción es-eu Matxin (in Spanish)
- Documentation of Matxin (in English)
- Matxin New Language Pair HOWTO
Contact
Questions and comments about Matxin can be sent to their mailing list matxin-devel, or to the apertium-stuff list.