Matxin
Revision as of 20:32, 16 April 2009
Matxin is a free-software machine translation engine related to Apertium. It allows for deeper transfer than Apertium does. The linguistic data available under a free licence is only a fraction of the data used in the papers and descriptions of the system, so the translations from the freely available pair will naturally be of lower quality than the results reported in those papers.
Prerequisites
- libcfg+ — http://platon.sk/upload/_projects/00003/libcfg+-0.6.2.tar.gz
- Berkeley DB — sudo apt-get install libdb4.6++-dev
- libomlet — https://lafarga.cpl.upc.edu/frs/download.php/130/libomlet-0.97.tar.gz
- libfries — https://lafarga.cpl.upc.edu/frs/download.php/129/libfries-0.95.tar.gz
- FreeLing (version 1.5) — https://lafarga.cpl.upc.edu/frs/download.php/90/FreeLing-1.5.tar.gz
- If you're installing into a prefix, you'll need to set two environment variables when running configure:
  CPPFLAGS=-I<prefix>/include LDFLAGS=-L<prefix>/lib ./configure --prefix=<prefix>
- lttoolbox (version 2.0 -- yes, this is ancient)
Building
- Checkout
$ svn co http://matxin.svn.sourceforge.net/svnroot/matxin
First, comment out the deformatters in src/Makefile.am, as they don't build properly:
bin_PROGRAMS = Analyzer LT ST_intra ST_inter ST_prep ST_verb SG_inter SG_intra \
        MG #reFormat txt-deformat html-deformat rtf-deformat
#
#reFormat_SOURCES = reFormat.C data_manager.C XML_reader.C simpleregex.C
#reFormat_LDADD = -lpcre -lxml2
#
#txt_deformat_SOURCES = txt-deformat.C
#html_deformat_SOURCES = html-deformat.C
#rtf_deformat_SOURCES = rtf-deformat.C
#
#txt-deformat.C: txt-format.xml Makefile.am deformat.xsl
# $(XSLTPROC) --stringparam mode matxin deformat.xsl txt-format.xml | $(FLEX) -Cfer -t >$@
#
#html-deformat.C: html-format.xml Makefile.am deformat.xsl
# $(XSLTPROC) --stringparam mode matxin deformat.xsl html-format.xml | $(FLEX) -Cfer -t >$@
#
#rtf-deformat.C: rtf-format.xml Makefile.am deformat.xsl
# $(XSLTPROC) --stringparam mode matxin deformat.xsl rtf-format.xml | $(FLEX) -Cfer -t >$@
#
Then, in the file configure.ac, change
AC_CHECK_LIB(lttoolbox-2.0, main, [], [AC_MSG_ERROR([library 'lttoolbox' is required for Matxin])])
to
PKG_CHECK_MODULES(MATXIN, [lttoolbox-2.0 >= 2.0.0],,exit)
CFLAGS="$CFLAGS $MATXIN_CFLAGS"
CPPFLAGS="$CPPFLAGS $MATXIN_CPPFLAGS"
LIBS="$LIBS $MATXIN_LIBS"
Then comment out the GCC version check in configure.ac:
#
# GCC version check
#
#AC_MSG_CHECKING(for GCC version)
#gccver=`$CXX -dumpversion`
#AC_MSG_RESULT($gccver);
#case "$gccver" in
#  4.3*)
#    AC_MSG_ERROR([GCC version must be <= 4.2])
#esac
#
Then do:
$ aclocal; automake -a; autoconf
$ LTTOOLBOX_DIR=/home/fran/svnroot/local/unstable/ LDFLAGS="-L/home/fran/local/lib -L/usr/lib" CPPFLAGS="-I/home/fran/local/include" ./configure --prefix=/home/fran/local/
You will probably need to fix a few things in src/Makefile to get it to build; for example, add:
DEFAULT_INCLUDES = -I. -I$(srcdir) -I/usr/include/libxml2
CPPFLAGS = -I<prefix>/include/lttoolbox-2.0 -I<prefix>/include
Executing
$ export MATXIN_DIR=<prefix>
$ echo "Esto es una prueba" | \
  ./Analyzer -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./LT -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./ST_prep -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./ST_verb -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./SG_inter -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./SG_intra -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./MG -f $MATXIN_DIR/share/matxin/config.cfg | \
  ./reFormat
Hau prueba bat da
Speed
Between 25 and 30 words per second.
Troubleshooting
- libfries
If you get an error about "strlen was not declared in this scope", add #include <string.h> to the file src/include/fries/language.h.
If you get an error about "set_union was not declared in this scope", add #include <algorithm> to the file src/libfries/RGF.cc.
- libomlet
If you get an error about "exit was not declared in this scope", add #include <stdlib.h> to the file src/libomlet/adaboost.cc.
- FreeLing
If you get an error about "exit was not declared in this scope", add #include <stdlib.h> to the files src/utilities/indexdict.cc, src/libmorfo/accents.cc, src/libmorfo/accents_modules.cc, src/libmorfo/dictionary.cc, src/libmorfo/tagger.cc, src/libmorfo/punts.cc, src/libmorfo/maco_options.cc, src/libmorfo/splitter.cc, src/libmorfo/suffixes.cc, src/libmorfo/senses.cc and src/libmorfo/hmm_tagger.cc.
If you get an error about "strlen was not declared in this scope", add #include <string.h> to the files src/libmorfo/automat.cc, src/libmorfo/dates.cc, src/libmorfo/locutions.cc, src/libmorfo/maco.cc, src/libmorfo/np.cc, src/libmorfo/nec.cc, src/libmorfo/numbers.cc, src/libmorfo/numbers_modules.cc, src/libmorfo/quantities.cc, src/libmorfo/tokenizer.cc and src/libmorfo/dates_modules.cc.
If you get an error about "memcpy was not declared in this scope", add #include <string.h> to the files src/include/traces.h, src/libmorfo/dictionary.cc, src/libmorfo/traces.cc, src/libmorfo/senses.cc and src/libmorfo/feature_extractor/fex.cc.
If you get an error about "set_union was not declared in this scope", add #include <algorithm> to the file src/libmorfo/feature_extractor/RGF.cc.
If you get the error:
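In file included from analyzer.cc:72:
config.h:32:18: error: cfg+.h: No such file or directory
run make with the directory where libcfg+ installed its headers added to CXXFLAGS, for example (substitute your own prefix for /home/fran/local):
make CXXFLAGS=-I/home/fran/local/include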
- Matxin
If you get the error:
g++ -DHAVE_CONFIG_H -I. -I.. -I/usr/local/include -I/usr/local/include/lttoolbox-2.0 -I/usr/include/libxml2 -g -O2 -ansi -march=i686 -O3 -fno-pic -fomit-frame-pointer -MT Analyzer.o -MD -MP -MF .deps/Analyzer.Tpo -c -o Analyzer.o Analyzer.C
--->Analyzer.C:10:22: error: freeling.h: No such file or directory
In file included from Analyzer.C:9:
config.h: In constructor 'config::config(char**)':
config.h:413: warning: deprecated conversion from string constant to 'char*'
Analyzer.C: In function 'void PrintResults(std::list<sentence, std::allocator<sentence> >&, const config&, int&)':
Analyzer.C:123: error: aggregate 'std::ofstream log_file' has incomplete type and cannot be defined
Analyzer.C:126: error: incomplete type 'std::ofstream' used in nested name s...
then replace the freeling.h include in src/Analyzer.C with the individual headers:
//#include "freeling.h"
#include "util.h"
#include "tokenizer.h"
#include "splitter.h"
#include "maco.h"
#include "nec.h"
#include "senses.h"
#include "tagger.h"
#include "hmm_tagger.h"
#include "relax_tagger.h"
#include "chart_parser.h"
#include "maco_options.h"
#include "dependencies.h"
If you run into the following compile errors:
Analyzer.C: In function ‘int main(int, char**)’:
Analyzer.C:226: error: no matching function for call to ‘hmm_tagger::hmm_tagger(std::string, char*&, int&, int&)’
/home/fran/local/include/hmm_tagger.h:108: note: candidates are: hmm_tagger::hmm_tagger(const std::string&, const std::string&, bool)
/home/fran/local/include/hmm_tagger.h:84: note: hmm_tagger::hmm_tagger(const hmm_tagger&)
Analyzer.C:230: error: no matching function for call to ‘relax_tagger::relax_tagger(char*&, int&, double&, double&, int&, int&)’
/home/fran/local/include/relax_tagger.h:74: note: candidates are: relax_tagger::relax_tagger(const std::string&, int, double, double, bool)
/home/fran/local/include/relax_tagger.h:51: note: relax_tagger::relax_tagger(const relax_tagger&)
Analyzer.C:236: error: no matching function for call to ‘senses::senses(char*&, int&)’
/home/fran/local/include/senses.h:52: note: candidates are: senses::senses(const std::string&)
/home/fran/local/include/senses.h:45: note: senses::senses(const senses&)
Make the following changes in the file src/Analyzer.C:
   if (cfg.TAGGER_which == HMM)
-    tagger = new hmm_tagger(cfg.Lang, cfg.TAGGER_HMMFile, cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
+    tagger = new hmm_tagger(string(cfg.Lang), string(cfg.TAGGER_HMMFile), false);
   else if (cfg.TAGGER_which == RELAX)
-    tagger = new relax_tagger(cfg.TAGGER_RelaxFile, cfg.TAGGER_RelaxMaxIter,
+    tagger = new relax_tagger(string(cfg.TAGGER_RelaxFile), cfg.TAGGER_RelaxMaxIter,
                               cfg.TAGGER_RelaxScaleFactor, cfg.TAGGER_RelaxEpsilon,
-                              cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
+                              false);

   if (cfg.NEC_NEClassification)
     neclass = new nec("NP", cfg.NEC_FilePrefix);

   if (cfg.SENSE_SenseAnnotation!=NONE)
-    sens = new senses(cfg.SENSE_SenseFile, cfg.SENSE_DuplicateAnalysis);
+    sens = new senses(string(cfg.SENSE_SenseFile)); //, cfg.SENSE_DuplicateAnalysis);
Documentation
- Descripción del sistema de traducción es-eu Matxin (Description of the es-eu Matxin translation system, in Spanish)