Difference between revisions of "Matxin"

From Apertium
 
(36 intermediate revisions by 5 users not shown)
 
{{TOCD}}
 
{{TOCD}}
'''Matxin''' is a free software machine translation engine related to [[Apertium]]. It allows for deeper transfer than can be found in Apertium. The linguistic data available under a free licence is only a fraction of the data used in the papers and descriptions of the subject, so translations from the pair will naturally be of lower quality than the results reported in those papers.
+
'''Matxin''' is a free software machine translation engine related to [[Apertium]]. It allows for deeper transfer than can be found in Apertium.
   
  +
This page describes how to install the system; see [[Matxin#Documentation]] below for how to create or maintain language pairs.
==Contact==
 
 
Questions and comments about Matxin can be sent to their mailing list [https://lists.sourceforge.net/lists/listinfo/matxin-devel matxin-devel], or to the [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff] list.
 
 
==Prerequisites==
 
 
* Berkeley DB &mdash; sudo apt-get install libdb4.6++-dev
 
* libpcre3 — sudo apt-get install libpcre3-dev
 
 
Install the following libraries in <prefix>,
 
 
* libcfg+ &mdash; http://platon.sk/upload/_projects/00003/libcfg+-0.6.2.tar.gz
 
* libomlet &mdash; https://lafarga.cpl.upc.edu/frs/download.php/130/libomlet-0.97.tar.gz
 
* libfries &mdash; https://lafarga.cpl.upc.edu/frs/download.php/129/libfries-0.95.tar.gz
 
* FreeLing (from SVN) &mdash; (<code>svn co http://devel.cpl.upc.edu/freeling/svn/latest/freeling</code>)
 
:If you're installing into a prefix, you'll need to set two environment variables when running <code>configure</code>: <code>CPPFLAGS=-I<prefix>/include LDFLAGS=-L<prefix>/lib ./configure --prefix=<prefix></code>
 
* [[lttoolbox]] (from SVN) &mdash; (<code>svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox</code>)
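
The prefix note in the list above can be sketched as a short shell fragment. This is a minimal sketch, not the project's official build recipe: the prefix <code>$HOME/local</code> is a hypothetical default, and the <code>configure</code> command is only printed, not run.

```shell
# Hypothetical prefix; replace it with wherever you install the libraries.
PREFIX="${PREFIX:-$HOME/local}"

# Point the preprocessor and the linker at headers and libraries under the prefix.
export CPPFLAGS="-I$PREFIX/include"
export LDFLAGS="-L$PREFIX/lib"

# Print the configure command instead of running it, so the sketch is safe to paste.
echo "./configure --prefix=$PREFIX"
```

Exporting the variables once means every later <code>./configure</code> call in the same shell picks them up, which is convenient when several of the libraries above go into the same prefix.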
 
 
==Building==
 
 
;Checkout
 
 
<pre>
 
$ svn co http://matxin.svn.sourceforge.net/svnroot/matxin
 
</pre>
 
   
  +
==Installation==
Then do the usual:
 
   
 
<pre>
 
<pre>
  +
$ git clone https://github.com/matxin/matxin.git
$ ./configure --prefix=<prefix>
 
  +
$ cd matxin/
  +
$ ./autogen.sh
 
$ make
 
$ make
</pre>
 
 
After you've got it built, do:
 
 
<pre>
 
$ su
 
# export LD_LIBRARY_PATH=/usr/local/lib
 
# export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
 
 
# make install
 
# make install
 
</pre>
 
</pre>
   
==Executing==
+
==Language pairs==
   
  +
* [[matxin-spa-eus]]
If you have not specified a prefix, the default for <code>MATXIN_DIR</code> is <code>/usr/local/bin</code>, so you should <code>cd /usr/local/bin</code> to run the tests.
 
  +
* [[matxin-eng-eus]]
 
Bundled with Matxin there's a script called <code>Matxin_translator</code> which calls all the necessary modules and interconnects them using UNIX pipes. This is the recommended way of running Matxin for getting translations. <b>This does not work in the given form.</b>
 
 
<pre>
 
$ echo "Esto es una prueba" | ./Matxin_translator -f $MATXIN_DIR/share/matxin/config/es-eu.cfg
 
</pre>
 
 
For advanced uses, you can run each part of the pipe separately and save the output to temporary files for feeding the next modules. <b>At the moment this is the method of choice.</b>
 
 
<prefix> is typically /usr/local
 
 
<pre>
 
$ export MATXIN_DIR=<prefix>
 
$ echo "Esto es una prueba" | \
 
./Analyzer -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./LT -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./ST_prep -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./ST_verb -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./SG_inter -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./SG_intra -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./MG -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
 
./reFormat
 
 
Da proba bat hau
 
 
</pre>
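
The temporary-file approach mentioned above can be sketched as follows. This is an illustration under assumptions rather than part of Matxin: <code>run_stages</code> is a hypothetical helper, and <code>tr</code> and <code>sed</code> stand in for the real modules (<code>./Analyzer -f ...</code>, <code>./LT -f ...</code> and so on) so that the sketch runs without Matxin installed.

```shell
# run_stages WORKDIR CMD1 [CMD2 ...]
# Reads text on stdin, runs each command in turn, and keeps every
# intermediate result in WORKDIR/stageN.out for later inspection.
run_stages() {
  workdir=$1
  shift
  n=0
  cat > "$workdir/stage0.out"          # stage 0 is the raw input
  for cmd in "$@"; do
    prev=$n
    n=$((n + 1))
    # Feed the previous stage's file to this stage; stop on the first failure.
    sh -c "$cmd" < "$workdir/stage$prev.out" > "$workdir/stage$n.out" || return 1
  done
  cat "$workdir/stage$n.out"           # final output
}

tmp=$(mktemp -d)
# Stand-in stages; with Matxin these would be the module commands from the pipe above.
result=$(echo "Esto es una prueba" | run_stages "$tmp" "tr 'a-z' 'A-Z'" "sed 's/ /_/g'")
echo "$result"
```

Because every stage writes to its own file, a failing module can be re-run on its saved input without repeating the earlier stages.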
 
 
There is also a program, <code>txt-deformat</code>, which creates an XML file from a plain-text input file. Its calling sequence is <code>txt-deformat format-file input-file</code>. The format file has the format:.... (To be done)
 
example (TBD)
 
 
==Speed==
 
 
Between 25--30 words per second.
 
   
 
==Troubleshooting==
 
==Troubleshooting==
   
  +
;Can't find AP_MKINCLUDE
;libfries
 
   
If you get an error saying "strlen was not declared in this scope", add <code>#include <string.h></code> to the file <code>src/include/fries/language.h</code>.

If you get an error saying "set_union was not declared in this scope", add <code>#include <algorithm></code> to the file <code>src/libfries/RGF.cc</code>.
 
   
Sometimes, people get the error:
 
   
  +
Set your <code>ACLOCAL_PATH</code> to include the path to <code>matxin.m4</code>.
<pre>
 
configure:2668: error: C++ compiler cannot create executables
 
</pre>
 
   
  +
==Documentation==
Try installing libpcre3-dev and trying again.
 
   
  +
* [http://matxin.svn.sourceforge.net/viewvc/matxin/trunk/doc/documentation-es.pdf Descripción del sistema de traducción es-eu Matxin] (in Spanish)
;libomlet
 
  +
* [[Documentation of Matxin]] (in English)
  +
* [[Matxin New Language Pair HOWTO]]
   
  +
==Contact==
If you get an error saying "exit was not declared in this scope", add <code>#include <stdlib.h></code> to the file <code>src/libomlet/adaboost.cc</code>.
 
   
  +
Questions and comments about Matxin can be sent to their mailing list [https://lists.sourceforge.net/lists/listinfo/matxin-devel matxin-devel], or to the [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff] list.
;FreeLing
 
   
  +
==External links==
If you get an error saying "exit was not declared in this scope", add <code>#include <stdlib.h></code> to the files <code>src/utilities/indexdict.cc</code>, <code>src/libmorfo/accents.cc</code>, <code>src/libmorfo/accents_modules.cc</code>, <code>src/libmorfo/dictionary.cc</code>, <code>src/libmorfo/tagger.cc</code>, <code>src/libmorfo/punts.cc</code>, <code>src/libmorfo/maco_options.cc</code>, <code>src/libmorfo/splitter.cc</code>, <code>src/libmorfo/suffixes.cc</code>, <code>src/libmorfo/senses.cc</code> and <code>src/libmorfo/hmm_tagger.cc</code>.
 
   
  +
*[http://ixa.si.ehu.es/Ixa IXA Research Group]
If you get an error saying "strlen was not declared in this scope", add <code>#include <string.h></code> to the files <code>src/libmorfo/automat.cc</code>, <code>src/libmorfo/dates.cc</code>, <code>src/libmorfo/locutions.cc</code>, <code>src/libmorfo/maco.cc</code>, <code>src/libmorfo/np.cc</code>, <code>src/libmorfo/nec.cc</code>, <code>src/libmorfo/numbers.cc</code>, <code>src/libmorfo/numbers_modules.cc</code>, <code>src/libmorfo/quantities.cc</code>, <code>src/libmorfo/tokenizer.cc</code> and <code>src/libmorfo/dates_modules.cc</code>.

If you get an error saying "memcpy was not declared in this scope", add <code>#include <string.h></code> to the files <code>src/include/traces.h</code>, <code>src/libmorfo/dictionary.cc</code>, <code>src/libmorfo/traces.cc</code>, <code>src/libmorfo/senses.cc</code> and <code>src/libmorfo/feature_extractor/fex.cc</code>.

If you get an error saying "set_union was not declared in this scope", add <code>#include <algorithm></code> to the file <code>src/libmorfo/feature_extractor/RGF.cc</code>.
 
 
To add the following lines
 
 
<pre>
 
#include <stdlib.h>
 
#include <string.h>
 
#include <algorithm>
 
</pre>
 
 
at the top of every <code>.cc</code> file in the FreeLing-1.5 directory, you can use the following commands:
 
 
<pre>
 
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ ./configure
 
..
 
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ find . -type f -name "*.cc" | awk '{ print "echo \"#include <stdlib.h>\n#include <string.h>\n\
 
#include <algorithm>\n\" > " $1 ".new && cat " $1 " >> " $1 ".new && mv " $1 ".new " $1 }' > k
 
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ sh k
 
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ make
 
..
 
</pre>
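
The same bulk edit can be written as a plain loop, which may be easier to read than the generated script above. This is a sketch under assumptions: <code>prepend_includes</code> is a hypothetical helper name, it does not handle file names containing newlines, and it is demonstrated on a throwaway directory rather than on the FreeLing sources.

```shell
# prepend_includes DIR
# Put the three #include lines at the top of every .cc file under DIR.
# Note: running it twice prepends the lines twice.
prepend_includes() {
  find "$1" -type f -name '*.cc' | while IFS= read -r f; do
    {
      printf '#include <stdlib.h>\n#include <string.h>\n#include <algorithm>\n'
      cat "$f"
    } > "$f.new" && mv "$f.new" "$f"
  done
}

# Demonstration on a temporary directory, not on the real source tree.
demo=$(mktemp -d)
printf 'int main() { return 0; }\n' > "$demo/a.cc"
prepend_includes "$demo"
first_line=$(head -n 1 "$demo/a.cc")
line_count=$(wc -l < "$demo/a.cc" | tr -d ' ')
```

Writing to a <code>.new</code> file and renaming it afterwards means a source file is never left half-written if the command is interrupted.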
 
 
If you get the error:
 
 
<pre>
 
In file included from analyzer.cc:72:
 
config.h:32:18: error: cfg+.h: No such file or directory
 
</pre>
 
 
Run <code>make</code> as <code>make CXXFLAGS=-I<prefix>/include</code>.
 
 
;Matxin
 
 
If you get the error:
 
 
<pre>
 
g++ -DHAVE_CONFIG_H -I. -I.. -I/usr/local/include -I/usr/local/include/lttoolbox-2.0 -I/usr/include/libxml2 -g -O2 -ansi -march=i686 -O3
 
-fno-pic -fomit-frame-pointer -MT Analyzer.o -MD -MP -MF .deps/Analyzer.Tpo -c -o Analyzer.o Analyzer.C
 
 
--->Analyzer.C:10:22: error: freeling.h: No such file or directory
 
In file included from Analyzer.C:9:
 
config.h: In constructor 'config::config(char**)':
 
config.h:413: warning: deprecated conversion from string constant to 'char*'
 
Analyzer.C: In function 'void PrintResults(std::list<sentence, std::allocator<sentence> >&, const config&, int&)':
 
Analyzer.C:123: error: aggregate 'std::ofstream log_file' has incomplete type and cannot be defined
 
Analyzer.C:126: error: incomplete type 'std::ofstream' used in nested name s...
 
</pre>
 
 
Then change the header files in <code>src/Analyzer.C</code> to:
 
 
<pre>
 
//#include "freeling.h"
 
 
#include "util.h"
 
#include "tokenizer.h"
 
#include "splitter.h"
 
#include "maco.h"
 
#include "nec.h"
 
#include "senses.h"
 
#include "tagger.h"
 
#include "hmm_tagger.h"
 
#include "relax_tagger.h"
 
#include "chart_parser.h"
 
#include "maco_options.h"
 
#include "dependencies.h"
 
</pre>
 
 
Upon finding yourself battling the following compile problem,
 
 
<pre>
 
Analyzer.C: In function ‘int main(int, char**)’:
 
Analyzer.C:226: error: no matching function for call to ‘hmm_tagger::hmm_tagger(std::string, char*&, int&, int&)’
 
/home/fran/local/include/hmm_tagger.h:108: note: candidates are: hmm_tagger::hmm_tagger(const std::string&, const std::string&, bool)
 
/home/fran/local/include/hmm_tagger.h:84: note: hmm_tagger::hmm_tagger(const hmm_tagger&)
 
Analyzer.C:230: error: no matching function for call to ‘relax_tagger::relax_tagger(char*&, int&, double&, double&, int&, int&)’
 
/home/fran/local/include/relax_tagger.h:74: note: candidates are: relax_tagger::relax_tagger(const std::string&, int, double, double, bool)
 
/home/fran/local/include/relax_tagger.h:51: note: relax_tagger::relax_tagger(const relax_tagger&)
 
Analyzer.C:236: error: no matching function for call to ‘senses::senses(char*&, int&)’
 
/home/fran/local/include/senses.h:52: note: candidates are: senses::senses(const std::string&)
 
/home/fran/local/include/senses.h:45: note: senses::senses(const senses&)
 
</pre>
 
 
Make the following changes in the file <code>src/Analyzer.C</code>:
 
 
<pre>
 
if (cfg.TAGGER_which == HMM)
 
- tagger = new hmm_tagger(cfg.Lang, cfg.TAGGER_HMMFile, cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
 
+ tagger = new hmm_tagger(string(cfg.Lang), string(cfg.TAGGER_HMMFile), false);
 
else if (cfg.TAGGER_which == RELAX)
 
- tagger = new relax_tagger(cfg.TAGGER_RelaxFile, cfg.TAGGER_RelaxMaxIter,
 
+ tagger = new relax_tagger(string(cfg.TAGGER_RelaxFile), cfg.TAGGER_RelaxMaxIter,
 
cfg.TAGGER_RelaxScaleFactor, cfg.TAGGER_RelaxEpsilon,
 
- cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
 
+ false);
 
 
if (cfg.NEC_NEClassification)
 
neclass = new nec("NP", cfg.NEC_FilePrefix);
 
 
if (cfg.SENSE_SenseAnnotation!=NONE)
 
- sens = new senses(cfg.SENSE_SenseFile, cfg.SENSE_DuplicateAnalysis);
 
+ sens = new senses(string(cfg.SENSE_SenseFile)); //, cfg.SENSE_DuplicateAnalysis);
 
</pre>
 
 
Then probably there will be issues with actually running Matxin.
 
 
If you get the error:
 
 
<pre>
 
config.h:33:29: error: freeling/traces.h: No such file or directory
 
</pre>
 
 
Then change the header files in <code>src/config.h</code> to:
 
 
<pre>
 
//#include "freeling/traces.h"
 
#include "traces.h"
 
</pre>
 
 
If you get this error:
 
 
<pre>
 
$ echo "Esto es una prueba" | ./Analyzer -f /home/fran/local/share/matxin/config/es-eu.cfg
 
Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 2. Syntax error: Unexpected 'SETS' found.
 
Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 7. Syntax error: Unexpected 'DetFem' found.
 
Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 10. Syntax error: Unexpected 'VerbPron' found.
 
</pre>
 
 
You can change the tagger from RelaxCG to HMM: edit the file <code><prefix>/share/matxin/config/es-eu.cfg</code> and change the tagger options to:
 
 
<pre>
 
#### Tagger options
 
#Tagger=relax
 
Tagger=hmm
 
</pre>
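
The configuration change above can also be scripted. This is a minimal sketch, assuming the file contains a line reading exactly <code>Tagger=relax</code>; it is demonstrated on a temporary stand-in file, not on the real <code>es-eu.cfg</code>.

```shell
# Stand-in for <prefix>/share/matxin/config/es-eu.cfg (hypothetical contents).
cfg=$(mktemp)
printf '#### Tagger options\nTagger=relax\n' > "$cfg"

# Comment out the relax tagger and add an hmm line right below it.
# The backslash-newline inside the replacement is the portable way to insert a newline.
sed 's/^Tagger=relax$/#Tagger=relax\
Tagger=hmm/' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"

cat "$cfg"
```

Keeping the old setting as a comment makes it easy to switch back once the constraint grammar problem is solved.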
 
 
Then there might be a problem in the dependency grammar:
 
 
<pre>
 
$ echo "Esto es una prueba" | ./Analyzer -f /home/fran/local/share/matxin/config/es-eu.cfg
 
DEPENDENCIES: Error reading dependencies from '/home/fran/local//share/matxin/freeling/es/dep/dependences.dat'. Unregistered function d:sn.tonto
 
</pre>
 
 
The easiest thing to do here is to just remove references to the stuff it complains about:
 
 
<pre>
 
cat <prefix>/share/matxin/freeling/es/dep/dependences.dat | grep -v d:grup-sp.lemma > newdep
 
cat newdep | grep -v d\.class > newdep2
 
cat newdep2 | grep -v d:sn.tonto > <prefix>/share/matxin/freeling/es/dep/dependences.dat
 
</pre>
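
The three filtering steps above can be combined into a single pass that also keeps a backup. This is a sketch, not the original procedure: <code>filter_deps</code> is a hypothetical helper, and it matches the three names as fixed strings (<code>grep -F</code>) instead of as regular expressions. It is shown on a temporary file with made-up rule lines.

```shell
# filter_deps FILE
# Remove every line mentioning one of the unregistered functions,
# keeping the original as FILE.bak.
filter_deps() {
  cp "$1" "$1.bak" &&
  grep -F -v -e 'd:grup-sp.lemma' -e 'd.class' -e 'd:sn.tonto' "$1.bak" > "$1"
}

# Demonstration on a stand-in file; the rule lines are invented for illustration.
deps=$(mktemp)
printf 'keep me\nrule d:sn.tonto x\nrule d.class y\nrule d:grup-sp.lemma z\nalso keep\n' > "$deps"
filter_deps "$deps"
remaining=$(wc -l < "$deps" | tr -d ' ')
```

The backup matters here because the original commands overwrite <code>dependences.dat</code> in place, with no way back if too much is removed.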
 
===Error in db===
 
 
If you get:
 
*SEMDB: Error 13 while opening database /usr/local/share/matxin/freeling/es/dep/../senses16.db
 
 
rebuild senses16.db from source:
 
*cat senses16.src | indexdict senses16.db
 
* (remove senses16.db before rebuild)
 
 
===Error when reading xml files===
 
 
If reading XML files does not work and you get an error like:

<i>ERROR: invalid document: found <corpus i> when <corpus> was expected...</i>,

make the following changes in <code>src/XML_reader.cc</code>:
 
 
1. Add the following subroutine after line 43:
 
<pre>
 
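// Convert a multibyte string to a wide string using the current locale.
wstring
mystows(string const &str)
{
  wchar_t* result = new wchar_t[str.size()+1];
  size_t retval = mbstowcs(result, str.c_str(), str.size());
  // mbstowcs returns (size_t)-1 on an invalid multibyte sequence;
  // fall back to an empty string instead of writing out of bounds.
  if (retval == (size_t)-1)
    retval = 0;
  result[retval] = L'\0';
  wstring result2 = result;
  delete[] result;
  return result2;
}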
 
</pre>
 
2. Replace all occurrences of
 
<pre>
 
XMLParseUtil::stows
 
</pre>
 
 
with
 
<pre>
 
mystows
 
</pre>
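
The replacement in step 2 can be done with <code>sed</code> instead of by hand. This is a minimal sketch: <code>rename_stows</code> is a hypothetical helper, and it is demonstrated on a temporary stand-in file with invented contents rather than on <code>src/XML_reader.cc</code> itself.

```shell
# rename_stows FILE
# Replace every XMLParseUtil::stows with mystows, keeping FILE.bak as a backup.
rename_stows() {
  cp "$1" "$1.bak" &&
  sed 's/XMLParseUtil::stows/mystows/g' "$1.bak" > "$1"
}

# Demonstration on a stand-in file.
demo_src=$(mktemp)
printf 'wstring a = XMLParseUtil::stows(x);\nwstring b = XMLParseUtil::stows(y);\n' > "$demo_src"
rename_stows "$demo_src"
count=$(grep -c 'mystows(' "$demo_src")
```

Applied to the real file, the call would be <code>rename_stows src/XML_reader.cc</code>, run from the top of the Matxin source tree.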
 
 
==Results of the individual steps==
 
<pre>
 
--------------------Step1
 
en@anonymous:/usr/local/bin$ echo "Esto es una prueba" | ./Analyzer -f
 
$MATXIN_DIR/share/matxin/config/es-eu.cfg
 
<?xml version='1.0' encoding='UTF-8' ?>
 
<corpus>
 
<SENTENCE ord='1' alloc='0'>
 
<CHUNK ord='2' alloc='5' type='grup-verb' si='top'>
 
<NODE ord='2' alloc='5' form='es' lem='ser' mi='VSIP3S0'>
 
</NODE>
 
<CHUNK ord='1' alloc='0' type='sn' si='subj'>
 
<NODE ord='1' alloc='0' form='Esto' lem='este' mi='PD0NS000'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ord='3' alloc='8' type='sn' si='att'>
 
<NODE ord='4' alloc='12' form='prueba' lem='prueba' mi='NCFS000'>
 
<NODE ord='3' alloc='8' form='una' lem='uno' mi='DI0FS0'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
</corpus>
 
</pre>
 
 
<pre>
 
---------------------Step2
 
[glabaka@siuc05 bin]$ cat /tmp/x | ./LT -f
 
$MATXIN_DIR/share/matxin/config/es-eu.cfg
 
<?xml version='1.0' encoding='UTF-8'?>
 
<corpus >
 
<SENTENCE ref='1' alloc='0'>
 
<CHUNK ref='2' type='adi-kat' alloc='5' si='top'>
 
<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
 
</NODE>
 
<CHUNK ref='1' type='is' alloc='0' si='subj'>
 
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ref='3' type='is' alloc='8' si='att'>
 
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
</corpus>
 
</pre>
 
 
<pre>
 
----------- step3
 
<?xml version='1.0' encoding='UTF-8' ?>
 
<corpus >
 
<SENTENCE ref='1' alloc='0'>
 
<CHUNK ref='2' type='adi-kat' alloc='5' si='top'>
 
<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
 
</NODE>
 
<CHUNK ref='1' type='is' alloc='0' si='subj'>
 
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ref='3' type='is' alloc='8' si='att'>
 
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
 
</corpus>
 
 
-------------STEP4
 
<?xml version='1.0' encoding='UTF-8' ?>
 
<corpus >
 
<SENTENCE ref='1' alloc='0'>
 
<CHUNK ref='2' type='adi-kat' alloc='5' si='top' length='1' trans='DU' cas='[ABS]'>
 
<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
 
</NODE>
 
<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
 
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ref='3' type='is' alloc='8' si='att' length='2' cas='[ABS]'>
 
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
 
</corpus>
 
 
-------------STEP5
 
<?xml version='1.0' encoding='UTF-8' ?>
 
<corpus >
 
<SENTENCE ref='1' alloc='0'>
 
<CHUNK ref='2' type='adi-kat' alloc='5' si='top' length='1' trans='DU' cas='[ABS]'>
 
<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
 
</NODE>
 
<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
 
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ref='3' type='is' alloc='8' si='att' length='2' cas='[ABS]'>
 
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
 
</corpus>
 
 
-------------STEP6
 
<?xml version='1.0' encoding='UTF-8' ?>
 
<corpus >
 
<SENTENCE ref='1' alloc='0'>
 
<CHUNK ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
 
<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
 
</NODE>
 
<CHUNK ref='1' type='is' alloc='0' si='subj' cas='[ERG]' length='1'>
 
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
 
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
 
</corpus>
 
 
-------------STEP7
 
<?xml version='1.0' encoding='UTF-8' ?>
 
<corpus >
 
<SENTENCE ref='1' alloc='0'>
 
<CHUNK ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
 
<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
 
</NODE>
 
<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
 
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
 
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
 
</corpus>
 
 
-------------STEP8
 
<?xml version='1.0' encoding='UTF-8'?>
 
<corpus >
 
<SENTENCE ord='1' ref='1' alloc='0'>
 
<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
 
<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
 
</NODE>
 
<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
 
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
 
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
 
</corpus>
 
 
-------------STEP9
 
<?xml version='1.0' encoding='UTF-8' ?>
 
<corpus >
 
<SENTENCE ord='1' ref='1' alloc='0'>
 
<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
 
<NODE ord='0' ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
 
</NODE>
 
<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
 
<NODE ord='0' ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
 
<NODE ord='0' ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE ord='1' ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
 
</corpus>
 
 
-------------- step10
 
<?xml version='1.0' encoding='UTF-8'?>
 
<corpus >
 
<SENTENCE ord='1' ref='1' alloc='0'>
 
<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
 
<NODE form='da' ref ='2' alloc ='5' ord='0' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
 
</NODE>
 
<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
 
<NODE form='hau' ref ='1' alloc ='0' ord='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
 
</NODE>
 
</CHUNK>
 
<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
 
<NODE form='proba' ref ='4' alloc ='12' ord='0' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
 
<NODE form='bat' ref ='3' alloc ='8' ord='1' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
 
</NODE>
 
</NODE>
 
</CHUNK>
 
</CHUNK>
 
</SENTENCE>
 
 
</corpus>
 
 
-------------STEP11
 
Hau proba bat da
 
 
</pre>
 
 
==Documentation==
 
 
* [http://matxin.svn.sourceforge.net/viewvc/matxin/trunk/doc/documentation-es.pdf Descripción del sistema de traducción es-eu Matxin] (in Spanish)
 
* [[Documentation of Matxin]] (in English)
 
   
 
[[Category:Matxin|*]]
 
[[Category:Matxin|*]]

Latest revision as of 20:29, 7 May 2016

Matxin is a free software machine translation engine related to Apertium. It allows for deeper transfer than can be found in Apertium.

This page describes how to install the system; see Matxin#Documentation below for how to create or maintain language pairs.

Installation

$ git clone https://github.com/matxin/matxin.git
$ cd matxin/
$ ./autogen.sh 
$ make
# make install

Language pairs

Troubleshooting

Can't find AP_MKINCLUDE



Set your ACLOCAL_PATH to include the path to matxin.m4.
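
A minimal sketch of that fix follows. The helper name add_aclocal_dir and the directory /usr/local/share/aclocal are assumptions for illustration; use the directory where matxin.m4 actually lives on your system.

```shell
# add_aclocal_dir DIR
# Prepend DIR to ACLOCAL_PATH unless it is already listed there.
add_aclocal_dir() {
  case ":${ACLOCAL_PATH:-}:" in
    *":$1:"*) ;;                                            # already present
    *) ACLOCAL_PATH="$1${ACLOCAL_PATH:+:$ACLOCAL_PATH}" ;;  # prepend
  esac
  export ACLOCAL_PATH
}

# Start from an empty value for the demonstration only; drop this line in real use.
ACLOCAL_PATH=""
add_aclocal_dir /usr/local/share/aclocal   # hypothetical location of matxin.m4
add_aclocal_dir /usr/local/share/aclocal   # a second call does not add a duplicate
echo "$ACLOCAL_PATH"
```

After exporting the variable, re-run ./autogen.sh in the same shell so that aclocal searches the added directory.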

Documentation

Contact

Questions and comments about Matxin can be sent to their mailing list matxin-devel, or to the apertium-stuff list.

External links