Difference between revisions of "Matxin"

Revision as of 12:18, 5 May 2016

Matxin is a free software machine translation engine related to Apertium. It allows for deeper transfer than Apertium does. The linguistic data available under a free licence is only a fraction of the data used in the papers and descriptions of the subject, so translations from the released language pair will naturally be worse than the results reported in the papers.

This page describes how to install the system; see Matxin#Documentation below for how to create or maintain language pairs.

Contact

Questions and comments about Matxin can be sent to their mailing list matxin-devel, or to the apertium-stuff list.

Installation

Documentation

Descripción del sistema de traducción es-eu Matxin (in Spanish)
Documentation of Matxin (in English)
Matxin New Language Pair HOWTO

External links

@@ Line 8: / Line 8: @@
 Questions and comments about Matxin can be sent to their mailing list [https://lists.sourceforge.net/lists/listinfo/matxin-devel matxin-devel], or to the [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff] list.
-==Prerequisites==
+==Installation==
-===Debian/Ubuntu===
-Install freeling-3.1 from the tarball; prerequisites include:
-<pre>
-sudo apt-get install libboost-system-dev libicu-dev libboost-regex-dev \
-   libboost-program-options-dev libboost-thread-dev
-</pre>
-Add -lboost_system to the dicc2phon_LDADD line in src/utilities/Makefile.am; it should look like:
-<pre>dicc2phon_LDADD = -lfreeling $(FREELING_DEPS) -lboost_system</pre>
-Then <pre>autoreconf -fi
-./configure --prefix=$HOME/PREFIX/freeling
-make
-make install
-</pre>
-Add the [[Debian|nightly repo]] and do
-<pre>
-sudo apt-get install apertium-all-dev foma-bin libfoma0-dev
-</pre>
-Then just
-<pre>
-git clone https://github.com/matxin/matxin
-cd matxin
-export PATH="${PATH}:$HOME/PREFIX/freeling/bin"
-export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$HOME/PREFIX/freeling/lib"
-export PKG_CONFIG_PATH="${PKG_CONFIG_PATH}:$HOME/PREFIX/freeling/share/pkgconfig:$HOME/PREFIX/freeling/lib/pkgconfig"
-export ACLOCAL_PATH="${ACLOCAL_PATH}:$HOME/PREFIX/freeling/share/aclocal"
-autoreconf -fi
-./configure --prefix=$HOME/PREFIX/matxin
-make CPPFLAGS="-I$HOME/PREFIX/freeling/include -I/usr/include -I/usr/include/lttoolbox-3.3 -I/usr/include/libxml2" LDFLAGS="-L$HOME/PREFIX/freeling/lib -L/usr/lib"
-</pre>
-: having to send CPPFLAGS/LDFLAGS to make here seems like an autotools bug?
-=== old prerequisites ===
-* Berkeley DB &mdash; sudo apt-get install libdb4.6++-dev (or libdb4.8++-dev)
-* libpcre3 &mdash; sudo apt-get install libpcre3-dev
-Install the following libraries in <prefix>,
-* libcfg+ &mdash; http://platon.sk/upload/_projects/00003/libcfg+-0.6.2.tar.gz
-* libomlet (from SVN) &mdash; (<code>svn co http://devel.cpl.upc.edu/freeling/svn/latest/omlet</code>)
-* libfries (from SVN) &mdash; (<code>svn co http://devel.cpl.upc.edu/freeling/svn/latest/fries</code>)
-* FreeLing (from SVN) &mdash; (<code>svn co http://devel.cpl.upc.edu/freeling/svn/latest/freeling</code>)
-:If you're installing into a prefix, you'll need to set two environment variables: CPPFLAGS=-I<prefix>/include LDFLAGS=-L<prefix>/lib ./configure --prefix=<prefix>
-* [[lttoolbox]] (from SVN) &mdash; (<code>svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox</code>) Use version 3.1.1 at minimum; version 3.1.0 and lower cause data errors and error messages in Matxin due to a missing string close.
-==Building==
-;Checkout
-<pre>
-$ svn co http://matxin.svn.sourceforge.net/svnroot/matxin
-</pre>
-Then do the usual:
-<pre>
-$ ./configure --prefix=<prefix>
-$ make
-</pre>
-After you've got it built, do:
-<pre>
-$ su
-# export LD_LIBRARY_PATH=/usr/local/lib
-# export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
-# make install
-</pre>
-===Mac OS X===
-If you've installed boost etc. with Macports, for the configure step do:
- env LDFLAGS="-L/opt/local/lib -L/opt/local/lib/db46" CPPFLAGS="-I/opt/local/include -I/opt/local/include/db46 -I/path/to/freeling/libcfg+" ./configure
-(their configure script doesn't complain if it can't find db46 or cfg+.h, but make does)
-Also, comment out any references to {txt,html,rtf}-deformat.cc in src/Makefile.am and change data/Makefile.am so that you use gzcat instead of zcat.
-== Executing ==
-The default for <code>MATXIN_DIR</code>, if you have not specified a prefix, is <code>/usr/local/bin</code>; in that case you should <code>cd /usr/local/bin</code> to run the tests.
-Bundled with Matxin there's a script called <code>Matxin_translator</code> which calls all the necessary modules and interconnects them using UNIX pipes. This is the recommended way of running Matxin for getting translations.
-<pre>
-$ echo "Esto es una prueba" | ./Matxin_translator -c $MATXIN_DIR/share/matxin/config/es-eu.cfg
-</pre>
-The txt-deformat program is called as: txt-deformat format-file input-file. It creates an XML file from a plain-text input file and can be used before ./Analyzer.
-txt-deformat is a plain-text format processor. Data should be passed through it before being piped to ./Analyzer.
-Calling it with -h or --help displays help information.
-You could write the following to show how the word "gener" is analysed:
- echo "gener" | ./txt-deformat | ./Analyzer -f $MATXIN_DIR/share/matxin/config/es-eu.cfg
-For advanced uses, you can run each part of the pipe separately and save the output to temporary files for feeding the next modules.
-=== Spanish-Basque ===
-<prefix> is typically /usr/local
-<pre>
-$ export MATXIN_DIR=<prefix>
-$ echo "Esto es una prueba" |  \
-./Analyzer -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./LT -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./ST_intra -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./ST_prep -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./ST_verb   -f $MATXIN_DIR/share/matxin/config/es-eu.cfg  | \
-./ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./SG_inter -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./SG_intra -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./MG -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
-./reFormat
-Da proba bat hau
-</pre>
-=== English-Basque ===
-The above example, adapted for English-Basque, looks like this:
-<pre>
-$ cat src/matxin_allen.sh
-src/Analyzer -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/LT -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/ST_intra -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/ST_prep -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/ST_verb   -f $MATXIN_DIR/share/matxin/config/en-eu.cfg  | \
-src/ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/SG_inter -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/SG_intra -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/MG -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
-src/reFormat
-$ echo "This is a test" |  sh src/matxin_allen.sh
-Hau proba da
-$ echo "How are you?" |  sh src/matxin_allen.sh
-Nola zu da?
-$ echo "Otto plays football and tennis" | sh src/matxin_allen.sh
-Otto-ak jokatzen du futbola tenis-a eta
-</pre>
-==Speed==
-Between 25 and 30 words per second.
-==Troubleshooting==
-===libdb===
-<pre>
-g++  -g -O2 -ansi -march=i686 -O3 -fno-pic
--fomit-frame-pointer  -L/usr/local/lib -L/usr/lib
--o Analyzer Analyzer.o IORedirectHandler.o -lmorfo -lcfg+ -ldb_cxx -lfries -lomlet
--lboost_filesystem -L/usr/local/lib -llttoolbox3 -lxml2 -lpcre
-/usr/local/lib/libmorfo.so: undefined reference to `Db::set_partition_dirs(char const**)
-            [and a lot of similar lines]
-</pre>
-Try installing libdb4.8++-dev[http://sourceforge.net/mailarchive/forum.php?thread_name=1313552553.4706.7316.camel%40eki.dlsi.ua.es&forum_name=matxin-devel]
-===libcfg+===
-If you get the following error:
-<pre>
-ld: ../src/cfg+.o: relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
-</pre>
-Delete the directory and start from scratch; this time, call make as <code>make CFLAGS=-fPIC</code>.
-===Various errors===
-If you get the error:
-<pre>
-g++ -DHAVE_CONFIG_H -I. -I..   -I/usr/local/include -I/usr/local/include/lttoolbox-2.0 -I/usr/include/libxml2  -g -O2 -ansi -march=i686 -O3
--fno-pic              -fomit-frame-pointer -MT Analyzer.o -MD -MP -MF .deps/Analyzer.Tpo -c -o Analyzer.o Analyzer.C
---->Analyzer.C:10:22: error: freeling.h: Datei oder Verzeichnis nicht gefunden
- In file included from Analyzer.C:9:
- config.h: In constructor 'config::config(char**)':
- config.h:413: warning: deprecated conversion from string constant to 'char*'
- Analyzer.C: In function 'void PrintResults(std::list<sentence, std::allocator<sentence> >&, const config&, int&)':
- Analyzer.C:123: error: aggregate 'std::ofstream log_file' has incomplete type and cannot be defined
- Analyzer.C:126: error: incomplete type 'std::ofstream' used in nested name s...
-</pre>
-Then change the header files in <code>src/Analyzer.C</code> to:
-<pre>
-//#include "freeling.h"
-#include "util.h"
-#include "tokenizer.h"
-#include "splitter.h"
-#include "maco.h"
-#include "nec.h"
-#include "senses.h"
-#include "tagger.h"
-#include "hmm_tagger.h"
-#include "relax_tagger.h"
-#include "chart_parser.h"
-#include "maco_options.h"
-#include "dependencies.h"
-</pre>
-Upon finding yourself battling the following compile problem,
-<pre>
-Analyzer.C: In function ‘int main(int, char**)’:
-Analyzer.C:226: error: no matching function for call to ‘hmm_tagger::hmm_tagger(std::string, char*&, int&, int&)’
-/home/fran/local/include/hmm_tagger.h:108: note: candidates are: hmm_tagger::hmm_tagger(const std::string&, const std::string&, bool)
-/home/fran/local/include/hmm_tagger.h:84: note:                 hmm_tagger::hmm_tagger(const hmm_tagger&)
-Analyzer.C:230: error: no matching function for call to ‘relax_tagger::relax_tagger(char*&, int&, double&, double&, int&, int&)’
-/home/fran/local/include/relax_tagger.h:74: note: candidates are: relax_tagger::relax_tagger(const std::string&, int, double, double, bool)
-/home/fran/local/include/relax_tagger.h:51: note:                 relax_tagger::relax_tagger(const relax_tagger&)
-Analyzer.C:236: error: no matching function for call to ‘senses::senses(char*&, int&)’
-/home/fran/local/include/senses.h:52: note: candidates are: senses::senses(const std::string&)
-/home/fran/local/include/senses.h:45: note:                 senses::senses(const senses&)
-</pre>
-Make the following changes in the file <code>src/Analyzer.C</code>:
-<pre>
-   if (cfg.TAGGER_which == HMM)
--    tagger = new hmm_tagger(cfg.Lang, cfg.TAGGER_HMMFile, cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
-+    tagger = new hmm_tagger(string(cfg.Lang), string(cfg.TAGGER_HMMFile), false);
-   else if (cfg.TAGGER_which == RELAX)
--    tagger = new relax_tagger(cfg.TAGGER_RelaxFile, cfg.TAGGER_RelaxMaxIter,
-+    tagger = new relax_tagger(string(cfg.TAGGER_RelaxFile), cfg.TAGGER_RelaxMaxIter,
- 			      cfg.TAGGER_RelaxScaleFactor, cfg.TAGGER_RelaxEpsilon,
--			      cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
-+			      false);
-   if (cfg.NEC_NEClassification)
-     neclass = new nec("NP", cfg.NEC_FilePrefix);
-   if (cfg.SENSE_SenseAnnotation!=NONE)
--    sens = new senses(cfg.SENSE_SenseFile, cfg.SENSE_DuplicateAnalysis);
-+    sens = new senses(string(cfg.SENSE_SenseFile)); //, cfg.SENSE_DuplicateAnalysis);
-</pre>
-Then probably there will be issues with actually running Matxin.
-If you get the error:
-<pre>
-config.h:33:29: error: freeling/traces.h: No such file or directory
-</pre>
-Then change the header files in <code>src/config.h</code> to:
-<pre>
-//#include "freeling/traces.h"
-#include "traces.h"
-</pre>
-If you get this error:
-<pre>
-$ echo "Esto es una prueba" | ./Analyzer -f /home/fran/local/share/matxin/config/es-eu.cfg
-Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 2. Syntax error: Unexpected 'SETS' found.
-Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 7. Syntax error: Unexpected 'DetFem' found.
-Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 10. Syntax error: Unexpected 'VerbPron' found.
-</pre>
-You can switch the tagger from RelaxCG to HMM: edit the file <code><prefix>/share/matxin/config/es-eu.cfg</code> and change:
-<pre>
-#### Tagger options
-#Tagger=relax
-Tagger=hmm
-</pre>
-Then there might be a problem in the dependency grammar:
-<pre>
-$ echo "Esto es una prueba" | ./Analyzer -f /home/fran/local/share/matxin/config/es-eu.cfg
-DEPENDENCIES: Error reading dependencies from '/home/fran/local//share/matxin/freeling/es/dep/dependences.dat'. Unregistered function d:sn.tonto
-</pre>
-The easiest thing to do here is to just remove references to the stuff it complains about:
-<pre>
-cat <prefix>/share/matxin/freeling/es/dep/dependences.dat | grep -v d:grup-sp.lemma > newdep
-cat newdep | grep -v d\.class > newdep2
-cat newdep2 | grep -v d:sn.tonto > <prefix>/share/matxin/freeling/es/dep/dependences.dat
-</pre>
-===Error in db===
-If you get:
-*SEMDB: Error 13 while opening database /usr/local/share/matxin/freeling/es/dep/../senses16.db
-rebuild senses16.db from source:
-*cat senses16.src | indexdict senses16.db
-* (remove senses16.db before rebuild)
-===Error when reading xml files===
-If reading XML files does not work and you get an error like:
-<i>ERROR: invalid document: found <corpus i> when <corpus> was expected...</i>,
-make the following changes in src/XML_reader.cc:
-. add the following subroutine after line 43:
-<pre>
-wstring
-mystows(string const &str)
-{
-   wchar_t* result = new wchar_t[str.size()+1];
-   size_t retval = mbstowcs(result, str.c_str(), str.size());
-   result[retval] = L'\0';
-   wstring result2 = result;
-   delete[] result;
-   return result2;
-}
-</pre>
-. replace all occurrences of
-<pre>
-XMLParseUtil::stows
-</pre>
-with
-<pre>
-mystows
-</pre>
-Version 3.1.1 of lttoolbox does not have this error any more.
-==Results of the individual steps==
-<pre>
---------------------Step1
-en@anonymous:/usr/local/bin$ echo "Esto es una prueba" | ./Analyzer -f
-$MATXIN_DIR/share/matxin/config/es-eu.cfg
-<?xml version='1.0' encoding='UTF-8' ?>
-<corpus>
-<SENTENCE ord='1' alloc='0'>
-<CHUNK ord='2' alloc='5' type='grup-verb' si='top'>
-  <NODE ord='2' alloc='5' form='es' lem='ser' mi='VSIP3S0'>
-  </NODE>
-  <CHUNK ord='1' alloc='0' type='sn' si='subj'>
-    <NODE ord='1' alloc='0' form='Esto' lem='este' mi='PD0NS000'>
-    </NODE>
-  </CHUNK>
-  <CHUNK ord='3' alloc='8' type='sn' si='att'>
-    <NODE ord='4' alloc='12' form='prueba' lem='prueba' mi='NCFS000'>
-      <NODE ord='3' alloc='8' form='una' lem='uno' mi='DI0FS0'>
-      </NODE>
-    </NODE>
-  </CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
-</pre>
-<pre>
----------------------Step2
-[glabaka@siuc05 bin]$ cat /tmp/x | ./LT -f
-$MATXIN_DIR/share/matxin/config/es-eu.cfg
-<?xml version='1.0' encoding='UTF-8'?>
-<corpus >
-  <SENTENCE ref='1' alloc='0'>
-    <CHUNK ref='2' type='adi-kat' alloc='5' si='top'>
-       <NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0'  pos='[ADI][SIN]'>
-       </NODE>
-      <CHUNK ref='1' type='is' alloc='0' si='subj'>
-         <NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-         </NODE>
-      </CHUNK>
-      <CHUNK ref='3' type='is' alloc='8' si='att'>
-         <NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]'  mi='[NUMS]' sem='[BIZ-]'>
-           <NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-           </NODE>
-         </NODE>
-      </CHUNK>
-    </CHUNK>
-  </SENTENCE>
-</corpus>
-</pre>
-<pre>
------------ step3
-<?xml version='1.0' encoding='UTF-8' ?>
-<corpus >
-<SENTENCE ref='1' alloc='0'>
-<CHUNK ref='2' type='adi-kat' alloc='5' si='top'>
-<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
-</NODE>
-<CHUNK ref='1' type='is' alloc='0' si='subj'>
-<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-</NODE>
-</CHUNK>
-<CHUNK ref='3' type='is' alloc='8' si='att'>
-<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
-<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-</NODE>
-</NODE>
-</CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
--------------STEP4
-<?xml version='1.0' encoding='UTF-8' ?>
-<corpus >
-<SENTENCE ref='1' alloc='0'>
-<CHUNK ref='2' type='adi-kat' alloc='5' si='top' length='1' trans='DU' cas='[ABS]'>
-<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
-</NODE>
-<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
-<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-</NODE>
-</CHUNK>
-<CHUNK ref='3' type='is' alloc='8' si='att' length='2' cas='[ABS]'>
-<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
-<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-</NODE>
-</NODE>
-</CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
--------------STEP5
-<?xml version='1.0' encoding='UTF-8' ?>
-<corpus >
-<SENTENCE ref='1' alloc='0'>
-<CHUNK ref='2' type='adi-kat' alloc='5' si='top' length='1' trans='DU' cas='[ABS]'>
-<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
-</NODE>
-<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
-<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-</NODE>
-</CHUNK>
-<CHUNK ref='3' type='is' alloc='8' si='att' length='2' cas='[ABS]'>
-<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
-<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-</NODE>
-</NODE>
-</CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
--------------STEP6
-<?xml version='1.0' encoding='UTF-8' ?>
-<corpus >
-<SENTENCE ref='1' alloc='0'>
-<CHUNK ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
-<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
-</NODE>
-<CHUNK ref='1' type='is' alloc='0' si='subj' cas='[ERG]' length='1'>
-<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-</NODE>
-</CHUNK>
-<CHUNK ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
-<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
-<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-</NODE>
-</NODE>
-</CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
--------------STEP7
-<?xml version='1.0' encoding='UTF-8' ?>
-<corpus >
-<SENTENCE ref='1' alloc='0'>
-<CHUNK ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
-<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
-</NODE>
-<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
-<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-</NODE>
-</CHUNK>
-<CHUNK ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
-<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
-<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-</NODE>
-</NODE>
-</CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
--------------STEP8
-<?xml version='1.0' encoding='UTF-8'?>
-<corpus >
-<SENTENCE ord='1' ref='1' alloc='0'>
-<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
-<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
-</NODE>
-<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
-<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-</NODE>
-</CHUNK>
-<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
-<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
-<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-</NODE>
-</NODE>
-</CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
--------------STEP9
-<?xml version='1.0' encoding='UTF-8' ?>
-<corpus >
-<SENTENCE ord='1' ref='1' alloc='0'>
-<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
-<NODE ord='0' ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
-</NODE>
-<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
-<NODE ord='0' ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-</NODE>
-</CHUNK>
-<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
-<NODE ord='0' ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
-<NODE ord='1' ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-</NODE>
-</NODE>
-</CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
--------------- step10
-<?xml version='1.0' encoding='UTF-8'?>
-<corpus >
-<SENTENCE ord='1' ref='1' alloc='0'>
-<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
-<NODE form='da' ref ='2' alloc ='5' ord='0' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
-</NODE>
-<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
-<NODE form='hau' ref ='1' alloc ='0' ord='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
-</NODE>
-</CHUNK>
-<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
-<NODE form='proba' ref ='4' alloc ='12' ord='0' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
-<NODE form='bat' ref ='3' alloc ='8' ord='1' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
-</NODE>
-</NODE>
-</CHUNK>
-</CHUNK>
-</SENTENCE>
-</corpus>
--------------STEP11
-Hau proba bat da
-</pre>
 ==Documentation==
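
The removed Executing section passed the same -f configuration file to every module in the pipeline. As a minimal sketch (not part of Matxin), the shell function below only prints that pipeline for a given configuration file. The function name matxin_pipeline and its optional bin_dir argument are hypothetical; the module names, their order and the flags are copied from the Spanish-Basque example in the removed text.

```shell
#!/bin/sh
# Sketch only: prints (does not run) the module pipeline from the removed
# "Executing" section. matxin_pipeline and bin_dir are hypothetical names;
# module names, order and flags are copied from the page.
matxin_pipeline() {
    cfg=$1            # e.g. $MATXIN_DIR/share/matxin/config/es-eu.cfg
    bin_dir=${2:-.}   # directory holding the module binaries (default: .)
    pipeline=""
    # Every stage except reFormat receives the same "-f <config>" argument.
    for stage in "Analyzer" "LT" "ST_intra" "ST_inter --inter 1" "ST_prep" \
                 "ST_inter --inter 2" "ST_verb" "ST_inter --inter 3" \
                 "SG_inter" "SG_intra" "MG"; do
        pipeline="${pipeline}${bin_dir}/${stage} -f ${cfg} | "
    done
    # reFormat is the last stage and takes no config flag on the page.
    printf '%s\n' "${pipeline}${bin_dir}/reFormat"
}
```

Assuming the modules are built and MATXIN_DIR is set, the printed command could be run with something like <code>echo "Esto es una prueba" | eval "$(matxin_pipeline "$MATXIN_DIR/share/matxin/config/es-eu.cfg")"</code>. Paths containing spaces would need extra quoting, which this sketch does not handle.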