{{TOCD}}
'''Matxin''' is a free software machine translation engine related to [[Apertium]]. It allows for deeper transfer than can be found in Apertium.

This page describes how to install the system; see [[Matxin#Documentation]] below for how to create or maintain language pairs.
==Contact==

Questions and comments about Matxin can be sent to the [https://lists.sourceforge.net/lists/listinfo/matxin-devel matxin-devel] mailing list, or to the [https://lists.sourceforge.net/lists/listinfo/apertium-stuff apertium-stuff] list.
==Prerequisites==

* BerkeleyDB — <code>sudo apt-get install libdb4.6++-dev</code>
* libpcre3 — <code>sudo apt-get install libpcre3-dev</code>

Install the following libraries in <prefix>:

* libcfg+ — http://platon.sk/upload/_projects/00003/libcfg+-0.6.2.tar.gz
* libomlet — https://lafarga.cpl.upc.edu/frs/download.php/130/libomlet-0.97.tar.gz
* libfries — https://lafarga.cpl.upc.edu/frs/download.php/129/libfries-0.95.tar.gz
* FreeLing (from SVN) — <code>svn co http://devel.cpl.upc.edu/freeling/svn/latest/freeling</code>
: If you're installing into a prefix, you'll need to set two environment variables when you configure: <code>CPPFLAGS=-I<prefix>/include LDFLAGS=-L<prefix>/lib ./configure --prefix=<prefix></code>
* [[lttoolbox]] (from SVN) — <code>svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox</code>. Use version 3.1.1 or later; 3.1.0 and earlier cause data errors and error messages in Matxin due to a missing string close.
− | |||
− | ==Building== |
||
− | |||
− | ;Checkout |
||
− | |||
− | <pre> |
||
− | $ svn co http://matxin.svn.sourceforge.net/svnroot/matxin |
||
− | </pre> |
||
+ | ==Installation== |
||
− | Then do the usual: |
||
<pre> |
<pre> |
||
+ | $ git clone https://github.com/matxin/matxin.git |
||
− | $ ./configure --prefix=<prefix> |
||
+ | $ cd matxin/ |
||
+ | $ ./autogen.sh |
||
$ make |
$ make |
||
− | </pre> |
||
− | |||
− | After you've got it built, do: |
||
− | |||
− | <pre> |
||
− | $ su |
||
− | # export LD_LIBRARY_PATH=/usr/local/lib |
||
− | # export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig |
||
# make install |
# make install |
||
</pre> |
</pre> |
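If you configure with a non-standard <code>--prefix</code>, the shell also needs to find the installed binaries and libraries at run time. A sketch, assuming an example prefix of <code>$HOME/local</code>:

```shell
# Point the environment at a Matxin installed under a custom prefix.
PREFIX=$HOME/local    # assumed example prefix; substitute your own
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PKG_CONFIG_PATH="$PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
```

Putting these lines in your shell profile saves exporting them in every session.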
||
==Language pairs==

* [[matxin-spa-eus]]
* [[matxin-eng-eus]]

==Testing==

If you have not specified a prefix, the default is <code>/usr/local</code>, so the modules end up in <code>/usr/local/bin</code>; <code>cd</code> there to run the tests below.
− | |||
− | Bundled with Matxin there's a script called <code>Matxin_translator</code> which calls all the necessary modules and interconnects them using UNIX pipes. This is the recommended way of running Matxin for getting translations. <b>This does not work in the given form.</b> |
||
− | |||
− | <pre> |
||
− | $ echo "Esto es una prueba" | ./Matxin_translator -f $MATXIN_DIR/share/matxin/config/es-eu.cfg |
||
− | </pre> |
||
− | |||
− | There exists a program txt-deformat calling sequence: txt-deformat format-file input-file. txt-deformat creates an xml file from a normal txt input file. This can be used before ./Analyzer. |
||
− | |||
− | txt-deformat is an HTML format processor. Data should be passed through this |
||
− | processor before being piped to /Analyzer. The program takes input |
||
− | in the form of an HTML document and produces output suitable for |
||
− | processing with lt-proc. HTML tags and other format information are |
||
− | enclosed in brackets so that lt-proc treats them as whitespace between |
||
− | words. |
||
− | |||
− | Calling it with -h or --help displays help information. |
||
− | You could write the following to show how the word "gener" is analysed: |
||
− | |||
− | echo "<b>gener</b>" | ./txt-deformat | ./Analyzer -f $MATXIN_DIR/share/matxin/config/es-eu.cfg |
||
− | |||
− | For advanced uses, you can run each part of the pipe separately and save the output to temporary files for feeding the next modules. <b>At the moment this is the method of choice</b> |
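For example, to run just the first stage on its own and keep its output in a temporary file before feeding the next module (a sketch; it assumes you are in the directory containing the modules and that <code>MATXIN_DIR</code> is set as described above):

```shell
# Run the Analyzer on its own, inspect the XML, then feed it to LT.
CFG="$MATXIN_DIR/share/matxin/config/es-eu.cfg"
if [ -x ./Analyzer ]; then
    echo "Esto es una prueba" | ./Analyzer -f "$CFG" > /tmp/step1.xml
    head -n 5 /tmp/step1.xml                      # peek at the chunk/node analysis
    ./LT -f "$CFG" < /tmp/step1.xml > /tmp/step2.xml   # feed the next module
else
    echo "Analyzer not found here; build and install Matxin first"
fi
```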
||
===Spanish-Basque===

<prefix> is typically /usr/local.

<pre>
$ export MATXIN_DIR=<prefix>
$ echo "Esto es una prueba" | \
./Analyzer -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./LT -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./ST_intra -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./ST_prep -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./ST_verb -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./SG_inter -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./SG_intra -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./MG -f $MATXIN_DIR/share/matxin/config/es-eu.cfg | \
./reFormat

Da proba bat hau
</pre>
||
===English-Basque===

The same example for English-Basque looks like this:

<pre>
$ cat src/matxin_allen.sh
src/Analyzer -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/LT -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/ST_intra -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/ST_inter --inter 1 -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/ST_prep -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/ST_inter --inter 2 -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/ST_verb -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/ST_inter --inter 3 -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/SG_inter -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/SG_intra -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/MG -f $MATXIN_DIR/share/matxin/config/en-eu.cfg | \
src/reFormat

$ echo "This is a test" | sh src/matxin_allen.sh
Hau proba da

$ echo "How are you?" | sh src/matxin_allen.sh
Nola zu da?

$ echo "Otto plays football and tennis" | sh src/matxin_allen.sh
Otto-ak jokatzen du futbola tenis-a eta
</pre>
||
− | |||
− | ==Speed== |
||
− | |||
− | Between 25--30 words per second. |
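A rough way to check the speed on your own machine (a sketch; <code>matxin_allen.sh</code> is the script name from the example above, and <code>test.txt</code> is an assumed input file):

```shell
# Estimate translation speed in words per second.
words=$(wc -w 2>/dev/null < test.txt || echo 0)
start=$(date +%s)
[ -f src/matxin_allen.sh ] && sh src/matxin_allen.sh < test.txt > /dev/null 2>/dev/null
end=$(date +%s)
secs=$((end - start))
[ "$secs" -eq 0 ] && secs=1    # round up so tiny inputs don't divide by zero
echo "$words words in ${secs}s = $((words / secs)) words/s"
```

A larger test file gives a more stable figure, since start-up costs dominate on short inputs.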
||
==Troubleshooting==

;Can't find AP_MKINCLUDE
Set your <code>ACLOCAL_PATH</code> to include the path to <code>matxin.m4</code>.

;libfries
If you get an error to do with "strlen was not declared in this scope", add <code>#include <string.h></code> to the file <code>src/include/fries/language.h</code>.

If you get an error to do with "set_union was not declared in this scope", add <code>#include <algorithm></code> to the file <code>src/libfries/RGF.cc</code>.

Sometimes, people get the error:

<pre>
configure:2668: error: C++ compiler cannot create executables
</pre>

Try installing libpcre3-dev and running <code>./configure</code> again.

;libomlet
If you get an error to do with "exit was not declared in this scope", add <code>#include <stdlib.h></code> to the file <code>src/libomlet/adaboost.cc</code>.

;FreeLing
If you get an error to do with "exit was not declared in this scope", add <code>#include <stdlib.h></code> to the files <code>src/utilities/indexdict.cc</code>, <code>src/libmorfo/accents.cc</code>, <code>src/libmorfo/accents_modules.cc</code>, <code>src/libmorfo/dictionary.cc</code>, <code>src/libmorfo/tagger.cc</code>, <code>src/libmorfo/punts.cc</code>, <code>src/libmorfo/maco_options.cc</code>, <code>src/libmorfo/splitter.cc</code>, <code>src/libmorfo/suffixes.cc</code>, <code>src/libmorfo/senses.cc</code> and <code>src/libmorfo/hmm_tagger.cc</code>.

If you get an error to do with "strlen was not declared in this scope", add <code>#include <string.h></code> to the files <code>src/libmorfo/automat.cc</code>, <code>src/libmorfo/dates.cc</code>, <code>src/libmorfo/locutions.cc</code>, <code>src/libmorfo/maco.cc</code>, <code>src/libmorfo/np.cc</code>, <code>src/libmorfo/nec.cc</code>, <code>src/libmorfo/numbers.cc</code>, <code>src/libmorfo/numbers_modules.cc</code>, <code>src/libmorfo/quantities.cc</code>, <code>src/libmorfo/tokenizer.cc</code> and <code>src/libmorfo/dates_modules.cc</code>.

If you get an error to do with "memcpy was not declared in this scope", add <code>#include <string.h></code> to the files <code>src/include/traces.h</code>, <code>src/libmorfo/dictionary.cc</code>, <code>src/libmorfo/traces.cc</code>, <code>src/libmorfo/senses.cc</code> and <code>src/libmorfo/feature_extractor/fex.cc</code>.

If you get an error to do with "set_union was not declared in this scope", add <code>#include <algorithm></code> to the file <code>src/libmorfo/feature_extractor/RGF.cc</code>.

To add the lines

<pre>
#include <stdlib.h>
#include <string.h>
#include <algorithm>
</pre>

to the top of every .cc file in the FreeLing-1.5 directory, you can use the following commands:

<pre>
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ ./configure
..
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ find . -type f -name "*.cc" | awk '{ print "echo \"#include <stdlib.h>\n#include <string.h>\n\
#include <algorithm>\n\" > " $1 ".new && cat " $1 " >> " $1 ".new && mv " $1 ".new " $1 }' > k
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ sh k
pasquale@dell:~/stuff/matxin/FreeLing-1.5$ make
..
</pre>

If you get the error:

<pre>
In file included from analyzer.cc:72:
config.h:32:18: error: cfg+.h: No such file or directory
</pre>

run <code>make</code> as <code>make CXXFLAGS=-I<prefix>/include</code>.
− | |||
;Matxin
If you get the error:

<pre>
g++ -DHAVE_CONFIG_H -I. -I.. -I/usr/local/include -I/usr/local/include/lttoolbox-2.0 -I/usr/include/libxml2 -g -O2 -ansi -march=i686 -O3
-fno-pic -fomit-frame-pointer -MT Analyzer.o -MD -MP -MF .deps/Analyzer.Tpo -c -o Analyzer.o Analyzer.C

Analyzer.C:10:22: error: freeling.h: No such file or directory
In file included from Analyzer.C:9:
config.h: In constructor 'config::config(char**)':
config.h:413: warning: deprecated conversion from string constant to 'char*'
Analyzer.C: In function 'void PrintResults(std::list<sentence, std::allocator<sentence> >&, const config&, int&)':
Analyzer.C:123: error: aggregate 'std::ofstream log_file' has incomplete type and cannot be defined
Analyzer.C:126: error: incomplete type 'std::ofstream' used in nested name s...
</pre>

change the header includes in <code>src/Analyzer.C</code> to:

<pre>
//#include "freeling.h"

#include "util.h"
#include "tokenizer.h"
#include "splitter.h"
#include "maco.h"
#include "nec.h"
#include "senses.h"
#include "tagger.h"
#include "hmm_tagger.h"
#include "relax_tagger.h"
#include "chart_parser.h"
#include "maco_options.h"
#include "dependencies.h"
</pre>

If you find yourself battling the following compile problem:

<pre>
Analyzer.C: In function ‘int main(int, char**)’:
Analyzer.C:226: error: no matching function for call to ‘hmm_tagger::hmm_tagger(std::string, char*&, int&, int&)’
/home/fran/local/include/hmm_tagger.h:108: note: candidates are: hmm_tagger::hmm_tagger(const std::string&, const std::string&, bool)
/home/fran/local/include/hmm_tagger.h:84: note: hmm_tagger::hmm_tagger(const hmm_tagger&)
Analyzer.C:230: error: no matching function for call to ‘relax_tagger::relax_tagger(char*&, int&, double&, double&, int&, int&)’
/home/fran/local/include/relax_tagger.h:74: note: candidates are: relax_tagger::relax_tagger(const std::string&, int, double, double, bool)
/home/fran/local/include/relax_tagger.h:51: note: relax_tagger::relax_tagger(const relax_tagger&)
Analyzer.C:236: error: no matching function for call to ‘senses::senses(char*&, int&)’
/home/fran/local/include/senses.h:52: note: candidates are: senses::senses(const std::string&)
/home/fran/local/include/senses.h:45: note: senses::senses(const senses&)
</pre>

make the following changes in the file <code>src/Analyzer.C</code>:

<pre>
 if (cfg.TAGGER_which == HMM)
-  tagger = new hmm_tagger(cfg.Lang, cfg.TAGGER_HMMFile, cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
+  tagger = new hmm_tagger(string(cfg.Lang), string(cfg.TAGGER_HMMFile), false);
 else if (cfg.TAGGER_which == RELAX)
-  tagger = new relax_tagger(cfg.TAGGER_RelaxFile, cfg.TAGGER_RelaxMaxIter,
+  tagger = new relax_tagger(string(cfg.TAGGER_RelaxFile), cfg.TAGGER_RelaxMaxIter,
            cfg.TAGGER_RelaxScaleFactor, cfg.TAGGER_RelaxEpsilon,
-           cfg.TAGGER_Retokenize, cfg.TAGGER_ForceSelect);
+           false);

 if (cfg.NEC_NEClassification)
   neclass = new nec("NP", cfg.NEC_FilePrefix);

 if (cfg.SENSE_SenseAnnotation!=NONE)
-  sens = new senses(cfg.SENSE_SenseFile, cfg.SENSE_DuplicateAnalysis);
+  sens = new senses(string(cfg.SENSE_SenseFile)); //, cfg.SENSE_DuplicateAnalysis);
</pre>
||
− | |||
− | Then probably there will be issues with actually running Matxin. |
||
− | |||
− | If you get the error: |
||
− | |||
− | <pre> |
||
− | config.h:33:29: error: freeling/traces.h: No such file or directory |
||
− | </pre> |
||
− | |||
− | Then change the header files in <code>src/config.h</code> to: |
||
− | |||
− | <pre> |
||
− | //#include "freeling/traces.h" |
||
− | #include "traces.h" |
||
− | </pre> |
||
− | |||
− | If you get this error: |
||
− | |||
− | <pre> |
||
− | $ echo "Esto es una prueba" | ./Analyzer -f /home/fran/local/share/matxin/config/es-eu.cfg |
||
− | Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 2. Syntax error: Unexpected 'SETS' found. |
||
− | Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 7. Syntax error: Unexpected 'DetFem' found. |
||
− | Constraint Grammar '/home/fran/local//share/matxin/freeling/es/constr_gram.dat'. Line 10. Syntax error: Unexpected 'VerbPron' found. |
||
− | </pre> |
||
− | |||
− | You can change the tagger from the RelaxCG to HMM, edit the file <code><prefix>/share/matxin/config/es-eu.cfg</code>, and change: |
||
− | |||
− | <pre> |
||
− | #### Tagger options |
||
− | #Tagger=relax |
||
− | Tagger=hmm |
||
− | </pre> |
||
− | |||
− | Then there might be a problem in the dependency grammar: |
||
− | |||
− | <pre> |
||
− | $ echo "Esto es una prueba" | ./Analyzer -f /home/fran/local/share/matxin/config/es-eu.cfg |
||
− | DEPENDENCIES: Error reading dependencies from '/home/fran/local//share/matxin/freeling/es/dep/dependences.dat'. Unregistered function d:sn.tonto |
||
− | </pre> |
||
− | |||
− | The easiest thing to do here is to just remove references to the stuff it complains about: |
||
− | |||
− | <pre> |
||
− | cat <prefix>/share/matxin/freeling/es/dep/dependences.dat | grep -v d:grup-sp.lemma > newdep |
||
− | cat newdep | grep -v d\.class > newdep2 |
||
− | cat newdep2 | grep -v d:sn.tonto > <prefix>/share/matxin/freeling/es/dep/dependences.dat |
||
− | </pre> |
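The three passes can be collapsed into a single <code>grep</code> invocation; a sketch (it writes through a temporary file so the grammar isn't truncated while it is still being read, and assumes <code>MATXIN_DIR</code> is set to your prefix):

```shell
# Strip the offending rules from the dependency grammar in one pass.
DEP="$MATXIN_DIR/share/matxin/freeling/es/dep/dependences.dat"
if [ -f "$DEP" ]; then
    grep -v -e 'd:grup-sp\.lemma' -e 'd\.class' -e 'd:sn\.tonto' "$DEP" > "$DEP.new" &&
        mv "$DEP.new" "$DEP"
fi
```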
||
===Error in db===

If you get:

<pre>
SEMDB: Error 13 while opening database /usr/local/share/matxin/freeling/es/dep/../senses16.db
</pre>

rebuild <code>senses16.db</code> from source (remove the old <code>senses16.db</code> before rebuilding):

<pre>
cat senses16.src | indexdict senses16.db
</pre>
− | |||
===Error when reading XML files===

If reading XML files does not work and you get an error like <i>ERROR: invalid document: found <corpus i> when <corpus> was expected...</i>, do the following in <code>src/XML_reader.cc</code>:

1. Add the following function after line 43:

<pre>
wstring
mystows(string const &str)
{
  wchar_t* result = new wchar_t[str.size()+1];
  size_t retval = mbstowcs(result, str.c_str(), str.size());
  result[retval] = L'\0';
  wstring result2 = result;
  delete[] result;
  return result2;
}
</pre>

2. Replace all occurrences of <code>XMLParseUtil::stows</code> with <code>mystows</code>.
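Step 2 can be done with sed instead of hand-editing; a sketch (GNU sed's <code>-i</code> is assumed, run from the Matxin source tree):

```shell
# Swap in the helper everywhere XMLParseUtil::stows was used.
if [ -f src/XML_reader.cc ]; then
    sed -i 's/XMLParseUtil::stows/mystows/g' src/XML_reader.cc
else
    echo "src/XML_reader.cc not found; run this from the Matxin source tree"
fi
```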
||
− | |||
==Results of the individual steps==

<pre>
--------------------Step1
en@anonymous:/usr/local/bin$ echo "Esto es una prueba" | ./Analyzer -f
$MATXIN_DIR/share/matxin/config/es-eu.cfg
<?xml version='1.0' encoding='UTF-8' ?>
<corpus>
<SENTENCE ord='1' alloc='0'>
<CHUNK ord='2' alloc='5' type='grup-verb' si='top'>
<NODE ord='2' alloc='5' form='es' lem='ser' mi='VSIP3S0'>
</NODE>
<CHUNK ord='1' alloc='0' type='sn' si='subj'>
<NODE ord='1' alloc='0' form='Esto' lem='este' mi='PD0NS000'>
</NODE>
</CHUNK>
<CHUNK ord='3' alloc='8' type='sn' si='att'>
<NODE ord='4' alloc='12' form='prueba' lem='prueba' mi='NCFS000'>
<NODE ord='3' alloc='8' form='una' lem='uno' mi='DI0FS0'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>
</corpus>
</pre>

<pre>
---------------------Step2
[glabaka@siuc05 bin]$ cat /tmp/x | ./LT -f
$MATXIN_DIR/share/matxin/config/es-eu.cfg
<?xml version='1.0' encoding='UTF-8'?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='5' si='top'>
<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='is' alloc='0' si='subj'>
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='is' alloc='8' si='att'>
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>
</corpus>
</pre>

<pre>
----------- step3
<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='5' si='top'>
<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='is' alloc='0' si='subj'>
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='is' alloc='8' si='att'>
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

-------------STEP4
<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='5' si='top' length='1' trans='DU' cas='[ABS]'>
<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='is' alloc='8' si='att' length='2' cas='[ABS]'>
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

-------------STEP5
<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='5' si='top' length='1' trans='DU' cas='[ABS]'>
<NODE ref='2' alloc='5' UpCase='none' lem='_izan_' mi='VSIP3S0' pos='[ADI][SIN]'>
</NODE>
<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='is' alloc='8' si='att' length='2' cas='[ABS]'>
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

-------------STEP6
<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ref='1' type='is' alloc='0' si='subj' cas='[ERG]' length='1'>
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

-------------STEP7
<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ref='1' alloc='0'>
<CHUNK ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

-------------STEP8
<?xml version='1.0' encoding='UTF-8'?>
<corpus >
<SENTENCE ord='1' ref='1' alloc='0'>
<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
<NODE ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
<NODE ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

-------------STEP9
<?xml version='1.0' encoding='UTF-8' ?>
<corpus >
<SENTENCE ord='1' ref='1' alloc='0'>
<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE ord='0' ref='2' alloc='5' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
<NODE ord='0' ref='1' alloc='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
<NODE ord='0' ref='4' alloc='12' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE ord='1' ref='3' alloc='8' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

-------------- step10
<?xml version='1.0' encoding='UTF-8'?>
<corpus >
<SENTENCE ord='1' ref='1' alloc='0'>
<CHUNK ord='2' ref='2' type='adi-kat' alloc='5' si='top' cas='[ABS]' trans='DU' length='1'>
<NODE form='da' ref ='2' alloc ='5' ord='0' lem='izan' pos='[NAG]' mi='[ADT][A1][NR_HU]'>
</NODE>
<CHUNK ord='0' ref='1' type='is' alloc='0' si='subj' length='1' cas='[ERG]'>
<NODE form='hau' ref ='1' alloc ='0' ord='0' UpCase='none' lem='hau' pos='[DET][ERKARR]'>
</NODE>
</CHUNK>
<CHUNK ord='1' ref='3' type='is' alloc='8' si='att' cas='[ABS]' length='2'>
<NODE form='proba' ref ='4' alloc ='12' ord='0' UpCase='none' lem='proba' pos='[IZE][ARR]' mi='[NUMS]' sem='[BIZ-]'>
<NODE form='bat' ref ='3' alloc ='8' ord='1' UpCase='none' lem='bat' pos='[DET][DZH]' vpost='IZO'>
</NODE>
</NODE>
</CHUNK>
</CHUNK>
</SENTENCE>

</corpus>

-------------STEP11
Hau proba bat da

</pre>
||
− | |||
− | ==Documentation== |
||
− | |||
− | * [http://matxin.svn.sourceforge.net/viewvc/matxin/trunk/doc/documentation-es.pdf Descripción del sistema de traducción es-eu Matxin] (in Spanish) |
||
− | * [[Documentation of Matxin]] (in English) |
||
[[Category:Matxin|*]] |
[[Category:Matxin|*]] |