Getting started with induction tools
These are partial directions for getting started with Apertium and related tools (such as GIZA++), with the end goal of creating bilingual dictionaries. A few steps are Ubuntu-specific. Everything except mkcls is built from source; most of it comes from SVN, and GIZA++ comes from a release tarball.
Installing the necessary programs
Prerequisite Ubuntu packages
Unfortunately, this list is incomplete; I'll try to add a full one once I've set this up on a clean Ubuntu install. It should still be of some help, and all of the following packages are required:
sudo apt-get install automake libtool libxml2-dev flex libpcre3-dev
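To see which of these packages are still missing before running apt-get, a check along the following lines may help. It is a sketch that assumes a Debian/Ubuntu system with dpkg-query available; the function name missing_packages is made up for this page.

```shell
# Hypothetical helper: print each named package that dpkg does not report
# as fully installed. Assumes a Debian/Ubuntu system with dpkg-query.
missing_packages() {
    for pkg in "$@"; do
        if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null |
                grep -q "install ok installed"; then
            echo "$pkg"
        fi
    done
}

# The packages listed above (the list itself is known to be incomplete).
missing=$(missing_packages automake libtool libxml2-dev flex libpcre3-dev)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names are joined by spaces.
    echo "Run: sudo apt-get install" $missing
fi
```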
Installing Crossdics
The Crossdics page explains the installation correctly.
You may need to export a new value for JAVA_HOME before running ant jar.
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.03
If you are unsure of the exact version on your system, run ls /usr/lib/jvm/; it may differ from the one above.
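Rather than copying the version string by hand, JAVA_HOME can usually be derived from wherever javac actually lives, because Ubuntu's alternatives system makes /usr/bin/javac a chain of symlinks into /usr/lib/jvm/. This is a sketch that assumes a full JDK (not just a JRE) is installed, with javac in the JDK's bin/ directory; the function name jdk_home_from_javac is hypothetical.

```shell
# Hypothetical helper: resolve a (possibly symlinked) javac path and print
# the directory two levels up, i.e. the JDK directory containing bin/javac.
jdk_home_from_javac() {
    _javac_real=$(readlink -f "$1") || return 1
    dirname "$(dirname "$_javac_real")"
}

# Only set JAVA_HOME if javac is actually on the PATH.
if command -v javac >/dev/null 2>&1; then
    JAVA_HOME=$(jdk_home_from_javac "$(command -v javac)")
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
fi
```

If several JDKs are installed, this picks whichever one the javac alternative currently points to, which is usually the one ant would use anyway.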
Installing lttoolbox
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox
$ cd lttoolbox
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
Installing Apertium
Apertium depends on lttoolbox, so install lttoolbox first.
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium
$ cd apertium
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
Installing ReTraTos
$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/retratos
$ cd retratos
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
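The lttoolbox, Apertium and ReTraTos builds all follow the same checkout, autogen, configure, make, install pattern, so they can be wrapped in one function. This is a sketch, not part of Apertium: the function name build_module is hypothetical, and the final ldconfig call is an addition of mine (it refreshes the shared-library cache so that programs can find libraries freshly installed under /usr/local/lib).

```shell
# Hypothetical helper (not part of Apertium): check out one module from the
# SVN trunk used on this page and run the usual autotools sequence on it.
SVN_TRUNK=https://apertium.svn.sourceforge.net/svnroot/apertium/trunk

build_module() {
    module=$1
    svn co "$SVN_TRUNK/$module" || return 1
    # Build in a subshell so the caller's working directory is unchanged.
    (
        cd "$module" &&
        ./autogen.sh &&
        ./configure &&
        make &&
        sudo make install
    ) || return 1
    # Refresh the shared-library cache so programs linked against the
    # freshly installed libraries can find them at run time.
    sudo ldconfig
}
```

With this defined, build_module lttoolbox && build_module apertium && build_module retratos repeats the steps on this page. lttoolbox goes first because Apertium's configure step looks for an installed lttoolbox.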
Installing mkcls
Download the mkcls .deb package for your version of Ubuntu from ubuntu-nlp, then install it:
$ sudo dpkg --install mkcls*.deb
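Once the package is installed, a quick way to confirm that mkcls is reachable is to ask the shell where it would run it from. This assumes the package puts an executable named mkcls on the PATH; the helper name check_tools is made up for this page.

```shell
# Hypothetical helper: report where each named program would be run from,
# and return non-zero if any of them is not on the PATH.
check_tools() {
    status=0
    for tool in "$@"; do
        if tool_path=$(command -v "$tool"); then
            echo "found $tool: $tool_path"
        else
            echo "missing $tool" >&2
            status=1
        fi
    done
    return "$status"
}

check_tools mkcls || echo "mkcls is not on the PATH; check that the .deb installed cleanly" >&2
```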
Installing GIZA++
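Unfortunately, the GIZA++ deb package at ubuntu-nlp was built with -DBINARY_SEARCH_FOR_TTABLE. A build with that flag expects the input files to be prepared differently (it wants a co-occurrence file); given the files produced by the steps on this page, it dies with "ERROR: NO COOCURRENCE FILE GIVEN!". I don't know how to prepare that file, so the instructions below compile a version that works with the rest of this page: the flag is removed from the Makefile, and mkdir is changed to mkdir -p so that it does not fail on directories that already exist.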
$ wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.1.tar.gz
$ tar xvzf giza-pp-v1.0.1.tar.gz
$ cd giza-pp/GIZA++-v2
$ sed -e 's/-DBINARY_SEARCH_FOR_TTABLE//' -e 's/mkdir/mkdir -p/g' Makefile > tmp
$ mv Makefile Makefile.orig
$ mv tmp Makefile
$ make
$ sudo make install
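The two sed edits above can be wrapped in a small function that also checks the result, which makes it easy to confirm that the flag really disappeared before running make. This is a sketch: the function name patch_giza_makefile is hypothetical, it keeps the same Makefile.orig backup as the commands above, and unlike those commands it removes every occurrence of the flag on a line (the g modifier), not just the first.

```shell
# Hypothetical helper: apply the two Makefile edits described above,
# keeping the original as Makefile.orig, then verify the flag is gone.
patch_giza_makefile() {
    mf=${1:-Makefile}
    if [ -e "$mf.orig" ]; then
        echo "$mf.orig already exists; not overwriting it" >&2
        return 1
    fi
    mv "$mf" "$mf.orig" || return 1
    # Drop every -DBINARY_SEARCH_FOR_TTABLE and make mkdir tolerate
    # directories that already exist.
    sed -e 's/-DBINARY_SEARCH_FOR_TTABLE//g' \
        -e 's/mkdir/mkdir -p/g' "$mf.orig" > "$mf" || return 1
    if grep -q -e '-DBINARY_SEARCH_FOR_TTABLE' "$mf"; then
        echo "flag still present in $mf" >&2
        return 1
    fi
}
```

Run it from giza-pp/GIZA++-v2 in place of the sed and mv lines, then continue with make.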