Getting started with induction tools

From Apertium
Revision as of 09:29, 28 March 2008 by Kb (Initial half-done checkin, so as not to tempt Murphy.)

These are partial directions for getting started with Apertium and related tools (such as GIZA++), with the end goal of creating bilingual dictionaries. A few steps are Ubuntu-specific. Everything except mkcls is built from source: the Apertium tools from SVN, and GIZA++ from a release tarball.

Installing the necessary programs

Prerequisite Ubuntu packages

This list is, unfortunately, incomplete; I'll try to add a complete one once I've set this up on a clean Ubuntu install. Still, it should be of some help, and every package on it is required:

$ sudo apt-get install automake libtool libxml2-dev flex libpcre3-dev
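If you are not sure which of these are already present, a quick check like the one below may help. It is a minimal sketch that only looks for commands on your PATH, not for installed packages. The mapping from packages to commands (automake and flex to themselves, libtool to libtoolize, libxml2-dev to xml2-config, libpcre3-dev to pcre-config) is my assumption, and missing_tools is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is NOT found on the PATH.
# It checks commands, not packages, so a printed name only hints at a missing package.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, after defining it, run missing_tools automake libtoolize flex xml2-config pcre-config; anything it prints probably still needs installing.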

Installing Crossdics

The Crossdics page explains this correctly.

You may need to export a new value for JAVA_HOME before running ant jar.

$ export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.03

If in doubt about the exact version on your system, run ls /usr/lib/jvm/; it may differ from the one above.
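To avoid typing the version by hand, something like the following could pick the directory for you. It is only a sketch: the java-6-sun-* pattern is assumed from the example path above, and find_jdk is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory under the given JVM root
# (default /usr/lib/jvm) whose name matches java-6-sun-*.
# Returns non-zero and prints nothing if there is no match.
find_jdk() {
    jvm_root=${1:-/usr/lib/jvm}
    for dir in "$jvm_root"/java-6-sun-*; do
        # With no match the glob stays literal, so check it is a real directory.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

Usage, assuming the function is defined in your shell:

$ jdk=$(find_jdk) && export JAVA_HOME="$jdk"

If several matching JDKs are installed, this picks the alphabetically first one, which may not be the one you want.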


Installing lttoolbox

$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox
$ cd lttoolbox
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install

Installing Apertium

Apertium builds against lttoolbox, so install lttoolbox first.

$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium
$ cd apertium
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install

Installing ReTraTos

$ svn co https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/retratos
$ cd retratos
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
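Since lttoolbox, Apertium and ReTraTos are all built the same way, the steps can be wrapped in a small function. This is a sketch, not an official script: build_module is a hypothetical name, and it assumes the SVN URLs used above and that sudo is available.

```shell
#!/bin/sh
# Base URL of the Apertium SVN trunk, as used in the checkout commands above.
APERTIUM_SVN=https://apertium.svn.sourceforge.net/svnroot/apertium/trunk

# Hypothetical helper: check out one module and run the usual autotools build.
# Stops at the first failing step and returns non-zero.
build_module() {
    module=$1
    svn co "$APERTIUM_SVN/$module" "$module" || return 1
    # Build in a subshell so the caller stays in the current directory,
    # which keeps later checkouts side by side instead of nested.
    (
        cd "$module" &&
        ./autogen.sh &&
        ./configure &&
        make &&
        sudo make install
    )
}
```

Usage, building lttoolbox first because Apertium builds against it:

$ for m in lttoolbox apertium retratos; do build_module "$m" || break; done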

Installing mkcls

Get the deb package appropriate for your version of Ubuntu from ubuntu-nlp.

$ sudo dpkg --install mkcls*.deb

Installing GIZA++

Unfortunately, the GIZA++ deb package at ubuntu-nlp was built with -DBINARY_SEARCH_FOR_TTABLE. A binary built that way needs its input files prepared differently; otherwise it dies with "ERROR: NO COOCURRENCE FILE GIVEN!". I don't know how to do that, so here are instructions for compiling a version without that flag, which works with the rest of this page.

$ wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.1.tar.gz
$ tar xvzf giza-pp-v1.0.1.tar.gz
$ cd giza-pp/GIZA++-v2
$ sed -e 's/-DBINARY_SEARCH_FOR_TTABLE//' -e 's/mkdir/mkdir -p/g' Makefile > tmp
$ mv Makefile Makefile.orig
$ mv tmp Makefile
$ make
$ sudo make install
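To confirm the sed edit actually took effect (for instance if a different tarball's Makefile is laid out differently), a check along these lines could be run in giza-pp/GIZA++-v2 before running make. check_giza_makefile is a hypothetical name; it only looks for the flag text and does not prove the resulting binary behaves correctly.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given Makefile (default: ./Makefile)
# no longer mentions -DBINARY_SEARCH_FOR_TTABLE.
check_giza_makefile() {
    makefile=${1:-Makefile}
    if [ ! -f "$makefile" ]; then
        echo "no $makefile here; run this inside giza-pp/GIZA++-v2" >&2
        return 2
    fi
    if grep -q -e '-DBINARY_SEARCH_FOR_TTABLE' "$makefile"; then
        echo "$makefile still contains -DBINARY_SEARCH_FOR_TTABLE" >&2
        return 1
    fi
    return 0
}
```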


Creating bilingual dictionaries

Obtaining corpora (and getAlignmentWithText.pl)