Minimal installation from SVN

Revision as of 10:15, 23 September 2016 by Rcrowther (talk | contribs) (New link for grammar libraries)

This guide shows you how to download, configure, compile and install core apertium packages and language data. It assumes you've already installed the prerequisites for your system – if you have not, see the system-specific guides under Installation. If you run into trouble, see Installation troubleshooting.

Note: some pairs require more than the four packages described here. See the bottom of this page if your language pair complains about lacking CG, HFST or language data like apertium-rus.

Before You Do Anything!

Do you really need the core tools from svn? Ask yourself, what do you want to work on?

  • Translation, language pairs, source/target languages: Return to Installation and see if you can use the binary packages for the core tools, and then skip the lttoolbox, apertium, apertium-lex-tools, cg3, hfst parts of this page and instead follow the next section.
  • Core C++ shared tools: Go right ahead...
  • Don't know / Not sure: Ask on IRC what you should install.

Installing just the SVN language data

If you've already got the core tools installed (apertium, cg, hfst; or the apertium-all-dev package), then there's a script that can download and set up language data (the pair plus any monolingual dependencies) from SVN for you. Go to the directory where you want your apertium data to be, and run

wget https://raw.githubusercontent.com/unhammer/apertium-get/master/apertium-get
chmod +x apertium-get
./apertium-get fie-bar

where "fie-bar" is the name of the language pair you want to work on, and you'll have the data correctly set up under your current directory.
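Before running the script, it can help to confirm that the core tools really are on your PATH. The following is a minimal sketch, not part of apertium-get: missing_tools is a hypothetical helper name, and the tool list (lt-proc, apertium, cg-proc, hfst-proc) is an assumption about which commands your pair needs.

```shell
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list; trim it to what your language pair actually uses.
missing_tools lt-proc apertium cg-proc hfst-proc
```

If this prints nothing, all four commands were found. Any name it prints still needs installing (or may simply not be needed by your pair).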

Ask on IRC if there are problems.

Installing apertium and a language pair

Download

For most language pairs, these are the packages you need:

  • lttoolbox
  • apertium
  • apertium-lex-tools
  • the language pair(s) you are interested in

NB: Use binaries for lttoolbox, apertium, and apertium-lex-tools unless you know what you're doing

Here are the commands if you would like the Esperanto-English pair:

svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/lttoolbox
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-lex-tools
svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-eo-en

Note: please make sure that the directory where you put these files (i.e. where you run the svn command) doesn't contain spaces and other special characters. That may cause errors while compiling/linking.
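A quick way to check the directory name is sketched below. The helper name is_safe_path is hypothetical, and the set of "safe" characters (letters, digits, slash, dot, underscore, hyphen) is a deliberately conservative assumption rather than an exact list of what the build tools tolerate.

```shell
# Hypothetical helper: succeed only if the path consists solely of
# letters, digits, '/', '.', '_' and '-'.
is_safe_path() {
    case "$1" in
        *[!A-Za-z0-9/._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

if is_safe_path "$PWD"; then
    echo "OK: $PWD"
else
    echo "Warning: $PWD contains spaces or special characters" >&2
fi
```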

If you want a pair other than eo-en, only the last line needs changing. To see the available 'released' language pairs, go to https://svn.code.sf.net/p/apertium/svn/trunk/ (pairs which are in development are in the incubator/nursery/staging subdirectories of https://svn.code.sf.net/p/apertium/svn/).
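As a small illustration of "only the last line needs changing", the checkout URL for a released pair can be built from the pair name. This is a sketch: pair_url is a hypothetical helper, and it assumes the pair lives directly under trunk, as eo-en does above. It only prints the command; it does not contact the server.

```shell
# Hypothetical helper: build the trunk URL for a released pair such as "eo-en".
pair_url() {
    printf 'https://svn.code.sf.net/p/apertium/svn/trunk/apertium-%s\n' "$1"
}

# Print the checkout command for review instead of running it.
echo "svn checkout $(pair_url eo-en)"
```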

If a language pair has more dependencies than the three shown above, the README should mention it (and the autogen.sh step should fail with a message about what is missing). The bottom of this page has pointers on how to install other possible dependencies.

Set up environment

By default, Apertium is installed under the directory /usr/local, which requires root (sudo) access when installing. If that's fine with you, begin by pasting these lines into your terminal:

LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:${PKG_CONFIG_PATH}
export PKG_CONFIG_PATH

You should also put those lines in your ~/.bashrc so you don't have to paste them into every terminal you open.

However, if you want it installed somewhere else or don't want to install it as root, instead paste these lines into your terminal:

PREFIX=$HOME/local # or wherever you want apertium stuff installed
LD_LIBRARY_PATH=$PREFIX/lib:${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig:${PKG_CONFIG_PATH}
export PKG_CONFIG_PATH

You should also put those lines in your ~/.bashrc so you don't have to paste them into every terminal you open.
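To confirm that a new terminal has picked up these settings, you could check whether the two variables contain the expected directories. The sketch below uses a hypothetical helper, path_contains, and falls back to /usr/local when PREFIX is unset.

```shell
# Hypothetical helper: succeed if colon-separated list $1 contains directory $2.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

lib_dir="${PREFIX:-/usr/local}/lib"
path_contains "${LD_LIBRARY_PATH:-}" "$lib_dir" \
    || echo "LD_LIBRARY_PATH does not include $lib_dir"
path_contains "${PKG_CONFIG_PATH:-}" "$lib_dir/pkgconfig" \
    || echo "PKG_CONFIG_PATH does not include $lib_dir/pkgconfig"
```

No output means both variables are set as described above.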

Configure, build and install

The next step is to configure, build and install each of the modules you checked out, in this order:

  1. lttoolbox
  2. apertium
  3. apertium-lex-tools
  4. the language pair (e.g. apertium-eo-en)

cd to each of the directories before you run the commands shown below.

If you didn't specify $PREFIX above, or don't know what this means, then do this in each directory:

./autogen.sh
make

Then, for all programs apart from the language pair, do:

sudo make install
sudo ldconfig

If you specified a $PREFIX (e.g. to avoid installing as root), then do this in each directory:

./autogen.sh --prefix=$PREFIX
make

Then, for all programs apart from the language pair, do:

make install
ldconfig -n $PREFIX/lib


(If you're on a Mac, you don't need ldconfig; don't worry if that step fails.)
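The two variants above can be summarised as a helper that prints the commands for one module, with or without a prefix. This is only a sketch: build_steps is a hypothetical name, and it prints the steps for review rather than running them.

```shell
# Hypothetical helper: print the build commands for one core module.
# $1 = module directory, $2 = optional install prefix (empty means /usr/local).
build_steps() {
    module=$1
    prefix=${2:-}
    echo "cd $module"
    if [ -n "$prefix" ]; then
        echo "./autogen.sh --prefix=$prefix"
        echo "make"
        echo "make install"
        echo "ldconfig -n $prefix/lib"
    else
        echo "./autogen.sh"
        echo "make"
        echo "sudo make install"
        echo "sudo ldconfig"
    fi
    echo "cd .."
}

# Core modules in the required order; the language pair is left out because
# it only needs the autogen.sh and make steps.
for module in lttoolbox apertium apertium-lex-tools; do
    build_steps "$module" "${PREFIX:-}"
done
```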


If you had any trouble, see Installation troubleshooting.

Test

Now test that it works.

You can see development translation modes if you do ls modes from the language pair directory. If you're in the language pair directory, and there is e.g. a file modes/eo-en-tagger.mode, you can run the translator up until the tagger by typing

echo 'This is a test sentence.' | apertium -d . eo-en-tagger

The full pipeline is typically named e.g. eo-en:

echo 'This is a test sentence.' | apertium -d . eo-en

The -d . means "use the language data in this directory".
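To see which mode names you can pass to apertium -d ., you can strip the .mode suffix from the files in the modes directory. A minimal sketch follows, with list_modes as a hypothetical helper name. It assumes the pair has already been compiled so that the modes directory exists.

```shell
# Hypothetical helper: print the mode names found in <dir>/modes,
# i.e. the file names without their .mode suffix.
list_modes() {
    for f in "$1"/modes/*.mode; do
        [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
        basename "$f" .mode
    done
}

# Run from the language pair directory.
list_modes .
```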


If you are a user who just wants to translate, and not hack on the language pair, you can do (sudo) make install from the language pair directory – this lets you do echo 'This is a test sentence' | apertium eo-en without the -d, from whatever directory you're in.

Developers should not do this, since most new/incubator language pairs don't work with installation :-)

For language pairs that depend on monolingual packages (apertium-XYZ)

Many language pairs now have their monolingual data in separate packages (so that when several pairs have one language in common, we don't have to duplicate the data). If a pair depends on a monolingual package, the README should say so, and also the autogen.sh step should fail with a message like

No package 'apertium-XYZ' found

(where XYZ is some language code).

Monolingual packages are typically kept in https://svn.code.sf.net/p/apertium/svn/languages/ (more info at Languages) and compiled like the other packages. If a monolingual package installs a dictionary, the language pair uses that installed dictionary when compiling. However, to avoid having to type make install in the monolingual directory after every change there, you can tell the language pair the exact location of the monolingual package, and it will use the dictionary from that directory instead of the installed one. This is recommended for developers.

Imagine the language pair is called apertium-fie-bar, and it depends on the monolingual packages apertium-fie and apertium-bar. Assuming we have already installed lttoolbox, apertium and apertium-lex-tools as shown above, these would be the steps to download, configure and install apertium-fie-bar:

svn checkout https://svn.code.sf.net/p/apertium/svn/trunk/apertium-fie-bar
svn checkout https://svn.code.sf.net/p/apertium/svn/languages/apertium-fie
svn checkout https://svn.code.sf.net/p/apertium/svn/languages/apertium-bar

cd apertium-fie
./autogen.sh
cd ..

cd apertium-bar
./autogen.sh
cd ..

cd apertium-fie-bar
./autogen.sh --with-lang1=../apertium-fie --with-lang2=../apertium-bar
# Now you can compile; using "make langs" in the pair will first compile the monolingual data, then the pair itself:
make -j3 langs

The --with-lang1 is used to give the path to where you checked out apertium-fie. If you do ./autogen.sh --help, it will tell you the possible --with-langN options and what they correspond to.
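For a pair named after its two languages, the autogen.sh line can be derived from the pair name. The sketch below uses a hypothetical helper, autogen_command, and assumes that lang1 is the first code in the name and lang2 the second, as in the fie-bar example. Check ./autogen.sh --help in your pair, since the mapping may differ.

```shell
# Hypothetical helper: print the autogen.sh command for a pair like "fie-bar".
# $1 = pair name, $2 = directory holding the monolingual checkouts (default "..").
autogen_command() {
    pair=$1
    base=${2:-..}
    lang1=${pair%%-*}   # text before the first hyphen
    lang2=${pair#*-}    # text after the first hyphen
    printf './autogen.sh --with-lang1=%s/apertium-%s --with-lang2=%s/apertium-%s\n' \
        "$base" "$lang1" "$base" "$lang2"
}

autogen_command fie-bar
```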

The process is similar for other language pairs that use monolingual packages.

For language pairs that use CG (vislcg3 / cg-proc / cg-comp)

Many language pairs now use Constraint Grammar (e.g. Macedonian→English, Breton→French, Nynorsk-Bokmål, …). For these, you need vislcg3 beforehand. See Vislcg3#Installing_VISL_CG3 for installation (use ./cmake.sh -DCMAKE_INSTALL_PREFIX=<prefix> if you're installing to a prefix).

Note that you have to have ICU installed beforehand (available through most GNU/Linux package managers, in Arch Linux as icu, in Debian/Ubuntu as libicu-dev, in Macports as icu).


For language pairs that use HFST (hfst-proc / hfst-lexc / hfst-twolc)

Many language pairs now use HFST (e.g. the Turkic and Saami ones). For these, you need hfst beforehand, so follow the HFST installation guides first. HFST is built as a set of wrappers over several possible back-ends (Foma, OpenFST, SFST, …). The latest versions of HFST include the back-ends you need, so there's no reason to install any of them separately.

See also