Apertium on Windows

Revision as of 13:24, 23 November 2008 by Jimregan (talk | contribs) (rant!)

Introductory notes

Apertium is Unix software. It is not formally supported on Windows, though we would all like it to run there.

Apertium, it seems, is a difficult beast to get working on Windows, which is odd for us, because in a Unix environment it "just works".

Cygwin, as a 'Unix-like' environment for Windows, should be an easy target for us; however, Apertium makes heavy use of wstring in its C++ code, which is not supported by Windows versions of GCC earlier than 4.0 - this rules out MinGW for us, too, unfortunately.

Lttoolbox should build with Visual C++ without additional changes, provided that you follow the instructions on this page. Apertium itself, however, still has a few problems with Visual C++. We're trying to iron them out, but it's not ready yet. Even when the individual components work, you will also need a Unix-type shell, such as bash, to run it. Batch files simply will not work, as Windows does not have true support for program pipes ('|').[1]
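To make the distinction concrete, here is a minimal Python sketch contrasting a true pipe with the temporary-file emulation described in the footnote. Python is not part of the Apertium toolchain and is used here only for illustration; the two child processes are stand-ins for pipeline stages, and the function names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Two stand-in pipeline stages, run as separate Python processes so the
# sketch does not depend on any Unix tools being installed.
PRODUCER = [sys.executable, "-c", "print('hello world')"]
CONSUMER = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def run_with_pipe():
    """Connect the stages directly, like `producer | consumer`."""
    producer = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(CONSUMER, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close the parent's copy of the pipe; only the consumer reads from it.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode().strip()


def run_with_temp_file():
    """Emulate the pipe: `producer > tmp`, then `consumer < tmp`."""
    fd, tmp_path = tempfile.mkstemp()
    try:
        # Stage 1 runs to completion and writes everything to the file.
        with os.fdopen(fd, "wb") as tmp:
            subprocess.run(PRODUCER, stdout=tmp, check=True)
        # Stage 2 starts only afterwards and reads the file back.
        with open(tmp_path, "rb") as tmp:
            result = subprocess.run(CONSUMER, stdin=tmp,
                                    stdout=subprocess.PIPE, check=True)
    finally:
        os.remove(tmp_path)
    return result.stdout.decode().strip()
```

Both functions produce the same text, but with the pipe the two stages run at the same time, while with the temporary file the second stage cannot start until the first has finished.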

Here be dragons!

Getting the code

The lttoolbox changes are in the main SVN branch. The current win32 changes for apertium are in the branch apertium/win32.

See instructions for using SVN, or the simplest way :

Dependencies

You will need to get Windows versions of the following tools and libraries to compile the code under Windows:

  • Microsoft's C++ compiler - the compiler is part of Visual Studio 2008 Express, which is available free of charge.
  • CMake - a cross platform build tool
  • libxml - the Windows binaries are at http://xmlsoft.org/sources/win32/; you will need:
    • libxml2
    • zlib
    • iconv
    • libxslt
  • pcre - there are no Windows binaries available and you will have to build this from the source code, or try downloading the "precompiled version for Windows" at the given link.
  • flex for Windows - it's probably easiest if you just download the Windows installer version.

Setting up the required tools and libraries

Installing Visual Studio

Be sure to start downloading Visual Studio 2008 first, since it is a fairly big download.

Installing CMake

Next, download and install the Windows version of CMake. Let the installer place CMake in your path, since it will be much easier to run CMake from the command line if you do this.
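A quick way to confirm that the tools can be reached from the command line is to look them up on the PATH. The following is a minimal sketch in Python (not part of the Apertium toolchain); the list of tool names is an assumption based on the commands used on this page, and `find_tools` is a hypothetical helper.

```python
import shutil


def find_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Tool names are assumptions taken from the steps on this page.
    for name, path in find_tools(["cmake", "cmakesetup", "nmake", "flex"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Note that nmake is normally only on the PATH inside the Visual Studio command prompt, so run the check from there.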

Installing the libxml libraries

The zip files of libxml2, zlib, libiconv and libxslt all have the familiar UNIX layout:

|- bin
|- include
|- lib
|- share

Unzip each of these archives and copy the bin, include, lib and share directories in each package into a common directory (the Windows branch in the GIT repository assumes C:\Program Files\LibXML as a default).
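If you prefer not to copy the directories by hand, the merge can be scripted. The following is a minimal sketch in Python, assuming each archive has already been unzipped into its own folder; `merge_packages` and the folder names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path

SUBDIRS = ("bin", "include", "lib", "share")


def merge_packages(package_dirs, target_dir):
    """Copy bin/include/lib/share from each unpacked package into target_dir."""
    target = Path(target_dir)
    for package in map(Path, package_dirs):
        for name in SUBDIRS:
            source = package / name
            if source.is_dir():  # skip a directory the package does not contain
                # dirs_exist_ok (Python 3.8+) merges into an existing directory
                shutil.copytree(source, target / name, dirs_exist_ok=True)
    return target
```

For example, `merge_packages(["libxml2", "zlib", "iconv", "libxslt"], "C:/Program Files/LibXML")` would merge four unpacked folders (the folder names here are placeholders for wherever you unzipped the archives). Files with the same name in two packages are overwritten by the later one.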

Building and installing libpcre

Next, unzip the libpcre archive to a temporary folder. To compile libpcre, you need to have both Visual Studio and CMake already installed on your system. Open the Visual Studio 8.0 Command Prompt (Start -> Visual C++ 9.0 Express Edition -> Visual Studio Tools -> Visual Studio 8.0 Command Prompt). In this command prompt, go to the directory where you unpacked libpcre and execute:

cmakesetup .

Click the button labelled "Configure". CMake will prompt you with a dialog box where you must choose the kind of make files it will output; select NMake. Normally, after the first time you click "Configure", CMake will show you a number of build variables, all highlighted in red. Click "Configure" once more; if those variables are highlighted in gray, you can click "Ok".

To compile libpcre, run:

nmake

After the compilation has succeeded, run:

nmake install

By default, libpcre installs to C:\Program Files\PCRE. You can change this in cmakesetup by modifying the build variable CMAKE_INSTALL_PREFIX.

Installing flex

If you have downloaded the binary archive, unzip it into something like C:\Program Files\Flex. If you downloaded the Windows installer, then just follow the installation wizard :).

Building lttoolbox

Before you can build lttoolbox, you need to tell CMake where to find the libxml files (under Linux, CMake will use pkg-config to find where the packages are installed). Open lttoolbox\CMakeLists.txt and modify the lines

SET (LIBXML2_BASE        "C:/Program Files/LibXML")
SET (LIBXML2_INCLUDE_DIR "${LIBXML2_BASE}/include")
SET (LIBXML2_LIBRARIES   "${LIBXML2_BASE}/lib/libxml2.lib")

to match your installation.

Now, in the Visual Studio command prompt (again, Start -> Visual C++ 9.0 Express Edition -> Visual Studio Tools -> Visual Studio 8.0 Command Prompt) change the directory to lttoolbox and execute

cmakesetup .

Click on the "Configure" button. As with libpcre, CMake will ask you to choose the type of build file to generate. Again, choose NMake. If you get an error message about LIBXML, then it means that LIBXML2_INCLUDE_DIR or LIBXML2_LIBRARIES in lttoolbox\CMakeLists.txt has an incorrect value.
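Before clicking "Configure", you can check whether the two paths exist. The following is a minimal sketch in Python that mirrors the SET lines shown above; `check_libxml` is a hypothetical helper, and the default directory is the one this page assumes.

```python
from pathlib import Path


def check_libxml(base_dir):
    """Return the paths CMake expects under base_dir that are missing."""
    base = Path(base_dir)
    expected = [
        base / "include",              # LIBXML2_INCLUDE_DIR
        base / "lib" / "libxml2.lib",  # LIBXML2_LIBRARIES
    ]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = check_libxml("C:/Program Files/LibXML")
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("libxml paths look fine")
```

An empty result only means the paths exist; it does not verify that the library itself is usable.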


If everything completes without errors after the first time that you click the "Configure" button, you will again see a few lines appear in the cmakesetup window. They should be highlighted in red. You have the opportunity to change the installation directory of lttoolbox by modifying the build variable CMAKE_INSTALL_PREFIX. To continue, click "Configure" a second time; if everything is correct, then everything that was highlighted in red should now be highlighted in gray and the OK button becomes active.

Next, click OK; CMake should generate NMake files. To compile, simply execute:

nmake

To install lttoolbox, run:

nmake install

Building apertium

In order to build apertium, CMake needs to use the following tools and libraries:

  • libxml
  • libpcre
  • lttoolbox
  • xsltproc (you should have this since it is distributed with libxslt)
  • flex

At this point you should have all of the above and you simply need to set the correct CMake variables in order to build apertium. Open apertium/CMakeLists.txt and find the section which should resemble the following:

IF (WIN32)
        SET (WIN32_DIR ${PROJECT_SOURCE_DIR}/apertium/win32)
        INCLUDE_DIRECTORIES (${WIN32_DIR})
        LIST (APPEND EXTRA_SOURCES ${WIN32_DIR}/getopt.c ${WIN32_DIR}/libgen.c)

        SET(FLEX_EXECUTABLE        C:/Program\ Files/Flex/bin/flex.exe)

        SET(LIBXML2_BASE_DIR       C:/Program\ Files/LibXML)
        SET(LIBXML2_INCLUDE_DIR    ${LIBXML2_BASE_DIR}/include)
        SET(LIBXML2_LIBRARIES      ${LIBXML2_BASE_DIR}/lib/libxml2.lib)
        
        SET(XSLTPROC_EXECUTABLE    ${LIBXML2_BASE_DIR}/bin/xsltproc.exe)

        SET(LTTOOLBOX3_BASE_DIR    C:/Program\ Files/apertium-3.0)
        SET(LTTOOLBOX3_INCLUDE_DIR ${LTTOOLBOX3_BASE_DIR}/include/lttoolbox-3.0)
        SET(LTTOOLBOX3_LIBRARIES   ${LTTOOLBOX3_BASE_DIR}/lib/lttoolbox3.lib)

        SET(LIBPCRE_BASE_DIR       C:/Program\ Files/PCRE)
        SET(LIBPCRE_INCLUDE_DIR    "${LIBPCRE_BASE_DIR}/include")
        SET(LIBPCRE_LIBRARIES      "${LIBPCRE_BASE_DIR}/lib/pcrecpp.lib"
                                   "${LIBPCRE_BASE_DIR}/lib/pcreposix.lib"
                                   "${LIBPCRE_BASE_DIR}/lib/pcre.lib")

        ADD_DEFINITIONS (/D _CRT_SECURE_NO_WARNINGS /D STDC_HEADERS /D PCRE_STATIC)
ENDIF (WIN32)

The only variables which you should need to modify are:

  • FLEX_EXECUTABLE
  • LIBXML2_BASE_DIR
  • LTTOOLBOX3_BASE_DIR
  • LIBPCRE_BASE_DIR

You will also probably have to modify XSLTPROC_EXECUTABLE.

These variables point to the base directories where the respective required tools and libraries are installed.
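The other variables in that section are derived from the base directories through ${...} references, so changing a base directory changes them too. The following minimal Python sketch imitates that expansion with `string.Template`, whose ${NAME} syntax happens to match; it is an illustration only, not how CMake is implemented, and `expand` is a hypothetical helper. The values are copied from the WIN32 section above.

```python
from string import Template

# A subset of the settings from the WIN32 section, in definition order.
SETTINGS = {
    "LIBXML2_BASE_DIR": "C:/Program Files/LibXML",
    "LIBXML2_INCLUDE_DIR": "${LIBXML2_BASE_DIR}/include",
    "LIBXML2_LIBRARIES": "${LIBXML2_BASE_DIR}/lib/libxml2.lib",
    "XSLTPROC_EXECUTABLE": "${LIBXML2_BASE_DIR}/bin/xsltproc.exe",
    "LIBPCRE_BASE_DIR": "C:/Program Files/PCRE",
    "LIBPCRE_INCLUDE_DIR": "${LIBPCRE_BASE_DIR}/include",
}


def expand(settings):
    """Resolve ${NAME} references in order, as successive SET calls would."""
    resolved = {}
    for name, value in settings.items():
        # Each value may only refer to variables defined before it.
        resolved[name] = Template(value).substitute(resolved)
    return resolved


if __name__ == "__main__":
    for name, value in expand(SETTINGS).items():
        print(f"{name} = {value}")
```

If xsltproc.exe is not in the bin directory under LIBXML2_BASE_DIR, the derived XSLTPROC_EXECUTABLE value will be wrong and has to be set by hand.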

To build apertium, open the Visual Studio command prompt (as described earlier in this document) and change the directory to apertium. Now as before, run

cmakesetup .

Follow all of the build steps as described in the previous section which details how lttoolbox is built.

After the build completes, you should be able to install apertium by typing

nmake install

[1] Rather than pipe directly between processes, the 'DOS box' uses temporary files to emulate it: ls|more becomes the equivalent of ls>tmp;more<tmp