Difference between revisions of "Become a language pair developer for Apertium"

From Apertium

Revision as of 22:56, 10 April 2013

This is a three-part, step-by-step guide on how to use a development version of Apertium to make a change in a language pair. These instructions assume that you are using Ubuntu or Debian; if not, please see the Installation page for installation on other operating systems, such as Mac OS X (local or systemwide) or Windows.

Introduction

When becoming an Apertium developer, there are two ways to get Apertium. You can use the terminal to get the most up-to-date versions, or you can use the Synaptic package manager to get development versions that aren't quite as up to date. There are pros and cons to each: the terminal method is better suited to those who intend to submit their work, while the package manager is normally easier and lets you use a graphical interface instead of a command line. You will also need a text or XML editor and a thorough understanding of the languages you wish to develop for. Those who wish to contribute their changes also need committer access on SourceForge.

Getting Ready

Method 1: TERMINAL

Step 1: Get the Prerequisites

A development version of Apertium and the language pair you want to change must be installed on your computer before you can change anything in the language pair. If you are installing on a Linux distribution other than Ubuntu, you can find instructions here on the wiki for Apertium on Arch Linux, Apertium on Fedora, and Apertium on openSUSE. There is currently no method for installing on Mandriva.

Start by opening a new terminal.

Then, use this command to install the prerequisites:

sudo apt-get install subversion build-essential g++ pkg-config gawk libxml2 \
    libxml2-dev libxml2-utils xsltproc flex automake autoconf libtool libpcre3-dev

The terminal will then ask you for your password to begin.

Note: For security reasons, the terminal does not display the characters of your password as you type them, so type carefully.

After you have entered your password, press the "Enter" key and wait for your computer to download and install the packages.
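Once the installation finishes, a quick check of which build tools are actually on your PATH can save time later. The following is a minimal sketch, not part of the official instructions; the function name missing_tools is hypothetical, and the list of command names is an assumption about what the packages above provide (for example, xmllint from libxml2-utils).

```shell
# Print the name of every tool given as an argument that is NOT on the PATH.
# "missing_tools" is a hypothetical helper name used only for this sketch.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Assumed command names provided by the prerequisite packages.
missing=$(missing_tools svn g++ make pkg-config gawk xmllint xsltproc flex automake autoconf)

if [ -z "$missing" ]; then
    echo "All checked tools were found."
else
    echo "Not found on PATH:"
    printf '%s\n' "$missing"
fi
```

If anything is listed as missing, re-run the apt-get command above and look for errors in its output.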

Step 2: Get Apertium, lttoolbox, and other useful tools

Using the same terminal, you can download the entire language pairs tree from SVN using the command:

svn checkout https://apertium.svn.sourceforge.net/svnroot/apertium

Keep in mind that the full tree is over 4GB. If you have a slow connection, limited disk space, or a limited data transfer amount, installing the whole tree is not recommended. In addition, language pairs are changing rapidly and if you want to change one of them, you have to work with the latest version!
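If you are unsure whether a full checkout would fit, you can compare the free space in the current directory with the size mentioned above before starting. This is only a sketch: the helper name has_free_kb is hypothetical, and the 4 GB threshold is taken from the figure quoted above rather than measured.

```shell
# Succeed if the filesystem holding DIR has at least NEEDED_KB kilobytes free.
# "has_free_kb" is a hypothetical helper name used only for this sketch.
has_free_kb() {
    dir=$1
    needed_kb=$2
    # df -P gives one line per filesystem; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# 4 GB expressed in kilobytes.
full_tree_kb=$((4 * 1024 * 1024))

if has_free_kb . "$full_tree_kb"; then
    echo "There is room for the full tree here."
else
    echo "Not enough room: check out only the modules you need."
fi
```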

These next commands download Apertium and lttoolbox, which you need both for developing language pairs and for using Apertium as a translator:

svn checkout http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox
svn checkout http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium

Since you will be developing language pairs, it is a good idea to get apertium-dixtools, which automates processing of dictionary files, such as sorting words in alphabetical order:

svn checkout http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-dixtools

Finally, some language pairs, such as apertium-eo-en (Esperanto and English), require the Java version of lttoolbox to be installed:

svn checkout http://apertium.svn.sourceforge.net/svnroot/apertium/trunk/lttoolbox-java

Step 3: Compilation and Installation of Apertium, lttoolbox, and other tools

First, you need to compile lttoolbox and Apertium. For lttoolbox, use:

cd apertium
cd lttoolbox/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig

Then do the same for Apertium:

cd ..
cd apertium/
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install
sudo ldconfig

Note: If you are the only user of the computer, you may prefer to run make install as a regular user (without sudo). In that case, you will need write access to /usr/local and some of its subdirectories. The easiest way is to take ownership of these directories:

cd /usr/local
sudo chown <your_login>:<your_group> . bin lib share

For example on my own computer:

cd /usr/local
sudo chown bernard:user . bin lib share
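To confirm that the ownership change worked before running make install without sudo, you can test whether the directories are writable by your user. A minimal sketch follows; the helper name unwritable_dirs is hypothetical, and the directory list simply mirrors the chown command above.

```shell
# Print every argument that is not an existing directory writable by the
# current user. "unwritable_dirs" is a hypothetical helper name.
unwritable_dirs() {
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    return 0
}

blocked=$(unwritable_dirs /usr/local /usr/local/bin /usr/local/lib /usr/local/share)

if [ -z "$blocked" ]; then
    echo "You can run make install without sudo."
else
    echo "No write access to:"
    printf '%s\n' "$blocked"
fi
```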

Compiling lttoolbox-java is similar to the two previous compilations, but you need a Java JDK, version 1.6 or later, on your computer. If you do not have one, you can download and install JDK 1.7 directly (make sure it is the JDK, not the JRE), which works very well.

Here is a link for JDK 1.7: http://www.oracle.com/technetwork/java/javase/downloads/java-se-jdk-7-download-432154.html
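If you are not sure which JDK, if any, is already installed, the check can be sketched as follows. The helper name jdk_is_new_enough is hypothetical. The sketch assumes that javac -version prints something like "javac 1.7.0_80" (older releases) or "javac 11.0.2" (newer releases), and it treats the absence of javac as a sign that only a JRE, or no Java at all, is present.

```shell
# Succeed if the given version string denotes Java 6 (also written 1.6) or newer.
# "jdk_is_new_enough" is a hypothetical helper name used only for this sketch.
jdk_is_new_enough() {
    version=$1
    major=${version%%.*}
    if [ "$major" = "1" ]; then
        # Old scheme: 1.6, 1.7, ... The number after "1." is what matters.
        rest=${version#1.}
        feature=${rest%%[!0-9]*}
    else
        # New scheme: 9, 11.0.2, 17, ...
        feature=${major%%[!0-9]*}
    fi
    [ "${feature:-0}" -ge 6 ]
}

if command -v javac >/dev/null 2>&1; then
    installed=$(javac -version 2>&1 | awk '{ print $2; exit }')
    if jdk_is_new_enough "$installed"; then
        echo "JDK $installed is recent enough."
    else
        echo "JDK $installed is too old; version 1.6 or later is needed."
    fi
else
    echo "javac not found: install a JDK (a JRE alone does not include the compiler)."
fi
```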

Afterwards, compile lttoolbox-java the same way:

cd ..    # assuming you are in a subdirectory of apertium, not in /usr/local
cd lttoolbox-java
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install  # or make install if you chose to follow the previous note
sudo ldconfig

Now you just have to compile apertium-dixtools (recommended but not essential). It is built differently:

cd ..
cd apertium-dixtools
ant jar

You can also build and install using Maven 2 (http://maven.apache.org), by typing:

cd ..
cd apertium-dixtools
mvn install

More details in this paragraph.

Unless there are major changes, you should not need to reinstall these tools often.

Step 4: Get your language pair(s)

Using the same terminal, you can easily download and add the language pairs you want using a command like:

svn checkout https://apertium.svn.sourceforge.net/svnroot/apertium/<branch_name>/<pair_name>

Replace <branch_name> with the name of the SVN subdirectory that contains the chosen language pair.

Replace <pair_name> with the name of the chosen language pair.

For example, to retrieve the Spanish/English language pair (which is in trunk) and the French/Portuguese pair (which was in staging as of June 2012), you could type:

svn checkout https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-en-es
svn checkout https://apertium.svn.sourceforge.net/svnroot/apertium/staging/apertium-fr-pt

Note: You can find a full list of released language pairs at https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/. But there are other less developed language pairs. See Branches and List of language pairs.
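If you check out several pairs, building the URL from the branch and pair name avoids retyping the long repository address. This is a small sketch; pair_url is a hypothetical helper name, and the base URL is the one used in the commands above.

```shell
# Print the checkout URL for a language pair in a given branch.
# "pair_url" is a hypothetical helper name used only for this sketch.
pair_url() {
    branch=$1
    pair=$2
    printf 'https://apertium.svn.sourceforge.net/svnroot/apertium/%s/%s\n' "$branch" "$pair"
}

pair_url trunk apertium-en-es
pair_url staging apertium-fr-pt
```

The result can then be passed to Subversion, for example: svn checkout "$(pair_url trunk apertium-en-es)".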

Final Step: Compilation and installation of your language pair(s)

Compiling and installing a language pair is similar to compiling lttoolbox and Apertium; the only difference is that the sudo ldconfig step is not needed. Run the following for each language pair you downloaded, replacing <pair_name> with the appropriate name:

cd apertium  # or cd .. depending on the directory you are in
cd <pair_name>
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig ./autogen.sh
make
sudo make install  # or make install if you chose to follow the previous note

For further instruction, if necessary, see Apertium on Ubuntu.

Method 2: PACKAGE MANAGER

Using the Synaptic package manager to download and install Apertium, lttoolbox, and your language pair is considerably easier than the terminal method. However, your choice of language pairs is limited, you may be unable to commit changes, and there may be other flaws or minor bugs. Synaptic is available in Ubuntu, Debian, and their derivatives. Package manager equivalents on other systems include PackageKit and KPackage. If your Linux distribution does not have Apertium packages in its package manager, they can be downloaded through SourceForge.

Step 1: Find your Packages

To begin, start by finding the Synaptic package manager and opening it.

Then, use the search box (or press Ctrl+F) and type in "apertium".

Synaptic should bring up a list of everything related to Apertium. This list should include language pairs, the development versions of lttoolbox and libapertium, as well as the Apertium base package and various others.

Final Step: Compilation & Installation

Luckily, Synaptic takes care of getting the prerequisites, dependencies, and other required packages. All you have to do is select which packages you need and have Synaptic download and install them. It may be possible to have Synaptic download your language pair, Apertium, lttoolbox, and the corresponding dependencies at one time; however, the amount of time saved is minimal overall.

Start by selecting the "apertium" checkbox and choose "Mark for Installation" from the drop-down menu. (Left-clicking and right-clicking both bring down the menu.)

Synaptic will inform you of Apertium's dependencies and will ask if you want to mark them as well. Click "Mark" in the lower-right of the pop-up box.

The required packages (lttoolbox, libapertium, and liblttoolbox) will now be marked as well as Apertium.

Now you can select your language pair. Note: A lot of language pairs aren't available through this method. Those that are available include: en-es fr-es es-pt es-ca es-gl pt-gl eo-ca eo-es en-ca oc-es fr-ca es-ro eu-es oc-ca.

Download and install the selected packages.

Synaptic will inform you when it is done.

Now you can install the development packages (libapertium3-3.1-0-dev and liblttoolbox3-3.1-0-dev) using the same procedures.

IMPORTANT: Available versions of packages may be limited by what version of your OS you are running.

VERY IMPORTANT: If you got a language pair through Synaptic (or, more generally, from anywhere other than the Apertium SVN repository), you can use it for your own translations, but you will be lucky if you have the latest version. Since this page explains how to improve a language pair, you should avoid modifying an outdated version. For this reason, only use Method 1 to get the language pairs you want to modify.


Changing Things Around

When you want to make a change in Apertium, you more than likely want to add a word to an existing language pair. For a full explanation go to Contributing to an existing pair. You can also check out the Contact page for Apertium mailing lists and live help through IRC.

IMPORTANT: Adding a word won't do you any good unless you recompile the modules after making the change. Simply use the terminal as before, enter make <modulenamehere>, and press the "Enter" key; your computer will create the necessary new files.


There are 3 major steps in adding a new word to a language pair:

1. Add an entry to the dictionary for the first language that will be used.

2. Add an entry to the bilingual dictionary for the pair.

3. Add an entry to the dictionary for the second language that will be used.

You will need to find the module you want to work with on your computer and open the three dictionaries; for example: apertium-es-ca.es.dix, apertium-es-ca.es-ca.dix, and apertium-es-ca.ca.dix. Note: Each dictionary has the suffix ".dix". You should open these files in a text editor or a specialized XML editor.
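The naming pattern of the three files can be derived from the two language codes. The sketch below only reproduces the pattern seen in the example file names above; dix_files is a hypothetical helper name, and individual pairs may deviate from this layout.

```shell
# Print the three dictionary file names for a pair, following the pattern
# of the example above. "dix_files" is a hypothetical helper name.
dix_files() {
    first=$1
    second=$2
    pair="apertium-$first-$second"
    printf '%s\n' "$pair.$first.dix" "$pair.$first-$second.dix" "$pair.$second.dix"
}

dix_files es ca
# prints:
#   apertium-es-ca.es.dix
#   apertium-es-ca.es-ca.dix
#   apertium-es-ca.ca.dix
```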

Step 1: Adding to the First Dictionary

When adding entries, you have to enter the lemma (the word as you would read it in a dictionary), the part between <i> and </i> that contains the prefix of the word that is common to all inflected forms, and the element in <par> that refers to the inflection paradigm of this word. All entries have a basic structure like:

      <e lm="(lemma)">
        <i>(prefix)</i>
        <par n="(paradigm)"/>
      </e>

A good example of this would be:

      <e lm="cósmico">
        <i>cósmic</i>
        <par n="absolut/o__adj"/>
      </e>

Start by opening your first language's dictionary file. For example: apertium-en-es.es.dix (an XML file).

Then, create a new entry with the basic structure next to a similar entry in the dictionary. The order of entries doesn't matter.

Now, between the quotes in the area where it says (lemma) replace (lemma) with your word. Note: Do not include () in entries, but place input between "".

Next, you can enter the prefix in the space between <i> and </i> and replace (prefix).

Finally, enter the paradigm in <par> between the quotation marks. The paradigm should be that of another word that is already in the dictionary and has the same inflection. For example: <par n="absolut/o__adj"/> for cósmico. This entry means that the adjective "cósmico" inflects like the adjective "absoluto" and has the same morphological analysis: the forms cósmico, cósmica, cósmicos, and cósmicas correspond to the forms absoluto, absoluta, absolutos, and absolutas and have the morphological analyses adj m sg, adj f sg, adj m pl, and adj f pl respectively.

Now, save your altered dictionary, and DO NOT change the file name, directory, or file type.

To finish, use the terminal to navigate to the directory that your module is housed in and enter make. Now press the "Enter" key and allow your computer to recompile the module with the changes you just made.
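The entry structure used in this step can also be generated from the lemma and the paradigm name. The following is an illustrative sketch, not an Apertium tool: dix_entry is a hypothetical helper, and it assumes that the paradigm name follows the pattern seen in the example ("absolut/o__adj"), where the text between "/" and "__" is the ending to strip from the lemma to obtain the common prefix.

```shell
# Print a monolingual dictionary entry for LEMMA using PARADIGM.
# "dix_entry" is a hypothetical helper name. Assumption: in a paradigm name
# such as "absolut/o__adj", the text between "/" and "__" is the ending that
# is removed from the lemma to obtain the part placed between <i> and </i>.
dix_entry() {
    lemma=$1
    paradigm=$2
    case $paradigm in
        */*) ending=${paradigm#*/}; ending=${ending%%__*} ;;
        *)   ending= ;;
    esac
    stem=${lemma%"$ending"}
    printf '      <e lm="%s">\n        <i>%s</i>\n        <par n="%s"/>\n      </e>\n' \
        "$lemma" "$stem" "$paradigm"
}

dix_entry "cósmico" "absolut/o__adj"
```

Run as shown, this prints the same entry as the cósmico example above. Always review the generated entry by hand, since the sketch does not check that the lemma really ends with the paradigm's ending.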

Step 2: Adding to the Second Dictionary

Using the same structure, you can create the entry in your second language's dictionary that is the equivalent to your entry in the first dictionary.

The second language dictionary file name should be something such as apertium-en-es.en.dix.

Save your changes and recompile the module.

Final Step: The Bilingual Dictionary

Adding entries to the bilingual dictionary is considerably easier than adding to the other two dictionaries. An entry in this dictionary has this basic structure:

      <e>
        <p>
          <l>(lemmafromfirst)<s n="(partofspeech)"/></l>
          <r>(lemmafromsecond)<s n="(partofspeech)"/></r>
        </p>
      </e>

Simply add an entry and replace (lemmafromfirst) with the lemma you added to the first dictionary, (lemmafromsecond) with the lemma from the second, and (partofspeech) with the part of speech for each word.

Save this dictionary and recompile the module one last time.
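The bilingual entry structure above can be filled in the same mechanical way. This is a sketch only: bidix_entry is a hypothetical helper name, it assumes both words share the same part-of-speech tag, and the Spanish/Catalan word pair is used purely as an illustration.

```shell
# Print a bilingual dictionary entry pairing LEFT and RIGHT lemmas.
# "bidix_entry" is a hypothetical helper name. Assumption: both sides take
# the same part-of-speech tag, passed as the third argument.
bidix_entry() {
    left=$1
    right=$2
    pos=$3
    printf '      <e>\n        <p>\n          <l>%s<s n="%s"/></l>\n          <r>%s<s n="%s"/></r>\n        </p>\n      </e>\n' \
        "$left" "$pos" "$right" "$pos"
}

# Illustrative Spanish/Catalan adjective pair.
bidix_entry "cósmico" "còsmic" adj
```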

Rules can also be added to a language pair, but that is not discussed in this guide. See Contributing to an existing pair for a more detailed and full explanation.

Errors

It is quite possible that you will encounter an error in your changes.

To see how a word is analysed by the translator and track down an error, type the following in the terminal (the example is from Contributing to an existing pair; follow the link for more help):

$ echo "gener" | apertium-destxt | lt-proc ca-es.automorf.bin

You can replace ca-es with the translation direction you want to test.

The output in Apertium should be:

^gener/gener<n><m><sg>$^./.<sent>$[][]

The string structure is: ^word/lemma<morphological analysis>$. The <sent> tag is the analysis of the full stop, as every sentence end is represented as a full stop by the system, whether or not explicitly indicated in the sentence.

The analysis of an unknown word is (ignoring the full stop information):

^genoma/*genoma$

and the analysis of an ambiguous word:

^casa/casa<n><f><sg>/casar<vblex><pri><p3><sg>/casar<vblex><imp><p2><sg>$

Each lexical form (lemma plus morphological analysis) is presented as a possible analysis of the word casa.
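When checking many words at once, it helps to list only those the analyser did not recognise. The sketch below relies solely on the output format described above, in which an unknown word has an analysis starting with an asterisk; unknown_words is a hypothetical helper name, and the sample input is copied from the examples in this section rather than produced by running Apertium.

```shell
# Read analyser output on standard input and print each surface form whose
# analysis starts with "*" (an unknown word).
# "unknown_words" is a hypothetical helper name used only for this sketch.
unknown_words() {
    # Put each ^...$ unit on its own line, then keep the units of the form
    # word/*word and print only the part before the slash.
    tr '^' '\n' | sed -n 's|^\([^/]*\)/\*.*$|\1|p'
}

# Sample analyser output taken from the examples above.
printf '%s\n' '^gener/gener<n><m><sg>$^./.<sent>$[][]' '^genoma/*genoma$' | unknown_words
# prints: genoma
```

With Apertium installed, the same function could be placed at the end of the analysis pipeline shown earlier.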

If you are still stuck remember that you can always ask questions on IRC.


Show it to the World

Now that you have added to a language pair, you have the option to commit your changes to SourceForge (if you used Method 1 for installation). Committing a change to a language pair is even easier than making the change.

Firstly, you need to register for a free SourceForge account. Then, contact an Apertium administrator here and request access to commit to SVN on SourceForge.

Once granted access, simply open the terminal, navigate to your language pair that was changed, and enter:

svn commit

Remember to include a log message detailing what was changed or added.

You can also follow svn commit with -m "message", which is sometimes easier than having an editor open automatically.

You have now become a language pair developer for Apertium!