How to bootstrap a new pair

From Apertium
Revision as of 01:43, 8 March 2018 by Sushain

This page describes how to use apertium-init to bootstrap a new language pair, optionally together with new monolingual data packages.

Prerequisites

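You need to have these installed first: the tools that the commands on this page call (svn, python3, make and the apertium command itself), plus apertium-init.py from https://github.com/apertium/apertium-init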
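As a rough pre-flight check, the sketch below reports which of the commands used on this page are missing from PATH. The tool list (svn, python3, make, apertium) is read off the commands on this page and is an assumption, not an official prerequisite list; missing_tools is a hypothetical helper name.

```shell
#!/bin/sh
# Print the name of every tool given as an argument that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list, taken from the commands used on this page.
missing=$(missing_tools svn python3 make apertium)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing"
else
    echo "All tools found"
fi
```

If anything is reported as missing, install it before continuing; the ./autogen.sh scripts may need further build tools that this check does not cover.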

With two existing monolingual packages

Do the two languages you're making a pair of already have monolingual modules in https://svn.code.sf.net/p/apertium/svn/languages/ (or perhaps https://svn.code.sf.net/p/apertium/svn/incubator )?

Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages:
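To make the placeholder substitution concrete, here is a small sketch that checks the shape of two codes and derives the pair directory and bilingual dictionary names used on this page. The helper names are hypothetical, and eng and deu (English and German) are only example codes; the check covers the shape of a code, not whether it is actually assigned.

```shell
#!/bin/sh
# True if $1 is exactly three lowercase ASCII letters (shape check only).
is_iso639_3() {
    [ "${#1}" -eq 3 ] && [ -z "$(printf '%s' "$1" | LC_ALL=C tr -d 'a-z')" ]
}

# Directory that apertium-init.py creates for the pair $1-$2.
pair_dir() { printf 'apertium-%s-%s\n' "$1" "$2"; }

# Bilingual dictionary file inside that directory.
pair_dix() { printf 'apertium-%s-%s.%s-%s.dix\n' "$1" "$2" "$1" "$2"; }

# Example with eng and deu standing in for XXX and YYY.
for code in eng deu; do
    is_iso639_3 "$code" || { echo "not a three-letter code: $code" >&2; exit 1; }
done
pair_dir eng deu   # prints: apertium-eng-deu
pair_dix eng deu   # prints: apertium-eng-deu.eng-deu.dix
```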


First compile the monolingual packages:

svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..
svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you had to add words to the monolingual dictionaries, you will have to run "make" in those directories first. Alternatively, use the shortcut "make langs", which should rebuild the monolingual dictionaries from within the pair directory.
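When testing a freshly bootstrapped pair, most words will come back with a marker in front of them because the dictionaries are still nearly empty. The sketch below classifies one line of output by its marker. The marker meanings follow Apertium's usual conventions (* for unknown words, @ for a missing bilingual entry, # for a generation failure) and should be treated as an assumption to check against your installed version; classify_output is a hypothetical helper.

```shell
#!/bin/sh
# Classify one line of translator output by the marker it carries.
# Intended for single test words; real text containing these characters
# would be misclassified.
classify_output() {
    case "$1" in
        *'*'*) echo "unknown word (not in the source analyser)" ;;
        *'@'*) echo "no bilingual dictionary entry" ;;
        *'#'*) echo "generation failed in the target language" ;;
        *)     echo "ok" ;;
    esac
}

# Hypothetical usage from the pair directory:
#   classify_output "$(echo house | apertium -d . XXX-YYY)"
classify_output '*house'   # prints: unknown word (not in the source analyser)
classify_output 'Haus'     # prints: ok
```

The marker tells you which dictionary to edit: the source monolingual dictionary for *, the bilingual dictionary for @, and the target monolingual dictionary for #.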

With one existing monolingual package

Does just one of the two languages you're making a pair of already have a monolingual module in https://svn.code.sf.net/p/apertium/svn/languages/ (or perhaps https://svn.code.sf.net/p/apertium/svn/incubator )?

Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages; here we assume XXX needs to be made from scratch.
ISO 639-3 codes can be found here: http://www-01.sil.org/iso639-3/codes.asp

First make a new monolingual package:

python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..

Then get and compile the existing monolingual package:

svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you had to add words to the monolingual dictionaries, you will have to run "make" in those directories first. Alternatively, use the shortcut "make langs", which should rebuild the monolingual dictionaries from within the pair directory.

With no existing monolingual packages

Does neither of the two languages you're making a pair of have a monolingual module yet in https://svn.code.sf.net/p/apertium/svn/languages/ or https://svn.code.sf.net/p/apertium/svn/incubator ?

Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages:

First make and compile the new monolingual packages:

python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..

python3 apertium-init.py YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you had to add words to the monolingual dictionaries, you will have to run "make" in those directories first. Alternatively, use the shortcut "make langs", which should rebuild the monolingual dictionaries from within the pair directory.
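The three workflows on this page differ only in how each monolingual package is obtained: checked out if it exists, generated if it does not. The sketch below prints (but does not run) the command sequence for any combination, so you can review it before typing it in. The function names and the "existing"/"new" labels are hypothetical; the printed commands are the ones used on this page.

```shell
#!/bin/sh
# Print the commands for one monolingual package.
# $1 = language code, $2 = "existing" (check out) or "new" (generate).
mono_plan() {
    if [ "$2" = existing ]; then
        echo "svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-$1"
    else
        echo "python3 apertium-init.py $1"
    fi
    echo "(cd apertium-$1 && ./autogen.sh && make -j)"
}

# Print the full plan for a pair.
# $1, $2 = language codes; $3, $4 = "existing" or "new" for each side.
pair_plan() {
    mono_plan "$1" "$3"
    mono_plan "$2" "$4"
    echo "python3 apertium-init.py $1-$2"
    echo "cd apertium-$1-$2"
    echo "./autogen.sh --with-lang1=../apertium-$1 --with-lang2=../apertium-$2"
    echo "make -j"
}

# Example: XXX is made from scratch, YYY already exists.
pair_plan XXX YYY new existing
```

Printing rather than executing keeps the sketch harmless; once the plan looks right, the lines can be run one at a time from the directory that should hold all three packages.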

HFST and other alternative setups

If you're making a monolingual module that should use HFST/lexc, pass the option --analyser=hfst to apertium-init.py.

If you're making a pair where the "left" side (XXX in the above examples) uses HFST/lexc, pass the option --analyser1=hfst to apertium-init.py.

If you're making a pair where the "right" side (YYY in the above examples) uses HFST/lexc, pass the option --analyser2=hfst to apertium-init.py.

If you're making a pair where both sides use HFST/lexc, pass the option --analysers=hfst to apertium-init.py.
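The choice between the three pair options can be sketched as a small helper that builds the apertium-init.py arguments from the analyser on each side. Only the option names come from this page; init_pair_args and the "default" label for a side that does not use HFST are hypothetical, and placing the option before the pair name is an assumption about the command line.

```shell
#!/bin/sh
# Build the apertium-init.py arguments for a pair.
# $1, $2 = language codes; $3, $4 = "hfst", or anything else for the default.
init_pair_args() {
    if [ "$3" = hfst ] && [ "$4" = hfst ]; then
        echo "--analysers=hfst $1-$2"
    elif [ "$3" = hfst ]; then
        echo "--analyser1=hfst $1-$2"
    elif [ "$4" = hfst ]; then
        echo "--analyser2=hfst $1-$2"
    else
        echo "$1-$2"
    fi
}

# Example: only the left side (XXX) uses HFST/lexc.
echo "python3 apertium-init.py $(init_pair_args XXX YYY hfst default)"
# prints: python3 apertium-init.py --analyser1=hfst XXX-YYY
```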

See https://github.com/apertium/apertium-init for more documentation, or run ./apertium-init.py --help for all options (you can e.g. also make pairs that don't use a statistical disambiguator, or don't use a Constraint Grammar disambiguator).