Difference between revisions of "How to bootstrap a new pair"

(18 intermediate revisions by 7 users not shown)
Line 1: Line 1:
 
{{TOCD}}
 
{{TOCD}}
  +
How to use apertium-init to bootstrap a new language pair (optionally with new monolingual data packages as well).
   
 
==Prerequisites==
 
==Prerequisites==
   
  +
''You need to get this installed first:''
* [[apertium-init]]
 
  +
  +
* apertium/lttoolbox/hfst, see [[Installation]], in particular the ''prerequisites'' parts. (You most likely don't need to go all the way, since you should get this stuff from Tino's repositories. If you're on Windows, get the [[Apertium VirtualBox]].)
  +
* [[apertium-init]].py – put this script in your working directory where you will be downloading language data (you only need https://raw.githubusercontent.com/apertium/apertium-init/master/apertium-init.py – not the whole repository –, although the information in the [https://github.com/apertium/apertium-init/blob/master/README.md README.md] file may be useful for troubleshooting)
   
 
==With two existing monolingual packages==
 
==With two existing monolingual packages==
   
  +
Do the two languages you're making a pair of already have monolingual modules in [https://apertium.github.io/apertium-on-github/source-browser.html the repository]?
First compile the monolingual packages:
 
   
  +
If so, follow this section, replacing XXX and YYY with the ISO 639-3 codes of your languages:
   
Then generate the pair:
 
   
  +
First compile the monolingual packages:
  +
<pre>
  +
git clone https://github.com/apertium/apertium-XXX.git
  +
cd apertium-XXX
  +
./autogen.sh
  +
make -j
  +
cd ..
  +
git clone https://github.com/apertium/apertium-YYY.git
  +
cd apertium-YYY
  +
./autogen.sh
  +
make -j
  +
cd ..
  +
</pre>
  +
  +
Then generate the pair:
  +
<pre>
  +
python3 apertium-init.py XXX-YYY
  +
</pre>
   
 
Then compile the pair:
 
Then compile the pair:
   
 
<pre>
 
<pre>
  +
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
./autogen.sh
 
  +
make -j
./configure --with-lang1=/path/to/apertium-xxx --with-lang2=/path/to/apertium-yyy
 
 
</pre>
 
</pre>
   
 
And test:
 
And test:
  +
<pre>
  +
echo house | apertium -d . XXX-YYY
  +
echo Haus | apertium -d . YYY-XXX
  +
</pre>
  +
  +
Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:
  +
  +
<pre>
  +
make -j
  +
echo house | apertium -d . XXX-YYY
  +
echo Haus | apertium -d . YYY-XXX
  +
</pre>
  +
  +
If you had to add words to the monolingual dictionaries, you will have to type "make" in those directories first. Alternatively, there is a shortcut from the pair directory: "make langs" should make the monolingual dictionaries even if you're in the pair directory.
   
 
==With one existing monolingual package==
 
==With one existing monolingual package==
  +
  +
Does just one of the two languages you're making a pair of already have a monolingual module in [https://apertium.github.io/apertium-on-github/source-browser.html the repository]?
  +
  +
  +
If so, follow this section, replacing XXX and YYY with the ISO 639-3 codes of your languages; here we assume XXX needs to be made from scratch. <br>
  +
ISO 639-3 codes can be found here: http://www-01.sil.org/iso639-3/codes.asp
  +
  +
First make a new monolingual package:
  +
<pre>
  +
python3 apertium-init.py XXX
  +
cd apertium-XXX
  +
./autogen.sh
  +
make -j
  +
cd ..
  +
</pre>
  +
  +
Then get and compile the existing monolingual package:
  +
<pre>
  +
git clone https://github.com/apertium/apertium-YYY.git
  +
cd apertium-YYY
  +
./autogen.sh
  +
make -j
  +
cd ..
  +
</pre>
  +
  +
Then generate the pair:
  +
<pre>
  +
python3 apertium-init.py XXX-YYY
  +
</pre>
  +
  +
Then compile the pair:
  +
  +
<pre>
  +
cd apertium-XXX-YYY
  +
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
  +
make -j
  +
</pre>
  +
  +
And test:
  +
<pre>
  +
echo house | apertium -d . XXX-YYY
  +
echo Haus | apertium -d . YYY-XXX
  +
</pre>
  +
  +
Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:
  +
  +
<pre>
  +
make -j
  +
echo house | apertium -d . XXX-YYY
  +
echo Haus | apertium -d . YYY-XXX
  +
</pre>
  +
  +
If you had to add words to the monolingual dictionaries, you will have to type "make" in those directories first. Alternatively, there is a shortcut from the pair directory: "make langs" should make the monolingual dictionaries even if you're in the pair directory.
   
 
==With no existing monolingual packages==
 
==With no existing monolingual packages==
   
  +
Does neither of the two languages you're making a pair of already have a monolingual module in [https://apertium.github.io/apertium-on-github/source-browser.html the repository]?
  +
  +
If so, follow this section, replacing XXX and YYY with the ISO 639-3 codes of your languages:
  +
  +
First make and compile the new monolingual packages:
  +
<pre>
  +
python3 apertium-init.py XXX
  +
cd apertium-XXX
  +
./autogen.sh
  +
make -j
  +
cd ..
  +
  +
python3 apertium-init.py YYY
  +
cd apertium-YYY
  +
./autogen.sh
  +
make -j
  +
cd ..
  +
</pre>
  +
  +
Then generate the pair:
  +
<pre>
  +
python3 apertium-init.py XXX-YYY
  +
</pre>
  +
  +
Then compile the pair:
  +
  +
<pre>
  +
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
  +
make -j
  +
</pre>
  +
  +
And test:
  +
<pre>
  +
echo house | apertium -d . XXX-YYY
  +
echo Haus | apertium -d . YYY-XXX
  +
</pre>
  +
  +
Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:
  +
  +
<pre>
  +
make -j
  +
echo house | apertium -d . XXX-YYY
  +
echo Haus | apertium -d . YYY-XXX
  +
</pre>
  +
  +
If you had to add words to the monolingual dictionaries, you will have to type "make" in those directories first. Alternatively, there is a shortcut from the pair directory: "make langs" should make the monolingual dictionaries even if you're in the pair directory.
  +
  +
==HFST and other alternative setups==
  +
If you're making a monolingual module that should use HFST/lexc, pass the option <code>--analyser=hfst</code> to apertium-init.py.
  +
  +
If you're making a pair where the "left" side (XXX in the above examples) uses HFST/lexc, pass the option <code>--analyser1=hfst</code> to apertium-init.py.
  +
  +
If you're making a pair where the "right" side (YYY in the above examples) uses HFST/lexc, pass the option <code>--analyser2=hfst</code> to apertium-init.py.
  +
  +
If you're making a pair where both sides use HFST/lexc, pass the option <code>--analysers=hfst</code> to apertium-init.py.
   
  +
See https://github.com/apertium/apertium-init for more documentation, or run <code>./apertium-init.py --help</code> for all options (for example, you can also make pairs that don't use a statistical disambiguator, or don't use a Constraint Grammar disambiguator).
   
 
[[Category:Documentation]]
 
[[Category:Documentation]]
  +
[[Category:Installation]]
  +
[[Category:Documentation in English]]

Revision as of 05:35, 24 February 2020

How to use apertium-init to bootstrap a new language pair (optionally with new monolingual data packages as well).

Prerequisites

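You need to have these installed first:

* apertium/lttoolbox/hfst: see Installation, in particular the prerequisites parts. (You most likely don't need to follow every step, since you should get these from Tino's repositories. If you're on Windows, get the Apertium VirtualBox.)
* apertium-init.py: put this script in the working directory where you will be downloading language data. You only need https://raw.githubusercontent.com/apertium/apertium-init/master/apertium-init.py, not the whole repository, although the README.md file (https://github.com/apertium/apertium-init/blob/master/README.md) may be useful for troubleshooting.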

With two existing monolingual packages

Do the two languages you're making a pair of already have monolingual modules in the repository?

If so, follow this section, replacing XXX and YYY with the ISO 639-3 codes of your languages:


First compile the monolingual packages:

git clone https://github.com/apertium/apertium-XXX.git
cd apertium-XXX
./autogen.sh
make -j
cd ..
git clone https://github.com/apertium/apertium-YYY.git
cd apertium-YYY
./autogen.sh
make -j
cd ..
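
The clone-and-build steps above are the same for every monolingual package, so they can be wrapped in a small helper. The following is only a sketch: the function names run and build_module and the DRY_RUN switch are made up for this example and are not part of Apertium. With DRY_RUN=1 the commands are printed instead of executed, which lets you preview what would happen.

```shell
# Sketch: clone (if needed) and build one monolingual package.
# DRY_RUN is a hypothetical switch for this example: when set to 1,
# commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_module() {
    code="$1"                 # language code, i.e. what you substitute for XXX
    dir="apertium-$code"
    if [ ! -d "$dir" ]; then
        run git clone "https://github.com/apertium/$dir.git" || return 1
    fi
    # Build in a subshell so the caller's working directory is unchanged.
    (
        [ "${DRY_RUN:-0}" = "1" ] || cd "$dir" || exit 1
        run ./autogen.sh && run make -j
    )
}

# Preview the commands for both languages without running anything.
DRY_RUN=1
build_module XXX
build_module YYY
```

Set DRY_RUN=0 (or leave it unset) to actually run the commands; the build stops at the first step that fails.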

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you added words to the monolingual dictionaries, run "make" in those directories first. Alternatively, run "make langs" from the pair directory, which should rebuild the monolingual dictionaries without leaving it.
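
To illustrate the "add words" step, an entry in the bilingual .dix file pairs a left-side lemma with a right-side lemma, each followed by its tags. The helper below is a hypothetical sketch that prints one such entry for pasting into the dictionary's main section. The tag name n is an assumption: use the symbols your two monolingual modules actually produce, and make sure they are declared in the file's symbol definitions. Lemmas containing spaces are not handled here.

```shell
# Sketch: print a bilingual dictionary entry.
# $1 = left-side lemma, $2 = right-side lemma, $3 = part-of-speech symbol.
# The function name and argument order are invented for this example.
bidix_entry() {
    printf '<e><p><l>%s<s n="%s"/></l><r>%s<s n="%s"/></r></p></e>\n' \
        "$1" "$3" "$2" "$3"
}

bidix_entry house Haus n
# prints: <e><p><l>house<s n="n"/></l><r>Haus<s n="n"/></r></p></e>
```

After pasting the entry into apertium-XXX-YYY.XXX-YYY.dix, rebuild with make and test again as shown above.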

With one existing monolingual package

Does just one of the two languages you're making a pair of already have a monolingual module in the repository?


If so, follow this section, replacing XXX and YYY with the ISO 639-3 codes of your languages; here we assume XXX needs to be made from scratch.
ISO 639-3 codes can be found here: http://www-01.sil.org/iso639-3/codes.asp

First make a new monolingual package:

python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..

Then get and compile the existing monolingual package:

git clone https://github.com/apertium/apertium-YYY.git
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you added words to the monolingual dictionaries, run "make" in those directories first. Alternatively, run "make langs" from the pair directory, which should rebuild the monolingual dictionaries without leaving it.
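
When reading the output of the test commands, it helps to know that Apertium prefixes words it could not handle: * marks a word unknown to the source-language analyser, @ a word missing from the bilingual dictionary, and # a word that could not be generated in the target language. The function below is a small sketch (its name is invented for this example) that explains a single output word based on its first character.

```shell
# Sketch: explain the marker at the start of one translated word.
explain_output() {
    case "$1" in
        \**) echo "unknown to the source-language analyser" ;;
        @*)  echo "missing from the bilingual dictionary" ;;
        \#*) echo "could not be generated in the target language" ;;
        *)   echo "translated" ;;
    esac
}

explain_output '*house'   # unknown to the source-language analyser
explain_output '@house'   # missing from the bilingual dictionary
explain_output 'Haus'     # translated
```

For a one-word test you could call it as explain_output "$(echo house | apertium -d . XXX-YYY)" from the pair directory. A freshly bootstrapped pair with empty dictionaries will typically report the word as unknown until you add it.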

With no existing monolingual packages

Does neither of the two languages you're making a pair of already have a monolingual module in the repository?

If so, follow this section, replacing XXX and YYY with the ISO 639-3 codes of your languages:

First make and compile the new monolingual packages:

python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..

python3 apertium-init.py YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you added words to the monolingual dictionaries, run "make" in those directories first. Alternatively, run "make langs" from the pair directory, which should rebuild the monolingual dictionaries without leaving it.
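
The three scenarios above differ only in how each monolingual package is obtained: an existing one is cloned, a missing one is generated with apertium-init.py. The sketch below makes that decision explicit by printing the commands for a given case. The function names and the yes/no arguments are made up for this example, and the build steps (./autogen.sh, make -j) are left out for brevity.

```shell
# Sketch: print how to obtain one monolingual package.
# $1 = language code, $2 = "yes" if a module already exists in the repository.
plan_module() {
    if [ "$2" = "yes" ]; then
        echo "git clone https://github.com/apertium/apertium-$1.git"
    else
        echo "python3 apertium-init.py $1"
    fi
}

# $1, $2 = language codes; $3, $4 = "yes"/"no" for each language.
plan_pair() {
    plan_module "$1" "$3"
    plan_module "$2" "$4"
    echo "python3 apertium-init.py $1-$2"
}

# The "one existing package" case, where XXX is made from scratch:
plan_pair XXX YYY no yes
```

This prints the apertium-init.py command for XXX, the git clone command for YYY, and finally the command that generates the pair.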

HFST and other alternative setups

If you're making a monolingual module that should use HFST/lexc, pass the option --analyser=hfst to apertium-init.py.

If you're making a pair where the "left" side (XXX in the above examples) uses HFST/lexc, pass the option --analyser1=hfst to apertium-init.py.

If you're making a pair where the "right" side (YYY in the above examples) uses HFST/lexc, pass the option --analyser2=hfst to apertium-init.py.

If you're making a pair where both sides use HFST/lexc, pass the option --analysers=hfst to apertium-init.py.

See https://github.com/apertium/apertium-init for more documentation, or run ./apertium-init.py --help for all options (for example, you can also make pairs that don't use a statistical disambiguator, or don't use a Constraint Grammar disambiguator).
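
As a quick reference for the HFST options described in this section, the sketch below builds the matching apertium-init.py command line. The function name and the keywords mono, left, right and both are invented for this example; only the --analyser, --analyser1, --analyser2 and --analysers options come from the text above. The command is printed rather than run, so you can check it first.

```shell
# Sketch: print the apertium-init.py command for a given HFST setup.
# $1 = module name (XXX for a monolingual module, XXX-YYY for a pair)
# $2 = which part uses HFST/lexc: mono, left, right, both (anything else: none)
init_command() {
    case "$2" in
        mono)  opt="--analyser=hfst" ;;
        left)  opt="--analyser1=hfst" ;;
        right) opt="--analyser2=hfst" ;;
        both)  opt="--analysers=hfst" ;;
        *)     opt="" ;;
    esac
    # Append the option only when one was selected.
    echo "python3 apertium-init.py $1${opt:+ $opt}"
}

init_command XXX mono        # python3 apertium-init.py XXX --analyser=hfst
init_command XXX-YYY left    # python3 apertium-init.py XXX-YYY --analyser1=hfst
init_command XXX-YYY both    # python3 apertium-init.py XXX-YYY --analysers=hfst
```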