How to bootstrap a new pair

From Apertium

Revision as of 21:50, 27 October 2018

How to use apertium-init to bootstrap a new language pair (optionally with new monolingual data packages as well).

Prerequisites

You need to get this installed first:

* apertium/lttoolbox/hfst, see Installation, in particular the prerequisites parts. (You most likely don't need to go all the way, since you should get these packages from Tino's repositories. If you're on Windows, get the Apertium VirtualBox.)
* apertium-init.py – put this script in the working directory where you will be downloading language data (you only need https://raw.githubusercontent.com/apertium/apertium-init/master/apertium-init.py, not the whole repository)

With two existing monolingual packages

Do the two languages you're making a pair of already have monolingual modules in the repository?

Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages:


First compile the monolingual packages:

git clone https://github.com/apertium/apertium-XXX.git
cd apertium-XXX
./autogen.sh
make -j
cd ..
git clone https://github.com/apertium/apertium-YYY.git
cd apertium-YYY
./autogen.sh
make -j
cd ..
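The same clone-and-build steps can be wrapped in a small helper so they are not retyped for every language. This is only a sketch: the names run, build_module and DRY_RUN are made up for this example and are not part of Apertium. With DRY_RUN=1 it prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper, not part of Apertium: fetch and build one
# monolingual module, using the same commands as listed above.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_module CODE   (CODE is an ISO 639-3 code such as XXX)
build_module() {
    code=$1
    dir="apertium-$code"
    # Clone only if the directory is not already there.
    if [ ! -d "$dir" ]; then
        run git clone "https://github.com/apertium/$dir.git" || return 1
    fi
    # Subshell, so the caller stays in the working directory.
    (
        run cd "$dir" && run ./autogen.sh && run make -j
    )
}

# Example (prints the commands only):
#   DRY_RUN=1; build_module XXX; build_module YYY
```

Because the build runs in a subshell, there is no need for a "cd .." afterwards.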

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
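If one of the test commands complains about an unknown mode, it can help to see which modes were actually built. The sketch below assumes that the compiled pair keeps its modes as files named modes/NAME.mode inside the pair directory (the directory given to -d); list_modes is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the modes found in a pair
# directory. Assumes the modes live in DIR/modes/NAME.mode.

# list_modes [DIR]   (DIR defaults to the current directory)
list_modes() {
    dir=${1:-.}
    for f in "$dir"/modes/*.mode; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        basename "$f" .mode
    done
}

# Example, from inside the pair directory:
#   list_modes .
```

Any name it prints should be usable as the last argument of "apertium -d .".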

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
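As a rough illustration of what "adding a word" means, the sketch below prints one bilingual dictionary entry for a noun pair. The helper name bidix_entry is made up, and the tag n is an assumption: it has to be declared as an sdef in the dictionary and has to match the tags that your monolingual dictionaries use. The words are not XML-escaped, so this only suits plain words.

```shell
#!/bin/sh
# Hypothetical helper: print one bilingual dictionary entry.
# The tag (e.g. "n" for noun) is an assumption; use the tags that
# your monolingual dictionaries actually produce.

# bidix_entry LEFT RIGHT TAG
bidix_entry() {
    printf '<e><p><l>%s<s n="%s"/></l><r>%s<s n="%s"/></r></p></e>\n' \
        "$1" "$3" "$2" "$3"
}

# Example:
#   bidix_entry house Haus n
```

The printed line would go inside a section of apertium-XXX-YYY.XXX-YYY.dix, after which "make -j" rebuilds the pair.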

If you had to add words to the monolingual dictionaries, you will have to type "make" in those directories first. Alternatively, there is a shortcut from the pair directory: "make langs" should make the monolingual dictionaries even if you're in the pair directory.

With one existing monolingual package

Does just one of the two languages you're making a pair of already have a monolingual module in the repository?


Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages; here we assume XXX needs to be made from scratch.
ISO 639-3 codes can be found here: http://www-01.sil.org/iso639-3/codes.asp
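
A quick sanity check before running apertium-init.py can catch a two-letter code typed by habit. The sketch below only checks the shape of the code (exactly three lowercase ASCII letters); it does not check that the code is actually assigned to a language. The function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument looks like an
# ISO 639-3 code, i.e. exactly three lowercase ASCII letters.
# It does not look the code up in the official list.

is_iso639_3_shaped() {
    # LC_ALL=C keeps [a-z] limited to plain ASCII letters.
    printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z]{3}'
}

# Example:
#   is_iso639_3_shaped XXX || echo "use a lowercase three-letter code"
```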

First make a new monolingual package:

python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..

Then get and compile the existing monolingual package:

git clone https://github.com/apertium/apertium-YYY.git
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you had to add words to the monolingual dictionaries, you will have to type "make" in those directories first. Alternatively, there is a shortcut from the pair directory: "make langs" should make the monolingual dictionaries even if you're in the pair directory.

With no existing monolingual packages

Does neither of the two languages you're making a pair of have a monolingual module in the repository yet?

Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages:

First make and compile the new monolingual packages:

python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..

python3 apertium-init.py YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you had to add words to the monolingual dictionaries, you will have to type "make" in those directories first. Alternatively, there is a shortcut from the pair directory: "make langs" should make the monolingual dictionaries even if you're in the pair directory.

HFST and other alternative setups

If you're making a monolingual module that should use HFST/lexc, pass the option --analyser=hfst to apertium-init.py.

If you're making a pair where the "left" side (XXX in the above examples) uses HFST/lexc, pass the option --analyser1=hfst to apertium-init.py.

If you're making a pair where the "right" side (YYY in the above examples) uses HFST/lexc, pass the option --analyser2=hfst to apertium-init.py.

If you're making a pair where both sides use HFST/lexc, pass the option --analysers=hfst to apertium-init.py.

See https://github.com/apertium/apertium-init for more documentation, or run python3 apertium-init.py --help for all options (e.g. you can also make pairs that don't use a statistical disambiguator, or don't use a Constraint Grammar disambiguator).
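
To make the choice between the pair options above concrete, here is a small sketch that prints the matching command line. It uses only the options named on this page; the function name init_pair_command and the words none, left, right and both are made up for this example. It prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the apertium-init.py command for a pair,
# depending on which side(s) should use HFST/lexc.

# init_pair_command LEFT RIGHT [none|left|right|both]
init_pair_command() {
    left=$1
    right=$2
    case ${3:-none} in
        none)  opt="" ;;
        left)  opt="--analyser1=hfst" ;;
        right) opt="--analyser2=hfst" ;;
        both)  opt="--analysers=hfst" ;;
        *)     echo "unknown side: $3" >&2; return 1 ;;
    esac
    # ${opt:+$opt } adds the option and a space only when opt is set.
    echo "python3 apertium-init.py ${opt:+$opt }$left-$right"
}

# Example:
#   init_pair_command XXX YYY left
```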