Difference between revisions of "How to bootstrap a new pair"

From Apertium

Revision as of 09:27, 8 December 2015

How to use apertium-init to bootstrap a new language pair (optionally with new monolingual data packages as well).

Prerequisites

You need to install the following first:

  • apertium/lttoolbox/hfst: see Installation, in particular the parts on prerequisites. (You most likely don't need to go as far as "minimal installation from svn", since you can get these packages from Tino's repositories. If you're on Windows, get the VirtualBox image.)
  • apertium-init – put this script (apertium-init.py) in the working directory where you will be downloading the language data

With two existing monolingual packages

Do both of the languages you're making a pair of already have monolingual modules in https://svn.code.sf.net/p/apertium/svn/languages/ (or perhaps https://svn.code.sf.net/p/apertium/svn/incubator )?

Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages:


First compile the monolingual packages:

svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..
svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..
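
The checkout-and-build steps are the same for every language, so they can be wrapped in a small shell function. The following is only a minimal sketch and not part of apertium-init: the function names looks_like_iso639_3 and print_build_steps are hypothetical, the check only tests that a code has the three-lowercase-letter shape of an ISO 639-3 code (not that the code is actually assigned to a language), and the steps are printed rather than executed so that you can review them first. The codes eng and deu are just example values.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 has the shape of an ISO 639-3 code,
# i.e. exactly three lowercase ASCII letters. It does not check that the
# code is really assigned to a language.
looks_like_iso639_3() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z]{3}$'
}

# Hypothetical helper: print (do not run) the checkout and build steps
# for each language code given as an argument.
print_build_steps() {
  for code in "$@"; do
    if ! looks_like_iso639_3 "$code"; then
      echo "skipping '$code': not three lowercase letters" >&2
      continue
    fi
    echo "svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-$code"
    echo "(cd apertium-$code && ./autogen.sh && make -j)"
  done
}

print_build_steps eng deu
```

Printing the steps instead of running them keeps the sketch harmless; once the output looks right you can paste the lines into your shell.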

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then enter the new pair directory and compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
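
When reading the test output, it helps to know that Apertium conventionally prefixes words it could not handle with a marker: * for a word unknown to the analyser, @ for a word missing from the bilingual dictionary, and # for a word the generator could not produce. Until the dictionaries cover a word, expect to see such markers. As a rough sketch (the function name count_marked_words is hypothetical, and this is a plain text match, not an Apertium feature), you could count the marked words in a translation like this:

```shell
#!/bin/sh
# Hypothetical helper: count the words in $1 that start with one of the
# markers *, @ or #.
count_marked_words() {
  # Put each word on its own line, then count the lines starting with a marker.
  # grep exits non-zero when the count is 0, hence the "|| true".
  printf '%s\n' "$1" | tr -s ' \t' '\n\n' | grep -c '^[*@#]' || true
}

count_marked_words '*Haus is @big'   # prints 2

# With a compiled pair you might call it on real output, for example:
#   count_marked_words "$(echo house | apertium -d . XXX-YYY)"
```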

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you added words to the monolingual dictionaries, you first have to run "make" in those directories. Alternatively, there is a shortcut: running "make langs" in the pair directory should rebuild the monolingual dictionaries without leaving the pair directory.
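
All the names used in this section follow from the two language codes. As a small sketch (the function name pair_names is hypothetical; the naming pattern is the one visible in the commands of this section), the following prints the pair directory, the bilingual dictionary file and the two translation modes for a given pair of codes. The codes eng and deu are just example values.

```shell
#!/bin/sh
# Hypothetical helper: print the names derived from two language codes.
pair_names() {
  l1=$1
  l2=$2
  echo "directory: apertium-${l1}-${l2}"
  echo "bidix: apertium-${l1}-${l2}.${l1}-${l2}.dix"
  echo "modes: ${l1}-${l2} ${l2}-${l1}"
}

pair_names eng deu
```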

With one existing monolingual package

Does just one of the two languages you're making a pair of already have a monolingual module in https://svn.code.sf.net/p/apertium/svn/languages/ (or perhaps https://svn.code.sf.net/p/apertium/svn/incubator )?

Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages; here we assume XXX needs to be made from scratch.

First make a new monolingual package:

python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..

Then get and compile the existing monolingual package:

svn co https://svn.code.sf.net/p/apertium/svn/languages/apertium-YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then enter the new pair directory and compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you added words to the monolingual dictionaries, you first have to run "make" in those directories. Alternatively, there is a shortcut: running "make langs" in the pair directory should rebuild the monolingual dictionaries without leaving the pair directory.

With no existing monolingual packages

Does neither of the two languages you're making a pair of have a monolingual module in https://svn.code.sf.net/p/apertium/svn/languages/ or https://svn.code.sf.net/p/apertium/svn/incubator ?

Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages:

First make and compile the new monolingual packages:

python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..

python3 apertium-init.py YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..

Then generate the pair:

python3 apertium-init.py XXX-YYY

Then enter the new pair directory and compile the pair:

cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j

And test:

echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:

make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX

If you added words to the monolingual dictionaries, you first have to run "make" in those directories. Alternatively, there is a shortcut: running "make langs" in the pair directory should rebuild the monolingual dictionaries without leaving the pair directory.