Difference between revisions of "How to bootstrap a new pair"
Popcorndude (talk | contribs) (update download link for -init) |
Popcorndude (talk | contribs) |
Revision as of 21:00, 13 January 2021
How to use apertium-init to bootstrap a new language pair (optionally with new monolingual data packages as well).
Prerequisites
You need to get this installed first:
- apertium/lttoolbox/hfst, see Installation, in particular the prerequisites parts. (You most likely don't need to go all the way, since you should get this stuff from Tino's repositories. If you're on Windows, get the Apertium VirtualBox.)
- apertium-init.py – put this script in your working directory where you will be downloading language data. You can get the script from https://apertium.org/apertium-init
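Before going further, it can help to confirm that the tools are actually on your PATH. The following is a minimal sketch, not part of apertium itself: the helper name `missing_tools` is hypothetical, and the tool list is an assumption (`hfst-lexc` only matters if you plan to use HFST/lexc).

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
# (missing_tools is a hypothetical helper name, not an apertium tool.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list; adjust it to your setup.
missing=$(missing_tools apertium lt-comp hfst-lexc python3 git make)
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed tools found."
fi
```

If the URL above serves the script directly, something like `curl -L -o apertium-init.py https://apertium.org/apertium-init` should save it into the working directory; treat that as an assumption and check the downloaded file.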
With two existing monolingual packages
Do the two languages you're making a pair of already have monolingual modules in the repository?
Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages:
First compile the monolingual packages:
git clone https://github.com/apertium/apertium-XXX.git
cd apertium-XXX
./autogen.sh
make -j
cd ..
git clone https://github.com/apertium/apertium-YYY.git
cd apertium-YYY
./autogen.sh
make -j
cd ..
Then generate the pair:
python3 apertium-init.py XXX-YYY
Then compile the pair:
cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j
And test:
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:
make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
If you added words to the monolingual dictionaries, run "make" in those directories first. Alternatively, there is a shortcut: running "make langs" from the pair directory should rebuild the monolingual dictionaries without leaving it.
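To see what the steps in this section look like with real codes substituted for XXX and YYY, here is a small sketch that only prints the commands and runs none of them. The function name `print_pair_steps` is hypothetical, and `eng`/`deu` are example codes chosen for illustration.

```shell
#!/bin/sh
# Print the commands from this section with XXX and YYY replaced by the
# two language codes given as arguments. Nothing is executed.
# (print_pair_steps is a hypothetical helper name.)
print_pair_steps() {
    l1=$1
    l2=$2
    cat <<EOF
git clone https://github.com/apertium/apertium-$l1.git
(cd apertium-$l1 && ./autogen.sh && make -j)
git clone https://github.com/apertium/apertium-$l2.git
(cd apertium-$l2 && ./autogen.sh && make -j)
python3 apertium-init.py $l1-$l2
cd apertium-$l1-$l2
./autogen.sh --with-lang1=../apertium-$l1 --with-lang2=../apertium-$l2
make -j
EOF
}

# eng and deu are example codes only.
print_pair_steps eng deu
```

Printing the commands first makes it easy to check the directory names before anything is cloned or generated.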
With one existing monolingual package
Does just one of the two languages you're making a pair of already have a monolingual module in the repository?
Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages; here we assume XXX needs to be made from scratch.
ISO 639-3 codes can be found here: http://www-01.sil.org/iso639-3/codes.asp
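As a quick sanity check before naming a new module, the sketch below tests whether a string has the shape of an ISO 639-3 code (three lowercase ASCII letters). It checks the shape only and does not consult the registry; the function name `looks_like_iso639_3` is hypothetical.

```shell
#!/bin/sh
# Succeed if the argument is exactly three lowercase ASCII letters.
# Shape check only: it does not look the code up in the ISO 639-3 registry.
# (looks_like_iso639_3 is a hypothetical helper name.)
looks_like_iso639_3() {
    case $1 in
        *[!abcdefghijklmnopqrstuvwxyz]*) return 1 ;;  # any other character
        ???) return 0 ;;                              # exactly three letters
        *) return 1 ;;                                # wrong length
    esac
}

for code in eng deu en ENG; do
    if looks_like_iso639_3 "$code"; then
        echo "$code: ok"
    else
        echo "$code: not a three-letter lowercase code"
    fi
done
```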
First make a new monolingual package:
python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..
Then get and compile the existing monolingual package:
git clone https://github.com/apertium/apertium-YYY.git
cd apertium-YYY
./autogen.sh
make -j
cd ..
Then generate the pair:
python3 apertium-init.py XXX-YYY
Then compile the pair:
cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j
And test:
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:
make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
If you added words to the monolingual dictionaries, run "make" in those directories first. Alternatively, there is a shortcut: running "make langs" from the pair directory should rebuild the monolingual dictionaries without leaving it.
With no existing monolingual packages
Does neither of the two languages you're making a pair of already have a monolingual module in the repository?
Then follow this part, replacing XXX and YYY with the ISO 639-3 codes of your languages:
First make and compile the new monolingual packages:
python3 apertium-init.py XXX
cd apertium-XXX
./autogen.sh
make -j
cd ..
python3 apertium-init.py YYY
cd apertium-YYY
./autogen.sh
make -j
cd ..
Then generate the pair:
python3 apertium-init.py XXX-YYY
Then compile the pair:
cd apertium-XXX-YYY
./autogen.sh --with-lang1=../apertium-XXX --with-lang2=../apertium-YYY
make -j
And test:
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
Now you can add words to apertium-XXX-YYY.XXX-YYY.dix, then test again:
make -j
echo house | apertium -d . XXX-YYY
echo Haus | apertium -d . YYY-XXX
If you added words to the monolingual dictionaries, run "make" in those directories first. Alternatively, there is a shortcut: running "make langs" from the pair directory should rebuild the monolingual dictionaries without leaving it.
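The rebuild-after-editing routine can be wrapped in a small helper. This is a sketch under assumptions: the function name `rebuild_pair` and the `MAKE` override are hypothetical conveniences, and the function is meant to be called from the pair directory.

```shell
#!/bin/sh
# Rebuild the monolingual dictionaries, then the pair itself.
# Call this from the pair directory (apertium-XXX-YYY).
# MAKE is a hypothetical override hook, e.g. MAKE=echo for a dry run.
rebuild_pair() {
    "${MAKE:-make}" langs && "${MAKE:-make}" -j
}

# Dry run in a subshell: prints the arguments make would receive.
(MAKE=echo; rebuild_pair)
```

Calling `rebuild_pair` without setting `MAKE` runs the real `make langs` followed by `make -j`, and stops if the first step fails.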
HFST and other alternative setups
If you're making a monolingual module that should use HFST/lexc, pass the option --analyser=hfst to apertium-init.py.
If you're making a pair where the "left" side (XXX in the above examples) uses HFST/lexc, pass the option --analyser1=hfst to apertium-init.py.
If you're making a pair where the "right" side (YYY in the above examples) uses HFST/lexc, pass the option --analyser2=hfst to apertium-init.py.
If you're making a pair where both sides use HFST/lexc, pass the option --analysers=hfst to apertium-init.py.
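The choice between the three pair options can be written down as a small decision helper. This is only a sketch: the function name `pair_analyser_option` is hypothetical, the value `lt` is just a stand-in for "not HFST", and the option names are the ones listed above.

```shell
#!/bin/sh
# Print the apertium-init.py option for a pair, given whether each side
# uses HFST/lexc. Arguments: "hfst" or anything else, for left then right.
# (pair_analyser_option is a hypothetical helper name.)
pair_analyser_option() {
    left=$1
    right=$2
    if [ "$left" = hfst ] && [ "$right" = hfst ]; then
        echo "--analysers=hfst"
    elif [ "$left" = hfst ]; then
        echo "--analyser1=hfst"
    elif [ "$right" = hfst ]; then
        echo "--analyser2=hfst"
    fi
    # Prints nothing when neither side uses HFST/lexc.
}

# Example: only the left side (XXX) uses HFST/lexc. This prints the
# command rather than running it.
echo "python3 apertium-init.py $(pair_analyser_option hfst lt) XXX-YYY"
```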
See https://github.com/apertium/apertium-init for more documentation, or run python3 apertium-init.py --help for all options (for example, you can also make pairs that don't use a statistical disambiguator, or that don't use a Constraint Grammar disambiguator).