Compilation Speed

From Apertium
Latest revision as of 11:47, 4 October 2022

Compiling dictionaries takes time. There are some tricks for speeding it up.

Each language pair is different, so you may want to time the effect yourself, but the following tricks have all given real speedups, at least on apertium-nno-nob. All timings are real (wall-clock) time:


1) PARALLELISE: Use make -j instead of plain make, so that make goals that can be parallelised run on several processors. (If you're low on memory and your dictionaries are very large, you may want to cap it at make -j2 or similar.)

  • make clean && make in nno-nob takes 1m55s
  • make clean && make -j in nno-nob takes 1m02s
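
A bare -j puts no limit on the number of jobs GNU make starts at once, which is where the memory warning above comes from. One way to cap it is sketched below; pick_jobs is a hypothetical helper name, not part of Apertium or make, and it simply takes the smaller of the core count and an optional cap:

```shell
# Hypothetical helper (not part of Apertium): print a job count for make -j.
# Optional first argument is an upper limit; 0 or no argument means "no cap".
pick_jobs() {
    cap="${1:-0}"
    # Core count: nproc on Linux, getconf as a fallback, else 1.
    cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    if [ "$cap" -gt 0 ] && [ "$cores" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$cores"
    fi
}

# Usage (run inside a language pair directory):
#   make -j"$(pick_jobs)"      # one job per core
#   make -j"$(pick_jobs 2)"    # at most two jobs, for low-memory machines
```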


2) SPLIT SECTIONS: Put export LT_JOBS=yes in your ~/.bashrc (or ~/.bash_profile). This lets lttoolbox split dictionaries into sections of at most 50k entries and minimise them in parallel, which helps because large sections get exponentially slower to minimise. You can also tweak the threshold with export LT_MAX_SECTION_ENTRIES=50000.

  • make clean && export LT_JOBS=no && make -j nob.automorf.bin in nob takes 16s (typical situation where only monodix has been changed)
  • make clean && export LT_JOBS=yes && make -j nob.automorf.bin in nob takes 8s
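
To repeat this comparison on your own pair, something like the following sketch may help. The helper name time_with_lt_jobs is made up for illustration; it only sets LT_JOBS for the one command it runs and reports whole seconds, so it is coarser than the shell's own time keyword:

```shell
# Hypothetical helper: run a command with LT_JOBS set to the first argument
# and print the elapsed wall-clock time in whole seconds afterwards.
time_with_lt_jobs() {
    setting="$1"
    shift
    start="$(date +%s)"
    # The assignment applies to this one (external) command only.
    LT_JOBS="$setting" "$@"
    end="$(date +%s)"
    echo "LT_JOBS=$setting: $((end - start))s"
}

# Usage (run inside the monolingual directory, cleaning between runs):
#   make clean && time_with_lt_jobs no  make -j nob.automorf.bin
#   make clean && time_with_lt_jobs yes make -j nob.automorf.bin
```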


3) ALLOCATE FASTER: Try another malloc, e.g. tcmalloc or jemalloc. On Debian/Ubuntu, run sudo apt install libtcmalloc-minimal4, then put function m () { export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4; make "$@"; } into your ~/.bashrc (or ~/.bash_profile). Now you can use m instead of make to compile with tcmalloc.

  • make clean && export LT_JOBS=yes && make -j in nno-nob takes 57s with regular malloc (glibc)
  • make clean && export LT_JOBS=yes && m -j in nno-nob takes 45s with tcmalloc
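
The m function above exports LD_PRELOAD, so the setting stays in that shell after the first call. A variant that preloads tcmalloc only for the make process, and falls back to plain make when the library is missing, could look like this sketch. The TCMALLOC_LIB variable is an assumption introduced here for illustration; the default path is the Debian/Ubuntu x86_64 one from the text and will differ on other systems:

```shell
# Assumed location of the library; override TCMALLOC_LIB if yours differs.
TCMALLOC_LIB="${TCMALLOC_LIB:-/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4}"

# Sketch of an alternative m: preload tcmalloc for this make run only.
m() {
    if [ -r "$TCMALLOC_LIB" ]; then
        LD_PRELOAD="$TCMALLOC_LIB" make "$@"
    else
        # Library not found: warn and build with the default malloc.
        echo "m: $TCMALLOC_LIB not found, using default malloc" >&2
        make "$@"
    fi
}

# Usage, as in the timings above:
#   m -j
```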


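4) BUILD LESS: If you work mostly on one direction, you may want to add a debug goal to your Makefile.am for just that direction (example: https://github.com/apertium/apertium-nno-nob/blob/d9c07db56a8ca3748039ff48db331082a0a33c3a/Makefile.am#L52..L56, used as m -j e).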

  • make clean in all three dirs, then export LT_JOBS=yes && m -j langs in nno-nob takes 1m20s
  • make clean in all three dirs, then export LT_JOBS=yes && m -j e in nno-nob takes 42s


5) TRIM SMARTER: If you have sections that you already know contain matching elements in the analyser and bidix, you can ensure they're only trimmed against each other (ignoring other sections). Large regular expressions in particular can slow down trimming if every single bidix entry has to be checked for a match. If you put the <re> entries in a <section id="regex" type="standard"> in both files and change the lt-trim call in Makefile.am to lt-trim --match-section=regex@standard $^ $@, lt-trim will ignore bidix entries from sections other than regex@standard when trimming regex@standard from the analyser. (This works with any kind of entry, not just regexes.) Time how long the lt-trim step takes first; for most pairs it's fast enough already.


And for comparison, with none of the above tricks:

  • make clean in all three dirs, then export LT_JOBS=no && make langs in nno-nob takes 3m14s
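
To put a number on the difference, the timings can be converted to seconds and divided. The sketch below compares this 3m14s baseline with the 1m20s measured for m -j langs with LT_JOBS=yes; to_seconds and speedup_pct are hypothetical helper names, and integer arithmetic means the result is rounded down:

```shell
# Hypothetical helper: convert a timing like "3m14s" or "42s" to seconds.
to_seconds() {
    t="$1"
    case "$t" in
        *m*) min="${t%%m*}"; rest="${t#*m}" ;;
        *)   min=0;          rest="$t" ;;
    esac
    sec="${rest%s}"   # drop the trailing "s"
    sec="${sec#0}"    # drop one leading zero so "08" is not read as octal
    echo $(( $min * 60 + ${sec:-0} ))
}

# Old time as a percentage of the new time (242 means about 2.4 times faster).
speedup_pct() {
    echo $(( 100 * $(to_seconds "$1") / $(to_seconds "$2") ))
}

speedup_pct 3m14s 1m20s   # prints 242
```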