Various developers have created ad-hoc client libraries based on how the web interface works.
The goal of offering an OpenAPI-based API is to provide developers with automatically generated client libraries for a large number of programming languages, targeting a stable interface that is independent of the web interface’s implementation details.
Visit https://codesearch.debian.net/apikeys/ to download your personal API key. Log in via Debian’s GitLab instance salsa.debian.org; register there if you have no account yet.
Find the Debian Code Search client library for your programming language. If none exists yet, auto-generate a client library on editor.swagger.io: click “Generate Client”.
Search all code in Debian from your own analysis tool, migration tracking dashboard, etc.
curl \
-H "x-dcs-apikey: $(cat dcs-apikey-stapelberg.txt)" \
-X GET \
"https://codesearch.debian.net/api/v1/search?query=i3Font&match_mode=regexp"
You can try out the API in your web browser in the OpenAPI documentation.
Here’s an example program that demonstrates how to set up an auto-generated Go client for the Debian Code Search OpenAPI, run a query, and aggregate the results:
func burndown() error {
	cfg := openapiclient.NewConfiguration()
	cfg.AddDefaultHeader("x-dcs-apikey", apiKey)
	client := openapiclient.NewAPIClient(cfg)
	ctx := context.Background()
	// Search through the full Debian Code Search corpus, blocking until all
	// results are available:
	results, _, err := client.SearchApi.Search(ctx, "fmt.Sprint(err)", &openapiclient.SearchApiSearchOpts{
		// Literal searches are faster and do not require escaping special
		// characters; regular expression searches are more powerful.
		MatchMode: optional.NewString("literal"),
	})
	if err != nil {
		return err
	}
	// Print to stdout a CSV file with the path and number of occurrences:
	wr := csv.NewWriter(os.Stdout)
	header := []string{"path", "number of occurrences"}
	if err := wr.Write(header); err != nil {
		return err
	}
	occurrences := make(map[string]int)
	for _, result := range results {
		occurrences[result.Path]++
	}
	for _, result := range results {
		o, ok := occurrences[result.Path]
		if !ok {
			continue
		}
		// Print one CSV record per path:
		delete(occurrences, result.Path)
		record := []string{result.Path, strconv.Itoa(o)}
		if err := wr.Write(record); err != nil {
			return err
		}
	}
	wr.Flush()
	return wr.Error()
}
The full example can be found under burndown.go.

If you run into any issues, please file a GitHub issue on github.com/Debian/dcs!
I’m aware of the following third-party projects using Debian Code Search:
| Tool | Migration status |
|---|---|
| Debian Code Search CLI tool | Updated to OpenAPI |
| identify-incomplete-xs-go-import-path | Update pending |
| gnome-codesearch | makes no API queries |
If you find any others, please point them to this post in case they are not using Debian Code Search’s OpenAPI yet.
The focus of this release lies on:
- a better developer experience, allowing users to debug any installed package without extra setup steps
- performance improvements in all areas (starting programs, building distri packages, generating distri images)
- better tooling for keeping track of upstream versions
See the release notes for more details.
The distri research linux distribution project was started in 2019 to research whether a few architectural changes could enable drastically faster package management.
While the package managers in common Linux distributions (e.g. apt, dnf, …) top out at data rates of only a few MB/s, distri effortlessly saturates 1 Gbit, 10 Gbit and even 40 Gbit connections, resulting in fast installation and update speeds.
Packages in distri (e.g. emacs) are hermetic. By hermetic, I mean that the dependencies a package uses (e.g. libusb) don’t change, even when newer versions are installed.

For example, if package libusb-amd64-1.0.22-7 is available at build time, the package will always use that same version, even after the newer libusb-amd64-1.0.23-8 is installed into the package store.
Another way of saying the same thing is: packages in distri are always co-installable.
This makes the package store more robust: additions to it will not break the system. On a technical level, the package store is implemented as a directory containing distri SquashFS images and metadata files, into which packages are installed in an atomic way.
One exception where hermeticity is not desired is plugin mechanisms: optionally loading out-of-tree code at runtime obviously is not hermetic.
As an example, consider glibc’s Name Service Switch (NSS) mechanism. Section “29.4.1 Adding another Service to NSS” of the glibc manual describes how glibc searches $prefix/lib for shared libraries at runtime.

Debian ships about a dozen NSS libraries for a variety of purposes, and enterprise setups might add their own into the mix.

systemd (as of v245) accounts for 4 NSS libraries, e.g. nss-systemd for user/group name resolution for users allocated through systemd’s DynamicUser= option.
Having packages be as hermetic as possible remains a worthwhile goal despite any exceptions: I will gladly use a 99% hermetic system over a 0% hermetic system any day.
Side note: Xorg’s driver model (which can be characterized as a plugin mechanism) does not fall under this category because of its tight API/ABI coupling! For this case, where drivers are only guaranteed to work with precisely the Xorg version for which they were compiled, distri uses per-package exchange directories.
On a technical level, the requirement is: all paths used by the program must always result in the same contents. This is implemented in distri via the read-only package store mounted at /ro, e.g. files underneath /ro/emacs-amd64-26.3-15 never change.
To make all paths used by a program fully qualified, three strategies cover most paths in practice:
Programs on Linux use the ELF file format, which contains two kinds of references:

First, the ELF interpreter (PT_INTERP segment), which is used to start the program. For dynamically linked programs on 64-bit systems, this is typically ld.so(8).

Many distributions use system-global paths such as /lib64/ld-linux-x86-64.so.2, but distri compiles programs with -Wl,--dynamic-linker=/ro/glibc-amd64-2.31-4/out/lib/ld-linux-x86-64.so.2 so that the full path ends up in the binary.

The ELF interpreter is shown by file(1), but you can also use readelf -a $BINARY | grep 'program interpreter' to display it.
And secondly, the rpath, a run-time search path for dynamic libraries. Instead of storing full references to all dynamic libraries, we set the rpath so that ld.so(8) will find the correct dynamic libraries.

Originally, we used to just set a long rpath, containing one entry for each dynamic library dependency. However, we have since switched to using a single lib subdirectory per package as its rpath, and placing symlinks with full path references into that lib directory, e.g. using -Wl,-rpath=/ro/grep-amd64-3.4-4/lib. This is better for performance, as ld.so uses a per-directory cache.
Note that program load times are significantly influenced by how quickly you can locate the dynamic libraries. distri uses a FUSE file system to load programs from, so getting proper -ENOENT caching into place drastically sped up program load times.
Instead of compiling software with the -Wl,--dynamic-linker and -Wl,-rpath flags, one can also modify these fields after the fact using patchelf(1). For closed-source programs, this is the only possibility.

The rpath can be inspected by using e.g. readelf -a $BINARY | grep RPATH.
Many programs are influenced by environment variables: to start another program, said program is often found by checking each directory in the PATH environment variable.

Such search paths are prevalent in scripting languages, too, to find modules: Python has PYTHONPATH, Perl has PERL5LIB, and so on.
To set up these search path environment variables at run time, distri employs an indirection. Instead of e.g. teensy-loader-cli, you run a small wrapper program that calls precisely one execve system call with the desired environment variables.
Initially, I used shell scripts as wrapper programs because they are easily inspectable. This turned out to be too slow, so I switched to compiled programs. I’m linking them statically for fast startup, and I’m linking them against musl libc for significantly smaller file sizes than glibc (per-executable overhead adds up quickly in a distribution!).
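To illustrate the wrapper idea, here is a minimal sketch in Go (distri’s real wrappers are the statically linked programs described above; the target path below is taken from the appendix, and the prepend helper is purely illustrative):

```go
// Illustrative wrapper: exec the real binary with package-specific
// search paths prepended to the environment.
package main

import (
	"log"
	"os"
	"strings"
	"syscall"
)

// prepend puts dir in front of the existing value of key (creating the
// variable if it does not exist yet).
func prepend(env []string, key, dir string) []string {
	for i, kv := range env {
		if strings.HasPrefix(kv, key+"=") {
			env[i] = key + "=" + dir + ":" + strings.TrimPrefix(kv, key+"=")
			return env
		}
	}
	return append(env, key+"="+dir)
}

func main() {
	const target = "/ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli"
	env := prepend(os.Environ(), "PATH", "/ro/teensy-loader-cli-amd64-2.1+g20180927-7/bin")
	// Precisely one execve(2): replace this process with the target.
	if err := syscall.Exec(target, append([]string{target}, os.Args[1:]...), env); err != nil {
		log.Fatal(err)
	}
}
```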
Note that the wrapper programs prepend to the PATH environment variable; they don’t replace it in its entirety. This is important so that users have a way to extend the PATH (and other variables) if they so choose. This doesn’t hurt hermeticity because it is only relevant for programs that were not present at build time, i.e. plugin mechanisms which, by design, cannot be hermetic.
The shebang line of scripts contains a path, too, and hence needs to be changed.
We don’t do this in distri yet (the number of packaged scripts is small), but we should.
The performance improvements in the previous sections are not just nice to have, but practically required when many processes are involved: without them, you’ll encounter second-long delays in magit, which spawns many git processes under the covers, or in dracut, which spawns one cp(1) process per file.
Linux distributions such as Debian consider it an advantage to roll out security fixes to the entire system by updating a single shared library package (e.g. openssl).

The flip side of that coin is that changes to a single critical package can break the entire system.

With hermetic packages, all reverse dependencies must be rebuilt when a library’s changes should be picked up by the whole system. E.g., when openssl changes, curl must be rebuilt to pick up the new version of openssl.
This approach trades off using more bandwidth and more disk space (temporarily) against reducing the blast radius of any individual package update.
One consequence of fully qualified paths is that search path environment variables such as PATH grow long. This can be partially mitigated by removing empty directories at build time, which results in shorter variables. In general, there is no getting around this. One little trick to make long variables readable is to use tr : '\n', e.g.:
distri0# echo $PATH
/usr/bin:/bin:/usr/sbin:/sbin:/ro/openssh-amd64-8.2p1-11/out/bin
distri0# echo $PATH | tr : '\n'
/usr/bin
/bin
/usr/sbin
/sbin
/ro/openssh-amd64-8.2p1-11/out/bin
The implementation outlined above works well in hundreds of packages, and only a small handful exhibited problems of any kind. Here are some issues I encountered:
NSS libraries built against glibc 2.28 and newer cannot be loaded by glibc 2.27. In all likelihood, such changes do not happen too often, but it does illustrate that glibc’s published interface spec is not sufficient for forwards and backwards compatibility.
In distri, we could likely use a per-package exchange directory for glibc’s NSS mechanism to prevent the above problem from happening in the future.
Some programs try to arrange for themselves to be re-executed outside of their current process tree. For example, consider building a program with the meson build system:

When meson first configures the build, it generates ninja files (think Makefiles) which contain command lines that run the meson --internal helper.

Once meson returns, ninja is called as a separate process, so it will not have the environment which the meson wrapper sets up. ninja then runs the previously persisted meson command line. Since the command line uses the full path to meson (not to its wrapper), it bypasses the wrapper.
Luckily, not many programs try to arrange for other process trees to run them. Here is a table summarizing how affected programs might try to arrange for re-execution, whether the technique results in a wrapper bypass, and what we do about it in distri:

| technique to execute itself | uses wrapper | mitigation |
|---|---|---|
| run-time: find own basename in PATH | yes | wrapper program |
| compile-time: embed expected path | no; bypass! | configure or patch |
| run-time: argv[0] or /proc/self/exe | no; bypass! | patch |
One might think that setting argv[0] to the wrapper location seems like a way to side-step this problem. We tried doing this in distri, but had to revert and go the other way. (Login shells are indicated by a - character prepended to argv[0], so shells like bash or zsh cannot use wrapper programs.)

At a very high level, adopting hermetic packages will require two steps:
1. Using fully qualified paths whose contents don’t change (e.g. /ro/emacs-amd64-26.3-15) generally requires rebuilding programs, e.g. with --prefix set.
2. Once you use fully qualified paths, you need to make the packages able to exchange data. distri solves this with exchange directories, implemented in the /ro file system, which is backed by a FUSE daemon.

The first step is pretty simple, whereas the second step is where I expect controversy around any suggested mechanism.
This appendix contains commands and their outputs, run on the upcoming distri version supersilverhaze, but verified to work on older versions, too.
The /bin directory contains symlinks for the union of all packages’ bin subdirectories:
distri0# readlink -f /bin/teensy_loader_cli
/ro/teensy-loader-cli-amd64-2.1+g20180927-7/bin/teensy_loader_cli
The wrapper program in the bin subdirectory is small:
distri0# ls -lh $(readlink -f /bin/teensy_loader_cli)
-rwxr-xr-x 1 root root 46K Apr 21 21:56 /ro/teensy-loader-cli-amd64-2.1+g20180927-7/bin/teensy_loader_cli
Wrapper programs execute quickly:
distri0# strace -fvy /bin/teensy_loader_cli |& head | cat -n
1 execve("/bin/teensy_loader_cli", ["/bin/teensy_loader_cli"], ["USER=root", "LOGNAME=root", "HOME=/root", "PATH=/ro/bash-amd64-5.0-4/bin:/r"..., "SHELL=/bin/zsh", "TERM=screen.xterm-256color", "XDG_SESSION_ID=c1", "XDG_RUNTIME_DIR=/run/user/0", "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "XDG_SESSION_TYPE=tty", "XDG_SESSION_CLASS=user", "SSH_CLIENT=10.0.2.2 42556 22", "SSH_CONNECTION=10.0.2.2 42556 10"..., "SSH_TTY=/dev/pts/0", "SHLVL=1", "PWD=/root", "OLDPWD=/root", "_=/usr/bin/strace", "LD_LIBRARY_PATH=/ro/bash-amd64-5"..., "PERL5LIB=/ro/bash-amd64-5.0-4/ou"..., "PYTHONPATH=/ro/bash-amd64-5.0-4/"...]) = 0
2 arch_prctl(ARCH_SET_FS, 0x40c878) = 0
3 set_tid_address(0x40ca9c) = 715
4 brk(NULL) = 0x15b9000
5 brk(0x15ba000) = 0x15ba000
6 brk(0x15bb000) = 0x15bb000
7 brk(0x15bd000) = 0x15bd000
8 brk(0x15bf000) = 0x15bf000
9 brk(0x15c1000) = 0x15c1000
10 execve("/ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli", ["/ro/teensy-loader-cli-amd64-2.1+"...], ["USER=root", "LOGNAME=root", "HOME=/root", "PATH=/ro/bash-amd64-5.0-4/bin:/r"..., "SHELL=/bin/zsh", "TERM=screen.xterm-256color", "XDG_SESSION_ID=c1", "XDG_RUNTIME_DIR=/run/user/0", "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "XDG_SESSION_TYPE=tty", "XDG_SESSION_CLASS=user", "SSH_CLIENT=10.0.2.2 42556 22", "SSH_CONNECTION=10.0.2.2 42556 10"..., "SSH_TTY=/dev/pts/0", "SHLVL=1", "PWD=/root", "OLDPWD=/root", "_=/usr/bin/strace", "LD_LIBRARY_PATH=/ro/bash-amd64-5"..., "PERL5LIB=/ro/bash-amd64-5.0-4/ou"..., "PYTHONPATH=/ro/bash-amd64-5.0-4/"...]) = 0
Confirm which ELF interpreter is set for a binary using readelf(1):
distri0# readelf -a /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli | grep 'program interpreter'
[Requesting program interpreter: /ro/glibc-amd64-2.31-4/out/lib/ld-linux-x86-64.so.2]
Confirm the rpath is set to the package’s lib subdirectory using readelf(1):
distri0# readelf -a /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli | grep RPATH
0x000000000000000f (RPATH) Library rpath: [/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib]
…and verify the lib subdirectory has the expected symlinks and target versions:
distri0# find /ro/teensy-loader-cli-amd64-*/lib -type f -printf '%P -> %l\n'
libc.so.6 -> /ro/glibc-amd64-2.31-4/out/lib/libc-2.31.so
libpthread.so.0 -> /ro/glibc-amd64-2.31-4/out/lib/libpthread-2.31.so
librt.so.1 -> /ro/glibc-amd64-2.31-4/out/lib/librt-2.31.so
libudev.so.1 -> /ro/libudev-amd64-245-11/out/lib/libudev.so.1.6.17
libusb-0.1.so.4 -> /ro/libusb-compat-amd64-0.1.5-7/out/lib/libusb-0.1.so.4.4.4
libusb-1.0.so.0 -> /ro/libusb-amd64-1.0.23-8/out/lib/libusb-1.0.so.0.2.0
To verify the correct libraries are actually loaded, you can set the LD_DEBUG environment variable for ld.so(8):
distri0# LD_DEBUG=libs teensy_loader_cli
[…]
678: find library=libc.so.6 [0]; searching
678: search path=/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib (RPATH from file /ro/teensy-loader-cli-amd64-2.1+g20180927-7/out/bin/teensy_loader_cli)
678: trying file=/ro/teensy-loader-cli-amd64-2.1+g20180927-7/lib/libc.so.6
678:
[…]
NSS libraries that distri ships:
distri0# find /lib/ -name "libnss_*.so.2" -type f -printf '%P -> %l\n'
libnss_myhostname.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_myhostname.so.2
libnss_mymachines.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_mymachines.so.2
libnss_resolve.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_resolve.so.2
libnss_systemd.so.2 -> ../systemd-amd64-245-11/out/lib/libnss_systemd.so.2
libnss_compat.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_compat.so.2
libnss_db.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_db.so.2
libnss_dns.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_dns.so.2
libnss_files.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_files.so.2
libnss_hesiod.so.2 -> ../glibc-amd64-2.31-4/out/lib/libnss_hesiod.so.2
“[…] initrd is a scheme for loading a temporary root file system into memory, which may be used as part of the Linux startup process […] to make preparations before the real root file system can be mounted.”
Many Linux distributions do not compile all file system drivers into the kernel, but instead load them on-demand from an initramfs, which saves memory.
Another common scenario, in which an initramfs is required, is full-disk encryption: the disk must be unlocked from userspace, but since userspace is encrypted, an initramfs is used.
Thus far, building a distri disk image was quite slow:
This is on an AMD Ryzen 3900X 12-core processor (2019):
distri % time make cryptimage serial=1
80.29s user 13.56s system 186% cpu 50.419 total # 19s image, 31s initrd
Of these 50 seconds, dracut’s initramfs generation accounts for 31 seconds (62%)!
Initramfs generation time drops to 8.7 seconds once dracut no longer needs to use the single-threaded gzip(1), but the multi-threaded replacement pigz(1) instead:
This brings the total time to build a distri disk image down to:
distri % time make cryptimage serial=1
76.85s user 13.23s system 327% cpu 27.509 total # 19s image, 8.7s initrd
Clearly, when you use dracut on any modern computer, you should make pigz available. dracut should fail to compile unless one explicitly opts into the known-slower gzip. For more thoughts on optional dependencies, see “Optional dependencies don’t work”.
But why does it take 8.7 seconds still? Can we go faster?
The answer is Yes! I recently built a distri-specific initramfs I’m calling minitrd. I wrote both big parts from scratch:

- the initramfs generator (distri initrd)
- the init program (cmd/minitrd), running as /init in the initramfs

minitrd generates the initramfs image in ≈400ms, bringing the total time down to:
distri % time make cryptimage serial=1
50.09s user 8.80s system 314% cpu 18.739 total # 18s image, 400ms initrd
(The remaining time is spent in preparing the file system, then installing and configuring the distri system, i.e. preparing a disk image you can run on real hardware.)
How can minitrd be 20 times faster than dracut?
dracut is mainly written in shell, with a C helper program. It drives the generation process by spawning lots of external dependencies (e.g. ldd or the dracut-install helper program). I assume that the combination of using an interpreted language (shell), spawning lots of processes, and precluding a concurrent architecture is to blame for the poor performance.
minitrd is written in Go, with speed as a goal. It leverages concurrency and uses no external dependencies; everything happens within a single process (but with enough threads to saturate modern hardware).
Measuring early boot time using qemu, the dracut-generated initramfs took 588ms to display the full disk encryption passphrase prompt, whereas minitrd took only 195ms.
The rest of this article dives deeper into how minitrd works.
Ultimately, the job of an initramfs is to make the root file system available and continue booting the system from there. Depending on the system setup, this involves the following 5 steps:
Depending on the system, the block devices with the root file system might already be present when the initramfs runs, or some kernel modules might need to be loaded first. On my Dell XPS 9360 laptop, the NVMe system disk is already present when the initramfs starts, whereas in qemu, we need to load the virtio_pci module, followed by the virtio_scsi module.
How will our userland program know which kernel modules to load? Linux kernel modules declare patterns for their supported hardware as an alias, e.g.:
initrd# grep virtio_pci lib/modules/5.4.6/modules.alias
alias pci:v00001AF4d*sv*sd*bc*sc*i* virtio_pci
Devices in sysfs have a modalias file whose content can be matched against these declarations to identify the module to load:
initrd# cat /sys/devices/pci0000:00/*/modalias
pci:v00001AF4d00001005sv00001AF4sd00000004bc00scFFi00
pci:v00001AF4d00001004sv00001AF4sd00000008bc01sc00i00
[…]
Hence, for the initial round of module loading, it is sufficient to locate all modalias files within sysfs and load the responsible modules.
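As an illustration of how little logic this requires, here is a hedged sketch in Go that walks sysfs for modalias files and matches them against modules.alias patterns (not minitrd’s actual code; the kernel version path is hard-coded for brevity):

```go
package main

import (
	"bufio"
	"fmt"
	"io/fs"
	"os"
	"path"
	"path/filepath"
	"strings"
)

// loadAliases parses lines of the form “alias <pattern> <module>”.
func loadAliases(file string) (map[string]string, error) {
	f, err := os.Open(file)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	aliases := make(map[string]string) // pattern → module name
	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024) // lines can be long
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 3 && fields[0] == "alias" {
			aliases[fields[1]] = fields[2]
		}
	}
	return aliases, scanner.Err()
}

func main() {
	aliases, err := loadAliases("/lib/modules/5.4.6/modules.alias")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Walk sysfs and match each modalias file against all patterns.
	// A linear scan per device is fine for a sketch; a real
	// implementation would index the patterns.
	filepath.WalkDir("/sys/devices", func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || d.Name() != "modalias" {
			return nil
		}
		b, err := os.ReadFile(p)
		if err != nil {
			return nil
		}
		modalias := strings.TrimSpace(string(b))
		for pattern, module := range aliases {
			// fnmatch-style patterns; path.Match suffices because
			// modalias strings contain no “/” separators.
			if ok, _ := path.Match(pattern, modalias); ok {
				fmt.Printf("%s → load %s\n", modalias, module)
			}
		}
		return nil
	})
}
```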
Loading a kernel module can result in new devices appearing. When that happens, the kernel sends a uevent, which the uevent consumer in userspace receives via a netlink socket. Typically, this consumer is udev(7), but in our case, it’s minitrd.
For each uevent message that comes with a MODALIAS variable, minitrd will load the relevant kernel module(s).
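A minimal uevent consumer might look like the following sketch, assuming the golang.org/x/sys/unix package (the multicast group mask and buffer size are assumptions):

```go
package main

import (
	"fmt"
	"os"
	"strings"

	"golang.org/x/sys/unix"
)

func main() {
	// Subscribe to kernel uevents via a netlink socket.
	fd, err := unix.Socket(unix.AF_NETLINK, unix.SOCK_DGRAM, unix.NETLINK_KOBJECT_UEVENT)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	sa := &unix.SockaddrNetlink{
		Family: unix.AF_NETLINK,
		Groups: 1, // kernel uevent multicast group
		Pid:    uint32(os.Getpid()),
	}
	if err := unix.Bind(fd, sa); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	buf := make([]byte, 64*1024)
	for {
		n, _, err := unix.Recvfrom(fd, buf, 0)
		if err != nil {
			continue
		}
		// A uevent is a NUL-separated list: “action@devpath”, then
		// KEY=VALUE pairs, one of which may be MODALIAS.
		for _, kv := range strings.Split(string(buf[:n]), "\x00") {
			if alias, ok := strings.CutPrefix(kv, "MODALIAS="); ok {
				fmt.Println("would load module for", alias)
			}
		}
	}
}
```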
When loading a kernel module, its dependencies need to be loaded first. Dependency information is stored in the modules.dep file in a Makefile-like syntax:
initrd# grep virtio_pci lib/modules/5.4.6/modules.dep
kernel/drivers/virtio/virtio_pci.ko: kernel/drivers/virtio/virtio_ring.ko kernel/drivers/virtio/virtio.ko
To load a module, we can open its file and then call the Linux-specific finit_module(2) system call. Some modules are expected to return an error code, e.g. ENODEV or ENOENT when some hardware device is not actually present.
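For illustration, here is a hedged sketch of the loading step, again assuming golang.org/x/sys/unix (which wraps finit_module(2)); the module path is an example:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// insmod loads a single kernel module; dependencies from modules.dep
// must already have been loaded by the caller.
func insmod(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	// finit_module(2) takes an open file descriptor to the .ko file.
	err = unix.FinitModule(int(f.Fd()), "", 0)
	switch err {
	case unix.ENODEV, unix.ENOENT:
		return nil // expected: hardware not actually present
	case unix.EEXIST:
		return nil // module already loaded
	}
	return err
}

func main() {
	if err := insmod("/lib/modules/5.4.6/kernel/drivers/virtio/virtio_pci.ko"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```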
Side note: next to the textual versions, there are also binary versions of the modules.alias and modules.dep files. Presumably, those can be queried more quickly, but for simplicity, I have not (yet?) implemented support in minitrd.
Setting a legible font is necessary for hi-dpi displays. On my Dell XPS 9360 (3200 x 1800 QHD+ display), the following works well:
initrd# setfont latarcyrheb-sun32
Setting the user’s keyboard layout is necessary for entering the LUKS full-disk encryption passphrase in their preferred keyboard layout. I use the NEO layout:
initrd# loadkeys neo
In the Linux kernel, block device enumeration order is not necessarily the same on each boot. Even if it was deterministic, device order could still be changed when users modify their computer’s device topology (e.g. connect a new disk to a formerly unused port).
Hence, it is good style to refer to disks and their partitions with stable identifiers. This also applies to boot loader configuration, and so most distributions will set a kernel parameter such as root=UUID=1fa04de7-30a9-4183-93e9-1b0061567121.

Identifying the block device or partition with the specified UUID is the initramfs’s job.
Depending on what the device contains, the UUID comes from a different place. For example, ext4 file systems have a UUID field in their file system superblock, whereas LUKS volumes have a UUID in their LUKS header.

Canonically, probing a device to extract the UUID is done by libblkid from the util-linux package, but the logic can easily be re-implemented in other languages and changes rarely. minitrd comes with its own implementation to avoid cgo or running the blkid(8) program.
Unlocking a LUKS-encrypted volume is done in userspace. The kernel handles the crypto, but reading the metadata, obtaining the passphrase (or e.g. key material from a file) and setting up the device mapper table entries are done in user space.
initrd# modprobe algif_skcipher
initrd# cryptsetup luksOpen /dev/sda4 cryptroot1
After the user has entered their passphrase, the root file system can be mounted:
initrd# mount /dev/dm-0 /mnt
Now that everything is set up, we need to pass execution to the init program on the root file system with a careful sequence of chdir(2), mount(2), chroot(2), chdir(2) and execve(2) system calls that is explained in this busybox switch_root comment.
initrd# mount -t devtmpfs dev /mnt/dev
initrd# exec switch_root -c /dev/console /mnt /init
To conserve RAM, the files in the temporary file system to which the initramfs archive is extracted are typically deleted.
An initramfs “image” (more accurately: archive) is a compressed cpio archive. Typically, gzip compression is used, but the kernel supports a bunch of different algorithms and distributions such as Ubuntu are switching to lz4.
Generators typically prepare a temporary directory and feed it to the cpio(1) program. In minitrd, we read the files into memory and generate the cpio archive using the go-cpio package. We use the pgzip package for parallel gzip compression.
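Here is a hedged sketch of the core of such a generator, assuming the github.com/cavaliercoder/go-cpio and github.com/klauspost/pgzip packages (the archive contents are illustrative):

```go
package main

import (
	"fmt"
	"os"

	cpio "github.com/cavaliercoder/go-cpio"
	pgzip "github.com/klauspost/pgzip"
)

func check(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

func main() {
	out, err := os.Create("/tmp/initrd")
	check(err)
	defer out.Close()
	gz := pgzip.NewWriter(out) // parallel gzip compression
	cw := cpio.NewWriter(gz)
	body := []byte("#!/bin/sh\necho hello from the initramfs\n")
	// One header + body per file; /init is what the kernel executes.
	check(cw.WriteHeader(&cpio.Header{
		Name: "init",
		Mode: 0o755,
		Size: int64(len(body)),
	}))
	_, err = cw.Write(body)
	check(err)
	// Flush the cpio trailer first, then the gzip stream.
	check(cw.Close())
	check(gz.Close())
}
```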
The following files need to go into the cpio archive:
The minitrd binary is copied into the cpio archive as /init and will be run by the kernel after extracting the archive.

Like the rest of distri, minitrd is built statically without cgo, which means it can be copied as-is into the cpio archive.
Aside from the modules.alias and modules.dep metadata files, the kernel modules themselves reside in e.g. /lib/modules/5.4.6/kernel and need to be copied into the cpio archive.
Copying all modules results in a ≈80 MiB archive, so it is common to only copy modules that are relevant to the initramfs’s features. This reduces archive size to ≈24 MiB.
The filtering relies on hard-coded patterns and module names. For example, disk encryption related modules are all kernel modules underneath kernel/crypto, plus kernel/drivers/md/dm-crypt.ko.
When generating a host-only initramfs (works on precisely the computer that generated it), some initramfs generators look at the currently loaded modules and just copy those.
The kbd package’s setfont(8) and loadkeys(1) programs load console fonts and keymaps from /usr/share/consolefonts and /usr/share/keymaps, respectively.
Hence, these directories need to be copied into the cpio archive. Depending on whether the initramfs should be generic (work on many computers) or host-only (works on precisely the computer/settings that generated it), the entire directories are copied, or only the required font/keymap.
These programs are (currently) required because minitrd does not implement their functionality.
As they are dynamically linked, not only the programs themselves need to be copied, but also the ELF dynamic linking loader (path stored in the .interp ELF section) and any ELF library dependencies.

For example, cryptsetup in distri declares the ELF interpreter /ro/glibc-amd64-2.27-3/out/lib/ld-linux-x86-64.so.2 and declares dependencies on shared libraries libcryptsetup.so.12, libblkid.so.1 and others. Luckily, in distri, packages contain a lib subdirectory containing symbolic links to the resolved shared library paths (hermetic packaging), so it is sufficient to mirror the lib directory into the cpio archive, recursing into shared library dependencies of shared libraries.
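Go’s standard library debug/elf package can extract both kinds of ELF references; a minimal sketch:

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"os"
)

func main() {
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	// The ELF interpreter lives in the .interp section, NUL-terminated:
	if s := f.Section(".interp"); s != nil {
		if b, err := s.Data(); err == nil {
			fmt.Printf("interpreter: %s\n", bytes.TrimRight(b, "\x00"))
		}
	}
	// Shared library dependencies are DT_NEEDED dynamic entries:
	if needed, err := f.DynString(elf.DT_NEEDED); err == nil {
		for _, lib := range needed {
			fmt.Printf("needs: %s\n", lib)
		}
	}
}
```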
cryptsetup also requires the GCC runtime library libgcc_s.so.1 to be present at runtime, and will abort with an error message about not being able to call pthread_cancel(3) if it is unavailable.
To print log messages in the correct time zone, we copy /etc/localtime from the host into the cpio archive.
I currently have no desire to make minitrd available outside of distri. While the technical challenges (such as extending the generator to not rely on distri’s hermetic packages) are surmountable, I don’t want to support people’s initramfs remotely.
Also, I think that people’s efforts should in general be spent on rallying behind dracut and making it work faster, thereby benefiting all Linux distributions that use dracut (increasingly more). With minitrd, I have demonstrated that significant speed-ups are achievable.
It was interesting to dive into how an initramfs really works. I had been working with the concept for many years, from small tasks such as “debug why the encrypted root file system is not unlocked” to more complicated tasks such as “set up a root file system on DRBD for a high-availability setup”. But even with that sort of experience, I didn’t know all the details, until I was forced to implement every little thing.
As I suspected going into this exercise, dracut is much slower than it needs to be. Re-implementing its generation stage in a modern language instead of shell helps a lot.

Of course, my minitrd does a bit less than dracut, but not drastically so. The overall architecture is the same.
I hope my effort helps with two things:
As a teaching implementation: instead of wading through the various components that make up a modern initramfs (udev, systemd, various shell scripts, …), people can learn about how an initramfs works in a single place.
I hope the significant time difference motivates people to improve dracut.
Before writing any Go code, I did some manual prototyping. Learning how other people prototype is often immensely useful to me, so I’m sharing my notes here.
First, I copied all kernel modules and a statically built busybox binary:
% mkdir -p lib/modules/5.4.6
% cp -Lr /ro/lib/modules/5.4.6/* lib/modules/5.4.6/
% cp ~/busybox-1.22.0-amd64/busybox sh
To generate an initramfs from the current directory, I used:
% find . | cpio -o -H newc | pigz > /tmp/initrd
In distri’s Makefile, I append these flags to the QEMU invocation:
-kernel /tmp/kernel \
-initrd /tmp/initrd \
-append "root=/dev/mapper/cryptroot1 rdinit=/sh ro console=ttyS0,115200 rd.luks=1 rd.luks.uuid=63051f8a-54b9-4996-b94f-3cf105af2900 rd.luks.name=63051f8a-54b9-4996-b94f-3cf105af2900=cryptroot1 rd.vconsole.keymap=neo rd.vconsole.font=latarcyrheb-sun32 init=/init systemd.setenv=PATH=/bin rw vga=836"
The vga= mode parameter is required for loading the font latarcyrheb-sun32.
Once in the busybox shell, I manually prepared the required mount points and kernel modules:
ln -s sh mount
ln -s sh lsmod
mkdir /proc /sys /run /mnt
mount -t proc proc /proc
mount -t sysfs sys /sys
mount -t devtmpfs dev /dev
modprobe virtio_pci
modprobe virtio_scsi
As a next step, I copied cryptsetup and its dependencies into the initramfs directory:
% for f in /ro/cryptsetup-amd64-2.0.4-6/lib/*; do full=$(readlink -f $f); rel=$(echo $full | sed 's,^/,,g'); mkdir -p $(dirname $rel); install $full $rel; done
% ln -s ld-2.27.so ro/glibc-amd64-2.27-3/out/lib/ld-linux-x86-64.so.2
% cp /ro/glibc-amd64-2.27-3/out/lib/ld-2.27.so ro/glibc-amd64-2.27-3/out/lib/ld-2.27.so
% cp -r /ro/cryptsetup-amd64-2.0.4-6/lib ro/cryptsetup-amd64-2.0.4-6/
% mkdir -p ro/gcc-libs-amd64-8.2.0-3/out/lib64/
% cp /ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1 ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1
% ln -s /ro/gcc-libs-amd64-8.2.0-3/out/lib64/libgcc_s.so.1 ro/cryptsetup-amd64-2.0.4-6/lib
% cp -r /ro/lvm2-amd64-2.03.00-6/lib ro/lvm2-amd64-2.03.00-6/
In busybox, I used the following commands to unlock the root file system:
modprobe algif_skcipher
./cryptsetup luksOpen /dev/sda4 cryptroot1
mount /dev/dm-0 /mnt
See the Conclusion for a summary if you’re impatient :-)
Over the last few months, I have been developing a new index format for Debian Code Search. This required a lot of careful refactoring, re-implementation, debug tool creation and debugging.
Multiple factors motivated my work on a new index format:
The existing index format has a 2G size limit, into which we have bumped a few times, requiring manual intervention to keep the system running.
Debugging the existing system required creating ad-hoc debugging tools, which made debugging sessions unnecessarily lengthy and painful.
I wanted to check whether switching to a different integer compression format would improve performance (it does not).
I wanted to check whether storing positions with the posting lists would improve performance of identifier queries (= queries which are not using any regular expression features), which make up 78.2% of all Debian Code Search queries (it does).
I figured building a new index from scratch was the easiest approach, compared to refactoring the existing index to increase the size limit (point ①).
I also figured it would be a good idea to develop the debugging tool in lock step with the index format so that I can be sure the tool works and is useful (point ②).
As a quick refresher, search engines typically store document IDs (representing source code files, in our case) in an ordered list (“posting list”). It usually makes sense to apply at least a rudimentary level of compression: our existing system used variable integer encoding.
TurboPFor, the self-proclaimed “Fastest Integer Compression” library, combines an advanced on-disk format with a carefully tuned SIMD implementation to reach better speeds (in micro benchmarks) at less disk usage than Russ Cox’s varint implementation in github.com/google/codesearch.
If you are curious about its inner workings, check out my “TurboPFor: an analysis”.
Applied on the Debian Code Search index, TurboPFor indeed compresses integers better:
Switching to TurboPFor (via cgo) for storing and reading the index results in a slight speed-up of a dcs replay benchmark, which is more pronounced the more i/o is required.
Overall, TurboPFor is an all-around improvement in efficiency, albeit with a high cost in implementation complexity.
This section builds on the previous section: all figures come from the TurboPFor index, which can optionally support positions.
Conceptually, we’re going from:
type docid uint32
type index map[trigram][]docid
…to:
type occurrence struct {
	doc docid
	pos uint32 // byte offset in doc
}
type index map[trigram][]occurrence
The resulting index consumes more disk space, but can be queried faster:
We can do fewer queries: instead of reading all the posting lists for all the trigrams, we can read the posting lists for the query’s first and last trigram only.
This is one of the tricks described in the paper “AS-Index: A Structure For String Search Using n-grams and Algebraic Signatures” (PDF), and goes a long way without incurring the complexity, computational cost and additional disk usage of calculating algebraic signatures.
Verifying that the delta between the last and first position matches the length of the query term significantly reduces the number of files to read (lower false positive rate); see the sketch after this list.
The matching phase is quicker: instead of locating the query term in the file, we only need to compare a few bytes at a known offset for equality.
More data is read sequentially (from the index), which is faster.
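To make the position check concrete, here is a hedged Go sketch of the verification step for an identifier query, reusing the occurrence type from above (the posting-list decoding is omitted):

```go
// candidates returns the occurrences where the query’s first and last
// trigram appear in the same document at the right distance from one
// another. first and last are the decoded posting lists (with
// positions) for the query’s first and last trigram.
func candidates(query string, first, last []occurrence) []occurrence {
	// The last trigram starts len(query)-3 bytes after the first one.
	want := uint32(len(query) - 3)
	byKey := make(map[occurrence]bool, len(last))
	for _, o := range last {
		byKey[o] = true
	}
	var out []occurrence
	for _, o := range first {
		if byKey[occurrence{doc: o.doc, pos: o.pos + want}] {
			out = append(out, o)
		}
	}
	return out
}
```

Only the files listed in the result still need to be checked, and only at known byte offsets.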
A positional index consumes significantly more disk space, but not so much as to pose a challenge: a Hetzner EX61-NVME dedicated server (≈ 64 €/month) provides 1 TB worth of fast NVMe flash storage.
The idea behind the positional index (posrel) is to not store a (doc,pos) tuple on disk, but to store positions, accompanied by a stream of doc/pos relationship bits: 1 means this position belongs to the next document, 0 means this position belongs to the current document.

This is an easy way of saving some space without modifying the TurboPFor on-disk format: the posrel technique reduces the index size to about ¾.
With the increase in size, the Linux page cache hit ratio will be lower for the positional index, i.e. more data will need to be fetched from disk for querying the index.
As long as the disk can deliver data as fast as you can decompress posting lists, this only translates into one disk seek’s worth of additional latency. This is the case with modern NVMe disks that deliver thousands of MB/s, e.g. the Samsung 960 Pro (used in Hetzner’s aforementioned EX61-NVME server).
The values were measured by running dcs du -h /srv/dcs/shard*/full without and with the -pos argument.
A positional index requires fewer queries: reading only the first and last trigram’s posting lists and positions is sufficient to achieve a lower (!) false positive rate than evaluating all trigrams’ posting lists in a non-positional index.
As a consequence, fewer files need to be read, resulting in fewer bytes required to read from disk overall.
As an additional bonus, in a positional index, more data is read sequentially (index), which is faster than random i/o, regardless of the underlying disk.
The values were measured by running iostat -d 25 just before running bench.zsh on an otherwise idle system.
Even though the positional index is larger and requires more data to be read at query time (see above), thanks to the C TurboPFor library, the 2 queries on a positional index are roughly as fast as the n queries on a non-positional index (≈4s instead of ≈3s).
This is more than made up for by the combined i/o matching stage, which shrinks from ≈18.5s (7.1s i/o + 11.4s matching) to ≈1.3s.
Note that identifier query i/o was sped up not just by needing to read fewer bytes, but also by only having to verify bytes at a known offset instead of needing to locate the identifier within the file.
The new index format is overall slightly more efficient. This disk space efficiency allows us to introduce a positional index section for the first time.
Most Debian Code Search queries are positional queries (78.2%) and will be answered much quicker by leveraging the positions.
Bottom line: it is beneficial to use a positional index on disk over a non-positional index in RAM.
This article focuses on the package format and its advantages, but there is more to distri, which I will cover in upcoming blog posts.
I was a Debian Developer for the 7 years from 2012 to 2019, but using the distribution often left me frustrated, ultimately resulting in me winding down my Debian work.
Frequently, I was noticing a large gap between the actual speed of an operation (e.g. doing an update) and the possible speed based on back of the envelope calculations. I wrote more about this in my blog post “Package managers are slow”.
To me, this observation means that either there is potential to optimize the package manager itself (e.g. apt), or what the system does is just too complex. While I remember seeing some low-hanging fruit¹, through my work on distri, I wanted to explore whether all the complexity we currently have in Linux distributions such as Debian or Fedora is inherent to the problem space.
I have completed enough of the experiment to conclude that the complexity is not inherent: I can build a Linux distribution for general-enough purposes which is much less complex than existing ones.
① Those were low-hanging fruit from a user perspective. I’m not saying that fixing them is easy in the technical sense; I know too little about apt’s code base to make such a statement.
One key idea is to switch from using archives to using images for package contents. Common package managers such as dpkg(1) use tar(1) archives with various compression algorithms.
distri uses SquashFS images, a comparatively simple file system image format that I happen to be familiar with from my work on the gokrazy Raspberry Pi 3 Go platform.
This idea is not novel: AppImage and snappy also use images, but only for individual, self-contained applications. distri however uses images for distribution packages with dependencies. In particular, there is no duplication of shared libraries in distri.
A nice side effect of using read-only image files is that applications are immutable and can hence not be broken by accidental (or malicious!) modification.
Package contents are made available under a fully qualified path. E.g., all files provided by package zsh-amd64-5.6.2-3 are available under /ro/zsh-amd64-5.6.2-3. The mountpoint /ro stands for read-only, which is short yet descriptive.
Perhaps surprisingly, building software with custom prefix values of e.g. /ro/zsh-amd64-5.6.2-3 is widely supported, thanks to:

- Linux distributions, which build software with prefix set to /usr, whereas FreeBSD (and the autotools default) builds with prefix set to /usr/local.
- Enthusiast users in corporate or research environments, who install software into their home directories.
Because using a custom prefix is a common scenario, upstream awareness for prefix-correctness is generally high, and the rarely required patch will be quickly accepted.
Software packages often exchange data by placing or locating files in well-known directories. Here are just a few examples:
- gcc(1) locates the libusb(3) headers via /usr/include.
- man(1) locates the nginx(1) manpage via /usr/share/man.
- zsh(1) locates executable programs via PATH components such as /bin.
In distri, these locations are called exchange directories and are provided via FUSE in /ro.
Exchange directories come in two different flavors:
- global. The exchange directory, e.g. /ro/share, provides the union of the share subdirectory of all packages in the package store. Global exchange directories are largely used for compatibility; see below.
- per-package. Useful for tight coupling: e.g. irssi(1) does not provide any ABI guarantees, so plugins such as irssi-robustirc can declare that they want e.g. /ro/irssi-amd64-1.1.1-1/out/lib/irssi/modules to be a per-package exchange directory and contain files from their lib/irssi/modules.
Programs which use exchange directories sometimes use search paths to access multiple exchange directories. In fact, the examples above were taken from gcc(1)’s INCLUDEPATH, man(1)’s MANPATH and zsh(1)’s PATH. These are prominent ones, but more examples are easy to find: zsh(1) loads completion functions from its FPATH.
Some search path values are derived from --datadir=/ro/share and require no further attention, but others might derive from e.g. --prefix=/ro/zsh-amd64-5.6.2-3/out and need to be pointed to an exchange directory via a specific command line flag.
Global exchange directories are used to make distri provide enough of the Filesystem Hierarchy Standard (FHS) that third-party software largely just works. This includes a C development environment.
I successfully ran a few programs from their binary packages such as Google Chrome, Spotify, or Microsoft’s Visual Studio Code.
I previously wrote about how Linux distribution package managers are too slow.
distri’s package manager is extremely fast. Its main bottleneck is typically the network link, even on high-speed links (I tested with a 100 Gbps link).
Its speed comes largely from an architecture which allows the package manager to do less work. Specifically:
Package images can be added atomically to the package store, so we can safely skip fsync(2). Corruption will be cleaned up automatically, and durability is not important: if an interactive installation is interrupted, the user can just repeat it, as it will be fresh on their mind. (A sketch of this atomic, fsync-free installation follows below.)
Because all packages are co-installable thanks to separate hierarchies, there are no conflicts at the package store level, and no dependency resolution (an optimization problem requiring SAT solving) is required at all.

In exchange directories, we resolve conflicts by selecting the package with the highest monotonically increasing distri revision number.
distri proves that we can build a useful Linux distribution entirely without hooks and triggers. Not having to serialize hook execution allows us to download packages into the package store with maximum concurrency.
Because we are using images instead of archives, we do not need to unpack anything. This means installing a package is really just writing its package image and metadata to the package store. Sequential writes are typically the fastest kind of storage usage pattern.
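Here is a hedged Go sketch of what atomic, fsync-free installation into the package store can look like (rename(2) within one file system is atomic; paths are illustrative):

```go
package main

import (
	"io"
	"os"
	"path/filepath"
	"strings"
)

// installImage atomically adds a package image to the package store.
// A crash can leave a stray *.tmp file, but never a half-visible
// package; stray files can be cleaned up on the next run.
func installImage(store, pkg string, r io.Reader) error {
	tmp, err := os.CreateTemp(store, pkg+".*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op after a successful rename
	if _, err := io.Copy(tmp, r); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	// No fsync: durability is not important here (see above), but the
	// rename makes the package visible only once it is complete.
	return os.Rename(tmp.Name(), filepath.Join(store, pkg+".squashfs"))
}

func main() {
	if err := installImage("/roimg", "zsh-amd64-5.6.2-3", strings.NewReader("…image bytes…")); err != nil {
		panic(err)
	}
}
```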
Fast installation also makes other use-cases more bearable, such as creating disk images, be it for testing them in qemu(1), booting them on real hardware from a USB drive, or for cloud providers such as Google Cloud.
Contrary to how distribution package builders are usually implemented, the distri package builder does not actually install any packages into the build environment.
Instead, distri makes available a filtered view of the package store (only declared dependencies are available) at /ro in the build environment.
This means that even for large dependency trees, setting up a build environment happens in a fraction of a second! Such a low latency really makes a difference in how comfortable it is to iterate on distribution packages.
In distri, package images are installed from a remote package store into the local system package store /roimg, which backs the /ro mount.
A package store is implemented as a directory of package images and their associated metadata files.
You can easily make a package store available by using distri export.
.
To provide a mirror for your local network, you can periodically run distri update from the package store you want to mirror, and then distri export your local copy. Special tooling (e.g. debmirror in Debian) is not required because distri install is atomic (and update uses install).
Producing derivatives is easy: just add your own packages to a copy of the package store.
The package store is intentionally kept simple to manage and distribute. Its files could be exchanged via peer-to-peer file systems, or synchronized from an offline medium.
distri works well enough to demonstrate the ideas explained above. I have branched this state into branch jackherer, distri’s first release code name. This way, I can keep experimenting in the distri repository without breaking your installation.
From the branch contents, our autobuilder creates:
- a package repository. Installations can pick up new packages with distri update.
The project website can be found at https://distr1.org. The website is just the README for now, but we can improve that later.
The repository can be found at https://github.com/distr1/distri
Right now, distri is mainly a vehicle for my spare-time Linux distribution research. I don’t recommend anyone use distri for anything but research, and there are no medium-term plans of that changing. At the very least, please contact me before basing anything serious on distri so that we can talk about limitations and expectations.
I expect the distri project to live for as long as I have blog posts to publish, and we’ll see what happens afterwards. Note that this is a hobby for me: I will continue to explore, at my own pace, parts that I find interesting.
My hope is that established distributions might get a useful idea or two from distri.
I don’t want to make this post too long, but there is much more!
Please subscribe to the following URL in your feed reader to get all posts about distri:
https://michael.stapelberg.ch/posts/tags/distri/feed.xml
Next in my queue are articles about hermetic packages and good package maintainer experience (including declarative packaging).
I’d love to discuss these ideas in case you’re interested!
Please send feedback to the distri mailing list so that everyone can participate!
Pending feedback: Allan McRae pointed out that I should be more precise with my terminology: strictly speaking, distributions are slow, and package managers are only part of the puzzle. I’ll try to be clearer in future revisions/posts.
I measured how long the package managers of the most popular Linux distributions take to install small and large packages (the ack(1p) source code search Perl script and qemu, respectively).
Where required, my measurements include metadata updates such as transferring an up-to-date package list. For me, requiring a metadata update is the more common case, particularly on live systems or within Docker containers.
All measurements were taken on an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
running Docker 20.10.8 on Linux 5.13.10, backed by a Corsair Force MP600 NVMe
drive boasting many hundreds of MB/s write performance. The machine is located
in Zürich and connected to the Internet with a 1 Gigabit fiber connection, so
the expected top download speed is ≈115 MB/s.
See Appendix D for details on the measurement method and command outputs.
Keep in mind that these are one-time measurements. They should be indicative of actual performance, but your experience may vary.
Small package (ack):

| distribution | package manager | data | wall-clock time | rate |
|---|---|---|---|---|
| Fedora | dnf | 84 MB | 25s | 3.4 MB/s |
| NixOS | Nix | 15 MB | 7s | 2.3 MB/s |
| Debian | apt | 16 MB | 3s | 4.9 MB/s |
| Arch Linux | pacman | 25 MB | 1s | 18.4 MB/s |
| Alpine | apk | 10 MB | 1s | 11.9 MB/s |
Large package (qemu):

| distribution | package manager | data | wall-clock time | rate |
|---|---|---|---|---|
| Fedora | dnf | 350 MB | 56s | 6.25 MB/s |
| Debian | apt | 256 MB | 39s | 6.5 MB/s |
| NixOS | Nix | 251 MB | 36s | 6.8 MB/s |
| Arch Linux | pacman | 128 MB | 10s | 12.1 MB/s |
| Alpine | apk | 34 MB | 1.8s | 18.6 MB/s |
(Looking for older measurements? See Appendix B (2019) or Appendix C (2020)).
The difference between the slowest and fastest package managers is 30x!
How can Alpine’s apk and Arch Linux’s pacman be an order of magnitude faster than the rest? They are doing a lot less than the others, and more efficiently, too.
For example, Fedora transfers a lot more data than others because its main package list alone is 60 MB (compressed!). Compare that with Alpine’s 734 KB APKINDEX.tar.gz.
Of course the extra metadata which Fedora provides helps some use cases; otherwise they hopefully would have removed it altogether. But the amount of metadata seems excessive for the use case of installing a single package, which I consider the main use case of an interactive package manager.
I expect any modern Linux distribution to only transfer absolutely required data to complete my task.
Because they need to sequence executing arbitrary package maintainer-provided code (hooks and triggers), all tested package managers need to install packages sequentially (one after the other) instead of concurrently (all at the same time).
In my blog post “Can we do without hooks and triggers?”, I outline that hooks and triggers are not strictly necessary to build a working Linux distribution.
Strictly speaking, the only required feature of a package manager is to make available the package contents so that the package can be used: a program can be started, a kernel module can be loaded, etc.
By only implementing what’s needed for this feature, and nothing more, a package manager could likely beat apk’s performance. It could, for example, use compression which is light on CPU, as networks are fast (like apk does).

Here’s a table outlining how the various package managers listed on Wikipedia’s list of software package management systems fare:
| name | scope | package file format | hooks/triggers |
|---|---|---|---|
| AppImage | apps | image: ISO9660, SquashFS | no |
| snappy | apps | image: SquashFS | yes: hooks |
| FlatPak | apps | archive: OSTree | no |
| 0install | apps | archive: tar.bz2 | no |
| nix, guix | distro | archive: nar.{bz2,xz} | activation script |
| dpkg | distro | archive: tar.{gz,xz,bz2} in ar(1) | yes |
| rpm | distro | archive: cpio.{bz2,lz,xz} | scriptlets |
| pacman | distro | archive: tar.xz | install |
| slackware | distro | archive: tar.{gz,xz} | yes: doinst.sh |
| apk | distro | archive: tar.gz | yes: .post-install |
| Entropy | distro | archive: tar.bz2 | yes |
| ipkg, opkg | distro | archive: tar{,.gz} | yes |
As per the current landscape, there is no distribution-scoped package manager which uses images and leaves out hooks and triggers, not even in smaller Linux distributions.
I think that space is really interesting, as it uses a minimal design to achieve significant real-world speed-ups.
I have explored this idea in much more detail, and am happy to talk more about it in my post distri: a Linux distribution to research fast package management.
There are a couple of recent developments going into the same direction:
Here are the command outputs for installing the small package (ack) on each distribution:
% docker run --security-opt=seccomp:unconfined -t -i fedora /bin/bash
[root@62d3cae2e2f9 /]# time dnf install -y ack
Fedora 35 - x86_64 25 MB/s | 61 MB
Fedora 35 openh264 (From Cisco) - x86_64 3.5 kB/s | 2.5 kB
Fedora Modular 35 - x86_64 5.0 MB/s | 2.6 MB
Fedora 35 - x86_64 - Updates 6.0 MB/s | 9.3 MB
Fedora Modular 35 - x86_64 - Updates 4.1 MB/s | 3.3 MB
Dependencies resolved.
[…]
real 0m24.882s
user 0m17.377s
sys 0m0.835s
% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -iA nixpkgs.ack'
unpacking channels...
created 1 symlinks in user environment
installing 'perl5.34.0-ack-3.5.0'
these paths will be fetched (15.78 MiB download, 86.82 MiB unpacked):
/nix/store/11xpmmwy95396nkhih3qc3814lqhqb8f-libunistring-0.9.10
/nix/store/1h18nl3gisw89znbzbmnxhd7jk20xlff-perl5.34.0-File-Next-1.18
/nix/store/1mpxs3109cjrbhmi3q1vmvc0djz102pl-libidn2-2.3.2
/nix/store/jr35z7n8jbv9q89my50vhyndqd3y541i-attr-2.5.1
/nix/store/krc4xirbvjnff8m62snqdbayg46z5l5b-acl-2.3.1
/nix/store/mij848h2x5wiqkwhg027byvmf9x3gx7y-glibc-2.33-50
/nix/store/wq38iqzdh40dzfsndb927kh7y5bqh457-perl5.34.0-ack-3.5.0-man
/nix/store/xyn0240zrpprnspg3n0fi8c8aw5bq0mr-coreutils-8.32
/nix/store/y8r9ymbz59yjm1bwr3fdvd23jvcb2bzj-perl5.34.0-ack-3.5.0
/nix/store/ypr273yvmr07n5n1w1gbcqnhpw7lbbvz-perl-5.34.0
copying path '/nix/store/wq38iqzdh40dzfsndb927kh7y5bqh457-perl5.34.0-ack-3.5.0-man' from 'https://cache.nixos.org'...
copying path '/nix/store/11xpmmwy95396nkhih3qc3814lqhqb8f-libunistring-0.9.10' from 'https://cache.nixos.org'...
copying path '/nix/store/1h18nl3gisw89znbzbmnxhd7jk20xlff-perl5.34.0-File-Next-1.18' from 'https://cache.nixos.org'...
copying path '/nix/store/1mpxs3109cjrbhmi3q1vmvc0djz102pl-libidn2-2.3.2' from 'https://cache.nixos.org'...
copying path '/nix/store/mij848h2x5wiqkwhg027byvmf9x3gx7y-glibc-2.33-50' from 'https://cache.nixos.org'...
copying path '/nix/store/jr35z7n8jbv9q89my50vhyndqd3y541i-attr-2.5.1' from 'https://cache.nixos.org'...
copying path '/nix/store/krc4xirbvjnff8m62snqdbayg46z5l5b-acl-2.3.1' from 'https://cache.nixos.org'...
copying path '/nix/store/xyn0240zrpprnspg3n0fi8c8aw5bq0mr-coreutils-8.32' from 'https://cache.nixos.org'...
copying path '/nix/store/ypr273yvmr07n5n1w1gbcqnhpw7lbbvz-perl-5.34.0' from 'https://cache.nixos.org'...
copying path '/nix/store/y8r9ymbz59yjm1bwr3fdvd23jvcb2bzj-perl5.34.0-ack-3.5.0' from 'https://cache.nixos.org'...
building '/nix/store/pwlxhy7kry56z6593rh397fc49x5avlw-user-environment.drv'...
created 49 symlinks in user environment
real 0m 6.82s
user 0m 3.47s
sys 0m 2.11s
% docker run -t -i debian:sid
root@40a3899b1f2f:/# time (apt update && apt install -y ack-grep)
Get:1 http://deb.debian.org/debian sid InRelease [165 kB]
Get:2 http://deb.debian.org/debian sid/main amd64 Packages [8800 kB]
Fetched 8965 kB in 1s (9495 kB/s)
[…]
The following NEW packages will be installed:
ack libfile-next-perl libgdbm-compat4 libgdbm6 libperl5.32 netbase perl perl-modules-5.32
0 upgraded, 8 newly installed, 0 to remove and 24 not upgraded.
Need to get 7479 kB of archives.
After this operation, 47.7 MB of additional disk space will be used.
[…]
real 0m3.260s
user 0m2.463s
sys 0m0.352s
% docker run -t -i archlinux:base
[root@9f6672688a64 /]# time (pacman -Sy && pacman -S --noconfirm ack)
:: Synchronizing package databases...
core 138.8 KiB 1542 KiB/s
extra 1569.8 KiB 26.9 MiB/s
community 5.8 MiB 92.2 MiB/s
resolving dependencies...
looking for conflicting packages...
Packages (5) db-5.3.28-5 gdbm-1.22-1 perl-5.34.0-2 perl-file-next-1.18-3 ack-3.5.0-2
Total Download Size: 16.77 MiB
Total Installed Size: 66.21 MiB
[…]
real 0m1.403s
user 0m0.484s
sys 0m0.211s
% docker run -t -i alpine
# time apk add ack
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/community/x86_64/APKINDEX.tar.gz
(1/4) Installing libbz2 (1.0.8-r1)
(2/4) Installing perl (5.32.1-r0)
(3/4) Installing perl-file-next (1.18-r2)
(4/4) Installing ack (3.5.0-r1)
Executing busybox-1.33.1-r3.trigger
OK: 43 MiB in 18 packages
real 0m 0.76s
user 0m 0.27s
sys 0m 0.09s
Here are the command outputs for installing the large package (qemu) on each distribution:
% docker run -t -i fedora /bin/bash
[root@6a52ecfc3afa /]# time dnf install -y qemu
Fedora 35 - x86_64 15 MB/s | 61 MB
Fedora 35 openh264 (From Cisco) - x86_64 3.0 kB/s | 2.5 kB
Fedora Modular 35 - x86_64 5.2 MB/s | 2.6 MB
Fedora 35 - x86_64 - Updates 6.6 MB/s | 9.3 MB
Fedora Modular 35 - x86_64 - Updates 2.2 MB/s | 3.3 MB
Dependencies resolved.
[…]
Total download size: 274 M
Downloading Packages:
[…]
real 0m56.031s
user 0m31.275s
sys 0m3.868s
% docker run -t -i nixos/nix
83971cf79f7e:/# time sh -c 'nix-channel --update && nix-env -iA nixpkgs.qemu'
unpacking channels...
created 1 symlinks in user environment
installing 'qemu-6.1.0'
these paths will be fetched (230.72 MiB download, 1424.84 MiB unpacked):
[…]
real 0m 36.55s
user 0m 19.83s
sys 0m 3.34s
% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y qemu-system-x86)
Get:1 http://deb.debian.org/debian sid InRelease [146 kB]
Get:2 http://deb.debian.org/debian sid/main amd64 Packages [8400 kB]
Fetched 8965 kB in 1s (9048 kB/s)
[…]
Fetched 247 MB in 4s (64.9 MB/s)
[…]
real 0m38.875s
user 0m21.282s
sys 0m5.298s
% docker run -t -i archlinux:base
[root@58c78bda08e8 /]# time (pacman -Sy && pacman -S --noconfirm qemu)
:: Synchronizing package databases...
core 138.7 KiB 1541 KiB/s
extra 1569.8 KiB 35.7 MiB/s
community 5.8 MiB 92.2 MiB/s
[…]
Total Download Size: 118.97 MiB
Total Installed Size: 586.68 MiB
[…]
real 0m10.542s
user 0m3.092s
sys 0m1.569s
% docker run -t -i alpine
/ # time apk add qemu-system-x86_64
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/community/x86_64/APKINDEX.tar.gz
[…]
OK: 281 MiB in 66 packages
real 0m 1.83s
user 0m 0.77s
sys 0m 0.24s
You can expand each of these:
% docker run -t -i fedora /bin/bash
[root@62d3cae2e2f9 /]# time dnf install -y ack
Fedora 32 openh264 (From Cisco) - x86_64 1.9 kB/s | 2.5 kB 00:01
Fedora Modular 32 - x86_64 6.8 MB/s | 4.9 MB 00:00
Fedora Modular 32 - x86_64 - Updates 5.6 MB/s | 3.7 MB 00:00
Fedora 32 - x86_64 - Updates 9.9 MB/s | 23 MB 00:02
Fedora 32 - x86_64 39 MB/s | 70 MB 00:01
[…]
real 0m32.898s
user 0m25.121s
sys 0m1.408s
% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -iA nixpkgs.ack'
unpacking channels...
created 1 symlinks in user environment
installing 'perl5.32.0-ack-3.3.1'
these paths will be fetched (15.55 MiB download, 85.51 MiB unpacked):
/nix/store/34l8jdg76kmwl1nbbq84r2gka0kw6rc8-perl5.32.0-ack-3.3.1-man
/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31
/nix/store/9fd4pjaxpjyyxvvmxy43y392l7yvcwy1-perl5.32.0-File-Next-1.18
/nix/store/czc3c1apx55s37qx4vadqhn3fhikchxi-libunistring-0.9.10
/nix/store/dj6n505iqrk7srn96a27jfp3i0zgwa1l-acl-2.2.53
/nix/store/ifayp0kvijq0n4x0bv51iqrb0yzyz77g-perl-5.32.0
/nix/store/w9wc0d31p4z93cbgxijws03j5s2c4gyf-coreutils-8.31
/nix/store/xim9l8hym4iga6d4azam4m0k0p1nw2rm-libidn2-2.3.0
/nix/store/y7i47qjmf10i1ngpnsavv88zjagypycd-attr-2.4.48
/nix/store/z45mp61h51ksxz28gds5110rf3wmqpdc-perl5.32.0-ack-3.3.1
copying path '/nix/store/34l8jdg76kmwl1nbbq84r2gka0kw6rc8-perl5.32.0-ack-3.3.1-man' from 'https://cache.nixos.org'...
copying path '/nix/store/czc3c1apx55s37qx4vadqhn3fhikchxi-libunistring-0.9.10' from 'https://cache.nixos.org'...
copying path '/nix/store/9fd4pjaxpjyyxvvmxy43y392l7yvcwy1-perl5.32.0-File-Next-1.18' from 'https://cache.nixos.org'...
copying path '/nix/store/xim9l8hym4iga6d4azam4m0k0p1nw2rm-libidn2-2.3.0' from 'https://cache.nixos.org'...
copying path '/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31' from 'https://cache.nixos.org'...
copying path '/nix/store/y7i47qjmf10i1ngpnsavv88zjagypycd-attr-2.4.48' from 'https://cache.nixos.org'...
copying path '/nix/store/dj6n505iqrk7srn96a27jfp3i0zgwa1l-acl-2.2.53' from 'https://cache.nixos.org'...
copying path '/nix/store/w9wc0d31p4z93cbgxijws03j5s2c4gyf-coreutils-8.31' from 'https://cache.nixos.org'...
copying path '/nix/store/ifayp0kvijq0n4x0bv51iqrb0yzyz77g-perl-5.32.0' from 'https://cache.nixos.org'...
copying path '/nix/store/z45mp61h51ksxz28gds5110rf3wmqpdc-perl5.32.0-ack-3.3.1' from 'https://cache.nixos.org'...
building '/nix/store/m0rl62grplq7w7k3zqhlcz2hs99y332l-user-environment.drv'...
created 49 symlinks in user environment
real 0m 5.60s
user 0m 3.21s
sys 0m 1.66s
% docker run -t -i debian:sid
root@1996bb94a2d1:/# time (apt update && apt install -y ack-grep)
Get:1 http://deb.debian.org/debian sid InRelease [146 kB]
Get:2 http://deb.debian.org/debian sid/main amd64 Packages [8400 kB]
Fetched 8546 kB in 1s (8088 kB/s)
[…]
The following NEW packages will be installed:
ack libfile-next-perl libgdbm-compat4 libgdbm6 libperl5.30 netbase perl perl-modules-5.30
0 upgraded, 8 newly installed, 0 to remove and 23 not upgraded.
Need to get 7341 kB of archives.
After this operation, 46.7 MB of additional disk space will be used.
[…]
real 0m9.544s
user 0m2.839s
sys 0m0.775s
% docker run -t -i archlinux/base
[root@9f6672688a64 /]# time (pacman -Sy && pacman -S --noconfirm ack)
:: Synchronizing package databases...
core 130.8 KiB 1090 KiB/s 00:00
extra 1655.8 KiB 3.48 MiB/s 00:00
community 5.2 MiB 6.11 MiB/s 00:01
resolving dependencies...
looking for conflicting packages...
Packages (2) perl-file-next-1.18-2 ack-3.4.0-1
Total Download Size: 0.07 MiB
Total Installed Size: 0.19 MiB
[…]
real 0m2.936s
user 0m0.375s
sys 0m0.160s
% docker run -t -i alpine
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
(1/4) Installing libbz2 (1.0.8-r1)
(2/4) Installing perl (5.30.3-r0)
(3/4) Installing perl-file-next (1.18-r0)
(4/4) Installing ack (3.3.1-r0)
Executing busybox-1.31.1-r16.trigger
OK: 43 MiB in 18 packages
real 0m 1.24s
user 0m 0.40s
sys 0m 0.15s
You can expand each of these:
% docker run -t -i fedora /bin/bash
[root@6a52ecfc3afa /]# time dnf install -y qemu
Fedora 32 openh264 (From Cisco) - x86_64 3.1 kB/s | 2.5 kB 00:00
Fedora Modular 32 - x86_64 6.3 MB/s | 4.9 MB 00:00
Fedora Modular 32 - x86_64 - Updates 6.0 MB/s | 3.7 MB 00:00
Fedora 32 - x86_64 - Updates 334 kB/s | 23 MB 01:10
Fedora 32 - x86_64 33 MB/s | 70 MB 00:02
[…]
Total download size: 181 M
Downloading Packages:
[…]
real 4m37.652s
user 0m38.239s
sys 0m6.321s
% docker run -t -i nixos/nix
83971cf79f7e:/# time sh -c 'nix-channel --update && nix-env -iA nixpkgs.qemu'
unpacking channels...
created 1 symlinks in user environment
installing 'qemu-5.1.0'
these paths will be fetched (180.70 MiB download, 1146.92 MiB unpacked):
[…]
real 0m 33.64s
user 0m 16.96s
sys 0m 3.05s
% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y qemu-system-x86)
Get:1 http://deb.debian.org/debian sid InRelease [146 kB]
Get:2 http://deb.debian.org/debian sid/main amd64 Packages [8400 kB]
Fetched 8546 kB in 1s (5998 kB/s)
[…]
Fetched 216 MB in 43s (5006 kB/s)
[…]
real 1m25.375s
user 0m29.163s
sys 0m12.835s
% docker run -t -i archlinux/base
[root@58c78bda08e8 /]# time (pacman -Sy && pacman -S --noconfirm qemu)
:: Synchronizing package databases...
core 130.8 KiB 1055 KiB/s 00:00
extra 1655.8 KiB 3.70 MiB/s 00:00
community 5.2 MiB 7.89 MiB/s 00:01
[…]
Total Download Size: 135.46 MiB
Total Installed Size: 661.05 MiB
[…]
real 0m43.901s
user 0m4.980s
sys 0m2.615s
% docker run -t -i alpine
/ # time apk add qemu-system-x86_64
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
[…]
OK: 78 MiB in 95 packages
real 0m 2.43s
user 0m 0.46s
sys 0m 0.09s
You can expand each of these:
% docker run -t -i fedora /bin/bash
[root@722e6df10258 /]# time dnf install -y ack
Fedora Modular 30 - x86_64 4.4 MB/s | 2.7 MB 00:00
Fedora Modular 30 - x86_64 - Updates 3.7 MB/s | 2.4 MB 00:00
Fedora 30 - x86_64 - Updates 17 MB/s | 19 MB 00:01
Fedora 30 - x86_64 31 MB/s | 70 MB 00:02
[…]
Install 44 Packages
Total download size: 13 M
Installed size: 42 M
[…]
real 0m29.498s
user 0m22.954s
sys 0m1.085s
% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -i perl5.28.2-ack-2.28'
unpacking channels...
created 2 symlinks in user environment
installing 'perl5.28.2-ack-2.28'
these paths will be fetched (14.91 MiB download, 80.83 MiB unpacked):
/nix/store/57iv2vch31v8plcjrk97lcw1zbwb2n9r-perl-5.28.2
/nix/store/89gi8cbp8l5sf0m8pgynp2mh1c6pk1gk-attr-2.4.48
/nix/store/gkrpl3k6s43fkg71n0269yq3p1f0al88-perl5.28.2-ack-2.28-man
/nix/store/iykxb0bmfjmi7s53kfg6pjbfpd8jmza6-glibc-2.27
/nix/store/k8lhqzpaaymshchz8ky3z4653h4kln9d-coreutils-8.31
/nix/store/svgkibi7105pm151prywndsgvmc4qvzs-acl-2.2.53
/nix/store/x4knf14z1p0ci72gl314i7vza93iy7yc-perl5.28.2-File-Next-1.16
/nix/store/zfj7ria2kwqzqj9dh91kj9kwsynxdfk0-perl5.28.2-ack-2.28
copying path '/nix/store/gkrpl3k6s43fkg71n0269yq3p1f0al88-perl5.28.2-ack-2.28-man' from 'https://cache.nixos.org'...
copying path '/nix/store/iykxb0bmfjmi7s53kfg6pjbfpd8jmza6-glibc-2.27' from 'https://cache.nixos.org'...
copying path '/nix/store/x4knf14z1p0ci72gl314i7vza93iy7yc-perl5.28.2-File-Next-1.16' from 'https://cache.nixos.org'...
copying path '/nix/store/89gi8cbp8l5sf0m8pgynp2mh1c6pk1gk-attr-2.4.48' from 'https://cache.nixos.org'...
copying path '/nix/store/svgkibi7105pm151prywndsgvmc4qvzs-acl-2.2.53' from 'https://cache.nixos.org'...
copying path '/nix/store/k8lhqzpaaymshchz8ky3z4653h4kln9d-coreutils-8.31' from 'https://cache.nixos.org'...
copying path '/nix/store/57iv2vch31v8plcjrk97lcw1zbwb2n9r-perl-5.28.2' from 'https://cache.nixos.org'...
copying path '/nix/store/zfj7ria2kwqzqj9dh91kj9kwsynxdfk0-perl5.28.2-ack-2.28' from 'https://cache.nixos.org'...
building '/nix/store/q3243sjg91x1m8ipl0sj5gjzpnbgxrqw-user-environment.drv'...
created 56 symlinks in user environment
real 0m 14.02s
user 0m 8.83s
sys 0m 2.69s
% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y ack-grep)
Get:1 http://cdn-fastly.deb.debian.org/debian sid InRelease [233 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian sid/main amd64 Packages [8270 kB]
Fetched 8502 kB in 2s (4764 kB/s)
[…]
The following NEW packages will be installed:
ack ack-grep libfile-next-perl libgdbm-compat4 libgdbm5 libperl5.26 netbase perl perl-modules-5.26
The following packages will be upgraded:
perl-base
1 upgraded, 9 newly installed, 0 to remove and 60 not upgraded.
Need to get 8238 kB of archives.
After this operation, 42.3 MB of additional disk space will be used.
[…]
real 0m9.096s
user 0m2.616s
sys 0m0.441s
% docker run -t -i archlinux/base
[root@9604e4ae2367 /]# time (pacman -Sy && pacman -S --noconfirm ack)
:: Synchronizing package databases...
core 132.2 KiB 1033K/s 00:00
extra 1629.6 KiB 2.95M/s 00:01
community 4.9 MiB 5.75M/s 00:01
[…]
Total Download Size: 0.07 MiB
Total Installed Size: 0.19 MiB
[…]
real 0m3.354s
user 0m0.224s
sys 0m0.049s
% docker run -t -i alpine
/ # time apk add ack
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
(1/4) Installing perl-file-next (1.16-r0)
(2/4) Installing libbz2 (1.0.6-r7)
(3/4) Installing perl (5.28.2-r1)
(4/4) Installing ack (3.0.0-r0)
Executing busybox-1.30.1-r2.trigger
OK: 44 MiB in 18 packages
real 0m 0.96s
user 0m 0.25s
sys 0m 0.07s
You can expand each of these:
% docker run -t -i fedora /bin/bash
[root@722e6df10258 /]# time dnf install -y qemu
Fedora Modular 30 - x86_64 3.1 MB/s | 2.7 MB 00:00
Fedora Modular 30 - x86_64 - Updates 2.7 MB/s | 2.4 MB 00:00
Fedora 30 - x86_64 - Updates 20 MB/s | 19 MB 00:00
Fedora 30 - x86_64 31 MB/s | 70 MB 00:02
[…]
Install 262 Packages
Upgrade 4 Packages
Total download size: 172 M
[…]
real 1m7.877s
user 0m44.237s
sys 0m3.258s
% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -i qemu-4.0.0'
unpacking channels...
created 2 symlinks in user environment
installing 'qemu-4.0.0'
these paths will be fetched (262.18 MiB download, 1364.54 MiB unpacked):
[…]
real 0m 38.49s
user 0m 26.52s
sys 0m 4.43s
% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y qemu-system-x86)
Get:1 http://cdn-fastly.deb.debian.org/debian sid InRelease [149 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian sid/main amd64 Packages [8426 kB]
Fetched 8574 kB in 1s (6716 kB/s)
[…]
Fetched 151 MB in 2s (64.6 MB/s)
[…]
real 0m51.583s
user 0m15.671s
sys 0m3.732s
% docker run -t -i archlinux/base
[root@9604e4ae2367 /]# time (pacman -Sy && pacman -S --noconfirm qemu)
:: Synchronizing package databases...
core 132.2 KiB 751K/s 00:00
extra 1629.6 KiB 3.04M/s 00:01
community 4.9 MiB 6.16M/s 00:01
[…]
Total Download Size: 123.20 MiB
Total Installed Size: 587.84 MiB
[…]
real 1m2.475s
user 0m9.272s
sys 0m2.458s
% docker run -t -i alpine
/ # time apk add qemu-system-x86_64
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
[…]
OK: 78 MiB in 95 packages
real 0m 2.43s
user 0m 0.46s
sys 0m 0.09s
[…] (e.g. postgres), or generating/updating cache files.
Triggers are a kind of hook which runs when other packages are installed. For example, on Debian, the man(1) package comes with a trigger which regenerates the search database index whenever any package installs a manpage. When, for example, the nginx(8) package is installed, a trigger provided by the man(1) package runs.
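For the curious: on Debian, such a trigger is declared in the owning package’s debian/triggers file. The declaration for the manpage case looks roughly like this (see deb-triggers(5); the exact contents may differ between versions):
interest-noawait /usr/share/man
Any package shipping files under /usr/share/man then activates the trigger, and dpkg runs the trigger-owning package’s maintainer script in trigger mode.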
Over the past few decades, Open Source software has become more and more uniform: instead of each piece of software defining its own rules, a small number of build systems are now widely adopted.
Hence, I think it makes sense to revisit whether offering extension via hooks and triggers is a net win or net loss.
Package managers can commonly make only very few assumptions about what hooks do, what preconditions they require, and which conflicts might be caused by running multiple packages’ hooks concurrently.
Hence, package managers cannot install packages concurrently: at least the hook/trigger part of the installation needs to happen in sequence.
While it seems technically feasible to retrofit package manager hooks with concurrency primitives such as locks for mutual exclusion between different hook processes, the required overhaul of all hooks¹ seems like such a daunting task that it might be better to just get rid of the hooks instead. Only deleting code frees you from the burden of maintenance, automated testing and debugging.
① In Debian, there are 8620 non-generated maintainer scripts, as reported by find shard*/src/*/debian -regex ".*\(pre\|post\)\(inst\|rm\)$" on a Debian Code Search instance.
Personally, I never use the apropos(1) command, so I don’t appreciate the man(1) package’s trigger which updates the database used by apropos(1). The process takes a long time and, because hooks and triggers must be executed serially (see previous section), blocks my installation or update.
When I tell people this, they are often surprised to learn about the existence of the apropos(1) command. I suggest adopting an opt-in model.
Hooks run when packages are installed. If a package’s contents are not used between two updates, running the hook in the first update could have been skipped. Running the hook lazily when the package contents are used reduces unnecessary work.
As a welcome side-effect, lazy hook evaluation automatically makes the hook work in operating system images, such as live USB thumb drives or SD card images for the Raspberry Pi. Such images must not ship the same crypto keys (e.g. OpenSSH host keys) to all machines, but instead generate a different key on each machine.
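As a sketch of what lazy evaluation could look like for the host key case: instead of an installation-time hook, the service could start via a small compiled wrapper that generates missing keys on first start. This is an illustration, not an existing Debian mechanism; the paths are standard OpenSSH paths, and ssh-keygen -A really does create all missing default host keys.
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Lazily do what would otherwise be an installation-time hook:
	// generate host keys the first time sshd actually starts.
	if _, err := os.Stat("/etc/ssh/ssh_host_ed25519_key"); os.IsNotExist(err) {
		// ssh-keygen -A creates all missing default host keys.
		if err := exec.Command("ssh-keygen", "-A").Run(); err != nil {
			os.Exit(1)
		}
	}
	// Replace this process with the real daemon.
	syscall.Exec("/usr/sbin/sshd", os.Args, os.Environ())
}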
Why do users keep packages installed that they don’t use? It’s extra work to remember and clean up those packages after use. Plus, users might not realize or value that having fewer packages installed has benefits such as faster updates.
I can also imagine that there are people for whom the cost of re-installing packages incentivizes them to just keep packages installed—you never know when you might need the program again…
While working on hermetic packages (more on that in another blog post), where the contained programs are started with modified environment variables (e.g. PATH) via a wrapper bash script, I noticed that the overhead of those wrapper bash scripts quickly becomes significant. For example, when using the excellent magit interface for Git in Emacs, I encountered second-long delays² when using hermetic packages compared to standard packages. Re-implementing wrappers in a compiled language provided a significant speed-up.
Similarly, getting rid of an extension point which mandates using shell scripts allows us to build an efficient and fast implementation of a predefined set of primitives, where you can reason about their effects and interactions.
② magit needs to run git a few times for displaying the full status, so small overhead quickly adds up.
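For illustration, a compiled wrapper doing the same job as such a bash script can be tiny. The paths below are made up; the point is that the environment setup and exec happen without spawning a shell:
package main

import (
	"os"
	"syscall"
)

func main() {
	// Same job as the wrapper bash script, but without a shell:
	// adjust the environment, then exec the real binary.
	os.Setenv("PATH", "/ro/git/bin:"+os.Getenv("PATH"))
	syscall.Exec("/ro/git/bin/git.real", os.Args, os.Environ())
}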
Hooks are an escape hatch for distribution maintainers to express anything which their packaging system cannot express.
Distributions should only rely on well-established interfaces such as autoconf’s classic ./configure && make && make install (including commonly used flags) to build a distribution package. Integrating upstream software into a distribution should not require custom hooks. For example, instead of requiring a hook which updates a cache of schema files, the library used to interact with those files should transparently (re-)generate the cache or fall back to a slower code path.
Distribution maintainers are hard to come by, so we should value their time. In particular, there is a 1:n relationship of packages to distribution package maintainers (software is typically available in multiple Linux distributions), so it makes sense to spend the work in the 1 and have the n benefit.
If we want to get rid of hooks, we need another mechanism to achieve what we currently achieve with hooks.
If the hook is not specific to the package, it can be moved to the package manager. The desired system state should either be derived from the package contents (e.g. required system users can be discovered from systemd service files) or declaratively specified in the package build instructions—more on that in another blog post. This turns hooks (arbitrary code) into configuration, which allows the package manager to collapse and sequence the required state changes. E.g., when 5 packages are installed which each need a new system user, the package manager could update /etc/passwd just once.
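As a sketch of that collapsing step (the metadata schema here is hypothetical), a package manager could gather all declared system users of one installation transaction and apply them in a single pass:
package main

import (
	"fmt"
	"sort"
)

// pkg models declaratively specified package metadata (hypothetical schema).
type pkg struct {
	name        string
	systemUsers []string
}

// collectUsers deduplicates the system users required by all packages of one
// installation transaction, so that /etc/passwd is updated a single time.
func collectUsers(pkgs []pkg) []string {
	seen := make(map[string]bool)
	var users []string
	for _, p := range pkgs {
		for _, u := range p.systemUsers {
			if !seen[u] {
				seen[u] = true
				users = append(users, u)
			}
		}
	}
	sort.Strings(users)
	return users
}

func main() {
	pkgs := []pkg{
		{name: "postgresql", systemUsers: []string{"postgres"}},
		{name: "nginx", systemUsers: []string{"www-data"}},
	}
	// One batched state change instead of one hook per package:
	fmt.Println(collectUsers(pkgs))
}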
If the hook is specific to the package, it should be moved into the package contents. This typically means moving the functionality into the program start (or the systemd service file if we are talking about a daemon). If (while?) upstream is not convinced, you can either wrap the program or patch it. Note that this case is relatively rare: I have worked with hundreds of packages and the only package-specific functionality I came across was automatically generating host keys before starting OpenSSH’s sshd(8)³.
There is one exception where moving the hook doesn’t work: packages which modify state outside of the system, such as bootloaders or kernel images.
③ Even that can be moved out of a package-specific hook, as Fedora demonstrates.
Global state modifications performed as part of package installation today use hooks, an overly expressive extension mechanism.
Instead, all modifications should be driven by configuration. This is feasible because there are only a few different kinds of desired state modifications. This makes it possible for package managers to optimize package installation.
]]>When building software from source, most programming languages and build systems support conditional compilation: different parts of the source code are compiled based on certain conditions.
An optional dependency is conditional compilation hooked up directly to a knob (e.g. command line flag, configuration file, …), with the effect that the software can now be built without an otherwise required dependency.
Let’s walk through a few issues with optional dependencies.
Software is usually not built by end users, but by packagers, at least when we are talking about Open Source.
Hence, end users don’t see the knob for the optional dependency, they are just presented with the fait accompli: their version of the software behaves differently than other versions of the same software.
Depending on the kind of software, this situation can be made obvious to the user: for example, if the optional dependency is needed to print documents, the program can produce an appropriate error message when the user tries to print a document.
Sometimes, this isn’t possible: when i3 introduced an optional dependency on cairo and pangocairo, the behavior itself (rendering window titles) worked in all configurations, but non-ASCII characters might break depending on whether i3 was compiled with cairo.
For users, it is frustrating to only discover in conversation that a program has a feature that the user is interested in, but it’s not available on their computer. For support, this situation can be hard to detect, and even harder to resolve to the user’s satisfaction.
Unfortunately, many build systems don’t stop the build when optional dependencies are not present. Instead, you sometimes end up with a broken build, or, even worse: with a successful build that does not work correctly at runtime.
This means that packagers need to closely examine the build output to know which dependencies to make available. In the best case, there is a summary of available and enabled options, clearly outlining what this build will contain. In the worst case, you need to infer the features from the checks that are done, or work your way through the --help output.
The better alternative is to configure your build system such that it stops when any dependency was not found, and thereby have packagers acknowledge each optional dependency by explicitly disabling the option.
Code paths which are not used will inevitably bit rot. If you have optional dependencies, you need to test both the code path without the dependency and the code path with the dependency. It doesn’t matter whether the tests are automated or manual, the test matrix must cover both paths.
Interestingly enough, this principle seems to apply to all kinds of software projects (but it slows down as change slows down): one might think that important Open Source building blocks should have enough users to cover all sorts of configurations.
However, consider this example: building cairo without libxrender results in all GTK application windows, menus, etc. being displayed as empty grey surfaces. Cairo does not fail to build without libxrender, but the code path clearly is broken without libxrender.
I’m not saying optional dependencies should never be used. In fact, for bootstrapping, disabling dependencies can save a lot of work and can sometimes allow breaking circular dependencies. For example, in an early bootstrapping stage, binutils can be compiled with --disable-nls to disable internationalization.
However, optional dependencies are broken so often that I conclude they are overused. Read on and see for yourself whether you would rather commit to best practices or not introduce an optional dependency.
If you do decide to make dependencies optional, please:
- make the build fail when an optional dependency is not found, so that packagers must explicitly acknowledge each one via a --disable flag.
- make it easy to see which optional features a build includes, e.g. in the output of --version.
]]>Debian has been in my life for well over 10 years at this point.
A few weeks ago, I visited some old friends at the Zürich Debian meetup after a multi-year period of absence. On my bike ride home, it occurred to me that the topics of our discussions had remarkable overlap with my last visit. We had a discussion about the merits of systemd, which took a detour to respect in open source communities, returned to processes in Debian and eventually culminated in democracies and their theoretical/practical failings. Admittedly, that last one might be a Swiss thing.
I say this not to knock on the Debian meetup, but because it prompted me to reflect on what feelings Debian evokes in me lately and whether it’s still a good fit for me.
So I’m finally making a decision that I should have made a long time ago: I am winding down my involvement in Debian to a minimum.
Over the coming weeks, I will:
- remove myself from the Uploaders field on packages with other maintainers

I will try to keep up best-effort maintenance of the manpages.debian.org service and the codesearch.debian.net service, but any help would be much appreciated.
For all intents and purposes, please treat me as permanently on vacation. I will try to be around for administrative issues (e.g. permission transfers) and questions addressed directly to me, provided they are easy enough to answer.
When I joined Debian, I was still studying, i.e. I had luxurious amounts of spare time. Now, over 5 years of full-time work later, my day job taught me a lot, both about what works in large software engineering projects and how I personally like my computer systems. I am very conscious of how I spend the little spare time that I have these days.
The following sections each deal with what I consider a major pain point, in no particular order. Some of them influence each other—for example, if changes worked better, we could have a chance at transitioning packages to be more easily machine readable.
Over the last few years, my current team at work conducted various smaller and larger refactorings across the entire code base (touching thousands of projects), so we have learnt a lot of valuable lessons about how to effectively do these changes. It irks me that Debian works almost the opposite way in every regard. I appreciate that every organization is different, but I think a lot of my points do actually apply to Debian.
In Debian, packages are nudged in the right direction by a document called the Debian Policy, or its programmatic embodiment, lintian.
While it is great to have a lint tool (for quick, local/offline feedback), it is even better to not require a lint tool at all. The team conducting the change (e.g. the C++ team introduces a new hardening flag for all packages) should be able to do their work transparent to me.
Instead, currently, all packages become lint-unclean, all maintainers need to read up on what the new thing is, how it might break, whether/how it affects them, manually run some tests, and finally decide to opt in. This causes a lot of overhead and manually executed mechanical changes across packages.
Notably, the cost of each change is distributed onto the package maintainers in the Debian model. At work, we have found that the opposite works better: if the team behind the change is put in power to do the change for as many users as possible, they can be significantly more efficient at it, which reduces the total cost and time a lot. Of course, exceptions (e.g. a large project abusing a language feature) should still be taken care of by the respective owners, but the important bit is that the default should be the other way around.
Debian is lacking tooling for large changes: it is hard to programmatically deal with packages and repositories (see the section below). The closest to “sending out a change for review” is to open a bug report with an attached patch. I thought the workflow for accepting a change from a bug report was too complicated and started mergebot, but only Guido ever signaled interest in the project.
Culturally, reviews and reactions are slow. There are no deadlines. I literally sometimes get emails notifying me that a patch I sent out a few years ago (!!) is now merged. This turns projects from a small number of weeks into many years, which is a huge demotivator for me.
Interestingly enough, you can see artifacts of the slow online activity manifest itself in the offline culture as well: I don’t want to be discussing systemd’s merits 10 years after I first heard about it.
Lastly, changes can easily be slowed down significantly by holdouts who refuse to collaborate. My canonical example for this is rsync, whose maintainer refused my patches to make the package use debhelper purely out of personal preference.
Granting so much personal freedom to individual maintainers prevents us as a project from raising the abstraction level for building Debian packages, which in turn makes tooling harder.
What would things look like in a better world?
To learn more about what successful large changes can look like, I recommend my colleague Hyrum Wright’s talk “Large-Scale Changes at Google: Lessons Learned From 5 Yrs of Mass Migrations”.
Debian generally seems to prefer decentralized approaches over centralized ones. For example, individual packages are maintained in separate repositories (as opposed to in one repository), each repository can use any SCM (git and svn are common ones) or no SCM at all, and each repository can be hosted on a different site. Of course, what you do in such a repository also varies subtly from team to team, and even within teams.
In practice, non-standard hosting options are used rarely enough to not justify their cost, but frequently enough to be a huge pain when trying to automate changes to packages. Instead of using GitLab’s API to create a merge request, you have to design an entirely different, more complex system, which deals with intermittently (or permanently!) unreachable repositories and abstracts away differences in patch delivery (bug reports, merge requests, pull requests, email, …).
Wildly diverging workflows are not just a temporary problem either. I participated in long discussions about different git workflows during DebConf 13, and gather that there were similar discussions in the meantime.
Personally, I cannot keep enough details of the different workflows in my head. Every time I touch a package that works differently than mine, it frustrates me immensely to re-learn aspects of my day-to-day.
After noticing workflow fragmentation in the Go packaging team (which I started), I tried fixing this with the workflow changes proposal, but did not succeed in implementing it. The lack of effective automation and slow pace of changes in the surrounding tooling despite my willingness to contribute time and energy killed any motivation I had.
When you want to make a package available in Debian, you upload GPG-signed files via anonymous FTP. There are several batch jobs (the queue daemon, unchecked, dinstall, possibly others) which run on fixed schedules (e.g. dinstall runs at 01:52 UTC, 07:52 UTC, 13:52 UTC and 19:52 UTC).
Depending on timing, I estimated that you might wait for over 7 hours (!!) before your package is actually installable.
What’s worse for me is that feedback to your upload is asynchronous. I like to do one thing, be done with it, move to the next thing. The current setup requires a many-minute wait and costly task switch for no good technical reason. You might think a few minutes aren’t a big deal, but when all the time I can spend on Debian per day is measured in minutes, this makes a huge difference in perceived productivity and fun.
The last communication I can find about speeding up this process is ganneff’s post from 2008.
What would things look like in a better world?
I dread interacting with the Debian bug tracker. debbugs is a piece of software (from 1994) which is only used by Debian and the GNU project these days.
Debbugs processes emails, which is to say it is asynchronous and cumbersome to deal with. Despite running on the fastest machines we have available in Debian (or so I was told when the subject last came up), its web interface loads very slowly.
Notably, the web interface at bugs.debian.org is read-only. Setting up a working email configuration for reportbug(1) or manually dealing with attachments is a rather big hurdle.
For reasons I don’t understand, every interaction with debbugs results in many different email threads.
Aside from the technical implementation, I also can never remember the different ways that Debian uses pseudo-packages for bugs and processes. I need them rarely enough that I never establish a mental model of how they are set up, or working memory of how they are used, but frequently enough to be annoyed by this.
What would things look like in a better world?
It baffles me that in 2019, we still don’t have a conveniently browsable threaded archive of mailing list discussions. Email and threading are more widely used in Debian than anywhere else, so this is somewhat ironic. Gmane used to paper over this issue, but Gmane’s availability over the last few years has been spotty, to say the least (it is down as I write this).
I tried to contribute a threaded list archive, but our listmasters didn’t seem to care or want to support the project.
While it is obviously possible to deal with Debian packages programmatically, the experience is far from pleasant. Everything seems slow and cumbersome. I have picked just 3 quick examples to illustrate my point.
debiman needs help from piuparts in analyzing the alternatives mechanism of each package in order to display the manpages of e.g. psql(1). This is because maintainer scripts modify the alternatives database by calling shell scripts. Without actually installing a package, you cannot know which changes it makes to the alternatives database.
pk4 needs to maintain its own cache to look up package metadata based on the package name. Other tools parse the apt database from scratch on every invocation. A proper database format, or at least a binary interchange format, would go a long way.
Debian Code Search wants to ingest new packages as quickly as possible. There used to be a fedmsg instance for Debian, but it no longer seems to exist. It is unclear where to get notifications from for new packages, and where best to fetch those packages.
See my “Debian package build tools” post. It really bugs me that the sprawl of tools is not seen as a problem by others.
Most of the points discussed so far deal with the experience in developing Debian, but as I recently described in my post “Debugging experience in Debian”, the experience when developing using Debian leaves a lot to be desired, too.
At this point, the article is getting pretty long, and hopefully you got a rough idea of my motivation.
While I described a number of specific shortcomings above, the final nail in the coffin is actually the lack of a positive outlook. I have more ideas that seem really compelling to me, but, based on how my previous projects have been going, I don’t think I can make any of these ideas happen within the Debian project.
I intend to publish a few more posts about specific ideas for improving operating systems here. Stay tuned.
Lastly, I hope this post inspires someone, ideally a group of people, to improve the developer experience within Debian.
]]>I copied the latest Raspberry Pi Debian image onto an SD card, booted it, and was able to reproduce the issue.
Conceptually, at this point, I should be able to install and start gdb, set a breakpoint and step through the code.
Debian, by default, strips debug symbols when building packages to conserve disk space and network bandwidth. The motivation is very reasonable: most users will never need the debug symbols.
Unfortunately, obtaining debug symbols when you do need them is unreasonably hard.
We begin by configuring an additional apt repository which contains automatically generated debug packages:
raspi# cat >>/etc/apt/sources.list.d/debug.list <<'EOT'
deb http://deb.debian.org/debian-debug buster-debug main contrib non-free
EOT
raspi# apt update
Notably, not all Debian packages have debug packages. As the DebugPackage Debian Wiki page explains, debhelper/9.20151219 started generating debug packages (ending in -dbgsym) automatically. Packages which have not been updated might come with their own debug packages (ending in -dbg) or might not preserve debug symbols at all!
Now that we can install debug packages, how do we know which ones we need?
For debugging i3, we obviously need at least the i3-dbgsym package, but i3 uses a number of other libraries through whose code we may need to step.
The debian-goodies package ships a tool called find-dbgsym-packages which prints the required packages to debug an executable, core dump or running process:
raspi# apt install debian-goodies
raspi# apt install $(find-dbgsym-packages $(which i3))
Now we should have symbol names and line number information available in gdb. But for effectively stepping through the program, access to the source code is required.
Naively, one would assume that apt source should be sufficient for obtaining the source code of any Debian package. However, apt source defaults to the package candidate version, not the version you have installed on your system.
I have addressed this issue with the pk4 tool, which defaults to the installed version.
Before we can extract any sources, we need to configure yet another apt repository:
raspi# cat >>/etc/apt/sources.list.d/source.list <<'EOT'
deb-src http://deb.debian.org/debian buster main contrib non-free
EOT
raspi# apt update
Regardless of whether you use apt source or pk4, one remaining problem is the directory mismatch: the debug symbols contain a certain path, and that path is typically not where you extracted your sources to. While debugging, you will need to tell gdb about the location of the sources. This is tricky when you debug a call across different source packages:
(gdb) pwd
Working directory /usr/src/i3.
(gdb) list main
229 * the main loop. */
230 ev_unref(main_loop);
231 }
232 }
233
234 int main(int argc, char *argv[]) {
235 /* Keep a symbol pointing to the I3_VERSION string constant so that
236 * we have it in gdb backtraces. */
237 static const char *_i3_version __attribute__((used)) = I3_VERSION;
238 char *override_configpath = NULL;
(gdb) list xcb_connect
484 ../../src/xcb_util.c: No such file or directory.
See Specifying Source Directories in the gdb manual for the dir command, which allows you to add multiple directories to the source path. This is pretty tedious, though, and does not work for all programs.
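For example, to add libxcb’s extracted sources (the path below is hypothetical) to the search path:
(gdb) directory /usr/src/libxcb/src
Source directories searched: /usr/src/libxcb/src:$cdir:$cwd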
While Fedora conceptually shares all the same steps, the experience on Fedora is so much better: when you run gdb /usr/bin/i3, it will tell you what the next step is:
# gdb /usr/bin/i3
[…]
Reading symbols from /usr/bin/i3...(no debugging symbols found)...done.
Missing separate debuginfos, use: dnf debuginfo-install i3-4.16-1.fc28.x86_64
Watch what happens when we run the suggested command:
# dnf debuginfo-install i3-4.16-1.fc28.x86_64
enabling updates-debuginfo repository
enabling fedora-debuginfo repository
[…]
Installed:
i3-debuginfo.x86_64 4.16-1.fc28
i3-debugsource.x86_64 4.16-1.fc28
Complete!
A single command understood our intent, enabled the required repositories and installed the required packages, both for debug symbols and source code (stored in e.g. /usr/src/debug/i3-4.16-1.fc28.x86_64). Unfortunately, gdb doesn’t seem to locate the sources, which seems like a bug to me.
One downside of Fedora’s approach is that gdb will only print all required dependencies once you actually run the program, so you may need to run multiple dnf commands.
Ideally, none of the manual steps described above would be necessary. It seems absurd to me that so much knowledge is required to efficiently debug programs in Debian. Case in point: I only learnt about find-dbgsym-packages a few days ago when talking to one of its contributors.
Installing gdb should be all that a user needs to do. Debug symbols and sources can be transparently provided through a lazy-loading FUSE file system. If our build/packaging infrastructure assured predictable paths and automated debug symbol extraction, we could have transparent, quick and reliable debugging of all programs within Debian.
NixOS’s dwarffs is an implementation of this idea: https://github.com/edolstra/dwarffs
While I agree with the removal of debug symbols as a general optimization, I think every Linux distribution should strive to provide an entirely transparent debugging experience: you should not even have to know that debug symbols are not present by default. Debian really falls short in this regard.
Getting Debian to a fully transparent debugging experience requires a lot of technical work and a lot of social convincing. In my experience, programmatically working with the Debian archive and packages is tricky, and ensuring that all packages in a Debian release have debug packages (let alone predictable paths) seems entirely unachievable due to the fragmentation of packaging infrastructure and holdouts blocking any progress.
My go-to example is rsync’s debian/rules, which intentionally (!) still has not adopted debhelper. It is not a surprise that there are no debug symbols for rsync in Debian.
]]>I have recently been looking into speeding up Debian Code Search. As a quick reminder, search engines answer queries by consulting an inverted index: a map from term to documents containing that term (called a “posting list”). See the Debian Code Search Bachelor Thesis (PDF) for a lot more details.
Currently, Debian Code Search does not store positional information in its index, i.e. the index can only reveal that a certain trigram is present in a document, not where or how often.
From analyzing Debian Code Search queries, I knew that identifier queries (70%) massively outnumber regular expression queries (30%). When processing identifier queries, storing positional information in the index enables a significant optimization: instead of identifying the possibly-matching documents and having to read them all, we can determine matches from querying the index alone, no document reads required.
This moves the bottleneck: having to read all possibly-matching documents requires a lot of expensive random I/O, whereas having to decode long posting lists requires a lot of cheap sequential I/O.
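To make this concrete, here is a sketch in Go of answering an identifier query from a positional trigram index alone (the index layout is hypothetical): a document matches if the query’s trigrams occur at consecutive positions, so no document contents need to be read.
package main

import "fmt"

// posting records that a trigram occurs in document docid at offset pos.
type posting struct {
	docid, pos uint32
}

// matchingDocs returns the documents containing query (len(query) >= 3),
// using only the positional index.
func matchingDocs(index map[string][]posting, query string) map[uint32]bool {
	// Start with every occurrence of the first trigram as a candidate…
	candidates := make(map[posting]bool)
	for _, p := range index[query[:3]] {
		candidates[p] = true
	}
	// …and keep only candidates whose later trigrams line up.
	for i := 1; i+3 <= len(query); i++ {
		next := make(map[posting]bool)
		for _, p := range index[query[i:i+3]] {
			start := posting{p.docid, p.pos - uint32(i)}
			if candidates[start] {
				next[start] = true
			}
		}
		candidates = next
	}
	docs := make(map[uint32]bool)
	for p := range candidates {
		docs[p.docid] = true
	}
	return docs
}

func main() {
	index := map[string][]posting{
		"i3F": {{1, 100}}, "3Fo": {{1, 101}}, "Fon": {{1, 102}}, "ont": {{1, 103}},
	}
	fmt.Println(matchingDocs(index, "i3Font")) // map[1:true]
}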
Of course, storing positions comes with a downside: the index is larger, and a larger index takes more time to decode when querying.
Hence, I have been looking at various posting list compression/decoding techniques, to figure out whether we could switch to a technique which would retain (or improve upon!) current performance despite much longer posting lists and produce a small enough index to fit on our current hardware.
I started looking into this space because of Daniel Lemire’s Stream VByte post. As usual, Daniel’s work is well presented, easily digestible and accompanied by not just one, but multiple implementations.
I also looked for scientific papers to learn about the state of the art and classes of different approaches in general. The best I could find is Compression, SIMD, and Postings Lists. If you don’t have access to the paper, I hear that Sci-Hub is helpful.
The paper is from 2014, and doesn’t include all algorithms. If you know of a better paper, please let me know and I’ll include it here.
Eventually, I stumbled upon an algorithm/implementation called TurboPFor, which the rest of the article tries to shine some light on.
If you’re wondering: PFor stands for Patched Frame Of Reference and describes a family of algorithms. The principle is explained e.g. in SIMD Compression and the Intersection of Sorted Integers (PDF).
The TurboPFor project’s README file claims that TurboPFor256 compresses with a rate of 5.04 bits per integer, and can decode with 9400 MB/s on a single thread of an Intel i7-6700 CPU.
For Debian Code Search, we use unsigned integers of 32 bit (uint32), which TurboPFor will compress into as few bits as required.
Dividing Debian Code Search’s file sizes by the total number of integers, I get similar values, at least for the docid index section:
I can confirm the order of magnitude of the decoding speed, too. My benchmark calls TurboPFor from Go via cgo, which introduces some overhead. To exclude disk speed as a factor, data comes from the page cache. The benchmark sequentially decodes all posting lists in the specified index, using as many threads as the machine has cores¹:
I think the numbers differ because the position index section contains larger integers (requiring more bits). I repeated both benchmarks, capped to 1 GiB, and decoding speeds still differed, so it is not just the size of the index.
Compared to Streaming VByte, a TurboPFor256 index comes in at just over half the size, while still reaching 83% of Streaming VByte’s decoding speed. This seems like a good trade-off for my use-case, so I decided to have a closer look at how TurboPFor works.
① See cmd/gp4-verify/verify.go run on an Intel i9-9900K.
To confirm my understanding of the details of the format, I implemented a pure-Go TurboPFor256 decoder. Note that it is intentionally not optimized as its main goal is to use simple code to teach the TurboPFor256 on-disk format.
If you’re looking to use TurboPFor from Go, I recommend using cgo. cgo’s function call overhead is about 51ns as of Go 1.8, which will easily be offset by TurboPFor’s carefully optimized, vectorized (SSE/AVX) code.
With that caveat out of the way, you can find my teaching implementation at https://github.com/stapelberg/goturbopfor
I verified that it produces the same results as TurboPFor’s p4ndec256v32 function for all posting lists in the Debian Code Search index.
Note that TurboPFor does not fully define an on-disk format on its own. When encoding, it turns a list of integers into a byte stream:
size_t p4nenc256v32(uint32_t *in, size_t n, unsigned char *out);
When decoding, it decodes the byte stream into an array of integers, but needs to know the number of integers in advance:
size_t p4ndec256v32(unsigned char *in, size_t n, uint32_t *out);
Hence, you’ll need to keep track of the number of integers and length of the generated byte streams separately. When I talk about on-disk format, I’m referring to the byte stream which TurboPFor returns.
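For illustration, here is a sketch of a minimal container plus a cgo wrapper around p4ndec256v32. The linker flags and library name are assumptions about how you built TurboPFor, and the prototype is declared inline (matching the signature quoted above) to avoid depending on a specific header name:
package main

/*
#cgo LDFLAGS: -L. -lic
// Assumption: TurboPFor was built as a library named libic in the
// current directory.
#include <stddef.h>
#include <stdint.h>
size_t p4ndec256v32(unsigned char *in, size_t n, uint32_t *out);
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// postingList is a minimal container: TurboPFor records neither the integer
// count nor the byte length, so the container must.
type postingList struct {
	n    int    // number of encoded integers
	data []byte // TurboPFor-encoded byte stream
}

func (pl *postingList) decode() []uint32 {
	out := make([]uint32, pl.n)
	if pl.n == 0 {
		return out
	}
	C.p4ndec256v32(
		(*C.uchar)(unsafe.Pointer(&pl.data[0])),
		C.size_t(pl.n),
		(*C.uint32_t)(unsafe.Pointer(&out[0])))
	return out
}

func main() {
	pl := postingList{} // decode an empty list, just to exercise the wrapper
	fmt.Println(pl.decode())
}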
The TurboPFor256 format uses blocks of 256 integers each, followed by a trailing block — if required — which can contain fewer than 256 integers:
SIMD bitpacking is used for all blocks but the trailing block (which uses regular bitpacking). This is not merely an implementation detail for decoding: the on-disk structure is different for blocks which can be SIMD-decoded.
Each block starts with a 2 bit header, specifying the type of the block:
Each block type is explained in more detail in the following sections.
Note that none of the block types store the number of elements: you will always need to know how many integers you need to decode. Also, you need to know in advance how many bytes you need to feed to TurboPFor, so you will need some sort of container format.
Further, TurboPFor automatically chooses the best block type for each block.
A constant block (all integers of the block have the same value) consists of a single value of a specified bit width ≤ 32. This value will be stored in each output element for the block. E.g., after calling decode(input, 3, output) with input being the constant block depicted below, output is {0xB8912636, 0xB8912636, 0xB8912636}.
The example shows the maximum number of bytes (5). Smaller integers will use fewer bytes: e.g. an integer which can be represented in 3 bits will only use 2 bytes.
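Decoding is then trivial; a sketch (reading the bit width and the value itself from the byte stream, shown in the illustration above, is omitted):
// decodeConstant replicates the block’s single value into all n output
// slots.
func decodeConstant(value uint32, n int) []uint32 {
	out := make([]uint32, n)
	for i := range out {
		out[i] = value // e.g. 0xB8912636 in the example above
	}
	return out
}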
A bitpacking block specifies a bit width ≤ 32, followed by a stream of bits. Each value starts at the Least Significant Bit (LSB), i.e. the 3-bit values 0 (000b) and 5 (101b) are encoded as 101000b.
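A pure-Go sketch of this LSB-first unpacking (matching the example: unpack([]byte{0x28}, 3, 2) returns {0, 5}):
// unpack reads n integers of the given bit width from data, starting at the
// least significant bit, as in the (non-SIMD) bitpacking block type.
func unpack(data []byte, width uint, n int) []uint32 {
	mask := uint64(1)<<width - 1
	var acc uint64 // bit accumulator
	var have uint  // number of valid bits in acc
	out := make([]uint32, 0, n)
	pos := 0
	for len(out) < n {
		for have < width {
			acc |= uint64(data[pos]) << have
			pos++
			have += 8
		}
		out = append(out, uint32(acc&mask))
		acc >>= width
		have -= width
	}
	return out
}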
The constant and bitpacking block types work well for integers which don’t exceed a certain width, e.g. for a series of integers of width ≤ 5 bits.
For a series of integers where only a few values exceed an otherwise common width (say, two values require 7 bits, the rest requires 5 bits), it makes sense to cut the integers into two parts: value and exception.
In the example below, decoding the third integer out2 (000b) requires combination with exception ex0 (10110b), resulting in 10110000b.
The number of exceptions can be determined by summing the 1 bits in the bitmap using the popcount instruction.
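A sketch of the recombination step, assuming the packed low bits, the bitmap and the exception values have already been decoded separately (the bit order within the bitmap words is an assumption here):
// applyExceptions patches each value whose bitmap bit is set: the exception
// supplies the bits above the block’s common bit width. In the example,
// 000b combined with exception 10110b at width 3 yields 10110000b.
func applyExceptions(values []uint32, bitmap []uint64, exceptions []uint32, width uint) {
	e := 0
	for i := range values {
		if bitmap[i/64]&(uint64(1)<<(uint(i)%64)) != 0 {
			values[i] |= exceptions[e] << width
			e++
		}
	}
}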
When the exceptions are not uniform enough, it makes sense to switch from bitpacking to a variable byte encoding:
The variable byte encoding used by the TurboPFor format is similar to the one used by SQLite, which is described, alongside other common variable byte encodings, at github.com/stoklund/varint.
Instead of using individual bits for dispatching, this format classifies the first byte (b[0]) into ranges:
- 1 byte: the value is stored in b[0] directly
- 2 bytes: the value is stored in b[0] (6 high bits) and b[1] (8 low bits)
- 3 bytes: the value is stored in b[0] (3 high bits), b[1] and b[2] (16 low bits)
- 4 or 5 bytes: the value is stored in b[1], b[2], b[3] and possibly b[4]
Here is the space usage of different values:
An overflow marker will be used to signal that encoding the values would be less space-efficient than simply copying them (e.g. if all values require 5 bytes).
This format is very space-efficient: it packs 0-176 into a single byte, as opposed to 0-127 (most others). At the same time, it can be decoded very quickly, as only the first byte needs to be compared to decode a value (similar to PrefixVarint).
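To show the shape of such a decoder, here is a runnable sketch in the same spirit. Note that the range boundaries and offsets below are reconstructed from the stated bit widths and the 0-176 single-byte range; they are not TurboPFor’s actual constants:
// decodeVByte decodes one value, returning it together with the number of
// bytes consumed. Only b[0] must be inspected to determine the length.
func decodeVByte(b []byte) (uint32, int) {
	b0 := uint32(b[0])
	switch {
	case b0 <= 176: // 1 byte: value stored in b[0] directly
		return b0, 1
	case b0 <= 240: // 2 bytes: 6 high bits from b[0], 8 low bits from b[1]
		return 177 + ((b0-177)<<8 | uint32(b[1])), 2
	case b0 <= 248: // 3 bytes: 3 high bits from b[0], 16 low bits from b[1..2]
		return 16561 + ((b0-241)<<16 | uint32(b[1])<<8 | uint32(b[2])), 3
	default: // 4 or 5 bytes: the full value lives in b[1..4]
		return uint32(b[1]) | uint32(b[2])<<8 | uint32(b[3])<<16 | uint32(b[4])<<24, 5
	}
}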
In regular (non-SIMD) bitpacking, integers are stored on disk one after the other, padded to a full byte, as a byte is the smallest addressable unit when reading data from disk. For example, if you bitpack only one 3 bit int, you will end up with 5 bits of padding.
SIMD bitpacking works like regular bitpacking, but processes 8 uint32 little-endian values at the same time, leveraging the AVX instruction set. The following illustration shows the order in which 3-bit integers are decoded from disk:
For a Debian Code Search index, 85% of posting lists are short enough to only consist of a trailing block, i.e. no SIMD instructions can be used for decoding.
The distribution of block types looks as follows:
Constant blocks are mostly used for posting lists with just one entry.
The TurboPFor on-disk format is very flexible: with its 4 different kinds of blocks, chances are high that a very efficient encoding will be used for most integer series.
Of course, the flip side of covering so many cases is complexity: the format and implementation take quite a bit of time to understand — hopefully this article helps a little! For environments where the C TurboPFor implementation cannot be used, smaller algorithms might be simpler to implement.
That said, if you can use the TurboPFor implementation, you will benefit from a highly optimized SIMD code base, which will most likely be an improvement over what you’re currently using.
]]>(Cross-posting this message I sent to pkg-raspi-maintainers for broader visibility.)
I started building Raspberry Pi images because I thought there should be an easy, official way to install Debian on the Raspberry Pi.
I still believe that, but I’m not actually using Debian on any of my Raspberry Pis anymore¹, so my personal motivation to do any work on the images is gone.
On top of that, I realize that my commitments exceed my spare time capacity, so I need to get rid of responsibilities.
Therefore, I’m looking for someone to take up maintainership of the Raspberry Pi images. Numerous people have reached out to me with thank you notes and questions, so I think the user interest is there. Also, I’ll be happy to answer any questions that you might have and that I can easily answer. Please reply here (or in private) if you’re interested.
If I can’t find someone within the next 7 days, I’ll put up an announcement message in the raspi3-image-spec README, wiki page, and my blog posts, stating that the image is unmaintained and looking for a new maintainer.
Thanks for your understanding,
① just in case you’re curious, I’m now running cross-compiled Go programs directly under a Linux kernel and minimal userland, see https://gokrazy.org/
]]>To reduce the hurdles to using and contributing to Debian, I wanted to make sbuild easier to set up.
sbuild ≥ 0.74.0 provides a Debian package called sbuild-debian-developer-setup. Once installed, run the sbuild-debian-developer-setup(1) command to create a chroot suitable for building packages for Debian unstable.
On a system without any sbuild/schroot bits installed, a transcript of the full setup looks like this:
% sudo apt install -t unstable sbuild-debian-developer-setup
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  libsbuild-perl sbuild schroot
Suggested packages:
  deborphan btrfs-tools aufs-tools | unionfs-fuse qemu-user-static
Recommended packages:
  exim4 | mail-transport-agent autopkgtest
The following NEW packages will be installed:
  libsbuild-perl sbuild sbuild-debian-developer-setup schroot
0 upgraded, 4 newly installed, 0 to remove and 1454 not upgraded.
Need to get 1.106 kB of archives.
After this operation, 3.556 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://localhost:3142/deb.debian.org/debian unstable/main amd64 libsbuild-perl all 0.74.0-1 [129 kB]
Get:2 http://localhost:3142/deb.debian.org/debian unstable/main amd64 sbuild all 0.74.0-1 [142 kB]
Get:3 http://localhost:3142/deb.debian.org/debian testing/main amd64 schroot amd64 1.6.10-4 [772 kB]
Get:4 http://localhost:3142/deb.debian.org/debian unstable/main amd64 sbuild-debian-developer-setup all 0.74.0-1 [62,6 kB]
Fetched 1.106 kB in 0s (5.036 kB/s)
Selecting previously unselected package libsbuild-perl.
(Reading database ... 276684 files and directories currently installed.)
Preparing to unpack .../libsbuild-perl_0.74.0-1_all.deb ...
Unpacking libsbuild-perl (0.74.0-1) ...
Selecting previously unselected package sbuild.
Preparing to unpack .../sbuild_0.74.0-1_all.deb ...
Unpacking sbuild (0.74.0-1) ...
Selecting previously unselected package schroot.
Preparing to unpack .../schroot_1.6.10-4_amd64.deb ...
Unpacking schroot (1.6.10-4) ...
Selecting previously unselected package sbuild-debian-developer-setup.
Preparing to unpack .../sbuild-debian-developer-setup_0.74.0-1_all.deb ...
Unpacking sbuild-debian-developer-setup (0.74.0-1) ...
Processing triggers for systemd (236-1) ...
Setting up schroot (1.6.10-4) ...
Created symlink /etc/systemd/system/multi-user.target.wants/schroot.service → /lib/systemd/system/schroot.service.
Setting up libsbuild-perl (0.74.0-1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up sbuild (0.74.0-1) ...
Setting up sbuild-debian-developer-setup (0.74.0-1) ...
Processing triggers for systemd (236-1) ...

% sudo sbuild-debian-developer-setup
The user `michael' is already a member of `sbuild'.
I: SUITE: unstable
I: TARGET: /srv/chroot/unstable-amd64-sbuild
I: MIRROR: http://localhost:3142/deb.debian.org/debian
I: Running debootstrap --arch=amd64 --variant=buildd --verbose --include=fakeroot,build-essential,eatmydata --components=main --resolve-deps unstable /srv/chroot/unstable-amd64-sbuild http://localhost:3142/deb.debian.org/debian
I: Retrieving InRelease
I: Checking Release signature
I: Valid Release signature (key id 126C0D24BD8A2942CC7DF8AC7638D0442B90D010)
I: Retrieving Packages
I: Validating Packages
I: Found packages in base already in required: apt
I: Resolving dependencies of required packages...
[…]
I: Successfully set up unstable chroot.
I: Run "sbuild-adduser" to add new sbuild users.
ln -s /usr/share/doc/sbuild/examples/sbuild-update-all /etc/cron.daily/sbuild-debian-developer-setup-update-all
Now run `newgrp sbuild', or log out and log in again.
% newgrp sbuild
% sbuild -d unstable hello
sbuild (Debian sbuild) 0.74.0 (14 Mar 2018) on x1
+==============================================================================+
| hello (amd64)                                Mon, 19 Mar 2018 07:46:14 +0000 |
+==============================================================================+
Package: hello
Distribution: unstable
Machine Architecture: amd64
Host Architecture: amd64
Build Architecture: amd64
Build Type: binary
[…]
I hope you’ll find this useful.
]]>
With these changes, after building a package, you just need to type dput (in the correct directory of course) to sign and upload it.