Over the last year or so I have worked on a research Linux distribution in my spare time. It’s not a distribution for researchers (like Scientific Linux), but my personal playground project to research Linux distribution development, i.e. try out fresh ideas.
This article focuses on the package format and its advantages, but there is more to distri, which I will cover in upcoming blog posts.
I was a Debian Developer for the 7 years from 2012 to 2019, but using the distribution often left me frustrated, ultimately resulting in me winding down my Debian work.
Frequently, I was noticing a large gap between the actual speed of an operation (e.g. doing an update) and the possible speed based on back of the envelope calculations. I wrote more about this in my blog post “Package managers are slow”.
To me, this observation means that either there is potential to optimize the package manager itself (e.g. apt), or what the system does is just too complex. While I remember seeing some low-hanging fruit¹, through my work on distri, I wanted to explore whether all the complexity we currently have in Linux distributions such as Debian or Fedora is inherent to the problem space.
I have completed enough of the experiment to conclude that the complexity is not inherent: I can build a Linux distribution for general-enough purposes which is much less complex than existing ones.
¹ Those were low-hanging fruit from a user perspective. I’m not saying that
fixing them is easy in the technical sense; I know too little about
base to make such a statement.
Key idea: packages are images, not archives
One key idea is to switch from using archives to using images for package
contents. Common package managers such as
archives with various compression
distri uses SquashFS images, a comparatively simple file system image format that I happen to be familiar with from my work on the gokrazy Raspberry Pi 3 Go platform.
This idea is not novel: AppImage and snappy also use images, but only for individual, self-contained applications. distri however uses images for distribution packages with dependencies. In particular, there is no duplication of shared libraries in distri.
A nice side effect of using read-only image files is that applications are immutable and hence cannot be broken by accidental (or malicious!) modification.
Key idea: separate hierarchies
Package contents are made available under a fully-qualified path. E.g., all
files provided by package
zsh-amd64-5.6.2-3 are available under
/ro/zsh-amd64-5.6.2-3. The mountpoint
/ro stands for read-only, which is
short yet descriptive.
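To make the naming scheme concrete, here is a minimal sketch of how such a fully-qualified name could be split into its parts and mapped to its path under /ro. The type and function names are made up for illustration and are not taken from distri’s source. The sketch also assumes that architecture, version, and revision never contain a hyphen, which the article does not state.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// PackageID holds the components of a fully-qualified package name.
// The type and its field names are illustrative, not distri's own.
type PackageID struct {
	Name     string
	Arch     string
	Version  string
	Revision int
}

// parseFullName splits a name such as "zsh-amd64-5.6.2-3" from the right,
// so that package names which themselves contain hyphens keep working.
// Assumption: architecture, version, and revision contain no hyphens.
func parseFullName(full string) (PackageID, error) {
	parts := strings.Split(full, "-")
	if len(parts) < 4 {
		return PackageID{}, fmt.Errorf("not a fully-qualified package name: %q", full)
	}
	n := len(parts)
	rev, err := strconv.Atoi(parts[n-1])
	if err != nil {
		return PackageID{}, fmt.Errorf("invalid revision in %q: %v", full, err)
	}
	return PackageID{
		Name:     strings.Join(parts[:n-3], "-"),
		Arch:     parts[n-3],
		Version:  parts[n-2],
		Revision: rev,
	}, nil
}

// mountPath returns the directory under /ro where the package contents appear.
func mountPath(full string) string {
	return path.Join("/ro", full)
}

func main() {
	id, err := parseFullName("zsh-amd64-5.6.2-3")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", id)
	fmt.Println(mountPath("zsh-amd64-5.6.2-3"))
}
```

Splitting from the right keeps the sketch working for hyphenated package names such as irssi-robustirc.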
Perhaps surprisingly, building software with custom
prefix values of
/ro/zsh-amd64-5.6.2-3 is widely supported, thanks to:
Linux distributions, which build software with
/usr, whereas FreeBSD (and the autotools default), which build with
Enthusiast users in corporate or research environments, who install software into their home directories.
Because using a custom
prefix is a common scenario, upstream awareness for
prefix-correctness is generally high, and the rarely required patch will be
Key idea: exchange directories
Software packages often exchange data by placing or locating files in well-known directories. Here are just a few examples:
zsh(1) locates executable programs via
PATH components such as
In distri, these locations are called exchange directories and are provided
via FUSE in
Exchange directories come in two different flavors:
global. The exchange directory, e.g.
/ro/share, provides the union of the
share subdirectory of all packages in the package store.
Global exchange directories are largely used for compatibility, see below.
per-package. Useful for tight coupling: e.g.
irssi(1) does not provide any ABI guarantees, so plugins such as
irssi-robustirc can declare that they want e.g.
/ro/irssi-amd64-1.1.1-1/out/lib/irssi/modules to be a per-package exchange directory and contain files from their
Search paths sometimes need to be fixed
Programs which use exchange directories sometimes use search paths to access
multiple exchange directories. In fact, the examples above were taken from
PATH. These are
prominent ones, but more examples are easy to find:
loads completion functions from its
Some search path values are derived from
--datadir=/ro/share and require no
further attention, but others might derive from
--prefix=/ro/zsh-amd64-5.6.2-3/out and need to be pointed to an exchange
directory via a specific command line flag.
Global exchange directories are used to make distri provide enough of the Filesystem Hierarchy Standard (FHS) that third-party software largely just works. This includes a C development environment.
I successfully ran a few programs from their binary packages such as Google Chrome, Spotify, or Microsoft’s Visual Studio Code.
Fast package manager
I previously wrote about how Linux distribution package managers are too slow.
distri’s package manager is extremely fast. Its main bottleneck is typically the network link, even on high-speed links (I tested with a 100 Gbps link).
Its speed comes largely from an architecture which allows the package manager to do less work. Specifically:
Package images can be added atomically to the package store, so we can safely skip
fsync(2). Corruption will be cleaned up automatically, and durability is not important: if an interactive installation is interrupted, the user can just repeat it, as it will be fresh in their mind.
Because all packages are co-installable thanks to separate hierarchies, there are no conflicts at the package store level, and no dependency resolution (an optimization problem requiring SAT solving) is required at all.
In exchange directories, we resolve conflicts by selecting the package with the highest monotonically increasing distri revision number.
distri proves that we can build a useful Linux distribution entirely without hooks and triggers. Not having to serialize hook execution allows us to download packages into the package store with maximum concurrency.
Because we are using images instead of archives, we do not need to unpack anything. This means installing a package is really just writing its package image and metadata to the package store. Sequential writes are typically the fastest kind of storage usage pattern.
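The conflict resolution rule for exchange directories is simple enough to sketch in a few lines. The following is only an illustration of the rule as stated above, not distri’s FUSE implementation: the pkg type, the unionView function, and the foo package are hypothetical, and packages are modeled as in-memory lists of relative paths.

```go
package main

import "fmt"

// pkg is a hypothetical in-memory description of one installed package:
// its distri revision and the relative paths it contributes to an
// exchange directory such as /ro/share.
type pkg struct {
	FullName string
	Revision int
	Files    []string
}

// unionView maps each relative path to the name of the package providing it.
// When several packages ship the same path, the highest revision wins.
func unionView(pkgs []pkg) map[string]string {
	winner := make(map[string]pkg)
	for _, p := range pkgs {
		for _, f := range p.Files {
			if cur, ok := winner[f]; !ok || p.Revision > cur.Revision {
				winner[f] = p
			}
		}
	}
	view := make(map[string]string, len(winner))
	for f, p := range winner {
		view[f] = p.FullName
	}
	return view
}

func main() {
	// "foo" is a made-up package used only for this example.
	view := unionView([]pkg{
		{FullName: "foo-amd64-1.0-1", Revision: 1, Files: []string{"man/man1/foo.1"}},
		{FullName: "foo-amd64-1.0-2", Revision: 2, Files: []string{"man/man1/foo.1", "doc/foo/README"}},
	})
	for f, name := range view {
		fmt.Printf("%s -> %s\n", f, name)
	}
}
```

Because the rule is a plain maximum over revision numbers, it needs no solver and does not depend on the order in which packages are listed.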
Fast installation also makes other use-cases more bearable, such as creating disk
images, be it for testing them in
them on real hardware from a USB drive, or for cloud providers such as Google
Fast package builder
Contrary to how distribution package builders are usually implemented, the distri package builder does not actually install any packages into the build environment.
Instead, distri makes available a filtered view of the package store (only
declared dependencies are available) at
/ro in the build environment.
This means that even for large dependency trees, setting up a build environment happens in a fraction of a second! Such a low latency really makes a difference in how comfortable it is to iterate on distribution packages.
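As a rough illustration of why this is cheap: computing a filtered view is a selection over names, not a copy or an unpack step. The sketch below models the package store as a list of fully-qualified names. The function name is hypothetical, and how distri actually exposes the view (including whether dependencies of dependencies are added) is not covered here.

```go
package main

import (
	"fmt"
	"sort"
)

// filteredView returns the store entries a build may see: only those
// listed in declared. Both slices hold fully-qualified package names.
func filteredView(store, declared []string) []string {
	want := make(map[string]bool, len(declared))
	for _, d := range declared {
		want[d] = true
	}
	var visible []string
	for _, p := range store {
		if want[p] {
			visible = append(visible, p)
		}
	}
	sort.Strings(visible)
	return visible
}

func main() {
	store := []string{"zsh-amd64-5.6.2-3", "irssi-amd64-1.1.1-1"}
	declared := []string{"zsh-amd64-5.6.2-3"}
	// Only the declared dependency shows up in the build environment's view.
	fmt.Println(filteredView(store, declared))
}
```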
In distri, package images are installed from a remote package store into the
local system package store
/roimg, which backs the
A package store is implemented as a directory of package images and their associated metadata files.
You can easily make available a package store by using
To provide a mirror for your local network, you can periodically
from the package store you want to mirror, and then
distri export your local
copy. Special tooling (e.g.
debmirror in Debian) is not required because
distri install is atomic (and
Producing derivatives is easy: just add your own packages to a copy of the package store.
The package store is intentionally kept simple to manage and distribute. Its files could be exchanged via peer-to-peer file systems, or synchronized from an offline medium.
distri’s first release
distri works well enough to demonstrate the ideas explained above. I have
branched this state into branch
jackherer, distri’s first
release code name. This way, I can keep experimenting in the distri repository
without breaking your installation.
From the branch contents, our autobuilder creates:
- disk images, which…
  - can be tested on real hardware
  - can be tested in qemu
  - can be tested in virtualbox
  - can be tested in docker
  - can be tested on Google Cloud
- a package repository. Installations can pick up new packages with
- Definitely check out the “Cool things to try” README section.
The project website can be found at https://distr1.org. The website is just the README for now, but we can improve that later.
The repository can be found at https://github.com/distr1/distri
Right now, distri is mainly a vehicle for my spare-time Linux distribution research. I don’t recommend anyone use distri for anything but research, and there are no medium-term plans of that changing. At the very least, please contact me before basing anything serious on distri so that we can talk about limitations and expectations.
I expect the distri project to live for as long as I have blog posts to publish, and we’ll see what happens afterwards. Note that this is a hobby for me: I will continue to explore, at my own pace, parts that I find interesting.
My hope is that established distributions might get a useful idea or two from distri.
There’s more to come: subscribe to the distri feed
I don’t want to make this post too long, but there is much more!
Please subscribe to the following URL in your feed reader to get all posts about distri:
Next in my queue are articles about hermetic packages and good package maintainer experience (including declarative packaging).
Feedback or questions?
I’d love to discuss these ideas in case you’re interested!
Please send feedback to the distri mailing list so that everyone can participate!