linux/arm64
build for gokrazy,
my Go appliance platform, which started out on the
Raspberry Pi, and then a linux/amd64
one for router7,
which runs on PCs.
The update process for both of these builds is entirely automated, meaning new Linux kernel releases are automatically tested and merged. Recently, however, the continuous integration testing failed to automatically merge Linux 6.7. This article is about tracking down the root cause of that failure.
gokrazy started out targeting only the Raspberry Pi, where you configure the bootloader with a plain text file on a FAT partition, so we did not need to include our own UEFI/MBR bootloader.
When I ported gokrazy to work on PCs in BIOS mode, I decided against complicated solutions like GRUB – I really wasn't looking to maintain a GRUB package. Just keeping GRUB installations working on my machines is enough work. The fact that GRUB consists of many different files (modules) that can go out of sync really does not appeal to me.
Instead, I went with Sebastian Plotz's Minimal Linux Bootloader because it fits entirely into the Master Boot Record (MBR) and does not require any files. In bootloader lingo, this is a stage1-only bootloader. You don't even need a C compiler to compile its (assembly) code. It seemed simple enough to integrate: just write the bootloader code into the first sector of the gokrazy disk image; done. The bootloader had its last release in 2012, so no need for updates or maintenance.
You canât really implement booting a kernel and parsing text configuration
files in 446
bytes of 16-bit
8086 assembly instructions, so to tell the bootloader where on disk to load the
kernel code and kernel command line from, gokrazy writes the disk offset
(LBA) of vmlinuz
and
cmdline.txt
to the last bytes of the bootloader code. Because gokrazy
generates the FAT partition, we know there is never any fragmentation, so the
bootloader does not need to understand the FAT file system.
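To make this concrete, here is a minimal sketch (not the actual gokrazy code; the byte offsets and file names are assumptions for illustration) of how patching those two LBA values into the bootloader code could look in Go:

package main

import (
	"encoding/binary"
	"os"
)

// patchBootloader writes the LBAs of vmlinuz and cmdline.txt into the last
// 8 bytes of the 446-byte bootloader code area of an MBR image.
// The offsets are illustrative only; the real layout is defined by the
// bootloader source and gokrazy's mbr package.
func patchBootloader(mbr []byte, kernelLBA, cmdlineLBA uint32) {
	binary.LittleEndian.PutUint32(mbr[446-8:], kernelLBA)
	binary.LittleEndian.PutUint32(mbr[446-4:], cmdlineLBA)
}

func main() {
	img, err := os.ReadFile("mbr.img") // placeholder file name
	if err != nil {
		panic(err)
	}
	// hypothetical LBAs of vmlinuz and cmdline.txt on the FAT partition:
	patchBootloader(img, 8192, 8200)
	if err := os.WriteFile("mbr-patched.img", img, 0644); err != nil {
		panic(err)
	}
}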
The symptom was that the rtr7/kernel pull request #434 for updating to Linux 6.7 failed.
My continuous integration tests run in two environments: a physical embedded PC from PC Engines (apu2c4) in my living room, and a virtual QEMU PC. Only the QEMU test failed.
On the physical PC Engines apu2c4, the pull request actually passed the boot test. It would be wrong to draw conclusions like "the issue only affects QEMU" from this, though, as later attempts to power on the apu2c4 showed the device boot-looping. I made a mental note that something is different about how the problem affects the two environments, but both are affected, and decided to address the failure in QEMU first, then think about the PC Engines failure some more.
In QEMU, the output I see is:
SeaBIOS (version Arch Linux 1.16.3-1-1)
iPXE (http://ipxe.org) 00:03.0 C900 PCI2.10 PnP PMM+06FD3360+06F33360 C900
Booting from Hard Disk...
Notably, the kernel doesn't even seem to start – no "Decompressing Linux" message is printed, the boot just hangs. I tried enabling debug output in SeaBIOS and eventually succeeded, but only with an older QEMU version:
Booting from Hard Disk...
Booting from 0000:7c00
In resume (status=0)
In 32bit resume
Attempting a hard reboot
This doesn't tell me anything, unfortunately.
Okay, so something about introducing Linux 6.7 into my setup breaks MBR boot.
I figured using Git Bisection should identify the problematic change within a few iterations, so I cloned the currently working Linux 6.6 source code, applied the router7 config and compiled it.
To my surprise, even my self-built Linux 6.6 kernel would not boot! 😲
Why does the router7 build work when built inside the Docker container, but not when built on my Linux installation? I decided to rebase the Docker container from Debian 10 (buster, from 2019) to Debian 12 (bookworm, from 2023) and that resulted in a non-booting kernel, too!
We have two triggers: building Linux 6.7 or building older Linux, but in newer environments.
First, check out the rtr7/kernel repository and undo the mitigation:
% mkdir -p go/src/github.com/rtr7/
% cd go/src/github.com/rtr7/
% git clone --depth=1 https://github.com/rtr7/kernel
% cd kernel
% sed -i 's,CONFIG_KERNEL_ZSTD,#CONFIG_KERNEL_ZSTD,g' cmd/rtr7-build-kernel/config.addendum.txt
% go run ./cmd/rtr7-rebuild-kernel
# takes a few minutes to compile Linux
% ls -l vmlinuz
-rw-r--r-- 1 michael michael 15885312 2024-01-28 16:18 vmlinuz
Now, you can either create a new gokrazy instance, replace the kernel and configure the gokrazy instance to use rtr7/kernel:
% gok -i mbr new
% gok -i mbr add .
% gok -i mbr edit
# Adjust to contain:
"KernelPackage": "github.com/rtr7/kernel",
"FirmwarePackage": "github.com/rtr7/kernel",
"EEPROMPackage": "",
…or you skip these steps and extract my already prepared config to ~/gokrazy/mbr.
Then, build the gokrazy disk image and start it with QEMU:
% GOARCH=amd64 gok -i mbr overwrite \
--full /tmp/gokr-boot.img \
--target_storage_bytes=1258299392
% qemu-system-i386 \
-nographic \
-drive file=/tmp/gokr-boot.img,format=raw
Unlike application programs, the Linux kernel doesn't depend on shared libraries at runtime, so the dependency footprint is a little smaller than usual. The most significant dependencies are the components of the build environment, like the C compiler or the linker.
So let's look at the software versions of the known-working (Debian 10) environment and the smallest change we can make to that (upgrading to Debian 11):
To figure out if the problem is triggered by GCC, binutils, or something else entirely, I checked:
Debian 10 (buster) with its gcc-8, but with binutils 2.35 from bullseye, still works. (Checked by updating /etc/apt/sources.list, then upgrading only the binutils package.)
Debian 10 (buster), but with gcc-10 and binutils 2.35, results in a non-booting kernel.
So it seems like upgrading from GCC 8 to GCC 10 triggers the issue.
Instead of working with a Docker container and Debian's packages, you could also use Nix. The instructions aren't easy, but I used nix-shell to quickly try out GCC 8 (works), GCC 9 (works) and GCC 10 (kernel doesn't boot) on my machine.
To recap, we have two triggers: building Linux 6.7 or building older Linux, but with GCC 10.
Two theories seemed most plausible to me at this point: Either a change in GCC 10 (possibly enabled by another change in Linux 6.7) is the problem, or the size of the kernel is the problem.
To verify the file size hypothesis, I padded a known-working vmlinuz file to the size of a known-broken vmlinuz:
% ls -l vmlinuz
% dd if=/dev/zero bs=108352 count=1 >> vmlinuz
But, even though it had the same file size as the known-broken kernel, the padded kernel booted!
So I ruled out kernel size as a problem and started researching significant changes in GCC 10.
I read that GCC 10 changed behavior with regards to stack protection.
Indeed, building the kernel with Debian 11 (bullseye), but with CONFIG_STACKPROTECTOR=n, makes it boot. So I suspected that our bootloader does not set up the stack correctly, or similar.
I sent an email to Sebastian Plotz, the author of the Minimal Linux Bootloader, to ask whether he knew about any issues with his bootloader, or whether stack protection seemed to him like a plausible culprit.
To my surprise (it has been over 10 years since he published the bootloader!), he actually replied: he hadn't received any problem reports regarding his bootloader, but didn't really understand how stack protection would be related.
At this point, we have isolated at least one trigger for the problem, and exhausted the easy techniques of upgrading/downgrading surrounding software versions and asking upstream.
It's time for a Tooling Level Up! Without a debugger, you can only poke around in the dark, which takes time and doesn't result in thorough explanations. Particularly in this case, I think it is very likely that any source modifications could have introduced subtle issues. So let's reach for a debugger!
Luckily, QEMU comes with built-in support for the GDB debugger. Just add the -s -S flags to your QEMU command to make QEMU set up a GDB stub listening on localhost:1234 (-s) and stop execution until a debugger connects (-S).
If you wanted to debug the Linux kernel, you could connect GDB to QEMU right away, but for debugging a boot loader we need an extra step, because the boot loader runs in Real Mode, but QEMU's GDB integration rightfully defaults to the more modern Protected Mode.
When GDB is not configured correctly, it decodes addresses and registers with the wrong size, which throws off the entire disassembly – compare GDB's output with our assembly source:
(gdb) b *0x7c00
(gdb) c
(gdb) x/20i $pc ; [expected (bootloader.asm)]
=> 0x7c00: cli ; => 0x7c00: cli
0x7c01: xor %eax,%eax ; 0x7c01: xor %ax,%ax
0x7c03: mov %eax,%ds ; 0x7c03: mov %ax,%ds
0x7c05: mov %eax,%ss ; 0x7c05: mov %ax,%ss
0x7c07: mov $0xb87c00,%esp ; 0x7c07: mov $0x7c00,%sp
0x7c0c: adc %cl,-0x47990440(%esi) ; 0x7c0a: mov $0x1000,%ax
0x7c12: add %eax,(%eax) ; 0x7c0d: mov %ax,%es
0x7c14: add %al,(%eax) ; 0x7c0f: sti
0x7c16: xor %ebx,%ebx
So we need to ensure we use qemu-system-i386 (qemu-system-x86_64 prints "Remote 'g' packet reply is too long") and configure the GDB target architecture to 16-bit 8086:
(gdb) set architecture i8086
(gdb) target remote localhost:1234
Unfortunately, the above doesn't actually work in QEMU 2.9 and newer: https://gitlab.com/qemu-project/qemu/-/issues/141.
On the web, people are working around this bug by using a modified target.xml file. I tried this, but must have made a mistake – I thought modifying target.xml didn't help, but when I wrote this article, I found that it does actually seem to work. Maybe I didn't use qemu-system-i386 but the x86_64 variant, or something like that.
It is typically an exercise in frustration to get older software to compile in newer environments. It's much easier to use an older environment to run old software.
By querying packages.debian.org, we can see the QEMU versions included in current and previous Debian versions. Unfortunately, the oldest listed version (QEMU 3.1 in Debian 10 (buster)) isn't old enough. By querying snapshot.debian.org, we can see that Debian 9 (stretch) contained QEMU 2.8.
So let's run Debian 9 – the easiest way I know is to use Docker:
% docker run --net=host -v /tmp:/tmp -ti debian:stretch
Unfortunately, the debian:stretch Docker container does not work out of the box anymore, because its /etc/apt/sources.list points to the deb.debian.org CDN, which only serves current versions and no longer serves stretch.
So we need to update the sources.list file to point to archive.debian.org. To correctly install QEMU you need both entries, the debian line and the debian-security line, because the Docker container has packages from debian-security installed and gets confused when these are missing from the package list:
root@650a2157f663:/# cat > /etc/apt/sources.list <<'EOT'
deb http://archive.debian.org/debian/ stretch contrib main non-free
deb http://archive.debian.org/debian-security/ stretch/updates main
EOT
root@650a2157f663:/# apt update
Now we can just install QEMU as usual and start it to debug our boot process:
root@650a2157f663:/# apt install qemu-system-x86
root@650a2157f663:/# qemu-system-i386 \
-nographic \
-drive file=/tmp/gokr-boot.img,format=raw \
-s -S
Now let's start GDB and set a breakpoint on address 0x7c00, which is the address to which the BIOS loads the MBR code and starts execution:
% gdb
(gdb) set architecture i8086
The target architecture is set to "i8086".
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) break *0x7c00
Breakpoint 1 at 0x7c00
(gdb) continue
Continuing.
Breakpoint 1, 0x00007c00 in ?? ()
(gdb)
Okay, so we have GDB attached to QEMU and can step through assembly instructions. Let's start debugging!?
Not so fast. There is another Tooling Level Up we need first: debug symbols. Yes, even for a Minimal Linux Bootloader, which doesn't use any libraries or local variables. Having proper names for functions, as well as line numbers, will be hugely helpful in just a second.
Before debug symbols, I would directly build the bootloader using nasm bootloader.asm, but to end up with a symbol file for GDB, we need to instruct nasm to generate an ELF file with debug symbols, then use ld to link it, and finally use objcopy to copy the code out of the ELF file again.
After commit d29c615 in gokrazy/internal/mbr, I have bootloader.elf.
Back in GDB, we can load the symbols using the symbol-file command:
(gdb) set architecture i8086
The target architecture is set to "i8086".
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) symbol-file bootloader.elf
Reading symbols from bootloader.elf...
(gdb) break *0x7c00
Breakpoint 1 at 0x7c00: file bootloader.asm, line 48.
(gdb) continue
Continuing.
Breakpoint 1, ?? () at bootloader.asm:48
48 cli
(gdb)
At this point, we need 4 commands each time we start GDB. We can automate these by writing them to a .gdbinit file:
% cat > .gdbinit <<'EOT'
set architecture i8086
target remote localhost:1234
symbol-file bootloader.elf
break *0x7c00
EOT
% gdb
The target architecture is set to "i8086".
0x0000fff0 in ?? ()
Breakpoint 1 at 0x7c00: file bootloader.asm, line 48.
(gdb)
The easiest way to understand program flow seems to be to step through the program.
But Minimal Linux Bootloader (MLB) contains loops that run through thousands of iterations. You can't use GDB's stepi command for that.
Because MLB only contains a few functions, I eventually realized that placing a breakpoint on each function would be the quickest way to understand the high-level program flow:
(gdb) b read_kernel_setup
Breakpoint 2 at 0x7c38: file bootloader.asm, line 75.
(gdb) b check_version
Breakpoint 3 at 0x7c56: file bootloader.asm, line 88.
(gdb) b read_protected_mode_kernel
Breakpoint 4 at 0x7c8f: file bootloader.asm, line 105.
(gdb) b read_protected_mode_kernel_2
Breakpoint 5 at 0x7cd6: file bootloader.asm, line 126.
(gdb) b run_kernel
Breakpoint 6 at 0x7cff: file bootloader.asm, line 142.
(gdb) b error
Breakpoint 7 at 0x7d51: file bootloader.asm, line 190.
(gdb) b reboot
Breakpoint 8 at 0x7d62: file bootloader.asm, line 204.
With the working kernel, we get the following transcript:
(gdb)
Continuing.
Breakpoint 2, read_kernel_setup () at bootloader.asm:75
75 xor eax, eax
(gdb)
Continuing.
Breakpoint 3, check_version () at bootloader.asm:88
88 cmp word [es:0x206], 0x204 ; we need protocol version >= 2.04
(gdb)
Continuing.
Breakpoint 4, read_protected_mode_kernel () at bootloader.asm:105
105 mov edx, [es:0x1f4] ; edx stores the number of bytes to load
(gdb)
Continuing.
Breakpoint 5, read_protected_mode_kernel_2 () at bootloader.asm:126
126 mov eax, edx
(gdb)
Continuing.
Breakpoint 6, run_kernel () at bootloader.asm:142
142 cli
(gdb)
With the non-booting kernel, we get:
(gdb) c
Continuing.
Breakpoint 1, ?? () at bootloader.asm:48
48 cli
(gdb)
Continuing.
Breakpoint 2, read_kernel_setup () at bootloader.asm:75
75 xor eax, eax
(gdb)
Continuing.
Breakpoint 3, check_version () at bootloader.asm:88
88 cmp word [es:0x206], 0x204 ; we need protocol version >= 2.04
(gdb)
Continuing.
Breakpoint 4, read_protected_mode_kernel () at bootloader.asm:105
105 mov edx, [es:0x1f4] ; edx stores the number of bytes to load
(gdb)
Continuing.
Breakpoint 1, ?? () at bootloader.asm:48
48 cli
(gdb)
Okay! Now we see that the bootloader starts loading the kernel from disk into RAM, but doesn't actually get far enough to call run_kernel, meaning the problem isn't with stack protection, with loading a working command line, or with anything inside the Linux kernel.
This lets us rule out a large part of the problem space. We now know that we can focus entirely on the bootloader and why it cannot load the Linux kernel into memory.
Let's take a closer look…
In the example above, using breakpoints was sufficient to narrow down the problem.
You might think we used GDB, and it looked like this:
But that's not GDB! It's an easy mistake to make. After all, GDB starts up with just a text prompt, and as you can see from the example above, we can just enter text and achieve a good result.
To see the real GDB, you need to start it up fully, meaning including its user interface.
You can either use GDB's text user interface (TUI), or a graphical user interface for GDB, such as the one available in Emacs.
You're already familiar with the architecture, target and breakpoint commands from above. To also set up the text-mode user interface, we run a few layout commands:
(gdb) set architecture i8086
(gdb) target remote localhost:1234
(gdb) symbol-file bootloader.elf
(gdb) layout split
(gdb) layout src
(gdb) layout regs
(gdb) break *0x7c00
(gdb) continue
The layout split command loads the text-mode user interface and splits the screen into a register window, a disassembly window and a command window.
With layout src we disregard the disassembly window in favor of a source listing window. Both are in assembly language in our case, but the source listing contains comments as well.
The layout src command also got rid of the register window, which we'll get back using layout regs. I'm not sure if there's an easier way.
The result looks like this:
The source window will highlight the next line of code that will be executed. On the left, the B+ marker indicates an enabled breakpoint, which will become helpful with multiple breakpoints. Whenever a register value changes, the register and its new value will be highlighted.
The up and down arrow keys scroll the source window.
Use C-x o to switch between the windows.
If you're familiar with Emacs, you'll recognize the keyboard shortcut. But as an Emacs user, you might prefer the GDB Emacs user interface:
This is M-x gdb with gdb-many-windows enabled:
Let's take a look at the loop that we know the bootloader is entering, but not leaving (neither read_protected_mode_kernel_2 nor run_kernel is ever called):
read_protected_mode_kernel:
mov edx, [es:0x1f4] ; edx stores the number of bytes to load
shl edx, 4
.loop:
cmp edx, 0
je run_kernel
cmp edx, 0xfe00 ; less than 127*512 bytes remaining?
jb read_protected_mode_kernel_2
mov eax, 0x7f ; load 127 sectors (maximum)
xor bx, bx ; no offset
mov cx, 0x2000 ; load temporary to 0x20000
mov esi, current_lba
call read_from_hdd
mov cx, 0x7f00 ; move 65024 bytes (127*512 byte)
call do_move
sub edx, 0xfe00 ; update the number of bytes to load
add word [gdt.dest], 0xfe00
adc byte [gdt.dest+2], 0
jmp short read_protected_mode_kernel.loop
The comments explain that the code loads chunks of FE00h == 65024 (127*512) bytes at a time.
Loading means calling read_from_hdd, then do_move. Let's take a look at do_move:
do_move:
push edx
push es
xor ax, ax
mov es, ax
mov ah, 0x87
mov si, gdt
int 0x15 ; line 182
jc error
pop es
pop edx
ret
int 0x15 is a call to the BIOS Service Interrupt, which will dispatch the call based on AH == 87H to the Move Memory Block (techhelpmanual.com) function.
This function moves the specified amount of memory (65024 bytes in our case) from source/destination addresses specified in a Global Descriptor Table (GDT) record.
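To make the GDB dumps below easier to read, here is a sketch of one 8-byte descriptor in that GDT, following the standard descriptor layout; the values shown match the first do_move call below, and the byte-level meaning is my reading of the 286-era documentation:

; one 8-byte descriptor as passed to the Move Memory Block function
dw 0xffff    ; bytes 0-1: segment limit (64 KB - 1)
dw 0x0000    ; bytes 2-3: base address, bits 0..15
db 0x10      ; byte 4:    base address, bits 16..23  -> base = 0x100000
db 0x93      ; byte 5:    access byte (present, writable data segment)
db 0x00      ; byte 6:    reserved on the 286
db 0x00      ; byte 7:    reserved on the 286; base bits 24..31 on the 386 and newer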
We can use GDB to show the addresses of each of do_move's memory move calls by telling it to stop at line 182 (the int 0x15 instruction) and print the GDT record's destination descriptor:
(gdb) break 182
Breakpoint 2 at 0x7d49: file bootloader.asm, line 176.
(gdb) command 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>x/8bx gdt+24
>end
(gdb) continue
Continuing.
Breakpoint 1, ?? () at bootloader.asm:48
42 cli
(gdb)
Continuing.
Breakpoint 2, do_move () at bootloader.asm:182
182 int 0x15
0x7d85: 0xff 0xff 0x00 0x00 0x10 0x93 0x00 0x00
(gdb)
Continuing.
Breakpoint 2, do_move () at bootloader.asm:182
182 int 0x15
0x7d85: 0xff 0xff 0x00 0xfe 0x10 0x93 0x00 0x00
(gdb)
The destination address is stored in bytes 2..4. Remember to read these little-endian entries "back to front": the first dump shows 0x00 0x00 0x10 in bytes 2..4, which reads as 0x100000.
Address #1 is 0x100000.
Address #2 is 0x10fe00.
If we press Return long enough, we eventually end up here:
Breakpoint 2, do_move () at bootloader.asm:182
182 int 0x15
0x7d85: 0xff 0xff 0x00 0x1e 0xff 0x93 0x00 0x00
(gdb)
Continuing.
Breakpoint 2, do_move () at bootloader.asm:182
182 int 0x15
0x7d85: 0xff 0xff 0x00 0x1c 0x00 0x93 0x00 0x00
(gdb)
Continuing.
Breakpoint 1, ?? () at bootloader.asm:48
42 cli
(gdb)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x000079b0 in ?? ()
(gdb)
Now that execution has left the bootloader, let's take a look at the parameters of the last do_move calls: we notice that the destination address overflowed its 24-bit data type. Adding 0xfe00 to 0xff1e00 yields 0x1001c00, which no longer fits into 24 bits and is truncated to 0x001c00:
0xff1e00
0x001c00
At this point I reached out to Sebastian again to ask him if there was an (undocumented) fundamental architectural limit to his Minimal Linux Bootloader – with 24-bit addresses, you can address at most 16 MB of memory.
He replied explaining that he didn't know of this limit either! He then linked to Move Memory Block (techhelpmanual.com) as proof of the 24-bit limit.
So, is it impossible to load larger kernels into memory from Real Mode? I'm not sure.
The current bootloader code prepares a GDT in which addresses are at most 24 bits long. But note that the techhelpmanual.com documentation that Sebastian referenced is apparently for the Intel 286 (a 16-bit CPU), and some of the GDT bytes are declared reserved.
Today's CPUs are Intel 386-compatible (a 32-bit CPU), which seems to use one of the formerly reserved bytes to represent bits 24..31 of the address, meaning we might be able to pass 32-bit addresses to BIOS functions in a GDT after all!
I wasn't able to find clear authoritative documentation on the Move Memory Block API on 386+, or whether BIOS functions in general are just expected to work with 32-bit addresses.
But Microsoft's 1989 HIMEM.SYS source contains a struct that documents this 32-bit descriptor usage. A more modern reference is this Operating Systems class from FAU 2023 (pages 71/72).
Hence I'm thinking that most BIOS implementations should actually support 32-bit addresses for their Move Memory Block implementation – provided you fill the descriptor accordingly.
If that doesn't work out, there's also "Unreal Mode", which allows addressing up to 4 GB from Real Mode, but switching to it is a much more invasive change. See also Julio Merino's "Beyond the 1 MB barrier in DOS" post to get an idea of the amount of code needed.
Lobsters reader abbeyj pointed out that the following code change should fix the truncation and result in a GDT with all address bits in the right place:
--- i/mbr/bootloader.asm
+++ w/mbr/bootloader.asm
@@ -119,6 +119,7 @@ read_protected_mode_kernel:
sub edx, 0xfe00 ; update the number of bytes to load
add word [gdt.dest], 0xfe00
adc byte [gdt.dest+2], 0
+ adc byte [gdt.dest+5], 0
jmp short read_protected_mode_kernel.loop
read_protected_mode_kernel_2:
…and indeed, in my first test this seems to fix the problem! It'll take me a little while to clean this up and submit it. You can follow gokrazy issue #248 if you're interested.
There are actually a couple of BIOS implementations that we can look into to get a better understanding of how Move Memory Block works.
We can look at DOSBox, an open-source DOS emulator. Its Move Memory Block implementation does seem to support 32-bit addresses:
PhysPt dest = (mem_readd(data+0x1A) & 0x00FFFFFF) +
(mem_readb(data+0x1E)<<24);
Another implementation is SeaBIOS. Contrary to DOSBox, SeaBIOS is not just used in emulation: The PC Engines apu uses coreboot with SeaBIOS. QEMU also uses SeaBIOS.
The SeaBIOS handle_1587 source code is a little harder to follow, because it requires knowledge of Real Mode assembly. The way I read it, SeaBIOS doesn't truncate or otherwise modify the descriptors and just passes them to the CPU. On a 386 or newer, 32-bit addresses should work.
While it's great to understand the limitation we're running into, I wanted to unblock the pull request as quickly as possible, so I needed a quick mitigation instead of investigating whether my speculation could be developed into a proper fix.
When I started router7, we didn't support loadable kernel modules, so everything had to be compiled into the kernel. We now do support loadable kernel modules, so I could have moved functionality into modules.
Instead, I found an even easier quick fix: switching the kernel compression from gzip to zstd. This saved about 1.8 MB and buys us some time to implement a proper fix while unblocking automated merges of new Linux kernel versions.
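For reference, the mitigation corresponds to the CONFIG_KERNEL_ZSTD option that the sed command near the top of this article comments out to reproduce the problem; a sketch of the relevant kernel config line:

# kernel compression: zstd instead of gzip
CONFIG_KERNEL_ZSTD=y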
I wanted to share this debugging story because it shows a couple of interesting lessons:
Being able to run older versions of various parts of your software stack is a very valuable debugging tool. It helped us isolate a trigger for the bug (using an older GCC) and it helped us set up a debugging environment (using an older QEMU).
Setting up a debugger can be annoying (symbol files, learning the UI), but it's so worth it.
Be on the lookout for wrong turns during debugging. Write down every conclusion and challenge it.
The BIOS can seem mysterious and "too low level", but there are many blog posts, lectures and tutorials. You can also just read open-source BIOS code to understand it much better.
Enjoy poking at your BIOS!
I found the following resources helpful:
On servers, this isn't what I want – in general it's helpful for automated recovery if daemons are restarted indefinitely. As long as you don't have circular dependencies between services, all your services will eventually come up after transient failures, without having to specify dependencies.
This is particularly useful because specifying dependencies on the systemd level introduces footguns: when interactively stopping individual services, systemd also stops the dependents. And then you need to remember to restart the dependent services later, which is easy to forget.
To make systemd restart a service indefinitely, I first like to create a drop-in config file like so:
cat > /etc/systemd/system/restart-drop-in.conf <<'EOT'
[Unit]
StartLimitIntervalSec=0
[Service]
Restart=always
RestartSec=1s
EOT
Then, I can enable the restart behavior for individual services like prometheus-node-exporter, without having to modify their .service files (which needs manual effort when updating):
cd /etc/systemd/system
mkdir prometheus-node-exporter.service.d
cd prometheus-node-exporter.service.d
ln -s ../restart-drop-in.conf
systemctl daemon-reload
If most of your services set Restart=always or Restart=on-failure, you can change the system-wide defaults for RestartSec and StartLimitIntervalSec like so:
mkdir /etc/systemd/system.conf.d
cat > /etc/systemd/system.conf.d/restartdefaults.conf <<'EOT'
[Manager]
DefaultRestartSec=1s
DefaultStartLimitIntervalSec=0
EOT
systemctl daemon-reload
So why do we need to change these settings to begin with?
The default systemd settings (as of systemd 255) are:
DefaultRestartSec=100ms
DefaultStartLimitIntervalSec=10s
DefaultStartLimitBurst=5
This means that services which specify Restart=always are restarted 100ms after they crash, and if the service crashes more than 5 times in 10 seconds, systemd does not attempt to restart the service anymore.
It's easy to see that for a service which takes, say, 100ms to crash, for example because it can't bind on its listening IP address, this means:
time | event |
---|---|
T+0 | first start |
T+100ms | first crash |
T+200ms | second start |
T+300ms | second crash |
T+400ms | third start |
T+500ms | third crash |
T+600ms | fourth start |
T+700ms | fourth crash |
T+800ms | fifth start |
T+900ms | fifth crash within 10s |
T+1s | systemd gives up |
I'm not sure. If I had to speculate, I would guess the developers wanted to prevent laptops running out of battery too quickly because one CPU core is permanently busy just restarting some service that's crashing in a tight loop.
That same goal could be achieved with a more relaxed DefaultRestartSec= value, though: with DefaultRestartSec=5s, for example, we would sufficiently space out these crashes over time.
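For illustration, such a relaxed default would be a one-line variation of the system-wide drop-in shown earlier:

# /etc/systemd/system.conf.d/restartdefaults.conf
[Manager]
DefaultRestartSec=5s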
There is some recent discussion upstream regarding changing the default. Let's see where the discussion goes.
In this article, I describe my goals, which hardware I picked for my new build (and why), and how I set it up.
I use my network storage devices primarily for archival (daily backups), and secondarily as a media server.
There are days when I don't consume any media (TV series and movies) from my NAS, because I have my music collection mirrored to another server that's running 24/7 anyway. In total, my NAS runs for a few hours on some evenings, and for about an hour (daily backups) in the mornings.
This usage pattern is distinctly different than, for example, running a NAS as a file server for collaborative video editing that needs to be available 24/7.
The goals of my NAS setup are:
--delete flag.
In this specific build, I am trying out ZFS. Because I have two NAS builds running, it is easy to change one variable of the system (which file system to use) in one build, without affecting the other build.
My main motivation for using ZFS instead of ext4 is that ZFS does data checksumming, whereas ext4 only checksums metadata and the journal, but not data at rest. With large enough datasets, the chance of bit flips increases significantly, and I would prefer to know about them so that I can restore the affected files from another copy.
Each of the two storage builds has (almost) the same components. This makes it easy to diagnose one with the help of the other. When needed, I can swap out components of the second build to temporarily repair the first one, or vice versa.
Price | Type | Article | Remark |
---|---|---|---|
114 CHF | mainboard | AsRock B450 Gaming ITX/ac | Mini ITX |
80 CHF | cpu | AMD Athlon 3000G | 35W TDP, GPU |
65 CHF | cpu cooler | Noctua NH-L12S | silent! |
58 CHF | power supply | Silverstone ST30SF 300W SFX | SFX form factor |
51 CHF | case | Silverstone SST-SG05BB-Lite | Mini ITX |
48 CHF | system disk | WD Red SN700 250GB | M.2 NVMe |
32 CHF | case fan | Noctua NF-S12A ULN | silent 120mm |
28 CHF | ram | 8 GB DDR4 Value RAM (F4-2400C15-8GNT) |
The total price of 476 CHF makes this not a cheap build.
But, I think each component is well worth its price. Here's my thinking regarding the components:
As a disclaimer: the two builds I use are very similar to the component list above, with the following differences:
I didn't describe the exact builds I use because a component list is more useful if the components on it are actually available :-).
It used to be that Solid State Drives (SSDs) were just way too expensive compared to spinning hard disks when talking about terabyte sizes, so I used to put the largest single disk drive I could find into each NAS build: I started with 8 TB disks, then upgraded to 16 TB disks later.
Luckily, the price of flash storage has come down quite a bit: the Samsung SSD 870 QVO (8 TB) costs "only" 42 CHF per TB. For a total of 658 CHF, I can get 16 TB of flash storage in 2 drives:
Of course, spinning hard disks are at 16 CHF per TB, so going all-flash is over 3x as expensive.
I decided to pay the premium to get a number of benefits:
The choice of CPU, Mainboard and Network Card all influence the total power usage of the system. Here are a couple of measurements to give you a rough idea of the power usage:
build | CPU | main board | network card | idle | load |
---|---|---|---|---|---|
s2 | 5600X | B450 | 10G: Mellanox ConnectX-3 | 26W | 60W |
s3 | 200GE | AB350 | 10G: FS Intel 82599 | 28W | 50W |
s3 | 200GE | AB350 | 1G onboard | 23W | 40W |
These values were measured using a myStrom WiFi Switch.
Before this build, I ran my NAS using Docker containers on CoreOS (later renamed to Container Linux), which was a light-weight Linux distribution focused on containers. There are two parts about CoreOS that I liked most.
The most important part was that CoreOS updated automatically, using an A/B updating scheme, just like I do in gokrazy. I want to run as many of my devices as possible with A/B updates.
The other bit I like is that the configuration is very clearly separated from the OS. I managed the configuration (a cloud-init YAML file) on my main PC, so when swapping out the NAS system disk with a blank disk, I could just plug my config file into the CoreOS installer, and be done.
When CoreOS was bought by Red Hat and merged into Project Atomic, there wasn't a good migration path and cloud-init wasn't supported anymore. As a short-term solution, I switched from CoreOS to Flatcar Linux, a spiritual successor.
For this build, I wanted to try out ZFS. I always got the impression that ZFS was a pain to run because its kernel modules are not included in the upstream Linux kernel source.
Then, in 2016, Ubuntu decided to include ZFS by default. There are a couple of other Linux distributions on which ZFS seems easy enough to run, like Gentoo, Arch Linux or NixOS.
I wanted to spend my "innovation tokens" on ZFS, and keep the rest boring and similar to what I already know and work with, so I chose Ubuntu Server over NixOS. It's similar enough to Debian that I don't need to re-learn.
Luckily, the migration path from Flatcar's cloud-init config to Ubuntu Server is really easy: just copy over parts of the cloud-config until you're through the entire thing. It's like a checklist!
In the future, it might be interesting to build a NAS setup using gokrazy. In particular since we now can run Docker containers on gokrazy, which makes running Samba or Jellyfin quite easy!
Using gokrazy instead of Ubuntu Server would get rid of a lot of moving parts. The current blocker is that ZFS is not available on gokrazy. Unfortunately, that's not easy to change, in particular from a licensing perspective.
I changed the following UEFI settings:
Advanced → ACPI Configuration → PCIE Devices Power On: Enabled
Advanced → Onboard Devices Configuration → Restore on AC/Power Loss: Power On
I like to configure static IP addresses for devices that are a permanent part of my network.
I have come to prefer configuring static addresses as static DHCP leases in my router, because then the address remains the same no matter which operating system I boot – whether it's the installed one, or a live USB stick for debugging.
Download Ubuntu Server from https://ubuntu.com/download/server
Disable swap:
swapoff -a
$EDITOR /etc/fstab
# delete the swap line
Automatically load the corresponding sensors kernel module for the mainboard so that the Prometheus node exporter picks up temperature values and fan speed values:
echo nct6775 | sudo tee /etc/modules
Enable unattended upgrades:
dpkg-reconfigure -plow unattended-upgrades
Edit /etc/apt/apt.conf.d/50unattended-upgrades – I like to make the following changes:
Unattended-Upgrade::MinimalSteps "true";
Unattended-Upgrade::Mail "michael@example.net";
Unattended-Upgrade::MailReport "only-on-error";
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "08:00";
Unattended-Upgrade::SyslogEnable "true";
I have come to like Tailscale. It's a mesh VPN (data flows directly between the machines) that allows me access to and from my PCs, servers and storage machines from anywhere.
Specifically, I followed the install Tailscale on Ubuntu 22.04 guide.
For monitoring, I have an existing Prometheus setup. To add a new machine to my setup, I need to configure it as a new target on my Prometheus server. In addition, I need to set up Prometheus on the new machine.
First, I installed the Prometheus node exporter using apt install prometheus-node-exporter.
Then, I modified /etc/default/prometheus-node-exporter to only listen on the Tailscale IP address:
ARGS="--web.listen-address=100.85.3.16:9100"
Lastly, I added a systemd override to ensure the node exporter keeps trying to start until Tailscale is up: the command systemctl edit prometheus-node-exporter opens an editor, and I configured the override like so:
# /etc/systemd/system/prometheus-node-exporter.service.d/override.conf
[Unit]
# Allow infinite restarts, even within a short time.
StartLimitIntervalSec=0
[Service]
RestartSec=1
Similar to the static IPv4 address, I like to give my NAS a static IPv6 address as well. This way, I don't need to reconfigure remote systems when I (sometimes temporarily) switch my NAS to a different network card with a different MAC address. Of course, this point becomes moot if I ever switch all my backups to Tailscale.
Ubuntu Server comes with Netplan by default, but I don't know Netplan and don't want to use it.
To switch to systemd-networkd, I ran:
apt remove --purge netplan.io
Then, I created a systemd-networkd config file with a static IPv6 token, resulting in a predictable IPv6 address:
$EDITOR /etc/systemd/network/enp.network
My config file looks like this:
[Match]
Name=enp*
[Network]
DHCP=yes
IPv6Token=0:0:0:0:10::253
IPv6AcceptRouterAdvertisements=yes
An easy way to configure Linux's netfilter firewall is to apt install iptables-persistent. That package takes care of saving firewall rules on shutdown and restoring them on the next system boot.
My rule setup is very simple: allow ICMP (IPv6 needs it), then set up ACCEPT rules for the traffic I expect, and DROP the rest.
Here's my resulting /etc/iptables/rules.v6 from such a setup:
/etc/iptables/rules.v6
# Generated by ip6tables-save v1.4.14 on Fri Aug 26 19:57:51 2016
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p ipv6-icmp -m comment --comment "IPv6 needs ICMPv6 to work" -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -m comment --comment "Allow packets for outgoing connections" -j ACCEPT
-A INPUT -s fe80::/10 -d fe80::/10 -m comment --comment "Allow link-local traffic" -j ACCEPT
-A INPUT -s 2001:db8::/64 -m comment --comment "local traffic" -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m comment --comment "SSH" -j ACCEPT
COMMIT
# Completed on Fri Aug 26 19:57:51 2016
Before you can use ZFS, you need to install the ZFS tools using apt install zfsutils-linux.
Then, we create a zpool that spans both SSDs:
zpool create \
-o ashift=12 \
srv \
/dev/disk/by-id/ata-Samsung_SSD_870_QVO_8TB_S5SSNF0TC06121Z \
/dev/disk/by-id/ata-Samsung_SSD_870_QVO_8TB_S5SSNF0TC06787P
The -o ashift=12 ensures proper alignment on disks with a sector size of either 512B or 4KB (2^12 = 4096 bytes).
On that zpool, we now create our datasets:
(echo -n on-device-secret && \
wget -qO - https://autounlock.zekjur.net:8443/nascrypto) | zfs create \
-o encryption=on \
-o compression=off \
-o atime=off \
-o keyformat=passphrase \
-o keylocation=file:///dev/stdin \
srv/data
The key I'm piping into zfs create is constructed from two halves: the on-device secret and the remote secret. This is a setup I'm using to implement an automated crypto unlock that is remotely revokable. See the next section for the corresponding unlock.service.
I repeated this same command (adjusting the dataset name) for each dataset: I currently have one for data and one for backup, just so that the used disk space of each major use case is separately visible:
df -h /srv /srv/backup /srv/data
Filesystem Size Used Avail Use% Mounted on
srv 4,2T 128K 4,2T 1% /srv
srv/backup 8,1T 3,9T 4,2T 49% /srv/backup
srv/data 11T 6,4T 4,2T 61% /srv/data
To detect errors on your disks, ZFS has a feature called "scrubbing". I don't think I need to scrub more often than monthly, but maybe your scrubbing requirements are different.
I enabled monthly scrubbing on my zpool srv:
systemctl enable --now zfs-scrub-monthly@srv.timer
On this machine, a scrub takes a little over 4 hours and keeps the disks busy:
scan: scrub in progress since Wed Oct 11 16:32:05 2023
808G scanned at 909M/s, 735G issued at 827M/s, 10.2T total
0B repaired, 7.01% done, 03:21:02 to go
We can confirm by looking at the Prometheus Node Exporter metrics:
The other maintenance-related setting I changed is to enable automated TRIM:
zpool set autotrim=on srv
To automatically unlock the encrypted datasets at boot, I'm using a custom unlock.service systemd service file.
My unlock.service constructs the crypto key from two halves: the on-device secret and the remote secret that's downloaded over HTTPS.
This way, my NAS can boot up automatically, but in an emergency I can remotely stop this mechanism.
[Unit]
Description=unlock hard drive
Wants=network.target
After=systemd-networkd-wait-online.service
Before=samba.service
[Service]
Type=oneshot
RemainAfterExit=yes
# Wait until the host is actually reachable.
ExecStart=/bin/sh -c "c=0; while [ $c -lt 5 ]; do /bin/ping6 -n -c 1 autounlock.zekjur.net && break; c=$((c+1)); sleep 1; done"
ExecStart=/bin/sh -c "(echo -n secret && wget --retry-connrefused -qO - https://autounlock.zekjur.net:8443/nascrypto) | zfs load-key srv/data"
ExecStart=/bin/sh -c "(echo -n secret && wget --retry-connrefused -qO - https://autounlock.zekjur.net:8443/nascrypto) | zfs load-key srv/backup"
ExecStart=/bin/sh -c "zfs mount srv/data"
ExecStart=/bin/sh -c "zfs mount srv/backup"
[Install]
WantedBy=multi-user.target
For the last 10 years, I have been doing my backups using rsync.
Each machine pushes an incremental backup of its entire root file system (and any mounted file systems that should be backed up, too) to the backup destination (storage2/3).
All the machines I'm backing up run Linux and the ext4 file system. I verified that my backup destination file systems support all the features of the backup source file system that I care about, i.e. extended attributes and POSIX ACLs.
The scheduling of backups is done by "dornröschen", a Go program that wakes up the backup sources and destination machines and starts the backup by triggering a command via SSH.
The backup scheduler establishes an SSH connection to the backup source.
On the backup source, I authorized the scheduler like so, meaning it will run /root/backup.pl when connecting:
command="/root/backup.pl",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAAC3Nzainvalidkey backup-scheduler
backup.pl runs rsync, which establishes another SSH connection, this time from the backup source to the backup destination.
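As a rough sketch (the flags, source paths and destination are assumptions for illustration, not my actual backup.pl), such an rsync invocation looks something like this:

rsync --archive --acls --xattrs --numeric-ids --delete \
  / root@storage2.zekjur.net:.
# rrsync on the destination maps "." to /srv/backup/server.zekjur.net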
On the backup destination (storage2/3), I authorize the backup source's SSH public key to run rrsync(1), a script that only permits running rsync in the specified directory:
command="/usr/bin/rrsync /srv/backup/server.zekjur.net",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAAC3Nzainvalidkey server.zekjur.net
I found it easiest to signal readiness by starting an empty HTTP server gated on After=unlock.service in systemd:
/etc/systemd/system/healthz.service
[Unit]
Description=nginx for /srv health check
Wants=network.target
After=unlock.service
Requires=unlock.service
StartLimitInterval=0
[Service]
Restart=always
# https://itectec.com/unixlinux/restarting-systemd-service-on-dependency-failure/
ExecStartPre=/bin/sh -c 'systemctl is-active docker.service'
# Stay on the same major version in the hope that nginx never decides to break
# the config file syntax (or features) without doing a major version bump.
ExecStartPre=/usr/bin/docker pull nginx:1
ExecStartPre=-/usr/bin/docker kill nginx-healthz
ExecStartPre=-/usr/bin/docker rm -f nginx-healthz
ExecStart=/usr/bin/docker run \
--name nginx-healthz \
--publish 10.0.0.253:8200:80 \
--log-driver=journald \
nginx:1
[Install]
WantedBy=multi-user.target
My wake program then polls that port and returns once the server is up, i.e. the file system has been unlocked and mounted.
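A minimal sketch of that polling logic (not the actual wake source; the address matches the --publish flag above, and the timeout is an assumption) could look like this in Go:

package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForHealthz polls the nginx-healthz endpoint until it responds,
// i.e. until unlock.service has unlocked and mounted the ZFS datasets.
func waitForHealthz(url string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(url)
		if err == nil {
			resp.Body.Close()
			return nil // storage is up and unlocked
		}
		time.Sleep(1 * time.Second)
	}
	return fmt.Errorf("%s did not become reachable within %v", url, timeout)
}

func main() {
	if err := waitForHealthz("http://10.0.0.253:8200/", 5*time.Minute); err != nil {
		panic(err)
	}
	fmt.Println("storage is ready, starting backup")
}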
Instead of explicitly triggering a shutdown from the scheduler program, I run "dramaqueen", which shuts down the machine after 10 minutes, but will be inhibited while a backup is running. Optionally, shutting down can be inhibited while there are active Samba sessions.
/etc/systemd/system/dramaqueen.service
[Unit]
Description=dramaqueen
After=docker.service
Requires=docker.service
[Service]
Restart=always
StartLimitInterval=0
# Always pull the latest version (bleeding edge).
ExecStartPre=-/usr/bin/docker pull stapelberg/dramaqueen
ExecStartPre=-/usr/bin/docker rm -f dramaqueen
ExecStartPre=/usr/bin/docker create --name dramaqueen stapelberg/dramaqueen
ExecStartPre=/usr/bin/docker cp dramaqueen:/usr/bin/dramaqueen /tmp/
ExecStartPre=/usr/bin/docker rm -f dramaqueen
ExecStart=/tmp/dramaqueen -net_command=
[Install]
WantedBy=multi-user.target
Luckily, the network driver of the onboard network card supports WOL by default. If that's not the case for your network card, see the Arch wiki Wake-on-LAN article.
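In a nutshell, checking and enabling Wake-on-LAN for a typical ethernet interface looks like this (the interface name is an assumption; see the Arch wiki article for making the setting persistent):

ethtool enp4s0 | grep Wake-on     # "Wake-on: g" means wake via magic packet is enabled
ethtool -s enp4s0 wol g           # enable Wake-on-LAN via magic packet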
I have been running a PC-based few-large-disk Network Storage setup for years at this point, and I am very happy with all the properties of the system. I expect to run a very similar setup for years to come.
The low-tech approach of using rsync for backups has worked well – without changes – for years, and I don't see rsync going away anytime soon.
The upgrade to all-flash is really nice in terms of random access time (for incremental backups) and to eliminate one of the largest sources of noise from my builds.
ZFS seems to work fine so far and is well-integrated into Ubuntu Server.
There are solutions for almost everyone's NAS needs. This build obviously hits my personal sweet spot, but your needs and preferences might be different!
Here are a couple of related solutions:
My current monitor is a Dell 32-inch 8K monitor (UP3218K), which has a brilliant picture, but a few annoying connectivity limitations and quirks – it needs two (!) DisplayPort cables on a GPU with MST support, meaning that in practice, it only works with nVidia graphics cards.
I was curious to try out the new 6K monitor to see if it would improve the following points:
I read a review on heise+ (also included in their c't magazine), but the review can't answer these subjective questions of mine.
So I ordered one and tried it out!
The native resolution of this monitor is 6144x3456 pixels.
To drive that resolution at 60 Hz, about 34 Gbps of data rate is needed.
DisplayPort 1.4a only offers a data rate of 25 Gbps, so your hardware and driver need to support Display Stream Compression (DSC) to reach the full resolution at 60 Hz. I tried using DisplayPort 2.0, which supports 77 Gbps of data rate, but the only GPU I have that supports DisplayPort 2 is the Intel A380, which I could not get to work well with this monitor (see the next section).
HDMI 2.1 offers 42 Gbps of data rate, but in my setup, the link would still always use DSC.
Here are the combinations I have successfully tried:
Device | Cable | OS / Driver | Resolution |
---|---|---|---|
MacBook Air M1 | TB 3 | macOS 13.4.1 | native @ 60 Hz, 8.1Gbps |
GeForce RTX 4070 (DisplayPort 1.4a) | mDP-DP | Windows 11 21H2 | native @ 60 Hz, 12Gbps DSC |
GeForce RTX 4070 | mDP-DP | Linux 6.3 nVidia 535.54.03 | native @ 60 Hz, 8.1Gbps DSC |
GeForce RTX 4070 (HDMI 2.1a) | HDMI | Windows 11 21H2 | native @ 60 Hz, 8.1Gbps DSC |
GeForce RTX 4070 | HDMI | Linux 6.3 nVidia 535.54.03 | native @ 60 Hz, 6Gbps 3CH DSC |
GeForce RTX 3060 | HDMI | Linux 6.3 nVidia 535.54.03 | native @ 60 Hz, 6Gbps 3CH DSC |
ThinkPad X1 Extreme | TB 4 | Linux 6.3 nVidia 535.54.03 | native @ 60 Hz, 8.1Gbps DSC |
The MacBook Air is the only device in my test that reaches full resolution without using DSC.
Let's talk about the combinations that did not work well.
You need a quite recent version of the nVidia driver, as they just recently shipped support for DSC at high resolutions. I successfully used DSC with 535.54.03.
With the "older" 530.41.03, I could only select 6016x3384 at 60 Hz, which is not the native resolution of 6144x3456 at 60 Hz.
Device | Cable | OS / Driver | Resolution |
---|---|---|---|
GeForce RTX 4070 (DisplayPort 1.4a) | mDP-DP | Linux 6.3 nVidia 530.41.03 | native @ 30 Hz only, 6016x3384@60 |
GeForce RTX 4070 (HDMI 2.1a) | HDMI | Linux 6.3 nVidia 530.41.03 | native @ 30 Hz only, 6016x3384@60 |
I was so excited when Intel announced that they are entering the graphics card business. With all the experience and driver support for their integrated graphics, I hoped for good Linux support.
Unfortunately, the Intel A380 I bought months ago continues to disappoint.
I could not get the 6K monitor to work at any resolution higher than 4K, not even under Windows. Worse, when connecting the monitor using DisplayPort, I wouldn't get a picture at all (in Linux)!
Device | Cable | OS / Driver | Resolution |
---|---|---|---|
ASRock Intel A380 (DisplayPort 2.0) | mDP-DP | Windows 11 21H2 Intel 31.0.101.4502 | only 4K @ 60 Hz |
ASRock Intel A380 (HDMI 2.0b) | HDMI | Windows 11 21H2 Intel 31.0.101.4502 | only 4K @ 60 Hz |
ASRock Intel A380 (DisplayPort 2.0) | mDP-DP | Linux 6.4 | no picture in Xorg! |
ASRock Intel A380 (HDMI 2.0b) | HDMI | Linux 6.4 | only 4K @ 60 Hz |
I suspend my PC to RAM at least once per day, sometimes even more often.
With my current 8K monitor, I have nailed the suspend/wakeup procedure. With the help of a smart plug, I'm automatically turning the monitor off (on suspend) and on (on wakeup). After a couple of seconds of delay, I configure the correct resolution using xrandr.
I had hoped that the 6K monitor would make any sort of intricate automation superfluous.
Unfortunately, when I resumed my PC, I noticed that the monitor would not show a picture at all! I had to log in from my laptop via SSH to change the resolution with xrandr to 4K, then power the monitor off and on again, then change the resolution back to the native 6K.
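For the record, that workaround amounts to something like the following, run over SSH from the laptop (the output name and modes are assumptions about my setup):

xrandr --output DP-2 --mode 3840x2160   # drop to 4K so the monitor syncs again
# power-cycle the monitor, e.g. via the smart plug, then:
xrandr --output DP-2 --mode 6144x3456   # back to the native 6K resolution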
Once you have a physical connection established, how do you configure your computer? With 6K at 32 inches, you'll need to enable some kind of scaling in order to comfortably read text.
This section shows what options Linux and macOS offer.
Just like many other programs on Linux, you configure i3's scaling by setting
the Xft.dpi
X
resource. The default is 96
dpi, so to get 200% scaling, set Xft.dpi: 192
.
Personally, I found 240% scaling more comfortable, i.e. Xft.dpi: 230
.
This corresponds to a logical resolution of 2560x1440 pixels.
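For example, assuming your X resources live in ~/.Xresources, applying that setting looks like this:

echo 'Xft.dpi: 230' >> ~/.Xresources   # 240% scaling
xrdb -merge ~/.Xresources              # reload; restart i3/applications to pick it up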
I figured I'd also give Wayland a shot, so I ran GNOME in Fedora 38 on my ThinkPad X1 Extreme.
Here's what the settings app shows in its "Displays" tab:
I tried enabling fractional scaling, but then GNOME froze until I disconnected the Dell monitor.
When connecting the monitor to my MacBook Air M1 (2020), it defaults to a logical resolution of 3072x1728, i.e. 200% scaling.
For comparison, with Apple's (5K) Studio Display, the default setting is 2560x1440 (200% scaling), or 2880x1620 ("More Space", 177% scaling).
I remember the uproar when Lenovo introduced ThinkPads with glossy screens. At the time, I thought I preferred matte screens, but over the years, I heard that glossy screens are getting better and better, and consumers typically prefer them for their better picture quality.
The 8K monitor I'm using has a glossy screen on which reflections are quite visible. The MacBook Air's screen shows fewer reflections in comparison.
Dell's 6K monitor offers me a nice opportunity to see which option I prefer.
Surprisingly, I found that I don't like the matte screen better!
It's hard to describe, but somehow the picture seems more "dull", or less bright (independent of the actual brightness of the monitor), or more toned down. The colors don't pop as much.
One thing that I did not anticipate beforehand is the difference in how peripherals are treated when they are built into the monitor vs. when they are plugged into a USB hub.
I like to have my peripherals off-by-default, with "on" being the exceptional state. In fact, I leave my microphone disconnected and only plug its USB cable in when I need it. I also recently realized that I want sound to only be played on headphones, so I disconnected my normal speakers in favor of my Bluetooth dongle.
The 6K monitor, on the other hand, has all of its peripherals on-by-default, and bright red LEDs light up when the speaker or microphone is muted.
This is the opposite of how I want my peripherals to behave, but of course I understand why Dell developed the monitor with on-by-default peripherals.
Let's go back to the questions I started the article with and answer them one by one:
Does the 6K monitor work well with most (all?) of my PCs and laptops?
→ Answer: The 6K monitor works a lot better than the 8K monitor, but that's a low bar to clear. I would still call the 6K monitor finicky. Even when you run a latest-gen GPU with latest drivers, the monitor does not reliably show a picture after a suspend/resume cycle.
Is 6K resolution enough, or would I miss the 8K resolution?
→ Answer: I had really hoped that 6K would turn out to be enough, but the difference to 8K is visible to the naked eye. Just like 200% scaling is a nice step up from working at 96 dpi, 300% scaling (what I use on 8K) is another noticeable step up.
Is a matte screen the better option compared to the 8K monitor's glossy finish?
→ Answer: While I don't like the reflections in Dell's 8K monitor, the picture quality is undeniably better compared to a matte screen. The 6K monitor just doesn't look as good, and it's not just about the difference in text sharpness.
Do the built-in peripherals work with Linux out of the box?
→ Answer: Yes, as far as I can tell. The webcam works fine with the generic uvcvideo USB webcam driver, and the microphone and speakers work out of the box. I have not tested the presence sensor.
So, would I recommend the monitor? Depends on what you're using as your current monitor and as the device you want to connect!
If you're coming from a 4K display, the 6K resolution will be a nice step up. Connecting a MacBook Air M1 or newer is a great experience. If you want to connect PCs, be sure to use a new-enough nVidia GPU with the latest drivers. Even under these ideal conditions, you might run into quirks like the no-picture-after-resume problem. If you don't mind early-adopter pains like that, and are looking for a monitor that includes peripherals, go for it!
For me, switching from my 8K monitor would be a downgrade without enough benefits.
The ideal monitor for me would be a mixture between Dell's 8K and 6K models:
Maybe they'll develop an updated version of the 8K monitor at some point?
I'm excited to let you know that gokrazy now comes with a re-designed gok command-line tool and gokrazy instance configuration mechanism!
The traditional way to run Go software on a Raspberry Pi would be to install Raspbian or some other Linux distribution onto the SD card, copy over your program(s) and then maintain that installation (do regular updates).
I thought it would be nicer to run my Raspberry Pis such that only Go software is run by the Linux kernel on it, without any traditional Linux distribution programs like package managers or even the usual GNU Core Utilities.
gokrazy builds Go programs into a read-only SquashFS root file system image. When that image is started on a Raspberry Pi, a minimal init system supervises the Go programs, and a DHCP and NTP client configure the IP address and synchronize the time, respectively. After the first installation, all subsequent updates can be done over the network, with an A/B partitioning scheme.
I use gokrazy to, for example:
Connect to the internet using router7, my small home internet router written in Go, running on a fast router PC build that handles a 25 Gbit/s Fiber To The Home connection.
Automate the lights in my home, and control and monitor the heating.
Offer Tailscale access to a Raspberry Pi Zero 2 W in my home network to then send Wake On Lan (WOL) packets before SSH'ing into my normally-suspended computers. See also my post DIY out-of-band management: remote console server (2022).
Previously, the concept of gokrazy instance configuration was only a
convention. Each gokrazy build was created using the gokr-packer
CLI tool, and
configured by the packer's command-line flags, parameters, config files in
~/.config
and per-package config files in the current directory
(e.g. flags/github.com/gokrazy/breakglass/flags.txt
).
Now, all gokrazy commands and tools understand the --instance
flag (or -i
for short), which determines the directory from which the Instance
Config is read. For a gokrazy
instance named "hello", the default directory is ~/gokrazy/hello
, which
contains the config.json
file.
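A freshly created config.json is small; a minimal example looks roughly like this (package list shortened, and the exact fields may differ between gokrazy versions):

{
    "Hostname": "hello",
    "Packages": [
        "github.com/gokrazy/fbstatus",
        "github.com/gokrazy/hello",
        "github.com/gokrazy/serial-busybox",
        "github.com/gokrazy/breakglass"
    ]
}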
Let's say I have the evcc repository cloned as ~/src/evcc. evcc is an electric vehicle charge controller with PV integration, written in Go.
Now I want to run evcc on my Raspberry Pi using gokrazy. First, I create a new instance:
% gok -i evcc new
gokrazy instance configuration created in /home/michael/gokrazy/evcc/config.json
(Use 'gok -i evcc edit' to edit the configuration interactively.)
Use 'gok -i evcc add' to add packages to this instance
To deploy this gokrazy instance, see 'gok help overwrite'
Now let's add our working copy of evcc to the instance:
% gok -i evcc add .
2023/01/15 18:55:39 Adding the following package to gokrazy instance "evcc":
Go package : github.com/evcc-io/evcc
in Go module: github.com/evcc-io/evcc
in local dir: /tmp/evcc
2023/01/15 18:55:39 Creating gokrazy builddir for package github.com/evcc-io/evcc
2023/01/15 18:55:39 Creating go.mod with replace directive
go: creating new go.mod: module gokrazy/build/github.com/evcc-io/evcc
2023/01/15 18:55:39 Adding package to gokrazy config
2023/01/15 18:55:39 All done! Next, use 'gok overwrite' (first deployment), 'gok update' (following deployments) or 'gok run' (run on running instance temporarily)
We might want to monitor this Raspberry Pi's stats later, so let's add the Prometheus node exporter to our gokrazy instance, too:
% gok -i evcc add github.com/prometheus/node_exporter
2023/01/15 19:04:05 Adding github.com/prometheus/node_exporter as a (non-local) package to gokrazy instance evcc
2023/01/15 19:04:05 Creating gokrazy builddir for package github.com/prometheus/node_exporter
2023/01/15 19:04:05 Creating go.mod before calling go get
go: creating new go.mod: module gokrazy/build/github.com/prometheus/node_exporter
2023/01/15 19:04:05 running [go get github.com/prometheus/node_exporter@latest]
go: downloading github.com/prometheus/node_exporter v1.5.0
[âŠ]
2023/01/15 19:04:07 Adding package to gokrazy config
It's time to insert an SD card (/dev/sdx), which we will overwrite with a gokrazy build:
% gok -i evcc overwrite --full /dev/sdx
See gokrazy quickstart for more detailed instructions.
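After that first write to the SD card, subsequent deployments can go over the network, as hinted at in the gok add output above:
% gok -i evcc update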
The new gok subcommands (add, update, etc.) are much easier to manage than long gokr-packer command lines.
The new Automation page shows how
to automate common tasks, be it daily updates via cron
, or automated building
in Continuous Integration environments like GitHub Actions.
Are you already a gokrazy user? If so, see the Instance Config Migration
Guide for how to switch from
the old gokr-packer
tool to the new gok
command.
If you have any questions, please feel free to reach out at gokrazy GitHub Discussions!
While a commercial solution like IPMI offers many more features like remote serial or remote image mounting, this DIY solution feels really magical, and has great price performance if all you need is power management.
To save power, I want to shut down my network storage PC when it isn't currently needed.
For this plan to work out, my daily backup automation needs to be able to turn on the network storage PC, and power it back off when done.
Usually, I implement that via Wake On LAN (WOL). But for this particular machine, I don't have an ethernet network link, only a fiber link. Unfortunately, it seems like none of the 3 different 10 Gbit/s network cards I tested has functioning Wake On LAN, and when I asked on Twitter, none of my followers had ever seen functioning WOL on any 10 Gbit/s card. I suppose it's not a priority for the typical target audience of these network cards, which go into always-on servers.
I didn't want to run an extra 10 Gbit/s switch just for WOL over an ethernet connection, because switches like the MikroTik CRS305-1G-4S+IN consume at least 10W. As the network storage PC only consumes about 20W overall, I wanted a more power-efficient option.
The core of this DIY remote power button is a WiFi-enabled micro controller such as the ESP32. To power the micro controller, I use the 5V standby power on the mainboard's USB 2.0 pin headers, which is also available when the PC is turned off and only the power supply (PSU) is turned on. A micro controller with an on-board 5V voltage regulator is convenient for this.
Aside from the micro controller, we also need a transistor or logic-level MOSFET to simulate a push of the power button, and a resistor to control the transistor. An opto coupler is not needed, since the ESP32 is powered from the mainboard, not from a separate power supply.
The mainboard's front panel header contains a POWERBTN# signal (3.3V), and a GND signal. When connecting a typical PC case power button to the header, you don't need to pay attention to the polarity. This is because the power button just physically connects the two signals.
In our case, the polarity matters, because we need the 3.3V on the transistor's drain pin, otherwise we won't be able to control the transistor via its base pin. The POWERBTN# 3.3V signal is typically labeled + on the mainboard (or in the manual), whereas GND is labeled -. If you are unsure, double-check the voltage using a multimeter.
I wanted a quick solution (ideally with no custom firmware development) and was already familiar with ESPHome, which turned out to make the functionality I wanted very easy to implement :)
In addition to a standard ESPHome configuration, I have added the following lines to make the GPIO pin available through MQTT, and make it a momentary switch instead of a toggle switch, so that it briefly presses the power button and doesn't hold the power button:
switch:
- platform: gpio
pin: 25
id: powerbtn
name: "powerbtn"
restore_mode: ALWAYS_OFF
on_turn_on:
- delay: 500ms
- switch.turn_off: powerbtn
I have elided the full configuration above for brevity; here it is in full:
esphome:
name: poweresp
esp32:
board: pico32
framework:
type: arduino
# Enable logging
logger:
mqtt:
broker: 10.0.0.54
ota:
password: ""
wifi:
ssid: "essid"
password: "secret"
# Enable fallback hotspot (captive portal) in case wifi connection fails
ap:
ssid: "Poweresp Fallback Hotspot"
password: "secret2"
captive_portal:
switch:
- platform: gpio
pin: 25
id: powerbtn
name: "powerbtn"
restore_mode: ALWAYS_OFF
on_turn_on:
- delay: 500ms
- switch.turn_off: powerbtn
For the first flash, I used:
docker run --rm \
-v "${PWD}":/config \
--device=/dev/ttyUSB0 \
-it \
esphome/esphome \
run poweresp.yaml
To update over the network after making changes (serial connection no longer needed), I used:
docker run --rm \
-v "${PWD}":/config \
-it \
esphome/esphome \
run poweresp.yaml
In case you want to learn more about the relevant ESPHome concepts, here are a few pointers:
use_address
To push the power button remotely from Go, I'm using the following code:
package main

import (
	"fmt"
	"os"

	// Assuming the Eclipse Paho MQTT client for the mqtt package.
	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func pushMainboardPower(mqttBroker, clientID string) error {
opts := mqtt.NewClientOptions().AddBroker(mqttBroker)
if hostname, err := os.Hostname(); err == nil {
clientID += "@" + hostname
}
opts.SetClientID(clientID)
opts.SetConnectRetry(true)
mqttClient := mqtt.NewClient(opts)
if token := mqttClient.Connect(); token.Wait() && token.Error() != nil {
return fmt.Errorf("connecting to MQTT: %v", token.Error())
}
const topic = "poweresp/switch/powerbtn/command"
const qos = 0 // at most once (no re-transmissions)
const retained = false
token := mqttClient.Publish(topic, qos, retained, string("on"))
if token.Wait() && token.Error() != nil {
return fmt.Errorf("publishing to MQTT: %v", token.Error())
}
return nil
}
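A call site might look like this; the broker address reuses the IP from the ESPHome MQTT configuration above, and the client ID is an arbitrary, hypothetical name:
// Sketch of a caller: push the power button once.
if err := pushMainboardPower("tcp://10.0.0.54:1883", "backup-scheduler"); err != nil {
	log.Fatalf("pushing mainboard power button: %v", err)
}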
I hope this small project write-up is useful to others in a similar situation!
If you need more features than that, check out the next step on the feature and complexity ladder: PiKVM or TinyPilot. See also this comparison by Jeff Geerling.
Because the event is located in another country, many hours of travel away, there are a couple of scenarios where remote control of my home router can be a life-saver. For example, should my home router crash, remotely turning power off and on again gets the event back online.
But, power-cycling a machine is a pretty big hammer. For some cases, like locking yourself out with a configuration mistake, a more precise tool like a remote serial console might be nicer.
In this article, I'll present two cheap and pragmatic DIY out-of-band management solutions that I have experimented with in the last couple of weeks and wanted to share:
You can easily start with the first variant and upgrade it into the second variant later.
Here is the architecture of the system at a glance. The right-hand side is the existing router I want to control, the left-hand side shows the out of band management system:
Letâs go through the hardware components from top to bottom.
The easiest way to have another network connection for projects like this one is the digitec iot subscription. They offer various different options, and their cheapest one, a 0.4 Mbps flatrate for 4 CHF per month, is sufficient for our use-case.
A convenient way of making the digitec iot subscription available to other devices is to use a mobile WiFi router such as the TP-Link M7350 4G/LTE Mobile Wi-Fi router (68 CHF). You can power it via USB, and it has a built-in battery that will last for a few hours.
By default, the device turns itself off after a while when it thinks it is unused, which is undesired for us: if the smart plug drops out of the WiFi, we don't want the whole system to go offline. You can turn off this behavior in the web interface under Advanced → Power Saving → Power Saving Mode.
With the out of band network connection established, all you need to remotely toggle power is a smart plug such as the Sonoff S26 WiFi Smart Plug.
The simplest setup is to connect the Smart Plug to the 4G router via WiFi, and control it using Sonoff's mobile app via Sonoff's cloud.
Alternatively, if you want to avoid the Sonoff cloud, the device comes with a "DIY mode", but the DIY mode wouldn't work reliably for me when I tried it. Instead, I flashed the Open Source Tasmota firmware and connected it to a self-hosted MQTT server via the internet.
Of course, now your self-hosted MQTT server is a single point of failure, but perhaps you prefer that over the Sonoff cloud being a single point of failure.
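With Tasmota connected to an MQTT broker, toggling the plug from any machine is then a one-liner. Here is a sketch, assuming Tasmota's default command topic layout (cmnd/<device topic>/POWER) and a placeholder broker hostname:
mosquitto_pub -h mqtt.example.net -t 'cmnd/tasmota-plug/POWER' -m 'ON'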
Turning power off and on remotely is a great start, but what if you need actual remote access to a system? In my case, I'm using a serial port to see log messages and run a shell on my router. This is also called a "serial console", and any device that allows accessing a serial console without sitting physically in front of the serial port is called a "remote console server".
Commercially available remote console servers typically offer lots of ports (up to 48) and cost lots of money (many thousand dollars or equivalent), because their target application is to be installed in a rack full of machines in a lab or data center. A few years ago, I built freetserv, an open source, open hardware solution for this problem.
For the use-case at hand, we only need a single serial console, so weâll do it with a Raspberry Pi.
The architecture for this variant looks similar to the other variant, but adds the consrv Raspberry Pi Zero 2 W and a USB-to-serial adapter:
Weâll use a Raspberry Pi Zero 2 W as our console server. While the device is a little slower than a Raspberry Pi 3 B, it is still plenty fast enough for providing a serial console, and it only consumes 0.8W of power (see gokrazy â Supported platforms for a comparison):
If the Pi Zero 2 W is not available, you can try using any other Raspberry Pi supported by gokrazy, or even an older Pi Zero with the community-supported Pi OS 32-bit kernel (I didn't test that).
Our Pi will have at least two tasks:
You can use any USB-to-serial adapter supported by Linux. Personally, I like the Adafruit FT232H adapter, which I like to re-program with FTDI's FT_Prog so that it has a unique serial number.
In my router, I plugged in a Longshine LCS-6321M serial PCIe card to add a serial port. Before you ask: no, using USB serial consoles for the kernel console does not cut it.
Because we not only want this Raspberry Pi to be available via the Out Of Band network (via WiFi), but also on the regular home network, we need a USB ethernet adapter.
Originally I was going to use the Waveshare ETH-USB-HUB-BOX: Ethernet / USB HUB BOX for Raspberry Pi Zero Series, but it turned out to be unreliable.
Instead, I'm now connecting a USB hub (as the Pi Zero 2 W has only one USB port), a Linksys USB3GIG network adapter I had lying around, and my USB-to-serial adapter.
Just like in the gokrazy quickstart, we're going to create a directory for this gokrazy instance:
INSTANCE=gokrazy/consrv
mkdir -p ~/${INSTANCE?}
cd ~/${INSTANCE?}
go mod init consrv
You could now directly run gokr-packer
, but personally, I like putting the
gokr-packer
command into a
Makefile
right away:
# The consrv hostname resolves to the device's Tailscale IP address,
# once Tailscale is set up.
PACKER := gokr-packer -hostname=consrv
PKGS := \
github.com/gokrazy/breakglass \
github.com/gokrazy/timestamps \
github.com/gokrazy/serial-busybox \
github.com/gokrazy/stat/cmd/gokr-webstat \
github.com/gokrazy/stat/cmd/gokr-stat \
github.com/gokrazy/mkfs \
github.com/gokrazy/wifi \
tailscale.com/cmd/tailscaled \
tailscale.com/cmd/tailscale \
github.com/mdlayher/consrv/cmd/consrv
all:
.PHONY: update overwrite
update:
${PACKER} -update=yes ${PKGS}
overwrite:
${PACKER} -overwrite=/dev/sdx ${PKGS}
For the initial install, plug the SD card into your computer, put its device
name into the overwrite
target, and run make overwrite
.
For subsequent changes, you can use make update
.
Tailscale is a peer-to-peer mesh VPN, meaning we can use it to connect to our
consrv
Raspberry Pi from anywhere in the world, without having to set up port
forwardings, dynamic DNS, or similar.
As an added bonus, Tailscale also transparently fails over between connections, so while the fast ethernet/fiber connection works, Tailscale uses that, otherwise it uses the Out Of Band network.
Follow the gokrazy guide on Tailscale to include the device in your Tailscale mesh VPN.
Set up WiFi:
mkdir -p extrafiles/github.com/gokrazy/wifi/etc
echo '{"ssid": "oob", "psk": "secret"}' \
> extrafiles/github.com/gokrazy/wifi/etc/wifi.json
consrv
should use the Out Of Band mobile uplink to reach the internet. At the
same time, it should still be usable from my home network, too, to make gokrazy
updates go quickly.
We accomplish this using route priorities.
I arranged for the WiFi interface to have higher route priority (5) than the
ethernet interface (typically 1, but 11 in our setup thanks to the
-extra_route_priority=10
flag):
mkdir -p flags/github.com/gokrazy/gokrazy/cmd/dhcp
echo '-extra_route_priority=10' \
> flags/github.com/gokrazy/gokrazy/cmd/dhcp/flags.txt
make update
Now, tailscale netcheck
shows an IPv4 address belonging to Sunrise, the mobile
network provider behind the digitec iot subscription.
consrv
is an SSH serial console server
written in Go that Matt Layher and I developed. If you're curious, you can watch
the two of us creating it in this twitch stream recording:
The installation of consrv
consists of two steps.
Step 1 is done: we already included consrv
in the Makefile
earlier in
gokrazy setup.
So, we only need to configure the desired serial ports in consrv.toml
(in
gokrazy extrafiles):
mkdir -p extrafiles/github.com/mdlayher/consrv/cmd/consrv/etc/consrv
cat > extrafiles/github.com/mdlayher/consrv/cmd/consrv/etc/consrv/consrv.toml <<'EOT'
[server]
address = ":2222"
[[devices]]
serial = "01716A92"
name = "router7"
baud = 115200
logtostdout = true
[[identities]]
name = "michael"
public_key = "ssh-ed25519 AAAAC3… michael@midna"
EOT
Run make update
to deploy the configuration to your device.
If everything is set up correctly, we can now start a serial console session via SSH:
midna% ssh -p 2222 router7@consrv.lan
Warning: Permanently added '[consrv.lan]:2222' (ED25519) to the list of known hosts.
consrv> opened serial connection "router7": path: "/dev/ttyUSB0", serial: "01716A92", baud: 115200
2022/06/19 20:50:47 dns.go:175: probe results: [{upstream: [2001:4860:4860::8888]:53, rtt: 999.665µs} {upstream: [2001:4860:4860::8844]:53, rtt: 2.041079ms} {upstream: 8.8.8.8:53, rtt: 2.073279ms} {upstream: 8.8.4.4:53, rtt: 16.200959ms}]
[…]
I'm using the logtostdout
option to make consrv
continuously read the serial
port and send it to stdout
, which gokrazy in turn sends via remote
syslog to the gokrazy syslog
daemon, running on another machine. You
could also run it on the same machine if you want to log to file.
You can use breakglass
to
interactively log into your gokrazy installation.
If you flashed your Smart Plug with Tasmota, you can easily turn power on from a
breakglass shell by directly calling Tasmota's HTTP API with curl:
% breakglass consrv
consrv# curl -v -X POST --data 'cmnd=power on' http://tasmota_68462f-1583/cm
The original Sonoff firmware offers a DIY mode which should also offer an HTTP API, but the DIY mode did not work in my tests. Hence, I'm only describing how to do it with Tasmota.
Personally, I like having the Smart Plug available both on the local network (via Tasmota's HTTP API) and via the internet with an external MQTT server. That way, even if either option fails, I still have a way to toggle power remotely.
But maybe you want to obtain usage stats by listening to MQTT or similar, and you don't want to use an extra server for this. In that situation, you can easily run a local MQTT server on your Pi.
In the gokrazy Makefile
, add
github.com/fhmq/hmq
to the list of packages to
install, and configure Tasmota to connect to consrv
on port 1883.
To check that everything is working, use mosquitto_sub
from another machine:
midna% mosquitto_sub --verbose -h consrv.monkey-turtle.ts.net -t '#'
digitec's IOT mobile internet subscription makes remote power management delightfully easy with a smart plug and a 4G WiFi router, and is affordable enough. The subscription is flexible enough that you can decide to only book it while you're traveling.
We can elevate the whole setup in functionality (but also complexity) by combining Tailscale, consrv and gokrazy, running on a Raspberry Pi Zero 2 W, and connecting a USB-to-serial adapter.
If you need more features than that, check out the next step on the feature and complexity ladder: PiKVM or TinyPilot. See also this comparison by Jeff Geerling.
The first USB ethernet adapter I tried was the Apple USB Ethernet Adapter.
Unfortunately, after a few days of uptime, I experienced the following kernel
driver crash (with the asix
Linux driver), and the link remained down until I
rebooted.
I then switched to a Linksys
USB3GIG network adapter
(supported by the r8152
Linux driver) and did not see any problems with that
so far.
dwc2 3f980000.usb: dwc2_hc_chhltd_intr_dma: Channel 5 - ChHltd set, but reason is unknown
dwc2 3f980000.usb: hcint 0x00000002, intsts 0x04600009
dwc2 3f980000.usb: dwc2_update_urb_state_abn(): trimming xfer length
asix 1-1.4:1.0 eth0: Failed to read reg index 0x0000: -71
------------[ cut here ]------------
WARNING: CPU: 1 PID: 7588 at drivers/net/phy/phy.c:942 phy_error+0x10/0x58
Modules linked in: brcmfmac brcmutil
CPU: 1 PID: 7588 Comm: kworker/u8:2 Not tainted 5.18.3 #1
Hardware name: Raspberry Pi Zero 2 W Rev 1.0 (DT)
Workqueue: events_power_efficient phy_state_machine
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : phy_error+0x10/0x58
lr : phy_state_machine+0x258/0x2b0
sp : ffff800009fe3d40
x29: ffff800009fe3d40 x28: 0000000000000000 x27: ffff6c7ac300c078
x26: ffff6c7ac300c000 x25: ffff6c7ac4390000 x24: 00000000ffffffb9
x23: 0000000000000004 x22: ffff6c7ac4019cd8 x21: ffff6c7ac4019800
x20: ffffce5c97f6f000 x19: ffff6c7ac4019800 x18: 0000000000000010
x17: 0000000400000000 x16: 0000000000000000 x15: 0000000000001007
x14: ffff800009fe3810 x13: 00000000ffffffea x12: 00000000fffff007
x11: fffffffffffe0290 x10: fffffffffffe0240 x9 : ffffce5c988e1018
x8 : c0000000fffff007 x7 : 00000000000000a8 x6 : ffffce5c98889280
x5 : 0000000000000268 x4 : ffff6c7acf392b80 x3 : ffff6c7ac4019cd8
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff6c7ac4019800
Call trace:
phy_error+0x10/0x58
phy_state_machine+0x258/0x2b0
process_one_work+0x1e4/0x348
worker_thread+0x48/0x418
kthread+0xf4/0x110
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
asix 1-1.4:1.0 eth0: Link is Down
With rsync up and running, it's time to take a peek under the hood of rsync to better understand how it works.
When talking about the rsync protocol, we need to distinguish between:
All roles can be mixed and matched: both rsync clients (or servers!) can either send or receive.
Now that you know the terminology, let's take a high-level look at the rsync protocol. We'll look at protocol version 27, which is older but simpler, and
which is the most widely supported protocol version, implemented by openrsync
and other third-party implementations:
The rsync protocol can be divided into two phases:
In the first phase, the sender walks the local file tree to generate and send the file list to the receiver. The file list must be transferred in full, because both sides sort it by filename (later rsync protocol versions eliminate this synchronous sorting step).
In the second phase, concurrently:
The architecture makes it easy to implement the second phase in 3 separate processes, each of which sends to the network as fast as possible using heavy pipelining. This results in utilizing the available hardware resources (I/O, CPU, network) on sender and receiver to the fullest.
When starting an rsync transfer, looking at the resource usage of both machines allows us to confirm our understanding of the rsync architecture, and to pin-point any bottlenecks:
(Again, the above was captured using rsync protocol version 27, later rsync protocol versions don't synchronize after completing phase 1, but instead interleave the phases more.)
Up until now, we have described the rsync protocol at a high level. Let's zoom into the hash search step, which is what many people might associate with the term "rsync algorithm".
When a file exists on both sides, rsync sender and receiver, the receiver first divides the file into blocks. The block size is a rounded square root of the file's length; a 1 MiB (1,048,576 byte) file, for example, is divided into 1024-byte blocks. The receiver then sends the checksums of all blocks to the sender. In response, the sender finds matching blocks in the file and sends only the data needed to reconstruct the file on the receiver side.
Specifically, the sender goes through each byte of the file and tries to match existing receiver content. To make this less computationally expensive, rsync combines two checksums.
rsync first calculates what it calls the "sum1", or "fast signature". This is a small checksum (two uint16) that can be calculated with minimal effort for a rolling window over the file data. tridge rsync comes with SIMD implementations to further speed this up where possible.
Only if the sum1 matches will "sum2" (or "strong signature") be calculated, a 16-byte MD4 hash. Newer protocol versions allow negotiating the hash algorithm and support the much faster xxhash algorithms.
If sum2 matches, the block is considered equal on both sides.
Hence, the best case for rsync is when a file has either not changed at all, or shares as many full blocks of content as possible with the old contents.
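To make the "fast signature" idea more concrete, here is a minimal Go sketch of a rolling checksum of this kind: two 16-bit sums over a window that can be slid forward by one byte in constant time. It is modeled on the published rsync algorithm rather than taken from any particular implementation:
package main

import "fmt"

// sum1 computes the two 16-bit sums over a block in one pass.
func sum1(block []byte) (s1, s2 uint16) {
	for i, b := range block {
		s1 += uint16(b)
		s2 += uint16(len(block)-i) * uint16(b)
	}
	return s1, s2
}

// roll slides the window one byte forward: drop out, append in.
func roll(s1, s2 uint16, out, in byte, blockLen int) (uint16, uint16) {
	s1 = s1 - uint16(out) + uint16(in)
	s2 = s2 - uint16(blockLen)*uint16(out) + s1
	return s1, s2
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	const blockLen = 8
	s1, s2 := sum1(data[:blockLen])
	for i := blockLen; i < len(data); i++ {
		// The rolled checksum must match a from-scratch computation
		// over the new window.
		s1, s2 = roll(s1, s2, data[i-blockLen], data[i], blockLen)
		w1, w2 := sum1(data[i-blockLen+1 : i+1])
		fmt.Println(i, s1 == w1, s2 == w2)
	}
}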
Now that we know how rsync works on the file level, let's take a step back to the data set level.
The easiest situation is when you transfer a data set that is not currently changing. But what happens when the data set changes while your rsync transfer is running? Here are two examples.
debiman, the manpage generator powering manpages.debian.org is running on a Debian VM on which an rsync job periodically transfers the static manpage archive to different static web servers across the world. The rsync job and debiman are not sequenced in any way. Instead, debiman is careful to only ever atomically swap out files in its output directory, or add new files before it swaps out an updated index.
The second example, the PostgreSQL
database management system, is the opposite situation: instead of having full
control over how files are laid out, here I don't have control over how files
are written (this generalizes to any situation where the model of only ever
replacing files is not feasible). The data files which my Postgres installation
keeps on disk are not great to synchronize using rsync: they are large and
frequently change. Instead, I now exempt them from my rsync transfer and use pg_dump(1)
to create a snapshot of my databases instead.
To confirm rsync's behavior regarding changing data sets in detail, I modified rsync to ask for confirmation between generating the file list and transferring the files. Here's what I found: files that vanish between those two steps make rsync exit with code 24 ("partial transfer due to vanished source files"). My rsyncprom monitoring wrapper offers a flag to treat exit code 24 like exit code 0, because depending on the data set, vanishing files are expected.
Another way of phrasing the above is that data consistency is not something that rsync can in any way guarantee. It's up to you to either live with the inconsistency (often a good-enough strategy!), or to add an extra step that ensures the data set you feed to rsync is consistent.
The fourth article in this series is rsync, article 4: My own rsync implementation (To be published.)
For verifying rsync's behavior with regard to changing data sets, I checked out the following version:
% git clone https://github.com/WayneD/rsync/ rsync-changing-data-sets
% cd rsync-changing-data-sets
% git checkout v3.2.4
% ./configure
% make
Then, I modified flist.c
to add a confirmation step between sending the file
list and doing the actual file transfers:
diff --git i/flist.c w/flist.c
index 1ba306bc..98981f34 100644
--- i/flist.c
+++ w/flist.c
@@ -20,6 +20,8 @@
* with this program; if not, visit the http://fsf.org website.
*/
+#include <stdio.h>
+
#include "rsync.h"
#include "ifuncs.h"
#include "rounding.h"
@@ -2516,6 +2518,17 @@ struct file_list *send_file_list(int f, int argc, char *argv[])
if (DEBUG_GTE(FLIST, 2))
rprintf(FINFO, "send_file_list done\n");
+ char *line = NULL;
+ size_t llen = 0;
+ ssize_t nread;
+ printf("file list sent. enter 'yes' to continue: ");
+ while ((nread = getline(&line, &llen, stdin)) != -1) {
+ if (nread == strlen("yes\n") && strcasecmp(line, "yes\n") == 0) {
+ break;
+ }
+ printf("enter 'yes' to continue: ");
+ }
+
if (inc_recurse) {
send_dir_depth = 1;
add_dirs_to_tree(-1, flist, stats.num_dirs);
My rsync invocation is:
./rsync -av --debug=all4 --protocol=27 ~/i3/src /tmp/DEST/
It's necessary to use an older protocol version to make rsync generate a full file list before starting the transfer. Later protocol versions interleave these parts of the protocol.
Now that we know what to use rsync for, how can we best integrate rsync into monitoring and alerting, and on which operating systems does it work?
Once you have one or two important rsync
jobs, it might make sense to alert
when your job has not completed as expected.
I'm using Prometheus for all my monitoring and alerting.
Because Prometheus pulls metrics from its (typically always-running) targets,
we need an extra component: the Prometheus
Pushgateway. The Pushgateway
stores metrics pushed by short-lived jobs like rsync
transfers and makes them
available to subsequent Prometheus pulls.
To integrate rsync
with the Prometheus Pushgateway, I wrote
rsyncprom
, a small tool that wraps
rsync
, or parses rsync output supplied by you. Once rsync
completes,
rsyncprom
pushes the rsync exit code and parsed statistics about the transfer
to your Pushgateway.
First, I set up the Prometheus Pushgateway (via Docker and systemd) on my server.
Then, in my prometheus.conf
file, I instruct Prometheus to pull data from my
Pushgateway:
# prometheus.conf
rule_files:
- backups.rules.yml
scrape_configs:
# [âŠ]
- job_name: pushgateway
honor_labels: true
static_configs:
- targets: ['pushgateway:9091']
Finally, in backups.rules.yml
, I configure an alert on the time series rsync_exit_code
:
# backups.rules.yml
groups:
- name: backups.rules
rules:
- alert: RsyncFailing
expr: rsync_exit_code{job="rsync"} > 0
for: 1m
labels:
job: rsync
annotations:
description: rsync {{ $labels.instance }} is failing
summary: rsync {{ $labels.instance }} is failing
This alert will fire any time an rsync job monitored via rsyncprom
exits with
a non-zero exit code.
On each machine that runs rsync
jobs I want to monitor, I first install
rsyncprom
:
go install github.com/stapelberg/rsyncprom/cmd/rsync-prom@latest
Then, I just wrap rsync
transfers where it's most convenient, for example in
my crontab(5)
:
# crontab -e
9 9 * * * /home/michael/go/bin/rsync-prom --job="cron" --instance="gphotos-sync@midna" -- /home/michael/gphotos-sync/sync.sh
The same wrapper technique works in shell scripts or systemd service files.
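For example, a systemd service for a monitored backup job could look roughly like this; the unit description, script path and labels are made up for illustration:
[Unit]
Description=nightly rsync backup, wrapped with rsync-prom

[Service]
Type=oneshot
ExecStart=/home/michael/go/bin/rsync-prom --job="systemd" --instance="websrv@storage2" -- /home/michael/backup-websrv.sh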
You can also provide rsync
output from Go
code
(this example runs rsync
via SSH).
Here's what the whole setup looks like architecturally:
The rsync scheduler runs on a Raspberry Pi running
gokrazy. The scheduler invokes the rsync
job to back
up websrv.zekjur.net via SSH and sends the output to Prometheus, which is
running on a (different) server at an ISP.
The grafana dashboard looks like this in action:
Now that we have learnt about a couple of typical use-cases, where can you use
rsync
to implement these use-cases? The answer is: in most environments, as
rsync
is widely available on different Linux and BSD versions.
Macs come with rsync
available by default (but it's an old, patched version),
and OpenBSD comes with a BSD-licensed implementation called
openrsync by default.
On Windows, you can use the Windows Subsystem for Linux.
| Operating System | Implementation | Version |
|---|---|---|
| FreeBSD 13.1 (ports) | tridge | 3.2.3 |
| OpenBSD 7.1 | openrsync | (7.1) |
| OpenBSD 7.1 (ports) | tridge | 3.2.4 |
| NetBSD 9.2 (pkgsrc) | tridge | 3.2.4 |
| Linux | tridge | repology |
| macOS | tridge | 2.6.9 |
The third article in this series is rsync, article 3: How does rsync work?. With rsync up and running, it's time to take a peek under the hood of rsync to better understand how it works.
To motivate why it makes sense to look at rsync, I present three scenarios for which I have come to appreciate rsync: DokuWiki transfers, Software deployment and Backups.
Recently, I set up a couple of tools for a website that is built on DokuWiki, such as a dead link checker and a statistics program. To avoid overloading the live website (and possibly causing spurious requests that interfere with statistics), I decided it would be best to run a separate copy of the DokuWiki installation locally. This requires synchronizing:
A DokuWiki installation is exactly the kind of file tree that scp(1)
cannot efficiently transfer (too many small files),
but rsync(1)
can! The rsync
transfer only takes a few seconds, no matter if
it's a full download (can be simpler for batch jobs) or an incremental
synchronization (more efficient for regular synchronizations like backups).
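Such a synchronization boils down to a single command; the hostname and paths here are placeholders rather than the real site:
rsync -av --delete webserver:/var/www/dokuwiki/ ~/dokuwiki-mirror/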
For smaller projects where I don't publish new versions through Docker, I instead use a shell script to transfer and run my software on the server.
rsync
is a great fit here, as it transfers many small files (static assets and
templates) efficiently, only transfers the binaries that actually changed, and
doesn't mind if the binary file it's uploading is currently running (contrary to
scp(1)
, for example).
To illustrate what such a script could look like, here's my push script for Debian Code Search:
#!/bin/zsh
set -ex
# Asynchronously transfer assets while compiling:
(
ssh root@dcs 'for i in $(seq 0 5); do mkdir -p /srv/dcs/shard${i}/{src,idx}; done'
ssh root@dcs "adduser --disabled-password --gecos 'Debian Code Search' dcs || true"
rsync -r systemd/ root@dcs:/etc/systemd/system/ &
rsync -r cmd/dcs-web/templates/ root@dcs:/srv/dcs/templates/ &
rsync -r static/ root@dcs:/srv/dcs/static/ &
wait
) &
# Compile a new Debian Code Search version:
tmp=$(mktemp -d)
mkdir $tmp/bin
GOBIN=$tmp/bin \
GOAMD64=v3 \
go install \
-ldflags '-X github.com/Debian/dcs/cmd/dcs-web/common.Version=$version' \
github.com/Debian/dcs/cmd/...
# Transfer the Debian Code Search binaries:
rsync \
$tmp/bin/dcs-{web,source-backend,package-importer,compute-ranking,feeder} \
$tmp/bin/dcs \
root@dcs:/srv/dcs/bin/
# Wait for the asynchronous asset transfer to complete:
wait
# Restart Debian Code Search on the server:
UNITS=(dcs-package-importer.service dcs-source-backend.service dcs-compute-ranking.timer dcs-web.service)
ssh root@dcs systemctl daemon-reload \&\& \
systemctl enable ${UNITS} \; \
systemctl reset-failed ${UNITS} \; \
systemctl restart ${UNITS} \; \
systemctl reload nginx
rm -rf "${tmp?}"
The first backup system I used was bacula, which Wikipedia describes as an enterprise-level backup system. That certainly matches my impression, both in positive and negative ways: while bacula is very powerful, some seemingly common operations turn out quite complicated in bacula. Restoring a single file or directory tree from a backup was always more effort than I thought reasonable. For some reason, I often had to restore backup catalogs before I was able to access the backup contents (I don't remember the exact details).
When moving apartment last time, I used the opportunity to change my backup
strategy. Instead of using complicated custom software with its own volume file
format (like bacula), I wanted backed-up files to be usable on the file system
level with standard tools like rm
, ls
, cp
, etc.
Working with files in a regular file system makes day-to-day usage easier, and also ensures that when my network storage hardware dies, I can just plug the hard disk into any PC, boot a Linux live system, and recover my data.
To back up machines onto my network storage PC's file system, I ended up with a hand-written rsync wrapper script that copies the full file system of each machine into dated directory trees:
storage2# ls -l backup/midna/2022-05-27
bin boot etc home lib lib64 media opt
proc root run sbin sys tmp usr var
storage2# ls -l backup/midna/2022-05-27/home/michael/configfiles/zshrc
-rw-r--r--. 7 1000 1000 14554 May 9 19:37 backup/midna/2022-05-27/home/michael/configfiles/zshrc
To revert my ~/.zshrc
to an older version, I can scp(1)
the file:
midna% scp storage2:/srv/backup/midna/2022-05-27/home/michael/configfiles/zshrc ~/configfiles/zshrc
To compare a whole older source tree, I can mount it using sshfs(1)
:
midna% mkdir /tmp/2022-05-27-i3
midna% sshfs storage2:/srv/backup/midna/2022-05-27/$HOME/i3 /tmp/2022-05-27-i3
midna% diff -ur /tmp/2022-05-27-i3 ~/i3/
Of course, the idea is not to transfer the full machine contents every day, as
that would quickly fill up my network storage's 16 TB disk! Instead, we can use rsync's --link-dest
option to elegantly deduplicate files using file system
hard links:
backup/midna/2022-05-26
backup/midna/2022-05-27 # rsync --link-dest=2022-05-26
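Spelled out as a command, one day's transfer might look roughly like this; it is a simplified sketch of my wrapper script, not its literal contents:
rsync -aX \
  --link-dest=/srv/backup/midna/2022-05-26 \
  root@midna:/ \
  /srv/backup/midna/2022-05-27/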
To check the de-duplication level, we can use du(1)
,
first on a single directory:
storage2# du -hs 2022-05-27
113G 2022-05-27
…and then on two subsequent directories:
storage2# du -hs 2022-05-25 2022-05-27
112G 2022-05-25
7.3G 2022-05-27
As you can see, the 2022-05-27 backup took 7.3 GB of disk space, and 104.7 GB were re-used from the previous backup(s).
To print all files which have changed since the last backup, we can use:
storage2# find 2022-05-27 -type f -links 1 -print
A significant limitation of backups at the file level is that the destination file system (network storage) needs to support all the file system features used on the machines you are backing up.
For example, if you use POSIX
ACLs or Extended
attributes
(possibly for Capabilities or
SELinux), you need to ensure that
your backup file system has these features enabled, and that you are using rsync(1)'s --xattrs (or -X for short) option.
This can turn from a pitfall into a dealbreaker as soon as multiple operating
systems are involved. For example, the rsync
version on macOS has
Apple-specific
code
to work with Apple resource forks
and other extended attributes. It's not clear to me whether macOS rsync
can
send files to Linux rsync
, restore them, and end up with the same system state.
Luckily, I am only interested in backing up Linux systems, or merely home directories of non-Linux systems, where no extended attributes are used.
The biggest downside of this architecture is that working with the directory
trees in bulk can be very slow, especially when using a hard disk instead of an
SSD. For example, deleting old backups can easily take many hours to multiple
days (!). Sure, you can just let the rm
command run in the background, but
it's annoying nevertheless.
Even merely calculating the disk space usage of each directory tree is a painfully slow operation. I tried using stateful disk usage tools like duc, but it didn't work reliably on my backups.
In practice, I found that for tracking down large files, using ncdu(1)
on any recent backup typically quickly shows the
large file. In one case, I found var/lib/postgresql
to consume many
gigabytes. I excluded it in favor of using pg_dump(1)
, which resulted in much smaller backups!
Unfortunately, even when using an SSD, determining which files take up most space of a full backup takes a few minutes:
storage2# time du -hs backup/midna/2022-06-09
742G backup/midna/2022-06-09
real 8m0.202s
user 0m11.651s
sys 2m0.731s
To transfer data via rsync
from the backup host to my network storage, I'm
using SSH.
Each machine's SSH access is restricted in my network storage's SSH authorized_keys(5) config file to not allow arbitrary commands, but to perform just a specific operation. The only allowed operation in my case is running rrsync ("restricted rsync") in a container whose file system only contains the backup host's sub directory, e.g. /srv/backup/websrv.zekjur.net:
command="/bin/docker run --log-driver none -i -e SSH_ORIGINAL_COMMAND -v /srv/backup/websrv.zekjur.net:/srv/backup/websrv.zekjur.net stapelberg/docker-rsync /srv/backup/websrv.zekjur.net",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAAC3…
(The corresponding Dockerfile
can be found in my Gigabit NAS
article.)
To trigger such an SSH-protected rsync
transfer remotely, I'm using a small
custom scheduling program called
dornröschen. The
program arranges for all involved machines to be powered on (using
Wake-on-LAN) and then starts
rsync
via another operation-restricted SSH connection.
You could easily replace this with a cron job if you don't care about WOL.
The architecture looks like this:
The operation-restricted SSH connection on each backup host is configured in
SSH's authorized_keys(5)
config file:
command="/root/backup-remote.pl",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAAC3…
The second article in this series is rsync, article 2: Surroundings. Now that we know what to use rsync for, how can we best integrate rsync into monitoring and alerting, and on which operating systems does it work?
Over time, I found rsync useful in more and more cases, and would recommend every computer user put this great tool into their toolbox!
I'm publishing a series of blog posts about rsync:
I found a Mellanox ConnectX-4 Lx for the comparatively low price of 204 CHF on digitec:
To connect it to my router, I ordered a MikroTik XS+DA0003 SFP28/SFP+ Direct Attach Cable (DAC) with it. I installed the network card into my old workstation (on the right) and connected it with the 25 Gbit/s DAC to router7 (on the left):
| Component | Model |
|---|---|
| Mainboard | ASRock B550 Taichi |
| CPU | AMD Ryzen 5 5600X 6-Core Processor |
| Network card | Intel XXV710 |
| Linux | Linux 5.17.4 (router7), curl 7.83.0 from Debian bookworm, Go net/http from Go 1.18 |
router7 comes with TCP BBR enabled by default.
| Component | Model |
|---|---|
| Mainboard | ASUS PRIME Z370-A |
| CPU | Intel i9-9900K CPU @ 3.60GHz |
| Network card | Mellanox ConnectX-4 |
| Linux | 5.17.5 (Arch Linux), nginx 1.21.6, caddy 2.4.3 |
Before taking any measurements, I do one full download so that the file contents are entirely in the Linux page cache, and the measurements therefore no longer contain the speed of the disk.
big.img
in the tests below refers to the 35 GB test file I'm downloading,
which consists of distri-disk.img repeated 5 times.
The simplest test is using just a single TCP connection, for example:
curl -v -o /dev/null http://oldmidna:8080/distri/tmp/big.img
./httpget25 http://oldmidna:8080/distri/tmp/big.img
| Client | Server | Gbit/s |
|---|---|---|
| curl | nginx | |
| curl | caddy | |
| Go | nginx | |
| Go | caddy | |
curl can saturate a 25 Gbit/s link without any trouble.
The Go net/http
package is slower and comes in at 20 Gbit/s.
Running 4 of these downloads concurrently is a reliable and easy way to saturate a 25 Gbit/s link:
for i in $(seq 0 4)
do
curl -v -o /dev/null http://oldmidna:8080/distri/tmp/big.img &
done
| Client | Server | Gbit/s |
|---|---|---|
| curl | nginx | |
| curl | caddy | |
| Go | nginx | |
| Go | caddy | |
At link speeds this high, enabling TLS slashes bandwidth in half or worse.
Using 4 TCP connections allows saturating a 25 Gbit/s link.
Caddy uses more CPU to serve files compared to nginx.
This test works the same as T1.1, but with an HTTPS URL:
curl -v -o /dev/null --insecure https://oldmidna:8443/distri/tmp/big.img
./httpget25 https://oldmidna:8443/distri/tmp/big.img
| Client | Server | Gbit/s |
|---|---|---|
| curl | nginx | |
| curl | caddy | |
| Go | nginx | |
| Go | caddy | |
This test works the same as T1.2, but with an HTTPS URL:
for i in $(seq 0 4)
do
curl -v -o /dev/null --insecure https://oldmidna:8443/distri/tmp/big.img &
done
Curiously, the Go net/http
client downloading from caddy cannot saturate a 25
Gbit/s link.
| Client | Server | Gbit/s |
|---|---|---|
| curl | nginx | |
| curl | caddy | |
| Go | nginx | |
| Go | caddy | |
Linux 4.13 got support for Kernel TLS back in 2017.
nginx 1.21.4 introduced support for Kernel TLS, and they have a blog post on how to configure it.
In terms of download speeds, there is no difference with or without KTLS. But, enabling KTLS noticeably reduces CPU usage, from roughly 10% to a steady 2%.
For even newer network cards such as the Mellanox ConnectX-6, the kernel can even offload TLS onto the network card!
| Client | Server | Gbit/s |
|---|---|---|
| curl | nginx | |
| Go | nginx | |

| Client | Server | Gbit/s |
|---|---|---|
| curl | nginx | |
| Go | nginx | |
When downloading from nginx with 1 TCP connection, with TLS encryption enabled
(HTTPS), the Go net/http
client is faster than curl!
Caddy is slightly slower than nginx, which manifests itself in slower speeds
with curl and even slower speeds with Go's net/http.
To max out 25 Gbit/s, even when using TLS encryption, just use 3 or more connections in parallel. This helps with HTTP and HTTPS, with any combination of client and server.
net/http
test program httpget25.go
package main
import (
"crypto/tls"
"flag"
"fmt"
"io"
"io/ioutil"
"log"
"net/http"
)
func httpget25() error {
http.DefaultTransport.(*http.Transport).TLSClientConfig = &tls.Config{InsecureSkipVerify: true}
for _, arg := range flag.Args() {
resp, err := http.Get(arg)
if err != nil {
return err
}
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("unexpected HTTP status code: want %v, got %v", http.StatusOK, resp.Status)
}
io.Copy(ioutil.Discard, resp.Body)
}
return nil
}
func main() {
flag.Parse()
if err := httpget25(); err != nil {
log.Fatal(err)
}
}
Caddyfile
{
local_certs
http_port 8080
https_port 8443
}
http://oldmidna:8080 {
file_server browse
}
https://oldmidna:8443 {
file_server browse
}
mkdir -p ~/lab25
cd ~/lab25
wget https://nginx.org/download/nginx-1.21.6.tar.gz
tar xf nginx-1.21.6.tar.gz
wget https://www.openssl.org/source/openssl-3.0.3.tar.gz
tar xf openssl-3.0.3.tar.gz
cd nginx-1.21.6
./configure --with-http_ssl_module --with-http_v2_module --with-openssl=$HOME/lab25/openssl-3.0.3 --with-openssl-opt=enable-ktls
make -j8
cd objs
./nginx -c nginx.conf -p $HOME/lab25
nginx.conf
worker_processes auto;
pid logs/nginx.pid;
daemon off;
events {
worker_connections 1024;
}
http {
include mime.types;
default_type application/octet-stream;
access_log /home/michael/lab25/logs/access.log combined;
sendfile on;
sendfile_max_chunk 2m;
keepalive_timeout 65;
server {
listen 8080;
listen [::]:8080;
server_name localhost;
root /srv/repo.distr1.org/;
location / {
index index.html index.htm;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
location /distri {
autoindex on;
}
}
server {
listen 8443 ssl;
listen [::]:8443 ssl;
server_name localhost;
ssl_certificate nginx-ecc-p256.pem;
ssl_certificate_key nginx-ecc-p256.key;
#ssl_conf_command Options KTLS;
ssl_buffer_size 32768;
ssl_protocols TLSv1.3;
root /srv/repo.distr1.org/;
location / {
index index.html index.htm;
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root /usr/share/nginx/html;
}
location /distri {
autoindex on;
}
}
}
(Feel free to skip right to the 25 Gbit/s announcement section, but I figured this would be a good point to reflect on the last 20 years of internet connections for me!)
The first internet connection that I consciously used was a symmetric DSL connection that my dad († 2020) shared between his home office and the rest of the house, which was around the year 2000. My dad was an early adopter and was connected to the internet well before then using dial-up connections, but the SDSL connection in our second house was the first connection I remember using myself. It wasn't particularly fast in terms of download speed: I think it delivered 256 kbit/s or something along those lines.
I encountered two surprises with this internet connection. The first surprise was that the upload speed (also 256 kbit/s, as it was a symmetric connection) was faster than other people's. At the time, even DSL connections with much higher download speeds were asymmetric (ADSL) and came with only 128 kbit/s upload. I learnt this while making first contact with file sharing: people kept asking me to stay online so that their transfers would complete more quickly.
The second surprise was the concept of a metered connection, specifically one where you pay more the more data you transfer. During the aforementioned file sharing experiments, it never crossed my mind that down- or uploading files could result in extra charges.
These two facts combined resulted in a 3000 € surprise bill for my dad!
Luckily, his approach to solve this problem wasn't to restrict my internet usage, but rather to buy a cheap, separate ADSL flatrate line for the family (from Telekom, which he hated), while he kept the good SDSL metered line for his business.
I still vividly remember the first time that ADSL connection synchronized. It was a massive upgrade in download speed (768 kbit/s!), but a downgrade in upload speed (128 kbit/s). But, because it was a flatrate, it made possible new use cases for my dad, who would jump on this opportunity to download a number of CD images to upgrade the software of his SGI machines.
The different connection speeds and characteristics have always interested me, and I used several other connections over the years, all of which felt limiting. The ADSL connection at my parents' place started at 1 Mbit/s, was upgraded first to 3 Mbit/s, then 6 Mbit/s, and eventually reached its limit at 16 Mbit/s. When I spent one semester in Ireland, I had a 9 Mbit/s ADSL connection, and then later in Zürich I started out with a 15 Mbit/s ADSL connection.
All of these connections have always felt limiting, like peeking through the keyhole to see a rich world behind, but not being able to open the door. We've had to set up (and tune) traffic shaping, and coordinate when large downloads were okay.
The dream was always to leave ADSL behind and get a fiber connection. The advantages are numerous: lower latency (ADSL came with 40 ms at the time), much higher bandwidth (possibly Gigabit/s?) and typically the connection was established via ethernet (instead of PPPoE). Most importantly, once the fiber is there, you can upgrade both ends to achieve higher speeds.
In ZĂŒrich, I managed to get a fiber connection set up in my apartment after fighting bureaucracy for many months. The issue was that there was no permission slip on file at Swisscom. Either the owner of my apartment never signed it to begin with, or it got lost. This is not a state that the online fiber availability checker can represent, but once you know it, the fix is easy: just have Swisscom send out the form again, have the owner sign it, and a few weeks later, you can order!
One wrinkle was that availability was only fixed in the Swisscom checker, and it was unclear when EWZ or other providers would get an updated data dump. Hence, I ordered Swisscom fiber to get things moving as quick as possible, and figured I could switch to a different provider later.
Here's a picture of when the electrician pulled the fiber from the building entry endpoint (BEP) in the basement into my flat, from March 2014:
Only two months after I first got my fiber connection, init7 launched their fiber7 offering, and I switched from Swisscom to fiber7 as quickly as I could.
The switch was worth it in every single dimension:
I have been very happy with my fiber7 connection ever since. What I wrote in 2014 regarding its performance remained true over the years â downloads were always fast for me, latencies were low, outages were rare (and came with good explanations).
I switched hardware multiple times over the years:
Notably, init7 encourages people to use their preferred router (Router Freedom).
Over the years, other Swiss internet providers such as Swisscom and Salt introduced 10 Gbit/s offerings, so an obvious question was when init7 would follow suit.
People who were following init7 closely already knew that an infrastructure upgrade was coming. In 2020, init7 CEO Fredy Künzler disclosed that in 2021, init7 would start offering 10 Gbit/s.
What nobody expected before init7 announced it on their seventh birthday, however, was that init7 started offering not only 10 Gbit/s (Fiber7-X), but also 25 Gbit/s connections (Fiber7-X2)!
This was init7's announcement on Twitter (translated from German):
"Twenty-five is the new #Gigabit. Seven years after the launch of #Fiber7, we are igniting the next stage: Fiber7-X (10 Gbps) and Fiber7-X2 (25 Gbps), at the same price: CHF 777 per year. Our press release: https://t.co/UnnWTexcD0 #MaxFix #FTTH #Glasfaser"
— Init7 (AS13030) (@init7) May 25, 2021
With this move, init7 has done it again: they introduced an offer that is better than anything else in the Swiss internet market, perhaps even world-wide!
One interesting aspect is init7's so-called «MaxFix principle»: maximum speed for a fixed price. No matter if you're using 1 Gbit/s or 25 Gbit/s, you pay the same monthly fee. init7's approach is to make the maximum bandwidth available to you, limited only by your physical connection. This is such a breath of fresh air compared to other ISPs that think rate-limiting customers to ridiculously low speeds is somehow acceptable on an FTTH offering (recent example).
If you're curious about the infrastructure upgrade that enabled this change, check out init7's blog post about their new POP infrastructure.
A common first reaction to fast network connections is the question: "For what do you need so much bandwidth?"
Interestingly enough, I heard this question as recently as last year, in the context of a Gigabit internet connection! Some people can't imagine using more than 100 Mbit/s. And sure, from a certain perspective, I get it: that 100 Mbit/s connection will not be overloaded any time soon.
But, looking at when a line is overloaded is only one aspect to take into account when deciding how fast of a connection you want.
There is a lower limit where you notice your connection is slow. Back in 2014, a 2 Mbit/s connection was noticeably slow for regular web browsing. These days, even a 10 Mbit/s connection is noticeably slow when re-opening my browser and loading a few tabs in parallel.
So what should you get? A 100 Mbit/s line? 500 Mbit/s? 1000 Mbit/s? Personally, I like to not worry about it and just get the fastest line I can, to reduce any and all wait times as much as possible, whenever possible. It's a freeing feeling! Here are a few specific examples:
Aside from my distaste for waiting, a fast and reliable fiber connection enables self-hosting. In particular for my distri Linux project, where I explore fast package installation, it's very appealing to connect it to the internet on as fast a line as possible. I want to optimize all the parts: software architecture and implementation, hardware, and network connectivity. But, for my hobby project budget, getting even a 10 Gbit/s line at a server hoster is too expensive, let alone a 25 Gbit/s line!
Lastly, even if there isn't really a need for such a fast connection, I hope you can understand that after spending so many years of my life limited by slow connections, I'll happily take the opportunity of a faster connection whenever I can. Especially at no additional monthly cost!
Right after the announcement dropped, I wanted to prepare my side of the connection and therefore ordered a MikroTik CCR2004, the only router that init7 lists as compatible. I returned the MikroTik CCR2004 shortly afterwards, mostly because of its annoying fan regulation (spins up to top speed for about 1 minute every hour or so), and also because MikroTik seems to have made no progress at all since I last used their products almost 10 years ago. Table-stakes features such as DNS resolution for hostnames within the local network are still not included!
I expect that more and more embedded devices with SFP28 slots (like the MikroTik CCR2004) will become available over the next few years (hopefully with better fan control!), but at the moment, the selection seems to be rather small.
For my router, I instead went with a custom PC build. Having more space available means I can run larger, slow-spinning fans that are not as loud. Plugging in high-end Intel network cards (2 × 25 Gbit/s, and 4 × 10 Gbit/s on the other one) turns a PC into a 25 Gbit/s capable router.
With my equipment sorted out, I figured it was time to actually place the order. I wasn't in a hurry to order, because it was clear that it would be months before my POP could be upgraded. But it can't hurt to register my interest (just in case it influences the POP upgrade plan). Shortly after, I got back this email from init7 where they promised to send me the SFP module via post:
And sure enough, a few days later, I received the SFP28 module in the mail:
With my router build, and the SFP28 module, I had everything I needed for my side of the connection.
The other side of the connection was originally planned to be upgraded in fall 2021, but the global supply shortage imposed various delays on the schedule.
Eventually, the fiber7 POP list showed an upgrade date of April 2022 for my POP, and that turned out to be correct.
I had read Pim's blog post on the upgrade of the 1790BRE POP in Brüttisellen, which contains a lot of super interesting details, so definitely check that one out, too!
Being able to plug the SFP module into the new POP infrastructure yourself (like Pim did) sounded super cool to me, so I decided to reach out, and init7 actually agreed to let me stop by to plug in "my" fiber and SFP module!
Giddy with excitement, I left my place at just before 23:00 for a short walk to the POP building, which I had seen many times before, but never from the inside.
Patrick, the init7 engineer, met me in front of the building and explained: "Hey! You wrote my window manager!" What a coincidence :-). Luckily I had packed some i3 stickers that I could hand him as a small thank you.
Inside, I met the other init7 employee working on this upgrade. Pascal, init7's CTO, was coordinating everything remotely.
Standing in front of init7's rack, I spotted the old Cisco switch (at the bottom), and the new Cisco C9500-48Y4C switches that were already prepared (at the top). The SFP modules are for customers who decided to upgrade to 10 or 25 Gbit/s, whereas for the others, the old SFP modules would be re-used:
We then spent the next hour pulling out fiber cables and SFP modules out of the old Cisco switch, and plugging them back into the new Cisco switch.
Just like the init7 engineer working with me (who is usually a software guy, too, he explained), I enjoy doing physical labor from time to time for variety. Especially with nice hardware like this, and when it's for a good cause (faster internet)! It's almost meditative, in a way, and I enjoyed the nice conversation we had while we were both moving the connections.
After completing about half of the upgrade (the top half of the old Cisco switch), I walked back to my place, still blissfully smiling all the way, to turn up my end of the connection while the others were still on site and could fix any mistakes.
After switching my uplink0 network interface to the faster network card, it also took a full reboot of my router for some reason, but then it recognized the SFP28 module without trouble and successfully established a 25 Gbit/s link!
I did a quick speed test to confirm and called it a night.
Just like in the early days of Gigabit connections, my internet connection is now faster than the connection of many servers. It's a luxury problem to be sure, but in case you're curious how far a 25 Gbit/s connection gets you on the internet, in this section I collected some speed test results.
speedtest.net (run by Ookla) is the best way to measure fast connections that I'm aware of.
Here is my first 25 Gbit/s speedtest, which was run using the init7 speedtest server:
I also ran speedtests to all other servers that were listed for the broader Zürich area at the time, using the tamasboros/ookla-speedtest Docker image. As you can see, most speedtest servers are connected with a 10 Gbit/s port, and some (GGA Maur) even only with a 1 Gbit/s port:
Speedtest server | latency (ms) | download (Mbps) | upload (Mbps) |
---|---|---|---|
Init7 AG - Winterthur | 1.45 | 23530.27 | 23031.24 |
fdcservers.net | 18.15 | 9386.29 | 1262.92 |
GIB-Solutions AG - Schlieren | 6.64 | 9154.12 | 2207.68 |
Monzoon Networks AG | 0.74 | 8874.85 | 6427.66 |
Glattwerk AG | 0.92 | 8719.04 | 4008.28 |
AltusHost B.V. | 0.80 | 8373.34 | 8518.90 |
iWay AG - Zurich | 2.13 | 8337.56 | 8194.89 |
Sunrise Communication AG | 9.04 | 8279.60 | 3109.34 |
31173 Services AB | 18.69 | 8279.75 | 1503.92 |
Wingo | 4.25 | 6179.57 | 5248.36 |
Netrics Zürich AG | 0.74 | 7910.78 | 8770.19 |
Cloudflare - Zurich | 1.14 | 7410.97 | 2218.88 |
Netprotect - Zurich | 0.87 | 7034.62 | 8948.01 |
C41.ch - Zurich | 9.90 | 6792.60 | 690.33 |
Goldenphone GmbH | 18.91 | 3116.32 | 659.23 |
GGA Maur | 0.99 | 940.24 | 941.24 |
For a few popular Linux distributions, I went through the mirror list and tried all servers in Switzerland and Germany. Only one or two were able to deliver files at more than 1 Gigabit/s. Other mirror servers were either capped at 1 Gigabit/s, or wouldn't even reach that (slow disks?).
Here are the fast ones:
mirror1.infomaniak.com and mirror2.infomaniak.com
mirror.puzzle.ch
mirrors.xtom.de
mirror.netcologne.de and ubuntu.ch.altushost.com
Using iperf3 -P 2 -c speedtest.init7.net, iperf3 shows 23 Gbit/s:
[SUM] 0.00-10.00 sec 26.9 GBytes 23.1 Gbits/sec 597 sender
[SUM] 0.00-10.00 sec 26.9 GBytes 23.1 Gbits/sec receiver
It's hard to find public iperf3 servers that are connected with a fast-enough port. I could only find one that claims to be connected via a 40 Gbit/s port, but it was unavailable when I wanted to test.
Do you have a ≥ 10 Gbit/s line in Europe, too? Are you interested in a speed test? Reach out to me and we can set something up.
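If you want to give it a try, running your own iperf3 endpoint is trivial. As a rough sketch (the host name is a placeholder, and the flags mirror what I used above):

```
# on the machine with the fast uplink, start an iperf3 server:
iperf3 -s

# on the other end, run the client against it
# (iperf.example.net is a placeholder; -P 2 uses two parallel streams, -t 30 runs for 30 seconds):
iperf3 -c iperf.example.net -P 2 -t 30
```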
What an exciting time to be an init7 customer! I still can't quite believe that I now have a 25 Gbit/s connection in 2022, and it feels like I'm living 10 years in the future.
Thank you to Fredy, Pascal, Patrick, and all the other former and current init7 employees for showing how to run an amazing Internet Service Provider. Thank you for letting me peek behind the curtains, and keep up the good work! 💪
If you want to learn more, check out Pascal's talk at DENOG:
To me, the primary advantage of Smart Lights is the flexibility in where you place extra light switches, and the extra functions that become much easier to implement.
For example, I have added an extra light switch next to the bed and next to the couch, without having to have an electrician tear up the walls to add more wiring. An "all-off" button is super handy at the end of the day or when watching a movie.
Other attractive use-cases include controlling lights based on time of the day, based on whether people are home, or based on a motion sensor.
I used the RGB color light bulb version of all of the below systems. In practice, we typically don't change the color much, but it is nice to be able to adjust the color and brightness to something that fits the respective room. And, every once in a while, scenes that use color are fun!
The first smart light system I used was IKEA TRÅDFRI. I figured as a system with a large user base, they would be inclined to improve it over time, and compatibility should be more likely than with other, smaller vendors.
Unfortunately the system is pretty much unchanged from when I first bought it many years ago.
You can easily find documentation about the API for using the TRÅDFRI gateway programmatically, but when I looked for available Go packages back in 2019, I ended up implementing CoAP and DTLS myself for lack of an attractive Go package.
The light switches are good in terms of features, and easy to install: you can just remove the old switch and glue the TRÅDFRI switch over the existing switch.
The downside of the light switches is that they are flimsy: because the switch is magnetically held in place in its case, it can easily fall on the floor when you bump against it.
Pairing the devices was always tricky for me. It got easier when I turned off all other ZigBee devices in my apartment before doing anything with IKEA devices.
At multiple points, the devices lost their pairing. It might have been when they ran out of battery.
The battery lifetime of the light switches was very poor – only about a year on average. They use the CR2032 form factor, which my charger does not support, so I couldn't use rechargeables.
Swapping out the batteries and re-pairing the system every year or so quickly becomes tedious!
Because I also bought some Shelly 1L smart relays, I figured I'd give the Shelly Bulb a try.
Instead of ZigBee, the Shelly Bulbs use WiFi. This makes them easy to get into your home network and does not require a separate gateway.
At 2 bulbs per room+hallway, and 2 buttons each, that sums up to having 16 extra devices in your WiFi network. This wasn't a problem for me in practice, but depending on how stable your WiFi network is, it might be a concern.
Notably, this also means your lights can't be controlled while your WiFi is unavailable.
In terms of physical light switches, you'll need to use a separate product such as the Shelly Button. This is the weakest point of the system. The latency is noticeable, even when configuring a static IP address, which improves things, but still not to a good level. The Shelly Button is extremely simple, so dimming has to be emulated with double- or triple-press actions.
Given that one typically interacts with this system multiple times a day via its switches, I think it makes sense to choose a system that has good switches.
On the plus side, the Shelly Button uses a rechargeable battery that can be charged from a USB power bank, which is a concept I really like.
After the Shelly Bulb, I figured I'd try Philips Hue. It's by far the most expensive system of the ones I have tried, but also by far the most polished and user-friendly.
People recommended the Feller Smart Light Control switches, which use energy harvesting (from you clicking them!) and hence don't require a battery.
This makes it easy to place them anywhere, like next to the couch in the picture on the left.
Feller recommends extending existing installations by buying the next-larger mounting plate. Extending the box in the wall is not required, as no wires or in-wall space are needed. Drilling new holes for extra screws is required for stability, but that's a lot more doable than extending the whole box. Here are some pictures before, during and after the installation:
The Shelly 1L is a very interesting device. It goes behind your existing device into the wall and makes it smart!
This allows you to make any existing lights smart that can't easily be replaced by smart bulbs, for example a bathroom light built into the bathroom mirror cabinet.
You can also make existing light switches smart if you like the ones you already have and can't exchange them.
Another use-case is to easily connect buttons or sensors into your network, for example door bells or door sensors.
The Shelly 1L is special in that this specific model can be installed when all you have is a live wire (i.e. wiring for a light switch).
One potential issue is that, depending on the configuration and the connected device's power usage, the Shelly might emit a slight humming noise. So, don't install one right next to your bed.
Another limitation is that while the Shelly works with both light switches (which change state) and light buttons (which generate an impulse), it can only distinguish between short-press and long-press events when you use a light button. Newer light switches from Feller can be re-configured to function as a button, but if your model is too old you might need to replace a light switch with a button.
One weird issue I ran into was that after installing a new bathroom mirror cabinet, the relay of the connected Shelly 1L would no longer function correctly – the light just remained on, even when turning it off via the Shelly. I read on the Shelly forum that this could be caused by running the Shelly upside-down, and indeed, after turning it around, it started to work again!
Smart Heating systems are often advertised to save cost. I wanted to try it out, and was also interested in the temperature logging because my apartment is on the more humid side and I wanted some data to optimize the situation.
I bought some HomeMatic temperature sensors and heating valve drives back in 2017. The hardware feels solid and was easy enough to install.
One massive downside of the system was the poor software quality of their Central Control Unit (CCU2). The web interface was super slow, looked very dated, and the whole thing kept running out of memory every 2 weeks or so. It was so bad that I re-implemented my own CCU in Go. I hear that by now, they have a new and better Control Unit version, though.
So far, one valve drive has failed with error code F1; I replaced it with a new one.
Turns out smart control of our heating does not seem to make any measurable difference. The rooms feel the same as before. No money is saved because the utility bill is divided equally among all tenants across the building (which seems to be standard in Switzerland), not billed for individual usage.
So, overall, I would not install smart heating valve drives again. I still keep an eye on the temperature sensors from time to time, but there are cheaper options if you only need temperature!
During the pandemic, I was receiving packages at home and hence relying on my doorbell much more than usual, so I was looking for a way to make it smarter!
The first device I got was the Nuki Opener, a smart intercom system. It allows you to get notifications on your phone when the doorbell is rung, and to unlock the door from your phone.
I got this device because it was specifically marketed as compatible with the BTicino intercom system our house uses. Unfortunately, this turned out to be incorrect, so I ended up building a hardware-modified intercom unit that is connected to the Nuki Opener in analogue mode.
Once it actually works, it's a convenient system, and having your doorbell generate desktop notifications with sound is just super useful when wearing headphones! Strongly recommended.
As you can see on the pictures, I'm powering the Nuki Opener via USB. It normally runs on batteries, but I want to minimize battery usage and swapping. A built-in rechargeable battery like in the Shelly devices would be a neat improvement to the Nuki Opener, so that the device could still work during power outages!
After I had the Nuki Opener, I also added a Nuki Smart Lock so that we can not only open the house front door, but also the apartment door itself in case one of us forgets their key.
The Nuki Smart Lock was easy to install and works great. It also shows with an elegant LED ring whether the door is currently locked or not, which I find handy.
Not having to turn on lights myself is something I find convenient, in particular in the kitchen, but also in the bathroom. When carrying plates or glasses into the kitchen, it's nice to have the lights turn on while my hands are full.
First I tried Feller's Motion Sensors, because they physically fit well into the existing Feller light switch installation:
But, their limitations made me move away from them quickly: while you can change one or two basic settings, you cannot, for example, disable the motion sensor after a certain time of day, or manually disable it for a certain time period.
Also, because the device is installed in a fixed position (determined by where your light switch is), it isn't necessarily in the best place to spot all the motion you want to detect.
The Shelly Motion Sensor seems like a good motion sensor to me! It has a number of useful settings and can easily trigger any REST API endpoint or can be used via MQTT.
Like with the Shelly Button, this device has a built-in rechargeable battery that can be charged via USB. Depending on the location of the sensor, you can either attach a USB powerbank once a year, or remove the sensor from its fixture and charge it elsewhere.
The positioning of the Shelly Motion can either be easy (as it was in my kitchen) or tricky to get right (in my bathroom). I don't know if other motion sensors are better in terms of range.
One thing to note is that the Shelly Motion only reports state changes (motion start or motion end), and no continuous events while motion is detected.
For my kitchen, my regelwerk code directly translates motion on/off into light on/off commands (to Philips Hue and Shelly 1L), with the exception that a long-press turns off all motion control for the next 10 minutes. The Shelly Motion only reports "no motion" after 1 minute without motion, which works well for me in the kitchen.
For my bathroom, I don't want the lights to turn off immediately when motion is no longer detected, to err on the side of not switching off the light while people are still using the bathroom but just aren't seen by the motion sensor. To implement that, I found that using the Shelly 1L's timer functionality works best. So, in my configuration, motion on means lights on, and motion off means lights on for 10 minutes, then off. Turning off the light manually disables that logic.
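To make the kitchen logic above a bit more concrete, here is a minimal sketch of what such an MQTT rule could look like in Go, using the paho MQTT client. The broker address and all topic names are made-up placeholders for illustration; they are not the actual topics regelwerk and my adapters use:

```go
package main

import (
	"log"
	"sync"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

var (
	mu           sync.Mutex
	suspendUntil time.Time // a long-press pauses motion control for 10 minutes
)

func main() {
	opts := mqtt.NewClientOptions().AddBroker("tcp://mqtt.lan:1883") // placeholder broker
	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}

	// Motion events from a (hypothetical) motion sensor adapter: payload "on" or "off".
	client.Subscribe("sensor/kitchen/motion", 0, func(c mqtt.Client, m mqtt.Message) {
		mu.Lock()
		suspended := time.Now().Before(suspendUntil)
		mu.Unlock()
		if suspended {
			return // motion control is currently paused via long-press
		}
		// Forward the motion state as a light command (placeholder topic).
		c.Publish("light/kitchen/command", 0, false, m.Payload())
	})

	// Long-press events from the light switch (hypothetical topic): pause and turn off.
	client.Subscribe("switch/kitchen/longpress", 0, func(c mqtt.Client, m mqtt.Message) {
		mu.Lock()
		suspendUntil = time.Now().Add(10 * time.Minute)
		mu.Unlock()
		c.Publish("light/kitchen/command", 0, false, "off")
	})

	select {} // keep the process running
}
```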
Note that the Shelly Motion should really be mounted in the orientation recommended by the manual. When the motion sensor lies on its side (or is upside down), detection is much poorer.
A smart plug is an easy way to turn off a power-hungry device while you're away, to make a lamp smart, or to power on a connected device like a kettle to boil water for making a tea.
My current use-cases are saving power for the stereo sound system connected to my PC, and saving power by powering up the devices in my gokrazy Continuous Integration test environment on-demand only.
While there are tons of vendors selling smart plugs, the selection narrows considerably when you look for one with a Swiss power plug.
The HomeMatic smart plug is expensive (55 CHF) and super bulky! As you can see, even if you connect it at the very end of a power strip, it still blocks the adjacent connector.
Worse: the way it's built (bulky side pointing away from the earth pin), I can't even insert it into 2 of the 3 power strips you see on the picture.
Somehow, even though it's so bulky, the device feels flimsy at the same time. I'm never 100% sure if the plug is inserted fully and correctly, and it's easy to accidentally turn off power when bumping against the smart plug with your foot.
Because itâs a HomeMatic device, you need a working Central Control Unit (CCU) to control it programmatically. Conceptually, I prefer smart plugs that can be used with a REST or MQTT API.
The only upside of this smart plug is that it can measure power. I occasionally use it for that.
The Sonoff S26 are much cheaper (≈12 USD when I bought mine) and come in a Swiss plug variant. Contrary to the HomeMatic ones, the Sonoff smart plugs are built "the right way around", meaning I can plug them into many Swiss power strips. Unfortunately, they also block adjacent connectors, but at least not as many as the HomeMatic.
The Open Source firmware Tasmota supports the Sonoff S26, but flashing them is a painful experience. You can't do it over the air; you need to access rather small serial console pins inside the device.
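For reference, the flashing step itself looks roughly like this once a 3.3V serial adapter is wired to those pins and the device is in flashing mode (the serial port path and file names are examples; the Tasmota documentation has the full procedure):

```
# back up the original firmware first (the S26 has 1 MB of flash):
esptool.py --port /dev/ttyUSB0 read_flash 0x0 0x100000 sonoff-s26-backup.bin

# erase the flash and write Tasmota (ESP8266-based Sonoffs want flash mode "dout"):
esptool.py --port /dev/ttyUSB0 erase_flash
esptool.py --port /dev/ttyUSB0 write_flash -fm dout 0x0 tasmota.bin
```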
Once you have them flashed with Tasmota, the devices work great.
One feature they lack is power measurement.
I would love to find a smart plug with a Swiss plug that supports power measurement and is compatible with Tasmota (or has built-in MQTT support), but until that product comes along, the Sonoff S26 are what I'm going to use.
Here is an architecture diagram of the devices I'm currently using:
To tie these different systems together, I use a Raspberry Pi running gokrazy, which in turn runs my regelwerk program. regelwerk only talks to MQTT, so all the different devices are connected to MQTT using small adapter programs such as my hue2mqtt or shelly2mqtt.
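To illustrate what such an adapter does, here is a heavily simplified sketch (not the actual hue2mqtt code) that translates MQTT commands into Philips Hue REST API calls. The bridge address, API username and topic layout are placeholders:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"strings"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// Placeholders for this sketch: adjust to your Hue bridge and API username.
const (
	bridge   = "http://hue-bridge.lan"
	username = "your-hue-api-username"
)

// setLight turns a Hue light on or off via the bridge's REST API (v1).
func setLight(id string, on bool) error {
	url := fmt.Sprintf("%s/api/%s/lights/%s/state", bridge, username, id)
	body := fmt.Sprintf(`{"on": %v}`, on)
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader([]byte(body)))
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	opts := mqtt.NewClientOptions().AddBroker("tcp://mqtt.lan:1883") // placeholder broker
	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}
	// e.g. publishing "on" to hue/lights/3/command turns on Hue light 3.
	client.Subscribe("hue/lights/+/command", 0, func(c mqtt.Client, m mqtt.Message) {
		parts := strings.Split(m.Topic(), "/") // hue/lights/<id>/command
		if len(parts) != 4 {
			return
		}
		if err := setLight(parts[2], string(m.Payload()) == "on"); err != nil {
			log.Printf("setting light %s: %v", parts[2], err)
		}
	})
	select {}
}
```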
A more off-the-shelf solution would be to use Node-RED, if you want to do a little programming, or Home Assistant if you want to do barely any programming.
I don't look for one vendor or one system that has components for everything. Instead, I choose the leading vendor in each domain. Compatibility between systems is generally poor, so I try to keep my compatibility requirements to a minimum.
To programmatically interact with the devices, the best bet is to use devices that are designed to be developer-friendly (e.g. Shelly devices support MQTT) or that at least have an official API with modules in my favorite programming language (e.g. Philips Hue). In terms of API, I expect to talk to a gateway device in my local network – I tried speaking Zigbee directly, for example, but found it inconvenient due to poor software support, sparse documentation and strange compatibility issues.
Direct device-to-device communication is nice from a reliability perspective, but on some battery-powered systems you pay for it with reduced battery runtime. For example, when using multiple light switches for the same room with IKEA TRÅDFRI, you pair one to the other, which also makes all signals go through it.
If possible, I select devices that have an open firmware available. Ideally, I can keep using the vendor's firmware, but if the vendor unexpectedly goes out of business, it's handy to have an alternative firmware available. Also, if the devices require a cloud service to function, using open firmware typically allows using them in your local network.
I have come to avoid WiFi where latency is important, e.g. between light switches and lights.
I stopped looking at the price too much and instead look at the user experience. Smart home is about comfort and convenience, and if a product doesn't delight in daily usage, why bother with it? Targeting the high end of mid-range devices seems like the sweet spot to me. Avoid anything more expensive than that, though – established players often re-brand third-party solutions and you only pay for the company name, not quality.
I usually try to stay on the latest Intel CPU generation when possible. But I decided to skip the i9-10900 (Comet Lake) and i9-11900 (Rocket Lake) series entirely, largely because they were still stuck on Intel's 14nm manufacturing process and didn't seem to offer much improvement.
The new i9-12900 (Alder Lake) delivered good benchmark results and is manufactured with the much newer Intel 7 process, so I was curious: would an upgrade be worth it?
Price | Type | Article |
---|---|---|
196 CHF | Case | Fractal Define 7 Solid (Midi Tower) |
89 CHF | Power Supply | Corsair RM750x 2018 (750 W) |
293 CHF | Mainboard | ASUS PRIME Z690-A (LGA1700, ATX) |
646 CHF | CPU | Intel Core i9-12900K |
113 CHF | CPU fan | Noctua NH-U12A |
30 CHF | Case fan | Noctua NF-A14 PWM (140 mm) |
770 CHF | RAM | Corsair Vengeance CMK32GX5M2A4800C40 (64 GB) |
408 CHF | Disk | WD Black SN850 (2 TB) |
605 CHF | GPU | GeForce RTX 2070 |
65 EUR | Network | Mellanox ConnectX-3 (10 Gbit/s) |
The Noctua NH-U12A CPU fan required an adapter ("Noctua NM-i17xx-MP78 SecuFirm2 mounting kit") to be compatible with the Intel LGA1700 socket. I requested the adapter on Noctua's website on November 5th, and it arrived on November 26th.
Anytime you need to access a PC's components, you'll deal with its case. Especially for a self-built PC, the case you choose determines how easy it is to assemble and later modify your PC.
Over the years, I have come to value the following aspects of a PC case:
I have been using Fractal cases for the past few years and came to generally prefer them over other brands because of their good build quality.
Hence I'm happy to report that the Fractal Define 7 (their latest generation at the time of writing) ticks all of the above boxes!
The case and power supply work well together in terms of cable management. It was a joy to route the cables.
It's very easy to open the case doors (they clip in place), or remove the front panel. This is definitely the best PC case I have seen so far in terms of quick and easy access.
Here's how clean the inside looks. Most cables are routed with very short ways to the back, where the case offers plenty of convenient cable guides:
You might also find this YouTube video review of the Fractal Define 7 interesting:
When I first powered everything on, I waited for a while, but never saw any picture on my monitor. The PC eventually rebooted, multiple times in a row. I took that as a bad sign and turned it off to prevent further damage.
Turns out I should have just waited until it would eventually start up!
It took multiple minutes for the machine to eventually start. I'm not 100% sure what causes that, but I heard in a Linus Tech Tips YouTube video that DDR5 requires time-consuming memory training when powering up with a fresh memory configuration, so that seems plausible.
In any case, my advice is: be patient when waiting for this machine to start up.
I originally ordered all components on November 5th 2021. It took a while for the mainboard to become available, but almost everything shipped on November 15th – except for the DDR5 RAM.
Until late December, I was not able to find any available DDR5 RAM in Switzerland.
The shortage is so pronounced that some YouTubers recommend going with DDR4 mainboards for now, which manufacturers are scrambling to introduce in their lineups. I did really want to squeeze out the last few extra percent in memory-intensive workloads, so I decided to wait.
Where possible, I like only changing one thing at a time. In this case, I wanted to change the hardware, but keep using my Linux installation as-is.
To copy my Linux installation over, I plugged my old M.2 SSD into the new machine, and then started a live Linux environment, so that neither my old nor my new SSD were in use. My preferred live Linux is grml (current version: 2021.07), which I copied to a USB memory stick and booted the machine from it.
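grml images are hybrid ISOs, so writing one to a USB memory stick is a single dd invocation, roughly like this (the file name follows the 2021.07 release; replace /dev/sdX with your memory stick and double-check the device name with lsblk first):

```
sudo dd if=grml64-full_2021.07.iso of=/dev/sdX bs=4M oflag=sync status=progress
```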
In the grml live Linux environment, I copied the full M.2 SSD contents from old to new:
grml# dd \
if=/dev/disk/by-id/nvme-Force_MP600_<TAB> \
of=/dev/disk/by-id/nvme-WD_BLACK_SN850_2TB_<TAB> \
bs=5M \
status=progress
For some reason, the transfer was super slow. Last time I transferred the contents of a Samsung 960 Pro to a Samsung 970 Pro, it took only 16 minutes. But this time, copying the Force MP600 to a WD Black SN850 took many hours!
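In hindsight, a quick read-only test of the source disk, something like the following, might have narrowed down whether the reads or the writes were the bottleneck (this reads 10 GB and throws it away):

```
grml# dd \
  if=/dev/disk/by-id/nvme-Force_MP600_<TAB> \
  of=/dev/null \
  bs=5M \
  count=2000 \
  iflag=direct \
  status=progress
```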
Once the data was transferred, I unplugged the old M.2 SSD and booted the system.
The hostname remains the same, and the network addresses are tied to the MAC address of the network card that I moved to the new machine. So, I didnât have to adjust anything in the new machine and could just boot into my usual environment.
By default, the memory uses 4000 MHz instead of the 4800 MHz advertised on the box.
I figured it should be safe to try out the XMP option because it is shown as part of ASUS's "EZ Mode" welcome page in the UEFI setup.
So far, I have not noticed any issues when running the system with XMP enabled.
Update February 2022: I have experienced weird crashes that seem to have gone away after disabling XMP. I'll leave it disabled for now.
The Fractal Define case comes with a built-in fan controller.
I recommend not using the Fractal fan controller, as you can't control it from Linux!
Instead, I have plugged my fans into the mainboard directly.
In the UEFI setup, I have configured all fan speeds to use the "silent" profile.
With Linux 5.15.11, some fan speeds and temperatures are displayed, but oddly enough it only shows 2 out of the 3 fans I have connected:
% sudo sensors
nct6798-isa-0290
Adapter: ISA adapter
[âŠ]
fan1: 0 RPM (min = 0 RPM)
fan2: 944 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 625 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
fan6: 0 RPM (min = 0 RPM)
fan7: 0 RPM (min = 0 RPM)
SYSTIN: +35.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
CPUTIN: +40.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: -128.0°C sensor = thermistor
AUXTIN1: +24.0°C sensor = thermistor
AUXTIN2: +28.0°C sensor = thermistor
AUXTIN3: +31.0°C sensor = thermistor
PECI Agent 0 Calibration: +40.0°C
[âŠ]
Unfortunately, writing to the /sys/class/hwmon/hwmon2/pwm2 file does not seem to change its value, so I don't think one can control the fans via PWM from Linux (yet?).
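For reference, this is what taking manual control of a fan normally looks like via the standard hwmon sysfs interface; on this board, the second write simply does not stick (hwmon2/pwm2 corresponds to the numbering shown above):

```
# 1 selects manual PWM control per the hwmon sysfs ABI:
echo 1 | sudo tee /sys/class/hwmon/hwmon2/pwm2_enable
# duty cycle between 0 (off) and 255 (full speed):
echo 128 | sudo tee /sys/class/hwmon/hwmon2/pwm2
```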
I have set all fans to silent in the UEFI setup, which is sufficient to not notice any noise.
After cloning my old disk to the new disk, I took the opportunity to run a few of the time-consuming tasks from my day-to-day work that I could think of.
On both machines, I configured the CPU governor to performance for stable results.
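One way to do that is via the cpufreq sysfs interface; this is a sketch, and the setting does not persist across reboots:

```
# switch all cores to the performance governor:
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```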
Keep in mind that I'm comparing two unique PC builds as they are (not under controlled and fair conditions), so the results might not necessarily be representative. For example, it seems like the SSD performance in the old machine was heavily degraded due to an incorrect TRIM configuration.
name | old | new |
---|---|---|
build Go 1.18beta1 (src/make.bash) | ≈45s | ≈29s |
gokrazy/rsync tests | ≈8s | ≈5s |
gokrazy UEFI test | ≈9s | ≈8s |
distri cryptimage (cold cache) | ≈143s | ≈18s |
gokrazy Linux compilation | 215s | 109s |
As we can see, in all of my tests, the new PC achieves measurably better times!
Not only in the benchmarks above, but also subjectively, the new machine feels fast!
Already in the first few days of usage, I notice how time-consuming tasks such as tracking down a Linux kernel issue (requires multiple Linux kernel builds), are a little less terrible thanks to the faster machine :)
The Fractal Define 7 case is great and will likely serve as a good base for upgrades over the next couple of years, just like its predecessor (but perhaps even longer).
As far as I can tell, the machine works well and is compatible with Linux.