
Raspberry Pi 3B+ netboot

November 04, 2021 - tech stack

It is no secret that the most fragile part of a Raspberry Pi setup is the SD card. I had an RPi running Docker that booted over the network, but it still used an SD card for /var/lib/docker [1]. Of course this went wrong at some point: the machine did not boot anymore.

So naturally I was wondering:

Is it possible to run my Raspberry Pi setup with Docker completely without an SD card?

(spoiler alert: yes!)

What did I consider

I briefly tried the vfs storage driver, but that was unbearably slow [2].

So how about using a loop device mount on NFS? Basically, we create a big file on an NFS share and map that as a block device, which we can then mount. At first I thought it would be silly and not performant, but then I realized that this is how macOS does Time Machine backups on a NAS as well. I decided to give it a try.

Result

The loop device mount over NFS worked so much better than I thought, so I wanted to document this in case someone else (me, in the future?) wants to set this up and is wondering how I got this to work.

Ingredients

The parts that I used were:

A Raspberry Pi 3B+, which has a built-in netboot mode: just take out the SD card [3].

A NAS, which I use for NFS shares.

The NAS can do TFTP boot as well, but for other reasons I need my router to handle DHCP (and bootp). If you do not have the requirement to run DHCP from your router, your setup might be simpler with just an RPi and a NAS.

Steps to reproduce

I tried to document all the steps to reproduce, but I'm sure I forgot some part that is obvious to me now.

On my NAS

On my NAS I configured three shares: /tftpboot, /nfsroot and /docker.

I'm not sure exactly how I created these, since it's been a while, but I think I just copied /boot to nas.local:/tftpboot and / (without /boot) to nas.local:/nfsroot. The /docker share started out empty.

Since the RPi is going to load its boot files from /tftpboot, we can configure how it boots in /tftpboot/cmdline.txt:

root=/dev/nfs nfsroot=192.0.2.2:/nfs_root,vers=4.1,proto=tcp rw ip=dhcp console=tty1 elevator=deadline rootwait cgroup_enable=memory cgroup_memory=1

Here 192.0.2.2 represents my NAS (but not really, I'm using the RFC 5737 documentation ranges). I think the important parts for netboot are the root=/dev/nfs, nfsroot=, ip=dhcp and rootwait arguments.
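As a quick sanity check, the arguments that seem to matter for netboot can be tested with a few lines of shell. This is only a minimal sketch: check_cmdline is a made-up helper name, and the list of required arguments is just the four mentioned above, not an authoritative list.

```shell
#!/bin/sh
# check_cmdline is a hypothetical helper, not part of any Raspberry Pi tooling.
# It prints "ok" if the given kernel command line contains the four
# netboot-related arguments, and otherwise lists the ones that are missing.
check_cmdline() {
    line=$1
    missing=""
    for want in 'root=/dev/nfs' 'nfsroot=' 'ip=dhcp' 'rootwait'; do
        case " $line " in
            *" $want"*) ;;                    # an argument starting with $want is present
            *) missing="$missing $want" ;;    # remember what was not found
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Example run against the command line from this post.
check_cmdline 'root=/dev/nfs nfsroot=192.0.2.2:/nfs_root,vers=4.1,proto=tcp rw ip=dhcp console=tty1 elevator=deadline rootwait cgroup_enable=memory cgroup_memory=1'
```

On a running RPi, something like check_cmdline "$(cat /proc/cmdline)" should apply the same check to the command line the kernel actually booted with.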

After getting the root filesystem ready, I updated the /etc/fstab file:

[...]
nas.local:/tftpboot	/boot	nfs	defaults,vers=4.1,ro	0	0
nas.local:/docker	/nfs/docker	nfs	defaults,vers=4.1	0	0

/nfs/docker/loop	/var/lib/docker	ext4 loop,x-systemd.requires=/nfs/docker 0 2

The /boot is mounted, more for convenience than anything else (if I want to change a boot configuration, I remount it read-write before changing and rebooting). Some things I want to point out: this will fail to mount /nfs/docker/loop since the file is not created yet, so for the first time, it's probably wise to comment out the last line.

Also, after we set up the loop device mount and enable this line, the x-systemd.requires=/nfs/docker option tells systemd to mount it only after /nfs/docker has been mounted (since the file we need is on that share).

On my router

On my router I already ran the ISC DHCP server and dnsmasq for other reasons, so I decided to extend that config to also provide TFTP boot.

I added configuration in /etc/dhcp/dhcpd.conf (taken from here):

    option space RPi code width 1 length width 1;
    option RPi.discovery code 6 = unsigned integer 8;
    option RPi.menu-prompt code 10 = text;
    option RPi.menu-item code 9 = text;

    group {
        vendor-option-space RPi;
        option RPi.discovery 3;
        option RPi.menu-prompt "PXE";
        option RPi.menu-item "Raspberry Pi Boot";

        host water {
            allow booting;
            allow bootp;

            hardware ethernet aa:bb:cc:dd:ee:ff;
            fixed-address 192.0.2.4;
            next-server 192.0.2.2;
            filename "bootcode.bin";
        }
    }

The next-server is the NAS, while fixed-address is the IP address of the RPi. This configuration makes sure that the RPi will get an IP address and get instructed to download bootcode.bin from 192.0.2.2 via TFTP [4].

Speaking of TFTP, let's set that up. I added a new file in /etc/dnsmasq.d (local-pxe.conf) [5]:

enable-tftp=eth0
tftp-root=/tftpboot
tftp-no-fail
pxe-service=0,"Raspberry Pi Boot"

On my router, I have also mounted /tftpboot so TFTP can serve it. I do not like this part; maybe I can make it nicer by moving all the DHCP/TFTP stuff to my NAS, so that my router is not needed in this step. I added this line to /etc/fstab on my router:

nas.local:/tftpboot	/tftpboot	nfs	defaults,vers=4.1,ro	0 0

On my RPi

On the RPi we'll set up the loop device mount (only needed once, the rest is already in the fstab).

dd if=/dev/zero of=/nfs/docker/loop bs=1M seek=10000 count=1  # sparse file of about 10 GB (10001 MiB)
losetup /dev/loop0 /nfs/docker/loop  # attach the file to a loop device manually, once, for the next step
mkfs.ext4 /dev/loop0  # create an ext4 filesystem on the loop device
losetup -d /dev/loop0  # detach the loop device again
mount /var/lib/docker  # the entry is already in /etc/fstab (uncomment it first), so it should now mount cleanly
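The dd line above writes a single 1 MiB block after seeking past 10000 blocks, so the result is a sparse file: it looks like 10001 MiB, but takes up almost no space on the share until Docker starts writing to it. The sketch below shows the same trick at a smaller scale in a temporary directory. The file name loop-demo is made up for the demo, and it assumes GNU stat and a filesystem that supports sparse files.

```shell
#!/bin/sh
# Create a small sparse file the same way as the big image, in a temp dir.
tmpdir=$(mktemp -d)
img="$tmpdir/loop-demo"          # hypothetical file name, only for this demo

# Skip 99 blocks of 1 MiB, then write one block: 100 MiB apparent size.
dd if=/dev/zero of="$img" bs=1M seek=99 count=1 2>/dev/null

apparent_bytes=$(stat -c %s "$img")     # size as shown by ls -l (GNU stat)
used_kib=$(du -k "$img" | cut -f1)      # space actually allocated on disk

echo "apparent: $apparent_bytes bytes, allocated: $used_kib KiB"
rm -rf "$tmpdir"
```

On the real image, ls -lh /nfs/docker/loop and du -h /nfs/docker/loop should show the same kind of gap, which shrinks as Docker fills the filesystem.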

Before starting Docker, I wanted to enable one more feature: user namespace remapping. I created (or updated) /etc/docker/daemon.json, adding this key next to any other settings already in there:

{
    "userns-remap": "default"
}
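One thing to watch out for: daemon.json has to be strict JSON, so a comment line or a trailing comma will keep the daemon from starting. A minimal sketch of a syntax check before restarting Docker is shown below. json_valid is a made-up helper name, the two files are throwaway examples, and it assumes python3 is installed.

```shell
#!/bin/sh
# json_valid is a hypothetical helper: it succeeds if the file parses as JSON.
# Assumes python3 is available; its json.tool module rejects comments and
# trailing commas.
json_valid() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

tmpdir=$(mktemp -d)

# A strict JSON file, and one with a comment plus a trailing comma.
printf '{\n    "userns-remap": "default"\n}\n' > "$tmpdir/good.json"
printf '{\n    "userns-remap": "default",\n    // other config\n}\n' > "$tmpdir/bad.json"

if json_valid "$tmpdir/good.json"; then good=valid; else good=invalid; fi
if json_valid "$tmpdir/bad.json"; then bad=valid; else bad=invalid; fi

echo "good.json: $good, bad.json: $bad"
rm -rf "$tmpdir"
```

On the RPi itself, the same helper can be pointed at /etc/docker/daemon.json before restarting the daemon.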

Conclusions

I like this setup (full netboot, no SD card) better than my previous one (with an SD card). It is reliable and performs well. There are some things I do not like, though. One is that the setup is a bit complex (NFS from the NAS, /tftpboot mounted on my router, TFTP from the router; why not everything from the NAS?). Another is that some of the config is not fully clear to me (see notes 4 and 5).

1

I have a NAS, but I did not put /var/lib/docker directly on a network share, because overlay2 (the recommended storage driver for Linux) does not work over NFS (actually, it only works on xfs or ext4).

2

I mean, they literally say that the performance is poor; they are not kidding.

3

Other models apparently can do it too, but might need some extra configuration.

4

You probably noticed an inconsistency: I said the RPi boots via TFTP from my router, not my NAS. I think the RPi ignores the next-server and contacts the machine that serves TFTP (my router) instead. I should debug this further, but it is less important to me since it works. I know this will confuse me later on though.

5

The enable-tftp=eth0 is odd, since my router does not have an eth0 (interfaces are renamed), so I am not sure why that works. Something to improve.