Raspberry Pi 3B+ netboot
November 04, 2021 - It is no secret that the most fragile part of a Raspberry Pi setup is the SD card. I had an RPi running Docker that booted over the network but still used an SD card for /var/lib/docker [1]. Of course this went wrong at some point: the machine no longer booted.
So naturally I was wondering:
Is it possible to run my Raspberry Pi setup with Docker completely without an SD card?
(spoiler alert: yes!)
What I considered
I briefly tried the vfs storage driver, but that was unbearably slow [2].
So how about using a loop device on NFS? Basically, we create a big file on an NFS share and map it as a block device, which we can then mount. At first I thought it would be silly and slow, but then I realized that this is how macOS does Time Machine backups to a NAS as well. I decided to give it a try.
Result
The loop device mount over NFS worked so much better than I thought, so I wanted to document this in case someone else (me, in the future?) wants to set this up and is wondering how I got this to work.
Ingredients
The parts that I used were:
- a Raspberry Pi 3B+ with 5.1V power supply and network cable
- a NAS (Synology DS920+)
- a Linux router (homebrew)
The Raspberry Pi 3B+ has a built-in netboot mode: just take out the SD card [3]. I use the NAS for NFS shares.
The NAS can do TFTP boot as well, but for other reasons I need my router to handle DHCP (and BOOTP). If you do not need to run DHCP from your router, your setup might be simpler with just an RPi and a NAS.
Steps to reproduce
I tried to document all the steps to reproduce, but I'm sure I forgot some part that is obvious to me now.
On my NAS
On my NAS I configured several shares:
- /tftpboot: used for booting
- /nfsroot: the root filesystem
- /docker: used to store one big file
I'm not sure exactly how I created these, since it's been a while, but I think I just copied /boot to nas.local:/tftpboot and / (without /boot) to nas.local:/nfsroot. The /docker share started out empty.
Since the RPi boots from the files in /tftpboot, we can configure how it boots in /tftpboot/cmdline.txt:
root=/dev/nfs nfsroot=192.0.2.2:/nfs_root,vers=4.1,proto=tcp rw ip=dhcp console=tty1 elevator=deadline rootwait cgroup_enable=memory cgroup_memory=1
Here 192.0.2.2 represents my NAS (but not really, I'm using the RFC 5737 documentation ranges). I think the important parts for netboot are the root=/dev/nfs, nfsroot=, ip=dhcp and rootwait arguments.
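To make the moving parts of that line easier to see, here is a minimal sketch that assembles it from two values. The function name build_cmdline and its arguments are my own illustration, not part of any Raspberry Pi tooling; the options themselves are copied from the line above.

```shell
# Assemble the kernel command line for an NFS-root netboot.
# $1: IP address of the NFS server, $2: exported path of the root filesystem.
# Both are placeholders; the remaining options are copied from the post.
build_cmdline() {
    nas_ip="$1"
    export_path="$2"
    # cmdline.txt has to stay on a single line, so everything is printed at once.
    printf '%s\n' "root=/dev/nfs nfsroot=${nas_ip}:${export_path},vers=4.1,proto=tcp rw ip=dhcp console=tty1 elevator=deadline rootwait cgroup_enable=memory cgroup_memory=1"
}

# Example with the documentation address used in this post.
build_cmdline 192.0.2.2 /nfs_root
```

Redirecting the output into cmdline.txt on the TFTP share would replace the existing file, so it is probably worth keeping a copy of the old one first.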
After getting the root filesystem ready, I updated the /etc/fstab file inside it (the one the RPi will use):
[...]
nas.local:/tftpboot /boot nfs defaults,vers=4.1,ro 0 0
nas.local:/docker /nfs/docker nfs defaults,vers=4.1 0 0
/nfs/docker/loop /var/lib/docker ext4 loop,x-systemd.requires=/nfs/docker 0 2
/boot is mounted more for convenience than anything else (if I want to change the boot configuration, I remount it read-write before making the change and rebooting).
Some things I want to point out: mounting /nfs/docker/loop will fail at first since the file has not been created yet, so for the first boot it's probably wise to comment out the last line.
Also, once the loop device is set up and this line is enabled, the x-systemd.requires=/nfs/docker option tells systemd to mount the last line only after /nfs/docker has been mounted (since the file we need is on that share).
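Commenting the loop entry out for the first boot and enabling it again later can be scripted. The sketch below is one way to do it with awk; the function names are made up for this example, and it takes the file as an argument so it can be tried on a copy of the fstab before touching the real one.

```shell
# Run an awk program over FILE and write the result back,
# keeping the original file's permissions and ownership.
rewrite_in_place() {
    file="$1"; shift
    tmp="$(mktemp)" || return 1
    awk "$@" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Comment out every active fstab line whose mount point (field 2) matches $2.
disable_fstab_entry() {
    rewrite_in_place "$1" -v mp="$2" \
        '$1 !~ /^#/ && $2 == mp { print "#" $0; next } { print }'
}

# Undo the above: strip the leading "#" from a commented line with that mount point.
enable_fstab_entry() {
    rewrite_in_place "$1" -v mp="$2" \
        '/^#/ { line = $0; sub(/^#[ \t]*/, "", line); split(line, f, /[ \t]+/)
                if (f[2] == mp) { print line; next } }
         { print }'
}
```

For example, `disable_fstab_entry /etc/fstab /var/lib/docker` before the first boot, and `enable_fstab_entry /etc/fstab /var/lib/docker` once the backing file and its filesystem exist.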
On my router
On my router I already ran the ISC DHCP server and dnsmasq for other reasons, so I decided to extend that config to also provide TFTP boot.
I added this configuration to /etc/dhcp/dhcpd.conf (taken from here):
option space RPi code width 1 length width 1;
option RPi.discovery code 6 = unsigned integer 8;
option RPi.menu-prompt code 10 = text;
option RPi.menu-item code 9 = text;
group {
    vendor-option-space RPi;
    option RPi.discovery 3;
    option RPi.menu-prompt "PXE";
    option RPi.menu-item "Raspberry Pi Boot";
    host water {
        allow booting;
        allow bootp;
        hardware ethernet aa:bb:cc:dd:ee:ff;
        fixed-address 192.0.2.4;
        next-server 192.0.2.2;
        filename "bootcode.bin";
    }
}
The next-server is the NAS, while fixed-address is the IP address of the RPi. This configuration makes sure that the RPi gets an IP address and is instructed to download bootcode.bin from 192.0.2.2 via TFTP [4].
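If more than one RPi should netboot, each needs its own host block. As a rough sketch, a small helper can print such a block from four values; the name dhcp_host_stanza is hypothetical and the statements are simply the ones from the host block above.

```shell
# Print a dhcpd.conf host block for one netbooting RPi.
# $1: host name, $2: MAC address, $3: fixed IP address, $4: TFTP server (next-server).
dhcp_host_stanza() {
    cat <<EOF
host $1 {
    allow booting;
    allow bootp;
    hardware ethernet $2;
    fixed-address $3;
    next-server $4;
    filename "bootcode.bin";
}
EOF
}

# Example with the placeholder values from this post.
dhcp_host_stanza water aa:bb:cc:dd:ee:ff 192.0.2.4 192.0.2.2
```

The output is meant to be pasted inside the group block, so that the host still inherits the RPi vendor options.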
Speaking of TFTP, let's set that up. I added a new file in /etc/dnsmasq.d (local-pxe.conf) [5]:
enable-tftp=eth0
tftp-root=/tftpboot
tftp-no-fail
pxe-service=0,"Raspberry Pi Boot"
On my router, I have also mounted /tftpboot so that TFTP can serve it. I do not like this part; maybe I can make it nicer by moving all the DHCP/TFTP stuff to my NAS so that my router is not needed in this step. I added this line to /etc/fstab on my router:
nas.local:/tftpboot /tftpboot nfs defaults,vers=4.1,ro 0 0
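Before powering on the RPi, it can help to confirm that the mounted TFTP root actually contains the files the boot will ask for. This is a small sketch with a made-up function name; which files to check is up to you. bootcode.bin and cmdline.txt are the two named in this post, and anything else from your /boot can be added to the list.

```shell
# Print the names of expected boot files that are missing from a TFTP root.
# $1: directory to check; remaining arguments: file names that should exist there.
missing_boot_files() {
    dir="$1"; shift
    missing=""
    for name in "$@"; do
        [ -f "$dir/$name" ] || missing="$missing $name"
    done
    # Drop the leading space; the output is an empty line when nothing is missing.
    printf '%s\n' "${missing# }"
}
```

For example, `missing_boot_files /tftpboot bootcode.bin cmdline.txt` prints an empty line when both files are present.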
On my RPi
On the RPi we'll set up the loop device (only needed once; the rest is already in the fstab).
dd if=/dev/zero of=/nfs/docker/loop bs=1M seek=10000 count=1 # sparse file of 10001 MiB, just under 10 GiB
losetup /dev/loop0 /nfs/docker/loop # attach the file once manually for the next step
mkfs.ext4 /dev/loop0 # create an ext4 filesystem on the loop device
losetup -d /dev/loop0 # detach the loop device again
mount /var/lib/docker # the entry is already in /etc/fstab (re-enable it if you commented it out), so this should now mount cleanly
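The dd invocation relies on a sparse file: seek skips ahead without writing anything, and only the final block is written. The resulting size is (seek + count) × bs, so seek=10000 with bs=1M gives 10001 MiB. The sketch below reproduces this at a smaller scale in a temporary file; the function name is made up for the example.

```shell
# Create a sparse file the same way as above and print its apparent size in bytes.
# $1: number of MiB to seek before writing a single 1 MiB block.
sparse_apparent_size() {
    seek_mib="$1"
    demo="$(mktemp)" || return 1
    dd if=/dev/zero of="$demo" bs=1M seek="$seek_mib" count=1 2>/dev/null
    # wc -c reports the apparent size, holes included.
    wc -c < "$demo" | tr -d ' '
    rm -f "$demo"
}

sparse_apparent_size 10   # prints 11534336, i.e. (10 + 1) MiB
```

On filesystems that support sparse files, `du -k` on such a file typically reports far less than the apparent size, because the skipped region is not allocated until something writes to it.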
Before starting Docker, I wanted to enable one more feature: user namespace remapping. I created (or updated) /etc/docker/daemon.json:
{
  "userns-remap": "default"
}
Any other settings go in the same object; JSON does not allow comments, so I left the placeholder out.
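As far as I understand the Docker documentation, the "default" value makes the daemon create a dockremap user and group and use their subordinate ID ranges from /etc/subuid and /etc/subgid. A minimal check after restarting Docker could look like this; the function name is my own, and the dockremap name is an assumption taken from that documentation rather than from this setup.

```shell
# Succeed when the subordinate-ID file $1 contains an entry for user or group $2.
has_subid_entry() {
    grep -q "^$2:" "$1"
}
```

For example, `has_subid_entry /etc/subuid dockremap && has_subid_entry /etc/subgid dockremap && echo "remapping ranges present"`.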
Conclusions
I like this setup (full netboot, no SD card) better than my previous one (with an SD card). It works reliably and performs well. There are some things I do not like, though. One is that the setup is a bit complex (NFS from the NAS, /tftpboot mounted on my router, TFTP from the router; why not everything from the NAS?). Another is that some of the config is not fully clear to me (see footnotes 4 and 5).
[1] I have a NAS, but I did not use a network share because overlay2 (the recommended storage driver for Linux) does not work over NFS (actually, only on xfs or ext4).
[2] I mean, the documentation literally says that the performance is poor; they are not kidding.
[3] Other models apparently can do it too, but might need some extra configuration.
[4] You probably noticed an inconsistency: I said the RPi boots via TFTP from my router, not my NAS. I think the RPi ignores the next-server and contacts the machine that serves TFTP (my router) instead. I should debug this further, but it is less important to me since it works. I know this will confuse me later on, though.
[5] The enable-tftp=eth0 is odd, since my router does not have an eth0 (interfaces are renamed), so I am not sure why that works. Something to improve.