Bringing zpool checkpoints to a FreeBSD bootloader

Aug. 18, 2020, 10:21 p.m.

Almost two years ago I wrote a blog post about checkpoints in ZFS. I didn’t hide that I was a big fan of them. Two years later, I still feel that they are an underappreciated feature in the ZFS world, so I decided to do something about that.

Currently, one of the best practices for upgrading your operating system is to use boot environments. They are a great feature for managing multiple kernels and userlands. They work by switching which ZFS datasets are mounted, and each dataset holds its own version of the system. Unfortunately, boot environments have their limitations. If we, for example, upgrade our ZFS pool, we may not be able to boot older versions of the system anymore, because their ZFS code may not understand the features enabled by the upgrade.

The big advantage of boot environments is that they have very good tools. The two main ones are beadm (created by vermaden) and bectl (which is currently in the FreeBSD base system). These tools allow us to create and manage boot environments.

Boot environments are also easy to manage from the bootloader. During boot, we can choose which boot environment we want to use.
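
A typical routine around an upgrade can be sketched as follows. This is only a minimal Python sketch that builds the command lines rather than running them; the helper names and the environment name "pre-upgrade" are made up for illustration, while create, activate, and list are bectl subcommands from the base system.

```python
def be_backup_commands(name):
    """Commands to run before an upgrade (helper name is hypothetical)."""
    return [
        ["bectl", "create", name],  # new environment cloned from the booted one
        ["bectl", "list"],          # confirm that the new environment exists
    ]


def be_fallback_commands(name):
    """Commands to run if the upgraded system misbehaves."""
    return [
        ["bectl", "activate", name],  # boot into the saved environment next time
    ]


if __name__ == "__main__":
    # Only print the commands; running them requires root on a ZFS-root system.
    for argv in be_backup_commands("pre-upgrade") + be_fallback_commands("pre-upgrade"):
        print(" ".join(argv))
```

If the new system does not boot at all, the saved environment can still be picked from the bootloader menu instead of running the activate step.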

Another way of upgrading your box is to use ZFS snapshots. In this case, we can snapshot all the datasets in the system. If something goes wrong, we can roll back to the previous version. I use them for upgrading databases, among other tasks. Before the upgrade, you create a snapshot, then you run some SQL to change the schema. If something goes wrong, you simply roll back the state of the datasets.
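
The snapshot-then-roll-back routine can be sketched like this. Again, this is a minimal Python sketch that only assembles the zfs command lines; the helper names, the pool name "tank", the dataset names, and the snapshot tag are assumptions chosen for the example.

```python
def snapshot_commands(pool, tag):
    """Snapshot a pool's datasets before an upgrade (hypothetical helper)."""
    # -r snapshots the named dataset together with all of its descendants.
    return [["zfs", "snapshot", "-r", f"{pool}@{tag}"]]


def rollback_commands(datasets, tag):
    """Roll the listed datasets back to the snapshot taken earlier."""
    # zfs rollback works on one dataset at a time, so every dataset is listed.
    # -r also destroys any snapshots newer than the one we roll back to.
    return [["zfs", "rollback", "-r", f"{ds}@{tag}"] for ds in datasets]


if __name__ == "__main__":
    # Only print the commands; "tank/db" and "tank/db/wal" are example datasets.
    commands = snapshot_commands("tank", "pre-upgrade")
    commands += rollback_commands(["tank/db", "tank/db/wal"], "pre-upgrade")
    for argv in commands:
        print(" ".join(argv))
```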

This technique is very handy but also has some subtle limitations. For example, if you add some datasets, remove them, or upgrade the pool during your upgrade process, none of these operations can be rolled back. That is why zpool checkpoints are so interesting. They are not based on a dataset but on the whole pool. Thanks to that, we can roll back a pool upgrade or a dataset creation or deletion. That ability can be very handy in a complicated upgrade process.

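The zpool checkpoint remembers the state of the whole pool as of a single transaction group (TXG). That means that no data that existed at that point will disappear as long as the checkpoint exists. The flip side is that blocks freed after the checkpoint was taken keep occupying space until the checkpoint is discarded.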

Another caveat is that you can rewind to a checkpoint only while the pool is being imported. That means that if we have root on ZFS, we will not be able to rewind it once the system is booted. We are also unable to rewind it in single-user mode because the pool is already imported. We could boot from a separate file system and rewind it, but can’t we do better?

Just a friendly reminder: if you rewind to the checkpoint, you will lose everything written to the pool after the checkpoint was taken, so be careful!
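
For a pool that is not the root pool, the checkpoint life cycle can be sketched as follows. This is a minimal Python sketch that only builds the zpool command lines; the helper names and the pool name "tank" are assumptions made for the example.

```python
def checkpoint_commands(pool):
    """Take a checkpoint before a risky operation (hypothetical helper)."""
    return [["zpool", "checkpoint", pool]]


def rewind_commands(pool):
    """Return the whole pool to the checkpointed state."""
    return [
        # Rewinding happens at import time, so the pool is exported first.
        ["zpool", "export", pool],
        # Everything written after the checkpoint is lost at this step.
        ["zpool", "import", "--rewind-to-checkpoint", pool],
    ]


def discard_commands(pool):
    """Drop the checkpoint once the upgrade has proven itself."""
    return [["zpool", "checkpoint", "-d", pool]]


if __name__ == "__main__":
    # Only print the commands; "tank" is an example pool name.
    for argv in checkpoint_commands("tank") + rewind_commands("tank"):
        print(" ".join(argv))
```

The export step is exactly what a running system cannot do with its own root pool, which is why the rewind has to happen earlier in the boot process.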

As we can see, ZFS gives us a lot of options to safely upgrade our operating systems: boot environments, snapshots, and checkpoints. But as with all great features, we need some tools to help us use them. Otherwise, they are only great features. I hope that with support in the FreeBSD bootloader, more users will use checkpoints. So I guess it’s time to upgrade your bootloader…