
Fedora defaults to Btrfs
You mentioned Debian and openSUSE but not Fedora, which defaults to Btrfs: https://docs.fedoraproject.org/en-US/workstation-docs/disk-config/
There is some work going on to have Fedora use Snapper; see https://discussion.fedoraproject.org/t/snapper-by-default/113680/2
Timeshift can be made to work on Fedora, though it's a bit tricky as the partition layout TS expects is not that of Fedora. Not insurmountable, but there it is.
I have used snapshots on Fedora with Btrfs via Timeshift, though not integrated with the whole OS (e.g. booting into a snapshot). I especially appreciated the ability to navigate/open files across snapshots as if they were the current version.
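For anyone wanting to try it, Timeshift is scriptable from the command line too; a minimal sketch, assuming Timeshift is installed and configured in Btrfs mode:

    # Take an on-demand snapshot before fiddling with anything
    sudo timeshift --create --comments "before kernel update"
    # See what snapshots exist
    sudo timeshift --list
    # Roll back (prompts for which snapshot to restore)
    sudo timeshift --restore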
[Author here]
> NixOS supports snapshotting
Do you mean in the sense that any deployed NixOS image from the same specification file is _effectively_ a snapshot? If so, given my limited understanding of NixOS, yes -- but that is an almost accidental byproduct of the way Nix works. It's not similar in implementation, or in how it can be used.
Every time the configuration is modified and applied, a new generation (at least I think that’s what Nix calls it) is created. If a new generation has issues, you can boot into a prior one in the boot menu as you would with snapshots, or you can (depending on what was changed) apply a previous generation to a running system.
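For reference, that cycle looks roughly like this (standard NixOS commands; the profile path shown is the usual default):

    # Rebuild from configuration.nix and switch, creating a new generation
    sudo nixos-rebuild switch
    # List the generations the boot menu will offer
    sudo nix-env --list-generations --profile /nix/var/nix/profiles/system
    # Switch the running system back to the previous generation
    sudo nixos-rebuild switch --rollback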
To me, at least on a conceptual level, the 'dual-root' option seems an elegant and 'simple' way forward when it comes to mitigating a lot of possible vulnerabilities at a low level. It's the kind of thing one would have hoped would have long been a basic and well developed feature of any OS being used in critical and mass-usage environments.
But we live in a world where, more often than not, 'cheap trumps quality (and good sense)', or, to put it another way, 'money' takes priority over 'wellbeing'.
[Author here]
> It's the kind of thing one would have hoped would have long been a basic and well developed feature of any OS being used in critical and mass-usage environments.
Well, it *is* in ChromeOS, which is the most widely-used desktop Linux by at least an order of magnitude.
So, yes, I agree.
If you mean "this should be in mainstream Linuxes", well, yes, it probably should, but _look_ how upset people get at systemd, a much more modest change. Look at the near-fanatical loyalty to RPM or DEB formats that some folks hold.
Also, of course, see Vanilla OS:
https://www.theregister.com/2024/07/31/vanilla_os_friendly_radical
Dual-root failover profoundly changes the way packaging works, so it's only reasonable to combine it with an immutable distro. Immutability has great promise, IMHO.
I'm not sure systemd represents a more modest change.
The way I imagine this working, in a fairly broad sense, is:
* A-root is mounted at /
* Snapshot / as B-root
* mount B-root on /B-root (and the various bind mounts)
* chroot to /B-root
* apt-get update && apt-get upgrade
* update fstab to mount B-root as / at next boot
Now all you need is a little script in the initramfs so that if the machine fails to boot, say, twice in a row, it updates fstab to mount A-root at / and you're back where you were before the update.
If you wanted to go further, you could keep a set number of root snapshots the way something like rsnapshot does and rotate them out: next time snapshot C-root, shuffle them down the alphabet, and repeat, so there are always N good versions to revert to plus one experimental one. A rough sketch of the basic flow follows.
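This is only an illustration, assuming a Btrfs layout with the top level mounted at /mnt/btrfs and the live root in subvolume @A-root; the device and subvolume names are made up, not from any existing tool:

    #!/bin/sh
    set -e
    # 1. Snapshot the running root as the candidate B-root
    btrfs subvolume snapshot /mnt/btrfs/@A-root /mnt/btrfs/@B-root
    # 2. Mount the candidate plus the bind mounts a chroot needs
    mkdir -p /B-root
    mount -o subvol=@B-root /dev/sda2 /B-root
    for fs in dev proc sys; do mount --bind "/$fs" "/B-root/$fs"; done
    # 3. Run the upgrade inside the candidate root
    chroot /B-root sh -c 'apt-get update && apt-get -y upgrade'
    # 4. Point the next boot at the candidate
    sed -i 's/subvol=@A-root/subvol=@B-root/' /B-root/etc/fstab
    # ...plus whatever your bootloader needs to boot @B-root.
    # The failover half lives in the initramfs: count consecutive failed
    # boots and flip back to @A-root after two of them. Rotation is just
    # more snapshots, renaming the oldest away, rsnapshot-style.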
That's all pretty well-understood unixy stuff - hell, if you want to go old and slow with it, just use two different physical disks and rsync rather than fancy snapshot-capable filesystems. It'd make the update process take quite a long time, but... eh.
For sure, you have to reboot the system to get your updates applied, but I'll take that for a desktop that gets rebooted basically every night when I turn it off, since machines take all of ten seconds to boot these days.
Systemd changes a lot more about how you go about doing everyday tasks, which I think is the real reason so many people hate it. It's different in ways that don't seem obviously helpful - just different for the sake of it. Whereas this sort of A/B updating could literally be a wrapper around apt, and from a user perspective they wouldn't even notice the difference except that their disk usage goes up a bit, and disks are mostly pretty big these days.
Edit: This example does of course presuppose that you don't have a terribly complicated disk layout where things in other partitions need to roll their own snapshots, but that's still something that could be accounted for - I just didn't want to make this post even more of an essay.
Huh! I know English is not my first language but I didn't think I was that far off.
Jokes aside, it looks like we measure these things differently. An A/B switch only affects a well-known aspect of the system in a specific, well-defined and largely beneficial way. It's definitely not a creeping Cthulhu-wannabe that gradually pokes its tentacles into every subsystem that makes Linux tick, whether it's needed or not.
Seems odd that this isn't yet a Linux "thing" - Solaris 9 had this yonks ago (at least the basics), and I think it predates ZFS. The Veritas filesystem and volume manager, or Sun's Solstice volume manager (I forget; perhaps they were the same?), could be used but I don't think they were required.
Considering the parody of SMF that Systemd has become I shudder to think what perversion of boot environments Linux might eventually be saddled with. :)
Solaris (and thus illumos) has had boot environments in their current form (i.e. on ZFS with snapshots) for about 15 years or more. You can have as many as you like (within reason, and within the arbitrary limits of grub when that was the bootloader).
Before that, certainly in Solaris 8 through early versions of Solaris 10, we had the equivalent of the 2-partition setup with Live Upgrade. It did work, but it was clumsy and inefficient compared to the fully integrated, instant snapshot environments we're used to now.
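For reference, the modern workflow with beadm goes roughly like this (the boot environment name is made up):

    # Create a boot environment before patching
    beadm create pre-update
    # See which BE is active now (N) and which on reboot (R)
    beadm list
    # If the update goes bad, activate the old BE and reboot
    beadm activate pre-update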
You beat me to the Solaris boot environment comment. Amazingly, we're approaching two decades, and this concept is still not mainstream.
ZFSBootMenu finally brings boot environments to Linux. I use it for Rocky 9 installations on bare metal. Gone are the days of broken kernel updates forcing tedious repairs and long downtime. Each kernel or ZFS update is tested on a VM before rolling it out to the bare metal.
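Under ZFSBootMenu, a boot environment is just another root dataset it can offer at boot; a hand-rolled sketch (the dataset names are illustrative, not Rocky defaults):

    # Snapshot the current root dataset before an update
    zfs snapshot zroot/ROOT/rocky9@pre-update
    # Clone it into a separate boot environment to fall back on
    zfs clone -o canmount=noauto -o mountpoint=/ \
        zroot/ROOT/rocky9@pre-update zroot/ROOT/rocky9-fallback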
I do like the idea of the NixOS or Guix "atomic upgrade and rollback." It's like git for an operating system. Those distros are just too weird for me to actually want to use them. I'm comfortable using Timeshift to recover files or undo changes, and I *had* manually set up a poor-man's A/B partition scheme. However, I haven't seen a problem in years, so I ditched that for a default partition layout.
If a businesslike Linux distro (i.e. not blue-haired NixOS gamers) decided to create a new, more robust package manager, and use A/B boot partitions, I'd hop on board after a few years. I think mobile devices get this right—OTA updates, sandboxed applications, and fool-proof system upgrades...
"I think mobile devices get this right—OTA updates, sandboxed applications, and fool-proof system upgrades..."
Sandboxed applications work great until you want to do something with more than one of them that wasn't already thoroughly thought through by its author. If an application wasn't courteous enough to store your data in the accessible storage locations, then you have an annoying time getting it back out, and that's if you have root access. If you don't, you're probably not getting it out, and it doesn't matter too much if you can because you'll never get it back in. By the way, if it does store the data in accessible locations, then there isn't much sandboxing left because anything with access to the storage can read and write everything there.
Also, I'm not sure calling Android system updates "foolproof" is entirely honest. They rarely brick things, but that's mostly due to the small number of updates and the fact that they should be easy to test before the manufacturer releases them. If they do brick things, there is no automatic recovery and no way to manually select a recovery image. No backup image is stored, because there's not much storage in phones and the system-installed apps have already taken plenty of it, so taking double that for a backup partition isn't considered a good option. Since everything is custom to the device, your only options to recover from a bricked device are these:
1. Find a system image from your manufacturer (a few of them offer one, but mostly good luck). Manually boot to recovery and sideload it over (unless it won't let you, as many of them won't).
2. Go to XDA and find a post by someone whose identity and trustworthiness you don't know. Download the image and tools they provided and run them. Hope that they don't contain malware and that they're actually for the device you have.
I don't think that mobile devices are a great model for how we should do updates. Ripping and replacing a root partition to update a single binary in it just means using about a hundred times more bandwidth than you need without providing any more certainty that it will work or that you can recover if it didn't. Having two root partitions, with or without full replacement on an update, at least lets you recover from a bad update. Snapshots give you even more ability to rewind.
Valid points on sandboxing. It's also not feasible for things like coreutils.
I think my Pixel has automatic recovery—at least, that's the sense I get from the Coreboot stuff—and I'm sure it could be implemented on true computers.
I'm sure there is a market, as the author says, for making something as capable as Linux and as robust as a mobile OS. There's also a market for the desktop OSes we have now!
I think a lot of people are propagating outdated information about Btrfs. When Btrfs is full it doesn't corrupt; it goes read-only. That includes when only the metadata is full, because it uses a two-stage chunk allocator. Distros using Btrfs typically schedule a regular btrfs balance to prevent metadata from filling up, and Synology NASes, which use Btrfs, also do automatic balancing.
Btrfs does detect data corruption due to, for example, hardware issues, where some filesystems don't detect the corruption at all. If it's running RAID1/10 it will repair the corrupt data; otherwise it will log the error. If metadata is corrupted it will go read-only.
To repair, there are btrfs scrub, btrfs check, and btrfs check --repair (worst case).
As for RAID5/6: these are implemented but not ready for use. A solution for the RAID5/6 issue is implemented and being tested, so they might eventually stabilize these features.
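For reference, the commands involved look like this (mount points, devices, and usage filters are illustrative):

    # Rebalance half-empty chunks so metadata allocation can't run dry
    sudo btrfs balance start -dusage=50 -musage=50 /
    # Online checksum verification; repairs from the good copy on RAID1/10
    sudo btrfs scrub start /
    sudo btrfs scrub status /
    # Offline check of an unmounted device; --repair only as a last resort
    sudo btrfs check /dev/sdX
    sudo btrfs check --repair /dev/sdX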
Being a long-ish term openSUSE user (since 12.1, I think), I can't say I've had a genuine eat-my-files-in-normal-use case of Btrfs corruption. What I have had - and continue to have - is updates failing because of a lack of space on the root partition, leaving the system in an unusable (but always rollback-able, though sometimes more easily than others) state. This is Tumbleweed, by the way, which gets a huge number of updates, rather than Leap, and I've been tight with the root partition due to the need to have disk space for other things. I've stuck with the "old fashioned" partitioning scheme of root on Btrfs and /home on XFS, and I'm convinced it's saved me a few bothersome issues over the years.
Usually the root partition is full because I haven't done housekeeping on the snapshots and haven't calculated the free space required properly. Both machines are on slowish connections and I like to be there to "babysit" an update, so I will often do a zypper dup with --download-only first (which I don't have to babysit) and do the actual update later. Both options create snapshots, so a 3GB update seems to spawn 6GB of snapshots. Or something like that, don't really know, not technical in that way. What I have learned to do is delete a couple of snapshots before starting to make sure that there is somewhere north of 3× the download size worth of free space before starting.
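That housekeeping, roughly (the snapshot numbers are made up):

    # See what snapshots exist and check free space before starting
    sudo snapper list
    sudo btrfs filesystem usage /
    # Delete a range of old snapshots to free up room
    sudo snapper delete 120-125
    # Download-only first, babysit the actual update later
    sudo zypper dup --download-only
    sudo zypper dup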
M.
I personally use the "if it breaks, stick in a USB stick with the installer on it" method of system recovery.
Why waste 2 hours hopefully fixing an issue when it takes 20 minutes to reinstall the OS? If you have concerns about randomly wiping your disk, you need to review your current backup strategy...
You are conflating two different issues here.
If you just "nuke & reinstall" (how very Windows) then you also have to redo your config - which may involve "some effort" depending on your setup.
But I have no issues about randomly nuking a disk and restoring from backups - but that is a very different operation to "nuke & reinstall".
Of course, you could put some effort into configuring an auto-provision service that would take your blank, freshly installed OS and redo all the customisation. But did you remember to capture the last ad-hoc "let's try this quickly and see how it goes" config change? Will it break because at least one of the packages you use has changed something?
I switched from Windows to Mint a few months back. Some things are better, some are the same, and some are worse.
I much prefer Linux's text-file-based configuration design over the Windows registry. It's so much easier to back up the config directory and edit human-readable configuration files than it is to go hunting through the Program Files, ProgramData, and %APPDATA% directories, never mind determining whether you should be looking in the Local, LocalLow, or Roaming profiles. And then there's the registry, which can be incredibly convoluted and difficult to deal with.
My overnight backup went from 90 minutes under Windows to 18 minutes under Linux. Score one point for Linux.
On the flip side, VSS under NTFS made things like live imaging of the boot partition/disk possible under Windows. Linux isn't there yet. I run Timeshift and backups, but if I really want to back up an image of my boot partition, I need to boot my PC from a Linux USB in order to use RescueZilla, dd, or GNOME Disks to back up the boot drive. Score one for Windows there.
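For the record, the offline imaging step is a one-liner once booted from the USB stick (the device and destination paths are illustrative; triple-check them before running dd):

    # Identify the boot partition first; dd on the wrong device destroys data
    lsblk
    # Image it to an external drive
    sudo dd if=/dev/nvme0n1p1 of=/mnt/external/boot.img bs=4M status=progress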
I'm still happily running Mint 21.3, and from what I'm reading on the forums, it looks like 99% of Mint users upgrade with little or no problem. But, as always, the 1% can still bork their system, even if they do everything right. And even if they don't screw up the upgrade, you can still find that the new OS version doesn't recognize your wifi card, even though the previous version did.
For whatever reason, people may want, or need, to go back to the previous release. Fortunately, with Mint, even the previous version gets security support until 2027, so there's no need to rush.
The moral of the story is: do a complete backup before you install, and don't rush to install on day one unless you need to. I'm quite happy to let the early adopters find all the bugs for me.
... if I really want to back up an image of my boot partition ...
Why would you want to do that? Same question for other partitions, BTW.
If you do an image backup, then you make it harder to restore other than a "nuke and setup exactly as it was before". Often, if I have to restore something, I'll take the opportunity to tidy up or upgrade - drop a bigger disk in (prices have dropped since I built the system), shrink a partition/filesystem I'd over-provisioned previously, and so on. All these are possible with an image backup, but are harder. Images also take up much more space.
The boot partition is a doddle - it's sometimes mounted r/o by default, and even if it isn't, it's static. So something as simple as "rsync -avHx --numeric-ids --delete /boot/ (some destination spec depending on your setup)" will create an exact copy of the files in the boot filesystem. Restoring them is as simple as recreating the boot partition and filesystem, then rsyncing the files back. If you only need one file, just restore the one file. And when you refresh your backup, it only needs to copy what's changed - useful if you aren't lucky enough to have super-duper-mega-hyper-fast internet and are doing remote backups.
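Concretely, that round trip might look like this (the destination host and path are illustrative):

    # Back up: exact copy, only transferring what's changed since last time
    rsync -avHx --numeric-ids --delete /boot/ backuphost:/backups/boot/
    # Restore: recreate the partition and filesystem, mount it, then reverse
    rsync -avHx --numeric-ids backuphost:/backups/boot/ /boot/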
There are other options such as tar or cpio, which will leave you with a single file (optionally compressed) - but that leaves you with a harder task if you don't want to restore everything.
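For example, with tar (the archive name is illustrative):

    # One compressed archive of /boot, preserving permissions
    tar -czpf boot-backup.tar.gz -C /boot .
    # Fishing a single file back out is possible, just clumsier
    tar -xzpf boot-backup.tar.gz -C /boot ./grub/grub.cfg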
Yes, I know you can mount an image file as a read-only filesystem and copy stuff from it, but that restricts you in how you keep your image files. Potentially you might have to copy one from "somewhere" before you can mount it. That is trivial with the small /boot filesystem, but I like to handle all my filesystems in the same way, and some are TB in size.