Linux kernel 6.2 promises multiple filesystem improvements

The forthcoming Linux kernel 6.2 should see improved filesystem handling, including performance gains for SD cards and USB keys, as well as FUSE. As for the next-gen storage subsystems… not so much. For a mature OS kernel, there are still considerable improvements being made in Linux's handling of existing disk formats, and …

  1. John H Woods

    Don't flame me, but I think people should have stopped using RAID 5 twenty years ago!

    I don't know much about BTRFS or XFS, but I still think it's slightly disappointing that the Linux community was apparently so slow to adopt ZFS. However, that reluctance did eventually drive me to FreeBSD, for my servers at least. It hasn't stopped me being a Linux fan, but it's nice to have some diversity.

    NB usual disclaimer (about not being an expert) applies.

    1. Anonymous Coward

      Not flaming you, but on SSDs RAID-5 has its use cases. On non-solid-state drives I agree it'd be a foolish admin using it with today's disk sizes; the chances of ever getting a successful rebuild after a drive failure are approaching zero.

      The issue with ZFS on Linux has always been down to licensing, not technical ones: ZFS was released under the CDDL, not the GPL, so rolling it into the kernel would present problems, plus the danger that Oracle might decide to take legal action later, given its track record with Java/Android.

    2. Anonymous Coward

      The Linux community as a whole isn't accepting ZFS because of its resource requirements. If you need a server of some sort, sure, but on the desktop RAM quantities have NOT CHANGED in nearly 15 YEARS... it's crazy to think, but true. With ZFS on a 4TB HDD the recommendation could be to reserve 4GB of RAM, and considering 16GB is still a common quantity... ZFS just isn't happening.

      1. Nate Amsden

        This is not accurate. I've seen people write this 1GB-per-TB figure tons of times.

        The 1GB per TB was always about ZFS with dedupe enabled. Without dedupe you can get by with much less.
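
        (For what it's worth, the ARC ceiling is also tunable on OpenZFS; a minimal sketch, with the 4GiB value being an example rather than a recommendation:)

          # Cap the OpenZFS ARC at 4GiB so the rest of the box keeps its RAM
          echo "options zfs zfs_arc_max=4294967296" | sudo tee /etc/modprobe.d/zfs.conf
          # Apply at runtime without a reboot:
          echo 4294967296 | sudo tee /sys/module/zfs/parameters/zfs_arc_max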

        Myself, on my laptop I still use ext4 despite having 128GB of RAM, just because it's simpler.

        I do use ZFS in some cases, mainly at work, mainly for less-used MySQL servers with ZFS compression enabled (and I use ZFS as a filesystem only; RAID is handled by the underlying SAN, which is old enough not to support compression).

        My home server runs an LSI 8-port SATA RAID card with a battery backup unit: 4x8TB drives in hardware RAID 10 with ext4 as the filesystem and weekly scrubs (via LSI tools). I used ZFS for a few years, mainly for snapshots, on an earlier home server (with 3Ware RAID 10 and weekly scrubs), but ended up never needing the snapshots, so I stopped using ZFS.

        I do have a Terramaster NAS at a co-location for personal off-site storage, which runs Devuan with ZFS RAID 10 on 4x12TB disks. The boot disk is an external USB HP 900G SSD, with ext4 again. That's the only place I'm using ZFS's RAID.

        Haven't used anything but RAID 1/10 at home since about 2002, when I had a 5x9GB SCSI RAID with a Mylex DAC960 RAID card. The largest number of disks in an array at home since then has been four.

        At work I'm still perfectly happy with the 3PAR distributed sub-disk RAID 5 (3+1) on any of the drives I have, spinning or SSD.

        1. Jou (Mxyzptlk) Silver badge

          > The 1GB per TB was always about ZFS with dedupe enabled

          After many, many years, the first glimpses of background/offline deduplication are starting to appear. Most of the time you will find a "not possible by design" answer, but that is not true. It is simply a programming issue: nobody has done it as a background task. The deduplication rates achieved that way prove it should be included.

      2. Wayland

        Maybe 10 years ago RAM would have been a problem. These days, what are you going to use your massive RAM for if not caching the filesystem?

        1. Jou (Mxyzptlk) Silver badge

          Take a look at why I needed a lot of RAM for a job back in 2014... Apart from that, needing less RAM means more can fit into the CPU's L1/L2 caches, which can make a big difference.

    3. Anonymous Coward
      Anonymous Coward

      ZFS is just TOO SLOW. It can't hit the theoretical performance of a 550MB/sec SSD... let alone 7GB/sec NVMe drives.

      1. sev.monster Silver badge

        If your ZFS is slow, you haven't configured it for performance.

        At home, I use 2x mirrored NVMe drives on a PCIe x8 adapter as a special vdev, with 64K special_small_blocks. This gives me NVMe-level performance for small files and offloads metadata transactions. (Lane saturation is a concern in this setup, but it has not been an issue thus far; to help with that I use relatime and no normalization.) The zroot pool is a mirrored SSD pair on SATA3. lz4 compression and ashift=12 for all the raidz2 7200RPM SAS vdevs in the data pool, behind an IT-mode LSI.
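
        (For anyone wanting to replicate something like this, a rough sketch; the device names below are placeholders, not my actual layout:)

          # Sketch only -- device names are placeholders
          zpool create -o ashift=12 tank \
              raidz2 sda sdb sdc sdd sde sdf \
              special mirror nvme0n1 nvme1n1
          zfs set compression=lz4 tank
          zfs set special_small_blocks=64K tank   # small blocks land on the NVMe special vdev
          zfs set relatime=on tank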

        Under most circumstances I get slightly less than RAM speeds, thanks to the ARC, on writes and repeated reads for all vdevs; and in the unfortunate and unlikely circumstance that the ARC fills, I have enough disks in the data pool for concurrent reads and writes to give me about 2x the performance of a bare disk. Naturally the zroot performs close to ext4, as it is a simple mirror.

        lz4 overhead is either nonexistent or actually faster than no compression on my data vdevs, as large files (I have a lot of them) are read and decompressed faster than if the uncompressed data had been read. I have enough compute spare to easily eat the marginal CPU overhead. I don't use compression on the special or zroot as it would probably just end up increasing latency for not much tangible benefit.

        All in all I get better performance on my data vdevs than an equivalent bare ext4 filesystem would on the same disks when taking the ARC out of the picture, RAM speeds when using the ARC, redundancy without any significant performance impact on my root filesystem, incredibly fast metadata lookups (e.g. whole disk filename traversal/search) on data vdevs thanks to the NVMe special vdev, and full software control of the RAID so I can actually recover my data should it decide to shit the bed.

        (Aside on the above: I have had three hardware RAID solutions die on me and lost whatever data I couldn't grep the disks for, because there was no way aside from paying the vendor to even have a hope of restoring anything. Meanwhile, I was able to dig into my own damaged ZFS pools on two occasions with zdb to pull out data that I thought might be lost forever. Since then I learned my lesson and take much more frequent snapshots which I incrementally transfer off-site. Snapshots being instant and incremental is also a huge boon compared to most hardware RAID I have used. If you can't tell, I don't like purely hardware RAID.)
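
        (The snapshot-and-ship workflow is simple enough; a sketch, with made-up pool, dataset, and host names:)

          # Take snapshots, then ship them off-site incrementally (names are examples)
          zfs snapshot tank/data@sun
          zfs send tank/data@sun | ssh backupbox zfs receive -u vault/data   # first time: full send
          # Later snapshots only transfer the delta between the two:
          zfs snapshot tank/data@mon
          zfs send -i tank/data@sun tank/data@mon | ssh backupbox zfs receive -u vault/data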

  2. Anonymous Coward

    Let a hundred flowers bloom -- Mao Tse-tung

    .....but some of us would like to get on with our use of applications, and not worry too much about OS layers way below the applications we use.

    Quote: "There are even some bugfixes for the now-venerable ext4."

    As it happens, here at Linux Mansions, we have multiple machines running Fedora 37, and we quite like the fact that ext4 is "venerable"........i.e. stable and reliable........

    In fact, there's a case for saying that, with hundreds of options in the Linux world (distributions, desktops, filesystems........), it's these myriad options which prevent Linux from being more popular!!

    Just saying...............

  3. allyw

    Next gen?

    I wouldn't say BTRFS or ZFS are still considered 'next gen'; I'd say they are current gen.

    The next hot-shit, up-and-coming filesystem looks like it'll probably be bcachefs, and even then it is really mimicking their features rather than introducing something new; it's just a new code base.

    1. VoiceOfTruth

      Re: Next gen?

      I would say that ZFS is current gen, with a long history of reliability in the Solaris and FreeBSD world. BTRFS is, in my view, 'do not use it yet' gen, as it still has problems.

    2. Liam Proven (Written by Reg staff) Silver badge

      Re: Next gen?

      [Author here]

      ZFS *on Linux* is next-gen, inasmuch as only one of the big distro vendors was really looking at it, and now it is backing away, which is a great shame IMHO.

      Stratis definitely is.

      Copy-on-write was going mainstream in the later era of proprietary UNIX, but Linux has yet to adopt it. I keenly await bcachefs going somewhat more mainstream, and I wish RH was not doing this "NIH syndrome" thing.

      I have experimented a tiny bit with ZFS on FreeBSD, and while it works well, is well integrated into the OS, and I am generally happy with it, I am not aware of any integration of snapshot support into OS updates, which is a key feature, and one also neglected by Fedora.

      1. VoiceOfTruth

        Re: Next gen?

        Greetings Liam, and Happy Christmas.

        -> I am not aware of any integration of snapshot support in OS updates (for FreeBSD)

        That's a fair point. Solaris has beadm (boot environment administration), which is a great feature. There is a FreeBSD port of this (sysutils/beadm), but as you correctly say, this is not integrated with the base OS. In the Solaris world beadm was a saviour once, a few years back. Once in several years may not seem like much, but the consequences of not having it would have been days of fixing.

        1. VoiceOfTruth

          Re: Next gen?

          I have to admit to an error on my part here.

          There IS a bectl on FreeBSD, which has been there for a few years now. It is an implementation of beadm, but it is in the standard OS - no need for a port. I'm not sure how this slipped my mind, because I do now seem to recall it. Whatever, this article prompted me to go and check and have a brief play with FreeBSD boot environments. It is very similar to beadm on Solaris.
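
          (The basics look much like beadm; a quick sketch from my brief play, so check the man page before trusting it:)

            # Create a boot environment before an upgrade, list them, pick one to boot
            bectl create pre-upgrade
            bectl list
            bectl activate pre-upgrade   # takes effect on next reboot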

          1. Liam Proven (Written by Reg staff) Silver badge

            Re: Next gen?

            Thanks for that!

            For me, and for now, (open)SUSE's `snapper` tool remains the state of the art for this. I am not a big fan of Btrfs and I am very glad that on all my openSUSE boxes I kept `/home` on ext4 or XFS, but being able to boot directly into an *automatically created* snapshot is a lifesaver.
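
            (A sketch of the manual snapper workflow, for anyone who hasn't tried it; openSUSE also creates pre/post snapshot pairs automatically around package operations:)

              # Manual snapshot before risky changes, then roll back if it goes wrong
              snapper create --description "before tinkering"
              snapper list
              snapper rollback   # sets up a snapshot to boot back into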

            This is why I think SUSE may be on to something with MicroOS and ALP.

            If they are able to configure a single-container environment that looks and works just like Leap/SLE for customers who want the old way, then a self-repairing OS could be a game-changer in the enterprise Linux world. So long as the Btrfs volumes don't get irretrievably corrupted, of course...

      2. DoContra

        Re: Next gen?

        -> Copy-on-write was going mainstream in the later era of proprietary UNIX, but Linux has yet to adopt it. I keenly await bcachefs going somewhat more mainstream, and I wish RH was not doing this "NIH syndrome" thing.

        In what way "Linux has yet to adopt it"? btrfs has CoW (exposed to user-space and used for snapshots), and XFS has been slowly but surely working to support CoW (I don't quite know if it has it yet). I will agree that the tooling for CoW is lacking (cp --reflink, tools to mark duplicates in partitions/subtrees as CoW -- duperemove, bees, etc.), and that's your lot.
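
        (For instance, reflink copies and offline dedupe both work today on btrfs; a sketch, with made-up paths:)

          # CoW copy: shares extents until either file is modified
          cp --reflink=always big.img big-copy.img
          # Offline dedupe of a subtree on btrfs
          duperemove -dhr /srv/containers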

        In particular, I run quite a few Ubuntu/Debian LXC containers on Ubuntu hosts (mostly on ext4, but could possibly finagle creating a new btrfs partition) which could be deduplicated (except for updates). BEES is looking intriguing for my use case, and I might take it for a spin on the next Ubuntu LTS (24.04).

        1. Liam Proven (Written by Reg staff) Silver badge

          Re: Next gen?

          [Author here]

          > In what way "Linux has yet to adopt it"?

          I put some backlinks in the article, and my previous stories on OpenZFS and on Btrfs in Fedora and others go into some depth on this.

          Btrfs is not as stable as it should be, is prone to corruption when unexpected events occur -- such as the filesystem filling up -- and does not have working repair tools. It doesn't even have a working `df` command, which is *why* the filesystem filling up is such a problem.
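
          (The filesystem-specific accounting commands do exist, but you have to know to reach for them instead of plain `df`; the mount point here is an example:)

            # btrfs-native space accounting
            btrfs filesystem df /mnt/data
            btrfs filesystem usage /mnt/data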

          Snapper is a wonderful tool but it only supports one filesystem; that is a problem.

          OStree is a clever kludge to get around the OS lacking a COW filesystem. If the OS had this basic facility, OStree would be unnecessary.

          Bcachefs is unfinished, period.

          ZFS can't be built into the kernel for licensing reasons. Because it can't be, it is memory inefficient and most vendors won't touch it. Oracle could fix this, and could even build ZFS into Oracle Linux, but it will never resolve this, because it doesn't care.

          Zsys and Stratis are both unfinished because their sponsors seem to have lost interest.

          LVM is too complicated, is not integrated with filesystems, and Linux would have been better off going with EVMS instead, IMHO.

          This stuff is relatively straightforward and easy on FreeBSD and on commercial UNIX. Linux is signally failing to catch up. This ought to be integrated and on by default, and it should have been years ago.

          Red Hat, SUSE and Canonical have all failed to tackle this, partly through lack of will, partly because of petty politics and NIH syndrome, and it's shameful.

    3. l8gravely

      Re: Next gen?

      The problem with building a new filesystem is that people expect you to be compatible with all the OLD filesystems and their features. This is hard, because the POSIX spec (and the xfstests suite the XFS devs, among others, use for validating changes) means that writing filesystems is not simple.

      Most people use 25% of the code base, but then you have DBAs who want direct block level access to a file, or people who open a file then spawn a hundred different threads and expect writes to be ordered properly, etc. And don't get me started on ACLs, performance for both desktop and huge parallel workloads across a variety of media like USB thumb drives, SSDs, NVMe, spinning disks, groups of disks, mixed media, etc.

      Then add on security, redundancy, reliability, snapshots, the ability to handle directories with large numbers of files, etc. Who remembers how NNTP servers had to break down posts into sub-directories to keep up performance because UFS, ext2 and other filesystems from that era would fall off a cliff performance wise once you had too many files in a directory?

      All these problems are hard, and handling the corner cases properly is hard.

      1. Anonymous Coward

        Naive question Re: Next gen?

        This might be a naive question, but...

        Why is it necessary to have a single FS that performs well across all media, workloads, logging/archiving needs, usage patterns, etc.? Is it not possible to have a variety of implementations that have different characteristics but expose the same API to user processes and are mounted under the same namespace?

        Even features like atomicity of read-modify-write need not be guaranteed; and not all FSes need to offer synchronous writes. Even access control could be something that is subsetted for simpler use cases.

        I suspect that existing APIs such as POSIX are ill-suited for this task, in part because POSIX tried to be all-encompassing, and in part because it wasn't designed with a clear distinction between application concerns and admin concerns, or between function and (implied) implementation.

        [insert usual xkcd cartoon here]

    4. l8gravely

      Re: Next gen?

      I wouldn't trust btrfs with a ten-foot pole after all the troubles I went through upgrading a SUSE 12.5 system to 15.1 (not as big a jump as implied...), which ended up in such a horrible state of failure that I had to trash it and re-install the system.

      Filesystems are complicated enough that adding in support for RAID, snapshots, copy-on-write, etc. is just asking for trouble, in my opinion.

      I was around when DEC put out AdvFS as a replacement for UFS on their DEC OSF/1 systems. It was a nice idea, but it never went anywhere.

      The biggest vendor of log-based filesystems is probably NetApp with its WAFL filesystem. But that is also a tightly controlled environment which only supports NFS, CIFS, and block-based protocols, which severely limits the problem space they have to address.

      As I alluded to in my previous comment here, writing a filesystem that has all the features people want is *hard*. Most people use probably 25% of the filesystem's abilities. But it's the remaining mix which causes all kinds of problems.

      1. Paul Crawford Silver badge

        Re: Next gen?

        I would say that if there were one single reason I would like to have ZFS on my machines, it is the copy-on-write snapshot system: it is transparent and low-overhead, and yet it allows a simple roll-back to before you had your Doh! moment, or some scum launched ransomware on it.

        The latter assumes the snapshots have a different admin account/password from yours!

        The second reason is the checksums: not a big issue for typical personal computing, but for BIG filesystems under heavy usage it is a big help to know that the data is still as-written.
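
        (The roll-back itself is a one-liner; dataset and snapshot names here are examples:)

          # Roll a dataset back to the last known-good snapshot (example names)
          zfs snapshot tank/home@known-good   # taken before the Doh! moment
          zfs rollback tank/home@known-good   # discards everything written since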

  4. TJ1

    ZSYS due to be removed from Ubuntu installer

    See "[FFe] Remove zsys from installer " https://bugs.launchpad.net/ubuntu/+source/ubiquity/+bug/1968150

    1. Liam Proven (Written by Reg staff) Silver badge

      Re: ZSYS due to be removed from Ubuntu installer

      [Author here]

      That Launchpad bug is linked inside the article to which you are commenting.

      It is in the paragraph that ends:

      > some users are starting to ask what's happening, as well as whether it should be removed.

  5. Jou (Mxyzptlk) Silver badge

    "LVM, on RAID, on LVM"

    A classic. The number of different variants I meet out there, with the simple requirement "increase the disk size of this Linux VM", is amazing. LVM directly on disk, LVM in a partition, each filesystem with its own tools and weirdness, and so on.
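
    (For the common LVM-in-a-partition layout, the dance after enlarging the virtual disk goes something like this; device and VG names are examples:)

      # Grow the partition, then the PV, then the LV plus filesystem in one go
      growpart /dev/sda 3                      # from cloud-guest-utils
      pvresize /dev/sda3
      lvextend -r -l +100%FREE /dev/vg0/root   # -r also resizes the filesystem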

    I still miss the following combination on Linux: RAID-5-like, encryption, deduplication, and snapshots, in a relatively simple setup. Any recommendation which works well? Which does not need a full resync after an unscheduled shutdown caused by kids?

    The W OS from M has been able to do this combination, surprisingly reliably, since about 2011. Albeit with a slow-writing RAID5-like implementation, where the slowest disk dictates the write speed - unless you know how to fine-tune interleave, journaling, allocationunitsize and writecachesize to get better speed, since the defaults are not well chosen by M. RAID 0/1-like is as fast as the combined disks can get with the defaults. (Don't try the dynamic disks variant; that is NT 4.0 technology.)
