Does Linux need a new file system? Ex-Google engineer thinks so

Former Googler Kent Overstreet has announced that a long-term project to craft a new Linux file system is at a point where he'd like other developers to pitch in. Since you're already asking why bother with a new file system, the answer Overstreet provides in this post to the kernel mailing list is that he wants to “match ext4 …

  1. This post has been deleted by its author

    1. thames

      Can you say that Hammer2 is any more complete, or even as complete, as Bcachefs? Bcachefs isn't just starting out. As the developer says, it has grown out of work done at Google over a number of years on caching data on SSDs. This announcement is simply to say that it is now becoming a stand alone file system rather than just a cache for other file systems.

      Overall, it sounds very interesting. It might eventually make a good replacement for ext4 on desktops and laptops. However, adoption of new file systems tends to be slow, because people rarely change their file systems.

      P.S. - I wasn't the one who down-voted you by the way.

      1. Anonymous Coward
        Anonymous Coward

        "has grown out of work done at Google over a number of years ..."

        So it's Google's IP and so it will probably phone home to snitch on your content to them then?

        1. Robert Helpmann??
          Childcatcher

          Grown out of work done at Google

          That would explain the Googler's use of the term "metastasizing." It is refreshing when someone is up front about the fact their employer, past or present, is a cancer.

          1. nematoad Silver badge

            Re: Grown out of work done at Google

            "“the bcache codebase has been evolving/metastasizing..."

            Aye, a poor choice of words, if he really wants this project to be taken seriously.

            Is it just me or are Google metastasizing into the new Microsoft?

            1. Anonymous Coward
              Joke

              Re: Grown out of work done at Google

              Flip that statement around and it still seems true. We will have found the Antichrist when they merge.

        2. oldcoder

          nope.

          Being part of the kernel makes it GPL... any "phone home" attempts would be shown up, VERY loudly... and removed.

    2. oldtaku
      Go

      "On the other hand, why not just put the effort behind the development and porting to Linux of the Hammer2 filesystem?"

      Practically, it'd be more productive if everyone got together and worked in perfect harmony on the UberFileSystem. But helping debug someone else's already 3 years overdue file system isn't nearly as fun as writing your own new one. And it's the *nix way to have 5 projects when you could have 1. I can't blame him.

      1. thames

        @oldtaku - "Practically, it'd be more productive if everyone got together and worked in perfect harmony on the UberFileSystem."

        Well, there's also the fact that there's no one-size-fits-all file system either. File systems that work best on large file servers can be excessively resource hungry on PCs. File systems that work best on PCs don't have all the advanced features or performance for large file servers. Systems like ZFS or BTRFS tend to take up huge gobs of RAM to run properly, and that's probably inherent to the features people want to use them for.

        What the article and the original post don't say is what applications they are targeting with Bcachefs. What I would like to see is a replacement for Ext4 before that file system runs out of steam due to increasing disk size. Things like checksumming are important with large disks because of how long it takes to run a conventional scan on them. Right now the plan seems to be for BTRFS to replace Ext4, but I'm not sure that's the right decision.

        1. Adam Inistrator

          btrfs memory requirements

          1. "... BTRFS tend to take up huge gobs of RAM to run properly ... "

          2. "... Right now the plan seems to be for BTRFS to replace Ext4."

          something contradictory in these statements. I think 1 is false

      2. Dinsdale247

        FreeBSD

        "Practically, it'd be more productive if everyone got together and worked in perfect harmony on the UberFileSystem. But helping debug some else's already 3 years overdue file system isn't nearly as fun as writing your own new one. And it's the *nix way to have 5 projects when you could have 1. I can't blame him."

        No, that's the GNU/Linux way of doing things. That's why GNU/Linux has 5 projects and FreeBSD has zfs.

      3. Dr. Mouse

        But helping debug someone else's already 3 years overdue file system isn't nearly as fun as writing your own new one. And it's the *nix way to have 5 projects when you could have 1. I can't blame him.

        This is the software equivalent of https://xkcd.com/927/

        Also, I have in the recent past developed a new feature for existing software. The amount of work it took just to understand where it would fit in almost pushed me to rewrite from scratch. In the end I shoehorned the software in, but only for my own use as making it all fit with the project's guidelines was too much of a ball ache. It did the job I wanted, but writing my own from scratch would have been no more difficult and far more fun (at least for my own use). It would have been more restricted in its use, but it would have done the job I needed at the time.

        It's always the way, I find. Learning how to develop for an existing codebase is a lot of hard work. Once you know, great.

      4. Gary Bickford

        Multiple projects can be a good thing

        Back in the heyday of the Japanese 'Asian Tiger', when their cars were eating everyone's lunch, one of the methods they were using for development of new products (or so I read at the time) was to assign the same project to two or three independent teams, who each competed to come up with the best new design. At some point either a winning team was selected, or the best points of all of them were merged and the most successful team was given the lead to finish.

        This seems more expensive, but it probably greatly reduces the probability of abject failure so is probably cheaper and almost certainly better in the long run.

        1. Number6

          Re: Multiple projects can be a good thing

          That's how venture capitalists work - fund ten projects in the expectation that most of them will fail, only they don't know at the start which one or two will be successful so they have to start with all of them.

    3. bazza Silver badge

      integrated filesystem checksumming, for example, should really be everywhere.

      Well in effect it has been for a long time, though not necessarily at the level of the file system. Physical storage has had error detection / correction for a long time now.

      It was only with the advent of very large storage devices that their on board ECC became inadequate for "ensuring" (there's no such thing as a guarantee) data accuracy. That's led to file systems like ZFS putting in an extra layer of ECC of their own to compensate.

      Incidentally I think the characteristics of the ECC in ZFS were carefully chosen to accommodate the typical bit error rate achieved by HDDs. Getting that right in a file system design is important; just slapping in a CRC something-or-other makes no sense unless one matches its parameters to the BER of the underlying physical devices. Too much in the file system and you're wasting space and throughput, too little and the BER might be higher than desired. Of course, choosing the BER that's right for the business is another matter...
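
Editor's sketch of the idea in the comment above: a file system keeps a checksum next to each block and compares it on read, and whether that is worth the overhead depends on how many bit errors you expect for the amount of data read. This is a minimal illustration, not a claim about what ZFS or bcachefs actually do. CRC32, the 4096-byte block size, the 4 TB read and the BER of 1e-14 are all assumptions picked for the example, and the function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


def write_block(data: bytes) -> tuple[bytes, int]:
    """Return the block plus a CRC32 that would be stored separately from it."""
    return data, zlib.crc32(data)


def verify_block(data: bytes, stored_crc: int) -> bool:
    """Recompute the CRC32 on read and compare it with the stored value."""
    return zlib.crc32(data) == stored_crc


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single-bit error somewhere in the block."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def expected_bit_errors(bytes_read: float, ber: float) -> float:
    """Expected number of bad bits when reading bytes_read bytes at a given BER."""
    return bytes_read * 8 * ber


block, crc = write_block(bytes(range(256)) * (BLOCK_SIZE // 256))
damaged = flip_bit(block, 12345)

print("clean block verifies:  ", verify_block(block, crc))
print("damaged block verifies:", verify_block(damaged, crc))
# Assumed figures: one full read of 4 TB at an assumed BER of 1e-14.
print("expected bad bits:", expected_bit_errors(4e12, 1e-14))
```

With those assumed numbers the expected error count per full read is below one but not negligible, which is the trade-off the comment describes: the larger the disk, the more a second layer of checking starts to matter.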

    4. Jim 59

      We've all needed a new file system since about 1992, viz some kind of tagged file system that frees us from the directory structure, while still somehow preserving it. Something equivalent to a nosql database. But this is still a vague fantasy.

      In the real world, I'd like ZFS-style deduping please, and optimisation for both flash and rotating drives. The deduping alone would allow us to bin 99% of the world's disks. Maybe that's why it hasn't taken off in the way everyone expected.
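
Editor's sketch of what block-level deduplication means in principle: identical blocks are stored once and referenced by a content hash. This is a toy model under stated assumptions, not how ZFS implements it. SHA-256, the in-memory dictionary and the function names are hypothetical choices for the example.

```python
import hashlib


def dedup_store(blocks):
    """Store each distinct block once; return the store and per-block references."""
    store = {}   # content hash -> block data (kept once)
    refs = []    # one hash per logical block, in write order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        refs.append(digest)
    return store, refs


def read_back(store, refs):
    """Rebuild the original data by following the references."""
    return b"".join(store[digest] for digest in refs)


blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
store, refs = dedup_store(blocks)

print("logical blocks:", len(refs))
print("stored blocks: ", len(store))
print("round trip ok: ", read_back(store, refs) == b"".join(blocks))
```

The lookup table of hashes has to be consulted on every write, so in a real system it needs to stay fast to access, which is consistent with the remarks further down the thread about deduplication being hungry for RAM.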

      1. Preston Munchensonton
        Pint

        We've all needed a new file system since about 1992, viz some kind of tagged file system that frees us from the directory structure, while still somehow preserving it. Something equivalent to a nosql database. But this is still a vague fantasy.

        In the real world, I'd like ZFS-style deduping please, and optimisation for both flash and rotating drives. The deduping alone would allow us to bin 99% of the world's disks. Maybe that's why it hasn't taken off in the way everyone expected.

        Damn it, Jim! You've had 27 years. Why isn't this done yet?

    5. Anonymous Coward
      Anonymous Coward

      "On the other hand, why not just put the effort behind the development and porting to Linux of the Hammer2 filesystem?"

      Hammer2 is a great file system, but wouldn't there be licensing issues? Would the GPL nutters try and shackle it with GPL v3 licenses?

  2. oldtaku
    Thumb Up

    Sure, why not.

    ... as long as we can default to ext4 and pick others based on our level of devil-may-careness. And as long as you don't kill your wife.

    1. Anonymous Coward
      Anonymous Coward

      Re: Sure, why not.

      But what if she's _really_ annoying?

      1. John H Woods Silver badge

        Re: Sure, why not.

        "But what if she's _really_ annoying?" --AC

        Or FAT?

    2. Anonymous Coward
      Anonymous Coward

      Re: Sure, why not.

      "And as long as you don't kill your wife."

      Cold, man. really cold...

  3. Martijn Otto

    Ah, so it's like ZFS but unfinished.

    Let's reinvent the wheel!

    1. Ian Michael Gumby
      Devil

      Re: Ah, so it's like ZFS but unfinished.

      Thats the silicone valley way!

      1. Anonymous Coward
        Holmes

        Re: Ah, so it's like ZFS but unfinished.

        >"silicone valley"

        Been dreaming about your mother recently?

    2. bri

      @ Martijn Otto - You mean btrfs, surely

      Did Oracle release ZFS under GPL in full? As far as I am concerned, ZFS doesn't exist as it's not supported in any major Linux distro. And if I wanted something proprietary, I'd go GPFS (I prefer IBM to Oracle).

      Anyway, btrfs still doesn't feel production ready, so bcachefs may be an interesting option (license permitting).

      1. Six_Degrees

        Re: @ Martijn Otto - You mean btrfs, surely

        "Purity of Essence. EOP. OPE. It's one of those!"

        1. bri

          @Six_Degrees - You mean btrfs, surely

          Oh come on. Even Oracle Linux doesn't come with ZFS as supported standard. And they should be able to ship and support it.

          Hobby use is one thing: ZFS may be gold there, and you can bolt it onto Fedora or whatever. Production is another, where only supported configurations, or at least widely used stable combinations with major distros, count. ZFS on Linux is neither (sorry, but home & hobby doesn't count).

          1. Alan Brown Silver badge

            Re: @Six_Degrees - You mean btrfs, surely

            "Even Oracle Linux doesn't come with ZFS as supported standard."

            Of course not. If it did they wouldn't be able to charge you 6 figure support fees for ZFS on Solaris.

            1. admiraljkb

              Re: @Six_Degrees - You mean btrfs, surely

              @Alan Brown

              ""Even Oracle Linux doesn't come with ZFS as supported standard.""

              "Of course not. If it did they wouldn't be able to charge you 6 figure support fees for ZFS on Solaris."

              +1 for Oracle and crazy support fees, but in this case, BTRFS is their green field replacement for ZFS. It started off that way before the Sun acquisition, but continued afterwards. Hence ZFS doesn't come as an official option with Oracle Linux as it's considered a bit "legacy" on top of the maneuvering needed for licensing compatibility issues... I wish they'd push more resources onto BTRFS and finish off the last 10% of what's needed.

              1. phil dude
                Thumb Up

                Re: @Six_Degrees - You mean btrfs, surely

                I agree. I am a zfsonlinux user (it is amazing and I am a *minor* kernel contributor -solved a problem I had, of course!).

                BTRFS is clearly the future as it will be in *every* kernel. Rather than ext4 which is, well, in need of a replacement...

                Remember, once you have COW, checksums and snapshots per volume, you have sufficient flexibility to manage many different distributions on the same machine.

                P.

      2. Antonymous Coward
        Holmes

        Re: @ Martijn Otto - You mean btrfs, surely

        BCacheFS is hardly "production ready" either. Its advertised performance is FAR from reality too.

        1. bri

          @AC

          Where do I say that it is? OTOH it still *may* achieve production-level readiness faster. 'May' is the word, but until either of them is production ready, no one knows which is ultimately the better way. So it is beneficial to pursue development of both of them.

          And that was my point, which is not shared by the initial poster in this thread.

      3. Arthur the cat Silver badge

        Re: @ Martijn Otto - You mean btrfs, surely

        "Did Oracle release ZFS under GPL in full?"

        No, Sun released it under CDDL, a variant of MPL. Perfectly free and usable by everybody. It's only rms' desire to make GPL The One True Licence To Be Obeyed By All that prevents its integration with Linux.

        Once you've seriously used ZFS (and recovered from dead disks with no data loss) you wouldn't want any other file system.

        1. Microchip

          Re: @ Martijn Otto - You mean btrfs, surely

          zfsonlinux.org has a kernel module, but as mentioned in other comments, it is not GPL-compatible. Fast though, seemed to work well when I was testing as an alternative to FreeNAS with ZFS. I've zero issue throwing a non-GPL module in, but purists may be unhappy with the idea.

          1. Bronek Kozicki

            Re: @ Martijn Otto - You mean btrfs, surely

            purist GPL zealots may be unhappy with the idea

            FTFY, because CDDL is open source. Unfortunately because it's not GPL, it cannot be included in the kernel source tree - but it can be (and has to be) built when deploying your own kernel. There are source (as opposed to binary) packages included in major distributions just for this.

        2. Anonymous Coward
          Anonymous Coward

          Re: @ Martijn Otto - You mean btrfs, surely

          > No, Sun released it under CDDL, a variant of MPL. Perfectly free and usable by everybody.

          And indeed, you can even sell commercial appliances which use ZFS without having to release the source code. This is a much more open and free licence than the GPL.

          It seems unlikely Oracle chose this licence to prevent ZFS being used with Linux, when by releasing under GPL they could have maintained a revenue stream by licensing the code to commercial users.

        3. John Hughes

          Re: @ Martijn Otto - You mean btrfs, surely

          "perfectly free and usable by everyone"?

          No, the CDDL was explicitly written to be incompatible with the GPL.

      4. Richard_L
        Unhappy

        Re: @ Martijn Otto - You mean btrfs, surely

        I just wish Oracle would change the licencing of ZFS out so it can be included with distros by default, instead of being cast out into a legal wilderness as it is now. Then we could take BTRFS off to the woodshed and put it out of its misery and not have to wait years for this bcachefs to be ready.

        I've been using BTRFS as my filesystem on SUSE and it's a pain in the arse. SUSE recommend it as the default filesystem for root, yet it has a nasty habit of filling up and silently refusing to write any more data to the disk even though standard utilities like df continue to report plenty of free space.

        Neither SUSE nor BTRFS appear to monitor for the approach of this condition nor alert the user or administrator to it when it happens. SUSE have a tool that merrily snapshots the **** out of any btrfs filesystem, but will they monitor that same filesystem for chronic constipation? Nope... go and write your own script to periodically rebalance your chunks. Grr.

        1. bazza Silver badge

          Re: @ Martijn Otto - You mean btrfs, surely

          I just wish Oracle would change the licencing of ZFS out so it can be included with distros by default, instead of being cast out into a legal wilderness as it is now.

          Well, it's up to them I suppose. It's their code, and I think everyone is grateful that they chose to share it at all. They obviously had specific goals on control and re-use that they felt GPL wouldn't achieve, so wrote a license to suit. It's not Sun/Oracle's fault that Linux is under GPL2. We can make do and mend with building our own kernel modules or getting some pre built ones.

          FreeBSD has had no trouble at all adopting ZFS. There are OS X implementations, and reportedly MS briefly considered putting it into Windows. Rigid and unwavering adherence to the current GPL2 guarantees that Linux is always going to be hampered this way, which ultimately is not beneficial for the Linux community.

          1. HPCJohn

            Re: @ Martijn Otto - You mean btrfs, surely

            I agree.

            I am a fan of GlusterFS, and it just seems such a natural good fit with ZFS - ZFS filesystems on the bricks, then scale out your filesystem, replicate etc. using Gluster.

            You can easily do this yourself with zfsonlinux.

            However I'm told that Red Hat can't roll this out as a supported configuration with Red Hat Storage, because of this license restriction. Arggh...

          2. future research

            Re: @ Martijn Otto - You mean btrfs, surely

            Sun released the code under the CDDL, and FreeBSD adopted it. My real problem is that Oracle then stopped releasing any updates, so OpenZFS has had to create a fork from the last Sun release of OpenSolaris. This fork will therefore be incompatible with the later versions of Oracle ZFS.

        2. thames

          Re: @ Martijn Otto - You mean btrfs, surely

          @Richard_L - The CDDL license used by ZFS was carefully crafted to make it incompatible with the license used by Linux. Sun was trying to establish OpenSolaris as a major open-source OS, and didn't want people to simply take the best bits of it (basically ZFS and DTrace) and stick them into Linux. That would have killed OpenSolaris as there would have been no reason for most people to use it.

          http://www.cnet.com/news/sun-open-source-license-could-mean-solaris-linux-barrier/

          "The CDDL is not expected to be compatible with the GPL, since it contains requirements that are not in the GPL," Claire Giordano of Sun's CDDL team said in its submission. "Thus, it is likely that files released under the CDDL will not be able to be combined with files released under the GPL to create a larger program."

          ZFS on BSD was likely not considered a problem, because the user base of BSD is so much smaller.

          1. bazza Silver badge

            Re:Thames

            The CDDL license used by ZFS was carefully crafted to make it incompatible with the license used by Linux.

            Isn't the one defining point of GPL that any other license, no matter what it says, is essentially incompatible with it?

            1. phil dude
              WTF?

              Re: Re:Thames

              No. The goal of the GPL was that software that was free could stay free, and not become locked down by accumulated changes.

              The reason why parasitic companies (like Oracle) do what they do is because the ambiguity they introduce with a "special" license, makes it difficult for other businesses to count on it. Look at all the problems Google is having...

              Hence we are in the insane situation that the source code, currently running on *your* machine, could become illegal to possess due to a court decision. If you are at home, who cares. But corporations would become liable and that FUD changes the market, so that inferior products become entrenched through this proxy extortion. (look up FAT on android and Microsoft's own version of this anti-consumer trend)

              RMS might have seemed extreme a few years ago, but he was spot on.

              P.

            2. thames

              Re: Re:Thames

              @bazza - "Isn't the one defining point of GPL that any other license, no matter what it says, is essentially incompatible with it?"

              Er, no. There is non-GPL code in the Linux kernel, for example some MIT stuff. However, one of the major characteristics of any GPL-type license is the "no additional restrictions" clause. That is, you may not impose any additional licensing restrictions or requirements on the software which were not originally present. That is intended to keep the software open.

              The reason that CDDL is incompatible with GPL is exactly what the Sun CDDL license drafter said in one of my earlier posts.

              "The CDDL is not expected to be compatible with the GPL, since it contains requirements that are not in the GPL," Claire Giordano of Sun's CDDL team"

              As you can see, according to the licensing experts at Sun, CDDL would impose additional licensing requirements which are not present in GPL.

              There's a list at the following link which tells you which common FOSS licenses are compatible with GPL.

              http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses

              Here's a few compatible examples from their list:

              * Artistic License 2.0 (Perl).

              * Berkeley DB License (Sleepycat Software license).

              * Boost Software license.

              * Newer BSD (2-clause BSD - very early versions had a third clause with a mandatory advertising requirement which was not compatible and which caused loads of headaches for other people as well).

              * MIT (basically the same as 2 clause BSD).

              * Intel Open Source License.

              * MPL - Mozilla Public License.

              * Public Domain.

              * Python 2.1.1 or newer.

              * Ruby license.

              * X11 License.

              * etc.

              There's more, but I've just picked the more common ones.

              In addition, GPLV3 is compatible with Apache 2.0, but not the older GPLV2 license which the Linux kernel still uses. One of the big reasons for updating the GPL to version 3 was to make it compatible with the Apache license. GNU recommends Apache 2.0 over a BSD/MIT style license because it deals with patent issues (which GPLV3 also addressed in its own update).

              The things that usually tend to make licenses incompatible with GPL are those which have more restrictions. For example, the Eclipse License says that the EPL is governed by the laws of New York. The US isn't the whole world and you're not allowed to impose a restriction like that on GPL software. A lot of the "corporate" open source licenses have clauses like that, which quite frankly makes them pretty useless to anyone other than the original author. There are other explanations at the above web site as well.

              Another good example is the original JSON license had a clause which said: “The Software shall be used for Good, not Evil.” That made it non-GPL compatible because you have the right under the GPL to use the software for evil if that is what you want. That might sound like nit-picking, but these are the sorts of details which compatibility can fall down over.

              However the big licenses today are GPL, MIT (BSD), Apache, and MPL. MIT and MPL are compatible with versions 2 and 3 of the GPL, and Apache is compatible with version 3 of the GPL.

              So yes, CDDL is not compatible with GPL, but it's not because there is anything exceptional about the GPL itself.

          2. oldcoder

            Re: @ Martijn Otto - You mean btrfs, surely

            BSD also allows them to take any improvements and make them proprietary...

            GPL does not.

        3. Anonymous Coward
          Anonymous Coward

          Re: @ Martijn Otto - You mean btrfs, surely

          > I just wish Oracle would change the licencing of ZFS out so it can be included with distros by default,

          Sod off!

          At the most, maybe a dual-licensing setup. Many are grateful that the legalistic cancerous GPL hasn't been used.

          Here's a thought: Why not instead wish your system wasn't licensed under such a restrictive environment, rather than have the arrogance to expect others to change their code to fit your narrow-world license?

          1. Adair Silver badge

            Re: @ Martijn Otto - You mean btrfs, surely

            In reply to: 'Here's a thought: Why not instead wish your system wasn't licensed under such a restrictive environment, rather than have the arrogance to expect others to change their code to fit your narrow-world license?'

            Quit your whining. You're free to pick whatever licence suits you philosophically and economically. If someone else doesn't like the licence you have chosen, well they're always free to go away and do the work you have done and give the result a licence that suits them.

            The GPLs suit certain purposes, they're not supposed to be a panacea.

            1. Anonymous Coward
              Anonymous Coward

              Re: @ Martijn Otto - You mean btrfs, surely

              > In reply to: 'Here's a thought: Why not instead wish your system wasn't licensed under such a restrictive environment, rather than have the arrogance to expect others to change their code to fit your narrow-world license?'

              > Quit your whining. You're free to pick whatever licence suits you philosophically and economically. If someone else doesn't like the licence you have chosen, well they're always free to go away and do the work you have done and give the result a licence that suits them.

              Funnily enough, your reply reads more like an agreement with the post you are replying to. It's the PREVIOUS poster that was whining about the license being chosen.

              I assume English isn't your first language?

              1. Adair Silver badge

                Re: @ Martijn Otto - You mean btrfs, surely

                RE: 'Funnily enough, your reply reads more like an agreement with the post you are replying to. It's the PREVIOUS poster that was whining about the license being chosen.

                I assume English isn't your first language?'

                Old wisdom: never assume anything.

                There's nothing wrong with my English, thanks. But comprehension could sometimes be better---especially after 48+ hours on the go travelling round the planet. Well, that's my excuse anyway.

                I stand rightly corrected.

          2. Teiwaz Silver badge
            Childcatcher

            Re: @ Martijn Otto - You mean btrfs, surely

            "At the most, maybe a dual-licensing setup. Many are grateful that the legalistic cancerous GPL hasn't been used."

            @ Anonymous Coward

            - Wow - Mr Ballmer, here in person (quick, hide the chairs)

        4. fuzzie

          Re: @ Martijn Otto - You mean btrfs, surely

          I've been prowling trying to decide between ZFS and btrfs for my home PC for a while and the biggest stumbling block is ZFS' RAM requirements (it's a small file server, ~5TB disk, but "only" 8GB RAM). SailfishOS uses btrfs for the Jolla phones/tablets, so it must be production ready enough, but it doesn't look like that's your experience. Or does the rebalancing trick fix it?

          1. Anonymous Coward
            Anonymous Coward

            @fuzzie Re: @ Martijn Otto - You mean btrfs, surely

            ZFS. I did the same a few years ago for a small-office server and found BTRFS nice, in theory, but like one of the posters above I quickly got over it after I was told my disk was full when it wasn't (and worse, remained full after deleting some files). With no apparent way to remedy this I tried zfsonlinux and it's a joy to work with - if Git were at one end of the principle-of-least-surprise spectrum, ZFS is at the other. The best word I can come up with to describe using it is "obvious", and when it's not the documentation is excellent.

            We use this box for a VM host so very, very rarely upgrade the kernel, but when I've done so it's gone smoothly. One of the best decisions we made was partitioning each SSD in the machine identically: a small boot partition which is EXT4 and RAID-1 mirrored on each drive, and the large secondary partition given over to ZFS. No fannying around with USB boot drives or other single points of failure - I can always boot and log in, even if ZFS fails to come up, and although ZFS claims to run better when given a whole disk I haven't noticed any lag.

            I can't comment on use in a restricted memory environment, but I doubt you'd have issues with 8GB.

          2. John Brown (no body) Silver badge

            Re: @ Martijn Otto - You mean btrfs, surely

            "the biggest stumbling block is ZFS' RAM requirements (it's a small file server, ~5TB disk, but "only" 8GB RAM)."

            That's not a problem. Really. I've been running a 6TB home server with 1GB of RAM and RAIDz (4x2TB) for two years now. I *think* I may have set a tuneable to limit the RAM used by ZFS, not sure, it's pretty much fire and forget. The high RAM requirement comes into play when you have lots of files being accessed.

            I have a backup process running as we speak rsyncing to another RAIDz pool copying ~4TB while watching video streamed to the TV and transcoding some video on a desktop from files on the server. The ARC is using 100MB and, unusually, there is 500MB of swap in use. Normally there's no swap in use but this is a "heavy load" in terms of my normal day to day use of the server.

            On the other hand, I just ordered an 8GB ECC RAM stick for it. It'll be interesting to see if there is a performance difference once tuned for the extra RAM.

          3. Alan Brown Silver badge

            Re: @ Martijn Otto - You mean btrfs, surely

            "it's a small file server, ~5TB disk, but "only" 8GB RAM"

            For a home server that isn't "busy", you'll get away with it. Just don't enable compression or deduping.

          4. ragnar

            Re: @ Martijn Otto - You mean btrfs, surely

            If it helps you with your decision, ZFS is fine with 8GB of RAM on my 8TB server. I think it's deduplication that gobbles RAM like it's going out of fashion.

          5. ragnar

            Re: @ Martijn Otto - You mean btrfs, surely

            If it helps, 8GB is more than enough for a 5TB file server. I have an 8TB file server with 8GB of RAM and it's just fine, unless you need the deduplication feature.

        5. John Brown (no body) Silver badge

          Re: @ Martijn Otto - You mean btrfs, surely

          "I just wish Oracle would change the licencing of ZFS out so it can be included with distros by default, instead of being cast out into a legal wilderness as it is now. "

          You could switch to FreeBSD. ZFS is built in. The installer works for ZFS on boot device too.

          The problem isn't the ZFS licensing. The problem is GPL requires that everything being integrated into a GPL system becomes GPL too and not all licenses will allow that change in status. Maybe if GPL was loosened a bit things would be a bit better.

        6. admiraljkb
          Boffin

          Re: @ Richard_L - Your BTRFS issues used to be mine

          Yep, been there. No idea why SuSE put BTRFS as the default when they did (SuSE 11.2), when the tools weren't ready, and they pulled EXT4 support out of the kernel on the same release... Unfortunately, SUSE has been making experimental filesystems default for a while, which is inappropriate for an Enterprise product. (SuSE 10 had ReiserFS as a default...)

          Kernel 3.18 mostly fixes your issue, along with formatting the filesystem with a 16K leaf/node size, which finally became default with the newer tools. The 4K leaf sizes that the SuSE releases use cause never-ending suffering like yours and reduced performance. Sorry, I'm now off SuSE, so I'm not tracking their kernel any longer, so no idea if they've backported crap or not. Meanwhile the following will recover the wastage:

          btrfs balance start -dusage=55 /mnt/your_btrfs_fs &

          This guy has some good BTRFS tips:

          http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html

        7. Alan Brown Silver badge

          Re: @ Martijn Otto - You mean btrfs, surely

          "I've been using BTRFS as my filesystem on SUSE and it's a pain in the arse. "

          I tested BTRFS for a couple of months. After the 3rd total data loss I switched to ZFS and that's been rock solid stable no matter what I've thrown at it (including dodgy SATA controllers)

          As for Suse: That's the company which sells an enterprise linux and who ran away when things got too difficult for them in our cluster environment (almost literally: they simply stopped responding entirely, including refusing to respond to Novell management when we escalated it).

      5. Alan Brown Silver badge

        Re: @ Martijn Otto - You mean btrfs, surely

        "Did Oracle release ZFS under GPL in full?"

        No, CDDL - essentially incompatible with GPL for distribution, but fully open.

        (The incompatibility means that you can't distribute a linux distro with ZFS built in, but you can add it during the setup phase)

        There are ZFS repos for most distros. I'm using it on Ubuntu and RHEL6

      6. This post has been deleted by its author

    3. Cloud 9

      Re: Ah, so it's like ZFS but unfinished.

      RE: "Let's reinvent the wheel!"

      I thought that ZFS on Linux ran in user space - hence not fully integrated with the kernel?

      And if bcachefs does things better, why not? BBC do TV - should Sky not do TV? BT do 'internets' should Virgin not? What other things are being undertaken by multiple people that are worthy of ire and scorn? Perhaps we should only have one Linux distribution?

      I suspect that bcachefs will offer new opportunities for granular levels of caching. The fact that it's incorporating things such as snapshots is purely because these are the new baseline for file system requirements.

      1. Nigel 11

        ZFS on Linux

        Yes. ZOL (ZFS on Linux) runs ZFS in user-space. That gets around the licensing issues, because it isn't then an illegal derived product of both CDDL- and GPL-licensed code. But running in userland reduces efficiency (and probably makes it impossible to access ZOL filesystems early in the boot process).

        Just because two free open-source licenses both permit you to use code without payment, doesn't mean that one can produce and distribute a work derived from both. In this case GPL is the more restrictive. You'd have to get the whole Linux kernel re-licensed or dual-licensed under CDDL in order to merge ZFS in the kernel. This is practically impossible. So no ZFS in the Linux kernel. Sad.

        There's also a fork in ZFS licensing. When Oracle took over, it ceased to distribute newer versions of ZFS under CDDL, and nobody trusts Oracle! So now ZFS refers to two feature-incompatible filesystems (with a common base feature set). Also sad.

        BTRFS is coming along quite nicely. I've been using it for one level of backing up (huge numbers of snapshots), with other backups in case the btrfs falls apart. So far it's worked perfectly for me. Not sure why another similar FS is needed, but competition between projects is probably a good thing.

        1. Bronek Kozicki

          Re: ZFS on Linux

          @Nigel 11 you are mistaken, ZOL runs in the Linux kernel. If you look carefully there are a number of kernel modules, among them spl (GPL-licensed Solaris Portability Layer) and zfs (actual filesystem). It does use user mode (notably the user tools zpool and zfs, and calls from kernel to /sbin/mount to handle automatic mounting of snapshots) but the actual filesystem runs in kernel mode only.

          Which is why it is actually possible to do both 1) boot from ZFS (difficult and little benefit) and 2) keep your root filesystem i.e. / on ZFS (not difficult, lots of benefits).

    4. David Austin

      Obligatory XKCD Link

      https://xkcd.com/927/

  4. Anonymous Coward
    Anonymous Coward

    sounds good

    Many years ago I looked into how easy it would be to implement a storage layer with COW/snapshots on Linux. Given my lack of skills, not very. Only the other day I was reminded of this and how relevant it would be today given things like more ubiquitous flash storage (append-only stores suit them) and even self-encrypting ransomware (where automatic checkpointing could be as good as a backup, even if all it does is alert you to the large amount of disk writes). At the time I'd only been considering file system snapshots as an aid to making coherent backups on a live system, but if the fs is designed from the ground up to incorporate a robust history of all file changes, it could be really useful.

  5. Steve Davies 3 Silver badge

    The modern day YACC?

    now it is

    YABLFS

    Do we really need this? IMHO, answers could fit on a pin-head

    1. Bronek Kozicki
      Coat

      Re: The modern day YACC?

      I think that, actually, there is one good reason why we "need" this. It is checksumming.

      In the "olden" days when the most common filesystem features were designed (tree structure, file attributes, security etc.), disk space was scarce but disk reliability, compared to amount of stored data, was quite fine (and if it wasn't, you were supposed to restore data from backup). I think it is slightly different now: disks are cheap (compared to price of other components), amount of stored data grew by many orders of magnitude and disk reliability has fallen behind. Worst of all, because of the data recovery logic implemented in disk firmware, they now will take "educated guess" at the state of data and then silently move it to spare sectors/blocks, rather than tell you to restore data from backup. Hence, bitrot is taking place at a much higher rate (compared to size of whole datasets) than anticipated by architects of early filesystems. And it is occurring silently.

      I am surprised that so few filesystems have checksums now; if a new filesystem is needed to demonstrate how important this feature is, so be it. It might not be popular or even useable, but it might just prompt authors of other, more established filesystems, to eventually implement this feature.
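To make the checksumming argument above concrete, here is a minimal Python sketch of per-block checksums catching silent corruption. It is only an illustration: the 4096-byte block size, the use of CRC32 and the function names are assumptions made for this example, not how any particular filesystem (bcachefs, ZFS or btrfs) stores or verifies its checksums.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for this sketch


def write_blocks(data: bytes):
    """Split data into blocks and keep a CRC32 alongside each block."""
    blocks = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        blocks.append((block, zlib.crc32(block)))
    return blocks


def find_corrupt_blocks(blocks):
    """Return the indices of blocks whose contents no longer match their checksum."""
    return [i for i, (block, crc) in enumerate(blocks) if zlib.crc32(block) != crc]


stored = write_blocks(bytes(range(256)) * 64)  # 16 KiB of sample data, 4 blocks

# Simulate bitrot: flip one bit in block 2 without updating its checksum.
block, crc = stored[2]
damaged = bytearray(block)
damaged[100] ^= 0x01
stored[2] = (bytes(damaged), crc)

corrupt = find_corrupt_blocks(stored)
print(corrupt)
```

Without the stored checksum, a read of block 2 would simply return the damaged bytes and nothing would flag it; with it, the mismatch shows up on the next read or scrub.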

  6. Khaptain Silver badge

    Advantages for the Average Joe

    Not being very up to date on the "fors" and "againsts" of filesystems, what advantages would bcachefs bring to the average Lambda-type user?

    [Serious question]

    1. Suricou Raven

      Re: Advantages for the Average Joe

      COW lets Linux do the 'previous versions' thing that Windows can do, but better. It makes rolling back system changes or documents trivial.
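As a rough illustration of why copy-on-write makes rollback cheap, here is a toy Python sketch. The `CowStore` class and its method names are hypothetical and exist only for this example; a real COW filesystem works on disk blocks and trees rather than a Python dict, but the idea is the same: a snapshot is just a reference to the old state, and new writes never overwrite it.

```python
class CowStore:
    """Toy copy-on-write store: snapshots share unchanged data with the live tree."""

    def __init__(self):
        self._live = {}        # path -> bytes
        self._snapshots = {}   # snapshot name -> mapping as it was at snapshot time

    def write(self, path, data: bytes):
        # Build a new mapping instead of mutating the old one, so any
        # snapshot still pointing at the old mapping is left untouched.
        new_live = dict(self._live)
        new_live[path] = data
        self._live = new_live

    def read(self, path):
        return self._live[path]

    def snapshot(self, name):
        # Only a reference is recorded; no file data is copied.
        self._snapshots[name] = self._live

    def rollback(self, name):
        self._live = self._snapshots[name]


store = CowStore()
store.write("report.txt", b"draft 1")
store.snapshot("monday")
store.write("report.txt", b"draft 2 (oops)")
store.rollback("monday")
print(store.read("report.txt"))
```

Because writes always go to a new copy, taking the snapshot costs almost nothing and rolling back is just switching which copy counts as current.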

  7. Six_Degrees

    Just what Linux needs - another half-finished, half-baked filesystem.

    After the embarrassingly premature square-wheeled rollout of btrfs, the stench of poorly built, poorly managed filesystems has permeated the Linux landscape.

    1. Anonymous Coward
      Linux

      I thought it was the stench of poorly built, poorly managed desktop window managers that permeates the Linux landscape. My bad :)

      1. Teiwaz Silver badge
        Linux

        (WTF?) Managers

        Very poor.

        Do you moan when you visit a sweet shop and they have a huge range of sweets, many of which you don't like? Clove Rock or Black Jacks may be an acquired taste to many, but others like them.

        It's much the same with Window Managers on BSD and Linux, there's a plethora of unusual flavours, and yes, some never got their recipes perfected, others are from companies that went out of business, but as long as you can find the code in an archive you can potentially try a new recipe and see if it's sweet, sour or tangy..

    2. bri

      Nobody forces you to use it

      When Red Hat goes to XFS (!) as their primary system it shows that in this space Linux sorely lacks. Development of a complex POSIX FS takes a long time and the best hope (as of today) is btrfs. Which has some fundamental problems of its own. It will take time, maybe even major redesign to make it work. If something takes ten years to develop and we are not limited in the number of people, then what's wrong with pursuing multiple strategies?

      Kernel is fairly modular. You can turn off features, you know?

      1. BinkyTheMagicPaperclip

        Re: Nobody forces you to use it

        xfs now is not the same as xfs on an SGI box in the 90s, and its performance and reliability are pretty decent. It's only a stopgap solution, however, as ridiculously large disks will start hitting hard limits in the future.

        1. InsaneGeek

          Re: Nobody forces you to use it

          Stopgap solution because of ridiculously large drives??? I'm guessing you mean RAID protection as a stopgap, not XFS, because XFS can scale up to 9 exabytes, i.e. 9 million terabytes, which is more than a million times larger than the largest capacity drive out today. Pretty sure you can't attach anywhere near a million drives to a system anytime soon, and it's going to take a long time for the capacity of storage to get a million times larger than it is today (particularly with the dramatic slowing in Kryder's law).
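The arithmetic behind the "million times larger" claim can be checked quickly in Python. Both constants below are assumptions for this sketch: the XFS limit is taken as 2**63 bytes (8 EiB, which is roughly the "9 exabytes" quoted above in decimal units), and 8 TB stands in as a hypothetical "largest drive" size.

```python
XFS_LIMIT_BYTES = 2**63       # assumed limit: 8 EiB, about 9.2 decimal exabytes
DRIVE_BYTES = 8 * 10**12      # assumed: an 8 TB drive as the largest available

limit_in_tb = XFS_LIMIT_BYTES / 10**12            # filesystem limit in decimal terabytes
drives_needed = XFS_LIMIT_BYTES // DRIVE_BYTES    # whole drives needed to reach the limit

print(f"{limit_in_tb:,.0f} TB")
print(f"{drives_needed:,} drives")
```

Under those assumptions the limit works out to a little over 9.2 million TB, or somewhat more than 1.1 million such drives, which is consistent with the figures in the comment.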

    3. Anonymous Coward
      Anonymous Coward

      People were like "WTF" when Linus rolled out GIT, yet it has taken off quite well.

      There are loads of filesystems and one that suits embedded but also doesn't require the Microsoft FAT tax will be a good thing.

      1. Jamie Jones Silver badge

        Companies don't use FAT for its suitability, but the fact you can plug your device/card into any windows machine and it 'just works' without needing drivers.

        Android using FAT for its sdcards causes all sorts of security problems; there's no reason to use it apart from compatibility.

      2. tekHedd

        Past tense?

        That's funny, I'm still like "WTF" every time I use GIT. So convoluted!

  8. Frank Rysanek

    any improvements for "a myriad threads reading slowly in parallel" ?

    There's one use case which traditionally used to be a problem for cheap spinning rust in servers: multiple (many dozen or hundred maybe) slow parallel threads, each reading a sequential file from disk drives. For optimum throughput, the FS and the OS block layer should minimize the seek volume required. With enough memory for read-ahead, it should theoretically be possible to squeeze almost the sequential rate out of a classic disk drive. A couple years ago, Linux wasn't there. Too many out-of-sequence seeks for metadata, read-ahead not aligned on stripe boundaries in a RAID (there were other unaligned things if memory serves), no I/O scheduler really tuned for this use... There was allegedly some per-flow read-ahead magic in the works, but I have no news. Not sure if a new FS even has a chance of improving this. Not that anyone has claimed any such thing, regarding bcachefs or otherwise.

  9. Anonymous Coward
    Anonymous Coward

    FAT-free

    They could start by engineering out the need for the FAT filesystem, therefore eliminating the Microsoft patent revenue.

    1. bazza Silver badge

      Re: FAT-free

      It's a pity that this ever came about in the first place, and was entirely avoidable. All it needed was for someone to do a decent ext2/3/4 (or whatever) file system driver for Windows and then there would have been no need for FAT in cameras, mobiles, etc.

      Of course, there isn't a good finished free one. If the vendors pursued by MS clubbed together to make one it would make things cheaper for them all. It's typical of the short term cost conscious thinking that a lot of companies exhibit, instead of the more ambitious we'll-win-in-the-end-we-can-beat-the-incumbent long term advice that engineers routinely give only to be routinely rejected by company boards.

      1. thames

        Re: FAT-free

        @bazza - The reason why cameras and other such things use FAT is because Windows already has FAT without having to load a special driver. There are lots of other file systems already available which are more suited to the application from a technical standpoint, but Microsoft has no incentive to include anything which would improve compatibility with other operating systems. Thus, we are stuck with FAT.

        1. bazza Silver badge

          Re: FAT-free

          @thames,

          No, we are not stuck with FAT. The only reason FAT is used is because too many people lack the imagination to see that there is Another Way.

          If there were a free, well known, acknowledged and widely accepted ext file system driver for Windows then no one would have to use FAT ever again. If such a driver were available it wouldn't matter a damn what MS did or did not ship, because everyone would be using a driver beyond MS's control for cameras, etc. Whatever concerns there used to be about the inefficiency of squeezing a HDD f/s onto a small SD card are now irrelevant given the huge capacity of even the cheapest one.

          There are ext drivers available, but they either cost money or are free but incomplete. The cost of assembling a team to do this properly is surely far less than the money all the manufacturers pay to MS to use FAT.

        2. Sandtitz Silver badge

          Re: FAT-free @thames

          The reason why cameras and other such things use FAT is because Windows already has FAT without having to load a special driver.

          The cameras and other devices could just as well use UDF since it's supported by practically every OS since Windows 95.

      2. Mage Silver badge
        Coat

        Re: FAT-free

        I've had an Ext2 driver integrated to windows for nearly 10 years.

        I must see is there an upgrade for Ext4

        1. Named coward

          Re: FAT-free

          You're an ElReg reader - you at least know what a filesystem is. Most people have no clue and do not want to install something extra (ext2 driver) just to get their USB drive/camera to work. And USB drive/camera manufacturers don't want to tell their customers to have to install something extra to get their device to work. I suspect that even if MS were to include an Ext driver now it would be too late - too many devices are already FAT formatted, FAT is already supported everywhere where typical users use their devices. In short, from the manufacturer's point of view: why bother?

          1. Teiwaz Silver badge

            Re: FAT-free

            "Most people have no clue and do not want to install something extra (ext2 driver) just to get their USB drive/camera to work."

            'Most people' would say 'yes' to anything going onto their system as long as their shiny new camera/phone/satnav worked afterwards. I can see the average user just shrug and click 'OK' if asked do they want to install 'pink petunias' as part of device driver install* for a new camera.

            * I am assuming many devices still come with CD or website based install for many pc-linkable devices - I've not used Windows much since 2007 (probably still have a box of CDs full of crap S/W for printers, camera,camcorders,satnavs, mp3players etc. somewhere though).

            1. Named coward

              Re: FAT-free

              Given your assumption I would agree. But most cameras nowadays use MTP and require no installation. USB drives also require no installation.

  10. Steve Davies 3 Silver badge

    About time there was...

    A filesystem for Linux that had file versioning. I used this back in 1978 on a VAX-11/780 running VMS.

    I would use it if it was generally available. I'm sure it would appeal to more general users than this sort of filesystem.

    Just MHO though.

    1. bazza Silver badge

      Re: About time there was...

      I can remember having to type 'purge' (or something like that) a lot to keep within my space quota on the VAX we had at university...

      1. Phil O'Sophical Silver badge

        Re: About time there was...

        I can remember having to type 'purge' (or something like that) a lot

        You could set limits using a command like: set directory /version_limit=n for all files in a directory, or set file /version_limit=n for any single file. IIRC the "set file" command came in a later release, maybe 4.x?

    2. Jim 59

      Re: About time there was...

      I used VAX/VMS file versioning in '88 at Sunderland Polytechnic. It was nice, but I've never seen it since. For a few very important files, I have scripted a poor man's solution.
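For anyone curious what such a "poor man's solution" might look like, here is a small Python sketch. It is not the poster's script: the `name;N` naming merely borrows the VMS style of version numbers, and `list_versions`, `save_version` and the `keep` parameter are all made up for this example (`keep` plays roughly the role of the version limit mentioned above and is assumed to be at least 1).

```python
import shutil
import tempfile
from pathlib import Path


def list_versions(path: Path):
    """Return (number, file) pairs for existing `name;N` copies, oldest first."""
    prefix = path.name + ";"
    found = []
    for candidate in path.parent.iterdir():
        suffix = candidate.name[len(prefix):]
        if candidate.name.startswith(prefix) and suffix.isdigit():
            found.append((int(suffix), candidate))
    return sorted(found)


def save_version(path: Path, keep: int = 3) -> Path:
    """Copy `path` to `path;N` using the next version number, then purge old copies."""
    versions = list_versions(path)
    next_number = versions[-1][0] + 1 if versions else 1
    target = path.with_name(f"{path.name};{next_number}")
    shutil.copy2(path, target)
    # Drop the oldest copies so that at most `keep` versions remain.
    for _, old in list_versions(path)[:-keep]:
        old.unlink()
    return target


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes.txt"
    for text in ["one", "two", "three", "four"]:
        notes.write_text(text)
        save_version(notes, keep=3)
    remaining = [number for number, _ in list_versions(notes)]
    oldest_kept = (Path(tmp) / "notes.txt;2").read_text()

print(remaining, oldest_kept)
```

A script like this has to be run by hand or from an editor hook after each save, which is exactly the gap a filesystem with built-in versioning would close.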

  11. Anonymous Coward
    Anonymous Coward

    As long as they don't try and hook it into crucial but unrelated parts of the system, as seems to be the trend these days...

  12. Joseph Haig

    Github?

    Not wanting to be picky but if the "project is on GitHub" then why do the links take me to a url I have never heard of?

  13. tekHedd

    "Does Windows need a new file system? Too bad, deal with it."

    I instantly imagined the corresponding MS Windows-oriented article.

    I've been moving large files on a USB stick between a Linux and Windows box this week, and of course NTFS is the only real option. Icky poo. And now all of my other devices (PS3 etc) can not use the USB stick, so I have to reformat it again if I want to use it anywhere else.

    1. Bronek Kozicki

      Re: "Does Windows need a new file system? Too bad, deal with it."

      Since you seem to be google-deficient, here is a little hint from me: Paragon ExtFS for Windows

      1. Chika

        Re: "Does Windows need a new file system? Too bad, deal with it."

        I'd forgotten about that! I think I last used Paragon on a W98 box (I might still have the install floating around on one of my drives somewhere).

        As for the rest, I'll keep quiet as I avoided btrfs on SUSE after reading so many horror stories about it...

    2. Steve---d

      Re: "Does Windows need a new file system? Too bad, deal with it."

      "...and of course NTFS is the only real option."

      Samba, NFS, sshfs, ftp, all come to mind.

  14. iOS6 user

    Wow .. after ZFS has been available on the market as a production solution for 10 years, Linux developers are thinking that Linux needs a new FS. I'm really in deep shock.

  15. razorfishsl

    Problem with all this "de-duping" and Checksumming is that it really requires hardware pipes to deal with it.

    You don't want to have to read data from a 6/8/12Gb/s link into a CPU to try and work out checksums to dedup.

    Christ, you only have to take a look at something like a Synology 'rackstation' to see that it takes several days to rebuild or clean a 20TB array, and to really mess it up, just throw an Apple backup partition at it...

    Sorry, it needs FPGA/custom SATA pipes to calculate checksums in real time as the data flows over the link, with signaling (checksum) available on the falling edge of the last byte to say whether the block was valid or not. Anything else is just unacceptable overhead.

    Then you still have extra steps to manipulate pointers for de-dup setups, this shit works fine... when it works fine.... but throw in some deliberate corruption and it becomes a different matter.

    1. phil dude
      Linux

      i'm not sure...

      but I think most modern CPUs have instructions to help checksumming....I mean the Linux kernel tests this when the RAID module starts up....

      [ 4.829822] xor: automatically using best checksumming function:

      [ 4.869329] avx : 3122.000 MB/sec

      More sophisticated functions might require more work, but there is probably no shortage of silicon for this problem...

      Encryption and checksumming are related logic.

      P.

    2. Bronek Kozicki

      Checksumming is cheap: only a few fast instructions are needed for it, with low memory overhead and tiny (when there are no errors) I/O overhead. Absolutely no reason to do anything "custom" for it to work, because your bottleneck was and will remain in I/O. It is deduplication (online one) which is hard.
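A small Python sketch may help show why online deduplication is the harder part. The `DedupStore` class below is hypothetical and deliberately naive: every write has to look up each block's hash in an index of all blocks ever stored, and that index is the thing that has to stay in fast memory on a real system. The SHA-256 hash and the 4096-byte block size are assumptions for the example only.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps each unique block only once."""

    def __init__(self):
        self.blocks = {}   # digest -> block data, stored once
        self.files = {}    # file name -> ordered list of digests

    def write(self, name, data: bytes, block_size: int = 4096):
        digests = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).digest()
            # The index lookup happens on every block written.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
file_a = b"x" * 8192 + b"y" * 4096
file_b = b"x" * 4096 + b"z" * 4096
store.write("a", file_a)
store.write("b", file_b)

logical_blocks = sum(len(d) for d in store.files.values())
unique_blocks = len(store.blocks)
print(logical_blocks, unique_blocks)
```

Here five logical blocks collapse to three stored ones, but the index grows with every unique block on the pool, which fits the reports elsewhere in this thread of dedup being the feature that eats RAM.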

  16. CaptainBanjax

    I'm hearing a lot of...

    "Like a database" wishes here...

    Have we forgotten about this:

    https://en.m.wikipedia.org/wiki/WinFS

    I can't even remember why it was canned.


Biting the hand that feeds IT © 1998–2021