Given SUSE's habitual brown nosing of RedHat, I suspect that it will probably bin btrfs eventually.
Red Hat banishes Btrfs from RHEL
Red Hat has banished Btrfs, the Oracle-created file system intended to help harden Linux's storage capabilities. The Deprecated Functionality List for Red Hat Enterprise Linux 7.4 explains the decision as follows: The Btrfs file system has been in Technology Preview state since the initial release of Red Hat Enterprise …
COMMENTS
-
-
Wednesday 16th August 2017 05:10 GMT thames
I doubt that Suse will want to spend the resources required to keep Btrfs going. There's no market advantage to having it. I can't see anyone else stepping in to take over either. Suse will deprecate Btrfs at some time appropriate to their release cycle, and then gradually phase it out.
Given the way that technology is going, I suspect that the future is going to involve file systems that were designed specifically for flash storage. Rotating disk will be used more like tape and have file systems to suit that role.
-
Wednesday 16th August 2017 11:40 GMT Doctor Syntax
"Given the way that technology is going, I suspect that the future is going to involve file systems that were designed specifically for flash storage."
I think that, in response to malware, we might have to start looking at storage in a new way. Rather than letting any old application write to whatever lump of storage the user has access to, applications will need to ask a service to do the writing, and the service will ensure that the application has the appropriate credentials.
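As a toy sketch of that idea (all names hypothetical, nothing resembling a real API): applications never touch storage themselves, they hand write requests to a broker that checks the caller's token against an allow-list first.

```python
# Toy sketch of a credential-checked write broker. All names are made up;
# a real implementation would sit behind an IPC boundary, not a class.

class PermissionDenied(Exception):
    pass

class WriteBroker:
    def __init__(self, grants):
        # grants maps an app token to the set of paths it may write
        self.grants = grants
        self.storage = {}          # stands in for the real filesystem

    def write(self, token, path, data):
        allowed = self.grants.get(token, set())
        if path not in allowed:
            raise PermissionDenied(f"{token!r} may not write {path}")
        self.storage[path] = data  # only the broker ever touches storage

broker = WriteBroker({"backup-agent": {"/srv/backups/db.dump"}})
broker.write("backup-agent", "/srv/backups/db.dump", b"...")
try:
    broker.write("malware", "/srv/backups/db.dump", b"junk")
except PermissionDenied:
    print("blocked")  # prints "blocked"
```

The point of the indirection is that the policy lives in one place, outside the reach of whatever the user happened to run.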
-
Wednesday 16th August 2017 14:33 GMT Anonymous Coward
@doctor syntax
"Rather than letting any old application write to whatever lump of storage the user has access to, applications will need to ask a service to do the writing"
I can't think of any malware or virus that made its way into a system via writing to the filesystem.
Anyway, it would be hideously slow due to paging data around the kernel, and because of this it would have to have exceptions letting programs write directly (e.g. for RDBMSs), making it full of potential security issues which would be yet another thing for admins to worry about.
-
-
Thursday 17th August 2017 08:25 GMT Anonymous Coward
Re: @doctor syntax
"And once it gets in it never does stuff like, let's say, overwrite all a user's files?"
And your point is what exactly? Have you never heard of file permissions? Anyway, the filesystem is not a security weakness, it's a basic facility. Also, on unix/linux it would be quite easy to intercept any filesystem calls using LD_PRELOAD when test-running a binary, so having another layer on top of the filesystem solves nothing. Even MS realised this when they dumped WinFS.
-
-
-
Wednesday 16th August 2017 18:28 GMT elip
But you've just described exactly what a file system does.
Whether you have a regular human user account with access to the data, or a service account/application token writing to the data store, the file system is responsible for the reads/writes/access enforcement. Why would we want even more abstraction?
-
Wednesday 31st January 2018 06:27 GMT Tom Samplonius
"I think that, in response to malware, we might have to start looking at storage in a new way. Rather than letting any old application write to whatever lump of storage the user has access to, applications will need to ask a service to do the writing, and the service will ensure that the application has the appropriate credentials."
Congratulations, you have just discovered SELinux. SELinux can enforce file access on a per application basis, plus network access, ports, etc.
-
-
Wednesday 16th August 2017 11:44 GMT TVU
"I doubt that Suse will want to spend the resources required to keep Btrfs going. There's no market advantage to having it. I can't see anyone else stepping in to take over either. Suse will deprecate Btrfs at some time appropriate to their release cycle, and then gradually phase it out".
However, they have in-house experts and I expect that they'll keep on working on Btrfs.
As for Red Hat, I get the distinct impression that they are still not happy with the development and stability of Btrfs and so they are switching to the more mature XFS which has been around for years. They will be developing XFS and their recent acquisition of Permabit is entirely in line with a long term XFS development strategy.
-
Wednesday 16th August 2017 19:50 GMT Alan Brown
"Given the way that technology is going, I suspect that the future is going to involve file systems that were designed specifically for flash storage."
That's already where things are headed.
"Rotating disk will be used more like tape and have file systems to suit that role."
Which is a pretty good description of the area that ZFS fits in (spinning media fronted by gobs of flash as cache, with checksumming and proactive correction at every level to combat the inevitable errors that creep in every 45TB or so a disk reads; compression, dedupe and encryption are all optional). Think of it as a faster hierarchical storage system.
Tape is still king in the archival format area though - and will be for a long time to come. Nothing beats it for cold storage.
-
Wednesday 16th August 2017 23:40 GMT pwl
@thames
"There's no market advantage to having it. I can't see anyone else stepping in to take over either."
- every admin command on SLES including OS updates gets a snapshot before & after thanks to btrfs. any errors can be rolled back virtually instantaneously. downtime due to errors is significantly reduced. in the event of catastrophic failure, you can recover from a previous "good" on-disk version of the OS: no reinstall/rebuild needed. only SUSE offers this on enterprise linux
- redhat's contribution to OSS is always important, but the main contributors to Btrfs were always Facebook, Fujitsu, Intel, Oracle, SanDisk, Netgear, & SUSE ... none of those companies have announced dropping work on the project...
-
Thursday 17th August 2017 03:57 GMT dajoker
Btrfs Market Advantage
The market advantage is definitely there if you are interested in enterprise products, which is why it is a little weird that RedHat would not want to be in on that, but their decision not to include it is about one drop in the overall bucket of contributions to the Btrfs filesystem. Go check lkml and see who is involved, and maybe the reasoning for dropping Btrfs becomes clearer: they lack expertise and may want to cut costs rather than employing people who know it well, instead focusing on other technologies that they know better, despite the lost functionality.
In the meantime, I've saved more than a couple of servers, and my own laptop, from needing to be reinstalled thanks to Btrfs. Whether it's a bad patch (even a bad kernel), a bad use of the great tools that manage the system (which automatically snapshot), or user error in some other way, reverting a snapshot is just awesome. Also I get a lot of pleasure from comparing before/after snapshots of non-RPM-packaged software to see what they really do to my system. Comparing in realtime, both before and after snapshots from the running system, is just wonderful. I've even seen clients using it for forensics, finding out what some miscreant did, or tried to do anyway, to cover up tracks, giving investigators exactly what they needed to prosecute, all because the snapshot recorded the changes made (creates, modifies, deletions) that were done to avoid incrimination; it's like somebody making a log of all the things they want to do and handing it to the authorities.
Disclaimer: I worked at SUSE several years ago, and am now a consultant on, among other things, SUSE and RHEL server products.
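The before/after comparison described above can be modelled in a few lines (a toy model, nothing btrfs-specific - the real thing diffs snapshots at the extent level): take two views of the tree and classify every path as created, modified, or deleted.

```python
# Toy model of diffing two filesystem snapshots: a snapshot is just a
# mapping of path -> content, and the diff classifies every path.

def diff_snapshots(before, after):
    created  = sorted(set(after) - set(before))
    deleted  = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after)
                      if before[p] != after[p])
    return created, modified, deleted

before = {"/etc/passwd": "root:x:0:0", "/var/log/auth.log": "login ok"}
after  = {"/etc/passwd": "root:x:0:0",
          "/var/log/auth.log": "",       # miscreant wiped the log...
          "/tmp/.hidden": "payload"}     # ...and dropped a file

created, modified, deleted = diff_snapshots(before, after)
print(created)   # ['/tmp/.hidden']
print(modified)  # ['/var/log/auth.log']
print(deleted)   # []
```

Because the "before" snapshot is immutable, the attacker's cleanup is itself the evidence, which is exactly the forensics use described above.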
-
Thursday 17th August 2017 19:32 GMT AdamWill
Optional
Note: I work for Red Hat, but not on the relevant teams, so I absolutely don't have the authoritative answer on this, it's just my impression.
"Go check lkml and see who is involved and maybe the reasoning for dropping Btrfs becomes more-clear as they lack expertise and may want to cut costs rather than employing people who know it well"
My impression is it's kind of the other way around. Generally we hire people to work on stuff we're interested in shipping. There was a period when there was quite a lot of belief within RH that btrfs was The Future, and IIRC, at this time, we actually did employ multiple folks to work on btrfs full-time. (This was the period when it was a running joke that btrfs was going to be the default in Fedora in the *next* release...every release...it kept getting proposed as a feature, then delayed).
Over the next several years, enthusiasm for btrfs just kinda generally declined internally, because it always seems to be perpetually not quite ready and running into inconvenient problems like eating people's data. (Again, this is not my main area of focus so this is only a vague impression I have, I'm not in a position to cite lots of data or state this authoritatively; I'm not going to get into a "who's right, SUSE or RH?" debate because I just don't have the expertise and I have nothing at all against SUSE), and in this time, most of the development resources we had for btrfs got reallocated elsewhere.
So it's not so much "we don't want to ship btrfs because we don't have anyone to work on it" - I mean, it's not like we don't have the money to hire some more storage engineers if we *wanted* to - it's rather more "we're not hiring people to work on btrfs because we decided we don't think it's the right horse to bet on any more".
-
-
Wednesday 13th July 2022 10:39 GMT cosmodrome
f2fs?
> Given the way that technology is going, I suspect that the future is going to involve file systems that were designed specifically for flash storage
F2FS has been around and stable for years. I've been using it on USB drives and SSDs for quite a while, and so have Samsung on their smartphones, AFAIK.
-
Wednesday 16th August 2017 23:32 GMT pwl
probably shouldn't feed the trolls...
...but RedHat tends to follow where SUSE quietly leads without fanfare, such as providing xfs in the standard bundle (rather than thousands extra per CPU), using corosync/pacemaker for HA (after a 10+ year lag), showing an interest in Ceph (2 years after SUSE signed an enterprise deal with Inktank), and properly supporting other platforms like IBM z.
Btrfs isn't going away from SUSE - it's a shame RedHat won't contribute anymore, but since most of the involvement was from Fujitsu, SanDisk, Intel, Netgear, Facebook, and others as well as SUSE, it probably won't slow things down too much.
Meanwhile, if you want full-OS snapshot/rollback on your Linux, it looks like SUSE will be your only choice for the foreseeable future...
-
Sunday 18th November 2018 02:18 GMT hrudy
Re: probably shouldn't feed the trolls...
Having switched recently to OpenSuse and SLES, Btrfs has quite a learning curve. Some of its features, like subvolumes, almost seem like a solution in search of a problem. Perhaps the btrfs community and Suse have done a poor job of clearly explaining what the advantages (and liabilities) of btrfs are.
One enterprise user I talked to said that the metadata filled up on his root volume and locked up his system. I know that you can run janitor programs that balance the system. However, it seems that these systems are tuned to resolve esoteric problems like bitrot and system rollback after updates, but create more issues for the sysadmins than they solve. I know that kernel 4.19 has improvements for btrfs. However, unless you are running OpenSuse Tumbleweed, or unless Suse decides to backport these fixes to SLES 15, that's not going to help in the short term.
-
-
Thursday 17th August 2017 03:57 GMT Anonymous Coward
Brown-nosing... what a crock
While it's both easy and popular to think the Linux world revolves around [your distribution of choice here], let's not make vacuous claims about one distro vs. another without at least, you know, evidence. SUSE doesn't follow RHEL, as is evident to anybody familiar with both products. Sure, they have a lot of common products, like Linux, and OpenStack-based things, etc., but the timelines, based firmly in reality and documented online, show a lot of non-following, at least by SUSE, when it comes to adopting technologies. The filesystem is definitely no exception; XFS was free and commonly used by SLES long before RHEL finally made it available to everybody. Btrfs's primary contributor isn't RH but SUSE, followed by several others, and then possibly RH gets in there with a few commits now and then.
In other news, the [pick some government] announced it would no longer support Bitcoin for transactions with the government.
-
-
Wednesday 16th August 2017 05:36 GMT Anonymous Coward
After so many versions of Fedora that promised btrfs as the default filesystem
Now they're binning it entirely? I find it hard to believe that lack of encryption was the only reason, especially since 1) every enterprise drive already supports encryption and 2) you can implement it with dm-crypt.
I'll bet this has something to do with politics and Oracle.
-
Wednesday 16th August 2017 05:57 GMT Anonymous Coward
Re: After so many versions of Fedora that promised btrfs as the default filesystem
Er, I think it was Google who aren't keen on it for its lack of encryption, not RedHat.
Stallman is going to blow a fuse if ZFS gets widely adopted.
This doesn't bode particularly well for the future of large scale GPL projects. Seems like more and more people are willing to mix code under different licenses even if they're incompatible. The allure of someone else's code can be too great to ignore. If the GPL is to remain influential, and not get routinely ignored, they need to get ZFS in Linux knocked on the head. If ZFS+Linux becomes the norm, unchallenged, it becomes harder to enforce GPL. Especially with court cases like Google vs Oracle which may yet pronounce on loosely similar copyright breaches being "fair use".
-
Wednesday 16th August 2017 06:44 GMT Teiwaz
Re: After so many versions of Fedora that promised btrfs as the default filesystem
This doesn't bode particularly well for the future of large scale GPL projects.
- Any license issue that gets in the way will eventually get sorted, maybe not GPL, but something.
I can't actually fathom what was going on around Btrfs - lots of talk about it being the future default filesystem for Linux - yet it was left with Oracle?
I'm just waiting for the next great Linux schism when it's announced Lennart Poettering is doing a new filesystem for Red Hat.
-
Wednesday 16th August 2017 10:49 GMT JSTY
Re: After so many versions of Fedora that promised btrfs as the default filesystem
> I'm just waiting for the next great Linux schism when it's announced Lennart Poettering is doing a new filesystem for Red Hat.
Don't worry, I'm sure it's on the SystemD roadmap
(Not even sure if the joke icon is appropriate here ...)
-
Wednesday 16th August 2017 07:39 GMT Anonymous Coward
@AC
Stallman is going to blow a fuse if ZFS gets widely adopted.
Being a huge fan of ZFS myself I'd pay to see that ;)
But I'm not sure I fully understand. I mean, all because of a license? Because ZFS uses a license (CDDL) which people don't happen to like? What happened to the free software philosophy? CDDL is just as much an open source license as the GPL is. Sometimes I can't help worrying that certain people completely lose focus of the eventual goals.
Seems like more and more people are willing to mix code under different licenses even if they're incompatible.
Can you imagine that... People apparently like the freedom to use the (open) source as they want to use it. One important note though: it's the GPL which is usually incompatible, not the other licenses. Many open source licenses (I'm mostly familiar with the BSD, CDDL and Apache licenses) have no problem at all with mixing things together, as long as the license continues to get respected.
I think that's an important aspect here. GPL demands that everything gets redistributed under the GPL (all newly entered code) whereas the other licenses only demand that the original software simply continues to stay licensed under the same license it was given out with. Hardly an unfair demand to make I think, especially if you keep in mind that mixing that with another license is usually no problem.
-
Wednesday 16th August 2017 09:47 GMT Jon 37
Re: @AC
The issue isn't that "ZFS uses a license (CDDL) which people don't happen to like", the issue is that it is widely believed that distributing a binary that includes both GPL and CDDL parts is not allowed by the licenses, and therefore copyright infringement.
Only a court can eventually rule on this and give us a definite answer one way or another.
As for "What happened to the free software philosophy?", certainly the GNU developers were (and are) very careful not to infringe on anyone's copyrights, and if someone offers code freely available subject to certain conditions they would either follow the conditions or not use the code.
If you don't like that "GPL demands that everything gets redistributed under the GPL", then don't use GPL code. The *BSD distros and kernels are available as an alternative to Linux, and someone could port (maybe has ported?) CDDL-licensed ZFS to them. Alternatively, someone could look at the existing ZFS code and write a decent spec for ZFS, and someone else could implement it in new GPLed code for Linux.
-
Wednesday 16th August 2017 10:13 GMT Bronek Kozicki
Re: @AC
OpenZFS is natively available on FreeBSD. It is the same OpenZFS which is also available for Linux, under the "ZFS on Linux" project, and which has been included in Ubuntu since version 16.04. The parts of OpenZFS which interact with the Linux kernel do so via modules called the Solaris Porting Layer, which are all licensed under the GPL (and not the CDDL).
-
Wednesday 16th August 2017 11:46 GMT TVU
Re: @AC
"OpenZFS is natively available on FreeBSD. It is the same OpenZFS which is also available for Linux, under the "ZFS on Linux" project, and which has been included in Ubuntu since version 16.04. The parts of OpenZFS which interact with the Linux kernel do so via modules called the Solaris Porting Layer, which are all licensed under the GPL (and not the CDDL)".
^ Absolutely this. This is why it is unlikely that there will be any challenge, and why it is even less likely that any such challenge would succeed.
-
-
Wednesday 16th August 2017 10:32 GMT Anonymous Coward
Re: @AC
>> Because CDDL is just as much an open source license as the GPL is. Sometimes I can't help worry that certain people completely lose focus of the eventual goals.
Those people are called bureaucrats and lawyers. Their goals never change, they exist only to make more work for themselves, and they love arguments like this. Normal people would just say "It's public domain, go have fun".
All this fannying about over "my license is better than yours" reminds me of nothing more than the various "Palestinian" organisations in Life of Brian.
-
This post has been deleted by its author
-
-
Wednesday 16th August 2017 17:10 GMT Anonymous Coward
@AC - Re: After so many versions of Fedora that promised btrfs as the default filesystem
I gave you a down vote wholeheartedly for your failure to see the separation between the developers' wishes and the lawyers' will. And also for equating 4 lines of code in Google vs Oracle with the whole ZFS codebase.
You don't seem to understand much about licensing issues. Problem is not with ZFS adoption in Linux but with rights to distribute ZFS with Linux.
According to your logic even Microsoft's hand could be forced because people really want to use their technologies for free.
-
Wednesday 16th August 2017 19:55 GMT Alan Brown
Re: After so many versions of Fedora that promised btrfs as the default filesystem
"If the GPL is to remain influential, and not get routinely ignored, they need to get ZFS in Linux knocked on the head. "
The incompatibility between them only applies if you _distribute_ the two together as binaries. There are a number of packages with this same issue and the response has been pragmatic for years - set up a script to pull in the required items from a separate source _after_ the OS has been installed.
Apart from the feature work, there's been a _lot_ of work put into ensuring that ZFS doesn't touch GPL symbols in the kernel, and the reality is that the FS is mature and reliable, plus blazingly fast on spinning media (and even faster if you run it on all-flash systems).
-
-
Wednesday 16th August 2017 16:42 GMT Oh Homer
Re: After so many versions of Fedora that promised btrfs as the default filesystem
Personally I'm breathing a huge sigh of relief, as the ironically named btrfs is an utter clusterfsck, in my experience.
Sorry, but any filesystem that gets heavily fragmented by design is definitely not on my Christmas list, especially when this results in performance degradation so crippling that I have to pull the plug, risking massive filesystem corruption in the process, and then don't even have a fully functional (or even safe) fsck utility to recover with afterwards.
I'm shocked that Red Hat took this long to finally ditch it.
ZFS has its own problems, of course, and I'm not even talking about the dreaded CDDL. For a start, it has a massive memory overhead. It also seems to be impossible to tell how much, if any, free space is available, due to the strange way that it views storage, and worse still there doesn't seem to be any way to reclaim "freed" space either.
Frankly I'm not sure what problem these highly dysfunctional filesystems are trying to solve, but whatever it is they don't seem to have succeeded, so I'll just stick with ext4. Thanks anyway.
-
Wednesday 16th August 2017 20:21 GMT Alan Brown
Re: After so many versions of Fedora that promised btrfs as the default filesystem
"ZFS has its own problems, of course, and I'm not even talking about the dreaded CDDL. For a start, it has a massive memory overhead"
It's designed to be used on FILE servers, not your average desktop system. It also starts on the premise that "Disks are crap, cope with it and don't be a snowflake" - it doesn't demand you put in premium components and then have a tantrum when one goes down. It _expects_ things to break or give errors and heals automatically, on the fly when using commodity components (the only "critical" demand is ECC memory and you're not going to have that on your average desktop or laptop anyway). It plays to the strengths of hard drives and remediates their weaknesses using memory and flash drives to allow sustained simultaneous high IOP and high throughput activity.
That memory is cache, with even more cache on SSD and it will maximise cache whenever possible, in line with its mission of being a fileserver's filesystem (You can tune the memory usage down but why? The more memory you throw at it, the better it becomes!)
The advantage of doing it this way isn't obvious on your desktop or laptop (apart from the block checksumming, which is worthwhile by itself), but it shows up in spades when someone puts 200,000 files in one directory or you have thousands of users banging hard all over a 400TB directory tree containing 500 million files. This is why outfits like Lawrence Livermore have invested so much effort into it - and why I dropped £120k a couple of years back on a dedicated ZFS fileserver that has 256GB of RAM and 1TB of fast SSD cache.
Yes, you could run ext4 on this system - if you don't mind periodic downtime for housekeeping (fsck), plus either the expense of a fast hardware RAID controller or the complexity of MD-RAID (which only allows 2 parity stripes instead of ZFS's 3 - and that's a big deal when you run 100TB+ installations, as we've lost RAID6 arrays before), plus LVM, plus myriad individual filesystems to herd. You'd find that the overall performance with the same amount of memory would be substantially lower, and if you want to try and match ZFS you'll have to do a lot of fiddling with dm-cache, or put your writes at risk of power failure/crash. After all that, you'll still have it fall over when a user puts 250,000 files in one directory.
After spending years nursemaiding systems which suffered poor latency and got temperamental when users piled on the load it's a relief to have one which doesn't suddenly slow down 90%, pause for 4-5 minutes because a user did something stupid, (or become unstable), or be a major headache when a disk flipped a few bits (or died) - and for less money than comparable clustering "solutions" - which bitter experience has shown are not fit for purpose.
ZFS is the right tool for the job it's designed for. Putting it on a low-load, low-memory desktop or laptop is on par with using a bucketwheel excavator when you only wanted to shift a wheelbarrow load. It will do it and it can be tuned to do it relatively well, but that's not what it's intended for.
-
Thursday 17th August 2017 23:10 GMT Oh Homer
Re: "It's designed to be used on FILE servers"
Tell that to the masses who seem to think that ZFS is simply the better alternative to btrfs on the Linux desktop, for whom ZFS is the standard response to all criticism of btrfs, and who then set about defending the CDDL to allay fears that mass adoption might be stymied by licensing issues.
No, I don't get it either, which is why I like to constantly remind everyone about how unsuitable ZFS is for the Linux desktop.
-
-
-
Wednesday 16th August 2017 17:06 GMT John Sanders
Re: After so many versions of Fedora that promised btrfs as the default filesystem
>> I'll bet this has something to do with politics and Oracle.
And lack of robust RAID5/6, or robust anything for that matter.
I was once in love with BTRFS, but time and time again I found myself going back to mdadm + lvm, or just pure lvm.
I wanted BTRFS to succeed, but how many years has it been in beta?
In the same time ext4 has gained all sorts of niceties (like built-in ACLs or native encryption) and at the current rate we'll end up getting native snapshots or RAID functionality way before BTRFS is ready.
And yes, I know Ted Ts'o (maintainer of ext4) said BTRFS is the future.
-
Thursday 17th August 2017 19:39 GMT AdamWill
Re: After so many versions of Fedora that promised btrfs as the default filesystem
Few points:
1. If you look at the proposed / accepted 'Features' / 'Changes' for the last several Fedora releases (we changed the name from 'Features' to 'Changes' a few cycles back...) you'll notice that whole circus of 'we promise it's going to be the default next release!...no, wait, we're delaying it again' stopped happening several releases back; it hasn't actually been proposed as a feature/change for several releases. Which some savvy folks interpreted as something of a sign about RH's declining keenness on btrfs, and...well, I guess now it's not revealing any secrets to say they weren't wrong. :P
2. btrfs isn't being 'binned entirely' from *Fedora*; this announcement is specific to RHEL. Its status in Fedora for a long time has basically been "it's there, and it's approximately as supported as any other filesystem which is included and selectable from the installer but isn't the default", and that's still its status at present. Though I know the installer team has made noises about how their lives would be rather easier if they could kick it out of the installer again, I don't think that idea's live *right now*.
(Also FWIW, which is very little as I'm certainly not plugged into all the internal channels on this, I don't *think* there's anything particularly political about this; it's purely the case that our storage folks have been gradually losing confidence in btrfs being really good enough for our customers for some time.)
-
-
Wednesday 16th August 2017 06:28 GMT Anonymous Coward
ZFS is the right choice for a server system
ZFS contains a superset of btrfs features, plus a great number of nice bits for dealing with large and complex disk subsystems and NFS file serving. If your target is beefy enterprise servers (it's RHEL, after all!), deprecating btrfs in favour of ZFS seems to be an obvious choice.
On a client system, I'd choose btrfs over ZFS any day - it still has the COW, the snapshots, and the cloning, which accounts for 99% of what I need on a client. It also has a much lighter resource footprint, and is harder to misconfigure.
Sometimes, an apple is just an apple.
-
Wednesday 16th August 2017 08:20 GMT Anonymous Coward
Re: ZFS is the right choice for a server system
Actually no.
ZFS is NOT automatically "the right choice for a server system".
For a start, the very founding principle of ZFS (that many people forget) is that it was designed as, and continues to be maintained as, a JBOD DAS file system.
If you are running an abstraction layer over your storage (such as a RAID controller, as many people do), then running ZFS on top of that is very much not recommended and WILL (not may) come back to kick you in the backside one day.
-
Wednesday 16th August 2017 12:00 GMT Anonymous Coward
maintained as a JBOD DAS file system
ZFS does implement RAID features at the file system level instead of the hardware level. That's necessary to implement the resiliency features, which are more sophisticated than what a RAID controller implements in its firmware.
One advantage is that you can move the disks from one controller to another and mount them without issues. One disadvantage is that expensive RAID controllers or enclosures may be useless, and the CPU/RAM requirements are high.
Anyway, data are still distributed across disks. It's not a JBOD where data are on a single disk. You don't create a JBOD at the RAID level; you leave the disks as single ones, and ZFS will manage them.
-
Wednesday 16th August 2017 16:03 GMT Dazed and Confused
Re: maintained as a JBOD DAS file system
> One disadvantage is expensive RAID controllers or enclosures may be useless, and the CPU/RAM requirements are high.
CPUs and CPU licenses are far more expensive than a HW RAID controller, and not only that, they are slower too when it comes to things like checksum calculations. These jobs are better off offloaded to a dedicated piece of HW IMHO.
-
Thursday 17th August 2017 04:45 GMT fnj
Re: maintained as a JBOD DAS file system
CPUs and CPU licenses are far more expensive than a HW RAID controller, and not only that, they are slower too when it comes to things like checksum calculations. These jobs are better off offloaded to a dedicated piece of HW IMHO.
Years ago, there used to be SOME validity to this. It's long gone now. Today's CPUs can burn through checksumming and parity calculations much faster than crappy RAID controllers can, and the load is inconsequential.
-
-
-
Wednesday 16th August 2017 17:53 GMT Daniel B.
Re: ZFS is the right choice for a server system
For a start, the very founding principle of ZFS (that many people forget) is that it was designed as, and continues to be maintained as, a JBOD DAS file system.
This is actually a feature. You simply stick disks into your system, and set up zpools with RAIDZ1/2/3 instead. You'll get exactly the same functionality offered by RAID5/6, but without the dependency on the RAID controller. Ever had a RAID controller failure? Back in 2009, I found out that fakeraid controllers do weird stuff and thus their "RAID" arrays can't be read by other controllers, only the ones from the same brand/chipset you originally used.
ZFS pools can be imported to any system and will always work.
So yes, I'd rather have ZFS on raidz2 than a RAID controller that might leave me SOL if it breaks down and I can't get the same chipset when it does.
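The reason a zpool imports anywhere is that the redundancy scheme is an on-disk fact rather than controller firmware magic. A toy single-parity sketch of the idea (roughly what RAID5/RAIDZ1 do per stripe; raidz2/3 keep two or three parity blocks - this is an illustration, not ZFS's actual layout):

```python
# XOR parity across data blocks: any one lost block can be rebuilt from
# the survivors plus parity, on any machine that knows the scheme.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data_disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_disks)

# Simulate losing disk 1 and rebuilding it from the survivors + parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

Since XOR of all data blocks plus parity is zero, the missing block is just the XOR of everything that survives - no vendor-specific metadata required.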
-
Wednesday 16th August 2017 20:42 GMT Alan Brown
Re: ZFS is the right choice for a server system
"Back in 2009, I found out that fakeraid controllers do weird stuff and thus their "RAID" arrays can't be read by other controllers, only the ones from the same brand/chipset you originally used."
More recently I found the same problem with high-end Adaptec controllers (£2k apiece at purchase in 2010). Much time and effort was spent trying to reassemble the raidset before giving up, restoring from backup tapes and dropping an HBA into the box. We found that on the originally installed hardware (E5600-based), MD-RAID6 was significantly faster than Adaptec's battery-backed-with-SSD-cache RAID6 controllers, which got so hot you could fry an egg on them - and it did that without even running over 25% on one CPU core under full load.
-
-
Wednesday 16th August 2017 20:32 GMT Alan Brown
Re: ZFS is the right choice for a server system
"If you are running an abstraction layer over your storage (such as a RAID controller, as many people do),"
Then you are not using ZFS as instructed - but that hasn't stopped a number of vendors selling expensive "ZFS servers" which have hardware RAID arrays in them and end up in various states of borkage under high load, especially coupled with the tendency to skimp on memory and cache drives.
ZFS is _NOT_ a filesystem.
ZFS is: an individual disk management system, a RAID manager with up to triple parity, a volume manager and a filesystem manager all rolled into one (and more).
Bitter experience has shown that £2k RAID array controllers and £40k FC disk arrays all have severe limitations on their performance. Our old FC disk arrays handle 1/10 the IOPS of the same spindle count running on a ZFS system, and that's down to the inability of the onboard controllers to keep up, the pitiful amount of write cache they offer, and their inability to prevent writes seeking all over the platters.
-
-
-
Wednesday 16th August 2017 10:08 GMT Bronek Kozicki
Re: ZFS is the right choice for a server system
ZFS provides strong checksums for protection against bitrot - which is one of the main reasons people use it. Neither LVM nor XFS provides such protection, hence I do not see that combination as a viable replacement for ZFS. With increasing data storage needs (but without a corresponding increase in medium reliability), protection against bitrot is only going to get more important, so the whole direction seems a bit of a non-starter to me.
-
Wednesday 16th August 2017 12:38 GMT jabuzz
Re: ZFS is the right choice for a server system
Yes it does; it is called DIF/DIX, and if you actually care about bit rot it is better than anything ZFS can ever provide. Mostly because ZFS will only tell you that there is a problem *AFTER* the event. That is, if during the write something goes wrong and the data gets corrupted, you will only find out when you try to read it back, by which time you won't be able to do anything about it, but you will at least know the data is bad.
On the other hand DIF/DIX will stop the corrupted data from being written to the storage device (disk, flash, or whatever comes along) in the first place. It will also highlight any corruption to the data while it sits on the storage device. As such it is a *BETTER* solution than ZFS.
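Real DIF/DIX adds protection information at the block layer, but the underlying idea - check the data before trusting the write - can be illustrated with ordinary tools (a toy sketch of the principle, not DIF/DIX itself):

```shell
# Toy illustration of verify-after-write: checksum the source,
# re-read what was actually stored, and compare before trusting it.
printf 'important data' > /tmp/payload
before=$(sha256sum /tmp/payload | cut -d' ' -f1)

cp /tmp/payload /tmp/stored        # stand-in for the actual write path

after=$(sha256sum /tmp/stored | cut -d' ' -f1)
if [ "$before" = "$after" ]; then
    echo "write verified"
else
    echo "CORRUPTION DETECTED"
fi
```

The hardware version does this per 512-byte block with an 8-byte protection field, so the check travels with the data through the HBA and down to the disk.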
Further, ZFS is based around RAID5/6, which frankly does not scale. Excuse me while I switch to dynamic disk pools.
-
Wednesday 16th August 2017 13:50 GMT John Riddoch
DIF/DIX
Nope, it's not better than ZFS for data protection if you have mirroring or RAID. Here's why:
While DIF/DIX will tell you at time of writing, it does sod-all after the fact, so if your data is corrupted due to any other reason, it will merely give an error (probably a SCSI read error, I'd assume). It won't even try to correct the fault.
Looking at Red Hat's note on it, there are limitations (direct IO on XFS only - see https://access.redhat.com/solutions/41548). ZFS doesn't have those restrictions. The Red Hat doc mentions it as a "new feature in the SCSI standard", so old disks won't support it. ZFS doesn't care what disks you use as long as they appear as an appropriate block/character device.
If you have ANY data corruption on ZFS, it'll detect it on read and if you have multiple data copies (mirrored, RAID-z or whatever), it'll fix it on the fly. If you only have a single copy, it'll error out and tell you which file(s) are unavailable, prompting you to recover those files.
Oracle do recommend you run a zpool scrub periodically (once a week on standard disks, once a month on enterprise-level storage) to capture errors - that will also automatically fix any checksum errors.
ZFS does have a number of flaws (performance on a full zpool is pretty awful, for example), but it is very good at data integrity.
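The periodic check mentioned above is a one-liner (pool name is a placeholder):

```shell
# Walk every block in the pool and verify it against its checksum;
# damaged blocks are repaired from redundant copies as they are found.
zpool scrub tank

# Check scrub progress and any repaired or unrecoverable errors.
zpool status -v tank
```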
-
Wednesday 16th August 2017 20:36 GMT Alan Brown
Re: DIF/DIX
"If you only have a single copy, it'll error out and tell you which file(s) are unavailable, prompting you to recover those files."
Even if you only have a single drive, the metadata is replicated in several places by default and you can tell ZFS to store multiple copies of the data too. That's available on _top_ of the RAID functionality for times when you're feeling utterly paranoid.
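The extra data copies for the single-drive paranoid case are a per-dataset property (pool/dataset names are placeholders):

```shell
# Store two copies of every data block, on top of any pool-level
# redundancy. Only affects data written after the property is set.
zfs set copies=2 tank/photos
zfs get copies tank/photos
```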
-
Wednesday 16th August 2017 14:48 GMT Bronek Kozicki
Re: ZFS is the right choice for a server system
"if during the write something goes wrong and the data gets corrupted" - that's one case I do not really care much about, because both the software stack and controllers are pretty good at avoiding these kinds of errors (as long as the hardware can be trusted - but then ECC memory is not really that expensive and no one really has to use overclocking).
The kind of bitrot I care about is storing my personal videos, pictures, ripped CDs or other data worth archiving on magnetic media, which then silently get corrupted a few years down the line. If stored on ZFS with data redundancy, then not only will the error be detected, but the original data will also be silently restored from redundant copies. With filesystems measured in terabytes (like your usual archive of DSLR RAW pictures and a small library of ripped CDs), this kind of bitrot is all but inevitable. Which is why I'm using ZFS with mirrored disks (and my offsite backups are also on ZFS, although my filer is Linux and the backup is FreeBSD).
-
Wednesday 16th August 2017 15:56 GMT jabuzz
Re: ZFS is the right choice for a server system
How do you know that the error didn't occur at write time though? You don't. So DIF/DIX will make sure the write was correct *AND* tell you down the line if it is corrupted. Sure ZFS is better than nothing, but if you really care then there are better solutions than ZFS. I guess you could get ZFS to do a verify on write, but performance is going to suffer in that scenario in a way it does not with DIF/DIX.
-
-
Wednesday 16th August 2017 20:35 GMT Alan Brown
Re: ZFS is the right choice for a server system
"That is if during the write something goes wrong and the data gets corrupted you will only get to find out when you try to read it back, by which time you won't be able to do anything about it, but you will at least know the data is bad."
Which shows you haven't bothered to familiarise yourself with how ZFS works.
"Further ZFS is based around RAID5/6, which is frankly does not scale. "
Which shows the same thing.
Are you trying to sell the competing software by any chance?
-
Tuesday 22nd August 2017 09:31 GMT PlinkerTind
Re: ZFS is the right choice for a server system
@jabuzz
This is wrong. DIF/DIX disks do not protect against data corruption sufficiently. Have you ever looked at the specs for disks with DIF/DIX? All enterprise hard disk specs say something like "1 irrecoverable error in every 10^17 bits read" - fibre channel, SAS, etc., all equipped with DIF/DIX. The moral is that those disks also encounter corruption, and when that occurs they cannot repair it. These disks are also susceptible to SILENT corruption - corruption that the disk never notices. That is the worst kind of corruption.
ZFS detects all forms of corruption and repairs them automatically if you have redundancy (mirror, raid, etc). DIF/DIX disks can not do that. Even if you have a single disk with ZFS, you can provide redundancy by using "copies=2" which makes all data duplicated all over the disk, halving disk storage.
.
"...ZFS is based around RAID5/6, which is frankly does not scale..."
This is plain wrong. A hardware raid card can only manage a few disks, so hw raid cards do not scale. OTOH, ZFS uses the disks directly, which means you can connect many SAS expanders and JBOD cards, so a single ZFS server can manage 1,000s of disks or more - you are limited only by the number of ports on the server motherboard. ZFS scales well because it can use all the JBOD cards; a single hw raid card cannot connect to all the other hw raid cards. Hw raid does not scale. ZFS scales.
In fact, the IBM Sequoia supercomputer has a Lustre system that uses a ZFS pool with 55 petabytes of data and 1 TB/sec of bandwidth - can a hardware raid card handle a single petabyte? Or sustain 1 TB/sec? The fact is, a CPU is much faster than a hw raid card, so a server with terabytes of RAM and hundreds of cores will always outclass a hw raid card - how can you say that ZFS does not scale? It uses all the resources of the entire server.
Regarding the high RAM requirements: if you google a bit, there are several people running ZFS on a Raspberry Pi with 256MB RAM. How can that be?
Also, you can change OS and servers without problems with ZFS: move the disks to another server, change OS between Solaris, Linux, FreeBSD and MacOS. You are free to choose.
ZFS is the safest filesystem out there, scales best and is the most open. Read the ZFS article on Wikipedia for research papers in which scientists compare ZFS against other solutions, such as hw raid, and conclude that ZFS is the safest system out there. CERN has released several research papers saying the same thing.
-
-
-
-
-
Wednesday 16th August 2017 07:50 GMT Adam 52
"Michael Dexter feels that even with a vigorous ZFS community hard at work, we may be approaching the point at which open source file systems are reduced to a non-useful monoculture."
For those that misread this awkward sentence in the same way I did. His concern was the monoculture and ZFS is the monoculture. The "vigorous ZFS community" has nothing to do with preventing a ZFS monoculture, so the "even with" is curious phrasing.
-
Wednesday 16th August 2017 10:41 GMT Anonymous Coward
> For those that misread this awkward sentence in the same way I did.
Ah, ta for that. I thought he was implying that for some unfathomable reason people would want a free operating system with a non-free (or disputably free) filesystem.
ZFS is great, I am sure, but a general purpose FS it certainly is not. But Red Hat's decision is pretty inexplicable. What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development? It definitely smells like a political decision.
-
Wednesday 16th August 2017 11:52 GMT TVU
"What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development? It definitely smells like a political decision."
I get the impression that they still don't quite trust it technically, it being the less mature file system. Plus - and this is the speculative bit - they don't and can't control the development and direction of Btrfs, hence the move to using and developing XFS in house.
-
Wednesday 16th August 2017 14:10 GMT Gordan
"What harm can there be in including BTRFS in their mix of supported file systems, especially at this stage in its development? It definitely smells like a political decision."
There's no harm, but it is a huge amount of labour-intensive work to backport patches into the RH distro kernel. RH don't follow LT kernel releases; they take a snapshot of a kernel at a .0 point release, and after that everything they merge is cherry-picked. This is extremely labour intensive and error prone, and a mispatch can go unnoticed if by some miracle it doesn't blow up spectacularly at build time due to the very specific kernel config they use. (Example: https://bugzilla.redhat.com/show_bug.cgi?id=773107 )
In fairness, RH aren't to be singled out for not taking advantage of the, IME, more stable mainline LT kernel trees; most distros seem to engage in this pointless and laborious rejection of upstream kernels for "not invented here" reasons.
-
Wednesday 16th August 2017 20:47 GMT Alan Brown
"RH don't follow LT kernel releases, them take a snapshot of a kernel with a .0 point release, and after that, everything they merge is cherry picked. This is extremely labour intensive and error prone,"
They don't just do that with the kernel.
EVERY part of RHEL is full of hand-merged backports without bothering to change the major version numbers. Just because something _SAYS_ it's foo version 2.5.5-35.el7_x86_64 doesn't mean that it's not got parts of (or all of) upstream foo version 4.5 merged into it.
You make changes to a Redhat system at your peril. Beware, here be dragons. Nothing is what it seems.
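One way to see how far a RHEL package has drifted from its nominal version is to read the package changelog, which records the cherry-picked backports (package name is just an example):

```shell
# The version string in the package name stays frozen, but the
# changelog reveals the stream of backported fixes and features.
rpm -q --changelog openssl | head -40

# Compare against the nominal version the package claims to be.
rpm -q openssl
```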
-
-
-
-
Wednesday 16th August 2017 09:01 GMT iTheHuman
So, has anyone informed the Facebook and Oracle devs that the fs they've been working on is finished?
You know, it's not as though RH has ever been a vigorous backer of btrfs, since most of their fs folks are on team XFS - not to mention clustered filesystems like Ceph and Gluster.
Btw, some RH dev announced a new project that seems to be aiming for a more UNIX-y ZFS (that is, without the layer violations, but with many of the same features). It actually looks kind of interesting, with most of the neat stuff happening in the fs daemon.
-
-
-
-
Thursday 17th August 2017 06:46 GMT Bronek Kozicki
I have had unpleasant experiences with D-Bus failing and I'd rather keep it away from the filesystems I use: it was impossible to troubleshoot properly and even a clean system shutdown was difficult (I have enabled SysRq on my system since then). Also, D-Bus is a higher abstraction than a filesystem, so making it a critical dependency in the management of a filesystem turns the system's dependencies upside down, making it harder to recover when things go wrong. I think this is a very rational evaluation.
-
Wednesday 30th August 2017 05:30 GMT iTheHuman
I'm not doubting that you've had problems involving dbus, but I can't say that those were problems caused by dbus (indication of a deeper issue). That you couldn't determine the actual issue provides additional support for my assertion.
Dbus is just IPC, and Stratis is going to make heavy use of IPC, with the daemon holding additional state so that better global decisions can be made than would otherwise be possible. (This is exactly how they plan to take advantage of these well-defined, existing services while enjoying many of the features that monolithic filesystems like ZFS/btrfs have, without needing to poke holes through the vfs/block boundary.)
Something else to keep in mind, userspace is far more forgiving of errors than a kernel.
-
-
-
-
-
-
-
Wednesday 16th August 2017 18:06 GMT Daniel B.
Re: I <3 btrfs
Light years ahead of anything Windows can do.
Everything is light years ahead of anything Windows, period.
As for snapshots, that's available on ZFS too, mostly because btrfs was originally born as an Open Source equivalent to ZFS, mostly sponsored by Oracle. But then Oracle bought Sun and they got access to ZFS, so btrfs was "no longer important". :(
I did try btrfs at some point, but it just didn't work well, so I had to move to ZFS. The latter is supported on pretty much every single OS except Windows (again, everyone's light years ahead of Redmond's OS) so it also serves as a multiplatform FS.
-
-
Wednesday 16th August 2017 11:57 GMT Phil Bennett
People are still using btrfs?
After the RAID5/6 issue which still isn't fixed a *year* later(!), people are still trusting their data to btrfs?
For small storage, there are loads of options (and boot from ZFSoL is still new enough to be a concern, if not a blocker)
For huge storage, you probably aren't using ZFS - you're looking at cluster filesystems (gluster, ceph, hdfs etc)
For medium-scale storage, ZFS is hard to beat. Work out a way to make ZFS on Linux licence-compliant (even if that means reverse engineering it) and move on.
-
Wednesday 16th August 2017 13:42 GMT DougMac
Re: People are still using btrfs?
> After the RAID5/6 issue which still isn't fixed a *year* later(!), people are still trusting their data to btrfs?
Umm, the RAID5 issue hasn't been fixed correctly *since the beginning of the project*.
The devs have known of conditions which will corrupt RAID5 since the start, and while there was a promising bug fix a while ago, they then found it only fixed one of the bugs, but others are known.
The people doing btrfs have known about these issues for some time, and they never get properly fixed.
Most likely, that is why RH is dropping support for it.
-
-
-
Thursday 17th August 2017 23:02 GMT Anonymous Coward
Re: People are still using btrfs?
Bluestore is going mainstream with the new version of SUSE Enterprise Storage 5 coming out next month (Ceph based for those unaware).
I’ve seen some early pre-beta performance data (disclaimer: I work at Suse). Beats anything we were able to get out of gluster on the same kit hands down!
-
-
-
-
-
-
Wednesday 16th August 2017 22:30 GMT John H Woods
Re: Anyone else just use ext4?
Ext4 locally, ZFS on my fileserver.
My fileserver snapshots my few TB of RAIDZ3 every minute. If I've set it up right, there's no remote admin login, so you need physical access to delete snapshots.
I cryptolockered the lot from a throwaway VM attached via NFS and it was possible to rapidly recover every single file from snapshots... I didn't even need to restore anything from backup.
ZFS is marvellous... Let's just get the licence issue resolved...
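A minimal sketch of that setup, assuming a pool called tank (names, dataset and schedule are illustrative):

```shell
# Cron entry: snapshot the dataset every minute, tagged with a timestamp.
# * * * * * /sbin/zfs snapshot tank/data@auto-$(date +\%Y\%m\%d-\%H\%M)

# After the ransomware hits: list snapshots and roll back to a clean one...
zfs list -t snapshot -o name,creation tank/data
zfs rollback tank/data@auto-20170816-1200

# ...or pull individual files out of the read-only snapshot directory.
cp /tank/data/.zfs/snapshot/auto-20170816-1200/important.doc /tank/data/
```

Snapshots are read-only and copy-on-write, so a minute-by-minute schedule costs almost nothing until data actually changes; pruning old snapshots is the only ongoing chore.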
-
-
Wednesday 16th August 2017 19:32 GMT P.B. Lecavalier
JFS!
My attitude toward BTRFS has long been: "This is very promising. We can use it now? It doesn't seem like it's a simple drop-in replacement for ext4... I'll wait for others to try this Kool-Aid." Seems like it does not taste so good after all.
In the meantime, I'm happy with JFS on my system. And I'll take the odd-ball label.
-
Thursday 17th August 2017 11:06 GMT simpfeld
RH looking at a different solution "Stratis"
Asking about this elsewhere, it looks like Red Hat are doing work on the "Stratis Storage Project".
This seems to be a management system that will let you emulate pretty much all the features of a next-generation file system using existing layers (LVM, MD, XFS), while adding things like block-level checksumming to MD/LVM to give the equivalent of individual file checksums. The argument seems to be that building all of this into a single layer, as BTRFS and ZFS do, is too hard; the layering makes the programming/debugging more manageable. I guess the key would be communication between the layers: a bad block checksum tells XFS the file is corrupt, etc.
Details here:
https://fedoraproject.org/wiki/Changes/StratisStorage
https://stratis-storage.github.io/StratisSoftwareDesign.pdf
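For flavour, the design documents point towards a CLI roughly along these lines (Stratis was pre-release at the time, so treat the exact commands and paths as provisional):

```shell
# Create a pool from raw block devices, then thin-provisioned
# filesystems on top of it (XFS under the hood, managed by the daemon).
stratis pool create mypool /dev/sdb /dev/sdc
stratis filesystem create mypool data
mount /dev/stratis/mypool/data /mnt/data
```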
They also seem to have some interest in "bcachefs". A bcachefs developer says there are fundamental issues with the BTRFS design:
"btrfs, which was supposed to be Linux's next generation COW filesystem - Linux's answer to zfs. Unfortunately, too much code was written too quickly without focusing on getting the core design correct first, and now it has too many design mistakes baked into the on disk format and an enormous, messy codebase "
https://www.patreon.com/bcachefs
I'm not sure of the truth of this; I don't know enough about it.