I think it's 'quartarius'
IDC has just revealed one of the worst storage quarters for a couple of years as buyers went on a high-end storage strike. There's been a quite spectacular slump in the storage business in the first 2014 quarter, according to the beancounters' Storage Tracker. Notwithstanding the digital-universe-drowning-us-in-Big-Data …
It's not that the (Big) volumes of storage are decreasing, but rather the eclipse of high end (and very expensive) storage controllers, brought about by a) increasing amounts of storage attached directly to servers (e.g. in HDFS clusters) and b) various SAN virtualisation (Software defined Everything) technologies.
The last time "other" led the pack with this much growth was before the 2008-9 downturn. At that point, the vendors who did the most with cheap disks in arrays (Data Domain, Isilon, 3Par, Equallogic, etc.) got bought, and "other" crashed, leading to a solid growth wave for the acquirers. Seems like this could also be the beginning of a repeat, but this time it's the vendors who figured out better early recipes for flash.
Um, doesn't anyone realize that both HDS and EMC are at the trailing stages of their higher end product cycles? HDS has announced G1000 publicly, but big customers were told earlier under NDA and have mostly chosen to wait for it instead of buying the older VSP. It's pretty obvious that a corresponding new VMAX is imminent, with the same purchase delays by EMC's big customers, who've also been notified early.
"Put another way; are we seeing the cloud and flash put a temporary or permanent kink in the disk array business?"
No, you're seeing HDFS propagate through the enterprise, and those enterprises have suddenly realised that disk arrays are snake oil; a solution looking for a problem, laden with drawbacks for all use cases. We've won five contracts this year alone, including a couple of major public sector jobbies, and I can tell you none of the hardware purchases have gone to HP or EMC!
ODM sales figures have gone through the roof, as have Cloudera's, Hortonworks's etc.
Flash will continue to have its niche, as will "cloud", but the real disruption here is Hadoop. That shouldn't surprise anyone in the storage business. Its storage costs are on the order of 1/10-1/100 that of a "high end" disk array, but its MPP SQL engine (i.e. Impala/Parquet) outperforms Oracle/Teradata et al. Plus you get MapReduce, Solr, Spark and HBase all for "free" on common commodity hardware with complete linear scalability.
Arrays are, rightfully, falling back to the niche in which they belong.
I have read a bit about Hadoop, and I get how cool it sounds and that you can use your own systems and storage and not pay a lot for a bunch of proprietary hardware, but I still don't see how it allows a company to stop using traditional storage arrays, especially if they are using them now.
My company has SQL and Oracle clusters on physical servers, and scores of VMware hosts that use mostly Fibre Channel storage.
We also have loads of enormous CIFS and NFS shares that hold unstructured data for our applications.
All of this is accomplished using one type of storage device (NetApp), and it works very well. It isn't inexpensive, but it's also not too bad for what it does for us.
Again, Hadoop sounds great, but so far I have only found a project to run NFS gateways to it, and that's about it for access. Can you enlighten me on how we can get rid of NetApp with Hadoop without rewriting all the applications to use some new API?
And btw, I and my firm would give my left arm to get rid of Oracle RAC.
It helps if we break it down into a few areas.
The one area Hadoop can't really displace FOGB storage arrays is in driving big virtualisation farms*. But then, at the end of the day, most companies don't have big virtualisation farms.
*(this may well change soon as secondary projects mature and Docker-On-Yarn becomes a one-stop-solution for all distributed computing)
Storage purchases are driven largely by the need to back big databases. Oracle RAC being a prime example. Those bulk storage requirements in databases tend to be driven by "warm" storage - warehousing. We're not usually talking realtime read/write, or supporting massive numbers of users. We're talking write once, change almost never, read often.
In that use case, Hadoop excels - and that should be no surprise as that is exactly what it was designed to do. I've personally been involved in 3 Oracle-to-Hadoop migrations (there was an article here recently about AMD doing exactly that), and am currently involved in a project that was going to be a £30m+ Teradata warehouse consolidation, but is now instead a Hadoop project at a fraction of the cost.
For these cases the development overhead is almost zero. You go to Cloudera (or Hortonworks if you're that way inclined), you pay their trifling license fee, maybe their extortionate day-rates for deployment support, send off to China for your £2k-a-pop boxes and you're pretty much done. After that, in this age of Impala and Parquet, it just comes down to churning out SQL and Sqooping all your tables across. We pay 19-year-olds to do it. And it still manages to be faster for something like 19 out of 20 queries.
If performance starts grinding? No worries, just throw more boxes at the problem. No going off to Oracle/Teradata/Whoever to get tuning consultancy or more licenses. It's cheap. Really cheap. That's the main advantage.
Yes, that BI use case is pretty narrow, but it is common as muck and a *major* driver of storage purchases. Another major use case, bulk file storage, is more than doable - HDFS now has an HTTP API, and can be mounted as transparently as any other fileshare. Plus you get all the other perks (MapReduce, HBase, Solr etc.) free out of the box.
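For anyone wondering what the HTTP API looks like in practice, here is a minimal sketch of building WebHDFS REST URLs. The hostname and file paths are made up for illustration, the helper `webhdfs_url` is hypothetical, and port 50070 is assumed as the NameNode HTTP port (the Hadoop 2.x default; your cluster may differ). It only constructs the URLs and makes no network calls.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, **params):
    """Build a WebHDFS URL: http://host:port/webhdfs/v1/<path>?op=OP[&...].

    `port` defaults to 50070, an assumption based on Hadoop 2.x defaults.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # The operation and any extra parameters go in the query string.
    query = urlencode({"op": op, **params})
    # quote() percent-encodes spaces etc. but leaves "/" intact.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode host and paths, for illustration only.
list_url = webhdfs_url("namenode.example.com", "/warehouse/sales", "LISTSTATUS")
open_url = webhdfs_url(
    "namenode.example.com", "/warehouse/sales/2014 q1.csv", "OPEN", offset=0
)

print(list_url)
print(open_url)
```

A plain HTTP GET against the first URL would list a directory, and against the second would read a file, so existing tools that speak HTTP can get at the data without a Hadoop client library. Whether that is enough to replace a CIFS/NFS share depends on what your applications expect.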
Stray further from these write-once cases and you're absolutely right that development becomes more difficult (HBase isn't exactly easy), but what we've found is that once organisations are bought into Hadoop and have realised just how cheap-yet-bloody-effective it is, they're quite willing to start pulling in other components to build it out.
Could it be as a result of a lack of confidence in the security of such solutions?
Usually, the value of the data is many times that of the cost of storage, even if you have to do it yourself. Could it be that data owners think the extra cost of storing it themselves is a worthwhile investment?