Failure to launch: Hadoop will be 'anaemic' for years, says Gartner

Forget the hype and the vendors chinning each other – Hadoop is still not ready for take-off. That’s according to Gartner, whose latest short story in the Hadoop-is-not-ready saga has predicted “anaemic” adoption of Hadoop for at least the next two years. The biggest problem is that people allegedly still can’t use Hadoop, …

  1. Anonymous Coward

    Catch 22?

    The fundamental problem for HortonWorks et al. is that they need to make Hadoop easier to consume in order to drive adoption... but their business model is based on providing services because Hadoop is hard to consume.

  2. Anonymous Coward

    The key is finding business reasons to deploy Hadoop

    What information does a company have? What is the value of crunching that data in certain ways? How does that value compare to the hardware, software, services and personnel expense of a Hadoop deployment? What other analytical packages might be "good enough" analytically, and less onerous from an expense perspective?

    Doesn't sound like Hadoop is making enough progress in positioning itself against lesser analytical solutions, or in communicating customer success stories in an intelligible "wow, we could do that!" manner.

  3. Anonymous Coward

    Java too inefficient, unstable, and information design is wrong

    The problem is that it shouldn't require rocket scientists to operate a damn database - when you have a correctly engineered DB it should just run. Witness SQLite: that thing can handle gigabytes of relational data, and it requires zero administration once the structures are defined. Do you see Apple deploying DBAs to go along with every iPhone?
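
    For what it's worth, a minimal sketch of the zero-admin workflow being described, in Java (assuming the xerial sqlite-jdbc driver, or any SQLite JDBC driver, is on the classpath; the table and data are invented):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        public class SqliteDemo {
            public static void main(String[] args) throws Exception {
                // A single-file database: no server process, no daemon, no DBA.
                try (Connection conn = DriverManager.getConnection("jdbc:sqlite:contacts.db");
                     Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, phone TEXT)");
                    st.execute("INSERT INTO contacts VALUES ('Alice', '555-0100')");
                    try (ResultSet rs = st.executeQuery("SELECT name, phone FROM contacts")) {
                        while (rs.next()) {
                            System.out.println(rs.getString("name") + " " + rs.getString("phone"));
                        }
                    }
                }
            }
        }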

    Java deployments can achieve a modicum of stability with enough people to babysit them and machines to run them on, but it's an inherently unstable technology overall. This is why the only companies that run these things are companies with loads of cash to keep throwing into the furnace; no company that has to justify its infrastructure expenses against the value-add of the system can possibly run something like this year after year.

    The other issue is the database world is hopelessly out-of-whack when it comes to information design - key-value databases are indeed the way to go, but everyone is trying to force relational schemas onto them and this is retarded - you are taking a highly structured problem domain, then breaking it down into relational concepts like columns and rows, then forcing that mess into a key-value DB, then trying to re-constitute the structure you just threw away by doing very expensive map-reduce operations to create column views.

    Completely retarded.
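
    To make the design point concrete, a toy sketch in Java (keys and values invented purely for illustration): scattering relational-style cells across a key-value store means scanning and regrouping later, whereas designing the key around the access pattern returns the whole record in one lookup.

        import java.util.Map;
        import java.util.TreeMap;

        public class KeyDesignSketch {
            public static void main(String[] args) {
                // The pattern being complained about: relational-style "cells" as keys,
                // which need an expensive scan/regroup later to rebuild one order.
                Map<String, String> cells = new TreeMap<>();
                cells.put("order:42:customer", "alice");
                cells.put("order:42:item:1", "widget");
                cells.put("order:42:item:2", "sprocket");

                StringBuilder rebuilt = new StringBuilder();
                cells.forEach((k, v) -> {
                    if (k.startsWith("order:42")) rebuilt.append(v).append(' ');
                });
                System.out.println("rebuilt by scanning: " + rebuilt);

                // Access-pattern-first design: one key, one lookup, whole record.
                Map<String, String> records = new TreeMap<>();
                records.put("order:42", "{customer: alice, items: [widget, sprocket]}");
                System.out.println("single lookup: " + records.get("order:42"));
            }
        }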

    1. Anonymous Coward

      Re: Java too inefficient, unstable, and information design is wrong

      OMG, Gigabytes, wow.

      You do realise that Hadoop is designed for petabytes - six orders of magnitude greater. When you essentially stuff everything into a single disk file that is rarely larger than available RAM, accessed by a single process on a single machine, then everything is pretty simple, I agree.

      I think the world of Hadoop, where clusters of 4,000+ servers hold > 10PB of data accessed by hundreds of users running complex analyses, is a little different from looking up an address in an iPhone address book.

      I can only surmise that 'Completely retarded' is your signature, not your final comment.

    2. Andy 73 Silver badge

      Re: Java too inefficient, unstable, and information design is wrong

      Oh dear.

      Java's actually very good for this sort of stack - portable, robust and flexible enough to embrace the steady shift of development frameworks as we better understand the problems we're trying to resolve. It's far from inefficient and the tooling around it is excellent. If you're still comparing your iPhone sized database with the sort of thing we're doing with Hadoop, you need to re-read the part of the manual that explains how not all databases are the same. The system I'm currently working on consumes about a terabyte of data a day and retains that indefinitely. Not only that, but it delivers value - we're talking eight figure sums annually here for a single use case, but that is a long, long way from the sort of thing you'd achieve with SQLite.

      The real problem is wrangling a cluster of a few hundred machines and providing production-level SLAs on processes running on that cluster. The tooling is catching up, but we're working from the ground up on technology that is still desperately immature. There are a LOT of moving parts in a typical deployment, and whilst Hortonworks, Cloudera and the others are getting better at bringing a working system up, there is much to be done to ensure BAU services really are business as usual, not a series of experiments. It doesn't help that there are dozens of different frameworks and approaches to implementing a given solution - nearly everyone I talk to has found a new combination of tools to use, and that's preventing companies from focussing their efforts on making one particular toolset feature-complete and robust.

    3. Anonymous Coward

      Re: Java too inefficient, unstable, and information design is wrong

      If you think Hadoop is a database then you've fundamentally misunderstood what Hadoop is.

  4. SecretSonOfHG

    Wrong on all counts

    In the order you present your points:

    Rocket scientists: call me back when you've managed to scale your SQLite database so that you can run a query piecemeal on an arbitrary number of machines with different CPU architectures, word sizes and instruction sets. It's not Hadoop by itself that is complex, it's the whole parallel programming paradigm that is difficult to grasp. There are other languages than Java for writing parallel programs, but the inherent complexity is just shifted from learning the parallel paradigm in a general-purpose language to learning the special-purpose language and living with its limitations.
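
    As a toy, single-machine analogy of "running a query piecemeal" (threads standing in for cluster nodes; nothing here is Hadoop-specific), the splitting and merging is the easy part to write - reasoning about partial results, skew and failures is where the difficulty actually lives:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.concurrent.Callable;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;

        public class PiecemealSum {
            public static void main(String[] args) throws Exception {
                long[] data = new long[10_000_000];
                for (int i = 0; i < data.length; i++) data[i] = i % 97;

                int workers = 4; // stand-ins for machines in a cluster
                ExecutorService pool = Executors.newFixedThreadPool(workers);
                List<Future<Long>> partials = new ArrayList<>();
                int chunk = data.length / workers;
                for (int w = 0; w < workers; w++) {
                    final int from = w * chunk;
                    final int to = (w == workers - 1) ? data.length : from + chunk;
                    partials.add(pool.submit((Callable<Long>) () -> {
                        long s = 0;
                        for (int i = from; i < to; i++) s += data[i]; // each worker sums its slice
                        return s;                                     // a partial result
                    }));
                }
                long total = 0;
                for (Future<Long> f : partials) total += f.get();     // the "reduce" step
                pool.shutdown();
                System.out.println("total = " + total);
            }
        }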

    From an efficiency standpoint, Java is on par with, if not better than, other VM architectures. The only thing more efficient than a JVM would be compiling C/C++ directly to native code, which would mean programs that are much harder to write if you want to account for all the architectural differences in a heterogeneous environment. From a complexity standpoint, launching an OS instance with a Java environment installed is no harder than launching one without. The same applies to C#, by the way.

    Information design: perhaps you think that there is one and only one true way of storing data. Key-value storage is a broad definition that also applies to RDBMS engines. The reality is that for each usage context there is a storage layout that is more efficient than the others. As a corollary, you have to translate between them when you want to use the data for a different purpose. This is unavoidable: you're not going to find a storage schema that is equally well suited to storing transactions with ACID-like properties and to digesting huge amounts of data and performing transformations on it. Well, perhaps you've found it, in which case I'd suggest you stop posting comments and start a business around it. You'll conquer the world.

    1. emkay2015

      Re: Wrong on all counts

      > It's not Hadoop by itself that is complex, is the whole parallel programming paradigm that is difficult to grasp.

      The original Hadoop (1.0) was supposed to be used only for MapReduce programming, not for general-purpose parallel programming, though the latter was possible using some trickery. MapReduce is actually a very simple, SIMD-like paradigm and easy to grasp.
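
      For anyone who hasn't seen it, the canonical word-count mapper and reducer give a feel for how small the paradigm really is. This is only a sketch against the Hadoop MapReduce Java API; the job/driver setup boilerplate is omitted.

          import java.io.IOException;

          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;
          import org.apache.hadoop.mapreduce.Reducer;

          public class WordCount {
              // Map: for each input line, emit a (word, 1) pair per token.
              public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
                  private static final IntWritable ONE = new IntWritable(1);
                  private final Text word = new Text();

                  @Override
                  protected void map(LongWritable key, Text value, Context ctx)
                          throws IOException, InterruptedException {
                      for (String token : value.toString().split("\\s+")) {
                          if (!token.isEmpty()) {
                              word.set(token);
                              ctx.write(word, ONE);
                          }
                      }
                  }
              }

              // Reduce: all counts for one word arrive together; sum them.
              public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
                  @Override
                  protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                          throws IOException, InterruptedException {
                      int sum = 0;
                      for (IntWritable v : values) sum += v.get();
                      ctx.write(key, new IntWritable(sum));
                  }
              }
          }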

      When I became a Hadoop engineer, I was coming from the world of TCP/IP kernel stack programming and device drivers. I knew next to nothing about databases or about the world of data and information design. But I was able to fully grasp the MapReduce paradigm and the design of Hadoop in about 2 weeks.

      However, the next-generation Hadoop resource layer, YARN, does allow more generic parallel programs beyond MapReduce.

  5. Anonymous Coward

    Can we leave the religion out of it?

    The Hadoop environment just feels so unpolished, even using one of the "supported" (cough) distributions. Getting even simple queries to work right felt like building some kind of Rube Goldberg/Heath Robinson machine. No wonder there's a lack of enthusiasm from anyone other than the cool chasers.

    1. emkay2015

      Re: Can we leave the religion out of it?

      > Getting even simple queries to work right felt like building some kind of Rube Goldberg/Heath Robinson machine.

      Actually Hadoop is not for queries at all. It is not a database technology. You should not use it for searching and querying and maintaining databases.

      Hadoop is only for big data analytics and batch processing of the kind where every row of data is processed, and with the same algorithm.

  6. Alistair

    I'm a rocket scientist apparently

    It certainly isn't pretty -- but our Elephants are up and running. Quite happily trundling along sucking up whatever we throw at them.

    I've got no less than 12 business units in queue to jump in to one or more of these and start participating in the data sloshing.

    I'm keeping the bulk of them on hold for now - the single biggest roadblock for me right now is getting the security folks to agree to a framework, and to agree to the access controls I need to finish building.

    Second hardest component is getting network engineering to comprehend that we actually can consume better than 60% of a 10Gb/s link, and yes, we do need that 10Gb access layer component for the top-of-rack switches to keep from slaughtering our out-of-date 1Gb/s server infrastructure network.

    That and teaching folks that, no, you don't get to see *everything* that is in the pool, there is stuff no human being should be able to connect in here.

  7. Anonymous Coward

    Spark should fix this

    Hadoop's Achilles heel has always been MapReduce - not very fast and too low-level. Apache Spark is already driving much better adoption as it provides a unified core, so there are far fewer moving parts.
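
    As a rough illustration of the "unified core" point, a word count in Spark's Java API (a sketch only, written against the Spark 2.x Java API; the HDFS paths are invented and the cluster settings come from spark-submit):

        import java.util.Arrays;

        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaPairRDD;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import scala.Tuple2;

        public class SparkWordCount {
            public static void main(String[] args) {
                SparkConf conf = new SparkConf().setAppName("wordcount");
                try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                    JavaRDD<String> lines = sc.textFile("hdfs:///data/logs/*");     // hypothetical input path
                    JavaPairRDD<String, Integer> counts = lines
                            .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                            .mapToPair(word -> new Tuple2<>(word, 1))
                            .reduceByKey(Integer::sum);
                    counts.saveAsTextFile("hdfs:///out/wordcounts");                 // hypothetical output path
                }
            }
        }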

  8. JayTee253

    "rocket scientists, not the masses" Really?!?

    "rocket scientists, not the masses" Really?!? - Not sure I agree here.

    1. Getting data in - if nothing else you can upload CSV files.

    2. If you can't program - use Hive (or similar) to submit SQL queries (a rough sketch of this follows after this list).

    3. If you're a little bit braver than SQL, have a go at writing a PIG script.

    4. If the above seems too easy, pick your favourite language - Python, Java, Scala - and go wild!
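
    To flesh out point 2, a sketch of submitting SQL to Hive over JDBC in Java (the HiveServer2 host, credentials and table are invented; assumes a running HiveServer2 and the Hive JDBC driver on the classpath):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        public class HiveQueryDemo {
            public static void main(String[] args) throws Exception {
                // Hypothetical HiveServer2 endpoint and table name.
                String url = "jdbc:hive2://hive-gateway.example.com:10000/default";
                try (Connection conn = DriverManager.getConnection(url, "analyst", "");
                     Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                             "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page ORDER BY hits DESC LIMIT 10")) {
                    while (rs.next()) {
                        System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
                    }
                }
            }
        }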

  9. sw41

    The Cluster Half-full View

    Log crunching is definitely the sweet spot, but more workloads on Hadoop are moving to real-time as well thanks to new compute engines like Spark and advances in Apache HBase. Here's a (mostly) quantitative view from a 50+ cross-section of Hadoop adopters on the gains they see:

    https://www.mapr.com/blog/hadoop-adoption-is-the-cluster-half-full

  10. Anonymous Coward

    Another (solid?) remark by Gartner

    Yet another "Linux is the hype du jour"-style comment from Gartner. Do they actually live on planet Earth with the rest of us?

    I have not seen market penetration in the past few years as considerable as Hadoop's - not since (the aforementioned) Linux. Maybe getting this sort of comment from Gartner is actually a sign of good health, as it has proven to be before.
