Who knew? Hadoop is over, says former Hortonworks guru Scott Gnau

Tech history comes in waves. One minute you’re riding a crest, the next you’re wiped out. Just ask Scott Gnau, former CTO of Hortonworks, the company once seen as the figurehead of the big data boom. Now head of data platforms at InterSystems, Gnau has had time to reflect on the Hadoop legacy and where those with torrential …

  1. Anonymous Coward
    Anonymous Coward

    Apache Hadoop isn't dead yet... but dying...

    Apache Hadoop has some serious flaws that were never corrected.

    HDFS should have followed MapR many moons ago but didn't. When MapR came out with a better mousetrap, Cloudera expected and demanded that MapR open up MapR-FS as a give-back to the community. (Fat chance of that happening.) Rather than update HDFS to compete, they let it go. The key here is a lack of innovation on the part of Cloudera, and also Hortonworks when it came out. Hortonworks' go-to-market strategy was to be the 'keeper of Apache', claiming to have a fork closer to native Apache. (There were also other flaws in their business model.)

    What also killed Apache Hadoop was YARN. Flawed from the beginning, it was never really fixed.

    As technology evolved, Hadoop didn't.

    (BTW Gnau came to the party late....)

    Today, HPE has the technical advantage by combining two 'best of breed' technologies (Blue Data, MapR). So do you still call it Hadoop?

    Cloudera is dying... while they tout CDP and are raising prices to become profitable... take a look at the job postings from their major clients.

    All indications are that customers are moving off CDP and onto the cloud.

    And while everyone is ignoring the enterprise cloud, Cloudera isn't the best choice. HDFS is still the bottleneck.

    Posted Anon for all the obvious reasons and even from what I said, there are a lot of people who can guess who I am. ;-)

    1. This post has been deleted by its author

    2. Anonymous Coward
      Anonymous Coward

      Re: Apache Hadoop isn't dead yet... but dying...

      HAHAHAHAHAHAHA. Are you an HP shill? Blue Data has always been a joke and MapR is smoldering wreckage.

      Name one technology where HP has been a leader in the last 10 years. Name one time they have successfully bought and borg'd something over the same period.

      Vertically integrated data stacks, with all the open-source cruft hidden behind the scenes and with completely segregated storage and compute, are where it's at. Which is why Snowflake has a $33bn market cap compared to HP Inc's $11bn. (Okay, it's a massive VC bubble, but you get the point.)

      1. Anonymous Coward
        Anonymous Coward

        Re: Apache Hadoop isn't dead yet... but dying...



        I just happen to have over 10 years in the Big Data space.

        I'm certified on both MapR and Apache Hadoop.

        I doubt you can say the same.

        I post anon for the obvious reasons.

        You segue to Snowflake, a company whose price is 77x forward earnings.

        It's a joke. If IBM and HCL got their act together, they could reintroduce Informix XPS with a major upgrade to use object storage underneath, letting it fit easily into the cloud.

        You simply don't know or understand the technologies as well as you seem to think you do.

  2. Steve Channell

    Hadoop prototype

    Map/Reduce in general, and Hadoop in particular, was a good start in data-centric computing for those who could not budget for Teradata-scale kit - somewhat unfair, but 30 years behind the leading edge.

    There'll always be a place for cheap/simple tools like Hadoop in the Data Grid; the problem is that better tools are now cheaper, and distributed file systems are moving into hardware.
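
    For readers who never touched it, the Map/Reduce model under discussion is easy to sketch in-process. This is a hedged, toy illustration in Python (all function names invented here), not how Hadoop actually distributes the phases across a cluster:

```python
from collections import defaultdict

# Map emits (key, value) pairs from each input record.
def map_phase(records):
    for line in records:
        for word in line.split():
            yield word, 1

# Shuffle groups all values by key (done over the network in real Hadoop).
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce folds each group down to a single result per key.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big iron"])))
print(counts)  # {'big': 2, 'data': 1, 'iron': 1}
```

    Hadoop's contribution was never this logic but the machinery around it: sharding the input, scheduling map and reduce tasks across machines, and surviving their failures.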

    1. Anonymous Coward

      @Steve Re: Hadoop prototype


      Map/Reduce (MR1) was a poor man's HPC.

      MR2 was an attempt at containerization which was never done right and never fixed.

      Hadoop is going to go away.

      You only have to watch the license revenue and PS revenue dry up as their customers move towards the cloud and don't want to run CDP there.

      Cloudera raised their prices and that gave them a revenue boost. However, for the same price... there are better alternatives.

  3. Yet Another Anonymous coward Silver badge

    This is why

    I stick to Fortran

    1. Anonymous Coward
      Thumb Down

      Re: This is why

      Fortran is a programming language. Not a framework.

      1. Anonymous Coward

        Re: This is why

        It's one of the tragedies of computing that, 60 years after it became obvious that programming languages and frameworks are the same thing (or, really, that frameworks are just shitty programming languages), people still don't know it.

        1. Anonymous Coward
          Anonymous Coward

          @tfb Re: This is why


          I suggest you go back to school and do some actual language theory course work.

          Frameworks are written in a language but they are themselves not a language.

          An OS is written in C. Does that make the OS a language or a Framework? Or C a Framework?

          I think not.

          1. Anonymous Coward
            Anonymous Coward

            Re: @tfb This is why

            "... actual language theory course work. Frameworks are written in a language but they are themselves not a language. An OS is written in C. Does that make the OS a language or a Framework? Or C a Framework? I think not."

            "Frameworks"*** are usually written as an API, and anything with an interface must have a language. Your comparisons aren't great, as the answer is unequivocally C == OS == "Framework": all three are symbiotic and, ironically, in this exact case written and interfacing in the exact same programming language (hopefully this was intentional humor?). ANY API must be written to speak to its caller in a language, or else it is garbage; try printing a binary, or using mixed bauds (although, eh, that's still a language, just bad serial).

            *** A "Framework" is marketing speak for a library, or for an overly complex, out-of-hand software "widget" (e.g. X.500 or ACE). That said, the word "Framework" looks good before version numbering: "Super Duper v2.0" .... "Super Duper Framework v2.0"

          2. Anonymous Coward

            Re: @tfb This is why

            Well, you see, I did go to 'school' and do some language theory. And it's entirely because I did that, and because of the languages I was and am interested in, that I understand what you don't seem to: the division between a language and a library or framework is something which, if you pick adequately expressive and extensible languages, simply goes away. My 'we've known for 60 years' was a hint you obviously don't know enough history to get.

            But never mind, those battles are long lost. As someone once wrote:

            I’ve spent my life trying to build elegant tools to solve hard problems. Now I am old and tired and somehow I find myself in a world of mud where the chosen weapon is a club with nails hammered into it, used by swinging it wildly about, spattering the ground with fragments of skull and brain of friend and enemy alike. The nails, formerly rusted iron, are now stainless steel scavenged from a vast, broken needle made of strange metals: no-one now remembers it was once a spacecraft.

            Have fun with your club.
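
            The language-vs-framework point can be made concrete with a toy sketch (Python, all names invented here): a "framework" of registered handlers is, squinting only slightly, an interpreter for a small language whose programs are route tables.

```python
# A toy "framework": register handlers against names, then dispatch.
# Viewed squarely, Router is an interpreter and the route table is
# a program in a tiny embedded language.
class Router:
    def __init__(self):
        self.routes = {}

    def route(self, path):
        def register(fn):
            self.routes[path] = fn
            return fn
        return register

    def dispatch(self, path, *args):
        return self.routes[path](*args)

app = Router()

@app.route("/greet")
def greet(name):
    return f"hello, {name}"

print(app.dispatch("/greet", "world"))  # hello, world
```

            Whether you call `Router` a framework or an embedded language is exactly the distinction the comment claims dissolves in sufficiently expressive languages.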

            1. CheesyTheClown

              Re: @tfb This is why

              I've been saying this for some time about COBOL. (Oh and I work with FORTRAN in HPC quite often)

              People make a big deal about COBOL programmers being in short supply and it being an antiquated language. Honestly, though, what really confuses programmers is that you don't write programs in COBOL; it's more of a FaaS (serverless, function-as-a-service) platform. You write procedures in COBOL. The procedures are stored in the database like everything else, and when a procedure is called, it's read from the database and executed.

              The real issue with "COBOL programmers" is that they don't know the platform. The platform people are usually referring to when they say "COBOL" is actually some variation of mainframe or midrange computer. Most often in 2020, they're referring to either IBM Z or IBM i... which is really just a new name for what used to be the AS/400.

              The system contains a standard object storage system... or, more accurately, a key/value store. The front end of the system is typically based on CICS and JCL, which is Job Control Language. IBM mainframe terminals (and their emulators) have a language which could loosely be compared to HTML, in the sense that it allows text layout and form entry as well as action buttons like "submit".

              Then there's TSO/ISPF which is basically the IBM mainframe CLI.

              What is funny is that when many of us look at AWS, all we see is garbled crap. They have a million screens and tons of options. The same can be said of other services, but AWS is a nightmare. Add to that their command-line tools, which are borderline incomprehensible, and well... you're screwed.

              Now don't get me wrong: if I absolutely had to use AWS, it wouldn't take more than watching a few videos and a code-along. I'd probably end up using Python even though I don't care much for the language. I'd also use Lambda functions because, frankly, I don't feel like rolling my own platform from scratch. Pretty much anything I'd ever need to write for a business application can be done with a simple web server to deliver static resources, Lambda functions to handle my REST API, and someplace to store data, which is probably object storage, MongoDB, and/or some SQL database.

              Oddly, this is exactly what COBOL programmers are doing... and have done since 1969.

              They use :

              - TSO/ISPF as their command line instead of the AWS GUI or CLI tools.

              - JCL to route requests to functions as a service

              - CICS (in combination with a UI tech) to connect forms to JCL events as transactions... instead of using Amazon Lambda. Oh, it's also the "serves static pages" part. It's also kind of a service mesh.

              - COBOL, Java or any other language as procedures which are run when events occur... like any serverless system.

              It takes a few days to learn, but it's pretty simple. The hardest part is really learning the JCL and TSO/ISPF bit because it doesn't make sense to outsiders.
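
              The "procedures stored in the system, executed on demand" pattern described above maps onto any FaaS runtime. A hedged Python sketch of that shape (every name here is hypothetical; this is not real CICS or JCL syntax):

```python
# Stand-in for procedures kept in the database alongside the data.
PROCEDURE_STORE = {}

def install(name):
    """Register a procedure under a job-step-style name."""
    def register(proc):
        PROCEDURE_STORE[name] = proc
        return proc
    return register

@install("POSTPAY")
def post_payment(event):
    # The stored-procedure equivalent: a pure function of the event.
    return {"account": event["account"], "balance": event["amount"]}

def run_job(procedure_name, event):
    # The router (the JCL/CICS analogue, or a Lambda front end):
    # look the procedure up by name and execute it for this event.
    return PROCEDURE_STORE[procedure_name](event)

result = run_job("POSTPAY", {"account": "42", "amount": 100})
```

              Swap the dict for the database and the function call for a transaction monitor dispatching on a job step, and you have the rough shape of the mainframe setup described above.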

              What's really funny is that IBM mainframes running this stuff are pretty much infinitely scalable. If you plug 10 mainframes together, they practically do all the work for you, since the entire system is pretty much the same thing as an elastic Kubernetes cluster. Plug in 500 mainframes and you get so much more. The whole system is completely distributed.

              That FORTRAN is its own entire platform/ecosystem, like you're saying, is entirely true. Everything you would ever need for writing a FORTRAN program is kind of built in. But I will say, I would never even consider writing a business system in FORTRAN :)

            2. Anonymous Coward
              Anonymous Coward

              Re: @tfb This is why

              I suggest you get back to school.

              You still don't understand the difference between a library and the underlying language.

  4. Anonymous Coward
    Anonymous Coward


    The demise of Hadoop, as a distributed file system, and as an investment darling, has been linked to the growing popularity of object storage, integrated with the cloud hyperscalers' data offerings in the form of AWS S3, Azure Blob Storage, and Google’s Object Storage, as they are seen as accessible and cheap.

    The demise comes down to poor performance.

    MapR-FS actually crushed HDFS in benchmarks.

    Add multi-MFS and the use of SSDs and NVMe drives, and HDFS couldn't keep up.

    Now that SSDs are the norm, HDFS is going to keep lagging in performance.

    Also, with K8s you have off-cluster compute, so you don't need data locality and the purpose of the cluster is just to be a DFS.

    S3 is going to be a better option.

    MapR-FS supports S3, HDFS and POSIX interfaces.

    So you have the best of 'all' worlds.

  5. ghudson

    The future is MUMPS?

    Who knew? The future is MUMPS :-)

    1. Anonymous Coward
      Anonymous Coward

      Re: The future is MUMPS?

      It’s clearly not going away, but I’m not sure I’d ‘bet the farm’ on it.

    2. Dr Who

      Re: The future is MUMPS?

      Good call ghudson. I remember the InterSystems reps coming to visit us for a demo soon after the launch of Caché. Must have been over 20 years ago. It was very impressive, which is why I still remember the demo.

      1. ghudson

        Re: The future is MUMPS?

        Oh don't say that! I used to work for Micronetics (better implementation IMHO) - still hurts :-(

  6. Stephendeg

    “unique” proprietary data platform.

    I don’t know how unique it is, but I don’t mind sharing my limited experience (using the health integration product): it ran fast on old hardware, was easy to integrate with RDBMSes, would quite happily ‘pretend’ to be an RDBMS, gave years of uptime without a crash or restart, and came with excellent support here in the UK.

    I don’t work for them and have no intention of doing so - but I’d love to know if others fared worse.

    1. devilsinthedetails

      Re: “unique” proprietary data platform.

      We work with Caché and IRIS in our product, and they're still very fast and stable, with great support and flexibility. License costs can be awkward depending on your usage.

      Just don't call it MUMPS; they don't seem to like that, as they're trying to get away from the legacy image.

      1. Dwarf

        Re: “unique” proprietary data platform.

        Ahh, don't worry, MUMPS has been replaced by COVID-19.

        I'm just hoping that they upgrade COVID-19 to COVID-21 and resolve all the glaring defects in it, such as the lack of the critical spaceship operator <=> that was recently added in C++20.

  7. Steve Channell

    Hadoop, the new Globus Grid

    One day we’ll look back on the idea of building a distributed file system and scheduler in Java and just laugh and laugh. There is nothing wrong with Java or Scala, but the nature of critical infrastructure software is that it gets optimised with hardware assists (object-storage appliances, software-defined network load-balancing switches) and migrates into operating-system kernel drivers to reduce latency – you simply cannot do that with a managed language.

    If you accept that it will be rewritten in C++/Rust at some point to take advantage of appliances, the question becomes when, and that seems to be sooner than any of us expected – which is a shame. The biggest advantage of Hadoop, Hive and all the other variants is that they drive down the cost of commercial DBMSes – fragmentation is reducing this pressure.
