Hadoop 2 stampedes onto world's mega compute clusters

The Apache Software Foundation has branded the data analytics Hadoop platform with version 2 and sent the Elephant-logoed system stampeding out into the wild. The second version of the open-source technology comes with a refreshed compute engine via the YARN data processing and service engine, and the addition of high- …


This topic is closed for new posts.
  1. Captain Save-a-ho

    Weren't we already here long ago?

    I'm pretty sure most of the discussion around Hadoop and Big Data is just coming full circle back to Cloud Computing 1.0 (aka mainframes). Didn't the System/38 do most of this, although minus the free and open bits?

  2. Steve Loughran

    sort of

    You could look at a big chunk of the grid schedulers (Condor, Platform, Mesos) and say "quelle difference?", but there are some differences. Hadoop is:

    * designed to place work close to the data: your code can ask for specific machines & racks, and the scheduler will try to place it there, but if you say "best effort" then it will place it as close as it can network-wise. This lets us run Hadoop without high-cost SAN networks and so makes storing petabytes of data affordable.

    * designed for algorithms that have to handle failure. MapReduce does this by splitting up the work, retrying failed jobs, recognising slow machines and re-issuing the work, and even blacklisting the slow boxes. Those slow ones are the enemy, as these stragglers slow everything down. Apache Tez can do checkpoints, then roll back to them. The streaming algorithms need to replay the streams, which is a different problem.
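The "place work close to the data" point above can be sketched roughly as follows. This is a minimal illustration in Python, not YARN's scheduler or its API: the function `place_task`, the `rack_of` mapping, and the host and rack names are all hypothetical, and a real scheduler also weighs queues, capacity, and waiting time.

```python
def place_task(data_hosts, free_hosts, rack_of, best_effort=True):
    """Pick a free host for a task, preferring hosts close to its data.

    data_hosts:  hosts that hold a copy of the task's input (hypothetical names)
    free_hosts:  hosts with spare capacity right now, in scheduler order
    rack_of:     assumed mapping of host name -> rack name
    best_effort: if False, only an exact host match is acceptable
    Returns (host, locality), or (None, "wait") if nothing acceptable is free.
    """
    wanted = set(data_hosts)

    # 1. Node-local: run on a machine that already stores the data.
    for host in free_hosts:
        if host in wanted:
            return host, "node-local"

    # A strict request waits rather than run anywhere else.
    if not best_effort:
        return None, "wait"

    # 2. Rack-local: same rack as the data, so traffic stays on one switch.
    wanted_racks = {rack_of[h] for h in wanted}
    for host in free_hosts:
        if rack_of[host] in wanted_racks:
            return host, "rack-local"

    # 3. Off-rack: anywhere with capacity; the data crosses the core network.
    if free_hosts:
        return free_hosts[0], "off-rack"
    return None, "wait"


if __name__ == "__main__":
    racks = {"a1": "r1", "a2": "r1", "b1": "r2"}
    print(place_task(["a1"], ["b1", "a2"], racks))
```

With the data on `a1` and only `b1` and `a2` free, the sketch picks `a2` because it shares a rack with the data; with `best_effort=False` the same request would wait instead.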

    If you do go back to the 1980s-era massively parallel designs, some of the architectures do look familiar. It's the scale that's different: a scale that makes failures a fact of life that everything has to handle, rather than a disaster that needs someone to be paged and your on-site HDD replacements (for which you pay a lot) to be wheeled out. Even so, there are lessons there that we should learn from. After all, aren't VMs and their hypervisors just descendants of VM/360, which had billing in from the outset too?

  3. asdf

    Google success story

    Hadoop 1&2 is just everyone else working together to keep up with Google. Engineering talent isn't everything but it sure is underrated at far too many places.
