Apache lets fly Hadoop 1.0 data muncher

The Hadoop project at the Apache Software Foundation is beating its chest for delivering the v1.0 version of the open source MapReduce data analysis tool, its Hadoop Distributed File System (HDFS), and other related code. While software version and release numbers can sometimes be arbitrary, they are often also symbolic, and …


This topic is closed for new posts.
  1. Dave 124

    Another Triumph for Computing in SLOOW Motion (or Why the choice of a toy elephant is appropriate)

    Running at 1/30th the speed of equivalent C/C++ implementations, Hadoop creates a new standard for doing it slow across large numbers of machines. Don't plan on using this elephant right away. From experience, all of the how-tos and interfaces are now out of date and you will have to wait for a new set of e-books to buy. Once up, it will take daily care and feeding to keep it from crashing. I for one would like to see them fix their installs so they actually work out of the box rather than working to bump release numbers...

    1. Justin 3

      Really? Having used this in production with a customer for the last 12 months, the deployment was pretty trivial; the pains came from other tools we were using. Deployment/config is pretty straightforward compared to past experiences I've had with things like WebLogic and WebSphere. Using a decent packaged distro does take the pain out of it; we used Cloudera's CDH3. As for performance, burning through several TB of log data can be done in minutes on a small cluster, i.e. less than £35,000 worth of kit. I guess your mileage may vary...

      1. multipharious

        Trivial is about right

        Setting up Hadoop + Nutch and setting them free to crawl Enterprise targets is more or less easy... if you can get past the old Nutch documentation, which is much improved as of last year.

        Happy to see Hadoop hitting 1.0!

      2. Dave 124

        Apparently You Missed the Point...

        The point is that the appearance of speed in Hadoop relies on a huge hardware investment because of the choice of implementation language. Current commercial implementations using C++ are hitting a 30x speed improvement over the Java implementation of Hadoop while maintaining compatibility with the original API. Yes, when fully deployed both systems will give answers - Hadoop in an extended coffee break and the C++ versions in seconds. Apache continuing to push the Java-based product will eventually be the death of the project as the commercial products burn past them.

  2. Justin 3

    Cassandra and Hadoop aren't related

    Cassandra is independent of Hadoop; it's not an add-on of any sort. The only similarity is that Cassandra and HBase are both column-family-based table stores.

    1. foxyshadis

      You misread it

      Mahout is the add-on; Cassandra is an alternate data store. In case you're behind the times, yes, Cassandra has been usable with Hadoop's MapReduce and other APIs for quite some time now, even though HBase is the most common data store.

      1. Justin 3

        I may have misread the Mahout point; however, Cassandra and Hadoop aren't interoperable. Hadoop can't use Cassandra to natively store files, and similarly Cassandra doesn't use Hadoop to store or process its data natively. If the writer had looked down the Apache site, they would have seen that Cassandra is a separate project from Hadoop.


