Failure to launch: Hadoop will be 'anaemic' for years, says Gartner

Forget the hype and the vendors chinning each other – Hadoop is still not ready for take-off. That’s according to Gartner, whose latest short story in the Hadoop-is-not-ready saga has predicted “anaemic” adoption of Hadoop for at least the next two years. The biggest problem is that people allegedly still can’t use Hadoop, …

  1. Anonymous Coward

    Catch 22?

    The fundamental problem for HortonWorks et al. is that they need to make Hadoop easier to consume in order to drive adoption... but their business model is based on providing services because Hadoop is hard to consume.

  2. Anonymous Coward

    The key is finding business reasons to deploy Hadoop

    What information does a company have? What is the value of crunching that data in certain ways? How does that value compare to the hardware, software, services and personnel expense of a Hadoop deployment? What other analytical packages might be "good enough" analytically, and less onerous from an expense perspective?

    Doesn't sound like Hadoop is making enough progress in positioning itself against lesser analytical solutions, or in communicating customer success stories in an intelligible "wow, we could do that!" manner.

  3. Anonymous Coward

    Java too inefficient, unstable, and information design is wrong

    The problem is that it shouldn't require rocket scientists to operate a damn database - when you have a correctly engineered DB it should just run. Witness SQLite: that thing can handle gigabytes of relational data, and it requires zero administration once the structures are defined. Do you see Apple deploying DBAs to go along with every iPhone?
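
    For what it's worth, a minimal sketch of the zero-admin workflow being described, in Java (assuming the xerial sqlite-jdbc driver, or any SQLite JDBC driver, is on the classpath; the table and data are invented):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        public class SqliteDemo {
            public static void main(String[] args) throws Exception {
                // A single-file database: no server process, no daemon, no DBA.
                try (Connection conn = DriverManager.getConnection("jdbc:sqlite:contacts.db");
                     Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, phone TEXT)");
                    st.execute("INSERT INTO contacts VALUES ('Alice', '555-0100')");
                    try (ResultSet rs = st.executeQuery("SELECT name, phone FROM contacts")) {
                        while (rs.next()) {
                            System.out.println(rs.getString("name") + " " + rs.getString("phone"));
                        }
                    }
                }
            }
        }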

    Java deployments can achieve a modicum of stability with enough people to babysit them and machines to run them on, but it's an inherently unstable technology overall. This is why the only companies that run these things are companies with loads of cash to keep throwing into the furnace; no company that has to justify its infrastructure expenses against the value-add of the system can possibly run something like this year after year.

    The other issue is the database world is hopelessly out-of-whack when it comes to information design - key-value databases are indeed the way to go, but everyone is trying to force relational schemas onto them and this is retarded - you are taking a highly structured problem domain, then breaking it down into relational concepts like columns and rows, then forcing that mess into a key-value DB, then trying to re-constitute the structure you just threw away by doing very expensive map-reduce operations to create column views.

    Completely retarded.
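
    To make the design point concrete, a toy sketch in Java (keys and values invented purely for illustration): scattering relational-style cells across a key-value store means scanning and regrouping later, whereas designing the key around the access pattern returns the whole record in one lookup.

        import java.util.Map;
        import java.util.TreeMap;

        public class KeyDesignSketch {
            public static void main(String[] args) {
                // The pattern being complained about: relational-style "cells" as keys,
                // which need an expensive scan/regroup later to rebuild one order.
                Map<String, String> cells = new TreeMap<>();
                cells.put("order:42:customer", "alice");
                cells.put("order:42:item:1", "widget");
                cells.put("order:42:item:2", "sprocket");

                StringBuilder rebuilt = new StringBuilder();
                cells.forEach((k, v) -> {
                    if (k.startsWith("order:42")) rebuilt.append(v).append(' ');
                });
                System.out.println("rebuilt by scanning: " + rebuilt);

                // Access-pattern-first design: one key, one lookup, whole record.
                Map<String, String> records = new TreeMap<>();
                records.put("order:42", "{customer: alice, items: [widget, sprocket]}");
                System.out.println("single lookup: " + records.get("order:42"));
            }
        }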

    1. Anonymous Coward

      Re: Java too inefficient, unstable, and information design is wrong

      OMG, Gigabytes, wow.

      You do realise that Hadoop is designed for petabytes - six orders of magnitude greater. When you essentially stuff everything into a single disk file that is rarely larger than available RAM, accessed by a single process on a single machine, then everything is pretty simple, I agree.

      I think the world of Hadoop, where clusters of 4,000+ servers hold > 10PB of data accessed by hundreds of users running complex analyses, is a little different from looking up an address in an iPhone address book.

      I can only surmise that 'Completely retarded' is your signature, not your final comment.

    2. Andy 73 Silver badge

      Re: Java too inefficient, unstable, and information design is wrong

      Oh dear.

      Java's actually very good for this sort of stack - portable, robust and flexible enough to embrace the steady shift of development frameworks as we better understand the problems we're trying to resolve. It's far from inefficient and the tooling around it is excellent. If you're still comparing your iPhone sized database with the sort of thing we're doing with Hadoop, you need to re-read the part of the manual that explains how not all databases are the same. The system I'm currently working on consumes about a terabyte of data a day and retains that indefinitely. Not only that, but it delivers value - we're talking eight figure sums annually here for a single use case, but that is a long, long way from the sort of thing you'd achieve with SQLite.

      The real problem is wrangling a cluster of a few hundred machines and providing production-level SLAs on processes running on that cluster. The tooling is catching up, but we're working from the ground up on technology that is still desperately immature. There are a LOT of moving parts in a typical deployment, and whilst Hortonworks, Cloudera and the others are getting better at bringing a working system up, there is much to be done to ensure BAU services really are business as usual, not a series of experiments. It doesn't help that there are dozens of different frameworks and approaches to implementing a given solution - nearly everyone I talk to has found a new combination of tools to use, and that's preventing companies from focussing their efforts on making one particular toolset feature-complete and robust.

    3. Anonymous Coward

      Re: Java too inefficient, unstable, and information design is wrong

      If you think Hadoop is a database then you've fundamentally misunderstood what Hadoop is.

  4. SecretSonOfHG

    Wrong on all counts

    In the order you present your points:

    Rocket scientists: call me back when you've managed to scale your SQLite database so that you can run a query piecemeal on an arbitrary number of machines with different CPU architectures, word sizes and instruction sets. It's not Hadoop by itself that is complex, it's the whole parallel programming paradigm that is difficult to grasp. There are other languages than Java for writing parallel programs, but the inherent complexity is just shifted from learning the parallel paradigm in a general-purpose language to learning the special-purpose language and living with its limitations.
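
    As a toy, single-machine analogy of "running a query piecemeal" (threads standing in for cluster nodes; nothing here is Hadoop-specific), the splitting and merging is the easy part to write - reasoning about partial results, skew and failures is where the difficulty actually lives:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.concurrent.Callable;
        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;

        public class PiecemealSum {
            public static void main(String[] args) throws Exception {
                long[] data = new long[10_000_000];
                for (int i = 0; i < data.length; i++) data[i] = i % 97;

                int workers = 4; // stand-ins for machines in a cluster
                ExecutorService pool = Executors.newFixedThreadPool(workers);
                List<Future<Long>> partials = new ArrayList<>();
                int chunk = data.length / workers;
                for (int w = 0; w < workers; w++) {
                    final int from = w * chunk;
                    final int to = (w == workers - 1) ? data.length : from + chunk;
                    partials.add(pool.submit((Callable<Long>) () -> {
                        long s = 0;
                        for (int i = from; i < to; i++) s += data[i]; // each worker sums its slice
                        return s;                                     // a partial result
                    }));
                }
                long total = 0;
                for (Future<Long> f : partials) total += f.get();     // the "reduce" step
                pool.shutdown();
                System.out.println("total = " + total);
            }
        }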

    From an efficiency standpoint, Java is on par with, if not better than, other VM architectures. The only thing more efficient than a JVM would be compiling C/C++ directly to native code, which would mean programs that are much harder to write if you want to account for all the architectural differences in a heterogeneous environment. From a complexity standpoint, launching an OS instance with a Java environment installed is no harder than launching one without. The same applies to C#, by the way.

    Information design: perhaps you think that there is one and only one true way of storing data. Key-value storage is a broad definition that also applies to RDBMS engines. The reality is that for each usage context there is a storage layout that is more efficient than the others. As a corollary, you have to translate between them when you want to use the data for a different purpose. This is unavoidable: you're not going to find a storage schema that is equally well suited to storing transactions with ACID-like properties and to digesting huge amounts of data and performing transformations on it. Well, perhaps you've found it, in which case I'd suggest you stop posting comments and start a business around it. You'll conquer the world.

    1. emkay2015

      Re: Wrong on all counts

      > It's not Hadoop by itself that is complex, is the whole parallel programming paradigm that is difficult to grasp.

      The original Hadoop (1.0) was supposed to be used only for MapReduce programming, not for general-purpose parallel programming, though the latter was possible using some trickery. MapReduce is actually a very simple, SIMD-like paradigm and easy to grasp.
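
      For anyone who hasn't seen it, the canonical word-count mapper and reducer give a feel for how small the paradigm really is. This is only a sketch against the Hadoop MapReduce Java API; the job/driver setup boilerplate is omitted.

          import java.io.IOException;

          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.LongWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;
          import org.apache.hadoop.mapreduce.Reducer;

          public class WordCount {
              // Map: for each input line, emit a (word, 1) pair per token.
              public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
                  private static final IntWritable ONE = new IntWritable(1);
                  private final Text word = new Text();

                  @Override
                  protected void map(LongWritable key, Text value, Context ctx)
                          throws IOException, InterruptedException {
                      for (String token : value.toString().split("\\s+")) {
                          if (!token.isEmpty()) {
                              word.set(token);
                              ctx.write(word, ONE);
                          }
                      }
                  }
              }

              // Reduce: all counts for one word arrive together; sum them.
              public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
                  @Override
                  protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                          throws IOException, InterruptedException {
                      int sum = 0;
                      for (IntWritable v : values) sum += v.get();
                      ctx.write(key, new IntWritable(sum));
                  }
              }
          }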

      When I became a Hadoop engineer, I was coming from the world of TCP/IP kernel stack programming and device drivers. I knew next to nothing about databases or about the world of data and information design. But I was able to fully grasp the MapReduce paradigm and the design of Hadoop in about 2 weeks.

      However, the next-generation Hadoop resource layer, YARN, does allow more generic parallel programs beyond MapReduce.

  5. Anonymous Coward

    Can we leave the religion out of it?

    The Hadoop environment just feels so unpolished, even using one of the "supported" (cough) distributions. Getting even simple queries to work right felt like building some kind of Rube Goldberg/Heath Robinson machine. No wonder there's a lack of enthusiasm from anyone other than the cool chasers.

    1. emkay2015

      Re: Can we leave the religion out of it?

      > Getting even simple queries to work right felt like building some kind of Rube Goldberg/Heath Robinson machine.

      Actually Hadoop is not for queries at all. It is not a database technology. You should not use it for searching and querying and maintaining databases.

      Hadoop is only for big data analytics and batch processing of the kind where every row of data is processed, and with the same algorithm.

  6. Alistair

    I'm a rocket scientist apparently

    It certainly isn't pretty -- but our Elephants are up and running. Quite happily trundling along sucking up whatever we throw at them.

    I've got no less than 12 business units in queue to jump in to one or more of these and start participating in the data sloshing.

    I'm keeping the bulk of them on hold for now - the single biggest roadblock for me right now is getting the security folks to agree to a framework, and to agree to the access controls I need to finish building.

    Second hardest component is getting network engineering to comprehend that we actually can consume better than 60% of a 10Gb/s link, and yes, we do need that 10Gb access layer component for the top-of-rack switches to keep from slaughtering our out-of-date 1Gb/s server infrastructure network.

    That and teaching folks that, no, you don't get to see *everything* that is in the pool, there is stuff no human being should be able to connect in here.

  7. Anonymous Coward

    Spark should fix this

    Hadoop's Achilles heel has always been MapReduce - not very fast and too low-level. Apache Spark is already driving much better adoption as it provides a unified core, so there are far fewer moving parts.
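
    As a rough illustration of the "unified core" point, a word count in Spark's Java API (a sketch only, written against the Spark 2.x Java API; the HDFS paths are invented and the cluster settings come from spark-submit):

        import java.util.Arrays;

        import org.apache.spark.SparkConf;
        import org.apache.spark.api.java.JavaPairRDD;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.api.java.JavaSparkContext;
        import scala.Tuple2;

        public class SparkWordCount {
            public static void main(String[] args) {
                SparkConf conf = new SparkConf().setAppName("wordcount");
                try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                    JavaRDD<String> lines = sc.textFile("hdfs:///data/logs/*");     // hypothetical input path
                    JavaPairRDD<String, Integer> counts = lines
                            .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                            .mapToPair(word -> new Tuple2<>(word, 1))
                            .reduceByKey(Integer::sum);
                    counts.saveAsTextFile("hdfs:///out/wordcounts");                 // hypothetical output path
                }
            }
        }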

  8. JayTee253

    "rocket scientists, not the masses" Really?!?

    "rocket scientists, not the masses" Really?!? - Not sure I agree here.

    1. Getting data in - if nothing else you can upload CSV files.

    2. If you can't program - use Hive (or similar) to submit SQL queries (a rough sketch of this follows after this list).

    3. If you're a little bit braver than SQL, have a go at writing a PIG script.

    4. If the above seems too easy, pick your favourite language - Python, Java, Scala - and go wild!
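
    To flesh out point 2, a sketch of submitting SQL to Hive over JDBC in Java (the HiveServer2 host, credentials and table are invented; assumes a running HiveServer2 and the Hive JDBC driver on the classpath):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        public class HiveQueryDemo {
            public static void main(String[] args) throws Exception {
                // Hypothetical HiveServer2 endpoint and table name.
                String url = "jdbc:hive2://hive-gateway.example.com:10000/default";
                try (Connection conn = DriverManager.getConnection(url, "analyst", "");
                     Statement st = conn.createStatement();
                     ResultSet rs = st.executeQuery(
                             "SELECT page, COUNT(*) AS hits FROM weblogs GROUP BY page ORDER BY hits DESC LIMIT 10")) {
                    while (rs.next()) {
                        System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
                    }
                }
            }
        }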

  9. sw41

    The Cluster Half-full View

    Log crunching is definitely the sweet spot, but more workloads on Hadoop are moving to real-time as well thanks to new compute engines like Spark and advances in Apache HBase. Here's a (mostly) quantitative view from a 50+ cross-section of Hadoop adopters on the gains they see:

    https://www.mapr.com/blog/hadoop-adoption-is-the-cluster-half-full

  10. Anonymous Coward

    Another (solid?) remark by Gartner

    Yet another "Linux is the hype du jour"-style comment from Gartner. Do they actually live on planet Earth with the rest of us?

    I have not seen market penetration in the past few years as considerable as Hadoop's - not since (the aforementioned) Linux. Maybe getting this sort of comment from Gartner is actually a sign of good health, as it has proven to be before.
