Reply to post: Re: Wow so many experts...

You're doing Hadoop and Spark wrong and they will probably fail

Anonymous Coward

Re: Wow so many experts...

Could not agree more. There are so many reasons for failure, but articles like this just add FUBAR to the mix.

The main reasons I have seen for failure:

1.) Lack of qualified use cases:

a.) Big data is not GBs; it is 10s of TBs or more

b.) Lack of identified business pain (what are we trying to solve)

c.) Lack of metrics (what are we trying to accomplish and how do we measure it)

2.) Expecting "magic" DW offload... ignoring the fact that existing DWs have decades of procedural work that must be analysed to identify candidates to migrate.

3.) Lack of an understanding that an offload will likely not entail moving everything across - it's usually a cost rationalisation exercise

4.) SIs dropping the ball. I'm sorry but I've seen this far too often: recently an SI hired a Hadoop administrator who knew neither Linux nor Hadoop... and they also weren't prepared to send him on any training. A good SI would tick all three boxes.

5.) Ignoring that existing DWs had their own wobbles on the way to maturity

6.) A last one: SIs bringing their own pseudo-Hadoop technologies into the mix... watering down many of the value props Hadoop (and associated technologies) bring.

The counterpoint to this article is that I have seen some very successful customers, but ironically they are often the customers who don't advertise their success.

Also the AWS comment... I'm stunned:

1.) AWS is good for transient use cases. Period. Spin up, read S3, do work, write S3, spin down. Long-running clusters aren't suited to it because EMR clusters lack any kind of upgrade path.

2.) AWS may provide the "latest" products but doesn't perform certifications with existing BI tools, which is how most BI teams will integrate with Hadoop

3.) AWS has zero committers on either Hadoop or Spark. Good luck getting fixes in. They ride on the efforts of the major Hadoop distros.

4.) Many clients deploy a Hadoop distro on AWS (EC2) precisely for this reason.

5.) S3 may be a much cheaper alternative to instance storage but sheesh will it be slow as a dog's balls. Have seen clients do an about-face when this materialises. Back to cost vs value.

6.) AWS dedicated Hadoop support... good luck with that.

In summary: this article is terrible. It confuses the issue of why Hadoop fails and gives people a fluff direction to look at AWS without any real meat behind it except, "They baseline quicker" without discussing why this might not be a good idea.

Also posting anonymously because this industry is small.
