Facebook will probably buy Cloudera for $30 billion in stock.
Intel is canning its own Hadoop distribution in favor of using Cloudera's flavor of the analytics software. And the chip giant is making a "significant equity investment" in the upstart that's higher than had been reported elsewhere. This strategic move was announced by Intel and Cloudera on Thursday following years of heavy …
Facebook has a long history with Hadoop, including committers, and uses HDFS and MapReduce heavily in some of the larger clusters.
But they've diverged: Corona is their MapReduce-only scheduler, and their HDFS branch is very distant from standard Apache HDFS.
Buy Cloudera? No. Recruit engineers? Maybe, especially now that their stock options have just been diluted.
IDH was doomed from the start: no customers, not much of a direct channel to potential customers, and no compelling reason for its nominal hardware partners to ship it.
Microsoft is happy with its Hortonworks-based distro (Hortonworks being easier to negotiate with than Intel); server vendors usually partner with Cloudera, Hortonworks and MapR for completeness. Red Hat is deep in Apache and Hortonworks; Ubuntu has embraced MapR for unknown reasons.
There's a lot of cost in creating a distribution (testing more than anything else) and nobody publishes their big test suites. Bigtop doesn't count: Cloudera may have contributed the test runner, but not their tests.
Now Intel can stop spending money on that QA and:
1. Focus on what matters to them: the latest support for server chips, SSDs, NICs, etc.
2. Get that work into a version of Hadoop people actually use.
3. Write benchmarks that argue you need the latest server parts.
- MapR is in deep financial trouble, which makes it easier for all the Apache-based products to ask "what will your data do when the sole vendor of the filesystem fails?"
- Hortonworks may get interest from a big competitor. Microsoft, perhaps? Oracle, feeling left out?
- Pivotal HD has lots of EMC funding and can keep going.
- IBM? Still funded there, but lagging technically.
Finally: just last week a Forrester report on the Hadoop vendors said that IDH was going places, as it had the backing of a company with lots of money. That's "leader" to "dead" in seven days, which doesn't say much for the quality of Forrester research.
As someone in the know... you're a bit out of touch... let's break it down for you:
1) Intel's exit from its own distro was written on the wall from the moment it was announced.
Hadoop isn't their core competence. They have a couple of committers, but their goal in doing their own distro was to ensure that their kit stays at the forefront. They were never in it for the long haul; they have no real skin in the game.
2) Microsoft partnered with Hortonworks because Hortonworks ported Hadoop to Windows. Microsoft wouldn't have a Hadoop story without Horton.
3) MapR isn't hurting for cash. The problem with MapR is that the Apache crowd expected them to give away their secret sauce, and they weren't going to do that; they have no motivation to. MapR wants to build a better mousetrap, so their product is engineered to commercial grade. They just don't have strong marketing and are winning customers via word of mouth.
4) Forrester's research...
Forrester can't paint a company too negatively, so they have to look to the positives. And let's face it: Intel could toss in the money, jump in 100%, hire engineers and spend whatever it takes to establish themselves. That's the takeaway. But they don't need to do it. Again, their goal was to establish their position to ensure companies buy their kit and not AMD's. Notice that there isn't a strong following for IBM's POWER series or Oracle's SPARC...
Forrester could have seen this coming, but printing any speculation would have a) had an effect and b) been premature. Think quantum entanglement. Also think politics: you know the emperor has no clothes, but do you want to be the one to tell him? Think of the ramifications.
You also don't know IBM. IBM is in a world of hurt, and while they can spend billions on Watson and their Big Data products, they don't know how to integrate it all and tell a story. They are the Borg, and the hive mind is rotting from the inside. Talk to anyone who's had their company purchased by IBM... many wait until the gold-plated handcuffs are off and then bail.
Sorry, but while there will be consolidation, the Intel/Cloudera deal probably scares Hortonworks the most.
MapR's sole justification for its premium is its filesystem, which doesn't have the shared-NameNode "metadata server", and which recently added support for a key-value store that can pretend to be HBase.
- Hadoop 2.3+ has HA NameNode failover, so you shouldn't see outages there.
- HBase is free to use.
- Because nobody outside MapR uses MapRFS, it's not tested at the scale HDFS is, so claims of scalability are theoretical rather than practical.
- When MapR runs out of money, unless someone else buys it, your data is stuck on a filesystem nobody else has the source to.
For all its failings, HDFS is at least universal: it works the same on Apache, EMR, CDH, HDP, IBM's Hadoop, etc. And if any one of those vendors makes you unhappy, you have the option of swapping vendors without having to reformat an 8 PB cluster after backing up the data (where?) and restoring it.
Maybe now that Intel has, effectively, bought Cloudera, Larry Ellison will be on the phone to the MapR team.
I think you need a bit of a reality check.
1) HA in Apache. That's from Wandisco, and it's a piece of proprietary software. So if you want HA, you need more machines as NameNodes and you need to purchase a proprietary software component for your 'open source' cluster.
Note the following:
Wandisco's solution isn't bad, and it does offer the ability to have active/active clusters.
It's just that Horton made their marketing noise about how open source they are, yet must 'partner' with proprietary software in order to bring in HA. Note that the Apache folks could have used ZK and done it themselves by improving HDFS, but where's their incentive?
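For readers who haven't seen it: the ZK approach being alluded to is leader election via an ephemeral lock node. The sketch below is a toy in-memory simulation of that idea, not the real ZooKeeper client API; the class and function names are invented for illustration.

```python
# Toy simulation of ZooKeeper-style leader election for an active/passive
# NameNode pair. Each node tries to create the same ephemeral "lock" node;
# whoever succeeds is active, the rest stand by and retry when the lock
# vanishes (e.g. because the active node's session expired).
class FakeZk:
    def __init__(self):
        self.lock_owner = None

    def try_acquire(self, node_id):
        # First caller wins the lock; later callers see it already held.
        if self.lock_owner is None:
            self.lock_owner = node_id
        return self.lock_owner == node_id

    def session_expired(self, node_id):
        # The active node crashed: its ephemeral lock node disappears.
        if self.lock_owner == node_id:
            self.lock_owner = None


def elect(zk, nodes):
    """Have every node attempt the lock; return (active, standbys)."""
    winners = [n for n in nodes if zk.try_acquire(n)]
    active = winners[0]
    return active, [n for n in nodes if n != active]


zk = FakeZk()
active, standbys = elect(zk, ["nn1", "nn2"])
print(active, standbys)        # nn1 wins, nn2 waits
zk.session_expired(active)     # active NameNode dies
active2, _ = elect(zk, standbys)
print(active2)                 # the standby promotes itself
```

This is essentially what the ZKFailoverController ended up doing in stock Hadoop, which is why the "Apache could have used ZK" point is a fair one.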
2) HBase is free. Sure: buy MapR M5 and HBase is free there too. But if you want a better mousetrap... then look at M7.
3) MapRFS. Yes, only MapR runs MapRFS; that's because it's proprietary. But it does support the HDFS APIs, so anything you can run on HDFS you can run on MapR, while the same can't be said in reverse. You can't NFS-mount HDFS. And it has been tested at scale; you just need to know who is in the know and ask the right questions.
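The compatibility claim rests on clients coding against one filesystem interface, with the URI scheme selecting the concrete implementation (Hadoop's Java `FileSystem.get(uri, conf)` dispatches this way). Here is a toy Python sketch of that dispatch pattern, with invented class names; it is not Hadoop's actual code.

```python
# Scheme-based filesystem dispatch: client code names only a URI, so the
# same job can target hdfs:// or maprfs:// without modification.
from urllib.parse import urlparse


class FileSystem:
    """Minimal stand-in for an abstract filesystem contract."""
    def open(self, path: str) -> str:
        raise NotImplementedError


class HdfsFileSystem(FileSystem):
    def open(self, path: str) -> str:
        return f"HDFS read of {path}"


class MaprFileSystem(FileSystem):
    def open(self, path: str) -> str:
        return f"MapRFS read of {path}"


_SCHEMES = {"hdfs": HdfsFileSystem, "maprfs": MaprFileSystem}


def get_filesystem(uri: str) -> FileSystem:
    # Pick the implementation from the URI scheme, as FileSystem.get does.
    return _SCHEMES[urlparse(uri).scheme]()


def word_count(uri: str) -> int:
    # Client code never names a concrete filesystem, only the URI.
    fs = get_filesystem(uri)
    return len(fs.open(uri).split())


print(word_count("hdfs://nn1/data/log.txt"))
print(word_count("maprfs:///data/log.txt"))
```

The reverse incompatibility follows from the same picture: MapRFS implements the HDFS client interface, but HDFS implements nothing of MapRFS's extra surface (such as its NFS access).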
4) Funding... One could say that Intel ran out of money. Or rather, they ran the numbers and didn't see the return on investment. While only MapR knows MapR's money situation, I don't see them hurting for cash. I do think they need to invest more in marketing.
Truthfully, I hate this MapR bashing.
I think that MapR, Horton and Cloudera all have viable products and viable takes on the market. Right now Horton is doing well with their marketing and taking market share from Cloudera. Beyond these three, it's hard to tell.
As to MapR going to Oracle... not a chance. Larry bought several companies from Mike, and while he's no longer CEO, he's still got Larry on speed dial. Larry could still buy out Cloudera if he so desired, and Intel would still profit from it.
The statement above that MapR is the only FS that isn't reliant on non-redundant NameNodes is not true. The concept of primary and secondary NameNodes has been implemented for quite some time now (HDFS v2).
Beyond that, Isilon has been delivering N-way parallel NameNode and data service for over two years now. One can run multiple Hadoop distros in parallel on the same data simultaneously, and one can land files using any protocol (e.g. NFS, SMB) and immediately use that data via HDFS access (hdfs://...) in a compute job.
MapR's FS is a bit more than just that.
It's a complete rewrite.
You clearly don't know or understand Hadoop and what the secondary NameNode actually does.
It's really a misnomer: you still have the issue of losing a NameNode and losing your cluster. Then there's federation, which isn't a good solution either, since each NameNode remains a single point of failure.
The CLDB solution is countered by Apache releases using Wandisco's product, which is a proprietary solution.
I could say more, but I'm not allowed to. ;-)
"You still have the issue of losing a name node and losing your cluster. "
NN HA in Hadoop 2.2 (and CDH4 for the past two years) has active/passive failover: the NN fails and the standby NameNode takes over. This is not the "secondary namenode", which is a bad name for a metadata-log checkpoint node. Is it as resilient as MapR's distributed-metadata approach? Possibly not. But does that matter much? Not necessarily.
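For anyone who hasn't seen it wired up: in the stock Apache HA setup the active/passive pair sits behind one logical nameservice in hdfs-site.xml, along these lines (the property names are the standard Apache ones; the nameservice ID `mycluster` and hostnames are placeholders):

```xml
<!-- hdfs-site.xml: active/passive NameNode pair behind one nameservice -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<!-- clients fail over between nn1 and nn2 via this proxy provider -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- automatic failover is driven by ZKFC against a ZooKeeper quorum -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

Clients then address `hdfs://mycluster/...` and never care which physical NameNode is active.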
Regarding MapRFS being a complete rewrite: that's its weakness as well as its strength. MapR won't benefit from any R&D spend by Intel, let alone by its direct competitors, or from field testing by the big web companies, and so on. It has a lot of work ahead to stay technically better. And once you copy your data into mapr://, it doesn't leave unless you have a nearby cluster to back it up to (or you trickle the data out to Amazon Glacier or the like).
Back in the '90s, ORDBMSs were all the craze.
Phil White bought Illustra and thought it would be easy to integrate. Rather than manage investors' expectations... he cooked the books, which pretty much killed the company.
It took Informix five years to recover and integrate ORDBMS into the product as DataBlades. Oracle countered with its own add-ons, as did IBM.
Nobody used them because they were complete and utter garbage, but they existed well enough to be a check-off item for marketing and sales.
So to say Apache has HA? That's not the case. If it were, then why do Cloudera and Hortonworks partner with Wandisco?
I'm not suggesting that Wandisco is a bad product; it's not. It's a very interesting product.
But what I am suggesting is that you need to do your homework before making statements that the market data doesn't support. ;-)
BTW, Illustra was a Stonebraker company, and Mike Olson worked for him there, before Olson went on to his other companies. Just thought I'd toss in that little fact to tie this all back together.
Here are my observations based on my experience in Enterprise Information Management, and recent experience with MapR.
1. The majority of the top Fortune 100 enterprises that run large Hadoop clusters in production use MapR. When I say large, I am referring to 200-2000 node clusters. Yes, I have tested/used others and then chosen MapR. We did our due diligence and selected MapR based on company viability and roadmap; you would be surprised what they are doing quietly (I found them to be quiet as a saint!).
2. Please understand that the MapR file system is completely open and that it is a 100% Apache Hadoop-compliant distribution. Please don't spread incorrect information.
3. I am sure I would be pretty accurate in saying that many of the people commenting on this thread have never used Hadoop in production, let alone played with a Hadoop cluster of more than 50 nodes. So readers, please take the negative comments in this thread with a lot of salt.
4. If there are readers seriously looking at a Hadoop solution, there is no harm in speaking with MapR. Just call them and they will educate you on what the general industry seems to miss. It will be time very well spent.
While you're an informed consumer who chose MapR, I do need to point out something.
MapR's FS is proprietary; it's not open. It does, however, support the open source HDFS APIs, so anyone can write and port their stuff to the MapR environment, and MapR will run all of the Hadoop products on top of their cluster. So if you want Impala and Tez, you can have both. Don't ask Horton or Cloudera to do that. ;-)
While that's a small nit, it's an important one. Horton and Cloudera have run marketing campaigns against MapR because of this. That was, until they partnered with Wandisco to do HA.
MapR built a better mousetrap. I agree that anyone who wants a commercial-grade product should look at MapR; however, they are not without their faults. They may have a superior product, but that doesn't guarantee them success (see Informix: Oracle won out because of better marketing and sales, not a better product).
MapR also trounced the benchmarks on Cisco kit. While that's public knowledge... it's not a well-marketed message.
I am Hadoop vendor agnostic and I do see things that I like in all of the three major vendors. (As much as I see things that I don't like....)
And yes, I've built solutions on top of the three main vendors. ;-)
Biting the hand that feeds IT © 1998–2020