MotherDuck scores $47.5m to prove scale-up databases are not quackers

In analytical database systems, the story of the last ten years or more has been about building out. Only databases distributed over multiple nodes could cope with the scale required by so-called Big Data. Web and mobile data were driving demand for systems which scale out, rather than rely on more and more powerful single …

  1. Anonymous Coward
    Anonymous Coward

    >Hadoop (technically a distributed file system)

    I mean if you want to get technical he's a stuffed yellow elephant. Apache Hadoop is a project loosely encompassing at least two distributed filesystems, a resource manager, two general-purpose compute frameworks and a massive grab-bag of connectors, adapters, protocols, widgets and gubbins depended on by the rest of the industry. But that's neither here nor there.

    I think the scale up vs scale out narrative gives away the game for MotherDuck. We haven't spent the last fifteen years scaling out just because it's fashionable, and we're under no delusions about performance. A bash sort | uniq is gonna run rings round a single-thread Spark df.distinct and we know that. We scale out because we have to. He's partially right that most queries don't run at 100TB scale, but I think most people would be surprised at how often that happens now. Many orgs *are* routinely slinging that kind of data volume around because it's cheap and easy. Many more have ambitions to do so. But that's only part of it. We scale out to build in system tolerance to node failure. We scale out to distribute the work over more nodes, allowing us to use smaller/cheaper machines. We scale out to get more parallel IO - most workloads are IO-bound, not CPU-bound. The list goes on.
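    [The sort | uniq point above can be sketched in a few lines. A toy Python stand-in for the shell pipeline - hypothetical data, just illustrating why a single-process distinct needs no cluster, shuffle or serialization:]

    ```python
    # A toy stand-in for `sort | uniq`: distinct values on a single node.
    # For data that fits in memory, a plain set is hard to beat - no cluster,
    # no shuffle, no serialization overhead.
    values = ["a", "b", "a", "c", "b", "a"]
    distinct = sorted(set(values))  # sorted() mirrors what `sort | uniq` emits
    print(distinct)  # ['a', 'b', 'c']
    ```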

    I understand why they're doing this story. They need to build a market. Otherwise this ends up being just a component in someone else's product. A shinier, sexier, analysis-oriented version of SQLite with VC backing, but ultimately just another embedded SQL execution engine. So they want to be the plucky upstarts fighting the big data orthodoxy. Make data small again, if you will. But it's a stupid narrative to build because it doesn't go anywhere. An embedded SQL engine is exactly what DuckDB is. And it's very very flipping good at it. Lightning fast. Trivial to use. Easy to embed in anything else. I'm looking at a benchmark where it's running rings round our bespoke data-processing framework, beating it by 20+% on TPC-H workloads once we've got data onto a single node, with no config or tweaks or tuning. But therein lies the rub. We still need to get the data in and out. And that needs a product. If MotherDuck start building a data platform they're quickly going to discover how hard that is. If they don't, all they'll ever be is someone else's go-faster button.
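    [For readers unfamiliar with the embedded-engine model described above, here's a minimal sketch using Python's stdlib sqlite3 as a stand-in - DuckDB's Python API follows the same in-process pattern (no server to run), it's just column-oriented and tuned for analytics. The table and values are made up for illustration:]

    ```python
    import sqlite3

    # An embedded SQL engine runs in-process: no server, no network hop.
    # DuckDB works the same way, but is built for analytical queries.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [("eu", 10.0), ("us", 25.0), ("eu", 5.0)])
    total_by_region = con.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    print(total_by_region)  # [('eu', 15.0), ('us', 25.0)]
    ```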

    So because of that I'd be happy to bet they'll be acquired by a Snowflake or Databricks at some point in the medium-term future. I'd assume both are poised to launch partner integrations in the next few weeks as a prelude to that and to test the waters. Which is a shame, because it means what is currently a cool, sexy, well-designed little open source widget is inevitably going to be borged into some other Silicon Valley monster corp/cult and disappear.

    I'd like to be wrong.

  2. xyz Silver badge

    Shocked!

    I was running 32TB datasets 12 years ago with instant, user-defined querying and no cloud. Good indexing was the key. The way everyone's been talking I thought everyone was up to about 1000TB, not 100TB. I might check this out. The data warehouse I'm involved with is only about 100GB.
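    [The "good indexing" point can be illustrated with a minimal stdlib sqlite3 sketch - a hypothetical table, since the commenter doesn't say which system they used. With an index, the engine seeks straight to matching rows instead of scanning everything:]

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user_id INTEGER, ts INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?)",
                    [(i % 100, i) for i in range(10_000)])
    # Without an index this predicate is a full table scan; with one,
    # the planner can seek directly to the matching rows.
    con.execute("CREATE INDEX idx_events_user ON events(user_id)")
    plan = con.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM events WHERE user_id = 42"
    ).fetchall()
    print(plan)  # the plan mentions idx_events_user rather than a scan
    ```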

    1. Korev Silver badge
      Alien

      Re: Shocked!

      Out of interest, which technology did you use?

  3. well meaning but ultimately self defeating

    Are the employees called Motherduckers?

    1. Anonymous Coward
      Anonymous Coward

      "Are the employees called Motherduckers?"

      Ho ho ho !!! :)

      Literally they should be called 'Ducklings' :)

  4. Korev Silver badge
    Boffin

    Tigani tells The Register: "Everyone is talking about Big Data. Databricks and Snowflake have been trying to outdo each other in benchmark wars over a 100TB dataset. In reality, nobody uses that amount of data."

    Odd, I'm just about to generate exactly that
