Industry reacts to DuckDB's radical rethink of Lakehouse architecture

It's been a year since Databricks bought Tabular for $1 billion, livening up the sleepy world of table formats. People have been playing with it. It's captured people's imaginations for sure... The data lake company, with its origins around Apache Spark, had created the Delta Lake table format to help users bring query …

  1. Pierre 1970

    Expert exchange

    How could any barely technical Data Engineer not love this article?!

    Great job, vultures! (again)

    1. Yet Another Anonymous coward Silver badge

      Re: Expert exchange

      Although I wasn't sure if I was having a stroke reading it

  2. Charlie Clark Silver badge

    Vested interest nonsense

"The actual storage of the metadata, to me, is an implementation detail, and whether you store it in the file system, or you store it in a catalog, or a relational data store, is not as important as the APIs you use to interact with it."

Important here is the REST spec, which "behind the scenes" can keep the metadata about where the files are.

    This is back-to-front: REST is merely a communication protocol. Where you store your data is, for data analysis, far more important than the protocol. And, even if it's metadata, I see no good reason not to keep it in the database.

    The Duck approach is generating a lot of enthusiasm from those looking to move away from US-based lock-in.
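The "keep the metadata in the database" idea can be made concrete with a toy sketch: a relational catalog that records which data files belong to which snapshot of a table, so "what do I scan?" becomes an ordinary join. SQLite stands in here for any RDBMS, and the schema, table names, and file paths are invented for illustration — this is not DuckDB's actual design.

```python
# Toy relational catalog: SQLite as a stand-in for any RDBMS.
# Schema and paths are illustrative assumptions, not a real spec.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tables (
        table_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE snapshots (
        snapshot_id INTEGER PRIMARY KEY,
        table_id    INTEGER REFERENCES tables(table_id),
        created_at  TEXT
    );
    CREATE TABLE data_files (
        snapshot_id INTEGER REFERENCES snapshots(snapshot_id),
        path        TEXT NOT NULL,
        row_count   INTEGER
    );
""")
conn.execute("INSERT INTO tables VALUES (1, 'events')")
conn.execute("INSERT INTO snapshots VALUES (1, 1, '2025-01-01')")
conn.executemany(
    "INSERT INTO data_files VALUES (?, ?, ?)",
    [(1, "s3://lake/events/part-0.parquet", 1000),
     (1, "s3://lake/events/part-1.parquet", 500)],
)

# "Which files make up the 'events' table?" is a plain SQL join:
files = [p for (p,) in conn.execute("""
    SELECT f.path FROM data_files f
    JOIN snapshots s ON s.snapshot_id = f.snapshot_id
    JOIN tables t    ON t.table_id   = s.table_id
    WHERE t.name = 'events'
""")]
print(files)
```

The point of the sketch is that once metadata lives in a transactional store, snapshot isolation and atomic commits come from the database rather than from file-naming conventions.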

    1. Anonymous Coward

      Re: Vested interest nonsense

      Iceberg is an open-source Apache project with wide industry support; DuckDB's solution is proprietary lock-in.

    2. Groo The Wanderer - A Canuck

      Re: Vested interest nonsense

      The problem is that there is no truly portable solution for "the database" that is supposed to store the metadata under DuckDB's proposal. I agree with the Apache committees: that is a non-starter. Metadata must be available in a text-only format that can be versioned along with the system's configuration files, so you have a proper recovery point. You can't easily do that with an active RDBMS. Nor do I know of any widely accepted RDBMS that can be guaranteed to handle the sheer volume of requests a large data lake system can be presumed to be dealing with.

  3. bitwise

    Postgres

    For those of us who weren't born in the cloud and only visit it out of convenience, just stick with Postgres.

  4. ecofeco Silver badge

    I wonder if it is related to this?

    I just stumbled across this over the weekend.

    https://www.geeksforgeeks.org/time-space-trade-off-in-algorithms/

    You have to scroll pretty far down to find the payoff. The first part of the article discusses past and current solutions before it gets to the new discovery. There is also a good YouTube video that explains this better.
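For readers who don't want to scroll, the classic time-space trade-off the linked article covers is easy to show in a few lines: spend memory on a cache (memoization) to avoid recomputing the same subproblems. The Fibonacci example below is the standard textbook illustration, not something from the article itself.

```python
# Time-space trade-off in miniature: caching spends memory to buy time.
from functools import lru_cache

def fib_slow(n):
    # Exponential time, constant extra space: recomputes subproblems.
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

@lru_cache(maxsize=None)
def fib_fast(n):
    # Linear time, linear space: each subproblem computed once, then cached.
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

print(fib_fast(90))  # returns instantly; fib_slow(90) would take ages
```

Both functions compute the same values; the only difference is that `fib_fast` keeps every previously computed result in memory.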
