Apache Iceberg promises to change the economics of cloud-based data analytics

Anonymous Coward

>[Delta Lake is a Linux Foundation project]. We contribute a lot to it, but its governance structure is in Linux Foundation

Oh, pull the other one Ali. Everyone and their mum knows the only reason projects are put under LF governance is because - compared to ASF - LF doesn't *have* a governance structure. It's to all intents and purposes a holding corporation for intellectual property, and projects aren't forced to act as fair or neutral brokers. So, yes, Delta Lake as you can grab it from LF is technically open source, but nobody outside of Databricks is able to contribute to the direction of the project or the governance of the project. None of the development happens in the open and none of the contributions or decisions are reviewed in the open.

More importantly for end users, the Databricks flavor of Delta Lake is always several major steps ahead of the Open Source version, with key features like Hilbert Curves for Z-ordering, Insertion-order Clustering and Tombstones/Soft Deletes exclusive to the closed source fork. It's basically shareware - you only get the full-fat experience in the Databricks Runtime. So if you're someone using multiple processing engines on the same data (i.e. everyone) you're going to have a painful time of it.

It's a shame because in many ways Delta has a nicer design than Iceberg (and is definitely way more functional and performant as things stand), but because of Databricks's allergy to proper open source governance, it'll end up an also-ran as everyone else coalesces behind Iceberg and Nessie.
