Apache Spark is a batch data processing engine, not a data lake. A data lake is a logical architecture centered on sharing or federating raw data on demand, rather than on curated marts/warehouses. A data lake is by no means a "schemaless architecture": it requires rigorous schema governance to exist.
Leaving Spark behind, Databricks enters new territory as it eyes 2021 IPO
Databricks, the commercial company founded around the popular Apache Spark data lake, is making a strike for new class workloads and enterprise data management jobs in its make-or-break IPO year. Hawking technology news from the company’s Data + AI Summit, CEO Ali Ghodsi spoke to The Register about the new technologies. Ghodsi …
COMMENTS
Tuesday 1st June 2021 19:58 GMT Anonymous Coward
Tell me about it. It's a growing issue in the data realm, where people want everything to look familiar and warehouse-like. If all you have is a hammer, everything looks like a nail. I fear that the whole concept of big data and data lakes is dead, and someone will need to invent a new way to achieve those goals, since this one has been infected and turned into the same old tired paradigm.