Apache Spark is a batch data processing engine, not a data lake. A data lake is a logical architecture centered on sharing or federating raw data on demand, rather than on curated marts or warehouses. A data lake is also not inherently a "schemaless architecture"; it requires rigorous schema governance to exist.
Databricks, the commercial company founded around the popular Apache Spark data lake, is making a strike for new classes of workloads and enterprise data management jobs in its make-or-break IPO year. Hawking technology news from the company’s Data + AI Summit, CEO Ali Ghodsi spoke to The Register about the new technologies. Ghodsi …
Tuesday 1st June 2021 19:58 GMT Anonymous Coward
Tell me about it. It's a growing issue in the data realm, where people want everything to look familiar and warehouse-like. If all you have is a hammer, everything looks like a nail. I fear that the whole concept of big data and data lakes is dead, and someone will need to invent a new way to achieve those goals, since this one has been infected and turned into the same old tired paradigm.