
Apache Spark is a batch data processing engine, not a data lake. A data lake is a logical architecture centered on sharing or federating raw data on demand, rather than on curated marts or warehouses. A data lake is also not a "schemaless architecture": it requires rigorous schema governance to exist.
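To make the schema-governance point concrete, here is a minimal sketch in plain Python of checking raw records against a registered schema before any engine (Spark or otherwise) reads them. Everything in it is hypothetical and for illustration only: the `Field` type, the `ORDERS_SCHEMA` dataset, and the `violations` helper are assumptions, not part of Spark or of any particular data lake product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One column in a registered schema (hypothetical structure)."""
    name: str
    type: type
    nullable: bool = False


# Hypothetical registered schema for a raw "orders" dataset in the lake.
ORDERS_SCHEMA = (
    Field("order_id", str),
    Field("amount", float),
    Field("coupon", str, nullable=True),
)


def violations(record: dict, schema: tuple) -> list:
    """Return human-readable schema violations for one raw record."""
    problems = []
    known = {f.name for f in schema}
    for f in schema:
        value = record.get(f.name)
        if value is None:
            # Absent or null values are only acceptable for nullable fields.
            if not f.nullable:
                problems.append(f"missing required field: {f.name}")
        elif not isinstance(value, f.type):
            problems.append(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )
    # Fields that were never registered are reported rather than silently kept.
    for extra in sorted(record.keys() - known):
        problems.append(f"unregistered field: {extra}")
    return problems
```

The data stays raw in this sketch; what is governed is the contract every consumer can rely on. A real deployment would more likely keep such contracts in a catalog or schema registry than in application code.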