The interesting thing is that you are seeing a potential change in the clustering model behind Hadoop.
In today's data center you have to carve out your cluster, and it's static. You really can't resize the cluster dynamically as needed.
So you need to consider moving to a model that separates compute from storage (e.g. AWS EC2 clusters that use S3 for permanent storage and run their M/R jobs directly against the data in S3).
Here, you can spin up / spin down the resources as needed, although the data lake/sewer will continue to grow over time unless you decide to migrate older data to colder (iceberg) storage.
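To make the spin up / spin down idea concrete, here is a rough sketch of the sizing decision when compute is decoupled from storage. Everything in it is hypothetical: the function names, the tasks-per-node figure, and the node limits are assumptions for illustration, not part of Hadoop, AWS, or Docker. It only decides how many compute nodes you would want; actually starting or stopping them is left to whatever tooling you use.

```python
import math


def desired_compute_nodes(pending_tasks, tasks_per_node, min_nodes=0, max_nodes=20):
    """Return how many compute nodes to run for the current backlog.

    Hypothetical policy: one node handles `tasks_per_node` tasks, and the
    result is clamped to [min_nodes, max_nodes]. Storage is not counted
    here because the data stays in the shared store (e.g. S3).
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    needed = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, target_nodes):
    """Translate current vs. target node counts into an action and a count."""
    delta = target_nodes - current_nodes
    if delta > 0:
        return ("spin_up", delta)
    if delta < 0:
        return ("spin_down", -delta)
    return ("hold", 0)


if __name__ == "__main__":
    target = desired_compute_nodes(pending_tasks=100, tasks_per_node=8)
    print(scaling_action(current_nodes=5, target_nodes=target))
```

With min_nodes=0 the compute side can go all the way down to nothing when there is no work, which is exactly what you can't do when the data lives on the same nodes that run the jobs.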
Docker fits in when you need to spin up or spin down your compute clusters. However, you may also want to consider spinning up and spinning down nodes in your lake as needed. Or consider that if you have rows and rows of machines all running VMs on the same subnet behind a massive firewall within your DC, you can now spin up / spin down Spark clusters, M7 (HBase), Storm, Kafka, web, etc. for better utilization.
Makes it a bit more interesting.