Not sure what would happen if memory were way faster than processing.
In a way, that was the case with CAFS: the storage (in this case the disk and its controller, rather than RAM and its controller) was faster than the CPU, which is why the task could be offloaded to it.
In principle, you can address the "faster than" and "slower than" mismatch simply by having more processors. If the memory is sufficiently fast, you can attach multiple processors to the same bus. If it's sufficiently slow, you can couple chunks of it up to equally slow individual processors that then communicate between themselves to share the results of their parallel computations.
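The second arrangement described above, slow memory chunks each coupled to a slow processor that shares only its results, can be sketched with ordinary multiprocessing. This is a toy illustration (the function names and chunk sizes are my own, not from the comment): each worker process holds its own slice of the data in its own address space and communicates back only a partial result.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker computes locally on the chunk of memory it "owns".
    return sum(chunk)

def distributed_sum(data, n_workers=4):
    # Split the data into per-worker chunks: each process gets its own
    # copy, mimicking memory attached to an individual processor.
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(len(chunks)) as pool:
        # Only the small partial results cross between processes,
        # not the bulk data.
        partials = pool.map(partial_sum, chunks)
    return sum(partials)

if __name__ == "__main__":
    print(distributed_sum(list(range(100_000))))
```

The point is that the expensive part, moving bulk data, never happens: each processor works on what sits next to it and the interconnect carries only summaries.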
So why did we end up with fast processors and increasingly complex cache hierarchies to optimise their use of slow memory? It's partly because the kinds of tasks we habitually give to CPUs demand single-thread performance, and partly because wiring up a system of many processors, each with its own memory, is more expensive than dumping some cache on a CPU die.
Now that we have more parallelisable workloads, single-thread performance isn't necessarily such a benefit. Integrating processing capability into the memory solves the wiring problem.