What about the memory?
All the really big HPC systems on the TOP500 have only a couple of cores per node. Go beyond that and memory bandwidth saturates. For HPC-type applications, most of the parallelism comes from large node counts. Unfortunately, using multiple nodes is much harder than multi-threading.
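To make the saturation point concrete, here is a rough back-of-the-envelope sketch. It assumes a purely bandwidth-bound kernel and a node whose memory bandwidth is shared evenly between cores. The function names and the 20 GB/s and 8 GB/s figures are made up for illustration, not measurements of any real machine.

```python
def bandwidth_per_core(cores, node_bw_gbs, per_core_demand_gbs):
    """Bandwidth each core gets when the node's memory bandwidth is shared evenly.

    A core never uses more than it demands, and never gets more than its
    equal share of the node total. (Simplified model, an assumption.)
    """
    return min(per_core_demand_gbs, node_bw_gbs / cores)


def speedup(cores, node_bw_gbs, per_core_demand_gbs):
    """Speedup over a single core for a purely bandwidth-bound kernel."""
    single = bandwidth_per_core(1, node_bw_gbs, per_core_demand_gbs)
    total = cores * bandwidth_per_core(cores, node_bw_gbs, per_core_demand_gbs)
    return total / single


if __name__ == "__main__":
    # Hypothetical node: 20 GB/s total, each core can consume 8 GB/s.
    for n in (1, 2, 4, 8, 16):
        print(n, "cores ->", speedup(n, 20.0, 8.0), "x")
```

Under these assumed numbers the speedup stops growing once the cores together ask for more than the node can deliver, which is the effect described above. Real codes with better cache reuse will scale further than this model suggests.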
In my opinion, very large core counts will only work for niche applications unless we start to see some innovation in memory system design. The memory manufacturers, though, seem interested only in making larger chips of the same old basic types rather than investing in significantly new technologies. Whatever happened to Rambus?