Still sceptical about 4-core speed-up in HPC apps.
I have just been testing some quad-core boxes on a memory-hungry 3-D volume data analysis program. The box with 16 cores in total maxed out at less than 8x speed-up on 16 threads. Even when moving from 1 to 4 threads, memory contention led to a lower speed-up than on a 2-socket dual-core AMD Opteron box (3.2x vs 4.2x speed-up on 4 threads). I suspect the external memory controller is to blame. Once the Nehalem stuff arrives, things should be different.
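
To put those figures in perspective, here is a rough sketch that converts speed-up into parallel efficiency (speed-up divided by thread count). The function name and labels are only illustrative, and since the 16-thread result was "less than 8x", 8.0 is used as an upper bound.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / threads


# Speed-up figures quoted above. The 16-thread value is an upper bound,
# because the measured speed-up was "less than 8x".
runs = {
    "quad-core box, 16 threads (upper bound)": (8.0, 16),
    "quad-core box, 4 threads": (3.2, 4),
    "2-socket dual-core Opteron, 4 threads": (4.2, 4),
}

for label, (speedup, threads) in runs.items():
    print(f"{label}: {parallel_efficiency(speedup, threads):.0%}")
```

So the quad-core box is at roughly 80% efficiency on 4 threads and at most 50% on 16, while the Opteron figure works out to slightly above 100% on 4 threads.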