Parallelism is hard.
That is all.
Nvidia had quite the showing at the ISC High Performance conference in Hamburg last week. Its GH200 claimed a spot among the 10 most powerful publicly known supercomputers, while the CPU-GPU frankenchips dominated the Green500 for the efficiency prize. But Nvidia's gains in HPC may be short-lived if its next-gen Blackwell …
Yep. AMD remains the top dog in straight-up FP64 HPC oomph with its EPYC+MI250X combo (Frontier, still #1), and will soon supersede itself with the MI300A integrated form factor of El Capitan (currently in Early Delivery at Top500 #46). Funny-AI and sub-FP64 precisions (e.g. FP32) are not gonna cut it in this (actually) serious area of computational activity. Mixed-precision (MxP) algorithms that target FP64-accurate end results will likely be useful where they prove consistent and numerically stable (i.e. convergent), but Grace-Hopper results seem conspicuously absent there at present ... and Aurora (Xeon Max CPUs + GPU Max) is the new HPL-MxP leader at 10.6 EF/s (a 10x speedup over its FP64 HPL figure), edging just past Frontier.
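For anyone wondering what "MxP with FP64 endpoint targets" means in practice: the trick behind HPL-MxP is iterative refinement — do the expensive solve in low precision, then polish the answer in FP64 until the residual hits FP64-level accuracy. Here's a minimal NumPy sketch of the idea; the matrix size, conditioning, and tolerance are illustrative stand-ins, not benchmark parameters, and a real implementation would reuse a low-precision LU factorization rather than re-solving each pass.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Diagonally dominant matrix so the FP32 solve and the refinement both behave.
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

# "Cheap" FP32 solve stands in for the low-precision factorization step.
A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

# Refine in FP64: each pass computes the true residual in FP64, then solves
# for a correction using only the low-precision operator.
for _ in range(10):
    r = b - A @ x                      # residual, computed in FP64
    if np.linalg.norm(r) / np.linalg.norm(b) < 1e-12:
        break                          # FP64-level accuracy reached
    d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    x += d

print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

The convergence caveat in the comment above is exactly where this can go wrong: if the matrix is ill-conditioned relative to the low precision, the correction step stops reducing the residual and you never reach the FP64 target.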
Will the GH200 HPL-MxP speedups of Alps and Venado match or best Selene's with its A100s (9.9x speedup in Nov '23)? Is H100 good at HPL-MxP? What about B200? Will a fully updated HPL-MxP list for June '24 ever be posted online? Inquiring minds ...