Much needed
Yeah, the good folks at Sandia Vanguard describe the Maverick-2 as a runtime-reconfigurable accelerator, which greatly helps it adapt its dataflow to workload specifics ... very neat! We badly need this capability in scale-up/scale-out networking too, to propagate the benefits of this flexibility to the system scale (with PCIe 6, CXL 3, and CPO).
The 600 GF/s FP64 perf on HPCG may sound low compared to the 45 TF/s dense (e.g. HPL) of a GB200, but checking the Top500 shows that the HPCG perf of Frontier (MI250X), Aurora (GPU Max), and Alps (GH200) is less than 1% of their perf on dense HPL (aka Rmax). In other words, it would take a 60+ TF/s FP64 GPU to match the 600 GF/s on HPCG that Maverick-2 (750W, dual-die) gets. Interestingly, TNP reports (linked under "pointed out") that this Maverick cranks out 40 TF/s on dense calcs, making its HPCG oomph 1.5% of its dense grunt, which is 1.5x to 3x better than current Top500 GPUs ... nice!
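For anyone who wants to check the napkin math, here's a quick sketch in Python (it uses only the figures quoted above; the ~1% HPCG/HPL ratio is my rounding of the Top500 entries):

    # HPCG vs dense FP64 ratios, figures as quoted in the comment above
    top500_hpcg_ratio = 0.01      # Frontier/Aurora/Alps: HPCG < 1% of Rmax (approx.)
    maverick_hpcg     = 600e9     # 600 GF/s FP64 on HPCG (Maverick-2, 750W dual-die)
    maverick_dense    = 40e12     # 40 TF/s dense FP64, per the TNP report

    # Dense perf a Top500-class GPU would need to hit 600 GF/s on HPCG
    gpu_dense_needed = maverick_hpcg / top500_hpcg_ratio
    print(f"~{gpu_dense_needed / 1e12:.0f} TF/s dense needed at a 1% ratio")  # ~60 TF/s

    # Maverick-2's own HPCG-to-dense ratio
    print(f"Maverick-2 ratio: {maverick_hpcg / maverick_dense:.1%}")          # 1.5%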
The 2023 Gordon Bell Prize for Climate Modelling rewarded the SCREAM team for their pioneering exascale run: 1.26 simulated years per day of cloud-resolving Earth atmosphere simulation at 3.25 km resolution. Getting the resolution down to 1 km should require (3.25 x 3.25 x 3.25)² more computation (approx. 1000x, i.e. Zettaflopping), and any tech that helps us get there efficiently is welcome imho (e.g. Maverick-2). Meanwhile, some folks claim they can compute the full Earth system at 1 km, at 91.8 simulated days per day (a quarter-year per day), on Alps and Jupiter, which should be interesting to see at SC25, if it works (software and hardware combined improving perf further than either alone?)!
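Same napkin, for the resolution scaling (the (r³)² cost model is the one used above, not a derivation of mine; the ~1000x rounding follows from it):

    # Cost of going from 3.25 km to 1 km resolution, per the model above
    r = 3.25 / 1.0                 # resolution ratio
    cost_factor = (r ** 3) ** 2    # (3.25^3)^2 ≈ 1180, i.e. roughly 1000x
    print(f"~{cost_factor:.0f}x more compute")  # exascale x 1000 -> zettascale

    # The Alps/Jupiter claim, expressed in simulated years per day
    print(f"{91.8 / 365.25:.2f} simulated years/day")  # ≈ 0.25, i.e. 1/4-year/day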