API
Imagine the GPU had a new API call:
has_squeezed_flops
with a True/False response.
Intel offered the closest glimpse yet at its flagship datacenter GPU, code-named Ponte Vecchio, at the Hot Chips conference this week, with its own internal benchmarks showing the chip outperforming AMD’s MI250X and competing head-to-head with Nvidia’s upcoming H100 GPU. Announced last year, Ponte Vecchio is Intel’s first …
I saw the joke alert, but I'll post it anyway, to confirm your point of view.
https://www.blopeur.com/2020/04/08/Intel-x86-patent-never-ending.html
Getting the same performance for FP32 and FP64 is not that exceptional anymore. It is a consequence of the vector design and the fixed number of vector lanes, if you want to avoid packed data formats, which are much more difficult for a compiler to handle. I assume that for the smaller, AI-oriented data formats they do use packed formats. But the AMD MI200 series has exactly the same property. In fact, it can run FP32 at twice the FP64 speed, but AMD didn't say much about this in its initial presentations because it is so hard to use and really needs manual coding for the packing. The same is true for the NEC SX-Aurora TSUBASA vector processors: again FP32 = FP64 throughput, unless you use a packed format that requires hand coding.
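The lane argument above can be sketched as a toy throughput model. This is purely illustrative: the lane count and the `flops_per_cycle` helper are made up for the example, and none of the numbers are any vendor's actual spec.

```python
# Toy model: why FP32 throughput equals FP64 on a 64-bit-lane vector unit,
# and why packing two FP32 values per lane doubles it.
LANES = 32  # hypothetical number of 64-bit vector lanes (not a real chip's count)

def flops_per_cycle(dtype_bits, packed=False):
    """Each lane produces one result per cycle, regardless of data width --
    unless two 32-bit values are packed into each 64-bit lane, which the
    hardware allows but typically requires manual packing in the code."""
    if packed and dtype_bits == 32:
        return LANES * 2
    return LANES

fp64 = flops_per_cycle(64)                       # one FP64 result per lane
fp32_plain = flops_per_cycle(32)                 # same rate as FP64
fp32_packed = flops_per_cycle(32, packed=True)   # 2x, at the cost of hand coding
print(fp64, fp32_plain, fp32_packed)  # -> 32 32 64
```

The point of the model: the compiler-friendly path wastes half of each 64-bit lane on FP32 data, so only the packed path (which compilers struggle to generate automatically) recovers the 2x.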
The claim that AMD has said MI300 will be 8 times faster than MI200 is also false. All they have said is that for certain low-precision data formats and operations this will be the case. The matrix units in MI200 are not very good at the low-precision operations used in some AI applications compared to the NVIDIA A100 tensor cores. But AMD never claimed an 8-fold improvement in, e.g., FP64 vector performance.