@ Roo,
"I think it's reasonable to quote single precision (32bit) FLOPS, given the origin of the term - but quoting half-precision figures is taking the biscuit."
You beat me to it!
"If pressed I would speculate that some real-time signal processing app out there can make use of 21 Thalflops, I'd be interested to hear what kind of apps folks think those halfFLOPs will be good for. :)
Not a lot, to be honest. To make good use of this chip one would have to load it up with a good chunk of data and then perform a whole cartload of sums on it. Otherwise one would simply be wasting time doing nothing but tiresome transfers across the PCIe bus.
However, the more sums performed, the more significant the arithmetic errors become, as 16-bit half-floats struggle and fail to keep up with the value growth that happens in, for example, an FFT. No matter which way one spins it, 16 bits gives only 65536 distinct bit patterns, and the largest finite half-float is a mere 65504; it's a bit like Trolls counting in Terry Pratchett books - "one, two, many, lots".
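To see the "value growth" problem concretely, here's a quick sketch (just numpy, nothing exotic): a running sum in half precision stalls dead once the total outgrows the format's resolution, because the gap between adjacent representable values becomes bigger than the thing being added.

```python
import numpy as np

# Accumulate 5000 ones in half precision. Above 2048 the spacing
# between adjacent float16 values is 2, so adding 1.0 rounds back
# to the same total (ties-to-even) and the sum stops growing.
total = np.float16(0)
for _ in range(5000):
    total = np.float16(total + np.float16(1))

print(total)  # 2048.0, not 5000.0

# And the ceiling itself is close by: the largest finite half-float
# is 65504, beyond which everything overflows to infinity.
print(np.float16(65504) * np.float16(2))  # inf
```

An FFT over any decent-sized block hits exactly this: the bin magnitudes grow with the transform length, so without careful per-stage scaling the half-floats run out of road long before the maths is done.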
But the 32 bit floats, yeah baby!
I'm still not convinced though - a stonking great compute engine at the end of a PCIe connection is in the wrong place; you still have to transfer the data over to it to get sums done. If one's application doesn't need that many sums done on the data it'd simply be a waste of transistors.
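Back-of-envelope arithmetic bears this out. Assuming PCIe 3.0 x16 at roughly 16 GB/s (my assumption, not a quoted spec for this card) against the claimed ~21 half-precision TFLOPS, the chip needs an enormous number of sums per byte shipped over just to stay busy:

```python
# Rough arithmetic-intensity estimate. The 16 GB/s PCIe figure is an
# assumed ballpark for a 3.0 x16 link; 21 TFLOPS is the quoted FP16 rate.
pcie_bytes_per_s = 16e9
fp16_flops = 21e12
bytes_per_half = 2

flops_per_byte = fp16_flops / pcie_bytes_per_s
print(flops_per_byte)                    # 1312.5 FLOPs per byte transferred
print(flops_per_byte * bytes_per_half)   # 2625.0 FLOPs per half-float
```

In other words, unless the workload does thousands of operations on each value before sending it back, the PCIe link, not the compute engine, sets the pace.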
Certainly for some applications I can see the upcoming (or here already?) Xeon Phi (the one that is a CPU in its own right) beating this chip, despite it maxing out at 6 TFLOPS-ish, simply because the data and the compute are already in the same place.
China
I expect the US government to be keen not to let China get hold of any of these. I read Intel aren't allowed to sell Xeon Phi to the Chinese, so letting NVidia sell this GPU to them would be a bit inconsistent. I don't know enough about the corporate structure of NVidia to know whether Uncle Sam has the same level of influence as they have over Intel.