
A multi-exaflop supercomputer the size of your mini-fridge? Sure, but read the fine print and you may discover those performance figures have been a bit … stretched. As more chipmakers bake support for 8-bit floating point (FP8) math into next-gen silicon, we can expect an era of increasingly wild AI performance claims that …
My personal fave standard is SATA 3 transferring files at "6 Gb/s." Has anyone seen anything above 1Gb/s transfers? How about 512Mb/s?
Yes, pretty much every SATA SSD I've used can hit around the 600-650 MiB/s mark.
650 MiB/s * 8 = 5200 Mibit/s.
5200 Mibit/s * 1,048,576 = 5,452,595,200 bit/s, i.e. ~5.45 Gbit/s.
Given that protocol overhead needs factoring into that, it seems pretty close to me.
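A quick sanity check of that conversion in Python (the 650 MiB/s figure is just the observed throughput quoted above, not an official SATA number):

    # Convert an observed SSD throughput from MiB/s to decimal Gbit/s.
    mib_per_s = 650                            # observed throughput quoted above
    bits_per_s = mib_per_s * 1024 * 1024 * 8   # MiB -> bytes -> bits
    gbit_per_s = bits_per_s / 1e9              # SATA's "6 Gb/s" is quoted in decimal gigabits
    print(f"{gbit_per_s:.2f} Gbit/s")          # ~5.45 Gbit/s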
I assume the answer is 'nobody's done it', but, as with all other benchmarks, it would be helpful if an FP64 installation were benchmarked with FP8.
The whole thing seems like marketing set the goal of 'exa' and the engineers figured out what benchmark would get them there.
Let's just be glad they didn't settle on FP4 (or 2 or 1).
Personally I'm going to go on with my life. This doesn't even rate popcorn.
I'd have thought FP8 (256 "numbers") could be done with look-up tables, with no need for processing logic of any real complexity. As the bit width goes up, I think either the silicon area goes up exponentially or the speed goes down, so maybe 8 bits is a sweet spot for lots of mini-ALUs on the chip.
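As a rough illustration of the look-up-table idea, here's a Python sketch assuming a hypothetical E4M3-style layout (1 sign, 4 exponent and 3 mantissa bits, bias 7, one NaN pattern per sign, no infinities); since no standard pins FP8 down, every detail of the format here is an assumption. The whole multiplier fits in a 256 x 256 = 64 KiB table:

    import bisect
    import math

    BIAS = 7  # assumed E4M3-style bias; not defined by any standard

    def fp8_decode(code: int) -> float:
        """Decode an 8-bit pattern (1 sign, 4 exponent, 3 mantissa bits)."""
        sign = -1.0 if code & 0x80 else 1.0
        exp = (code >> 3) & 0xF
        mant = code & 0x7
        if exp == 0xF and mant == 0x7:             # reserve one pattern per sign for NaN
            return math.nan
        if exp == 0:                               # zero and subnormals
            return sign * (mant / 8.0) * 2.0 ** (1 - BIAS)
        return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - BIAS)

    VALUES = [fp8_decode(c) for c in range(256)]
    FINITE = sorted((v, c) for c, v in enumerate(VALUES) if not math.isnan(v))
    FIN_VALS = [v for v, _ in FINITE]

    def fp8_encode(x: float) -> int:
        """Round to the nearest representable code (out-of-range values saturate)."""
        if math.isnan(x):
            return 0xFF
        i = bisect.bisect_left(FIN_VALS, x)
        cands = [j for j in (i - 1, i) if 0 <= j < len(FIN_VALS)]
        best = min(cands, key=lambda j: abs(FIN_VALS[j] - x))
        return FINITE[best][1]

    # The whole multiplier is a 256 x 256 table, one byte per entry (64 KiB).
    MUL = [[fp8_encode(VALUES[a] * VALUES[b]) for b in range(256)] for a in range(256)]

    def fp8_mul(a: int, b: int) -> int:
        return MUL[a][b]

    print(fp8_decode(fp8_mul(fp8_encode(1.5), fp8_encode(2.0))))  # 3.0

In real silicon you'd probably still use a small ALU rather than 64 KiB of table per multiplier, but it does show how little state an FP8 operation actually involves.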
Not many.
Neither was my CPC.
I once wrote my own floating-point routines in Z80 assembler, implementing non-optimal algorithms, and achieved double the speed of a lower-precision implementation used by a Pascal compiler.
I didn't need division :)
I find this kind of thing very amusing. No one knows exactly how their AI/ML rigs produce the answers they do. Everyone knows that moving down to FP8 adds a ton of noise to their calculations. No one can tell if that's a problem or not. But they do know it runs faster, and that's probably a bigger "success" metric than "correct answers" (cos no one is expecting AI/ML to output perfection).
The result is that the chip makers can sell more chips, and the researchers can point to their progress in speeding these things up or making them bigger...
As for whether they work properly or not? Well that's likely taken a hit...
Are they coercing to integers, or is it actually 'true' floating point with sign bit, mantissa and exponent?
How many bits of mantissa and exponent?
Does the implementation include subnormals, or do they all get coerced to zero?
-0, +/-Inf and NaN don't count as numbers, so there are only 252 unique values. Fewer if you exclude subnormals.
IEEE 754 doesn't define FP8 yet, so it's impossible to compare any 'FP8' implementations.
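For what it's worth, the answers depend entirely on which layout you pick. Here's a quick count in Python under a hypothetical 1-4-3 layout (bias 7, one NaN pattern per sign, no infinities; all of those choices are assumptions, since nothing is standardised), showing how the number of distinct values shifts when subnormals are flushed to zero:

    import math

    # Count distinct values of a hypothetical 1-4-3 FP8 layout (bias 7, one NaN
    # pattern per sign, no infinities) -- assumptions, since nothing is standardised.
    def decode(code, keep_subnormals=True):
        sign = -1.0 if code & 0x80 else 1.0
        exp, mant = (code >> 3) & 0xF, code & 0x7
        if exp == 0xF and mant == 0x7:
            return math.nan                         # the one NaN pattern per sign
        if exp == 0:                                # zero and subnormals
            if not keep_subnormals:
                return sign * 0.0                   # flush subnormals to zero
            return sign * (mant / 8.0) * 2.0 ** (1 - 7)
        return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)

    full = {decode(c) for c in range(256) if not math.isnan(decode(c))}
    ftz  = {decode(c, False) for c in range(256) if not math.isnan(decode(c))}
    print(len(full), len(ftz))                      # 253 distinct values, 239 with flush-to-zero

Swap in a different exponent/mantissa split, or IEEE-style infinities, and the counts change again; without a standard, 'FP8' can mean several different things.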
Graphics cards have of course been consuming and outputting 8-bit values since the very beginning.
Either way, there's a trap here: using such low precision with a wide dynamic range means that, when adding multiple numbers, small ones may not contribute to the total because they are individually too insignificant. If there are many such small numbers, you'll get a very wrong answer.
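That swamping effect is easy to demonstrate even in half precision, which has far more headroom than any FP8 format; this sketch just uses NumPy's float16 as a stand-in:

    import numpy as np

    # Add a thousand 1.0s to a float16 running total that starts at 2048.
    # At 2048 the spacing between adjacent float16 values is 2, so each 1.0
    # is only half a ULP and rounds away (ties-to-even): the total never moves.
    total = np.float16(2048.0)
    for _ in range(1000):
        total = np.float16(total + np.float16(1.0))
    print(total)                                    # 2048.0, not 3048.0

    # Summing the small contributions first keeps them from being absorbed.
    small = np.float16(0.0)
    for _ in range(1000):
        small = np.float16(small + np.float16(1.0))
    print(np.float16(small + np.float16(2048.0)))   # 3048.0

With only two or three mantissa bits the same absorption kicks in after a handful of additions, which is why accumulators in these designs are usually kept at higher precision.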