It's all a conspiracy!
They've designed them simply to mine bitcoin as their ad profits are declining!
If you've been curious about the potential performance of Google's TPU2 – its second-generation custom neural-network math acceleration chip – well, here's an early Christmas present. Google engineering veteran Jeff Dean and fellow Googler Chris Ying unveiled a few more details [PDF] about the silicon at the Neural Information …
For bigger matrix multiplications, single-precision floats often lead to unacceptable accumulation of round-off error. I am surprised no double-precision figures are given. I would also be very interested in the power draw of these TPUs. In robots with limited battery capacity, you really need to think hard about the power drawn by the computers (and yes, I know these won't be available to us any time soon). There are some excellent deep-learning-based stereo methods, but they require a GPU, so we are looking at hand-crafted methods which hopefully give similar accuracy while running on a Raspberry Pi (not sure that will work, of course, but if it does it will seriously reduce the power problem).
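A quick NumPy sketch of the round-off point (illustrative, random data): the same matrix product computed in float32 and float64, with the double-precision result as reference. The error grows roughly with the square root of the inner dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Reference product, accumulated in double precision.
C64 = A @ B
# Same product with inputs and accumulation in single precision.
C32 = A.astype(np.float32) @ B.astype(np.float32)

# Worst-case error relative to the largest entry of the reference.
rel_err = np.max(np.abs(C32 - C64)) / np.abs(C64).max()
print(f"worst-case relative error in float32: {rel_err:.2e}")
```

For n = 1000 this already sits orders of magnitude above float64 round-off, which is the accumulation effect the comment describes.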
As usual, It All Depends. The industry has become accustomed to using DP, so of course running DP-designed algorithms in SP is problematic. BUT if you know what you are doing, you can get whatever level of precision you need, so long as you know what the hardware will do. It can be slow, of course. But suppose you have an algorithm that requires DP-level precision for some concentrated 1% of its work. Even if that 1% costs 10x as much to achieve on SP hardware, you are still way ahead as long as the remaining 99% costs only 60% as much to run in SP as in DP.
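The "if you know what you are doing" part can be as simple as compensated (Kahan) summation: every operation stays in float32, but a second float32 tracks the rounding error lost at each step and folds it back in. A minimal sketch:

```python
import numpy as np

def kahan_sum(xs):
    """Compensated summation entirely in float32:
    c carries the low-order bits lost by each addition."""
    s = np.float32(0.0)
    c = np.float32(0.0)
    for x in xs:
        y = np.float32(x) - c
        t = s + y
        c = (t - s) - y  # what was rounded away from s + y
        s = t
    return s

# 100,000 copies of 0.1 -- true sum 10000.0
xs = np.full(100_000, 0.1, dtype=np.float32)

naive = np.float32(0.0)
for x in xs:
    naive = naive + x

print(naive, kahan_sum(xs))
```

The naive float32 loop drifts noticeably from 10000; the compensated version lands within the float32 representation error of 0.1 itself — DP-level accuracy from SP operations, at the cost of a few extra flops per element.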
Personally, I would be shocked if these were fully IEEE-754 compliant at all. Certain parts of that standard are EXTREMELY expensive to support in hardware. I advocated for IBM, Intel, and AMD to get together and repudiate IEEE-754 in the late nineties. Instead, we got the rise of graphics chips, which don't support IEEE-754. (I'm mostly talking about denormals here.)
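For context, denormals (subnormals) are what make gradual underflow work: they guarantee that if x != y then x - y != 0. Hardware that flushes subnormals to zero, as many GPUs historically did, breaks that guarantee. A small NumPy illustration, run on ordinary IEEE-754-compliant CPU hardware:

```python
import numpy as np

tiny = np.float32(np.finfo(np.float32).tiny)  # smallest *normal* float32, ~1.18e-38
sub = tiny / np.float32(2)                    # a subnormal (denormal) value

print(sub)  # nonzero only because of gradual underflow

x = np.float32(1.5) * tiny
y = np.float32(1.0) * tiny
diff = x - y          # the result 0.5*tiny is subnormal
print(diff)           # flush-to-zero hardware would return 0 here, despite x != y
```

Supporting this in silicon means handling an extra unnormalised encoding in every FP path, which is a big part of why it is expensive.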
I've not seen any NN work that requires FP*, let alone 32-bit precision.
* It's easier to program in FP, but 15 bits + sign is more than sufficient for most. Indeed, I'd hazard a bet that if your weights for inference need anywhere near that accuracy, you're holding it wrong.
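A sketch of that claim, assuming simple symmetric 16-bit quantization (15-bit magnitude + sign) of a stand-in random weight matrix — the names here are illustrative, not any particular framework's API:

```python
import numpy as np

def quantize_int16(w):
    """Symmetric quantization to int16: 15 bits of magnitude + sign."""
    scale = np.abs(w).max() / 32767.0
    q = np.round(w / scale).astype(np.int16)
    return q, scale

rng = np.random.default_rng(1)
w = rng.standard_normal((256, 256)).astype(np.float32)  # stand-in layer weights
x = rng.standard_normal(256).astype(np.float32)         # stand-in activations

q, scale = quantize_int16(w)
y_ref = w @ x                                # float32 weights
y_q = (q.astype(np.float32) * scale) @ x     # dequantized 16-bit weights

rel_err = np.max(np.abs(y_q - y_ref)) / np.max(np.abs(y_ref))
print(f"relative error from 16-bit weights: {rel_err:.2e}")
```

The layer output barely moves, which is the usual argument that inference tolerates far less than 32-bit weight precision.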
Items in Your Bag
iMac Pro — $13,348.00, quantity 1 (part number Z0UR, ships in 6–8 weeks)
Hardware: 2.3GHz 18-core Intel Xeon W processor, Turbo Boost up to 4.3GHz, 128GB 2666MHz DDR4 ECC memory, 4TB SSD, Radeon Pro Vega 64 GPU compute engine with 16GB of HBM2 memory
Magic Mouse 2 + Magic Trackpad 2 - Space Gray
Magic Keyboard with Numeric Keypad - US English - Space Gray
Software: Pages, Numbers, Keynote, Photos, iMovie, GarageBand, macOS
Gift message: "ideal xmas shiny" — or maybe I'd rather have a TPU2?
It is interesting that they have gone back up to 32-bit floats. Most chip makers are pushing in the opposite direction, toward 8- or 16-bit calculations. I think the reason is that, layer after layer, the nonlinear behaviour of neural networks compounds (as in compound interest). This results in chaos-theory-type behaviour (the weighted-sum operations in the network only partially cancel out the nonlinear effects). Then you get a butterfly effect, where very small changes in the input or in the weights of early layers can have a very large effect at the output. Basically, bifurcations define the decision boundaries between attractor states. That would also explain why deep neural networks are susceptible to adversarial attacks, where minor changes in the input result in gross misclassification.
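A toy sketch of that sensitivity, using random untrained tanh layers with the weight scale chosen to sit in the expanding ("chaotic") regime — purely illustrative, not a trained network:

```python
import numpy as np

rng = np.random.default_rng(2)
depth, width = 50, 64
# Weight std 3/sqrt(width) puts deep tanh stacks in the regime where
# small perturbations grow layer by layer instead of dying out.
Ws = [rng.standard_normal((width, width)) * (3.0 / np.sqrt(width))
      for _ in range(depth)]

def forward(x):
    for W in Ws:
        x = np.tanh(W @ x)
    return x

x = rng.standard_normal(width)
x_pert = x + 1e-6 * rng.standard_normal(width)  # tiny input change

d_in = np.linalg.norm(x_pert - x)
d_out = np.linalg.norm(forward(x_pert) - forward(x))
print(f"input distance {d_in:.2e} -> output distance {d_out:.2e}")
```

The perturbation is amplified by orders of magnitude on the way through, which is the butterfly effect the comment describes — and the same mechanism that makes low-precision arithmetic in early layers risky.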
You can mitigate that problem by putting multiple lean neural networks in a parallel ensemble, which has a chaos-cancelling effect.
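A toy model of the claimed cancellation, treating each ensemble member's chaotic sensitivity as independent noise around a shared target — an assumption for illustration, not a property of real trained nets:

```python
import numpy as np

def noisy_predict(x, seed):
    """Stand-in for one ensemble member: the true function plus
    independent 'chaotic' noise."""
    r = np.random.default_rng(seed)
    return np.sin(x) + 0.3 * r.standard_normal(x.shape)

x = np.linspace(0, np.pi, 200)
single = noisy_predict(x, 0)
ensemble = np.mean([noisy_predict(x, s) for s in range(25)], axis=0)

err_single = np.sqrt(np.mean((single - np.sin(x)) ** 2))
err_ens = np.sqrt(np.mean((ensemble - np.sin(x)) ** 2))
print(err_single, err_ens)
```

To the extent the members' errors really are independent, averaging 25 of them cuts the error by roughly a factor of 5 (sqrt(25)) — the standard variance-reduction argument for ensembles.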