Google boffins tease custom AI math-chip TPU2 stats: 45 TFLOPS, 16GB HBM, benchmarks

If you've been curious about the potential performance of Google's TPU2 – its second-generation custom neural-network math acceleration chip – well, here's an early Christmas present. Google engineering veteran Jeff Dean and fellow Googler Chris Ying unveiled a few more details [PDF] about the silicon at the Neural Information …

  1. Brenda McViking
    Black Helicopters

    It's all a conspiracy!

    They've designed them simply to mine bitcoin as their ad profits are declining!

  2. Michael H.F. Wilkinson

    Single precision?

    For bigger matrix multiplications, single-precision floats often lead to unacceptable accumulation of round-off errors. I am surprised no double-precision figures are given.

    I would also be very interested in the power drain of these TPUs. In robots with a limited battery capacity, you really need to think hard about power draw from the computers (and yes, I know these won't be available for us any time soon). There are some excellent deep-learning-based stereo methods, but they do require a GPU, so we are looking at hand-crafted methods which hopefully give similar accuracy whilst running on a Raspberry Pi (not sure that will work, of course, but if successful it will seriously reduce power issues).

    1. Charlie Clark Silver badge

      Re: Single precision?

      At least for the first-generation TPUs, lower precision was deliberately chosen to reduce the power draw, as higher precision wasn't required for the task at hand: these things are meant to be useless as FPUs and poor as CPUs.

    2. Claptrap314 Silver badge

      Re: Single precision?

      As usual, It All Depends. The industry has become accustomed to using DP, so of course using SP for DP-designed algorithms is problematic. BUT, if you know what you are doing, you can get whatever level of precision you need, so long as you know what the hardware will do. It can be slow, of course. But suppose you have an algorithm that requires DP-level precision for some concentrated 1% of its work. Even if it costs 10x as much to achieve on SP-hardware, you still are way ahead as long as the remaining 99% only costs 60% as much to run on SP as on DP.

      Personally, I would be shocked if these were fully IEEE-754 compatible at all. Certain parts of that standard are EXTREMELY expensive to support. I advocated for IBM, Intel, and AMD to get together and repudiate IEEE-754 in the late nineties. Instead, we got the rise of graphics chips, which don't support IEEE-754. (I'm mostly talking about denormals here.)
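
      As a quick back-of-the-envelope check of the 99%/1% argument above, here is a minimal sketch (the 10x and 60% figures are simply the hypothetical numbers from the comment, not measurements):

        # Total work normalised to 1.0 on DP hardware.
        dp_cost = 1.0

        # On SP hardware: 99% of the work runs at 60% of the DP cost, and the
        # remaining 1% needs DP-level precision emulated at 10x the DP cost.
        sp_cost = 0.99 * 0.6 + 0.01 * 10.0

        print(f"Relative cost on SP hardware: {sp_cost:.3f}")  # ~0.694, i.e. roughly 30% cheaper overall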

    3. Tom 7 Silver badge

      Re: Single precision?

      I've not seen any NN work that requires FP*, let alone 32-bit precision.

      * It's easier to program in FP, but 15 bits plus sign is more than sufficient for most. Indeed, I'd hazard a bet that if your weights for inference need anywhere near that accuracy, you're holding it wrong.
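
      As a rough illustration of the reduced-precision point, here is a minimal NumPy sketch comparing a toy dense layer evaluated with float32 weights against the same weights rounded to float16 (the layer shape and random weights are made up for the example; float16 stands in for a generic low-precision format):

        import numpy as np

        rng = np.random.default_rng(0)
        w = rng.normal(size=(256, 64)).astype(np.float32)   # hypothetical trained weights
        x = rng.normal(size=(1, 256)).astype(np.float32)    # one input vector

        y32 = x @ w                                          # single-precision inference
        y16 = x @ w.astype(np.float16).astype(np.float32)    # weights stored at 16 bits

        # The relative difference is tiny for typical weight distributions.
        print(np.max(np.abs(y32 - y16) / (np.abs(y32) + 1e-6)))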

  3. John Smith 19 Gold badge
    Unhappy

    I thought the idea of these custom FPUs was to lower precision

    A single color channel is roughly 0-255, and telephone sound is about 12 bits total (phone CODECs use non-linear sampling over an 8-bit range).

    So smaller range, more of them.
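
    For reference, the non-linear sampling mentioned above is mu-law companding. Here is a minimal sketch of the textbook continuous mu-law curve (G.711 uses mu = 255; the real codec uses a segmented approximation, so this is illustrative only):

      import numpy as np

      MU = 255.0

      def mu_law_encode(x):
          """Map samples in [-1, 1] to 8-bit codes via the mu-law curve."""
          y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
          return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

      def mu_law_decode(code):
          """Invert the 8-bit codes back to samples in [-1, 1]."""
          y = code.astype(np.float32) / 255.0 * 2.0 - 1.0
          return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

      samples = np.linspace(-1.0, 1.0, 5)
      print(mu_law_decode(mu_law_encode(samples)))  # close to the originals, at 8 bits per sample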

  4. Anonymous Coward
    Big Brother

    Big Thought!

    997 of the units will be heading to the NSA

  5. Anonymous Coward

    Processor "chip"?

    Looks more like a tower. Possibly a towering inferno, depending on how much power it consumes.

  6. Kaltern

    Why does this suddenly remind me of Cyberdyne...?

  7. Korev Silver badge
    Coat

    On the market?

    Will Google ever sell these, or are they worried that they'd be a FLOP?

  8. Anonymous Coward

    this also has 16GB HBM2

    Items in Your Bag

    iMac Pro: Item Price $13,348.00, Quantity 1, Part number Z0UR, Ships 6–8 weeks

    Hardware 2.3GHz 18-core Intel Xeon W processor, Turbo Boost up to 4.3GHz, 128GB 2666MHz DDR4 ECC memory, 4TB SSD, Radeon Pro Vega 64 GPU compute engine with 16GB of HBM2 memory

    Magic Mouse 2 + Magic Trackpad 2 - Space Gray

    Magic Keyboard with Numeric Keypad - US English - Space Gray

    Accessory Kit

    Software: Pages, Numbers, Keynote, Photos, iMovie, GarageBand, macOS

    Gift Options: Add Gift Message to Packing Slip - Free (iMac Pro): "ideal xmas shiny"

    Or maybe I'd rather have a TPU2?

  9. John Smith 19 Gold badge
    Coat

    What I'm thinking...

    Pinouts on memory cards are standardized.

    Find a way to leverage that.

    Stop throwing transistors at the problem.

    Throw memory at it instead, with individual processor bandwidth matched to individual memory card bandwidth.

    Just my $0.02

  10. Tom 7 Silver badge

    Can I buy one or three?

    Wanna play.

  11. SeanC4S

    It is interesting that they have gone back up to using 32-bit floats. Most chip makers are pushing in the opposite direction, with 8- or 16-bit calculations. I think the reason is that, layer after layer, the nonlinear behavior of neural networks compounds (as in compound interest). This results in chaos-theory-type behavior (the weighted-sum operations in the networks only partially cancel out the nonlinear aspects). Then you have a butterfly effect, where very small changes in the input or in the weights of early layers can have a very large effect at the output. Basically, bifurcations define the decision boundaries between attractor states. That would also explain why deep neural networks are susceptible to adversarial attacks, where minor changes in the input result in gross misclassification.

    You can correct that problem by putting multiple lean neural networks in a parallel ensemble, which results in a chaos-canceling effect.
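
    For what it's worth, here is a minimal NumPy sketch of that parallel-ensemble idea, using toy randomly weighted tanh "networks" (all shapes, scales and weights here are invented for illustration):

      import numpy as np

      rng = np.random.default_rng(1)

      def make_net():
          # Two tanh layers with independent random weights.
          w1 = rng.normal(size=(32, 32)) / np.sqrt(32)
          w2 = rng.normal(size=(32, 1)) / np.sqrt(32)
          return lambda x: np.tanh(np.tanh(x @ w1) @ w2)

      nets = [make_net() for _ in range(16)]
      ensemble = lambda z: np.mean([net(z) for net in nets], axis=0)

      x = rng.normal(size=(1, 32))
      dx = 1e-3 * rng.normal(size=(1, 32))  # small input perturbation

      single_shift = np.abs(nets[0](x + dx) - nets[0](x)).item()
      ensemble_shift = np.abs(ensemble(x + dx) - ensemble(x)).item()

      # The averaged ensemble is typically much less sensitive to the perturbation.
      print(single_shift, ensemble_shift)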
