Revealed: Blueprints to Google's AI FPU aka the Tensor Processing Unit

In 2013, Google realized that its growing dependence on machine learning would force it to double the number of data centers it operates to handle projected workloads. Based on the scant details Google provides about its data center operations – which include 15 major sites – the search-and-ad giant was looking at additional …

  1. Steve Channell
    IT Angle

    Tech-Porn is not news

    Unless they plan to license the designs (à la ARM) and allow universities and competitors to use these chips, it really is not news but tech porn. If you plan to add Tech-Porn as a category, why not cover Exxon's attempts at synthetic petrol, or some other proprietary tech that has real potential to change the world.

    1. Ian Michael Gumby

      @Steve Channell Re: Tech-Porn is not news

      Sorry, I have to give you a WTF? icon.

      Google designed the chips to give them a competitive advantage. There is no incentive for them to license the chips.

      At the same time, these are custom-built, purpose-built chips. Even if they were to market and license them, is there enough of a potential customer base to justify the costs?

      The real significance of this story is the shift towards custom-built components and configurations when they can give you a competitive advantage. COTS will keep costs down, but when the business value and the reduction in operating costs exceed the premium, custom makes sense.

      This isn't tech-porn. It's showing a disruptive shift in thinking.

    2. Charlie Clark Silver badge

      Re: Tech-Porn is not news

      Get a fucking life you moaning bitch!

      The article does contain some interesting details, such as being able to use 8-bit arithmetic instead of 32-bit. It also confirms that if you want to do lots of machine learning, Nvidia is a good place to start, which is backed up by its sales figures after a couple of failed bets. This is good for the industry as a whole: it is a boost for custom chips and will put more pressure on Intel both to cut prices and to be more accommodating in its own chips.

      And I also wouldn't put it past Google to release the specs at some point. They've done so with similar things in the past.
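The 8-bit point above is easy to demonstrate: inference weights survive affine quantization to 8 bits with an error of at most half a quantization step. A minimal NumPy sketch (all names are mine, no TPU API involved):

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine-quantize a float array to unsigned ints of the given width."""
    qmax = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Map the integer codes back to approximate float values."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale, lo = quantize(w)
err = np.abs(dequantize(q, scale, lo) - w).max()
# round-to-nearest means the error is at most half a quantization step
assert err <= scale * 0.5 + 1e-6
```

Activations work the same way, which is why an inference-only chip can get away with narrow integer datapaths that a training chip cannot.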

      1. Steve Channell

        Re: Tech-Porn is not news

        Thanks for the tip; the time I spent looking at TensorFlow would probably have been better spent doing something else.

        The idea that a 256×256 8-bit matrix multiplication unit will beat a 4k single-precision GPGPU is hardly surprising, but it would be news if the advantage scaled up to 32-bit matrix multiplication for BLAS operations. If Google put their TPU into self-driving cars (like Nvidia are doing) that would be great news too, and we'd all salivate over the potential in a Mars rover, but they are not.

        Proprietary hardware that depends on sending your data to an ad-slinger stifles the industry, because AI needs to move to the edge in phones/cars to be truly useful; but you don't sell many adverts that way.
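For reference, the MMU described here is doing nothing more exotic than blocked integer matrix multiply with wide accumulators. A toy sketch of the idea (tile size shrunk from 256 to 4, function and variable names are made up):

```python
import numpy as np

TILE = 4  # the real TPU MAC array is 256x256; 4 keeps the demo readable

def tiled_matmul_int8(a, b):
    """Multiply int8 matrices tile by tile, accumulating in int32
    (as a fixed-size MAC array must, to avoid 8-bit overflow)."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2 and n % TILE == 0 and k % TILE == 0 and m % TILE == 0
    out = np.zeros((n, m), dtype=np.int32)
    for i in range(0, n, TILE):
        for j in range(0, m, TILE):
            for p in range(0, k, TILE):
                # one "pass" through the MAC array: a TILExTILE block product
                out[i:i+TILE, j:j+TILE] += (
                    a[i:i+TILE, p:p+TILE].astype(np.int32)
                    @ b[p:p+TILE, j:j+TILE].astype(np.int32)
                )
    return out

rng = np.random.default_rng(1)
a = rng.integers(-128, 128, size=(8, 8), dtype=np.int8)
b = rng.integers(-128, 128, size=(8, 8), dtype=np.int8)
assert np.array_equal(tiled_matmul_int8(a, b),
                      a.astype(np.int32) @ b.astype(np.int32))
```

The hardware win comes from doing each TILE×TILE block product in one pipelined pass rather than looping, but the arithmetic is the same.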

        1. Ian Michael Gumby

          @Steve WTF? Re: Tech-Porn is not news

          Again, seriously, WTF?

          Look, NVidia makes GPUs that are doing double duty thanks to CUDA. So for a commercial product, they are taking advantage of their advancements in one area and applying them to another.

          As for Google, they *are* involved in self-driving cars. Do you not pay attention to the news, where they outed Uber's recent hires for stealing their tech?

          AI in phones? Really? For what? To predict the next digit you're about to dial, or to automatically phone your wife to tell her you're taking an Uber home because you're too tanked to drive your self-driving car?

          Seriously... if you thought exploding phone batteries were bad, just wait until a GPU or FPU generates enough heat to give you third-degree burns and burn a hole in your pants.

          1. Steve Channell

            Re: @Steve WTF? Tech-Porn is not news

            Google are indeed into self-driving cars, and have been very successful because it is an extension of Google Maps and Street View, with all the heavy lifting done in the Google cloud. This is very good for Google because all the telemetry data is uploaded, enabling them to dispense with all those Street View cars going around photographing roads: you get a self-driving car, and they get an unpaid agent.

            Nothing wrong with that commercial strategy, but it does not work for public transport (which has to keep going even if the network is down), doesn't work for emergencies like Fukushima, doesn't work for the developing world, and doesn't work for a Mars rover; and it stifles investment in wider technology innovations. They could put more smarts at the edge, but then we wouldn't need to give them our data.

            All smartphones now have multiple processors: low-power processors for telephony/music, and faster processors plus a GPU where more power is needed. A TPU could work like a GPU that switches on for fingerprint, face/voice recognition or video scanning; but it stifles investment in innovation unless chip makers can add the technology to a corner of a SoC.

            1. Steve Channell

              Re: @Steve WTF? Tech-Porn is not news


              It is kinda pleasing when Google looks at your comments and goes, "yeah, I think he might have a point".

  2. Cuddles

    Not so fast?

    ""The comparison doesn't look quite so rosy next to the current-gen Tesla P40 GPU, which advertises 47 INT8 TOP/s at 250W TDP; compared to the P40, the TPU is about 1.9x faster and 6.5x more energy-efficient," Johnson wrote."

    Which is still a pretty decent margin, really. Given that Google has actually been using these things since 2015, while the P40 was only announced a few months ago, that's still a pretty favourable comparison. Compared to contemporary competition, the TPU was way ahead; compared to modern competition, it is still ahead, just not by quite so much.
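The quoted ratios can be sanity-checked in a couple of lines, assuming the TPU paper's headline figures of 92 INT8 TOP/s at a 75 W TDP (those two numbers are my assumption, not from the article):

```python
# Quoted P40 figures, and (assumed) TPU figures of 92 TOP/s at 75 W TDP
tpu_tops, tpu_watts = 92.0, 75.0
p40_tops, p40_watts = 47.0, 250.0

# Raw throughput ratio: ~1.96x, matching the quoted "about 1.9x faster"
speedup = tpu_tops / p40_tops

# TOP/s per watt ratio: ~6.5x, matching "6.5x more energy-efficient"
efficiency = (tpu_tops / tpu_watts) / (p40_tops / p40_watts)

assert abs(speedup - 1.96) < 0.01
assert abs(efficiency - 6.5) < 0.1
```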

    1. Anonymous Coward
      Anonymous Coward

      Re: Not so fast?

      Indeed, this is great news for those of us looking at some smaller-scale TPU/GPU hardware for home AI tinkering. Right now I can get an Nvidia Jetson board (2 ARM cores + 192 CUDA cores(?)) for about US$200. If Google can release the specs we might be able to buy these things in the near future, rather than use the cloudy version of this new h/w. Then some smart chaps in the UK can build a GPIO extender to talk to the TPU and make my Raspberry Pi do some thunkin'! [sic] :P The sky's the limit! [Basil, 22 rooms is the limit. -- Mrs. Fawlty]

      1. Ian Michael Gumby

        @AC ... Re: Not so fast?

        You gonna write the Linux Device Driver for that?

        Didn't think so.

  3. Anonymous Coward
    Anonymous Coward

    Was Myles Dyson on the development team?

  4. Stevie


    Yesyesyesyes, quantizing the whoozit, gottit.

    But the important question remains: whose engrams were used?

    1. Anonymous Coward Silver badge

      Re: Bah!

      and will it blend?

  5. Tom 7 Silver badge

    So about par with the (soon, I hope) new Parallella chip.

    Ish. Possibly. Who knows? It should be back from the fab soon.

    1. Frumious Bandersnatch

      Re: So about par with the (soon I hope) new Parallella chip.

      Hmmm. I didn't know that Adapteva were bringing out a new model. Last time I checked on their website (around a month and a half ago?) the whole effort looked pretty moribund. While reading this article I was tempted to start messing around with my 16-core board again.

  6. BobC


    Though I have great hopes for ASICs like the TPU, for the many FPGA-based ANN accelerators, and for upcoming 1K+ core designs like Adapteva's, the bottom line is support for the common ANN packages and OpenCL.

    In that regard, the GPU will reign supreme until one of the other hardware solutions achieves broad support. Only AMD and NVIDIA GPUs provide "serious" OpenCL support, and between the two, NVIDIA is preferred in the ANN world.

    An important previously-mentioned caveat is that most of the ASIC and FPGA accelerators aren't much help during training. For now, you're still going to need CPU/GPU-based ANN support.

  7. Ken Hagan Gold badge

    Can it do anything else?

    I mean, it sounds just great for people who have a workload that is 99% machine learning according to the particular algorithm for ML that Google happen to be using (*). But just as GPUs look great yet can't do anything that isn't embarrassingly parallel, the TPU looks great but is even more specialised. (* Specifically, the one they were using back in the distant past when the chip was designed. I doubt the algorithm designers have sat on their laurels since then.)

    It reminds me of the dedicated hardware (in non-server CPUs) for AVC and HEVC, which is many times faster than using either GPU or CPU for the same job, but I'm not aware of anyone managing to turn it to any other task. (At least those algorithms have been adopted as standards and consequently have something of a shelf-life to justify baking them into the chip.)

    Worse, for Google, if it is only a small integer factor faster than a GPU then it will face pretty stiff competition from FPGA-on-chip if and when that gets some traction from OS and application writers. An FPGA is dedicated hardware that you can change when you think of a new algorithm.

    1. Ian Michael Gumby

      @Ken Re: Can it do anything else?

      That's part of what makes this interesting.

      Google is investing in custom hardware that is designed for a narrow niche of applications.

      It's an indicator that it makes sense to build custom hardware rather than rely on COTS when the value exceeds the costs.

    2. Kevin McMurtrie Silver badge

      Re: Can it do anything else?

      I don't think Google cares if it does anything else. When it looks like you need $15 billion of new servers for a new project, running at 1.9x the speed saves $7.1 billion. Even a tiny 5% performance gain saves $714 million.
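Those numbers check out under the obvious model, where hardware spend scales inversely with per-unit speed. A quick sketch using the comment's hypothetical $15 billion budget:

```python
budget = 15e9  # hypothetical server spend for the new project, per the comment

def savings(speedup):
    """Spend needed shrinks to budget/speedup; the difference is saved."""
    return budget - budget / speedup

assert round(savings(1.9) / 1e9, 1) == 7.1   # ~$7.1bn at 1.9x
assert round(savings(1.05) / 1e6) == 714     # ~$714m at a 5% gain
```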

  8. Anonymous Coward
    Anonymous Coward

    Major thing there... price.

    A top-of-the-line Tesla is going to run you $4,000+ MSRP. If going in-house to develop their own ASIC saves them money at the order of magnitude they need (thousands of units), then by all means do it. Maybe Nvidia will be forced to lower the price of their Tesla/Quadro cards to actually earn a place in these large data centers again.

  9. Conundrum1885

    Re. price

    There was I thinking that £350 for a used GTX1080 card was cheap.

    Seems that folks using these for Bitcoin mining realized early on that custom hardware was better; however, the cards themselves have many other uses, such as medical imaging, machine learning, data analysis, password brute-forcing, etc.

    Serious gamers moved on a long time ago, but for certain applications (eg machine learning) an array of cheap used cards on budget boards (eg quad-core AM3+) will still do more work than one £xpen$ive card. That's when suitably retrofitted with better cooling and updated thermal management, such as heat pipes just for the RAM, not to mention the latest software custom-designed to squeeze every last IOPS out of the overclocked GPUs.

    I once did this to my not-2-day-old RS480 and got about an 11% performance boost with no overheating.

    Some other folks are leveraging used laptop boards with on-chip GPUs as they are quite compact but this approach is significantly more complicated due to limitations in clock speeds with a laptop.

    Even replacing the base clock does not help here as the limit is in the interconnects.
