I mean, it sounds just great for people whose workload is 99% machine learning using the particular ML algorithm that Google happen to be using (*), but just as GPUs look great but can't do anything that isn't embarrassingly parallel, so the TPU looks great but is even more specialised. (* Specifically, the one they were using back in the distant past when the chip was designed. I doubt the algorithm designers have rested on their laurels since then.)

It reminds me of the dedicated hardware (in non-server CPUs) for AVC and HEVC, which is many times faster than using either GPU or CPU for the same job, but I'm not aware of anyone managing to turn it to any other task. (At least those algorithms have been adopted as standards and consequently have something of a shelf-life to justify baking them into the chip.)

Worse for Google, if the TPU is only a small integer factor faster than a GPU, then it will face pretty stiff competition from FPGA-on-chip if and when that gets some traction from OS and application writers. An FPGA is dedicated hardware that you can change when you think of a new algorithm.

