
A hardware bonanza
Good point about tokens per dollar, folding both capital and running costs into TCO! TNP has some data points on this: how optics can beat copper on the networking side, and how at FP64 AMD CPUs and CPU+GPU combinations currently provide the best bang for the buck ... we might need a similar analysis for FP16 too (or a refresher, given my limited human memory!).
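The tokens-per-dollar idea can be sketched as a simple back-of-the-envelope calculation. This is a minimal sketch with made-up numbers, not vendor data; the function name and every figure below are illustrative assumptions:

```python
# Hedged sketch: tokens per dollar as a crude TCO metric.
# All figures are illustrative assumptions, not real vendor data.

def tokens_per_dollar(capex_usd, lifetime_years, power_kw,
                      usd_per_kwh, tokens_per_sec, utilization):
    """Lifetime tokens served divided by (capital + energy) cost."""
    hours = lifetime_years * 365 * 24
    energy_cost = power_kw * hours * usd_per_kwh          # running cost
    total_tokens = tokens_per_sec * utilization * hours * 3600
    return total_tokens / (capex_usd + energy_cost)

# Hypothetical accelerator: $30k up front, 5-year life, 1 kW draw,
# $0.10/kWh, 10k tokens/s at 60% average utilization.
print(f"{tokens_per_dollar(30_000, 5, 1.0, 0.10, 10_000, 0.6):,.0f} tokens/$")
```

Even this toy version shows why the answer shifts with usage: at low utilization the capex term dominates, which is where a cheaper, lower-throughput part (or a plain CPU) can win.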
Madhu Rangarajan's interview was also interesting on the extent to which pure CPUs might be used for inference, as a function of model size and frequency of use.
Beyond the bigger players, though, I do love the diversity of hardware being developed for inference: from Cerebras' distributed thinking, through SambaNova's novel rhythms and Tenstorrent's dataflow, to NextSilicon's free-spirited Maverick-2 mill cores, and on to d-Matrix's Digital In-Memory Compute (DIMC) tech and EnCharge's analog route, among others. There's something there for everyone, it seems (and don't forget the interconnects -- they're reconfigurable on the Maverick)!