Fuck the environment
that is all
Specialist cloud operators skilled at running hot and power-hungry GPUs and other AI infrastructure are emerging, and while some of these players, like CoreWeave, Lambda, or Voltage Park, have built their clusters using tens of thousands of Nvidia GPUs, others are turning to AMD instead. An example of the latter is bit barn …
Well, I'd sure love to see a broad-based face-off between MI300A and GH200 (or GB200) over the whole spectrum of useful computational tasks, from HPC, through graphs, and down to AI/ML. I'd expect that specific design tradeoffs give either an edge over the other in some situations, but a clear winner may be hard to identify (or not?).
Let's do some math here:
There are about 8,000 hours in a year (give or take). Assumption #1: a 3-year payback is needed, as the tech goes obsolete pretty quickly. At the implied rate of about $1 per GPU-hour, that's $24k to play with, based on customers with long-term commitments.
Your biz needs at least 50% margin(*) else nobody will invest, so you have $12k to play with now.
Your electric, plant, and other utilities are maybe 30% of that.
Your actual capital cost (of the RAM + CPU + GPU + network + + +) is about maybe 40% assuming (assumption #4) you are paying some kind of premium for the tech.
You have to pay somebody, perhaps yourself, for hardware and software maintenance and support, and that takes up the last 30%. This may seem high, but you have to have lots of spares and a separate labour pool to run it, so there is an interest cost, a capital cost, and a labour cost, all separate from the capital pool used to buy the gadgets. Source: me and many years of data centre costing experience.
The implication is that a GPU+related CPU is only, based on my assumptions, about 40% of $12k or $4800. Sounds too cheap (**).
I'd like to meet their bankers. I have some ocean-front property in Arizona that needs refinancing....
(*) margin is defined as percent of revenue that exceeds your direct costs, so therefore omits all corporate stuff like CEO, management, finance costs, taxes, 2-inch carpeting, company cars, sales droids, marketing flunkies, senior corporate account executives, vice presidents of government influence, and all that. In the end you might have 20% profit at a corporate level. That also sounds too low.
(**) This is at the margin, where marginal cost applies, provided you are buying thousands of units. (And yes, "margin" is a word that has too many overloaded meanings.)
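For anyone who wants to poke at the numbers above, here is a minimal Python sketch of the same back-of-envelope sum. Every figure in it is one of the rough assumptions from the comment, not measured data. The $1 per GPU-hour rate is only what the $24k figure implies, and the names (`budget_breakdown`, `COST_SHARES`) are made up for illustration.

```python
# Back-of-envelope GPU rental budget, using the comment's rough assumptions.
HOURS_PER_YEAR = 8000        # "give or take" (a calendar year has 8,760)
PAYBACK_YEARS = 3            # assumption #1: kit is obsolete after ~3 years
RATE_PER_GPU_HOUR = 1.00     # ASSUMPTION: implied by the $24k figure
MARGIN = 0.50                # share of revenue that must exceed direct costs

# How the remaining direct-cost budget is split (shares must sum to 1).
COST_SHARES = {
    "utilities": 0.30,       # electric, plant, other utilities
    "capital": 0.40,         # RAM + CPU + GPU + network
    "support": 0.30,         # maintenance, spares, labour
}


def budget_breakdown(rate=RATE_PER_GPU_HOUR, hours=HOURS_PER_YEAR,
                     years=PAYBACK_YEARS, margin=MARGIN, shares=COST_SHARES):
    """Return (revenue, direct_cost_budget, per-category budget) per GPU."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("cost shares must sum to 1")
    revenue = rate * hours * years
    direct = revenue * (1.0 - margin)
    split = {name: direct * share for name, share in shares.items()}
    return revenue, direct, split


revenue, direct, split = budget_breakdown()
print(f"revenue per GPU over payback: ${revenue:,.0f}")
print(f"direct-cost budget:           ${direct:,.0f}")
for name, amount in split.items():
    print(f"  {name:<10} ${amount:,.0f}")
```

With the defaults, the capital line comes out at the same $4,800 as above. Change `rate` or `margin` to see how quickly the hardware budget moves, e.g. `budget_breakdown(rate=2.0)` doubles every line.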
Cloud-first providers like TensorWave, financing growth through massive debt, are reminiscent of the data center build-outs that preceded the dot-com bubble. Back then, customer equipment racks were held to ransom while bankruptcy proceedings played out for dozens of companies. Should the cycle repeat, new horror stories are sure to be created to replace those.