Re: No mention of power usage?
..there's also an energy cost to transfer terabytes of data somewhere cooler to do the calculations and transfer the results back, not to mention the budgetary costs.
Possibly not that much, but it depends on the modelling. Climate modelling in general is a wicked problem, but a basic model is relatively easy. Break the world up into a bunch of grid squares, add maybe half a dozen parameters, add in the calculations and call it good. Then you 'just' have to decide how often to run the numbers, pass the data around and hit 'Go'. So maybe a few million calculations and parameter passes per simulated hour, and you'd like the results for, say, 100 years into the future, and you really want them back faster than real time. With bigger iron you can do more runs per work day, tweak more parameters and generate more results.
Inputs for that don't need to be terabytes, and neither do the outputs.
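Back-of-envelope, something like this toy sketch (every number here is a made-up assumption, purely to show the data volumes, not the science):

# Toy grid "climate" model: illustrative only, with invented grid size,
# parameters and physics. The point is the data volume, not the modelling.
import numpy as np

N_LAT, N_LON = 180, 360           # 1-degree grid squares (assumption)
N_PARAMS = 6                      # e.g. temp, pressure, humidity, wind u/v, precip
HOURS = 100 * 365 * 24            # ~100 simulated years at hourly steps

state = np.zeros((N_LAT, N_LON, N_PARAMS), dtype=np.float32)

def step(state):
    # Stand-in for the real physics: nudge each cell towards the mean of
    # its neighbours plus a small forcing term.
    neighbours = (np.roll(state, 1, 0) + np.roll(state, -1, 0) +
                  np.roll(state, 1, 1) + np.roll(state, -1, 1)) / 4.0
    return 0.9 * state + 0.1 * neighbours + 0.001

print("State size per step: %.1f MB" % (state.nbytes / 1e6))   # ~1.6 MB
print("Steps for 100 years: %d" % HOURS)                        # ~876,000
# Even dumping every hourly state would only be about 1.4 TB; in practice
# you keep daily or monthly averages, which is far smaller.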
Where it gets more complicated is doing actual weather forecasting, because that's far harder. There you're trying to accurately predict weather, not climate, because climate is just averaged weather. Plus (from memory, from when I lived in Reading and drank with ECMWF folks) you need to produce 3 model runs a day because people rely on those forecasts. That requires ingesting far more data, especially as the 'climate hype' has had the bonus of producing more weather observation systems and data sources. Model output isn't that large either.
Then it gets a little more complicated if you're doing reanalysis, i.e. comparing models to reality. But that still doesn't need that much data, given you may only be comparing hourly temperatures, precipitation, relative humidity etc. There you're constrained by both the availability of observations and the number of grid squares in your model versus observation points.
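A minimal sketch of what that comparison looks like (invented station data and model grid, just to show the principle: find the nearest grid square to each station and compute an error statistic):

import numpy as np

# Hypothetical hourly model output on a 1-degree grid, plus a couple of
# stations with hourly temperature readings; all values here are invented.
model_temp = np.random.normal(15.0, 5.0, size=(24, 180, 360))  # (hour, lat, lon)

stations = [   # (lat, lon, 24 hourly temperature readings)
    (51.4, -0.9, np.random.normal(14.0, 3.0, 24)),   # Reading-ish
    (48.8,  2.3, np.random.normal(16.0, 3.0, 24)),   # Paris-ish
]

def grid_index(lat, lon):
    # Map lat/lon to the nearest 1-degree grid square (assumed grid layout).
    return int(round(lat + 89.5)), int(round(lon + 179.5)) % 360

errors = []
for lat, lon, obs in stations:
    i, j = grid_index(lat, lon)
    errors.append(model_temp[:, i, j] - obs)

rmse = np.sqrt(np.mean(np.concatenate(errors) ** 2))
print("RMSE vs observations: %.2f degC" % rmse)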
I think the architecture questions get more interesting, but are waaay outside my area of expertise. I have, however, had some fun chats about this with people who do this stuff. So I've always been curious about CPU vs GPU, given most of the world is analogue and vector performance just seems better. That also touched on aspects like FP64 vs FP32, or even smaller. After all, if your data are often capable of being represented in 8 bits, do you really need 64-bit performance? Temperature, for example, can only be measured to maybe 2 decimal places, and often with high error margins even then.
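To illustrate the precision point (a toy sketch; the scale and offset are arbitrary assumptions), temperatures over any realistic range fit comfortably into 8 or 16 bits:

import numpy as np

# Hypothetical temperatures in degC, measured to ~0.01 degC at best.
temps = np.array([-40.12, -5.03, 0.0, 15.27, 38.9], dtype=np.float64)

# Store as 16-bit integers in hundredths of a degree: exact to 0.01 degC
# over roughly -327..+327 degC, at a quarter of the storage of FP64.
as_int16 = np.round(temps * 100).astype(np.int16)

# Or squeeze into 8 bits with a coarser 0.5 degC step over -64..+63.5 degC.
as_int8 = np.clip(np.round(temps * 2), -128, 127).astype(np.int8)

print(as_int16 / 100.0)   # [-40.12  -5.03   0.    15.27  38.9 ]
print(as_int8 / 2.0)      # [-40.   -5.     0.    15.5   39. ]  (0.5 degC steps)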
Which was all rather fascinating. So stuff like: could CUDA cores on a GPU act as 'cells' for models, and pass results on to other cores more quickly than a CPU that's better optimised to calculate in serial rather than in parallel? Generally the answer was 'yes', but it all gets very complicated very quickly, and alcohol was involved.
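Roughly the shape of that idea, as a toy sketch (not how any real model is written): every cell updates from its neighbours in one parallel step, which maps naturally onto thousands of GPU cores, whereas a purely serial CPU loop would visit the cells one at a time:

import numpy as np

# 'Cells exchange with neighbours' step, vectorised so every cell updates
# at once. On a GPU each cell's update could land on its own thread; the
# serial version below is the same maths done one cell at a time.
def parallel_step(grid):
    up    = np.roll(grid,  1, axis=0)
    down  = np.roll(grid, -1, axis=0)
    left  = np.roll(grid,  1, axis=1)
    right = np.roll(grid, -1, axis=1)
    return 0.8 * grid + 0.05 * (up + down + left + right)

def serial_step(grid):
    out = np.empty_like(grid)
    n, m = grid.shape
    for i in range(n):
        for j in range(m):
            out[i, j] = 0.8 * grid[i, j] + 0.05 * (
                grid[(i - 1) % n, j] + grid[(i + 1) % n, j] +
                grid[i, (j - 1) % m] + grid[i, (j + 1) % m])
    return out

grid = np.random.rand(180, 360).astype(np.float32)
assert np.allclose(parallel_step(grid), serial_step(grid), atol=1e-5)

The parallel form is the one-thread-per-cell picture; the complicated bits, as I understood it over those pints, are everything around it (moving data between cores and devices, memory bandwidth, and what precision you can get away with).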