Why do people act before they understand the problem, let alone the solution?
Is it the seniors not asking the right questions? Or are vested interests hiding what's going on?
As Martin Fowler noted, "if it hurts, do it more often". Get your firm to put the automation that you need in place.
Early in engagements, I commonly try to get reports on the ages of keys and certs. Some of those ages are enough on their own to create a desire to get this onto the risk register. The biggest threat is internal (or ex-internal) staff.
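For concreteness, a minimal sketch of the sort of report I have in mind, assuming the certs have already been collected as PEM files and that Python's 'cryptography' package is available; the directory and the age threshold are made up:

```python
# Minimal cert-age report: walk a directory of PEM certificates and flag old ones.
# The path and threshold below are illustrative, not from any real engagement.
from datetime import datetime, timezone
from pathlib import Path

from cryptography import x509

CERT_DIR = Path("/srv/collected-certs")  # hypothetical collection point
MAX_AGE_DAYS = 365                       # hypothetical policy threshold

for pem in sorted(CERT_DIR.glob("*.pem")):
    cert = x509.load_pem_x509_certificate(pem.read_bytes())
    age_days = (datetime.now(timezone.utc) - cert.not_valid_before_utc).days  # cryptography >= 42
    flag = "REVIEW" if age_days > MAX_AGE_DAYS else "ok"
    print(f"{pem.name}: {cert.subject.rfc4514_string()} age={age_days}d {flag}")
```

A report like this, sorted by age, is usually enough to start the risk-register conversation.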
How on earth is the NHS coordinating the solutions (or even the requirements)? Unless these are managed, there will just be a lot of point solutions, and the complaint that the 'systems don't talk to each other' will continue.
It's not super-hard. But it's not trivial either and could well involve quite far-reaching process changes (and hence pushback/organised change management).
I see no evidence that the NHS has understood what went wrong with NPfIT, nor that it has got beyond the expectation that if you buy a bunch of systems, they will automatically conform to the mental model you have of how they will work (and that everyone else shares that model).
I used to run professional services for an IT discovery outfit (now part of BMC) when cloud was starting.
IT spending was very opaque: much effort went into hiding where costs were incurred and recovering them in later years.
Around 50% of capex was for assets that were never used. Cloud could change that, but few seemed to want to measure where the money was actually going, and you have to put the measurement in place. If you're worried about vendor lock-in, invest in portability, although I'm not a fan of using multiple clouds or hybrid delivery, as it's much harder to keep the moving parts working efficiently.
The biz case for cloud is similar to the biz case for the heritage electricity market: capex is a function of peak demand and profitability a function of utilisation, so it pays to aggregate demand (qv Nick Carr's "The Big Switch"). It works out better than owning and managing the kit for most loads, and competition should ensure that this remains the case.
Of some concern may be that Civo seems to be encountering large barriers to entry, which will reduce market competition.
If the demand is well understood and stable, then it's perfectly possible that owned kit is lower cost. But when I looked at this in large orgs around the time cloud was emerging, it was clear that capex costs were not only very large (usually because user demand and how the system would behave were not well enough understood before the investments were made) but also usually hidden, as were the opex costs of competing technical solutions, of sustaining the necessary infrastructure skills, of DR (for disaster recovery techniques that had never been validated), and so on.
It's not very hard to migrate to public cloud providers with a suitable abstraction, such as Terraform or Pulumi, to enable porting between them if/when needed.
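As a minimal sketch of what that abstraction can look like, here's a Pulumi program in Python that hides the provider behind one function; the object-storage example, resource names and config key are illustrative only:

```python
# Provider-neutral wrapper around one resource type (object storage).
# Everything here is a sketch: real portability layers cover much more surface.
import pulumi
import pulumi_aws as aws
import pulumi_gcp as gcp

def object_store(name: str, provider: str):
    """Create an object-storage bucket without baking the provider into callers."""
    if provider == "aws":
        return aws.s3.Bucket(name)
    if provider == "gcp":
        return gcp.storage.Bucket(name, location="EU")
    raise ValueError(f"unknown provider: {provider}")

# The provider becomes a config switch rather than something hard-coded.
target = pulumi.Config().get("provider") or "aws"
bucket = object_store("app-artifacts", target)
pulumi.export("bucket_name", bucket.id)
```

The point isn't the bucket; it's that porting becomes a config change plus a per-resource mapping, which you can build out incrementally.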
What does need to be understood is that on-prem technical architectures that depend on the availability capabilities of hardware need to be redesigned, and, fairly obviously, that moving from a large capex model to opex requires financial discipline. This latter point seems to come as a surprise. However, I've yet to see an org with a really good grip on the capex costs of its IT systems, which tend to suffer the same issues as poorly managed supply chains: demand isn't well understood at initiation, but spend commitments are made then, and, since the buyer gets more grief for not meeting demand than for overspending, too much is bought, which then leads to wasted effort selling off the excess supply in subsequent years.
Stroustrup ought to know about memory safety and strongly typed programming languages: he worked on the CAP computer, whose operating system was written in Algol 68 for exactly that reason, to show that a strongly typed high-level language (as opposed to BCPL, its untyped competitor and the precursor of C) could be used and would improve programmer performance.
Many cloud users are very naive, notably those who aspire to multi-cloud strategies. Cloud isn't very hard, but it is fast-moving and innovative, so if you want longevity for a solution, you need to work to ensure it. If you want to understand the costs, then measure them (qv on-prem approaches, where the budget is just set top-down and actual spend is hidden for years).
Waite is, substantially, correct. Few orgs measure their on-prem RTO/RPO very well or allocate costs appropriately. I've only encountered one client with a DR process that was thoroughly tested before a disaster, and that involved doubling capex by swapping prod and DR each month. Cloud outages, meanwhile, get huge publicity.
I'm not sure about 'lightweight versions of services': this sounds expensive and challenging to get sufficiently right. It's usually easier to focus on getting blue/green deployments working well and to use that approach to underpin failover to a different set of datacentres.
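A minimal sketch of the failover decision that blue/green makes straightforward, assuming each environment exposes a health endpoint; the URLs are hypothetical, and a real switch would update DNS or a load balancer rather than just print:

```python
# Decide which environment should take traffic based on health checks.
import urllib.request

ENVIRONMENTS = {
    "blue":  "https://blue.example.internal/healthz",   # hypothetical endpoints
    "green": "https://green.example.internal/healthz",
}
ACTIVE = "blue"  # environment currently taking traffic

def healthy(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

if healthy(ENVIRONMENTS[ACTIVE]):
    print(f"{ACTIVE} healthy: no change")
else:
    standby = "green" if ACTIVE == "blue" else "blue"
    if healthy(ENVIRONMENTS[standby]):
        print(f"failing over: point the alias at {standby}")
    else:
        print("both environments unhealthy: escalate")
```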
Lock-in is always a problem and needs someone to worry about it and measure it. For clouds, it's usually better to take advantage of new services and then backfill them with alternatives post-implementation.
That's exactly what I was thinking. Why is this so hard? Global IP traffic for consumers is now mostly IPv6 in many jurisdictions, and firms such as MS made a big deal about how they used it to avoid clashing RFC 1918 networks.
What's less clear is why there's no big push for IPv6 from the cloud vendors and the startups helping to seed best practice. There is a barrier in a lot of legacy software, but why keep layering on the pain? (Or is it just job preservation that's not been spotted?)
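To make the RFC 1918 problem concrete, here's a quick illustration with Python's ipaddress module; the prefixes are made up, with the IPv6 ones drawn from the 2001:db8::/32 documentation range:

```python
# Two organisations that independently carved up 10.0.0.0/8 collide when
# their networks meet (merger, VPN, cloud peering); distinct IPv6 prefixes don't.
import ipaddress

org_a = ipaddress.ip_network("10.1.0.0/16")
org_b = ipaddress.ip_network("10.1.128.0/20")
print(org_a.overlaps(org_b))    # True: renumber or NAT before connecting them

v6_a = ipaddress.ip_network("2001:db8:a::/48")
v6_b = ipaddress.ip_network("2001:db8:b::/48")
print(v6_a.overlaps(v6_b))      # False: distinct prefixes, no clash to manage
```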
I looked at cloud economics in some detail for organisations with annual IT spends of $1M to $7Bn. The best summary of how the value chain hangs together is in Nick Carr's "The Big Switch": on the supply side, capex is a function of maximum demand, while operating profit/efficiency is a function of average utilisation. So it pays to pool demand, much like traditional electricity generation based around large power plants.
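A toy version of that arithmetic, with made-up hourly demand figures for three tenants whose peaks fall at different times:

```python
# Made-up demand (arbitrary units) across six time slots for three tenants.
demand = {
    "retail":    [20, 20, 30, 90, 40, 20],   # peaks mid-morning
    "batch":     [80, 70, 20, 10, 10, 60],   # peaks overnight
    "analytics": [10, 30, 70, 30, 80, 20],   # peaks twice a day
}

# On-prem: each tenant buys kit for its own peak, so total capex tracks this sum.
sum_of_peaks = sum(max(hours) for hours in demand.values())       # 90 + 80 + 80 = 250

# Pooled (cloud): the provider only provisions for the peak of combined demand.
combined = [sum(hours[i] for hours in demand.values()) for i in range(6)]
pooled_peak = max(combined)                                       # 130 here

print(f"sum of individual peaks: {sum_of_peaks}")
print(f"peak of pooled demand:   {pooled_peak}")
print(f"pooled utilisation:      {sum(combined) / (pooled_peak * len(combined)):.0%}")
```

The provider needs roughly half the capacity the tenants would buy individually and runs it at much higher utilisation, which is where the margin (and the price advantage) comes from.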
However, 'cloud first' entails some very significant changes to how IT is built and run. The on-prem model relies on hardware to provide various availability/recovery capabilities, whereas in the cloud these are delivered by the application layer. You also have to manage the demand and capacity actually used, since you haven't bought orders of magnitude too much to start with (and I did measure instances where, because capacity had to be estimated before demand was understood, such overcapacity had been bought and the recovery of the costs was being hidden in the budgeting process). In practice, 'cloud first' implies Continuous Delivery.
Unfortunately, even for traditional IT, govt. procurement isn't always great, and it typically creates zero-sum 'multi-sourced' contracts, which obviously lead to excess costs.
It's typically not practical to lift and shift existing govt. IT systems, as they don't have any tests and assume a particular technical architecture. A good example here could be HMRC's CHIEF system, which was initially considered for retirement around 2010, but the first suppliers couldn't work out what it did (and they are very competent people). It's currently being superseded by CDS.
Where I looked at on-prem vs cloud security and reliability, the usual situation was that the on-prem characteristics weren't known. When you start digging, you quickly find that the reason that they are not known is to protect the guilty.