DevOps
Making the change straight into production and using your customers as testers
Google's again 'fessed up to cooking its own cloud. This time the mess was brief – just under two hours last Monday – and took down its Memcache service. The result was “Managed VMs experienced failures of all HTTP requests and App Engine API calls during this incident.” There's a little upside in the fact that Google now …
.... before CTOs realise that having all your business critical apps at the mercy of a 3rd party provider (over and above the ISP) is a Really Bad Idea and that having a local failover server might be useful. It strikes me that these people think a business contract is some magic wand that makes all IT issues disappear never to be seen again. I guess you just can't educate pork.
I wouldn't say that OpEx is preferred now. People are usually set up for CapEx because that is the way it has worked forever.
It isn't so much an OpEx vs CapEx thing. It is a wasting a lot of resource time thing. Cloud charges for something close to actual utilization. On prem charges you for peak utilization 365 days of the year.
Almost no IaaS cloud charges for close to utilization. They charge for provisioning. Exceptions typically include object storage.
Go provision 100 8-CPU VMs, let them sit at 99% idle, and see how much it saves vs running at 80% utilization.
Go provision 30TB of Amazon EBS storage and write 10GB to it. Do they charge for the 10GB? (My main storage arrays operate at about a 10:1 oversubscription model, and that approach has worked fine for me for a decade.)
If you have a real solid handle on utilization and capacity requirements and ongoing capacity testing then public cloud can be good. Otherwise you're most likely either going to be paying out the ass (previous company peaked at 500k/mo, roughly 10x what was needed), or you will be having a lot of problems.
Certainly it is possible to "get it right", but those cases seem very few and far between.
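To put rough numbers on the provisioning-vs-utilization point, here is a back-of-the-envelope sketch in Python. Every price in it is a made-up placeholder (0.40 per VM-hour, 0.10 per GB-month), not a real provider rate, and the function names are just for illustration. It assumes billing is purely on what you provision, which is the claim being made above, not something the sketch proves.

```python
# Back-of-the-envelope comparison: billing on provisioned capacity
# versus what is actually used. All prices are hypothetical placeholders.

HOURS_PER_MONTH = 730  # common approximation for one month


def vm_bill(vm_count, price_per_vm_hour, hours=HOURS_PER_MONTH):
    """Monthly bill when charged per provisioned VM, busy or idle."""
    return vm_count * price_per_vm_hour * hours


def effective_price(price_per_vm_hour, utilization):
    """Price paid per VM-hour of useful work at a given utilization (0..1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_vm_hour / utilization


def storage_bill(gigabytes, price_per_gb_month):
    """Monthly bill for a given number of gigabytes."""
    return gigabytes * price_per_gb_month


if __name__ == "__main__":
    price = 0.40  # hypothetical price per 8-CPU VM-hour

    # The bill does not depend on utilization at all.
    print("100 VMs, monthly bill:", vm_bill(100, price))

    # What changes is how much each hour of useful work costs.
    print("per useful VM-hour at 1% busy:", effective_price(price, 0.01))
    print("per useful VM-hour at 80% busy:", effective_price(price, 0.80))

    gb_price = 0.10  # hypothetical price per GB-month
    provisioned = storage_bill(30_000, gb_price)  # 30TB provisioned
    written = storage_bill(10, gb_price)          # 10GB actually written
    print("storage billed on provisioning:", provisioned)
    print("storage if billed on data written:", written)
```

Under those assumptions the VM bill is identical whether the fleet is 99% idle or 80% busy; only the cost per unit of useful work moves. The storage gap is just the ratio of provisioned to written capacity.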
Here come all the usual Regtard comments from people who have never released a line of code into production in their life. Let's preview what are sure to be the highest-voted comments:
"I have never had a bug in my life. Anyone who has is a loser." - RegTardiusMaximus
"Trusting 3rds parties is ridiculous. BIGLY" - RegTrumpTard
"You are the product. YOU. YOU!" - RegTarden Little
"Here comes all the usual Regtard comments from people who have never released a line of code into production in their life"
I've released probably half a million lines of code into production in the last 10 years. Some mine, some other people's, so you might want to give your nursery school attempts at patronising a miss.
""Trusting 3rds parties is ridiculous. BIGLY" - RegTrumpTard"
We had a major outage with an offsite VM supplier recently that meant our clients couldn't log in to our front end. Luckily we DO have on-site failover systems. But yeah, what the fuck do I know?
You prize ass.
Former Google SRE here. What folks don't understand is just how big Google really is. Google is figuring out how to do SRE as we speak. And they have so many applications that you cannot simply decree best practices throughout the stack.
The legacy system problem is significantly worse in Google SRE than in most places because the original work was done by sysadmins. These guys were and are smart and dedicated, but they had the skillset and mindset of sysadmins, not professional programmers. As such, a lot of the legacy software is unmaintainable, and requires deep or complete rewrites.
And the whole thing is just so big that you just cannot know where all the wtf's are lurking, let alone which ones are likely to bite you next.
So there will be major fails. What will be interesting to watch is what their incident rate looks like compared to AWS at similar points in maturity. I don't think they claim to be caught up already.
I'm sorry. Did I say it was "bad"? The industry as a whole is trying to figure this stuff out, and Google is naturally at the forefront of a lot of it & therefore making lots of mistakes. Mistakes that hopefully later entrants will be able to skip entirely.
The only thing I was trying to point out is that one of their earliest mistakes was to use the wrong people for the job of SRE. Again, this happened before they even understood that SRE was something that needed to be created, so it's not "bad" in the sense of "worse than other places", but merely "worse than someone who's not been there might expect".