Someone is going to get their ass kicked
No comment
Did the CrowdStrike patchpocalypse knock your Azure VMs into a BSOD boot loop? If so, Microsoft has some tips to get them back online. It's believed that a bad channel file for CrowdStrike's endpoint security solution Falcon caused its Sensor active detection agent to attack its host. That's caused Windows machines around the …
Or because your Windows updates were migrated over to your MSP by the powers that be, and you lost all control. Despite asking for filtering you get told "No, as then we'd have to do it for all the companies we support", and when you ask "Well, do you at least delay the updates a few weeks to make sure they're tested?" you're told "Yes", but that turns out to be a lie.
Luckily we never got hit.
Whenever you're given a really, really stupid order, ask for written confirmation. Saved my behind when an order to put a one-line patch live without testing brought down a key government system for three days.
(The patch was fine. The problem was some undocumented dicking around in unallocated memory that the patch unintentionally trampled. But at least it found it.)
Some minor flunky, maybe. But some failed senior manglement will be kicked upstairs, and one or two will get incredible golden parachutes for saving $5K and costing billions. This, after the El Reg story about $MS being called inadequate by, of all things, a US bureaucrat.
Or not.
No such thing as corporate responsibility.
"Restore from backup" implying no-one has any actual data in Microsoft PCs. Probably generally true. Keep your actual data on Linux file servers, db's, and the cloud which usually is some flavour of Unix + no you can't autoupdate me to death.
Pity the folk with Windows nodes in the cloud, or SQL Server.
Good luck booting into safe mode for VMs in Azure! A big shitty issue with VMs in Azure is the lack of a proper console: you're stuck with the serial console when things go titsup at bootup, which is pi$$ poor! On-prem with ESX, Hyper-V, AHV or whatever hypervisor you have, if you have a connection issue to a VM you just bring up the console.
I heard deleting the bios resolves the problem. If that fails remove all physical drives then submerge it in water, 30 minutes should do the trick. Only use fire as a last resort.
I jest ofc but I do love the old turn it off and on again. The multiple and up to 15 times just makes it even crazier. I feel sorry for those that are having to deal with this right now. Who would have thought an update pushed out on a Friday (or just before) could cause so much chaos? When will humanity ever learn not to do this?
> The multiple and up to 15 times just makes it even crazier.
But it does make sense. The theory is that eventually the network stack gets enough time before the next BSOD to have pulled down the latest files, which don't trigger the BSOD.
I can also imagine this very much depends on the machine's internet connectivity being very low latency and very high bandwidth.
"The multiple and up to 15 times just makes it even crazier."
It's how you get them off the phone, and, if the wind's blowing in the right direction, it might even fix the issue. Asking them to do it 15 times in a row is a sign of desperation, where the extra few minutes of a single reboot isn't going to buy you enough time. If you still need more time, tell them to start again, but 20 times this time :-)
The problem here is that people who own/lease Windows systems, including VMs, are automatically updating software that can bring those systems down. This automated updating saves tons of money in payroll expenses, but a simple "Let's try this release on a sandbox and see how things work before we roll it out to every system" would be prudent. If you are an airline, an airport, a healthcare system (NHS, really?), or any other operation that must run continuously, you shouldn't be trusting your vendors to be perfect. I understand that businesses want to save money, but until businesses that own software take some responsibility for maintaining it, this kind of problem will continue to happen.
And no, having AI check out the software is not a solution. :-)
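Purely as an illustration of the sandbox idea above: a minimal sketch of a canary ring, assuming invented host names, an invented health probe and an invented promote step; nothing here reflects how CrowdStrike or any vendor actually stages updates.

```python
# Hypothetical staged rollout: push the update to a small canary ring,
# watch it for a soak period, and only promote it to the whole fleet
# if every canary stays healthy. All names and thresholds are made up.

import time

CANARY_HOSTS = ["sandbox-01", "sandbox-02"]   # throwaway test machines

def host_healthy(host: str) -> bool:
    """Placeholder probe: in reality, check that the box still boots,
    answers on the network and isn't pegging a CPU core."""
    return True  # assume healthy for the sketch

def canary_rollout(deploy, promote, soak_s: int = 3600, poll_s: int = 60) -> None:
    for host in CANARY_HOSTS:
        deploy(host)
    deadline = time.time() + soak_s
    while time.time() < deadline:
        if not all(host_healthy(h) for h in CANARY_HOSTS):
            raise RuntimeError("canary ring unhealthy - rollout halted")
        time.sleep(poll_s)
    promote()  # only reached if the canaries survived the soak period

if __name__ == "__main__":
    canary_rollout(lambda h: print(f"deploying to {h}"),
                   lambda: print("promoting to the full fleet"),
                   soak_s=5, poll_s=1)   # short soak just for the demo
```

The sandbox argument is simply that an hour's soak on a few expendable machines is cheap compared with a global BSOD.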
True, but in places like the NHS you have people in charge of IT who won't listen to their engineers. They appear to think they know best. Like the directors who took bribes from HP back in 2008 to go with HP laptops. Or the managers who ignored my requests for an HDD crusher of our own; then there was a big breach involving sold HDDs. It's always someone else's fault when things go wrong, but their idea when things go right, so they can get their bullshit promotion.
Peter Principle comes to mind.
Yet on other topics people are screaming that systems are not updating. The trouble with AV, particularly a nebulous cloud-based pile of shite, is that you have no control. That is what a "modern" solution is sold on: always up to date, no need to interact with or manage stuff.
"Let's try this release on a sandbox and see how things work before we roll it out to every system" would be prudent."
That can only work in some circumstances, definitely not all. And how long would you test for: 1 day, 2 days, 1 week, 1 month? Not all bugs show up immediately.
It's simply not viable to test all platforms in all languages for all the software in your systems. In fact it's near to impossible. Most outfits simply don't have the material, the time or the need.
It's not up to "most outfits"; it's mega Corp pushing to everyone and not monitoring outcomes.
This looks to me like: push stuff and don't test the result.
Not humans but software that does that.
I suspect CrowdStrike didn't think of testing at all, and doesn't have a mechanism to test the results of an update. I know of two similar botched updates by CrowdStrike. They struggle to fix them, they themselves don't know when it's happening, and they don't have a kill switch for a botched rollout that's in progress.
As we now all know: their internal testing before rollout is piss poor too.
Trying it out first does not seem to be an option for CrowdStrike. They took out all our laptops a while back. Pushed an update that ate 100% CPU across the company, and somehow neither our IT nor Microsoft had the permissions to kill the thing so we could work.
2 days to fix. Then this.
I.e. CrowdStrike's issues are systemic. Move off ASAP if you can.
You don't need AI to check that something is wrong. A recent CrowdStrike bork had crowdstrike.exe at 100% CPU.
We deploy code all the time on our own systems and don't miss something obvious like that.
Neither should mega Corp on consumer hardware. It's basic QA. I suspect negligence here, rather than the "one messy update" excuse they are peddling.
Monitoring is not technically difficult to do with simple code, but while on the subject: AI can be better at pattern recognition and can find things humans might miss, e.g. a CPU spike 37 minutes after a deploy.
CrowdStrike have not got as far as if (cpu >= 100).
I suspect they don't have any mechanism to test rollouts. They're spending their money on cutting costs and scaling up, rather than on quality.
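To make the if (cpu >= 100) point concrete, here is a minimal sketch of that kind of post-deploy watchdog, assuming the psutil package; the process name comes from the comment above, and the threshold, sample counts and rollback() hook are invented for the example.

```python
# Watch a named process for a sustained CPU spike after an update lands
# and trip a (hypothetical) kill switch if it misbehaves. Needs psutil.

import time
import psutil

WATCHED = "crowdstrike.exe"   # process named in the comment above
CPU_LIMIT = 95.0              # percent of one core, sustained
STRIKES = 5                   # consecutive bad samples before acting

def rollback() -> None:
    # Hypothetical kill switch: stop distributing the new channel file
    # and revert agents to the previous known-good version.
    print("rollout halted, reverting")

def watch(duration_s: int = 1800, poll_s: int = 10) -> None:
    bad = 0
    deadline = time.time() + duration_s
    while time.time() < deadline:
        procs = [p for p in psutil.process_iter(["name"])
                 if p.info["name"] == WATCHED]
        try:
            # cpu_percent(interval=1.0) samples each process for a second
            busy = any(p.cpu_percent(interval=1.0) >= CPU_LIMIT for p in procs)
        except psutil.NoSuchProcess:
            busy = False   # process vanished mid-sample; skip this round
        bad = bad + 1 if busy else 0
        if bad >= STRIKES:
            rollback()
            return
        time.sleep(poll_s)

if __name__ == "__main__":
    watch(duration_s=60, poll_s=5)   # short run just for the demo
```

Nothing clever, no AI, just the sort of basic telemetry check the comment is complaining about the absence of.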
Looking at it more objectively there are several possibilities:
1. Very little is deployed on Linux
2. Simple luck that it was not affected.
I am always stunned at how few people use AV on Linux servers or desktops, and on Macs. They are not invulnerable, and the attack vectors are moving to the software, not the OS.
" ... That's caused Windows machines around the world to become even less useful"
Meeeow !!!
:)
P.S. I wonder if the person responsible for the $12.5bn (as at market opening) slipup is still working for Crowdstrike !!!???
P.P.S. Good time to buy Crowdstrike cheap as they will recover because there is no quick/easy/cheap alternative !!!
Also good time to get into Coffee, Pharma Companies (particularly Migraine Meds) and Keyboards (lots are going to be worn out !!!)
If this market play pays good, please let me know !!! :)
Yes. If you have incremental backups you'll be fine. But my point, which I maybe didn't make plain, was that just restoring a pre-1900hrs backup will, on its own, take you back to the state of work then. It will take further action to recover subsequent transactions, whether by restoring incremental backups or by manual re-entry. If, however, data was entered, say from a website, and there was no separate backup of that then it will have been lost. Is there a chance of recovering information from email acknowledgements or did the restore overwrite outbound emails? The system may even start handing out order numbers duplicating those issued between backup and outage.
Restoring a backup image is only the start of getting back.
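As a toy illustration of that point (all data invented): a full image restore takes you to the image's timestamp, each incremental taken after it has to be replayed in order, and anything entered after the last incremental has to come from somewhere else entirely.

```python
# Toy sketch: full backup + ordered incrementals only gets you to the
# time of the last incremental; later transactions need manual recovery.

from datetime import datetime

full_backup = {"taken_at": datetime(2024, 7, 18, 23, 0),
               "orders": {1001: "widgets"}}
incrementals = [
    {"taken_at": datetime(2024, 7, 19, 6, 0), "orders": {1003: "gizmos"}},
    {"taken_at": datetime(2024, 7, 19, 3, 0), "orders": {1002: "sprockets"}},
]
outage_at = datetime(2024, 7, 19, 9, 0)

orders = dict(full_backup["orders"])                  # restore the image
for inc in sorted(incrementals, key=lambda i: i["taken_at"]):
    orders.update(inc["orders"])                      # replay in order

print(orders)  # everything up to the 06:00 incremental, nothing newer
print(f"orders taken between 06:00 and {outage_at:%H:%M} are gone "
      "unless they survive somewhere else (emails, logs, re-keying)")
```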
This is why you have a separate disk for the O/S, and don't allow any application to put any data there.
We have successfully restored a number of Windows IIS and SQL servers today, by just rolling back the O/S disk to last night's snapshot. The data disks were not replaced, and so they are still current.
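For the Azure-hosted boxes, a rough sketch of how that OS-disk-only rollback can be scripted against the Azure CLI (driven from Python here); the resource group, VM, disk and snapshot names are placeholders, and the az flags are written from memory, so check them against Microsoft's current docs before leaning on this.

```python
# Sketch: build a new managed disk from last night's OS-disk snapshot
# and swap it into the VM, leaving the data disks untouched.
# All names are placeholders; verify the az syntax before use.

import subprocess

RG, VM = "prod-rg", "iis-sql-vm01"
SNAPSHOT = "iis-sql-vm01-osdisk-nightly"      # pre-incident snapshot
NEW_DISK = "iis-sql-vm01-osdisk-restored"

def az(*args: str) -> str:
    """Run an az CLI command and return its stdout."""
    result = subprocess.run(["az", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout

az("disk", "create", "-g", RG, "-n", NEW_DISK, "--source", SNAPSHOT)  # 1. disk from snapshot
az("vm", "deallocate", "-g", RG, "-n", VM)                            # 2. stop the VM
az("vm", "update", "-g", RG, "-n", VM, "--os-disk", NEW_DISK)         # 3. swap the OS disk
az("vm", "start", "-g", RG, "-n", VM)                                 # 4. boot on the restored disk
```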
Us too. Some of ours are in Azure as SaaS (managed instances of SQL Server and a bunch of small SQL DBs) and the rest are on-prem (application and database servers).
The Azure ones survived and a few of the on-prem ones died. Yet to check why the former survived, but the dead ones were revived by restoring the OS disk. Worked a treat.
Kudos to the consultant who did it that way.
"First, if you have a backup from before 1900 UTC yesterday, just restore that. If your backup habits are lax, then you're going to have to repair the OS disk offline."
Fortunately, I'm long since retired. And I quit using Windows a LONG time ago, after concluding that the system was far too buggy and poorly documented for serious use. In any case, the idea of automatically loaded updates has always seemed quite Utopian to me. I mean, what could possibly go wrong? Aside from supply chain attacks? And quality control problems in agencies you have no control over? And a huge exposure surface for sophisticated national agents to attack if (likely when, rather than if) international tensions boil over.
An accident waiting to happen if you ask me.
But, no matter. I do have a couple of questions about this particular ... ahem ... "situation".
1. If your system, virtual or real, is stuck in a boot loop, how the heck do you load this here backup?
2. Are you going to lose all the transactions entered after the last backup? Isn't that going to be a substantial problem for many businesses/organizations? After all, a lot of outfits purportedly use their computers to sell stuff, and/or buy stuff, and/or to keep track of things like attendance, work hours, medication lists, nuclear warhead inventories. Mundane stuff like that. Of course, if the computers are only there so the bosses can play Solitaire and send emails between important phone calls/meetings, maybe it doesn't matter all that much.
> 1. If your system, virtual or real, is stuck in a boot loop, how the heck do you load this here backup?
You say you quit M$ and use something else, yet I am astonished to read this. Ever heard of booting off a floppy disk/CD/USB/network and just restoring?
> 2. Are you going to lose all the transactions entered after the last backup?
Are your data partitions for your databases = system partition? You don't even need a separate (v)disk, just a separate partition is enough. Just a few years ago you had your drive A: for booting the OS, and drive B: for your data (What is a C: ? Oh, that newfangled stuff, I read about it recently, won't be here for decades...)
Don't you remember? They weakened the separation for Windows NT 4.0, for performance reasons. They should have tweaked the disk caching mechanism though. It is still as bad as it was back then; there are just some mitigations active so the cache does not take more RAM than is free (excluding swap from that calculation).