'internet was not designed to support the applications that now rely on it'
10 days and only “a 20 per cent drop in volumes”.
Did he say how it ended up costing them $300m?
It's long been known that shipping giant Maersk suffered very badly from 2017's NotPetya malware outbreak. Now the company's chair has detailed just how many systems went down: basically all of them. Speaking on a panel at the World Economic Forum this week, Møller-Maersk chair Jim Hagemann Snabe detailed the awful toll of …
"...in the near future, as automation creates near-total reliance on digital systems, human effort won't be able to help such crises."
It's good that Maersk acknowledges this, since the media often downplay 'automation' risks and ignore how breaches, hacks and malware will factor into things. Overall, I suspect many corporations are still just thinking: "It's OK, I won't get that malware 'AIDS' shit, that only happens to others".
I know that FedEx has penalty clauses built into the contracts where they provide services. I have no way of knowing if they ever paid a 'fine', but I did hear that they had every available employee hand sorting. With revenue of $50B, a cost of 0.6% (roughly the $300m figure) would not be surprising.
Maybe if all these companies had listened to their security people and patched, they could have saved most of that money.
"@ WPP 3 global AD forests, 1000s servers, dozens backup environments, 10000s workstations all encrypted. Networks are still wide open. They will get hit again."
I mean, it's not as if WPP hasn't had this type of issue in the past, with the constant stream of new companies/offices being taken on or consolidated into existing offices. Of the networks that were hit, some used to have systems in place to stop this sort of thing, but they were probably left to rot, or unmanned and unmonitored, while IBM waited to employ someone in India a year or two after the onshore resource was "resource actioned". Or IBM have offered an expensive solution that WPP don't want to buy, and neither side has the expertise to come up with a workable one...
And there is some network flow data being sent to the QRadar systems in place now to identify issues, but whether they would identify issues fast enough to stop senior IBM managers from making bad decisions is a different story. Unless it was a temporary solution and it's been removed pending agreement on costs.
Still, I'm sure WPP wouldn't want to upset such a valuable client in IBM by making public what the damage actually was.
I can't share all the details, as they are covered by many agreements, but as a person who was/is involved in this process with our managed service team (which was part of the recovery as well), I can say that there is no finger pointing.
There are a lot of constructive changes and a solid plan being implemented to lower the risk (you can't rule it out) of such an incident happening in the future.
It was really exceptional to see how the Maersk team handled it, and how all the parties involved (Maersk IT, managed service teams - ours included - and external consultants) managed to pull together and recover from it.
It is also really exceptional how openly Maersk is sharing what happened and how they handled it.
We will be covering our lessons learned and experience from this event soon (next week) on our blog -> https://predica.pl/blog/ if you want to check it out.
That all the staff who pulled this off were well rewarded.
Because frankly that's a phenomenal effort that deserves it.
Annoys me that companies don't shout about how well their IT departments recover in situations like this. If they'd had a fire etc. they'd be thanking the staff who helped PUBLICLY, but IT is seen as a shadow department - we can't possibly talk about those people...
Their main IT support is via IBM, so you can guess the chances of reward were between Buckley's and none.... (unless you were a manager.)
They had lots of heroes including the local techie who had the presence of mind to turn off one of the DC's once they realised what was happening - that saved their AD.
We'd heard bits and pieces of what had gone on during the recovery (usual stuff you'd expect - block switches, block ports until each host was confirmed a clean build in a segment then slowly remove the blocking/site isolation.) We didn't, however, see any emails publicly acknowledging their efforts.
"They had lots of heroes including the local techie who had the presence of mind to turn off one of the DC's once they realised what was happening - that saved their AD."
Hmm. Yes. I imagine the rebuild might have taken more than 10 days if it had included typing in a new AD from scratch.
The main challenge was having data from AD to start the recovery from.
The main question here is: how many organisations have a procedure for forest recovery, which is mostly a logistics task combined with a good understanding of AD as a service?
My consulting experience from the last 20 years tells me that 99% of organisations don't have one, and have written it off as something that will never happen to them.
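Even part of that logistics can be scripted well in advance. Below is a minimal readiness-check sketch (Python, only because it needs a wrapper of some sort) that just shells out to the standard Windows tools - netdom, wbadmin, repadmin - which would need to be available on the DC it runs on. It is not a forest recovery runbook, just the kind of periodic check most organisations never get round to automating.

import subprocess

def run(cmd):
    # Run a command and return its stdout; check=True raises if it fails.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def check_fsmo_roles():
    # 'netdom query fsmo' lists which DCs hold the five FSMO roles;
    # you want this written down *before* the forest is gone.
    print(run(["netdom", "query", "fsmo"]))

def check_system_state_backups():
    # 'wbadmin get versions' lists Windows Server Backup versions on this DC;
    # an empty list means there is nothing to seed a forest recovery from.
    print(run(["wbadmin", "get", "versions"]))

def check_replication():
    # 'repadmin /replsummary' summarises replication health across DCs.
    print(run(["repadmin", "/replsummary"]))

if __name__ == "__main__":
    check_fsmo_roles()
    check_system_state_backups()
    check_replication()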
I had to recover an AD site once; it had only one PDC, no BDCs, but luckily there was a recent backup of the PDC (Server 2k3 and NTBackup).
The process was to reinstall Server 2k3 on a clean server, run NTBackup to restore the AD backup, and we were back in business again. The only niggle was hoping that Windows Activation would go through, as I was not in the mood to faff around with that - but it went through just fine. I then set up a BDC just in case, but I still continue to make backups from the PDC juuuust in case.
And recovering the forest is no biggie, as there are about 60 users - but a backup and a BDC make things so much easier.
But yes, forest recovery, especially with multiple sites and domains, needs to be addressed. Setting it all up from scratch by hand leads to errors and mistakes if due care is not taken.
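If it helps anyone, here's a very rough sketch of taking the by-hand element out of adding that extra DC: it just drives the standard AD DS deployment PowerShell cmdlets from a small Python wrapper. The domain name and DSRM password are placeholders, and it's a sketch of the idea rather than a procedure I'd claim is production-ready - in real life you'd prompt for the password, not hard-code it.

# Sketch only: promote an additional DC for an existing domain using the
# built-in AD DS deployment cmdlets. Domain name and password are placeholders.
import subprocess

DOMAIN = "corp.example.com"   # hypothetical domain
DSRM_PASSWORD = "CHANGE-ME"   # prompt for this in practice, never hard-code it

PS_SCRIPT = f"""
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools
Install-ADDSDomainController `
    -DomainName "{DOMAIN}" `
    -InstallDns `
    -SafeModeAdministratorPassword (ConvertTo-SecureString "{DSRM_PASSWORD}" -AsPlainText -Force) `
    -NoRebootOnCompletion `
    -Force
"""

def promote_additional_dc():
    # Runs the promotion script in one PowerShell session; check=True raises on failure.
    subprocess.run(["powershell", "-NoProfile", "-Command", PS_SCRIPT], check=True)

if __name__ == "__main__":
    promote_additional_dc()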
This wasn't handled through standard IT support. IBM probably had its role there, but it was mostly recovered by the Maersk IT team, consultants from vendors (you can imagine which one), managed service teams like ours, and external consultants.
Lots of people were working there in shifts, spending several days on-site to make it happen.
I can't speak to public e-mails from Maersk, but the Maersk IT team is very open about what they did and how; I've seen members of the team speaking about it at a few sessions.
I said this earlier in the thread, but just for information - I can't provide official information on it, but our managed services team, which was part of this recovery effort and spent a couple of weeks on site, is cooking up a blog post with the details, to be published next week on -> https://predica.pl/blog/ if you want to check it out.
Lots of people working on it on multiple fronts. Lots of logistics - for example, you might be surprised how few USB storage pens there are in stock in the shops :).
One important aspect - no panic! Don't do things if you don't fully understand the scope and what hit you.
Besides the technical details, there are lessons from the management side:
- do not let the people doing the work be bothered by people throwing random questions or issues at them; put a "firewall" in place to handle that
- good communication about where you are with the effort and what the recovery time is, is crucial. A dedicated team for it might help you A LOT.
As I wrote in other replies here, we will be covering it soon on our blog, from the perspective of our managed services team, which was on-site helping to recover from it for a couple of weeks.
I'm sure (!?) that after his "Road to Damascus" moment Mr Snabe also directed that a further $300m be invested in DR facilities and redundant systems while also doubling the systems security budget.
Back on planet Earth.....
Probably not.... now that the IT department has un-stealthed itself as still having some staff left, the CEO is asking "how come these people are still on the payroll.... why haven't they been outsourced already?"
The company may have changed (I stopped working with the main IT supplier in 2007), but it was then a company that was aware of how important IT was to them. Doing any work on their servers at headquarters, even when employed by a subsidiary, meant having one of their guys standing next to you.
If they think it makes sense to invest further, they will do it.
- I heard a comment from an external consultant once: "we sat at this meeting, and as we talked, hardware costs went up from 50k to 200k in 20 minutes - and the customer (Maersk) didn't blink".
Quote: That's a good question and IT should always be prepared to justify their costs.
Horse shit, to put it bluntly.
IT is the core system that lets the company function: no IT, no company.
That's as true for the 20-man band I attend to as it is for mega-corps like Maersk.
If you are not treating IT as a core part of the business and instead consider it an incidental expense (like my manager did once*), then you deserve to have your company crash and burn.
This should be drilled into every C level exec with a big mallet.
*His eureka moment arrived thanks to a dead PC containing 10 years' worth of robot/CNC programs ...
IT the core system?
the one that sends emails that can't be replied to?
that says we're experiencing extremely high call volumes 24/7?
that turns employees into robots mindlessly reciting words on a screen?
the one that ensures your details can never be updated because the system won't let me?
I'm seeing a huge competitive advantage here, at a modest cost, resulting in a more effective and resilient organisation.
Quote: That's a good question and IT should always be prepared to justify their costs.
Quote Response: Horse shit to put it bluntly
Ker-wrong. They should still justify their costs, but as a core function it ought to be a whole lot easier. I'm pretty sure the story above said Maersk struggled but didn't fall over. Maersk is not IT in the same way that the IT companies with banking licences are. Maersk moves metal boxes, not ones and zeros.
You would be surprised how much IT Maersk actually is.
When I started in the group, I thought it was quite simple to move a box from A to B, but when you have a million or so boxes (then) and have to keep track of where they come from, where they are going, what is in them, who they belong to (and this is "legal title to ownership" sort of serious), whether they can be loaded under deck, and how to stack them on the ship to make as few moves as possible (without making the ship tip over due to wrong balance), you really need IT.
This is before implementing rules such as "do not send ALU containers to certain places as the locals find it too easy to get into them".
Spending the extra to get rid of the computers, thereby improving the quality of service, looks like a good move?
No more robotic emails or dumb chatbots telling you they're passionate about customer service; you'd get to talk to someone who could actually do something. Back to the future.
Good job guys.
You don't get company values such as 'uprightness' and 'humbleness' at many other places.
I'd left the Big M group years before this, but we were well into an outsourcing and centralisation programme, pushing lots of jobs to India. Despite it being as unpopular with the users as these things always are, we went full steam ahead on Helldesk and desktop support, but the plans went much further. We were pushed to an expensive US company for HP-UX, Oracle, and Navis, for example, which didn't seem particularly smart since Maersk had plenty of domain knowledge and was big enough to support centralised internal teams. Can't say they didn't know their stuff though.
There was also talk of centralising port infrastructure until the speed, cost, and reliability of Internet/IPLCs in many locations was brought up.
This is where infrastructure as code comes into play. If you can blow away the entire lot and stand a fresh set of machines back up, because you have an immutable/declarative way of launching everything, you could save a massive amount of time. OK, bare metal is a bit harder, and getting the base hypervisor layer up if you're on-premise will take a short while, but after that IaC would save you a fair old chunk of time.
Terraform ftw, or Heat if you're an OpenStack shop.
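To make the idea concrete, here's a minimal sketch of what "blow it away and stand it back up" can look like when the estate is declared in Terraform. It's wrapped in Python only because it needs a wrapper of some sort; it just shells out to the normal terraform CLI, and the working directory is a made-up example. Obviously you would not wire 'destroy' into anything unattended.

# Illustrative only: rebuild a Terraform-managed environment from its declarative
# definition. Assumes terraform is on PATH and the .tf files live in WORK_DIR.
import subprocess

WORK_DIR = "./infrastructure"   # hypothetical path to the .tf definitions

def tf(*args):
    # Run a terraform subcommand in WORK_DIR and fail loudly if it errors.
    subprocess.run(["terraform", *args], cwd=WORK_DIR, check=True)

def rebuild_environment():
    tf("init", "-input=false")          # fetch providers/modules
    tf("destroy", "-auto-approve")      # tear down what's left of the old estate
    tf("plan", "-out=rebuild.tfplan")   # plan the fresh build from the same code
    tf("apply", "rebuild.tfplan")       # stand it all back up

if __name__ == "__main__":
    rebuild_environment()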
You know, it sounds horribly bad when you first think of the work required. But after thinking about it for a few minutes you can see how it could be done relatively quickly for deploying standard builds. It'd be interesting to know how they did it.
Doing a job of this scale, personally I'd think the fastest way would be to create a new (clean) desktop image via WDS, rebuild the servers from backups, and then firewall everything but WDS and AD for joining PCs to the domain. Download the image to each server, and then send somebody around each site to ensure that every PC ends up reimaged and on the network with the correct network ID.
It's a big job, but far from an impossible one (as they demonstrated by doing it in ten days), although I suspect that they had a lot of tidying up to do, such as installing odd bits of random software on PCs that weren't in the standard build.
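For the "send somebody around each site" bit, something as dumb as the sketch below helps: walk the site's asset list and record which machines are answering on the network again, so the people doing the legwork know which desks still need a visit. The inventory file name and format are made up, and a ping obviously isn't proof of a clean reimage - in reality you'd check an image-version marker as well.

# Sketch: sweep a site's asset inventory and note which PCs are back on the
# network. Inventory format and file names are hypothetical.
import csv
import platform
import subprocess

INVENTORY = "site-inventory.csv"   # one hostname per row, column "hostname"
REPORT = "reimage-progress.csv"

def is_reachable(host: str) -> bool:
    # Single ping; '-n' on Windows, '-c' elsewhere. Reachable != reimaged,
    # but it tells the on-site techs which desks still need a visit.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def sweep():
    with open(INVENTORY, newline="") as src, open(REPORT, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["hostname", "on_network"])
        for row in csv.DictReader(src):
            host = row["hostname"]
            writer.writerow([host, "yes" if is_reachable(host) else "no"])

if __name__ == "__main__":
    sweep()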
But he also warned that in the near future, as automation creates near-total reliance on digital systems, human effort won't be able to help such crises.
That's why when J. Lyons and Son bought their first computer and saw how many people it could replace, they didn't go live with it until a spare was installed ready to take over.
That was back in the 50s...
I think they didn't buy their computer - they built it themselves. I think every IT person should be made to read 'A Computer Called LEO', so when they are wrestling with the formatting of some arsehole's code in a spreadsheet they can ask themselves how the boys at the tea rooms had offices more automated than the 6,000 PCs they are trying to manage, and did it on 6,000 valves that wouldn't power the keyboard on the PC they're swearing at.
"Noting that the internet was not designed to support the applications that now rely on it"
What total nonsense and pseudo-technical-sounding waffle. The Internet performs exactly as designed: it transfers packets from one end node to another. The problem lies solely with the "computers" connected at either end.
What would be of interest to your readers is the dollar value of Maersk's downtime costs and of the expenditure on employing people to reinstall all eight-thousand-plus "systems", and what indemnification the vendors of the software provided Maersk in the event that they were victims of a hacking attack.
"Snabe plans to ensure Maersk .. turn its experience into a security stance that represents competitive advantage."
What you need to do is run your "computers" off of 3½in floppy disks with the write-protect tab in the enabled position.
Maersk wasn't the only outfit to cop a huge NotPetya bill: pharma giant Merck was also bitten to the tune of $310m, FedEx a similar amount, while WPP and TNT were also hit but didn't detail their costs.
Hmm, isn't TNT a subsidiary of FedEx?
So the FedEx numbers are TNT's numbers.
"2016 - On 25 May, FedEx completed the acquisition of TNT Express." from https://www.tnt.com/corporate/history