Backups
Is this article saying that they didn't have backups?
The British Library is denying reports suggesting the recovery costs for its 2023 ransomware attack may reach highs of nearly $9 million as work to restore services remains ongoing. The institution said in a statement today that the final costs remain "unconfirmed" and no additional bids for funding to support the rebuild have …
You can have all the backups in the world - if you were compromised and then backed up, what do you think you're restoring apart from a known-compromised system?
Sounds far more like they are having to go over everything they restore onto a fresh system with a fine-tooth comb, which can take forever even for a basic network.
Imagine I told you to reinstall all your servers without using backups (except for raw data) or your ready-made images, and without ever connecting any server that hasn't been freshly rebuilt.
Now think about things like SQL functions being compromised.
If you didn't notice you were open to compromise, don't know how you were compromised, and only know that the system you were backing up was definitely compromised, then restoration is definitely not a "click the button and cross your fingers" exercise.
If anything, it's: condemn every system you have immediately, back them up (what new data might be needed that hasn't made it into a backup yet?), wipe them out entirely, restore piecemeal, and then try to rebuild every OS, service, database, configuration, integration, etc. from scratch without EVER letting the old machines back onto the new network directly.
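To make the SQL-functions point concrete: before you trust anything restored from a compromised database, someone has to read through what's actually in it. Here is a minimal sketch of a first-pass check, assuming a Postgres restore and the psycopg2 driver; the connection string and the list of suspicious patterns are purely illustrative.

    import re
    import psycopg2  # assumes a Postgres restore; adjust for your RDBMS

    # Things that have no business appearing in ordinary application routines.
    # Purely illustrative - a real review means a human reading every definition.
    SUSPICIOUS = [
        r"COPY\s+.+\s+TO\s+PROGRAM",   # shells out from inside the database
        r"lo_export",                   # writes arbitrary files server-side
        r"dblink_connect",              # opens connections to other hosts
        r"pg_read_file|pg_ls_dir",      # filesystem access from SQL
    ]

    def audit_routines(dsn):
        """Flag stored routines in the restored database that match a pattern."""
        conn = psycopg2.connect(dsn)
        cur = conn.cursor()
        cur.execute("""
            SELECT routine_schema, routine_name, routine_definition
            FROM information_schema.routines
            WHERE routine_definition IS NOT NULL
        """)
        for schema, name, definition in cur.fetchall():
            for pattern in SUSPICIOUS:
                if re.search(pattern, definition, re.IGNORECASE):
                    print(f"REVIEW {schema}.{name}: matches {pattern!r}")
        conn.close()

    if __name__ == "__main__":
        audit_routines("dbname=restored_catalogue host=localhost")  # hypothetical DSN

And that only catches the crude cases; an attacker who quietly altered the logic of an existing function would sail straight past it, which is exactly why the piecemeal rebuild takes so long.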
Not necessarily, but it is possible. They may be trying to rebuild something better rather than restoring exactly what they had before, or they might have to rebuild something different because they don't have some of what used to exist. From the statements in the article, I don't think we can know for sure whether either of those applies. Similarly, they may have restored a lot of the content from backups but want to recreate all the systems that handled that content from scratch, which would certainly add to the recovery time. It's often not as simple as a yes-or-no question of whether they had unaffected backups.
"This time [...] with some security" is not something you can just buy. You have to work on doing that better than last time, and that takes time and effort. Insurance covering the costs depends on whether you had insurance that covers that, which not everyone does, and doesn't necessarily shorten the time to recovery; if the insurance lets you hire the most expensive consultants and as many of them as you want, then you can cut down on implementation time to some extent, but it usually doesn't let you do that. Even if it did, there comes a point where adding more people won't speed up the process anymore.
If the infiltrator is effectively nation-state backed and the motivation is likely more political, with a side offering of cash, then the damage can be massive. Especially when you don't know how long they've been in your systems, or potentially how they got there.
As a (very) small-time sysadmin, I have trouble understanding why so many very large organisations are so hard-hit by ransomware attacks. Sure, the exfiltrated data is gone, nothing you can do about that. But what about service restoration? Is it really that hard to rebuild a server infrastructure and recover/restore data at least to a certain point?
I know, there's always the odd backup that hasn't actually backed up anything in the last twelve months, but that should be the exception. Am I the only one who believes in "If you haven't tested restoring, then you do not have a backup"? What about multi-level, offline or write-once backups? Do they not have incident response and disaster recovery plans?
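For what it's worth, a minimal sketch of what "testing your restores" can look like in practice, assuming the backup tool has already restored into a scratch directory and that a checksum manifest was written to offline media at backup time. The paths and the manifest format here are hypothetical.

    import hashlib
    from pathlib import Path

    def sha256(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    def verify_restore(manifest, restore_root):
        """Compare a restored tree against 'hexdigest<TAB>relative/path' lines."""
        failures = []
        for line in Path(manifest).read_text().splitlines():
            expected, rel = line.split("\t", 1)
            candidate = Path(restore_root) / rel
            if not candidate.exists():
                failures.append(f"MISSING  {rel}")
            elif sha256(candidate) != expected:
                failures.append(f"MODIFIED {rel}")
        return failures

    if __name__ == "__main__":
        # Hypothetical locations: a manifest kept offline, a scratch restore target.
        problems = verify_restore("/safe/offline/manifest.tsv", "/mnt/restore-test")
        print("\n".join(problems) or "restore verified")

The point isn't the code, it's that the manifest has to live somewhere the attacker couldn't reach, otherwise the test proves nothing.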
I would really love to learn more about the detailed problems they're battling. I can't just put all of this down to incompetence or negligence. Are modern infrastructures simply built in a way that makes recovery this hard? Are they all cutting costs so hard that someone has to get the ten-year-old DR plans from the proverbial filing cabinet in a locked bathroom stall in the basement?
Much of this is down to cost.
Capability directly links to cost.
Let's say that you have stuff locally backed up to disk and it is still viable.
Do you delete everything that is encrypted and start again? You can, but then you have to restore and validate everything to the point prior to encryption, because some stuff that survived will no longer match stuff that was restored from a week ago. That in itself is a "crank-the-handle" exercise, but you need the capacity to process and write the data.
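Just working out what has drifted is a job in itself. A rough sketch of the first pass, with hypothetical paths and a hypothetical restore point: list everything on a surviving volume that was modified after the last clean backup, since all of it will disagree with what gets restored.

    import os
    from datetime import datetime, timezone
    from pathlib import Path

    # Hypothetical restore point: timestamp of the last known-clean backup.
    RESTORE_POINT = datetime(2023, 10, 21, 2, 0, tzinfo=timezone.utc)

    def drifted_files(surviving_root):
        """Yield files on a surviving volume modified after the restore point."""
        for dirpath, _dirnames, filenames in os.walk(surviving_root):
            for name in filenames:
                path = Path(dirpath) / name
                mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
                if mtime > RESTORE_POINT:
                    yield path, mtime

    if __name__ == "__main__":
        for path, mtime in drifted_files("/mnt/surviving-volume"):  # hypothetical mount
            print(f"{mtime.isoformat()}  {path}")

Every line that prints is a decision for a human: keep the surviving copy, keep the restored copy, or merge them.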
Could you restore 500 VMs and a few PB of data in 24 hours?
Highly unlikely.
Even if everything was in the cloud, where you could expand resources to speed up the process, there are still limits.
If there is an air-gapped copy in a cloud service or on tape then you are limited by what that solution can deliver. Bandwidth can be bought but then you need hardware to use it.
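To put rough numbers on it (the 2 PB figure and the link speeds are assumptions for illustration, not figures from the article):

    # Back-of-envelope: time just to move 2 PB through a restore pipeline.
    data_bytes = 2 * 10**15            # "a few PB", assumed to be 2 PB

    for gbit in (10, 100):             # hypothetical sustained throughput
        seconds = data_bytes * 8 / (gbit * 10**9)
        print(f"{gbit} Gbit/s sustained: {seconds / 86400:.1f} days")

    # 10 Gbit/s  -> ~18.5 days
    # 100 Gbit/s -> ~1.9 days

And that is just moving bytes, before anything is validated or rebuilt.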
There will also be all the due diligence needed to make sure the source has been found and neutralised. It is no use restoring if the compromise simply recurs because the source had not been sufficiently understood.
Let's say you have a server with reliable weekly backups. The server has been infected with ransomware and cannot be decrypted. The last four weeks' backups were encrypted as well, because the operators watched you and determined that you do a test of the backup tape every month, so they waited for you to do one, corrupted the backups for the next month, then went through with the full attack. You can't restore any of those, but you can restore the one from five weeks ago. However, if you just hit the big restore button like you would if the disks had failed, you'll get the server image from that time, which still has their malware on it.

So you may instead have to create a new server and carefully copy only the data back onto that server. Then you have to do something to recover last month's data, which could mean using some incremental backups you have, recreating it from other sources, or dealing with data that simply cannot be recovered. Deciding which to do and actually carrying it out requires someone familiar with the system and someone familiar with the data, likely not the same person, plus time for each to evaluate the situation, determine the best method of recovery, and carry it out. Carrying it out may require more people to spend time doing so.

You also have to make sure that the malware won't be able to reinfect the new server once it's running, so you'll need to make some changes. I imagine you understand all these actions.
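"Carefully copy only the data back" is deceptively short to say. A minimal sketch of what it might look like for one server, assuming the suspect restored image is mounted read-only and that you already know which paths are pure data; the directory allowlist and file extensions are hypothetical.

    import shutil
    from pathlib import Path

    SUSPECT_MOUNT = Path("/mnt/suspect-image")   # restored image, mounted read-only
    FRESH_ROOT = Path("/srv/rebuilt")            # freshly built replacement server

    # Only pull things believed to be inert data; anything executable or
    # configuration-like gets recreated by hand on the new server instead.
    DATA_DIRS = ["var/lib/app/uploads", "srv/documents"]        # hypothetical
    ALLOWED_SUFFIXES = {".csv", ".json", ".xml", ".tif"}

    def copy_data_only():
        skipped = []
        for rel_dir in DATA_DIRS:
            for src in (SUSPECT_MOUNT / rel_dir).rglob("*"):
                if not src.is_file():
                    continue
                if src.suffix.lower() not in ALLOWED_SUFFIXES:
                    skipped.append(src)          # everything else needs human review
                    continue
                dest = FRESH_ROOT / src.relative_to(SUSPECT_MOUNT)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)
        return skipped

    if __name__ == "__main__":
        for path in copy_data_only():
            print(f"NOT COPIED (review): {path}")

Even this is optimistic, since plenty of "data" formats can carry active content, which is why each of these calls needs someone who actually knows the data.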
Now you have ten thousand servers, and they're not all the same, and many of them aren't just standalone servers but various types of infrastructure, from networking equipment to functions that get resources provisioned automatically by your datacenter VM management software or your cloud provider. Most of them don't do anything on their own, but work in a big cluster of other things. The data on the resources covers everything your company used to use, so you need many more data experts to determine how to recover it. The scale of the recovery effort isn't linear. Fortunately, your team is likely bigger, but that only goes so far.
In such a situation, it often comes down to luck. Something may have evaded the attack because it was better secured, because it didn't work the way the attackers had planned, or for many other reasons. It can also be a great opportunity to change the systems. I usually have a long list of changes that would probably be good to make, but we don't make them because everything's running right now and a big change could break something. Now that everything is broken and we're rebuilding from scratch anyway, it might make some sense to make improvements so that the new version is better. That adds delays as well.
So does EVERYTHING run on the one machine or network? It appears the ransomware attack took out absolutely everything! Surely the public-facing website wasn't connected to the same LAN as every other device? I can see they may have hit a web server and got access to SQL, but the entire business??? Who "designed" and implemented this setup? I do hope this £9M is being spent on a different IT provider from the one that has already screwed them over with their original design, implementation and support fees.