> Running infrastructure in the tropics has its challenges – but so do failed disaster recovery plans
So the plans were Singapore then...
Outages at two banks that stopped 2.5 million payment transactions were sparked by a technical issue with the datacenter's cooling system, according to the Monetary Authority of Singapore (MAS) on Monday. DBS and Citibank, the banks involved, experienced outages in the mid-afternoon of October 14, 2023 that resulted in full or …
Been to many "unmanned datacenters". When I say unmanned, you just deal with a security guard whose job is just that, nothing more, and that's the only monitoring that is going on. Servers could be cooking or crashing for all they know. I have walked past racks that have been bleeping for weeks, or red fault lights blinking away till the LED gives up.
I get your point but the DCs should have correct monitoring for those possibilities. Also inert gas fire suppression isn't compatible with meatsacks.
I once got a bollocking for being in the DC of a catalogue-based high street retailer. They didn't know we were working in there. They also didn't think to question how their security had let us in tho.
Backups that fail are a common problem in the industry. Backups get maintained but often not tested, for fear that they will fail. The Chernobyl disaster is reputed to have been the result of a failed backup test. I personally crashed a system by switching over to an incorrectly set up backup. Luckily I was able to switch back after a few minutes. Nobody noticed except for the maintenance man and me. My boss never even knew about it. Whew!!
In the old days, when I worked graveyard on Tymnet before the internet, I was rotating packs every other night because of various errors. A pair running and a spare on the shelf. I loved that job, still remember all those people, and had I kept that job I would've dodged being sucked into relationships, because I wouldn't have had any money and so would not have been suitable to bear a lot of responsibility for other blokes' children, to then be kicked to the curb later after being used up just like the other bloke.
... but technologists, business managers, and governmental employees are not the only ones to blame. People in general seem to have become, in some respects, "stupidized" regarding backup systems, manual alternatives, and the need for them. People no longer demand systems reliability and resilience.
Two contrasting incidents:
(1) I was inside a major retailer's store whose power had failed. The building had a very tall roof, and large windows to let in sunlight. It was late afternoon, but people could see well enough to shop, even with the lights out. The electronic cash registers didn't work -- there was no backup electrical power system for them. They had failed over to a manual system: Each checkout operator had been equipped with a battery-powered adding machine, with printed paper tape. The store had no electricity, yet it was still doing business!
(2) I was visiting a technophile couple (they both work in technical jobs for a major CPU-and-other-chips manufacturer). They own a Tesla automobile. The wife was moaning to us about how a friend of hers had suffered a smartphone GPS failure while driving around the city, and had become quickly and completely lost, and how horrible and devastating this was for her friend. I asked, "Why didn't she just use the map from her glove box?" His indignant reply: "Nobody uses maps any more!"
In the megalopolis where we live, there are clear, legible street signs on every corner (though a few have been stolen or mis-pointed). They work even for persons lacking maps.
I couldn't think of a reply that did not include the words, "fucking", "arrogant", "incompetent", "absolute", and "ninny", so I said nothing.
(Mine's the coat with a plasticized city map in its pocket.)
I have to say I'm impressed with the response from the regulating authority. Instead of slapping a meaningless financial penalty on the bank (which in the end is paid by the customers and low-level employees anyway), they basically ordered them to stop playing around until they have fixed the mess.
Typically it's the other way around; they get a massive fine, and in response, close some branches and fire part of their workforce.