I doubt a real intra-day DR test would go that well in the real world!
BOFH: The company survived the disaster recovery test. Just. The Director's car, however...
"So what happened yesterday?" the CEO asks, not looking too pleased. "It was an unauthorised action by our contractors," our Director says, nodding at the PFY and myself. "Or a miscommunication," the Boss adds, pouring oil on the troubled water. "We were just following orders," the …
COMMENTS
-
Friday 18th October 2019 13:45 GMT Alchemi
I had a counterpart at a smaller community bank in the early 2000s tell me that a Sr VP came down to the datacenter on a Friday, mid-morning, to test the "DR" capabilities as he understood them, without consulting any technical person. He flipped the master breaker for the datacenter and most systems went down immediately, with the rest failing after a few minutes, them not having facility UPS. That was the last cycle that breaker could manage and they couldn't get it switched back on. It took about two hours for the electrician to get there and get that sorted back out, just enough time to be down for a lunch rush on a Friday. Good times were had by none.
-
Friday 18th October 2019 18:26 GMT Mark 110
Wow. What a fuckwit. Hope he got sacked.
Business managers / directors kinda glaze over when I try and explain DR. They think the techies have a magic failover / failback button, not realising that you might be a year getting everything back to production if you really, really, really failed everything over.
Critical systems (bank mainframes for example) get failed over and back regularly. Everything else relies on the big SRM button you hope will work when you need it.
-
Monday 21st October 2019 06:38 GMT deadlockvictim
Re: oh no they don't
Proper DR testing is eye-wateringly expensive and requires an insane amount of planning, documentation and coordination, not to mention training. I don't blame this bank for postponing it.
Unless you have invested heavily (and properly) in DR, I expect it to be time-consuming, to leave you open to criminal liability and to involve the loss of much data.
-
Sunday 10th November 2019 05:21 GMT Marshalltown
Heh
A former employer/boss/chief source of computer virus infestations, etc., walked in one day to find his C:\ drive out of commission (this was in the '90s). He yelled for my buddy and me to get in there and fix things. We looked things over, noting a small burned scar on the circuit board that made the drive look as if it had been hit by a micrometeor. Impossible of course, but still cool enough to hang on to later as a paperweight. We shook our heads and said the drive would have to go to some more rarified, far higher-paid specialist than us peons to recover any data. The best course would be to simply replace the drive and restore everything that could be restored from backups made the day before (the boss's own policy). Hemming and hawing ensued. Finally it was revealed that the boss himself, source of the Tuesday/Thursday back-up-all-files directive (it had to be done to floppies; a tape drive or drives was too costly), had neglected to follow his own directive - ever. None of the rest of the worker bees was seriously affected, but he was out a month's work, plus files related to closed jobs. Happily we had paper copies of all reports archived. But no email, no electronically stored notes.
Of course he also once returned from a trip to Eastern Europe with a floppy disk "utility" that "backed up" all(!!!) the office hard drives. He never bothered to inform the guys (my buddy and me) about this procedure, and our first notification was a viral plague on every machine in the office except the print server. We spent a day cleaning things up and (we thought) locking things down. Next day, the same plague was raging once more.
Then the boss got concerned about lost work, possibly corrupted files, and possibly the spread of the virus TO his immigrant disk. We asked what it was and he explained he had been backing up all the computers at night after we all left. Eyebrows tangled in hairlines, we asked where he was storing all that data. Why, on the floppy. Ah, had he ever restored any data from the floppy from one hard disk to another? No. He had been very carefully installing a virus over and over on computers he (he was always happy to point out) owned, because his Eastern European "friend" had told him what a wonder program the "utility" was. Careful examination revealed the floppy was THE source of the virus. We ceremonially degaussed it and then chopped it up with a paper cutter. We then asked that he never ever buy "magic beans" again without getting a second opinion.
-
Monday 21st October 2019 12:24 GMT IT's getting kinda boring
Must be a VP thing then.
Had one who did the very same thing but on an aircon unit. Pushed the button, the aircon shut down, and nobody could work out how to start it up again.
4 hours later, when the aircon engineer turned up, all systems were down due to overheating. He rotated a small (and barely visible) collar on the button, which released it, and the aircon burst into life.
For those of you wondering - this aircon unit consisted of a bloody big fan (and I mean *big*) blowing air into the machine room. Not redundant. Go figure. This was back in the mid-80s.
-
Monday 9th December 2019 20:27 GMT rototype
Not just VPs but any managers can be susceptible... reminds me of our office/server room (this was at a private school by the way) that was about 6 x 15 feet, where the "aircon" was 2 pedestal fans pointing in different directions in front of the windows (one blowing in and the other blowing out). Most mornings in the summer I had to clear the drift of dead midges off my keyboard. Didn't stay there long, took a pay cut to leave and was glad of it.
-
Tuesday 22nd October 2019 20:27 GMT C_D
Seems a bit unreal... I mean what kind of REAL disaster doesn't have at least a couple of fatalities - Boss, Director at least if not the CEO?
A few beancounters electrocuted?
Some helldesk personnel locked in the bog?
Bunch of auditors taken on a roller coaster ride in the Lift from Hell?
-
Wednesday 11th August 2021 22:07 GMT BobTheIntern
Re: Very Thorough
Yank here, arriving fashionably late to this thread...
I was intrigued by the long building with "600 gasbags" by the River Thames, but wasn't finding anything by searching with any combination of those terms. Since I had the geographical hint that it was just down the road from Thames House where MI5 is housed, I opened Google Earth and virtually circumnavigated myself across the pond for a virtual look-see.
MI5's digs were easily located, and I then began to cast my eye north and south along the riverbanks, searching for a long building. The only nearby structure reasonably fitting that description was Westminster Palace. But where would one expect to find "600 gasbags" amongst the trappings of royalty and national symbolism? The penny dropped as I zoomed in closer and saw that the Parliament of the United Kingdom (and more specifically, the House of Commons) is ensconced upon the Palace grounds. Gasbags indeed!
This is of course fairly common knowledge to a majority of residents of the Commonwealth realms, but is not as well known in the States, which is why I found the reference unfamiliar. Now if someone would like to explain how to quickly understand the true intended meaning of Cockney Rhyming Slang, I would be forever indebted.
-
Friday 18th October 2019 20:00 GMT J. Cook
That reminds me of another story...
[RedactedCo] actually did have what was called a 'business continuity' event last year, when we had a bit of flooding occur at one of our sites; it seems the business next to this site had poor drainage control, the storm that occurred overran their drainage canal, and it proceeded to flood the lower levels of our site. This destroyed the power substation for the site and completely flooded the sub-basement and a good portion of the ground floor. The flood also damaged one of the sewer lines, which added to the mess. The building was knocked completely offline (no power, no network, nuthin) for about a month. Fortunately, most of our stuff came back up without too much hassle, but still... woof.
Unfortunately, I wasn't allowed to use that as an excuse to get out of performing our annual DR test that year.
-
Friday 18th October 2019 23:10 GMT J. Cook
Re: That reminds me of another story...
Of course, but that means having spare equipment ready to go when the 'testing' is over...
I'm just glad that none of the flood waters got to the server room of that site - I have zero desire to try and clean server/storage/network gear that's been flooded with poo water...
-
Monday 21st October 2019 05:31 GMT TimTheEngineer
Re: That reminds me of another story...
... and the consequent discovery that brand new shiny also means upgraded OS across the board, which in turn means application upgrades ... none of which had happened in a timely fashion in the past because it was varyingly too difficult / not worth the expense / name your excuse du jour.
eBay as a DR strategy...
-
Saturday 19th October 2019 07:43 GMT Anonymous Coward
Re: That reminds me of another story...
Over 20 years ago I witnessed a creative use of a business continuity plan, where the customer's financial system AS/400 was not up to crunching the quarterly reporting workload in a timely manner without affecting daily operations. Whilst awaiting delivery and commissioning of a more powerful AS/400, what the newly signed-up managed service team did was to call in the DR service with their systems on the back of a truck. Overnight, cables would be run into the building to hook up the network and then the tapes loaded after end of day. The DR systems took over, ran the reporting and stayed online for a couple of days to continue as a DR exercise.
-
Friday 18th October 2019 22:00 GMT Doctor Syntax
Re: "The building was on fire, and it wasn't my fault."
"Having full backups in a separate firesafe building saved us."
I believe the relevant safe manufacturer liked to tell the tale of the Co-op building fire in Belfast. The fire safe fell through several floors and landed not really damaged but jammed. A PHB decided he couldn't wait for the manufacturer's locksmith to come and open it and got someone to cut it open with a torch. The contents were unscathed except for those damaged by the torch.
-
Friday 18th October 2019 14:08 GMT Terry 6
There's a solid point there too
Disaster recovery planning, where I've met it on a couple of occasions, seems to be based on the ideas that a) it'll be something obvious, b) it's best not to plan for anything too difficult, especially if that would cost money, and c) it won't actually happen, we just need to have lots of documents to show the brass.
In real life, when something's happened (and some things do happen) it wasn't obvious, and they could have planned for it.
And each time (I've met 2) recovery was a matter of luck (the water didn't reach the main documents) and thinking on the hoof (maybe we can take all 350 kids across to....).
But they really could have planned much more for both. And yes, fire evacuations are a joke. Except when it's a genuine false alarm - because then you learn an awful lot. Like when some toerag in a school hits the fire alarm button and you suddenly find out that the expected orderly evacuation becomes confusion. (And it often does!)
-
Friday 18th October 2019 14:38 GMT AndyMulhearn
Re: There's a solid point there too
I did some work for a small Japanese bank in the '90s that had not long done a DR test one weekend. Shortly after the exercise started, a number of staff arrived at the office to get their "what to do in the event of a disaster" books from the office that had, err, been rendered unusable in this simulated disaster. You know, the books that they're supposed to keep close at hand in case of a disaster.
Idiots.
-
Sunday 20th October 2019 21:55 GMT Terry 6
Re: There's a solid point there too
Sorry, there are some particular things...
* Where do we take the kids for immediate evacuation, and
* Where can we house a temporary primary school for a day/week/month until we get the building back.
And yes I've been involved with both. A gas leak in an old Victorian building for the first of those.
-
Friday 18th October 2019 17:51 GMT A.P. Veening
Re: There's a solid point there too
And yes fire evacuations are a joke. Except when it's a genuine false alarm - because then you learn an awful lot.
Experienced that once and it went flawlessly (surprising just about everybody). Based on the results of the false alarm, the fire evacuation drill scheduled for a week later was cancelled ... by the head of security.
-
Friday 15th November 2019 16:19 GMT Loyal Commenter
Re: There's a solid point there too
...We had a bomb scare in our building some time back.
The receptionists spotted a suspicious package that someone dodgy looking had crept in and left in the lobby when they were away from their desk.
They duly hit what they thought was the evacuation alarm, which unlike the fire alarm, didn't release any door locks (including those on the fire escapes).
Cue the occupants of the building proceeding to evacuate through reception, the only way out of the building, unknowingly past the suspicious package. (We weren't told until we were outside why we were being evacuated, or what the single-tone alarm meant.)
Fortunately, it turned out to not be a bomb. I think it was some stolen goods some toerag had stashed there thinking it was a good idea.
It also highlighted how thoroughly useless the local plod are at dealing with bomb scares; they didn't turn up in the hour I spent waiting outside the building before giving up and going home, and apparently took a couple of hours more to send someone round, who promptly picked up the suspicious bag to take a look.
So, in summary, fails all round.
-
Friday 18th October 2019 22:12 GMT Doctor Syntax
Re: There's a solid point there too
"Except when it's a genuine false alarm"
I worked in a completely glass-clad building where one of the other tenants got occasional bomb threats, causing the entire building to be evacuated. Manglement's idea was that we would evacuate through the nearest door, and if that was the back door we would walk round the end of the completely glass-clad building and congregate in front of it.
I made the point that I'd worked in Belfast, in a building that had had a genuine car bomb delivered to it, and although I wasn't there at the time I'd heard first-hand reports. There was no way that I was going to walk alongside the end of their completely glass-clad building with a suspected bomb in it. If I left their completely glass-clad building I would continue in as close as possible to a straight line perpendicular to the frontage, as far as possible away from it, because a bomb in a completely glass-clad building is going to project potentially lethal shards of glass for considerable distances.
-
Monday 21st October 2019 11:01 GMT TRT
Re: There's a solid point there too
Had exactly the same thought about the front of Euston station. And I wouldn't get in close to the building either, as although shards might go over the top, full panels of safety glass suddenly changing shape and falling free of their frames can come down like guillotines.
-
Monday 21st October 2019 13:50 GMT Clunking Fist
Re: There's a solid point there too
Our local school got some learnings from a simulated earthquake event (NZ). Prior to the sim, all the children had to take their shoes off because the school had new carpet. 600 children trying to find both shoes in an orderly and timely manner resulted in a change of heart: bare feet and broken glass don't mix well. It prompted the school to improve the drainage of the playing fields instead.
-
Friday 18th October 2019 14:13 GMT 2Nick3
"We considered it, but it's not much of a disaster if everyone knows when it's happening"
The number of times I was asked to take "special backups" to be prepared for an upcoming DR Test, wow. I used that same line to explain why I wouldn't do it. I was forced to agree once, but escaped by following the procedures to the letter - a "special backup" would require a "special restore" to be added to the Disaster Recovery Plan, which in turn required following the Change Management process to make the update. The requester was not willing to put their name to the request in writing for some reason...
-
Sunday 20th October 2019 03:05 GMT JimC
Re: asked to take "special backups" to be prepared for an upcoming DR Test,
I don't have a problem with that, just so long as the special backups are properly identified and don't compromise the BAU backups in action. I don't really see it as being different to taking extra backups before a destructive migration. If you actually need to use your special backups to recover from a DR test, then you've identified a major problem with BAU DR, hopefully without having too badly compromised the service. But if you had the same problem and you didn't have the special backups, then the problem has become a crisis.
-
Friday 18th October 2019 15:51 GMT Dave 32
Disaster
Our last Disaster Readiness test wasn't really a test. We have this wonderful building, complete with two transmission line feeds from the utility, along with a huge bank of generators, just to ensure that we don't ever lose power. Well, one day, not too long ago, two of the three phases of the power feed dropped, leaving one phase energized. Sadly, the sensor for switching transmission lines was keyed to that one phase that was still live. Even more sadly, the sensor for starting the generators and switching to them was also keyed to that one phase which was still live. The net result was that about two-thirds of the mainframes in the building went down. Whoopsie! Most of them had redundant power supplies, which were plugged into the two phases which went down. Whoopsie! Oh, well, with 2/3 of the lights off, the building became a nice place to take a nap, especially without all of that obnoxious fan noise.
-
Saturday 19th October 2019 12:39 GMT ColinPa
Re: Disaster
I heard the tale that there was a power outage as the CIO was visiting.
The sysprogs sighed and went into their practiced routine of going to the backup site. The CIO said we have these generators in the car park - you should do a restart in place - no buts, just do it.
So they started the generators and started bringing the systems up - only to find the generators did not have enough power for the machine room. They were stuck half up, half down until the mains power was restored. As they could not start nor stop the systems, they were not able to fail over. Instead of a 20-minute outage while they switched to the backup site, they were down for 3 hours. At the post-mortem the sysprogs said "we told you so".
-
Friday 18th October 2019 18:35 GMT Mark 110
Directors and DR
Directors should not be allowed anywhere near DR planning, as all they do is say they want these apps back in this order.
It doesn't work like that. Though I guess we could pretend the active-active stuff wasn't still working, so it looked like their pet apps came back first.
-
Friday 18th October 2019 18:53 GMT I ain't Spartacus
Re: Directors and DR
I was hired to cope with a failed DR situation. Turns out my large US multi-national had not been backing up their finance records here in Blighty, and I was one of the junior bean-counters brought in to recover the 3 years of lost accounting data. Oops! Is this the point to mention the HMRC inspection - where I was madly printing the missing paperwork - and my boss was ruffling it up, adding extraneous staples and coffee stains, so I could take a plausible-looking file of our invoices to them?
Obviously a new head of IT was required, as it transpired that no testing of the backups had been done. And it was felt that this might, perhaps, have been a mistake...
Due to space reasons I got shoved into the IT office, not finance - due to the extra staff they'd just had to hire to recover said accounts from paper, guesswork and phoning people up to get copies of stuff. And I overheard the new IT manager shouting at someone because one of our sites had a bunch of extra networking gear and was hugely over-provisioned with data connections. Terribly wasteful!
Until this person said that this was our Disaster Recovery backup site if HQ burned down. New IT manager had been in place for at least 6 months by that point - but apparently hadn't bothered to read his DR plans. I mean, what could possibly go wrong?
-
Friday 18th October 2019 22:29 GMT Doctor Syntax
Re: Directors and DR
"it transpired that no testing of the backups had been done"
I had a gig replacing a pair of non-identical servers prior to Y2K, on the basis that the smaller one, the warm standby, wasn't Y2K compatible. They did, in fact, take nightly tape backups and put them in a fire safe as remote as possible whilst still on-site, but the backup they were most relying on was a nightly network copy of the database from the live server to the standby. This clearly hadn't been tested recently, if at all, because I discovered that the time slot wasn't long enough, and when the server kicked into production mode the copy process got terminated.
-
Saturday 19th October 2019 13:28 GMT js.lanshark
Preparing for planned surprise outages
A test of our capabilities is mandated by regulation on a somewhat scheduled basis. It is supposed to be a "surprise inspection", but somehow we always managed to know about when (down to the week) it was going to occur. We would then go to 12-hour shifts to prepare for the surprise inspection that was supposed to highlight our ability to execute the mission at a moment's notice.
So there I was, in a meeting with management. We were about to start the extended work shift when I commented that we should just operate with that level of effort anyway, because it *was* our job to be able to execute at a moment's notice. Silence reigned.
I received a good heart-to-heart talk (his to mine; I was not required to respond, nor was time for a response offered, you know the drill) with the Chief later in private.
-
Sunday 20th October 2019 10:48 GMT Danny 2
It's a long, long while from May to Johnson
I've been ripped off by Virgin Media for the past year, during which time I ascended to a Silver badge here. I know, at best I am a Bronze badge compared to the mass of you. It was an ordeal to get me taken off the internet, a month from now. Still quicker than the UK can leave the EU.
I voted for Brexit, not for Johnson. Who the hell voted for Johnson? I want a confirmatory vote, a general election and a revolution, I'm just not sure in what order.
At least I know I am about to leave the internet. Before I go I'll post my postal address in case any of you want to send letters (downvotes).
-
Monday 21st October 2019 09:52 GMT Rhuadh
"and phones failed over to the off-site call centre service"
Being on the DR call centre service team, the alternative premises were the company's IT dept! When the plug was pulled, we were bussed over, with our headsets in their leatherette bags, and went to our allocated desks.
Er, we had the latest headsets, to fit the advanced phones that we had fitted a couple of years previously. In front of us we had computers with IT systems and programs that we couldn't change, even by turning them off and reaccessing them with our own access codes.
Oh, and the phones? Two systems out of date, and our headsets couldn't connect - the plugs were totally different. When we did get compatible headsets (after an hour), no one knew how to access the phone codes.
After 4 hours, it was decided that we could go home, so we all went to the nearest pub, where we found a load of familiar inebriated techies, whereupon we decided to follow their example in sampling the beverages.
Not sure how I got home....
-
Monday 21st October 2019 14:39 GMT Friar
Or the time in a Hosting Centre where a certain government organisation thought they knew better than to use the centre's own power backup system, so used their own battery backup system in their super-secure server area (security-cleared personnel only!).
Unfortunately it overheated, releasing acid fumes into the server room. Also unfortunately, the server room aircon operated in common with the rest of the building's aircon, so the whole building was pumped full of fumes.
This resulted in the secure server area rapidly becoming insecure, as all exits, emergency and otherwise, were unlocked to try to disperse the fumes. Still, we had an afternoon off as we were told to evacuate for several hours.
-
Monday 21st October 2019 15:46 GMT Anonymous Coward
A long time ago I was the IT team leader for a technology company in London, and we had a DR process that worked. We had tested it previously and made adjustments so it worked; it was not perfect, but it worked: staff could log in to the network and customers could call us.
When I left the company I gave my replacement a tour of the office and a handover of things that needed doing. Among these were adapter plugs for the new UPS for the server room, and a new Windows 7 image for the DR PC to replace the XP one, which then needed sending to the DR company. This was 03 June.
In December they decided to run a new test of the DR system, and found that all the help desk people couldn't log in and access what they needed, as the DR image at the DR site was still XP and their Windows 7 roaming profiles didn't work. So they cancelled the test and were instructed to create a new image and get it uploaded.
1) They couldn't find the one specially purchased PC which was the same spec as the ones at the DR company; I was even called by a friend to ask where it was. (The last time I saw it, it was safely in the server room, labelled DR-image-PC.)
2) Three days after the December test they had a real DR requirement when the mains for the building was taken out, and yes, no one had installed the adapters for the UPS in the server room.
Somehow, even though I had been gone for 6 months, I got the blame for both of these issues :-(