Alternative Lesson: "Never turn anything off if..."
"you don't know how to turn it off"
FTFY
Welcome once again, valued reader, to Who, Me? – The Register's weekly confessional column in which readers recount their tales of derring-do that derring-didn't. This week meet a reader we'll Regomize as "Doug" who found himself in something of a hole when, as an ops manager for a food supply company, he oversaw a massive …
Time taken for a senior manager to formulate and execute plan for an impromptu DC failover using the Big Red Button? Microseconds.
Time taken to get everything back up and working, failed boards replaced, systems restarted, rogue LAN cables replaced, comms balanced? 39 hours.
Size of boot applied to arse of senior manager by Very Senior Manager? <-------------------- This Big -------------------->
However, if a power failure needs serious effort or hardware fixes to get it going, it's a shit system.
Who here has not seen a UPS fail and simply take out the supply instead of going to bypass? (Think of any unfortunate APC owners.)
Or, less commonly, a local digger has JCB'd the local 11kV feed, your power is off for hours and the UPS is exhausted? (Generators are available; some of them might even work when needed.)
... if a power failure needs serious effort or hardware fixes to get it going, it's a shit system.
Exactly my thought when I was reading the article.
And if it isn't in the manual it ought not to happen.
Well ...
Reality, fate, chance or serendipity do not look at manuals for cues as to how to proceed.
Shit just happens.
20+ years ago I was involved in the set up of all the hardware at a local election vote-tally data entry centre.
It was a 300 PC, 2X heavyweight mirrored Compaq servers, industrial size UPS + automatic start generator gig.
The very short deadline and the usual opposition rabble with daily accusations of vote tampering did not make things easier, as the team were in the papers every day.
Not a nice scenario and as it was my first job after a very long dry spell, I was not going to let anything pass.
When everything was set up and all the individual parts of the system were duly tested and approved, a dry run was scheduled.
With the 300+ staff working on mock/simulation vote tallies, and in the midst of it all, just to see their faces, I asked them all what would happen if the power went out at any moment.
A flurry of explanations was given, all very correctly pointing out what should (according to the manuals) happen in such an event.
I replied that the proof of the pudding is in the eating: it had to be tested without notice to anyone.
Protests ensued but I refused to budge from there, clearly stating I would not sign off on anything if not tested as I required.
Not at all happy, all parties involved finally accepted and all went as expected.
That's the only way to know if things work properly.
O.
All happily sitting in our office on a sunny Tuesday morning when suddenly there's a huge bang outside, the whole building goes dark, screens off, beeping everywhere as the building UPS kicks in to save the comms room. It's the bomb, run for the shelters! Nope. One of the council's finest digger operators had decided he wanted to cause untold chaos up the whole street by cutting through a stack of cables feeding several businesses. Cue 60 minutes of dusting off DR plans, switching off non-essential kit and praying the UPS holds; 4 hours later the leccy board hooked everything back up temporarily while they decided how to fix the mess properly. It took about 2 weeks of constant minor power cuts, which resulted in a few PCs going bang and countless insurance claims from the businesses in the area as old kit finally gave up the ghost after too many power blips.
None of this was in any manual we had, so we wrote our own from the experience!
@Admiral Grace Hopper
Seeing as you currently have 42 upvotes, and I is 'vogon-ish', I have to reply to this.
Size of boot applied to arse of senior manager by Very Senior Manager? <-------------------- This Big -------------------->
Would that it were so... in my experience the shit flows from VSM->SM->M->me, even if it should have stopped at SM or M level :-)
"in my experience the shit flows from VSM->SM->M->me, even if it should have stopped at SM or M level:-)"
Indeed, and the amount of excrement gets deeper with each descending level, so that the VSM-->SM "Don't do that again, Clive, please" ends up as M-->grunt "you're lucky you weren't fired".
Turning it back on usually is the problem
Especially if, like some of our more 'legacy'[1] systems, turning the system back on isn't just a case of turning all the servers on... no - it's a carefully-scripted set of actions (turn on server A, enable services in a specific sequence with specific timings, turn on server B, wait 5 minutes, turn on server C, etc.).
[1] Short for "we'd like to take them out the back for a mercy-killing but the business won't let us"
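A sequence like that is begging to live in a script rather than a runbook. Here's a minimal sketch of the idea; all host names, service names and timings below are invented for illustration, not taken from any real estate:

```python
import subprocess
import time

# Hypothetical startup order for a legacy estate: each entry is
# (host, services to start in order, seconds to settle before the next host).
STARTUP_PLAN = [
    ("server-a", ["database", "message-queue"], 0),
    ("server-b", ["app-tier"], 300),      # "wait 5 minutes" before server C
    ("server-c", ["web-frontend"], 0),
]

def flatten_plan(plan):
    """Expand a plan into the exact ordered (host, service) start actions."""
    return [(host, svc) for host, services, _pause in plan for svc in services]

def cold_boot(plan, run=subprocess.run, sleep=time.sleep):
    """Walk the plan, starting each service over SSH in the mandated order.

    `run` and `sleep` are injectable so the sequencing can be tested
    without touching real servers.
    """
    for host, services, pause in plan:
        for service in services:
            run(["ssh", host, "sudo", "systemctl", "start", service], check=True)
        sleep(pause)  # honour the settling time before the next host
```

A plan file like this doubles as documentation of the bring-up order for the next poor soul who has to do it at 3am.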
And more often the zeroth step - work out exactly what legacy kit you really have, and where the **** it's been hidden.
Having brought up servers A, B and C, you find out that they rely on mysterious servers AA and AAA for various services and data feeds - servers that no one knows anything about, as they've been quietly humming away in the back of a cupboard/under a desk/behind a partition wall/in some dusty basement for years without anyone actually being aware of them.
And of course, by the laws of Sod, those are the ones that are the key foundation for everything and haven't been touched for years, so they get mugged by the dust bunnies that have built up inside their cases, or their fans/PSUs just screech to a halt or emit the magic smoke...
Back in the day, Novell updated the network protocols to improve performance on large networks.
It was necessary to roll out the changed stack to all servers in a network before flipping the setting to use the new protocol; if any old-style servers were still powered up they would be used as the gateway by default.
Sure enough, despite weeks of audits, server upgrades and comparison of real devices found against asset registers, when we made the change the network stopped.
We were expecting this and were perched looking at network traffic, and soon found which network segment was taking all the traffic; phones were ringing off the hook and the entire organisation was stopped. We descended on the affected office and sure enough there was no server visible. A search of all cupboards revealed a hole with a LAN cable and power cable in the side of one; sure enough, under a pile of other rubbish, there was a 286 server lying on the floor inside the cupboard. It must have been at least 8 years old and had survived with very little ventilation and numerous power cycles with no attention for at least 4 years.
When we got it back to our office the only service it was running was a fax gateway, at some point it had been decided that it was unsightly and someone had paid a joiner to make the hole in the cabinet then shoved the server in there and forgotten about it.
Yep -- frightening how often this happens. I was working in a telephone office when a *lot* of alarm bells started sounding. Outside, two lines of pin-flags delineated the path of the cables between office and world. A digger decided to put his bucket right between the two lines. It took some time to splice them all (working round the clock). This happened decades ago; the manager at first thought that the system was being overloaded and started shouting: "Has Nixon resigned?"
Tentatively, I am coming to believe the way to avoid this is to, at the implementation stage, have the system's developers follow a process that involves switching it on and off a lot, and tearing it down and re-deploying it a lot. Ideally with all the components booting up in a more or less random order.
So the processes for deploying and cold booting your system will all get thoroughly tested many times and the developers themselves will be very severely inconvenienced if it doesn't reliably come up cleanly without hand holding. Hopefully they'll respond to this by fixing the software to boot reliably (e.g. making each part come up cleanly when its dependencies eventually show up).
Was absolutely terrified of this today: small 19" cabinet in a nursing home, doing the internet routing for the building, CCTV, and medical records for residents, much of it mandatory for their insurance/licence; said cabinet contained an ancient-looking APC UPS. Our job (should we choose to accept it) was to feed more sockets off the circuit it was plugged into, and thus we needed to kill the power to that circuit. There was another socket on the other side of the office, on a different circuit (different distribution board and different electrical intake even, lol). I had the nerve-wracking job of yanking the plug out and slamming it into an extension lead from the opposite socket. Imagine the HORROR when the (borrowed) extension lead arced and crackled while I wrestled to put the plug in. Fortunately the ageing UPS not only had a functioning battery, but withstood the barrage of arcing impeccably.
They are, however, going to schedule a shutdown to allow us to replace the 12V batteries, as it went down to 30% in those few seconds.
""One side of mirror switched off; mirror does its job" ... and that's it?"
Just a couple of years ago, we were moving an AS400 that was, in everyone's memory (and not on documentation), configured, $DEITY knows how, as a mirrored active/passive AS400 metro-cluster.
Upon switching the passive side on, we realized the active side had switched everything to Read-Only.
We never had time to investigate this, but moved the passive AS400 as quickly as possible.
But then again, this legacy system was due to be decommissioned. It had been in that state for multiple decades :)
Sometimes the manual omits / hides useful information.
I remember working on an IBM 1130 system at JPL (yes, it was that long ago; punched cards to load programs, big panel of blinking lights and toggle switches instead of a monitor, 8k of RAM…).
The system flat out refused to run Assembly and Fortran at the same time.
An enterprising system engineer I worked with discovered that hiding in the config byte-array at the head of RAM was a bit that enforced this. Changing that bit allowed Assembly and Fortran to run and call each other.
We couldn’t be bothered to check if IBM offered an upgrade to allow both languages to run at the same time.
Geezer icon —->
I used to be an IBMer and had a sign in my cube that said "Nunquam Permissium Opus, Interfere Per Plactium"... Internet Latin for "Never Let the Work Interfere With the Meetings"
I figure if you're going to be a smarta$$ at work, it's best to do it in Latin. It's classier and no one knows what it means anyway (though I'd tell anyone who would listen).
When the manuals were referred to as "eyes only", my immediate thought was "brain not allowed". Which was then duly carried out by the IBMer Bob - "no, we can't do a test that simulates real-life (including startup), we MUST do it exactly the way the book says!"
I've told this one before.
A bank was hit with a power outage, and they went into the well-practised failover to the backup site. A passing senior manager/director told them: don't do that - we have generators in the car park for this sort of emergency - use those and avoid the outage of switching sites. So they reluctantly restarted in place. Halfway through the power-on and restart, they found the generators did not have enough capacity for the machine room, and so they were stuck in limbo. They did not want to shut down halfway through an emergency restart, and they could not complete the start-up in order to be able to shut it down. They had to wait a couple of hours until the power was restored, after which they could complete the restart.
When the incident was reviewed by the board, the manager/director had to explain that it was his decision, and admit he did not actually know the generators' capacity - he had just paid for them.
The IT team learned a lesson - there are times when you ignore the management chain and do what you have practised.
The IT team learned a lesson - there are times when you ignore the management chain and do what you have practised.
That only works in situations where the (unknown, future) outcome is a success though. If the IT team tried "anything" and it did not go according to plan it would 100% be seen as their fault irrespective of the circumstances that preceded it.
I'd write down on paper:
"""
To whom it may concern,
I am aware that there has been a power outage, and the power is still out.
I have been told by IT that the documented, agreed, and tested procedure is to fail over the IT systems to the backup site.
I have been told by IT that restarting the IT systems here, on generators, is not the documented or agreed procedure, and has not been tested.
I am ordering IT to try to restart the IT systems here, on generators.
Signed
___________
<Name of senior Manager>
Date: xx/xx/xx Time: xx:xx
"""
Then I would give that piece of paper to the director and ask them to sign it to confirm the order.
Any sane person would take one look at that and refuse to sign it, and let the IT people follow the plan.
If the director is stupid enough to sign, then they get what they deserve.
Many many years ago I was asked to create some tests for a file storage system. The intention was that a file could be moved between two storage areas, but you only ever saw it in one. The whole file could be accessed from 'A', or 'B', but you should never see it in both at once, nor should you ever be able to see a partial file anywhere. I specified: use a large file so you have a few seconds to act in, start the transfer, disconnect the network cable, wait a few seconds, check. No problem with that one. The second variation said start the transfer, disconnect the power lead from one side, then the network cable, repower but do not reconnect the network, wait for startup to complete and check. The project flat out refused to run this test. Considering this was a system intended for usage in combat areas (even though in staging posts rather than front-line), I did not consider it an unreasonable scenario. However, the manual stated that systems must always be shut down cleanly by following the specified procedures.
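For what it's worth, the invariant being tested there (never partial, never visible in two places) is usually approximated with a stage-then-rename pattern. A hedged local-filesystem sketch of the shape (the real system moved files between networked storage areas, which this does not attempt to reproduce):

```python
import os
import shutil

def move_atomic(src, dst):
    """Move a file so readers never see it partial.

    Copy into a hidden temp name next to the destination, fsync it,
    then rename into place (atomic on POSIX within one filesystem)
    and finally remove the source. A power cut mid-copy leaves only
    the temp file, which a sweep can discard on restart.
    """
    tmp = os.path.join(os.path.dirname(dst) or ".",
                       ".incoming." + os.path.basename(dst))
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())   # data on disk before it becomes visible
    os.replace(tmp, dst)          # atomic: dst appears whole or not at all
    os.remove(src)                # source disappears only after dst exists
```

Strictly, there's a tiny window between the rename and the unlink where both copies exist, and a power cut in that window leaves two - which is exactly why pull-the-cable tests like the one described above matter.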
…OK, so pressing The Big Red Button is not something to be taken lightly, or to be done without consideration of the consequences. But given that this is supposed to have been a resilient process, if Bob doesn't like the idea and can't come up with a better reason than "the manual says not to do it, and there's nothing in our procedures about it…" then someone ought to be giving him Hard Stares and asking him Awkward Questions, and other people ought to be answering Awkward Questions about why Awkward Questions weren't asked before Bob's employer got the deal.
Whilst a 2nd-year BTEC Electrical Engineering student back in 89/90, I "accidentally" cut power to the entire college campus! Picture the scene: about half a dozen spotty 18-year-olds dispatched to the college main electrical switch room to make sketches of, and label, what we found in there. Any road up, in we all went to this quite small room, armed with A4 binders, paper and pencils. Had a bit of a butchers around - OK, that's obviously the main breaker for the campus, chuffing great BIG switch like a one-armed bandit... hmmmmmm, what's this small flippy thing on the side of it, says I. Press, CLUNK as the massive handle descends, then an eerie quiet as EVERYTHING was off! This lasted a few seconds until broken by the sound of my mates saying "oh sh1t" quite loudly, and the sound of doors opening and lecturers coming out into the corridor wondering what the feck was going on! Yes, I had successfully found and identified the main breaker's test switch!
My mates, who were all fab and several of whom are still in touch 30 years later, covered for me and said I'd knocked the test switch with my A4 binder. For a whole term afterwards the college clock and bell were about 5 mins off, as that's how long it took to get the power back on and they didn't reset the clock hahahahahahahahahah
About 15 years ago, a colleague and I went on site to a local PoP to install a third switch into an existing Cisco 3750 stack. The plan was simple - screw the switch in place, then just remove one of the two stacking cables between the existing ones, connect the third switch in-line, add a third cable, flip the breaker and configure the new ports. Simple, right?
Only my colleague had a brief brain fart and flipped the breakers for the OTHER two switches, essentially shutting down access to the whole PoP - taking some thirty-odd thousand customers connected downstream with it. And to add insult to injury, when the stack booted back up, the NEW switch became the stack master, so all the port numbers were transposed and all the old ports had to be reconfigured. Via console. As both our phones were pinging like crazy trying to tell us that we'd taken down the whole site. Trust me, we knew. And that was the day we learned to always pre-configure the stack member serial numbers.
(We still work together. And our colleagues still to this day remind us of that incident whenever we walk towards a Data Center together.)
Needed to get something off a shelf above the UPS; this UPS had an emergency off button that was about 1mm proud of the panel it was mounted to.
Caught it with my knee and dropped the main comms room and a large AS/400. It took an hour for the AS/400 to come back up.
I left the company 21 years later so it didn’t do any long term harm to my career.
While less senior than now, in a moment of madness I decided to see if I could rebuild the entire DB and data from scripts.
It took me a few days to get it working, and I was then able to recreate the company's main system from scratch, feeling pretty good about it.
I went home and thought nothing of it, until we came in the next day to fire brigade lights flashing and the building shut: a water pipe had burst, and in the main server room a tsunami was occurring.
The bosses were panicking about the business not surviving, so I mentioned I could rebuild the system in a matter of hours IF they got my desktop out.
Later that day the system was back up and running slowly on my PC.
Hero back slapping ensued then it struck me the data I had been working with was at least a week out of date, best not to ruin their day so I kept quiet.
One of my Finest hours
Sounds like he managed to come up with a scenario not covered in their manuals, which it absolutely should be: the "trigger happy" user who just goes around pushing buttons. If it happened once, it's bound to have happened before and will happen again. IBM should give the guy a bonus or something for finding that particular gap in their process.
A number of years ago, on a new hospital build with lots of IT, it was time for the power-down test on the data centre. A proper power test - severing of supplies etc. to test data centre failover.
Everyone in a room. For some reason Data Storage guys were adamant they needed a soft shutdown because anything else would knack the arrays.
Idiot here popped up and pointed out that if they were going to be damaged they weren't a lot of use in a major power loss scenario.
All hell breaks loose with lots of angry questions from the customer to the Storage specialists.
At that point I made my excuses and left before I got lynched.
Sounds a perfectly reasonable request to me.
If the system is that sensitive then it needs its own way of reacting to a power-down: either a local UPS or a battery system to write the final cache entries to the disks before it dies. It should be part of the chassis or firmware that handles this.
Otherwise, there are things outside its control that it must be able to deal with.
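Real arrays do this in firmware with battery-backed cache, but the shape of the idea fits in a few lines: on the shutdown signal a UPS daemon typically sends, spend the grace period flushing the write-back cache to stable storage. Everything here (the cache layout, the directory path) is invented for illustration:

```python
import os
import signal

# Hypothetical in-memory write-back cache: name -> bytes not yet on disk.
dirty_cache = {}

def flush_cache(directory):
    """Write every dirty entry out and fsync, so nothing is lost at power-off."""
    for name, data in dirty_cache.items():
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force it to the platters, not just the OS cache
    dirty_cache.clear()

def on_power_event(signum, frame):
    # A UPS daemon signalling low battery (commonly via SIGTERM) gives a
    # short grace period: spend it making the on-disk state consistent.
    flush_cache("/var/lib/example-array")  # hypothetical data directory
    raise SystemExit(0)

signal.signal(signal.SIGTERM, on_power_event)
```

The point of the pattern is that the power-loss reaction lives inside the system itself, rather than relying on an operator following a soft-shutdown runbook.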
A few years ago, I was part of a small team working on an inventory management system. Because we couldn't find a commercial one that fitted our needs 100%, our manager initiated a project to design and build one.
We did, and the final system worked well for years. We had the development and test servers set up properly, with us doing development on the development server and testing it as rigorously as possible with the resources available to us. One of my colleagues also had the responsibility of rolling out updates to the production server. In theory, he was the only one with the rights to do this - a deliberate choice to prevent accidental updates. In fact, the system was slightly bureaucratic, again deliberately, to prevent accidental updates.
The system had been in use for some months, when all of a sudden, the users started reporting errors. My colleague and I started investigating. The third member of our team apparently had a meeting, so vanished.
After about 15 minutes investigation, we had worked out what happened. The system had its own database on SQL server. It recorded everything the system did as a "transaction" and the transaction table had vanished.
Upon further investigation, we found the transaction table had been renamed to a full stop, and that the technician who had vanished for a meeting was the one who had renamed it. Unfortunately, our version of SQL Server Management Studio wouldn't allow us to access the table after this, so we had to restore it from the previous night's backup. Thankfully, we only lost a couple of hours' transactions.
A major oil firm realised they had a power outage at "DC1" but UPS kicks in perfectly
Process initiated to bring down systems carefully, as "DC2" is absolutely fine
Systems/workloads closed satisfactorily, well in advance of the UPS dying
...
Corporate meltdown then ensues as they'd shut down the systems in an orderly fashion in DC2
Bill Gates (for it is he) was so convinced that an install of the latest Windows was bulletproof that, while it was being demonstrated by his VP on stage and on live video across the planet, he pulled the plug on the test PC in the middle of the process.
To say that the VP was "white faced" at that point is an understatement.