How not to test a new system: push a button and wait to see what happens

Welcome once again, valued reader, to Who, Me? – The Register's weekly confessional column in which readers recount their tales of derring-do that derring-didn't. This week meet a reader we'll Regomize as "Doug" who found himself in something of a hole when, as an ops manager for a food supply company, he oversaw a massive …

  1. Little Mouse

    Alternative Lesson: "Never turn anything off if..."

    "you don't know how to turn it off"

    FTFY

    1. b0llchit Silver badge
      Trollface

      Re: Alternative Lesson: "Never turn anything off if..."

      You know, it'll turn itself automagically off when the UPS fails. Turning it back on usually is the problem.

      1. Admiral Grace Hopper

        Re: Alternative Lesson: "Never turn anything off if..."

        Time taken for a senior manager to formulate and execute plan for an impromptu DC failover using the Big Red Button? Microseconds.

        Time taken to get everything back up and working, failed boards replaced, systems restarted, rogue LAN cables replaced, comms balanced? 39 hours.

        Size of boot applied to arse of senior manager by Very Senior Manager? <-------------------- This Big -------------------->

        1. Paul Crawford Silver badge

          Re: Alternative Lesson: "Never turn anything off if..."

          However, if a power failure needs serious effort or hardware fixes to get it going, it's a shit system.

          Who here has not seen a UPS fail and simply take out the supply instead of going to bypass? (Think of any unfortunate APC owners.)

          Or, less commonly, a local digger has JCB'd the local 11kV feed, your power is off for hours and the UPS is exhausted? (Generators are available; some of them might even work when needed.)

          1. oiseau
            Facepalm

            Re: Alternative Lesson: "Never turn anything off if..."

            ... if a power failure needs serious effort or hardware fixes to get it going, it's a shit system.

            Exactly my thought when I was reading the article.

            And if it isn't in the manual it ought not to happen.

            Well ...

            Reality/fate/chance or serendipity do not look at manuals to get their cues as to how to proceed.

            Shit just happens.

            20+ years ago I was involved in the set up of all the hardware at a local election vote-tally data entry centre.

            It was a gig with 300 PCs, two heavyweight mirrored Compaq servers, an industrial-size UPS and an automatic-start generator.

            The very short deadline and the usual opposition rabble with daily accusations of vote tampering did not make things easier, as the team were in the papers every day.

            Not a nice scenario, and as it was my first job after a very long dry spell, I was not going to let anything pass.

            When everything was set up and all the individual parts of the system were duly tested and approved, a dry run was scheduled.

            It ran with the 300+ staff working on mock vote tallies, and in the midst of it all, just to see their faces, I asked them all what would happen if the power went out at any moment.

            A flurry of explanations was given, all quite correct, pointing out what should (according to the manuals) happen in such an event.

            I replied that the proof was in the pudding, and that it had to be tested without notice to anyone.

            Protests ensued, but I refused to budge, stating clearly that I would not sign off on anything that was not tested as I required.

            Though not at all happy, all parties finally accepted, and everything went as expected.

            That's the only way to know if things work properly.

            O.

            1. Anonymous Coward

              Re: Alternative Lesson: "Never turn anything off if..."

              usual opposition rabble with daily accusations of vote tampering

              Magnify that by a million and you have the current climate.

              If I was in that job these days I would want serious danger money.

              1. TDog

                Re: Alternative Lesson: "Never turn anything off if..."

                Strangely enough, with limited opportunities for general elections I for one welcome our new rabble pentesters. Without them the "believed redundant" analysis might never have happened.

                Pain in the arse yes, useful too.

                1. Benegesserict Cumbersomberbatch Silver badge

                  Re: Alternative Lesson: "Never turn anything off if..."

                  Pain in the arse is an interesting way to refer to an existential threat to civil society.

          2. Anonymous Coward

            Re: Alternative Lesson: "Never turn anything off if..."

            All happily sitting in our office on a sunny Tuesday morning, and suddenly there's a huge bang outside: the whole building goes dark, screens off, beeping everywhere as the building UPS kicks in to save the comms room. It's the bomb, run for the shelters! Nope. One of the council's finest digger operators had decided he wanted to cause untold chaos up the whole street by cutting through a stack of cables feeding several businesses. Cue 60 minutes of dusting off DR plans, switching off non-essential kit and praying the UPS held. Four hours later the leccy board hooked everything back up temporarily while they decided how to fix the mess properly. That took about two weeks of constant minor power cuts, which resulted in a few PCs going bang and countless insurance claims from businesses in the area as old kit finally gave up the ghost after too many power blips.

            None of this was in any manual we had, so we wrote our own from the experience!

        2. vogon00

          Re: Alternative Lesson: "Never turn anything off if..."

          @Admiral Grace Hopper

          Seeing as you currently have 42 upvotes, and I am 'vogon-ish', I have to reply to this.

          Size of boot applied to arse of senior manager by Very Senior Manager? <-------------------- This Big -------------------->

          Would that it were so... in my experience the shit flows from VSM->SM->M->me, even if it should have stopped at SM or M level :-)

          1. Mooseman Silver badge

            Re: Alternative Lesson: "Never turn anything off if..."

            "in my experience the shit flows from VSM->SM->M->me, even if it should have stopped at SM or M level:-)"

            Indeed, and the amount of excrement gets deeper with each descending level, so that the VSM->SM "Don't do that again, Clive, please" ends up as M->grunt "you're lucky you weren't fired"

            1. Anonymous Custard
              Trollface

              Re: Alternative Lesson: "Never turn anything off if..."

              It's the old "birds in a tree" image (example).

              Whenever management look down, all they see is (their own) shit...

              Whenever workers look up, all they see is (management) arseholes...

      2. CrazyOldCatMan Silver badge

        Re: Alternative Lesson: "Never turn anything off if..."

        Turning it back on usually is the problem

        Especially if, like some of our more 'legacy'[1] systems, turning the system back on isn't just a case of turning all the servers on. No, it takes a carefully scripted set of actions (turn on server A, enable services in a specific sequence and with specific timings, turn on server B, wait 5 minutes, turn on server C, etc.).

        [1] Short for "we'd like to take them out the back for a mercy-killing but the business won't let us"
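        A hand-scripted power-on sequence like that can at least be derived from a written-down dependency map instead of living in someone's head. The following is a minimal sketch, not anything from the original post: it assumes Python 3.9+ (for the standard-library `graphlib` module), the server names and the dependency map are hypothetical, and the post's timed waits ("wait 5 minutes") are not modelled.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical dependency map: each server lists the servers that must be up
# before it is powered on. Names are illustrative only.
DEPENDS_ON = {
    "server_a": set(),
    "server_b": {"server_a"},
    "server_c": {"server_a", "server_b"},
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid power-on order (dependencies first).

    Raises graphlib.CycleError if the map contains a circular dependency,
    i.e. there is no order in which everything can be started.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    try:
        for step, name in enumerate(start_order(DEPENDS_ON), start=1):
            print(f"{step}. power on {name}")
    except CycleError as err:
        print(f"no valid start order: {err}")
```

        Writing the map down has a side benefit: a cycle (A needs C, C needs A) shows up as an error at planning time rather than halfway through a cold start.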

        1. Anonymous Custard
          Headmaster

          Re: Alternative Lesson: "Never turn anything off if..."

          And more often the zeroth step: work out exactly what legacy kit you really have, and where the **** it's been hidden.

          You bring up servers A, B and C, only to find out that they rely on mysterious servers AA and AAA for various services and data feeds, which no-one knows anything about because they've been quietly humming away in the back of a cupboard/under a desk/behind a partition wall/in some dusty basement for years without anyone actually being aware of them.

          And of course, by Sod's law, those are the ones that are the key foundation for everything and haven't been touched for years, so they get mugged by the dust bunnies that have built up inside their cases, or their fans/PSUs just screech to a halt or emit the magic smoke...

          1. Yet Another Anonymous coward Silver badge

            Re: Alternative Lesson: "Never turn anything off if..."

            Thankfully, in these days of full-stack hyper-cloud convergence, you don't discover some vital service was running on a developer's machine - but you don't know which one

            1. Strahd Ivarius Silver badge
              Facepalm

              Re: Alternative Lesson: "Never turn anything off if..."

              No, it is running on an AWS server instance nobody has any knowledge of...

              1. phils

                Re: Alternative Lesson: "Never turn anything off if..."

                And the credit card it's billed to has expired since last month's payment.

                1. StudeJeff

                  Re: Alternative Lesson: "Never turn anything off if..."

                  And the card is set up for autopay and no one remembers the password.

          2. Anonymous Coward

            Re: Alternative Lesson: "Never turn anything off if..."

            or having to go to the 4th basement level to start a server in a building where there are only 3 basement levels...

          3. Anonymous Coward

            How did it even survive?

            Back in the day, Novell updated its network protocols to improve performance on large networks.

            It was necessary to roll out the change stack to all servers in a network before flipping the setting to use the new protocol; if any old-style servers were still powered up, they would be used as the gateway by default.

            Sure enough, despite weeks of audits, server upgrades and comparisons of the devices actually found against the asset registers, when we made the change the network stopped.

            We were expecting this and were perched watching the network traffic, and we soon found which network segment was taking all the traffic. Phones were ringing off the hook and the entire organisation had stopped. We swatted the affected office and, sure enough, there was no server visible. A search of all the cupboards revealed a hole with a LAN cable and a power cable running into the side of one, and, sure enough, under a pile of other rubbish there was a 286 server lying on the floor inside the cupboard. It must have been at least 8 years old and had survived with very little ventilation and numerous power cycles, with no attention, for at least 4 years.

            When we got it back to our office, the only service it was running was a fax gateway. At some point someone had decided it was unsightly, paid a joiner to cut the hole in the cabinet, shoved the server in there and forgotten about it.

            1. Anonymous Coward

              Re: How did it even survive?

              Yep -- frightening how often this happens. I was working in a telephone office when a *lot* of alarm bells started sounding. Outside, two lines of pin-flags were delineating the path of the cables between the office and the world. A digger decided to put his bucket right between the two lines. It took some time (working round the clock) to splice them all. This happened decades ago; the manager first thought that the system was being overloaded and started shouting: "Has Nixon resigned?"

        2. RichardBarrell

          Re: Alternative Lesson: "Never turn anything off if..."

          Tentatively, I am coming to believe the way to avoid this is to, at the implementation stage, have the system's developers follow a process that involves switching it on and off a lot, and tearing it down and re-deploying it a lot. Ideally with all the components booting up in a more or less random order.

          So the processes for deploying and cold booting your system will all get thoroughly tested many times and the developers themselves will be very severely inconvenienced if it doesn't reliably come up cleanly without hand holding. Hopefully they'll respond to this by fixing the software to boot reliably (e.g. making each part come up cleanly when its dependencies eventually show up).
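          One way to get that "come up cleanly when its dependencies eventually show up" behaviour is for each component to poll its dependencies with capped exponential backoff instead of assuming they are already running. Below is a minimal Python sketch of the idea, not taken from the post: the function names, timeouts and the TCP-connect readiness check are all assumptions, and a real service would probably want a proper health check rather than just an open port.

```python
import socket
import time
from typing import Callable


def tcp_port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Crude readiness probe: can we open a TCP connection to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def wait_for_dependency(
    is_ready: Callable[[], bool],
    timeout_s: float = 300.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() with capped exponential backoff.

    Returns True as soon as the dependency reports ready, or False once
    timeout_s has elapsed. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay_s)  # double the wait, up to the cap
```

          For example, a service might call `wait_for_dependency(lambda: tcp_port_open("db.example.internal", 5432))` at startup (the host name is a placeholder) and exit with an error if it returns `False`, leaving a supervisor to restart it later. Combined with booting components in a more or less random order, as suggested above, a missing dependency then shows up as logged retries rather than a hung boot.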

          1. Yet Another Anonymous coward Silver badge

            Re: Alternative Lesson: "Never turn anything off if..."

            Or just employ a chaotic monkey to run around unplugging things and turning stuff off at random

            1. Montreal Sean

              Re: Alternative Lesson: "Never turn anything off if..."

              Can I be the chaotic monkey? I just need a banana.

              1. Anonymous Custard
                Trollface

                Re: Alternative Lesson: "Never turn anything off if..."

                I think in our organisation, that's probably me. Although I prefer the (borrowed) title of Roving Agent of Chaos.

                I'm not sure if the organisation actually realises this though... ;-)

      3. Martin-73 Silver badge

        Re: Alternative Lesson: "Never turn anything off if..."

        I was absolutely terrified of this today. A small 19" cabinet in a nursing home handles the internet routing for the building, the CCTV and the residents' medical records, much of it mandatory for their insurance/licence, and said cabinet contained an ancient-looking APC UPS. Our job (should we choose to accept it) was to feed more sockets off the circuit it was plugged into, so we needed to kill the power to that circuit. There was another socket on the other side of the office on a different circuit (different distribution board and even a different electrical intake, lol). I had the nerve-wracking job of yanking the plug out and slamming it into an extension lead from the opposite socket. Imagine the HORROR when the (borrowed) extension lead arced and crackled while I wrestled to put the plug in. Fortunately the ageing UPS not only had a functioning battery, but withstood the barrage of arcing impeccably.

        They are, however, going to schedule a shutdown to allow us to replace the 12V batteries, as it went down to 30% in those few seconds.

  2. lglethal Silver badge
    Facepalm

    Yep, Doug was definitely a candidate for Head of IT...

    1. Joe W Silver badge

      As the BOFH remarked when they pushed him to become an IT manager

      Boss: "a fool could do it!"

      Simon: "a fool generally does it"

      1. UCAP Silver badge

        Having once been the IT Manager for my previous company (an SME that really, really needed 100% uptime on its server resources), my reaction to that quotation is to curl up in a corner and whimper quietly to myself.

    2. Anonymous Coward

      I would say a Senior Head of IT.....

      Just put the initials of his new job title on the office door!

      1. Anonymous Coward Silver badge
        Joke

        Especially if Doug's middle name was Ian and surname Prentice (as two random names with the correct initials)

    3. This post has been deleted by its author

  3. John Sager

    Testing times ahead

    I expect one or two systems may get the Big Red Button test this winter but not by the one near the door in the server room!

  4. MJB7
    Thumb Down

    Boo! Very poor "Who? Me?" this week. "One side of mirror switched off; mirror does its job" ... and that's it?

    1. chivo243 Silver badge
      Coat

      You've just given Who? Me? the Syndrome Award. Lame! Lame! Lame!

    2. Anonymous Coward

      ""One side of mirror switched off; mirror does its job" ... and that's it?"

      Just a couple of years ago, we were moving an AS400 that, according to everyone's memory (though not the documentation), was configured, $DEITY knows how, as a mirrored active/passive AS400 metro-cluster.

      Upon switching the passive side on, we realized the active side had switched everything to read-only.

      We never had time to investigate this, but moved the passive AS400 as quickly as possible.

      But then again, this legacy system was due to be decommissioned. It has been in that state for multiple decades :)

    3. Jou (Mxyzptlk) Silver badge

      Well, no. Not poor from my point of view. The "Who, Me?" lesson here is to communicate before pushing a button.

    4. PRR Silver badge

      > Very poor "Who? Me?"

      I'd call that Newsworthy. It is a theory-breaking refutation of Murphy's Law.

  5. andy 103
    FAIL

    And if it isn't in the manual it ought not to happen.

    That suggests the manual is always correct.

    Which in the case of IBM manuals is almost certainly never the case.

    Sometimes you have to think outside the box. This is known as "experience" and "skill".

    1. Outski

      Re: And if it isn't in the manual it ought not to happen.

      Also known as knowing where to put the chalk-mark

      1. Montreal Sean

        Re: And if it isn't in the manual it ought not to happen.

        Chalkmark goes around the dead body of course!

    2. VeganVegan
      Headmaster

      Re: And if it isn't in the manual it ought not to happen.

      Sometimes the manual omits / hides useful information.

      I remember working on an IBM 1130 system at JPL (yes, it was that long ago; punched cards to load programs, big panel of blinking lights and toggle switches instead of a monitor, 8k of RAM…).

      The system flat out refused to run Assembly and Fortran at the same time.

      An enterprising system engineer I worked with discovered that hiding in the config byte-array at the head of RAM was a bit that enforced this. Changing that bit allowed Assembly and Fortran to run and call each other.

      We couldn’t be bothered to check if IBM offered an upgrade to allow both languages to run at the same time.

      Geezer icon -->

      1. Anonymous Coward

        Re: And if it isn't in the manual it ought not to happen.

        Yes, probably a several-thousand-pound upgrade to remove that bit!

    3. TimMaher Silver badge
      Windows

      Re: IBM

      Hence the acronym “I Bring Manuals”

      Also known as “I’m Back Monday”, “It’s Being Mended” and many, many more.

      1. chivo243 Silver badge
        Coat

        Re: IBM

        I Be Milking it, for all it's worth!

        See ya Suckers!

      2. SU

        Re: IBM

        Idiots Become Managers

        I've Been Married

        I've Been Moved

        I Bring Many (to meetings)

        There are lists of them in IBM

        1. StudeJeff

          Re: IBM

          I used to be an IBMer and had a sign in my cube that said "Nunquam Permissium Opus, Interfere Per Plactium"... Internet Latin for "Never Let the Work Interfere With the Meetings"

          I figure if you're going to be a smarta$$ at work it's best to do it in Latin. It's classier and no one knows what it means anyway (except that I'd tell anyone who would listen).

    4. Anonymous Coward

      Re: And if it isn't in the manual it ought not to happen.

      When the manuals were referred to as "eyes only", my immediate thought was "brain not allowed". That was then duly borne out by the IBMer Bob: "no, we can't do a test that simulates real life (including startup), we MUST do it exactly the way the book says!"

  6. Pascal Monett Silver badge
    Trollface

    How to make an IBM engineer hyperventilate ?

    Easy : ask him how many years he's got until retirement.

    1. Yet Another Anonymous coward Silver badge

      Re: How to make an IBM engineer hyperventilate ?

      Trick question: if he can count them on his fingers, he's already been RIFed

    2. Anonymous Coward

      Re: How to make an IBM engineer hyperventilate ?

      Nah.

      Just ask them to work a full week in the office. They weren't called TWaTs for nothing (Tuesday, Wednesday, and Thursday).

  7. ColinPa

    Why not use the backup generators.

    I've told this one before.

    A bank was hit with a power outage, and they began the well-practised failover to the backup site. A passing senior manager/director told them: don't do that, we have generators in the car park for this sort of emergency, so use those and avoid the outage of switching sites. So they reluctantly restarted in place. Halfway through the power-on and restart, they found the generators did not have enough capacity for the machine room, and so they were stuck in limbo. They did not want to shut down halfway through an emergency restart, and they could not complete the startup in order to shut it down. They had to wait a couple of hours before the power was restored and they could complete the restart.

    When the incident was reviewed by the board, the manager/director had to explain that it was his decision, and admit that he did not actually know the generators' capacity; he had just paid for them.

    The IT team learned a lesson - there are times when you ignore the management chain and do what you have practised.

    1. andy 103

      Re: Why not use the backup generators.

      The IT team learned a lesson - there are times when you ignore the management chain and do what you have practised.

      That only works in situations where the (unknown, future) outcome is a success though. If the IT team tried "anything" and it did not go according to plan it would 100% be seen as their fault irrespective of the circumstances that preceded it.

    2. Adrian 4

      Re: Why not use the backup generators.

      So he was paying for something that didn't work ?

      Sounds like a useful lesson was learned there, too.

    3. Jon 37

      Re: Why not use the backup generators.

      I'd write down on paper:

      """

      To whom it may concern,

      I am aware that there has been a power outage, and the power is still out.

      I have been told by IT that the documented, agreed, and tested procedure is to fail over the IT systems to the backup site.

      I have been told by IT that restarting the IT systems here, on generators, is not the documented or agreed procedure, and has not been tested.

      I am ordering IT to try to restart the IT systems here, on generators.

      Signed

      ___________

      <Name of senior Manager>

      Date: xx/xx/xx Time: xx:xx

      """

      Then I would give that piece of paper to the director and ask them to sign it to confirm the order.

      Any sane person would take one look at that and refuse to sign it, and let the IT people follow the plan.

      If the director is stupid enough to sign, then they get what they deserve.

      1. bishopkirk

        Re: Why not use the backup generators- which generators?

        A few years ago work stopped in my hospital because they realised all the copper to the generators had been stolen… probably weeks had passed before they found out but it put a certain pressure on finishing those operations on time…

  8. IanRS

    Many, many years ago I was asked to create some tests for a file storage system. The intention was that a file could be moved between two storage areas, but you only ever saw it in one. The whole file could be accessed from 'A' or 'B', but you should never see it in both at once, nor should you ever be able to see a partial file anywhere. I specified: use a large file so you have a few seconds to act in, start the transfer, disconnect the network cable, wait a few seconds, check. No problem with that one. The second variation said: start the transfer, disconnect the power lead from one side, then the network cable, repower but do not reconnect the network, wait for startup to complete and check. The project flat out refused to run this test. Considering this was a system intended for use in combat areas (albeit in staging posts rather than the front line), I did not consider it an unreasonable scenario. However, the manual stated that systems must always be shut down cleanly by following the specified procedures.

    1. Anonymous Custard
      Terminator

      Ample proof, if needed, that manuals and procedures have the same chance of surviving contact with the enemy (or indeed friendly fire) as battle plans?

      1. A.P. Veening Silver badge

        Friendly fire ... isn't.

        1. Anonymous Coward

          yes it is...

          ...more accurate than enemy fire usually.

  9. Uncle Slacky Silver badge
    Thumb Up

    I see what you did there

    "Doug...found himself in something of a hole"

  10. SonofRojBlake

    Doug and Bob?

    Are they Metropolitan police officers with a difference?

  11. Anonymous Coward

    *yawn* Another "idiot pushes a button" story.

    1. Anonymous Coward

      *yawn* Another "idiot pushes a button" story.

      You know, I *was* thinking about clicking "upvote", but now... :-)

  12. Jonathon Green

    Doug is not the problem here…

    …OK, so pressing The Big Red Button is not something to be taken lightly, or to be done without consideration of the consequences, but given that this is supposed to have been a resilient process, if Bob doesn’t like the idea and can’t come up with a better reason than “the manual says not to do it, and there’s nothing in our procedures about it…”, then someone ought to be giving him Hard Stares and asking him Awkward Questions, and other people ought to be answering Awkward Questions about why Awkward Questions weren’t asked before Bob’s employer got the deal.

    1. Daedalus

      Re: Doug is not the problem here…

      Yes, well, if you'd care to put yourself in Bob's situation, you might think differently. After you've seen enough screwups, you assume that anything that can go wrong, will go wrong, and will do it at the most excruciating time.

      1. Anonymous Custard
        Trollface

        Re: Doug is not the problem here…

        And the advanced version being that things which cannot go wrong still will, just out of bloody-minded vindictiveness.

        Also at the worst possible moment, with the least amount of time and opportunity to recover or cope.

  13. Aladdin Sane
    Coat

    never turn anything off if you don't know how to turn it back on.

    And that, dear reader, is why my marriage failed.

    1. Yet Another Anonymous coward Silver badge

      Re: never turn anything off if you don't know how to turn it back on.

      Hope you had an off site backup

      1. Anonymous Coward

        Re: never turn anything off if you don't know how to turn it back on.

        Regularly test your off-site backup for malware infections.

        1. Benegesserict Cumbersomberbatch Silver badge

          Re: never turn anything off if you don't know how to turn it back on.

          Log to stderr

    2. Anonymous Coward

      Re: never turn anything off if you don't know how to turn it back on.

      Oh...

      Did you have a hardware failure?

  14. Anonymous Coward

    in a word YES

    Whilst a 2nd-year BTEC Electrical Engineering student back in 89/90, I "accidentally" cut power to the entire college campus! Picture the scene: about half a dozen spotty 18-year-olds dispatched to the college's main electrical switch room to make sketches of, and label, what we found in there. Any road up, in we all went to this quite small room, armed with A4 binders, paper and pencils. Had a bit of a butcher's around: OK, that's obviously the main breaker for the campus, chuffing great BIG switch like a one-armed bandit. Hmmmmmm, what's this small flippy thing on the side of it, says I. Press, CLUNK as the massive handle descends, then an eerie quiet as EVERYTHING was off! This lasted a few seconds until it was broken by the sound of my mates saying "oh sh1t" quite loudly, and the sound of doors opening and lecturers coming out into the corridor wondering what the feck was going on! Yes, I had successfully found and identified the main breaker's test switch!

    My mates, who were all fab and several of whom are still in touch 30 years later, covered for me and said I knocked the test switch with my A4 binder. For a whole term afterwards the college clock and bell were about 5 mins off, as that's how long it took to get the power back on and they didn't reset the clock hahahahahahahahahah

    1. TimMaher Silver badge
      Windows

      Re: “eerie quiet”

      Question “What is the worst sound in a server room?”

      Answer “No sound at all!”

      Copyright me about 1995

      1. Yet Another Anonymous coward Silver badge

        Re: “eerie quiet”

        >Question “What is the worst sound in a server room?”

        Velociraptor ?

    2. omz13

      Re: in a word YES

      No safety cover on the switch?!… that is just asking for it to be “accidentally” pressed

      1. Anonymous Coward

        Re: in a word YES

        nope and this was 30 years ago and the switch was probably a good 20 years old then!

  15. Flightmode

    About 15 years ago, a colleague and I went on site to a local PoP to install a third switch into an existing Cisco 3750 stack. The plan was simple - screw the switch in place, then just remove one of the two stacking cables between the existing ones, connect the third switch in-line, add a third cable, flip the breaker and configure the new ports. Simple, right?

    Only my colleague had a brief brain fart and flipped the breakers for the OTHER two switches, essentially shutting down access to the whole PoP and taking some thirty-odd thousand customers connected downstream with them. And to add insult to injury, when the stack booted back up, the NEW switch became the stack master, so all the port numbers were transposed and all the old ports had to be reconfigured. Via console. While both our phones were pinging like crazy trying to tell us that we'd taken down the whole site. Trust me, we knew. And that was the day we learned to always pre-configure the stack member serial numbers.

    (We still work together. And our colleagues still to this day remind us of that incident whenever we walk towards a Data Center together.)

    1. Anonymous Coward

      An ex-colleague of mine, working in the DC out of hours, was configuring what he thought was just a standard port, but he had mistyped and was configuring a totally different port, which unfortunately was also a trunk. Taking the port down initiated a failover to the other DC. Sh1t show!

  16. Giles C Silver badge

    First week at a new job

    Needed to get something off a shelf above the UPS; this UPS had an emergency off button that was about 1mm proud of the panel it was mounted on.

    Caught it with my knee and dropped the main comms room and a large AS400. It took an hour for the AS400 to come back up.

    I left the company 21 years later so it didn’t do any long term harm to my career.

    1. John Brown (no body) Silver badge

      Re: First week at a new job

      "I left the company 21 years later so it didn’t do any long term harm to my career."

      Revenge is a dish best served cold. But 21 years is taking it to a bit of an extreme :-)

  17. Ashto5

    Fortune favours the damn lucky

    While being less senior than now, in a moment of madness I decided to see if I could rebuild the entire DB and Data from scripts.

    It took me a few days to get it working and I was then able to recreate from scratch the main system of the company feeling pretty good about it.

    I went home and thought nothing of it, until I came in the next day to fire brigade lights flashing and the building shut. A water pipe had burst, and in the main server room a tsunami was occurring.

    The bosses were panicking about the business not surviving, so I mentioned that I could rebuild the system in a matter of hours IF they got my desktop out.

    Later that day the system was back up and running slowly on my PC.

    Heroic back-slapping ensued. Then it struck me that the data I had been working with was at least a week out of date. Best not to ruin their day, so I kept quiet.

    One of my finest hours.

  18. aerogems Silver badge
    FAIL

    IBM Should Give Him A Bonus

    Sounds like he managed to come up with a scenario not covered in their manuals, which absolutely should be. The "trigger happy" user who just goes around pushing buttons. If it happened once, it's bound to have happened before and would happen again. IBM should give the guy a bonus or something for finding that particular gap in their process.

  19. Anonymous Coward

    Sounds like VMware

    The DR failover to the backup datacentre thing.

    Test everything EXCEPT the big failover button. Failback is not part of the spec ;-)

  20. hoofie

    I was that idiot

    A number of years ago on a new hospital build with lots of IT

    It was time for the power down test on a data center. A proper power test - severing of supplies etc to test data centre failover.

    Everyone in a room. For some reason Data Storage guys were adamant they needed a soft shutdown because anything else would knack the arrays.

    Idiot here popped up and pointed out that if they were going to be damaged they weren't a lot of use in a major power loss scenario.

    All hell breaks loose with lots of angry questions from the customer to the Storage specialists.

    At that point I made my excuses and left before I got lynched.

    1. Giles C Silver badge

      Re: I was that idiot

      Sounds a perfectly reasonable request to me.

      If the system is that sensitive then it needs its own way of reacting to a power down, either a local ups or a battery system to write the final cache entries to the disks before it dies. It should be part of the chassis or firmware that handles this.

      Otherwise, there are things out of its control and it must be able to deal with them.

  21. Stuart Castle Silver badge

    A few years ago, I was part of a small team working on an inventory management system. Because we couldn't find a commercial one that fitted our needs 100%, our manager initiated a project to design and build one.

    We did, and the final system worked well for years. We had the development and test servers set up properly: we did development on the development server and tested as rigorously as possible with the resources available to us. One of my colleagues also had the responsibility of rolling out updates to the production server. In theory, he was the only one with rights to do this. This was a deliberate choice to prevent accidental updates. In fact, the whole process was slightly bureaucratic, again deliberately, for the same reason.

    The system had been in use for some months, when all of a sudden, the users started reporting errors. My colleague and I started investigating. The third member of our team apparently had a meeting, so vanished.

    After about 15 minutes' investigation, we had worked out what had happened. The system had its own database on SQL Server. It recorded everything the system did as a "transaction", and the transaction table had vanished.

    Upon further investigation, we found the transaction table had been renamed to a full stop. It turned out that the technician who had vanished for a meeting had renamed the table. Unfortunately, the version of SQL Server Management Studio we were using wouldn't let us access the table after this, so we had to restore it from the previous night's backup. Thankfully, we only lost a couple of hours' transactions.

  22. old-iron

    That DR plan is only as good as its execution

    A major oil firm realised they had a power outage at "DC1", but the UPS kicked in perfectly

    Process initiated to bring down systems carefully, as "DC2" is absolutely fine

    Systems/workloads closed satisfactorily, well in advance of the UPS dying

    ...

    Corporate meltdown then ensues as they'd shut down the systems in an orderly fashion in DC2

  23. Daedalus

    In related news....

    Bill Gates (for it is he) was so convinced that an install of the latest Windows was bulletproof that, while it was being demonstrated by his VP on stage and on live video across the planet, he pulled the plug on the test PC in the middle of the process.

    To say that the VP was "white faced" at that point is an understatement.
