Operation Desert Sh!tstorm: Routine test shoots down military's top-secret internets

Welcome to Friday. The weekend is almost upon us so put down that bacon sarnie and pick up today's On Call, The Register's weekly column of tales from the tipping point. Today's story comes from "Jason", who spent some quality time working as a local sysadmin "at one of the larger bases in Iraq" towards the end of the last …

  1. chivo243 Silver badge
    Pint

    I'm so glad we kept one!

    "I had argued for keeping at least one physical domain controller but to no avail."

    My boss was against virtualizing all of our estate. We still have one physical DC, and we weathered a vSphere sit-down because we kept one ;-}

    Have one on me Jason!

    1. Evil Harry

      Re: I'm so glad we kept one!

      "I had argued for keeping at least one physical domain controller but to no avail."

      We were always against putting our infrastructure monitoring platform in the virtual world as well for the same reason. Not being able to see what's actually down when the virtual infrastructure was having a rest never did seem like a good idea. It was a hell of a job getting management to reach the same conclusion.

      1. Anonymous Coward
        Anonymous Coward

        Re: I'm so glad we kept one!

        To be honest, there's nothing like actually triggering a disaster to educate the management as to why you really do know best and they do not.

        1. A.P. Veening Silver badge

          Re: I'm so glad we kept one!

          To be honest, there's nothing like actually triggering a disaster to educate the management as to why you really do know best and they do not.

          And if you do it correctly, you get rid of a lot of dead wood in management as well, collecting the reduction bonus as well ;)

          1. Anonymous Coward
            Anonymous Coward

            Re: I'm so glad we kept one!

            And if you do it correctly, you get rid of a lot of dead wood in management as well, collecting the reduction bonus as well

            BOFH, is that you?

            1. A.P. Veening Silver badge

              Re: I'm so glad we kept one!

              BOFH, is that you?

              I am a Programmer, not an Operator, but I've got my Bastard papers (though my parents were already married to each other before I was conceived ;) ).

          2. Anonymous Coward
            Anonymous Coward

            Re: I'm so glad we kept one!

            Clearly you don't know the US Federal ways... that is how promotions happen. If you keep the incompetent above the level of actual operations, things stay working.

    2. Anonymous Coward
      Anonymous Coward

      Re: I'm so glad we kept one!

      The problem with vSphere is it doesn't support automatic startup/shutdown of VMs when ESXi nodes are powered on/off if hosts are in a vSphere HA cluster. Otherwise you can set some machines to start automatically, and in which order - I use exactly that to have DCs and other critical services become available as soon as possible without user intervention.

      Still, I too prefer to have at least one physical DC - even if it's a small blade in a blade enclosure.
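
The start-order idea in the comment above can be sketched in a few lines. This is only an illustration: the VM names and the dependency map are hypothetical, and it works out an order on paper rather than talking to vSphere or any other hypervisor.

```python
from graphlib import TopologicalSorter

# Hypothetical VM names (assumptions for illustration only).
# Each key lists the VMs that must already be running before it starts.
depends_on = {
    "dc01": set(),
    "dns01": {"dc01"},
    "vcenter": {"dc01", "dns01"},
    "mail": {"dc01", "dns01"},
}


def start_order(deps):
    """Return VM names ordered so every VM comes after its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = start_order(depends_on)
print(order)
```

Because the domain controller has no prerequisites and everything else depends on it, it always lands first in the list, which is the property the commenter is after.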

      1. Anonymous Coward
        Childcatcher

        Re: I'm so glad we kept one!

        True but there are mitigations. Don't let your entire cluster go offline (lol). Use DRS to pin a VM to a single host and set it to start automatically. No DRS? Then don't move the VM. You can script all this stuff so that you can move a VM and do updates and then put it back again. It's a bit naff but there you go.

        It's not that vSphere doesn't allow you to use auto startup/shutdown in a cluster; it will forget the setting as soon as a VM moves away from the host you set it up on.

      2. K

        Re: I'm so glad we kept one!

        I've got a vSphere cluster at home (yeah, I'm that much of a geek), and whilst I've never tested this... The auto-start settings do migrate with a VM, you just need to ensure the Host has auto-start switched on!

        But as I said, I've not tested if the VMs actually fire up yet...

        1. CrazyOldCatMan Silver badge

          Re: I'm so glad we kept one!

          I've got a vSphere cluster at home (yeah, I'm that much of a geek)

          I use Proxmox (linux/KVM-based virtualisation server) at home and I can confirm that all the VMs marked as "start on boot" do actually start on boot..

          Having a UPS that's been flashing a battery warning on one of the days where a severe thunderstorm knocked out the local power substation is not a good combination..

          Still, everything came up again and nothing was lost (apart from me nearly having a nervous breakdown!)

        2. Anonymous Coward
          Anonymous Coward

          Re: I'm so glad we kept one!

          From the vSphere product documentation for v6.7...

          "The Virtual Machine Startup and Shutdown (automatic startup) setting is disabled for all virtual machines residing on hosts that are in a vSphere HA cluster. Automatic startup is not supported with vSphere HA."

          From experience... Although you can sometimes go into a host's settings and configure startup/shutdown options, any settings you make for the automatic startup or shutdown of VMs will not work.

          We get around it by having a customised local.sh script that runs when a host boots, to start some VMs for us.

      3. Trixr

        Re: I'm so glad we kept one!

        I prefer just a simple pizza box - redundant power supplies, 2 CPUs in actual sockets, two sticks of RAM (no, I do not want the 32GB module - 2 x 16 please - also, one single stick should be big enough to hold the AD database in memory), two disks in a mirror set, 2 NICs in an HA team connected to different switches. If you've only got one switch, plug in 2 cables anyway - the only thing it protects you against is a bad port, but sometimes Ports Go Bad.

        Sure, you need power and network, but those are the only dependencies. They're also easier to get into and fix things if hardware goes bad (rather than needing approval to do something in a blade enclosure that's also running your email and other critical systems - yes, you shouldn't need to go thru change control contortions to pop a blade and fix its hardware, welcome to the real world).

    3. Rich 11

      Re: I'm so glad we kept one!

      "I had argued for keeping at least one physical domain controller but to no avail."

      We were successful in making this argument, fortunately. It seems like such an obvious safety net that it's well worth implementing, and a surprise that it can be an uphill struggle to get agreement for it. It's almost like manglement believe the salesmen when they say, "Our proposal will replace all your estate and save you tens of thousands a year."

      1. Caffeinated Sponge

        Re: I'm so glad we kept one!

        Same way they believe the salesperson who sells them the wonderful new software without mentioning the HW resources and infrastructure required.

    4. NoneSuch Silver badge
      Coat

      Re: I'm so glad we kept one!

      Jason was probably let go for the failure and the guy who designed the crap system was promoted and given more power.

      That was my experience in the military anyway. Pub O'Clock.

      1. macjules

        Re: I'm so glad we kept one!

        That's my experience of any government system - the idiot gets a bonus and a promotion while the Poor Bloody Expert gets zilch.

        1. A.P. Veening Silver badge

          Re: I'm so glad we kept one!

          while the Poor Bloody Expert gets zilch fired.

          FTFY :(

          1. Doctor Syntax Silver badge

            Re: I'm so glad we kept one!

            Not necessary. The Poor Bloody Expert gets pissed off and leaves.

    5. StargateSg7

      Re: I'm so glad we kept one!

      This is ONE reason it is nice to be SOOOOO SMART that I can just modify Windows Server code directly and any VM management software AND their signing keys myself using Assembler and ENSURE they start and work EXACTLY as I feel like completely OBLITERATING any group policies or rights assignment! I just bypass them completely and put my own rights in!

      Nice to be able to do that! There isn't a system I haven't jail broken! Even Apple's iOS CHIP-based one!

      .

  2. Anonymous Coward
    Anonymous Coward

    >Ever found yourself at the sharp, pointy end of a routine test gone wrong or uttered the words "I warned you this would happen"

    <heavySlavicAccent>of course. I was electric engineer in Tchernobyl. I argue for ever that vodka supply should be limited on the days we run test</heavySlavicAccent>

    1. Chris King
      Mushroom

      Was this you, Sergey ?

      (From "Star Wreck: In The Pirkinning" - icon, because he was feeling cold)

      1. bpfh
        Pint

        But he was using light beer

      2. Kiwi
        Pint

        I'd forgotten that! Thanks!

        Nearly finished watching B5 with some friends who hadn't yet been introduced... Something to finish with...

        Amazing how some jokes can carry across language barriers...

        1. Chris King
          Coat

          "Something to finish with..."

          Or even Finnish with ?

          Icon's a no-brainer...

  3. ColinPa

    Recovering after loss of power - paper bootstrap.

    Someone told me of a similar problem. These guys had a document describing exactly what to do (start the generators....), and it had been well tested. However, this document was stored on a networked file system, and yes, they needed the NFS and infrastructure to be up to be able to get to the document to tell them how to start the infrastructure. They learned to keep a print-out in the desk.

    1. Anonymous Coward
      Anonymous Coward

      Re: Recovering after loss of power - paper bootstrap.

      Reminds me of an outage of the secure system for storing passwords, that could only be booted using a secure card, that was stored in the safe, whose combination was in the secure system for storing passwords... Oops.

      1. Christoph

        Re: Recovering after loss of power - paper bootstrap.

        As detailed here

      2. Mark 85

        Re: Recovering after loss of power - paper bootstrap.

        Murphy does teach some hard lessons.

      3. YetAnotherLocksmith Silver badge

        Re: Recovering after loss of power - paper bootstrap.

        You should've programmed RealLife 1.0 in something that checks for circular dependency at compile time, rather than in real time.

        Fortunately, that safe can be opened.
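
The "check for circular dependency up front" joke can be made concrete. The sketch below models the password-store story from earlier in this thread as a dependency map (the node names are paraphrased from that comment, not from any real system) and uses Python's standard `graphlib` to report a cycle before anyone has to discover it during an outage.

```python
from graphlib import TopologicalSorter, CycleError

# Each item maps to what you need in hand before you can get at it.
# Node names are paraphrased from the comment above, purely for illustration.
needs = {
    "password store": {"secure card"},
    "secure card": {"safe"},
    "safe": {"safe combination"},
    "safe combination": {"password store"},
}


def find_cycle(deps):
    """Return the list of nodes forming a cycle, or None if there is none."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        # graphlib puts the detected cycle in the second element of args;
        # the first and last node in that list are the same.
        return err.args[1]
    return None


cycle = find_cycle(needs)
print(cycle)
```

A recovery runbook that passes this kind of check still needs testing, but at least it cannot depend on itself.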

    2. Struggling to find a different name
      Facepalm

      Re: Recovering after loss of power - paper bootstrap.

      Reminds me of the time I was conducting an audit of an offshore oil production platform. The company had introduced a document control policy whereby the only printed documents allowed were those issued as part of a formal work pack and permit (and valid for the day of issue only). I advised the OIM (Offshore Installation Manager) that this didn’t work for daily/routine operations (that were the focus of my visit) which didn’t require a specific work permit. But he was unmoved - that was company policy.

      When I was in the control room I asked how they recovered from a power outage. “We start up from the recovery checklist,” was the response. You’ve guessed: that was reached via a desktop PC - the document itself being stored on the network. Neither PC nor network access was hooked to the battery backup as that needed to be devoted to maintaining platform safety. Then a confession - there was a printed copy in the desk drawer. My audit finding didn’t go down well but my case for a policy change got the OIM’s support.

      1. Doctor Syntax Silver badge

        Re: Recovering after loss of power - paper bootstrap.

        Great first post.

      2. MonkeyCee

        Re: Recovering after loss of power - paper bootstrap.

        "My audit finding didn’t go down well but my case for a policy change got the OIM’s support."

        I thought that was the exact point of such an audit.

        Bad policy is made. In order for this to be changed either management listen to common sense (if they didn't like the report, they aren't keen on being told they are wrong) or you prove this is a bad policy because it is being circumvented.

        It's like security. If you make it too inconvenient relative to the protection needed, people will just find a way around it.

        1. vtcodger Silver badge

          Re: Recovering after loss of power - paper bootstrap.

          people will just find a way around [security].

          Or, worse, they won't -- in which case, you are secure, but dead in the water.

        2. Anonymous Coward
          Anonymous Coward

          Re: Recovering after loss of power - paper bootstrap.

          Audits generally don't find bad procedures. They verify you are following procedures (good, bad, idiotic, etc.). A good auditor in the right environment may be able to suggest process improvements, but in general they are there just to verify the processes are being followed.

          1. taxythingy

            Re: Recovering after loss of power - paper bootstrap.

            I've found the key with audits is to be good enough in the first place. We can then spend more time talking with the auditors and learning from their experience.

    3. Anonymous Coward
      Anonymous Coward

      Re: Recovering after loss of power - paper bootstrap.

      "They learned to keep a print-out in the desk."

      The corollary of this story is the site that has a paper print out of instructions for how to start things up in the correct order where the equipment has changed significantly and the order no longer applies.

      A previous employer had a two hour startup procedure that was used for annual building maintenance and very complex (down to power supplies per devices level).

      Through chance (a power failure that took many hours to restore, the "regular power" generator starting, but the "critical systems" generator failing to start, and the "critical systems" UPS only having around 8 hours capacity) we got to test how quickly we could power up and check systems during the working day.

      It turns out 90% of the order for equipment startup had been corrected by either software/hardware updates or configuration changes or was a case of operational mistakes in the past. Total startup time was ~30 mins including all testing and monitoring systems recovering fully.

    4. Anonymous Coward
      Anonymous Coward

      Re: Recovering after loss of power - paper bootstrap.

      I once witnessed a fire marshal, while the building was actually on fire, go to Sharepoint to find the document on what to do about it.

      That turned out to be to raise a ticket to Facilities. Who to be fair to them did meet their SLA on that day...

      1. Wexford

        Re: Recovering after loss of power - paper bootstrap.

        Subject: Fire. "Dear Sir stroke Madam, I am writing to inform you of a fire which has broken out at the premises of..." No, that's too formal. [deletes] "Dear Sir stroke Madam. Fire, exclamation mark. Fire, exclamation mark. Help me, exclamation mark. 123 Clarendon Road. Looking forward to hearing from you. All the best, Maurice Moss."

    5. JulieM Silver badge

      Re: Recovering after loss of power - paper bootstrap.

      Shades of Del Boy's combination lock briefcases, with the combination printed on a slip of paper inside each one .....

    6. Angry IT Monkey
      FAIL

      Re: Recovering after loss of power - paper bootstrap.

      That only works if they get followed.

      Early in my career my boss insisted I create, print and laminate procedures for critical systems for the new guy (moved from office temp to IT support thanks to an embellished CV) to follow when I wasn't around.

      One day I come back from buying lunch to office panic.

      Boss: (dirty look) The main ISP went down 20 mins ago, why isn't there a procedure to switch to the backup?

      Me: (calmly points to laminated procedure pinned to the server cabinet in clear line of sight 6 feet away) You mean like that one?

      Boss: Oh. Yeah. Didn't see that.

      Me: That you asked me to create and print out?

      Boss: Yeah, I didn't..

      Me: That you had (non-technical new guy) laminate and pin to the server cabinet?

      Boss: We didn't think to look there...

      Me: So between the 2 of you it didn't occur to look at the wall of procedures 1 of you told me to create and the other physically pinned up there? You've spent 20 mins doing what?

      Boss: (to new guy) Let's follow this procedure as far as we can and use this as an opportunity to improve it.

      I then calmly sit at my desk to finish lunch while Tweedle Dumb & Tweedle Dumber work through the instructions. They fix the problem quickly then Boss decides to pick it apart with awkward "what if" questions. The answer to each was "It's in the procedure, read all of it"

      Luckily years of family tech support taught me to write for non-technical folks. The new guy had a post-it with the AD admin password on his monitor and the manager couldn't see why that was a problem in an office with high staff traffic, so not exactly IT professionals.

      My 2nd best day working there, runner up to the day I left :)

      1. A.P. Veening Silver badge

        Re: Recovering after loss of power - paper bootstrap.

        My 2nd best day working there, runner up to the day I left :)

        Let me guess, your boss was not well pleased that you left.

        1. Angry IT Monkey
          Pint

          Re: Recovering after loss of power - paper bootstrap.

          Yeah he was off sick my last day to avoid the whole "All the best, here's something we pitched in to show our appreciation" malarkey. Did get presents off the cleaners and HR manager tho, cleaners also brought tea / coffee 3 times a day so IMHO were *the* most important part of the business :)

          Heard on the grapevine he blamed me for a ransomware attack months after I left. He convinced the higher-ups he'd phoned me and I'd refused to give him the unlock code so they'd lost all their data. The backups had failed for ages but the procedure to check them daily (you guessed it, pinned to the server cabinet) hadn't been followed since I left.

          They're still in business, IT manager was demoted after his boss retired (hmm...) and support is contracted out to a company once owned by a drinking buddy of the MD. Useless new guy is still there and widely regarded as useless.

          Icon - to those who've survived IT Hell, to those yet to escape IT Hell, or for the Hell of it!

      2. Anonymous Coward
        Anonymous Coward

        Re: Recovering after loss of power - paper bootstrap.

        "then Boss decides to pick it apart with awkward "what if" questions."

        BTDT, people always do this when they've been shown up.

        > The answer to each was "It's in the procedure, read all of it"

        My response tends to be (quite deliberately) more strongly worded and loud enough for other people to hear, in order to make them feel embarrassed enough to actually READ the fucking things.

        Anon, because I have to work with them.

        1. Angry IT Monkey

          Re: Recovering after loss of power - paper bootstrap.

          By that point I'd probably resigned myself to just getting out, all the previous "robust explanations" clearly had no effect.

          Being heard over the industrial machinery would take too much effort and most people knew what he was like anyway.

      3. Anonymous South African Coward Bronze badge
        Trollface

        Re: Recovering after loss of power - paper bootstrap.

        Luckily years of family tech support taught me to write for non-technical folks. The new guy had a post-it with the AD admin password on his monitor and the manager couldn't see why that was a problem in an office with high staff traffic, so not exactly IT professionals.

        Bonus points for creating a GPO that will :

        1. Clear all server logs

        2. Prevent access to the GPO editor

        3. Institute a 2-day password policy with seriously complex requirements

        4. Change all desktop themes to the Hotdog Stand theme from Windows 3.1

        5. And reboot the servers every 15 minutes.

        We need a BOFH icon. Seriously.

  4. Secta_Protecta

    Common Trap

    I've seen the same thing when an HP EVA went titsup, bringing down VMware including all DCs. An engineer ended up having to go to the DC and log into the HP CommandView server with a local account before starting the laborious task of getting the EVA back up with an HP Engineer on the phone. That was a long day...

  5. Anonymous Coward
    Anonymous Coward

    don't wait 20 mins !

    "The batteries abruptly died, taking everything down with them."

    Of course, they have to run out of juice someday, and he didn't know for how long they could keep up.

    The sensible thing he should have done was put all the infra down except the critical services, but immediately after seeing the situation was uncontrolled, not wait 20 mins and pray ...

    This way, he may have had enough running time to allow for the power to be restored.

    And yes, whoever thought it would be a good idea for the local team to not have Vsphere access was a complete retard.

    1. Anonymous Coward Silver badge
      Boffin

      Re: don't wait 20 mins !

      A bit of sideways thinking would've seen a few vehicles hooked up to those batteries to extend their runtime.

      Basically a diesel engine and alternator is the same whether it's in a generator or in a car... different ratings and output level, but if you're running off car batteries then using a car to charge them is not too far fetched!

      1. Doctor Syntax Silver badge

        Re: don't wait 20 mins !

        The batteries might have been connected in a series/parallel arrangement to deliver more than 12v. In that case you also have to arrange the right number of vehicles in series as well.

        1. vtcodger Silver badge

          Re: don't wait 20 mins !

          And it might be a really super idea for someone who knows exactly what they are doing to test that battery charge plan ahead of time. Trying to improvise under pressure has a certain potential for turning a potential problem into two current problems -- dead battery backup plus some amount of damaged batteries and equipment.

          1. Anonymous Coward
            Anonymous Coward

            Re: don't wait 20 mins !

            .. plus all the lovely fumes from both all those vehicles in series (THAT I want to see :) ) and the not-so-innocent vapour from batteries being charged.

            Yeah, testing beforehand may be a slightly better approach.

            1. Kiwi
              Boffin

              Re: don't wait 20 mins !

              .. plus all the lovely fumes from both all those vehicles in series (THAT I want to see :) ) and the not-so-innocent vapour from batteries being charged.

              Not been to any carpark or major highway in recent times? Or a sports ground/large shopping mall at quitting time?

              1. Anonymous Coward
                Anonymous Coward

                Re: don't wait 20 mins !

                Fewer battery vapours there..

                1. Kiwi
                  Boffin

                  Re: don't wait 20 mins !

                  Lots and lots of cars idling, many trying to top off their battery after just starting, few even close to normal operating temps..

                  I'd suggest a few in the desert adding some time to a battery bank would produce far less fuel fumes or hydrogen vapour than the equivalent area of a motorway/main road at peak times :)

            2. Doctor Syntax Silver badge

              Re: don't wait 20 mins !

              "Yeah, testing beforehand may be a slightly better approach."

              Testing beforehand is often regarded as optional and seldom has a budget. Recovering from the problem usually has one allocated immediately, just hope there's enough in the kitty to provide it.

            3. Alan Brown Silver badge

              Re: don't wait 20 mins !

              " and the not-so-innocent vapour from batteries being charged."

              _That_ only happens under 2 conditions:

              1: You're putting too much current into them (unlikely as static batteries are usually far larger than vehicle ones - and in any case solved by reducing RPM)

              2: The batteries are fully charged and above float voltage - in which case the vehicle would have been regularly shagging its batteries anyway.

              You don't need 500A jumper cables for this kind of scenario and they're a liability due to their current capabilities. Normal cables and a _VERY LARGE_ (as large wattage as you can get) 12-24V lightbulb in series work better.

        2. Gaius

          Re: don't wait 20 mins !

          In that case you also have to arrange the right number of vehicles in series as well.

          They could have used a tank regiment, on a military base!

      2. ma1010

        Re: don't wait 20 mins !

        I am a ham radio guy and do training for emergency communications. Virtually all our radios run on 12 volts, and I have told folks for years that, no matter where you are, you have a 12 volt generator set available if you have a vehicle with fuel.

        My own "box 'o radios" has a power gate with two inputs: a 120 V in / 12 V out power supply powered by either mains or a 120 V generator and a cable to my car battery. The power from it goes to the power bus for all the equipment. That way, even if the 120 V generator we're using in the field stops, I still have power. If the outage is extended, I can start my vehicle and use its alternator to keep the car battery charged. Simple and works very well.

        1. Orv Silver badge

          Re: don't wait 20 mins !

          Yup. Although you would be shocked how long it takes to charge a car battery at idle. Alternators do not put out anywhere close to full rated power at idle, and they taper off current rapidly as the battery charges. In my camper my experience is going from 50% to 80% charge on a 40 Ah battery takes close to an hour of idling. Getting that last 20% takes a solid two hours of driving at highway speeds.
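
The figures in the comment above imply a fairly modest charging current. The arithmetic below is a rough sketch only: it assumes the quoted 40 Ah capacity and state-of-charge percentages are accurate and ignores charging losses, so real alternator output would have been somewhat higher than these averages.

```python
def average_charge_current(capacity_ah, start_frac, end_frac, hours):
    """Average current (A) needed to move a battery between two charge levels.

    Ignores charging inefficiency, so this is a lower bound on what the
    alternator actually had to deliver.
    """
    return capacity_ah * (end_frac - start_frac) / hours


# 50% -> 80% of a 40 Ah battery in about one hour of idling.
idle_amps = average_charge_current(40, 0.50, 0.80, 1.0)

# 80% -> 100% over about two hours of highway driving.
taper_amps = average_charge_current(40, 0.80, 1.00, 2.0)

print(f"idling: about {idle_amps:.0f} A average")
print(f"final top-off: about {taper_amps:.0f} A average")
```

Both averages sit far below the 100 A or so that vehicle alternators are typically rated for, which matches the point about output at idle and current tapering as the battery fills.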

        2. Dave 32
          Coat

          Re: don't wait 20 mins !

          One issue is that a lot of automobiles get quite unhappy if left idling for an extended period of time in hot weather. They tend to overheat. It's a lot easier on them if they're moving, such that the airflow from the movement can help keep the engine cool. Now, that may not apply to military vehicles, which are designed to operate in extreme environments. Maybe.

          The other issue is that automotive alternators tend not to produce much power at idle speeds. The situation is not as bad as it used to be with generators, but, to get the rated capacity out of an alternator, it may be necessary to run a gasoline engine at 1500 RPM, rather than at the 500 RPM idle speed. Diesel engines may be different, though, since Diesel engines typically operate at lower speeds, and the alternators would, presumably, be geared differently (via the pulleys/belts which drive them).

          Dave

          P.S. You'd better have a fairly long set of very heavy gauge cables to connect the vehicle up to the batteries. Even a 1000 Watt UPS is going to be pulling over 83 Amps from a 12 Volt battery! Also, don't forget that a lot of vehicle alternators are only rated at 100 Amps or so.

          P.P.S. I'll get my coat. It's the one with the 4/0 jumper cables in the pocket.
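
The "over 83 Amps" figure follows directly from I = P / V. The sketch below reproduces it; the optional inverter-efficiency parameter is an added assumption (not something the comment states) to show how losses push the draw higher still.

```python
def dc_current_amps(load_watts, battery_volts, inverter_efficiency=1.0):
    """Current drawn from the battery side for a given output load.

    inverter_efficiency is a hypothetical knob: 1.0 means a lossless
    conversion, which is what the 83 A figure in the comment assumes.
    """
    return load_watts / (battery_volts * inverter_efficiency)


ideal_12v = dc_current_amps(1000, 12)
ideal_24v = dc_current_amps(1000, 24)
lossy_12v = dc_current_amps(1000, 12, inverter_efficiency=0.9)

print(f"1000 W at 12 V, lossless: {ideal_12v:.1f} A")
print(f"1000 W at 24 V, lossless: {ideal_24v:.1f} A")
print(f"1000 W at 12 V, 90% efficient: {lossy_12v:.1f} A")
```

Set against an alternator rated at roughly 100 A, a single vehicle at 12 V has little headroom for even a 1000 W load, and doubling the bus voltage halves the current the cables must carry.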

          1. katrinab Silver badge

            Re: don't wait 20 mins !

            Turn the heating on full blast and open the windows? Does that still work to keep the engine cool?

          2. 404

            Re: don't wait 20 mins !

            US military uses 24v on their vehicles - have to step it down perhaps?

            1. Kiwi
              Pint

              Re: don't wait 20 mins !

              Charge 2 batteries in series?

              (Not if you're keeping them hooked up to the backup system, but if your bank is large enough to let you split off bits...)

              That said, if they use 24v and the backup batteries are old vehicle batteries...

          3. Orv Silver badge

            Re: don't wait 20 mins !

            Commercial service vehicles often have a "high idle" switch for this reason (and heavy-duty cooling systems to match). It's still a very inefficient way to charge batteries; in the long run a separate generator with a more appropriately-sized engine will pay for itself, if you're doing this sort of thing regularly. But then you have another engine to maintain...

    2. Marshalltown
      Pint

      Re: don't wait 20 mins !

      "...And yes, whoever thought it would be a good idea for the local team to not have Vsphere access was a complete retard...."

      Government SOP where "sensitive systems" are concerned. Keep the authorized personnel on a different continent, in a different time zone, without 24 hour coverage.

      1. Mark 85

        Re: don't wait 20 mins !

        Government SOP where "sensitive systems" are concerned. Keep the authorized personnel on a different continent, in a different time zone, without 24 hour coverage.

        I'm surprised the person who had the password wasn't required to fly to the site. Seems to be the way government usually works.

    3. JimboSmith Silver badge

      A friend told me of the time three backup generators at his dad's workplace fell over. The reasons were, from memory, that the first was officially out for maintenance, the second developed a fault shortly after starting up, and the third ran out of diesel. Now that might be embarrassing for any company but it's a bit worse when you're a large multinational oil firm.

      A broadcasting firm I worked for was getting ready for Y2K. We'd been told, "Whatever happens, you will keep broadcasting into the millennium." This was, I believe, to help stop people from panicking in the event that things elsewhere went TITSUP thanks to the bug. So, with generator and UPS at the ready, at 8pm we told any remaining people in the offices to save their work and shut down their computers. I was monitoring one floor so not at the generator or UPS. The idea was that the studios would retain power, as would everything needed to broadcast that was not in a studio. Strangely, what was supposed to be a 15-30 min test lasted only 2-3. The reason was that the UPS had worked as expected but the generators hadn't been automatically switched over to. I was away the day of the post mortem so can't tell you exactly where the fault was. Everything was fixed, worked as expected on the next test and was thankfully unneeded on December 31st/January 1st.

      1. Anonymous Coward
        Joke

        > I was away the day of the post mortem so can't tell you exactly where the fault was.

        We blamed you. :-)

        1. JimboSmith Silver badge

          You're closer to the facts than you know, although with a different team. I worked somewhere where we had a small team and the name of everyone went into a jiffy bag (we didn't have a suitable hat). Then, when a cock-up occurred and blame wasn't easy to attribute, a name was pulled out of the bag, and if it was your name it was your fault. It didn't matter if you were having the day off when it happened: still your fault, and you fixed it. It worked well because it made everyone potentially responsible, and other people knew it probably wasn't actually your fault despite the fact that you were fixing it. Not sure HR approved though.

          1. A.P. Veening Silver badge

            Not sure HR approved though.

            Of course HR didn't approve, their names also went in the bag/hat, but twice because HR ;)

          2. OurAl

            The sign on my office wall... " I never said it was your fault, I said I was blaming you "

      2. Orv Silver badge

        My mom worked at a hospital where the generator failed during an outage because the bottom of the diesel tank was full of sludge. The sludge clogged the filter and that was it. They tested infrequently, and when they did only for a few minutes at a time, which was insufficient to keep the fuel fresh.

        1. Anonymous Coward
          Anonymous Coward

          Ah yes, the expiry/unmix date of fuel. Not noticed in petrol stations because of the rapid turnaround, very much in play with stationary long term storage and boy, is that sludge a pain.

          I'm in the process of planning a number of new offices, and two of them need power resilience - thanks for the process reminder..

          1. Orv Silver badge

            I knew someone who had a backup generator for his house that was powered by natural gas. Might make sense if the natural disasters in your area aren't the type to interrupt gas service. I mean, it's probably a good idea for snowstorms, windstorms, etc., probably not such a good idea if the threat is earthquakes. (He was in Michigan, so no worries there.)

            At home I have a portable emergency generator that runs on propane from standard 20-pound BBQ tanks. Unlike gasoline or diesel it won't go bad in storage. Also, gasoline is impossible to buy if there's no power to the pumps, but places around here sell pre-filled propane tanks on an exchange basis.

        2. SImon Hobson Bronze badge

          Yes, that's a real problem with standby generators.

          However, there is a good way to reduce the problem. There are a couple of outfits that will pool your generator capacity and sell it to the grid as STOR (Short Term Operating Reserve) - ie stuff they can call on when there's either an unplanned loss of a large generation plant, or a sudden spike in demand, or even to work over the "half time and the kettles go on" and there's limited spare capacity.

          You get paid just to have the facility available, and get paid more when it's used. But when it is used, it gets to turn over your fuel supply.

          In addition, the mods required to allow you to run the generator and export power means that when you do run your own tests, you can properly load up the generator by tweaking it to try and produce a slightly higher voltage/frequency than the grid - and thus load up the generator to full power. Without this parallel operation, it's difficult to properly test the genny, and manglement are generally reluctant to spend the extra money over a simple "switch over with break in supply" arrangement.

      3. tinman

        we had an outage at one of our major hospitals when the mains power supply went down after the supplier lost a transformer external to the site. No generator you ask? Oh yes, estates had those and tested regularly, but only under a brown-out condition with the mains still available and not a black test with no external power.

        And that's when they discovered that the battery packs necessary to kickstart the gennies were all expired so they couldn't start automatically. It took fifteen minutes to get someone in to restart them manually but you can bet the testing SOP was rewritten very quickly afterwards

      4. Alan Brown Silver badge

        "Strangely what was supposed to be a 15-30 min test lasted only 2-3"

        Funnily enough, Telecom NZ had the opposite problem whilst preparing for Y2K - a 15-20 minute test resulted in a 20 hour outage for 20,000 customers and 6-8 weeks of chaos as several years' worth of database logs of number changes were replayed.

        Yes, they'd been backing up just fine - but they were backing up corrupted data and no one had noticed for a _very_ long time.

        Sometimes the UPS testing and recovery procedures work just fine - but the ancillary activities (like a "precautionary restart" of critical equipment) are where you find your shoelaces have not only been tied together but someone superglued your feet to the floor too.

  6. diver_dave

    During the Manchester BT fire one of my colleagues gatecrashed a DR recovery meeting to TRY and point out that just because our Stockport office could call its own lines didn't necessarily mean that the whole system was back up. He mentioned that if the whole system was available we should be able to call BETWEEN offices.

    He was reprimanded for being negative.

    Ah., Joy...

    Pints all

    Good weekend.

    Dave

    1. ArrZarr Silver badge
      Headmaster

      I really hope that DR stands for something other than Disaster Recovery in that context.

      1. diver_dave

        Well..

        The day it started I was first in at 7. By the time anyone from the DR team was in I had already arranged the Linklines to be pointed at another office. Then a communication went out internally and I recorded a temp message for the top level IVR to explain any delays to customers. I'd also arranged a single temp line by virtue of dropping a cable out of the window to Otis, who had the floor below and were on a dedicated C&W system. No idea why they were still running as I thought they used the BT backbone. System D in action.

        Fortunately we were running a beta of Sametime so had some additional real time comms

        Head of DR rolled in at 9:30. I paged them at 7:10.

        My colleague and I came out of it quite well despite steamrollering the DR process. DR did not exactly cover themselves with glory that week.

        Dave

    2. Doctor Syntax Silver badge

      "He was reprimanded for being negative."

      BT management at its most typical.

      However, being in a meeting kept them safely out of the way.

      1. Mark 85

        However, being in a meeting kept them safely out of the way.

        Only if you nail the doors shut and turn off the phones in the room.

        1. Doctor Syntax Silver badge

          "Only if you nail the doors shut and turn off the phones in the room."

          No problem about the latter. It was the phone building on fire.

        2. the Jim bloke
          Pirate

          You forgot about sealing all the air vents

          1. A.P. Veening Silver badge

            You forgot about sealing all the air vents

            With proper construction those air vents are smoke inlet vents in the board room, but you need a Bastard Architect From Hell to get it working properly.

            1. Alan Brown Silver badge

              "those air vents are smoke inlet vents in the board room"

              No need for that, just lock the doors and ensure that the inert gas fire suppression system has an outlet in there.... After all, smoke damage is hell to clean up.

              And it's in there because they saw all that space being used for computers and decided they'd have it for the boardroom - relegating all the servers (and IT staff) to an unventilated space situated in the basement under the toilets.

    3. Orv Silver badge

      Next you'll tell me 'ping localhost' doesn't tell me if the network is up. ;)

      1. diver_dave

        Oh.....

        You are so much closer than you think!

  7. Anonymous South African Coward Bronze badge

    We also do virtualize our stuff, but on MS HyperV

    The HyperV host that is hosting the domain controller is not part of a domain, it is a standalone host. Just because. And I don't trust things 100%. Because Mr Murphy.

    Gotta love Mr Murphy.

    I decided on that because a DC in a VM is easier to move over to another host than transferring a physical DC from one server to another.

    1. Mr Humbug

      I am confused

      > because a DC in a VM is easier to move over to another host

      But don't the hosts have to be members of the domain in order to move a VM between them? Or would you just copy the virtual disk file(s) and create a new VM to use them?

      My Hyper-V hosts are members of the domain, but they will start up OK, allow you to login (with cached credentials) and start virtual machines without a DC being available

      1. mr_souter_Working

        Re: I am confused

        "But don't the hosts have to be members of the domain in order to move a VM between them?"

        no - it takes some work, but you can get a Hyper-V cluster running with standalone (workgroup) servers - and you can have servers move between them - provided you have decent shared storage.

      2. Anonymous South African Coward Bronze badge

        Re: I am confused

        But don't the hosts have to be members of the domain in order to move a VM between them? Or would you just copy the virtual disk file(s) and create a new VM to use them?

        No. As long as you can do any of the following :

        - Restore from a good backup

        - Move the VM over to another host (said host also doesn't need to be part of the domain, just a standalone host)

        and the VM starts up fine without any errors, you're good to go.

        All of the backup DC's can be on hosts joined to a domain, I'm more concerned about the primary domain controller, for with it you have the keys to the kingdom...

        You can also move VM's between standalone hosts, but it takes a bit more effort. (See the previous poster's post above mine).

        Replication between standalone hosts is a major PITA to set up, but replication between hosts on the same domain is a piece of cake.

        But should the worst come to the worst, and you cannot get the host up and running, but can access the storage and copy the VM's VHD off, then you can just copy this to another host, set up a new VM with the existing VM's virtual HDD and Bob's your uncle.

  8. Steve Davies 3 Silver badge

    Bloody Sand gets everywhere

    Even into a really clean DC as I found out in Riyadh.

    By everywhere, I mean everywhere including inside one of those big RED Button switches that you find inside most DC's.

    During a test, the button was pushed and nothing happened. Only after the sixth press did the system start failing over.

    Said switch was removed after the test and found that it was half full of sand.

    Further investigations revealed that it had been installed while the room was still a building site and naturally you don't think to clean inside a switch when making the room clean now do you?

    The Backup DC had exactly the same problem when the insides of the 'big Red Button' switch were examined the following week.

    This wasn't on some super secret network but just the ATM system for a major bank.

    1. Alan Brown Silver badge

      Re: Bloody Sand gets everywhere

      "Said switch was removed after the test and found that it was half full of sand."

      Between that and the switch at the backup DC, one assumes that ALL the electrics were inspected and/or replaced on spec after that?

  9. ibmalone
    Coat

    Baghdad battery

    Couldn't they have popped out for some?

  10. Anonymous Coward
    Anonymous Coward

    There's an obvious moral

    If you don't have sufficient know-how and experience in the right heads and the right place at the right time, spending more money on kit is likely to make matters much, much worse.

    Which accounts for most things we have observed about the Pentagon and the British MoD.

  11. coconuthead

    rows of car batteries baking in the 48° heat

    It shouldn't be any worse for a car battery to be under shade in 48° heat than in a car parked in 48° heat, which happens all the time. In fact one strongly suspects car batteries are designed to handle 48° heat and much worse.

    1. Loyal Commenter Silver badge
      Boffin

      Re: rows of car batteries baking in the 48° heat

      Assuming the electrolyte is kept topped up, batteries actually work better if they are warmer, as the electrochemical reactions go faster if warmer. A case in point is that in cold environments, dead torch batteries can often be brought back to life for a short period by popping them under your armpit for a while.

      Goggles, because you'd be a fool not to wear them when topping up the electrolyte in lead-acid batteries

      1. A.P. Veening Silver badge

        Re: rows of car batteries baking in the 48° heat

        In cold environments (nearly) dead car batteries can sometimes be brought back to life by turning the lights on, causing enough heating in the battery to thaw it out.

        1. ricardian

          Re: rows of car batteries baking in the 48° heat

          My friend had a Skoda umpteen years ago. The handbook recommended turning on the headlights for a few minutes before trying to start the engine when temperatures were below -10C

        2. Alan Brown Silver badge

          Re: rows of car batteries baking in the 48° heat

          "In cold environments..."

          Which is why the owner's manual of the Lada Niva rather famously recommends that drivers do this for several minutes before attempting to start the car if the ambient temperature is below -40C

      2. Orv Silver badge

        Re: rows of car batteries baking in the 48° heat

        They hold more charge at higher temperatures, but battery *life* is reduced by heat. Which may have been relevant here. The charging voltage also goes down with temperature, so if your controller doesn't have temperature compensation you may end up overcharging and boiling off electrolyte. (Automotive voltage regulators usually have temperature compensation, but chargers intended for batteries in indoor environments might or might not.)
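A rough sketch of the temperature compensation described above, for anyone who wants to see the arithmetic. The setpoint (2.25 V per cell at 25°C) and the coefficient (-3 mV per cell per °C) are assumed illustrative figures, not values from this thread or from any particular battery datasheet; check the manufacturer's numbers before using anything like this.

```python
# Illustrative only: every constant here is an assumption, not a datasheet value.
REFERENCE_TEMP_C = 25.0                      # temperature the float setpoint is specified at
FLOAT_VOLTS_PER_CELL_AT_REF = 2.25           # assumed float voltage per lead-acid cell
COMPENSATION_VOLTS_PER_CELL_PER_C = -0.003   # assumed: setpoint falls as the battery warms
CELLS = 6                                    # a nominal 12 V battery


def compensated_float_voltage(battery_temp_c, cells=CELLS):
    """Return the float charging voltage adjusted for battery temperature."""
    delta = battery_temp_c - REFERENCE_TEMP_C
    per_cell = FLOAT_VOLTS_PER_CELL_AT_REF + COMPENSATION_VOLTS_PER_CELL_PER_C * delta
    return per_cell * cells


def overcharge_margin(battery_temp_c, cells=CELLS):
    """Volts by which a charger fixed at the 25 C setpoint exceeds the compensated target."""
    fixed = compensated_float_voltage(REFERENCE_TEMP_C, cells)
    return fixed - compensated_float_voltage(battery_temp_c, cells)


if __name__ == "__main__":
    for temp in (0.0, 25.0, 48.0):
        print(temp, round(compensated_float_voltage(temp), 3), round(overcharge_margin(temp), 3))
```

With these assumed numbers, an uncompensated charger in 48° heat would sit a few tenths of a volt above the compensated target, which is the overcharging scenario the comment warns about.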

        1. Anonymous Coward
          Anonymous Coward

          Re: rows of car batteries baking in the 48° heat

          Chargers for large banks of lead acid cells should use 3 or 4-step charging, in which the main charge is done at constant current until the voltage starts to rise, and then charging reverts to a periodic anti-sulfation kick until the voltage drops to the point of restarting the charging cycle.

          My charger is now about 10 years old and still going strong, they are a worthwhile investment.

      3. Stevie

        Re: rows of car batteries baking in the 48° heat

        "Goggles, because you'd be a fool not to wear them when topping up the electrolyte in lead-acid batteries"

        Ah, the wisdom of yoofs.

        Lead Acid batteries were - when one could "top them up" as opposed to the sealed units one sees these days - topped up with distilled water. One only needed to add sulphuric acid on very rare occasions when called for by hydrometer readings. They didn't spit during these toppings-up.

        Honestly, the young people today seem to treat chemistry like everything is Chernobyl Reactor #4 after the lid came off. I blame the EC nanny-state regulations.

        But wear the goggles for the steampunk ambiance by all means. Add a multi-lens loupe a la Frank Thring in Mad Max Beyond Thunderdome for extra credit.

        1. Anonymous Coward
          Anonymous Coward

          Re: rows of car batteries baking in the 48° heat

          Back in high school I had a weekend job for a truckie who delivered to one of the local tyre and battery retailers. Back then car batteries didn't have fancy handles built in, so you picked them up by squeezing them tightly by the sides. This resulted in droplets of acid jetting out the vent holes in the caps, stinging flesh abraded by the sharp edges of the plastic molding, and causing my jeans to develop frayed holes long before such became fashionable.

          Unloading the tyres was fun, though.

          1. Stevie

            Re: rows of car batteries baking in the 48° heat

            And I had a 7x5 inch hole rot in the side of my jeans leg a couple of weeks after splashing them with copper sulphate solution so weak you could probably have drunk it without serious harm. Your point is?

            Adding a few glugs of distilled water to a battery was a common thing when I was young and goggles were not needed to guard against being hosed down with molar acid due to Hollywood Physics.

            1. Muscleguy
              Boffin

              Re: rows of car batteries baking in the 48° heat

              In 2nd year undergrad Biochem lab I aspirated 1M NaOH into my mouth because we were using old fashioned mouth pipettes. After my accident the proper pipettemen came out for the rest of the lab.

              Then in Physiology respiratory lab we were doing different mixtures: hypercapnia (O2 stays constant, CO2 rises), hypoxia (run by a medically qualified member of staff) and asphyxia (rebreathe a volume of air).

              I was sitting on a stool with the valve in my mouth doing the asphyxia. My partner watching and periodically asking me if I was alright and I would nod. He became concerned when I started making 'seal noises' and switched me to room air. A while later I came to again. Apparently I kept nodding (reflexes are wonderful and can be unconscious).

              After that the rule became you had to give a thumbs up in response to the 'are you alright?' question.

              I managed a higher CO2 level than I had in hypercapnia. I kept the traces for some years. They may still be in a box in the attic but I suspect not.

              I survived my undergrad courses and even managed a PhD in that Physiology dept. When demonstrating that lab I made a point about the thumbs up.

              There needs to be a guinea pig icon.

              1. Stevie

                Re: rows of car batteries baking in the 48° heat

                Hmm. During my A levels I got a mouth full of Oxalic Acid from using one of those mouth pipettes. I spat and rinsed immediately but my teeth had the Coca-Cola "chalky" feeling for months afterwards.

                Definite thumb up for Guinea Pig icon.

                1. A.P. Veening Silver badge

                  Re: rows of car batteries baking in the 48° heat

                  With anything more unpleasant than water, we used balloons (study pharmacy at Rijks Universiteit Utrecht).

        2. DropBear

          Re: rows of car batteries baking in the 48° heat

          Nothing "could" about my bike's battery - it's very much a standard lead-acid unit, complete with a long breather tube and adorable little caps you definitely do need to add water through if the level misbehaves. Yes, sealed options are available. Yes, they're quite a bit more expensive, how did you guess...?

        3. Alan Brown Silver badge

          Re: rows of car batteries baking in the 48° heat

          " They didn't spit during these toppings-up."

          You've never topped up telephone exchange traction batteries (about 8-10 inches a side and 2-3 feet high - PER CELL - times 24 cells). They might not spit but splashback was always a worry. Goggles AND an apron thanks - acidwash jeans may have been fashionable but it's not a good idea to get the chemical in question on them whilst wearing them.

        4. Orv Silver badge

          Re: rows of car batteries baking in the 48° heat

          Deep-cycle lead acid batteries still often have removable caps. Most "flooded cell" industrial and "golf cart" batteries do too, because they're intended for a long lifespan, and it's normal for a certain amount of water to evaporate during charging. (Well, "evaporate" isn't quite right. Most of it isn't turning into water vapor, it's being turned into hydrogen and oxygen via electrolysis. Have I mentioned battery rooms have to be carefully ventilated?)

          Railroad signalling installations and power substations used to use banks of flooded NiCads, which are interesting. They usually have glass cases and look like something Thomas Edison would have experimented with, and will last a very long time as long as they aren't allowed to go dry. In many cases these are being gradually replaced by deep-cycle lead acids, because while they have to be replaced more frequently, the overall life cycle cost is lower due to them being more widely manufactured.

          (Why would a power substation need batteries? Because they don't rely on line power to power the systems that control the big circuit breakers that protect the high-voltage lines. If there's a short circuit somewhere, voltage can drop very low as the current skyrockets. You need a dependable local power source to ensure the breakers can trip and clear the fault.)

    2. Paul Hovnanian Silver badge

      Re: rows of car batteries baking in the 48° heat

      Think about how hot your engine compartment gets while idling in a traffic jam.

    3. Dave 32
      Coat

      Re: rows of car batteries baking in the 48° heat

      One certainly hopes that they weren't "car batteries", although they may have been Lead-Acid batteries. The issue is that car batteries are designed for starting service, and running one down will usually kill it forever (Something to do with the Lead particles flaking off of the plates, accumulating at the bottom of the cell, and then shorting out the plates.). Deep-cycle/Marine batteries look a lot like the starting-service car batteries, except that they're designed to be deeply discharged without being killed (Oh, and they cost about 1.5 times as much, too, but that shouldn't be a surprise to anyone.).

      Dave

      P.S. I'll get my coat. It's the one with a pocket full of batteries.

      1. Anonymous Coward
        Anonymous Coward

        Re: rows of car batteries baking in the 48° heat

        "Deep cycle" batteries still shouldn't be used much beyond 50% of the rated charge. Car batteries have thin plates to give a high discharge current, "deep cycle" batteries have thicker plates but it's only a matter of proportion, discharge below 50% repeatedly and they quickly start to die.

        For UPSes which are expected to be used very rarely this is not a problem, for boats where they can replace ballast they are good because they are cheap and you can easily add more of them.

        Traction batteries, which tend to be spiral wound, can take much more discharge for much longer but they are very expensive and very heavy. Great for forklifts where the ballast is useful, but even there nowadays there are a lot of gas fuelled forklifts about. And since small but robust Diesels came on the scene, even the electric milkfloat isn't much of a use case.
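To put numbers on the 50% rule of thumb above, here is a minimal sizing sketch. It assumes a simple linear model: it ignores the Peukert effect, inverter losses, temperature and ageing, so real runtimes will be shorter. The function names and the default 0.5 limit are illustrative choices, with the limit taken from the comment above.

```python
def usable_amp_hours(rated_ah, max_depth_of_discharge=0.5):
    """Capacity you can draw without going past the chosen depth of discharge."""
    if not 0.0 < max_depth_of_discharge <= 1.0:
        raise ValueError("max_depth_of_discharge must be in (0, 1]")
    if rated_ah <= 0:
        raise ValueError("rated_ah must be positive")
    return rated_ah * max_depth_of_discharge


def runtime_hours(rated_ah, load_amps, max_depth_of_discharge=0.5):
    """Naive runtime estimate: usable capacity divided by a constant load current."""
    if load_amps <= 0:
        raise ValueError("load_amps must be positive")
    return usable_amp_hours(rated_ah, max_depth_of_discharge) / load_amps


if __name__ == "__main__":
    # A hypothetical 100 Ah battery feeding a steady 5 A load.
    print(usable_amp_hours(100))
    print(runtime_hours(100, 5))
```

The point of keeping the depth of discharge as a parameter is that the same arithmetic covers a rarely used UPS bank (where a deeper occasional discharge may be tolerable) and a daily-cycled bank (where it is not).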

      2. The Boojum

        Re: rows of car batteries baking in the 48° heat

        Possibly an obscure addition to the thread but I suffer from sleep apnoea and have to use a CPAP machine to get a decent night's sleep. Some people with this condition like to go camping, which means they need a serious power source to run overnight, and 'proper' CPAP batteries are expensive. So the question is often asked, "Can't I just use a car battery instead?" To which the answer is, "No. Car batteries are designed to deliver a massive current for a few seconds then be recharged quickly. If you subject it to an extended drain over a period of hours then you will kill your car battery in very short order."

        1. PerspexAvenger

          Re: rows of car batteries baking in the 48° heat

          Or do what I do and have a leisure battery and an inverter (Resmed, handily, makes a proper 12V supply for my machine).

          Cost less than £100 all up and lasts me at least a week.

      3. rcxb Silver badge

        Re: rows of car batteries baking in the 48° heat

        > car batteries are designed for starting service, and running one down will usually kill it forever

        The sealed, VRLA batteries in (practically all) UPSes are very similar to car batteries (they are NOT built like "Marine" or deep-cycle batteries). It's a simple matter of having the electronics treat 10.5V as zero, and drop the load at that point, instead of continuing to discharge and permanently damaging the battery.
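A minimal sketch of that "treat 10.5V as zero" logic. The 10.5 V cutoff comes from the comment above; the 12.7 V "full" figure and the straight-line mapping between the two are assumptions for illustration only, since real UPS firmware works from voltage under load and is considerably less naive.

```python
CUTOFF_VOLTS = 10.5   # from the comment: reported as 0%, load is dropped here
FULL_VOLTS = 12.7     # assumed resting voltage of a fully charged 12 V lead-acid battery


def reported_charge_percent(volts):
    """Map battery voltage to a displayed charge level, with the cutoff shown as 0%."""
    fraction = (volts - CUTOFF_VOLTS) / (FULL_VOLTS - CUTOFF_VOLTS)
    fraction = min(1.0, max(0.0, fraction))   # clamp readings outside the range
    return round(100.0 * fraction, 1)


def should_drop_load(volts):
    """True once the battery reaches the cutoff, so it is never run truly flat."""
    return volts <= CUTOFF_VOLTS


if __name__ == "__main__":
    for v in (12.7, 11.6, 10.5, 10.0):
        print(v, reported_charge_percent(v), should_drop_load(v))
```

The battery still holds charge at 10.5 V; the controller simply refuses to use it, trading a little runtime for not destroying the cells.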

        1. YetAnotherLocksmith Silver badge

          Re: rows of car batteries baking in the 48° heat

          And that's why the UPS "suddenly failed".

      4. 's water music

        Re: rows of car batteries baking in the 48° heat

        Marine [anything] cost[s] about 1.5 times as much, too, but that shouldn't be a surprise to anyone.

        I am surprised that you have sourced anything boat related at such a low cost multiplier

    4. Anonymous Coward
      Anonymous Coward

      Re: rows of car batteries baking in the 48° heat

      I’ve never seen a UPS using car batteries.

      1. Anonymous Coward
        Anonymous Coward

        Re: rows of car batteries baking in the 48° heat

        I have, as backup to site-wide UPS, for a large data center, it was about 40 linear feet of car batteries.

      2. Alan Brown Silver badge

        Re: rows of car batteries baking in the 48° heat

        Go to XYZ developing country and you'll see row after row of them - and the only thing you can GET are car batteries.

        Even after saying you don't WANT car batteries, you want proper deep cycle batteries, you'll be sold - a car battery - because the sellers really don't know any better. They also sell WD40 as lubricating oil.

  12. Anonymous Coward
    Anonymous Coward

    And that non-functional generator and balky switch? Still a mystery

    Probably some poor grunt in the shower shorted it out and screwed things up as they got electrocuted.

  13. Anonymous Coward
    Anonymous Coward

    How things are done in government organisations

    We had to deploy a config change which meant bouncing the servers for our public facing website. Management were wringing their hands all day, wondering how much it would inconvenience our single user.

    1. trevorde Silver badge

      Re: How things are done in government organisations

      Our single user wasn't even using the system that day! FML.

  14. druck Silver badge
    Coat

    Lack of Bacon

    "to get my turkey bacon sub (hold the bacon)"

    That's your first mistake of the day...

    1. Reg Reader 1

      Re: Lack of Bacon

      Was he being polite due to being in Riyadh? That was the location wasn't it?

  15. SirDigalot

    that sound you get when...

    the n00b* forgets that soft bypass on the ups is not actually full bypass and they cut the power to do inverter maintenance

    or should i say, the sound you don't get

    from anything

    server rooms are really quiet when things are not powered on.

    *this may or may not have happened to me in a former incarnation, but seemingly at my last job it was a rite of passage for everyone no matter how well we trained them and documented it

  16. Anonymous Coward
    Anonymous Coward

    Reminds me...

    Not that long after the fall of the Wall, a visitor to a NATO meeting from a former Warsaw Pact country was asked who would have won WW3.

    "We would, of course," he said, "We'd have reached Bonn before you lot stopped showing one another PowerPoints."

    As Stanisław Lem pointed out in one of his stories, bureaucracy can stop anything.

  17. Ian Bush
    Mushroom

    Temperature Units

    Sigh ... To all those who don't understand the archaic temperature units which are the primary measure in the article can I remind you of

    https://www.theregister.co.uk/Design/page/reg-standards-converter.html

    so they can be converted into something more comprehensible. For starters 120F=2.9Hn

    1. Doctor Syntax Silver badge

      Re: Temperature Units

      All temperatures should be quoted on the Gas Mark scale.

  18. Anonymous South African Coward Bronze badge

    Speaking of gennies and such, something which I had all but forgotten:

    A while after the company got their genny commissioned and in working order, we had a power failure. We sat back with smug faces knowing that we can continue working...

    ...which we did, for about 30 min. The genny went off. Everybody packed up and went home, and I did an orderly shutdown of everything in the server room.

    Next day, power was restored, and we requested an engineer to come out and check the genny.

    When the engineer arrived, we went to check the genny. Everything was ok, except for the coolant level.

    The engineer noticed that the flange holding the radiator cap in was a bit bent upwards at one side, and gently fixed it with a few taps of a big wrench. Of course the coolant got refilled.

    Seems the slight rise in temperature was enough to trigger the controller software, which initiated an emergency shutdown.

    From that point onwards we haven't had any issues with the generator at all.

  19. Anonymous Coward
    Anonymous Coward

    This story indicates they weren't doing full DR scenario tests

    You not only want to test your redundancy, you want to (at least once, but preferably more often, like once a year) test what happens when the redundancy you spent millions on to make fail-proof fails.

    Had they done that here, and tested what happens when the redundant generator circuits plus the backup local generator plus the backup battery bank all fail and you have no power, they would have identified the issue with lacking access to start the VMs. They might have also identified who needs those backup analog phone connections, and created a policy to shut down non essential services immediately if running on battery.

  20. Anonymous Coward
    Anonymous Coward

    The old trusty UPS...

    Wasn't too long ago I was working as an Ops engineer at a mid-sized payment services provider one morning when facilities did one of their quarterly generator tests. We had a single moderately-sized server room on the 1st floor and a single large UPS within it that was sufficient to run everything in it. Every quarter the test would occur, the lights would go out but the screens and emergency lighting all stayed on, and the server room hummed merrily away.

    So one morning at 830ish there are just a few of us early risers in the office and the room just goes dark. Lights, screens, everything. My immediate reaction was to stand up, brow furrowed, and stare at the bank screens as I realised it was uncomfortably quiet...

    ... A quick dash to the server room confirmed that the entire room was dark. A few frantic phone calls were made as I summarily dispatched a support analyst to "get that maintenance ****et to turn the ****ing power back on!!!!" A few mins later the lights came back on and I went through the server room turning things on manually before dashing back to my desk to try and bring our applications back online as the network guys arrived and started checking what hadn't re-appeared.

    Root cause? Trusty old UPS had failed some weeks before and no one had noticed all the screens and LEDs on it had gone dark. The incident report for that was not digested well.

  21. Lord Kipper III

    I once bought one of those cheap combination lock safes to keep documents in at home. Not especially secure but kept important things together and away from prying fingers (of children). Powered with a couple of AA batteries it started giving a low battery warning which of course I made a mental note to replace but never quite got round to. One day the batteries died and the safe wouldn't open. Never mind, I'll get the key that also opens it and was put in a safe place with all the other spare keys for cars and the house for such an eventuality.

    Not there.

    Ask the wife - 'Oh yes, I found it on the floor and it looked important so I put it in the safe'.

    I now know how really insecure those cheap safes are.

    1. Doctor Syntax Silver badge

      "I now know how really insecure those cheap safes are."

      Expensive safes are also insecure if you can afford the time to get into them. As with securing anything you have to balance the value of what they'll hold against the cost of breaking them. The cost, of course, involves the chances and consequences of getting nicked if you're not entitled to break in.

    2. tinman

      "I now know how really insecure those cheap safes are."

      As are safes and filing cabinets at ultra top secret atomic bomb projects, if you read Richard Feynman's memoir, "Surely You're Joking, Mr. Feynman!"

      Spoiler

      .

      .

      .

      .

      .

      change the factory default combination numbers before use, but read it anyway, he tells it better

      1. irrelevant

        Filing cabinets

        1980s, I rescued a filing cabinet that was being thrown out at work (Ferranti)... Not sure of the age of the thing, but it had a Ministry of Aviation asset tag on it.

        I didn't have the keys, so naturally I managed to lock it accidentally by pressing in the protruding lock. My dad looked at it, compared it to a similar cabinet he had, measured back a precise distance from the front and drilled a small hole in the top. One poke with a bit of wire and we were back in. I guess it's easier when you have another one to look at..

    3. DropBear

      I still don't understand why these don't have 9V battery attachment nubs, the way smart door locks kinda all do, instead of a traditional key as a backup. Yes, you'll still need to get hold of _something_ that outputs 9V or so (not necessarily an actual 9V battery) that might not be immediately at hand, but as this very story proves, the key might be far from readily at hand too. Plus, an "emergency" mechanical lock kinda reduces the (otherwise _potentially_ decent) security of an electrical lock you cannot reach with the ludicrous rakeability of a typical mechanical lock (or someone might just know where you keep that key, as it clearly can't be securely in the safe).

      Yes, the external electrical contacts do introduce further opportunity for mischief if disabling the safe is the objective not opening it (by zapping it with high voltage), but even that could easily be worked around by internally routing the aux power input through a tiny, potted DC-DC galvanic isolation brick (costing pennies). At worst, an attack would blow that up, but the rest of the internals would be unaffected; and at any rate, one could do the same through the keypad contacts anyway so this doesn't seem to be of much concern.

      And also yes, sure there could be a malfunction of the electronics or motor that would lock you out without a keyed backup - but let's be honest, that's about as unlikely as it gets, plus these safes would last a mere few minutes against someone with the authority to use physical brute force against them...

      1. Alan Brown Silver badge

        "I still don't understand why these don't have 9V battery attachment nubs"

        A lot of electronic safes have the batteries on the OUTSIDE for this very reason.

  22. afrihagen

    "I warned you this would happen" : the Cassandra Principle...

  23. Griffo

    Mirror ain't backup

    I went out to visit a new client to get an overview of their infrastructure, and to perform a quick health check. This was a largish architecture firm, I think they had about 200 architects, designers etc.

    Their "backup" system consisted of the owner pulling out 1/2 of a mirrored set of disks each night. He'd take it home, then bring it back in the morning and let it re-sync.

    Yes.

    Really.

    I spent a month trying to convince him that this was a recipe for disaster and that he needed to spend a relatively small amount ($5k or so) on a backup system. He refused, and we pretty much parted ways.

    Fast forward 3 months, he calls us in tears. All his data was gone. Every client drawing ever produced had gone "poof" one morning when he plugged the disk back in and things didn't go well.

    I didn't give him much time, asked him how expensive that $5k backup system sounded now, and told him to go try ring someone else. I have zero regret in refusing to help him.

    1. Anonymous South African Coward Bronze badge

      Re: Mirror ain't backup

      Their "backup" system comprised of the owner pulling out 1/2 of a mirrored set of disks each night. He'd take it home, then bring it back in the morning and let it re-sync.

      I facepalmed at that.

      Cheap option would be to keep the mirror unbroken, but get an external drive (or three or four) then back up to these. (What would you guys've done or suggested in this scenario?)

      Fiddling with mirrors and/or RAID backup sets with production data on them will have a certain Mr Murphy take a very unhealthy interest in that...

      1. Anonymous Coward
        Anonymous Coward

        Re: Mirror ain't backup

        I, too, facepalmed at this one.

        The poor chap's data would probably have been safer with no "backup" solution whatsoever!

      2. Orv Silver badge

        Re: Mirror ain't backup

        What would you guys've done or suggested in this scenario?

        Depends on what era we're talking about. External or hot-swappable drives (that are NOT part of the mirror) are often more economical than tape, unless you need to keep archived data. They're also usually faster. But nowadays my first instinct would be to look at a cloud-based solution, or one hosted at another location if they had more than one data center. Assuming sufficient bandwidth of course. My experience is the more automatic you can make it, the more likely it is to actually happen. Eventually people tend to stop bothering to take the disks/tapes home with them.

        Breaking a mirror is especially bad here because when a mirror gets out of sync, it's not necessarily clear which version of the data is the correct one. It's like what a teacher of mine told me once: "Carrying two compasses is useless. If they don't agree you don't know which is wrong. Either carry one, or three."

    2. JulieM Silver badge

      Re: Mirror ain't backup

      If the drives are really hot-swappable, and there isn't an active swap partition on the one you pull out, such an abuse of RAID1 isn't actually a terrible way of backing up data. At least with mechanical drives, the rate-determining step is getting the zeros and ones onto and off the actual ferric oxide; writing the same data onto two drives at once does not take any longer than writing to just one drive.

      However, you do need to have more than just two drives, so you always have what is hopefully a spare copy of the data that hopefully is being resync'ed; just for when -- not if -- it goes Pete Tong and decides to copy the contents of the drive you just inserted over the one that was in there the whole time.

  24. Diez66

    A very satisfying Flash / Bang

    So a big old PBX, just installed and running the site.

    "Let's check the standby batteries" said the boss.

    He turns off the mains: silence and no lights? The battery fuse was disconnected so no batteries.

    We get to witness; boss runs around a bit, grabs the fuse and pops it in, and there is a big old flash and a bang as the batteries start to supply current.

    Boss falls backwards and does a bit of a somersault; all very pleasing to us chaps as we had told him to let us check it had all been installed correctly and was ready for testing, but no!

    It appears on a more expensive system there is some circuitry to prevent big bang/flash but boss did not want to pay.

    Made us very happy and did no damage other than split sides.
