What came first? The chicken, the egg, or the bodge to make everything work?

It's another tale from the world of telephony where everything goes wrong in this week's On Call. Today's story comes from a Register reader Regomised as "Greg." Greg was the applications manager for a now-defunct telco, and was the throat to choke for all the applications in the customer-facing side of the business. The …

  1. Korev Silver badge
    Coat

    That was an Eggcelent bodge

    1. Dr. Ellen
      Facepalm

      The chicken or the egg?

      The egg came first. There have been recognizable eggs for millions of years before the first recognizable chicken appeared. This also applies to computers: the code in the egg must be subject to proper conditions before a chicken can be created. The code in the computer is the same, and one of the proper conditions is the computer being in a state to reach and run the code.

      1. yetanotheraoc Silver badge

        Re: The chicken or the egg?

        "The egg came first. There have been recognizable eggs for millions of years before the first recognizable chicken appeared."

        Yes, but we are not talking about "eggs", we are talking about "the egg": the one from which the first recognizable chicken appeared. Clearly (or maybe not so clearly, but let's skip over that part) the parent was not a recognizable chicken, and the offspring was a recognizable chicken. What then about "the egg"?

        1. Hero Protagonist
          Holmes

          Re: The chicken or the egg?

          Well, the contents of the egg are identical at the DNA level to the chicken that emerges from it, so the egg came first.

        2. Martin
          Holmes

          Re: The chicken or the egg?

          But still, the egg included the recognizable chicken. So still, the egg came first.

          More precisely, something which was very nearly a chicken produced a fertilized egg with a small mutation in its DNA, which was going to produce the first actual chicken...

          I am not an expert in evolution, however - feel free to correct me!

          1. Muscleguy
            Boffin

            Re: The chicken or the egg?

            So you can tell a chicken embryo from a quail, a turkey or a guinea fowl, can you? At what stages of development, pray tell? Have you ever seen a live chook embryo? I have, lots. Manipulated them, tried to introduce genes into them. In a lab which went through 70-80 dozen fertile hen eggs a WEEK.

            BTW I’m on 4 papers doing the embryology on two mouse knockouts and two knockins. I can spot a mouse developmental defect from 20 paces.

            1. yetanotheraoc Silver badge

              Re: The chicken or the egg?

              Asking questions is so much easier than providing answers. That's why my previous comment ended with "?".

        3. pmb00cs

          Re: The chicken or the egg?

          It depends how you define "the egg". If it is "an egg laid by a chicken", then obviously the chicken came first; however, if it is "an egg that if fertilised would hatch a chicken", then it is obviously the egg. If your definition is anything other than those two options, and covers what is recognisable as a shelled, egg-shaped object for a biological method of external reproduction, then the answer is not immediately obvious from the definition alone, but it is the egg.

        4. Muscleguy
          Boffin

          Re: The chicken or the egg?

          There is an unbroken chain of interbreeding animals between the Theropod dinosaurs and your chicken. Trying to draw a line and say THAT egg hatched the first SE Asian jungle fowl is essentialism and Biology knocked that on the head, disembowelled it, burnt the remains and threw in a hydrogen bomb explosion.

          The question also devolves into ‘when did sex evolve?’ and into anisogamy (different-sized gametes). Then you have to ask how big the difference has to be before the large one is an ‘egg’?

          Nature does not carve at lines just because we puny naked apes like to draw them to make the universe more tractable.

          1. yetanotheraoc Silver badge

            Re: The chicken or the egg?

            "Trying to draw a line and say THAT egg hatched the first SE Asian jungle fowl"

            Didn't you see me skip right over that part?

        5. Black Betty

          Re: The chicken or the egg?

          Actually, it would be the "rooster" whose mutated DNA fertilized the "first egg".

      2. knottedhandkerchief

        Re: The chicken or the egg?

        The cock, of course.

        1. Omgwtfbbqtime
          Trollface

          Re: The chicken or the egg?

          The Bantam cock? Sort of NSFW in this day and age

      3. Mark 85

        Re: The chicken or the egg?

        So chickens and software have much in common... both are mutations.

        1. Muscleguy

          Re: The chicken or the egg?

          All life contains mutations. It has been estimated that at birth we each carry 100 mutations that neither of our parents has. Mutations provide variation which might make your descendants the pick of the crop.

          One of the sprogs has Type II diabetes; the other is hypothyroid. Neither my wife nor I have either condition. One is also cross-lateral, with no innate sense of right or left and no preference. That one might come from me: I’m ambidextrous. I can do anything left handed except write legibly and that is lack of practice. Neither of my parents is.

          1. Doctor Syntax Silver badge

            Re: The chicken or the egg?

            "I can do anything left handed except write legibly and that is lack of practice."

            By that measure I'm ambidextrous - I can't write legibly with either.

            1. W.S.Gosset

              Re: The chicken or the egg?

              I know a chap of the lavender persuasion who, in conversation, gestures quite theatrically with both arms. I believe he is flambidextrous.

            2. Terry 6 Silver badge

              Re: The chicken or the egg?

              That sounds a bit like me, though I can teach (and demonstrate) writing correctly formed letters using either hand. It's the actual writing of more than one or two words where it all falls apart. The formation is correct, but consistency of size, position and curvature is all over the shop, and after any prolonged period of writing everything starts to get illegible. I tend to write right-handed because some bloody stupid teacher 60-odd years ago forced me to. Which in a sense was partly the foundation of my career, since I spent much of the last 40 years sorting out kids' reading and writing problems, many of the latter caused by stupid teaching in the early years, and a fair chunk of the former too, for that matter.

              I'm ambidextrous, but in general there aren't many jobs I can choose to do with either hand; there's always a preference. I've never been able to find any discernible pattern to why I use a particular one, though, other than if I was clearly taught it that way. So I tend to catch or pick things up left-handed, and clean my teeth and shave right-handed. And so on.

          2. Dave314159ggggdffsdds Silver badge

            Re: The chicken or the egg?

            Identified, relatively common conditions like diabetes and hypothyroidism are much more likely to come from recessive alleles combining than from new mutations.

          3. W.S.Gosset

            Re: The chicken or the egg?

            > All life contains mutations.

            When I was in primary school and got glasses, all the other kids laughed at me and called me six-eyes. I haven't worn them since.

          4. MJB7

            Re:Ambidextrous

            One of my friend's (adult) children describes themselves as "ambiclumsy": they are equally inept with either hand.

            1. Precordial thump Silver badge

              Re: Re:Ambidextrous

              Ambisinister?

      4. swm

        Re: The chicken or the egg?

        The egg came first. I ordered both a chicken and an egg from Amazon and the egg came first.

      5. Anonymous Coward
        Anonymous Coward

        Re: The chicken or the egg?

        I just finished reading Dawkins.

        The chicken and the egg are machines created by the Genes in order to reproduce themselves.

  2. Korev Silver badge
    Facepalm

    We had something similar in a DR rehearsal once, which we obviously didn't want to run connected to the network, to avoid clashes. I had few problems bringing up the Linux side of things, but the guy doing the Windows side couldn't bring it up, as there was no AD available* and no way of getting it. In the end someone from a different site had to provide a backup, which he restored from and could then get going; this took quite a while, though.

    * If there was a real emergency then we'd have AD etc available from the company's other sites as soon as we had network.

    1. Doctor Syntax Silver badge

      And that, folks, is why you have DR rehearsals. The purpose of the first one isn't to practice doing it, it's to find out why you can't.

      The purpose of the second one, of course, is to find out why you still can't.

      1. Anonymous Custard
        Headmaster

        And so on and so forth.

        It's management excuses all the way down...

        1. Will Godfrey Silver badge
          Happy

          Otherwise known as turtles

          {thanks Pterry}

          1. Tom 7

            When managers are involved it's turdles.

            1. Montreal Sean

              I thought those were turds.

        2. Anonymous Coward
          Anonymous Coward

          @Anonymous Custard

          "It transpired that the team had missed a dependency". That's from the article.

          So yes, it was management's fault. For hiring incompetent staff.

        3. Doctor Syntax Silver badge

          In practice I think you usually get there. The most significant thing we learned from our first DR was that, left to itself, the vendor's backup routine left /etc well down the list. We almost ran out of time before it was restored. Having rewritten the backup to prioritise the directories we needed to boot, we could do that and then get the database restore running in parallel with everything else. I suspect the everything else that originally stood in the way of /etc was largely a huge stack of crud whose owners would have been prepared to defend it to the death.
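Putting the directories needed to boot at the front of the backup comes down to sorting the backup list by priority. A minimal Python sketch, assuming a sequential (tape-style) backup in which whatever is written first can be restored first; the `BOOT_CRITICAL` list and the function names are hypothetical illustrations, not the vendor's tool:

```python
from pathlib import PurePosixPath

# Hypothetical boot-critical directories, most urgent first; the real list
# depends on the OS and the application, so treat this as an assumption.
BOOT_CRITICAL = ["/etc", "/boot", "/usr", "/var/lib"]


def restore_priority(path: str) -> tuple[int, str]:
    """Sort key: rank of the first boot-critical tree containing path, then the path."""
    p = PurePosixPath(path)
    for rank, root in enumerate(BOOT_CRITICAL):
        # True for the root itself and anything beneath it (component-wise,
        # so "/etcetera" does not count as being under "/etc").
        if p.is_relative_to(root):
            return (rank, path)
    # Everything else goes after all the boot-critical trees.
    return (len(BOOT_CRITICAL), path)


def order_for_backup(paths: list[str]) -> list[str]:
    """Return paths with boot-critical trees first, so they come back first on restore."""
    return sorted(paths, key=restore_priority)


print(order_for_backup(["/srv/crud", "/home/alice", "/etc/fstab", "/usr/bin", "/etc"]))
# ['/etc', '/etc/fstab', '/usr/bin', '/home/alice', '/srv/crud']
```

Once the boot-critical trees are back, the rest of the restore can run in parallel with bringing the system up, which is the point of the reordering. `PurePath.is_relative_to` needs Python 3.9 or later.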

      2. Evil Auditor Silver badge
        Thumb Up

        Can't upvote this enough. We (i.e. I, in one of my former lives) had several of those "why you still can't" moments, not all of which were found in rehearsals. One actual emergency was when AD was unresponsive for reasons I cannot recall. To recover, admin access was required. And admin access was requested and granted via a web application, which required AD authentication...

        And another one, luckily found during paper rehearsal, was that the management console for emergency power (several marine diesel engines and a couple of gas turbines) was not connected to UPS batteries. If the mains failed, it might have been possible to start the diesel, if, in the heatdark of battle, someone thought of powering the console with an extension cord from a UPS wall socket...

        1. Yet Another Anonymous coward Silver badge

          See also: the computers were connected to the UPS, but auxiliary equipment like printers and MONITORS wasn't.

        2. Anonymous Coward
          Anonymous Coward

          From a previous life in datacentres, part of our "oh crap, everything's dead" procedure to get some gennies started was contingent on the fact that whoever was fixing things was likely to be there for a while, and would probably cope without their car batteries for a bit...

          (At least until we could gang two together to turn over a generator, and then bodge in some cabling to get 12V to the main site control panels, which weren't parasitically powered and as such normally ran off a pair of 12V wall-warts...)

      3. Alan Brown Silver badge

        Which reminds me of a weekend Civil Defence exercise a while back, when I was working for a telco. The main radio repeater (at a telco hilltop site) failed. All the orgs involved wanted to call it off, or to call out the telco techs (me) to repair the thing at 9pm on a Friday.

        The local head of CD smiled and told them: "This is excellent practice for the real world. Find workarounds!"

        The first I knew of the issue was on Monday morning, after I came off weekend on-call duty. I was most happy, as the road up there was a muddy bitch in the dark, with a 600-foot sheer drop on one side that tended to be unforgiving if you got it wrong.

      4. Anonymous Coward
        Anonymous Coward

        I did some work recommending a DR strategy at a very large and well known company who had suffered a serious albeit brief outage following power issues to their primary global data centre.

        The secondary data centre - I hesitate to call it a DR site because it was on the same physical (albeit large) site as the primary data centre - was small and uncomfortably warm due to underspecified HVAC, and could only house a subset of the "most critical" services.

        Unfortunately, DHCP was not considered to be one of those critical services.

        This, while easily fixable, was the beginning of a litany of issues, not least of which was that no-one really had a very good idea of which services were run out of the primary datacentre, which of them were critical, or what the dependencies were. Also, there was a highly plausible scenario which ran from a toaster fire in a staff kitchen through to complete loss of one datacentre, a significant number of IT staff hospitalised or worse, loss of physical access to the secondary, and globally felt impact.

        The outsourced IT provider had refused to do a test for the last seven years because they'd provided warnings of what would happen if they switched things off and had asked for money to try to fix it, which they hadn't been given.

        It took three years, quite a large team, and some serious spending, but they did eventually manage to get everything migrated to three brand new datacentres. At which point I breathed a sigh of relief.

        1. Doctor Syntax Silver badge

          "small and uncomfortably warm due to underspecified HVAC"

          I hope you explained that that's not what a hot standby means.

          1. W.S.Gosset

            I used to have a hot standby but she got married.

      5. anothercynic Silver badge

        Sounds familiar. The joys of DR at a UK bank with their trading software... ;-)

    2. molletts

      Sounds familiar - I had to bring an unfamiliar system at a small customer site up from cold last year. The Hyper-V cluster wouldn't come up because it depended on DNS for the nodes to find each other (and probably AD for configuration); the DNS servers were VMs in the cluster...

      Fortunately, I was able to attach a keyboard and monitor, log in locally on one of the hosts and manually start one of the DCs. Once both nodes noticed that they now had DNS, they started talking to each other and everything sorted itself out.

      (Actually, "from cold" is probably not the best description - all the switches, the router and the SAN had started up automatically when the power came back on goodness-knows-how-long before I got there; the office-style air conditioner needed to be turned on manually by pressing a button. The tiny room was absolutely baking and the shrieking of the fans in the switches was deafening.)
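The Hyper-V story is the article's chicken-and-egg problem in miniature: a dependency cycle, where the cluster needs DNS and DNS lives in the cluster. As a minimal sketch, assuming the dependencies can be written down as a simple map (the service names below are hypothetical, not taken from any real system), Kahn's topological sort produces a safe start-up order, or names the services that can never start because they sit in a cycle or wait on one:

```python
from collections import deque


def boot_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a start-up order in which each service starts after everything it needs.

    deps maps a service to the set of services that must already be running.
    Raises ValueError listing the services that can never start.
    """
    # Every service, including ones that only appear as a dependency.
    services = set(deps)
    for needed in deps.values():
        services |= needed

    # How many dependencies each service is still waiting on, and who waits on whom.
    waiting_on = {s: len(deps.get(s, set())) for s in services}
    dependents: dict[str, list[str]] = {s: [] for s in services}
    for service, needed in deps.items():
        for dep in needed:
            dependents[dep].append(service)

    # Kahn's algorithm: repeatedly start anything with no unmet dependencies.
    ready = deque(sorted(s for s, n in waiting_on.items() if n == 0))
    order: list[str] = []
    while ready:
        service = ready.popleft()
        order.append(service)
        for follower in sorted(dependents[service]):
            waiting_on[follower] -= 1
            if waiting_on[follower] == 0:
                ready.append(follower)

    if len(order) < len(services):
        # Anything left is in a cycle, or waits (directly or not) on one.
        stuck = sorted(services - set(order))
        raise ValueError("can never start: " + ", ".join(stuck))
    return order


# Hypothetical cold-start map, loosely modelled on the Hyper-V story:
# the cluster needs DNS, but the DNS server is a VM inside the cluster.
cold_start = {
    "switches": set(),
    "san": {"switches"},
    "cluster": {"san", "dns_vm"},
    "dns_vm": {"cluster"},
}

try:
    boot_order(cold_start)
except ValueError as err:
    print(err)  # can never start: cluster, dns_vm

# Starting the DC by hand on one host breaks the loop: DNS now needs only storage.
fixed = {**cold_start, "dns_vm": {"san"}}
print(boot_order(fixed))  # ['switches', 'san', 'dns_vm', 'cluster']
```

The sketch only checks the dependencies someone remembered to write down, which, as several stories in this thread show, is the hard part.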

      1. Jou (Mxyzptlk) Silver badge

        Oh, how I know that. MY Hyper-V clusters are designed so that they can boot up: their AD IS ready when the cluster needs it. Tested and proven for EACH installation. That includes Server 2008 (with and without R2) clusters, which are way less tolerant of unreachable AD than 2016+ Hyper-V clustering.

        The funny part: too many VMware guys blame it on Microsoft, but VMware suffers a similar problem. When it comes to VMware, those guys read the documentation and best practices, but they don't give M$ the same amount of attention, and then blame... MS.

        The funny part 2: the VMware guys design their cluster auth to work without requiring external auth or the "main customer AD" being available. Doing the same for Hyper-V is always a discussion, which I always win, but oh, those endless discussions.

  3. ColinPa

    Where are the instructions?

    As told to me...

    A customer was very organised: they had online documentation to cover all the scenarios they could think of, e.g. the start-up order of hardware and software. Most of it had been tested. It looked very impressive.

    They had a site wide power down. The shutdown worked well. The problem was the restart.

    "The restart instructions are on this machine - ah it's been powered off."

    They had a quick brainstorm and worked out the restart order (disks, network controller, PCs) to get to the point where they could log on and get the proper instructions. Then they shut it all down again, and went through a proper restart.

    After this, they kept a folder of printed "machine room boot-up" instructions in the operations area. It was reviewed and updated every month.

    1. Giles C Silver badge

      Re: Where are the instructions?

      Everything online is as you say great until you can’t get online.

      The relevant people should at least have a local copy of the DR procedures, preferably one on a dead tree.

      I know these days the cloud does everything, but there is always something that can break connectivity, and in these days of ransomware at least the process will be available.

      Yes, it might seem outdated, but try restarting a generator with no power when the instructions are on a server somewhere. If they are next to the machine, then it is a case of reading them.

      1. I Am Spartacus
        Childcatcher

        Clouds and Blue Skies

        The "cloud" is simply a short hand way of saying "someone else's computer". It can and will go wrong just as your own systems do. Only now it's not your fault so you can feel better about it.

        1. Chloe Cresswell Silver badge

          Re: Clouds and Blue Skies

          "Only now it's not your fault so you can feel better about ot." You missed off "and can do nothing about it"?

          1. Tom 7

            Re: Clouds and Blue Skies

            Pub o'clock!

          2. yetanotheraoc Silver badge

            Re: Clouds and Blue Skies

            "and can do nothing about it"

            Au contraire, management *can* do something -- send an email to the customers saying it's AWS's fault. Which (Tuesday) it was, at least if we are only considering the proximate cause. Email sent, waited for Amazon to fix it, more emails all around.

            Management: Proximate cause? What's that? Don't get technical with me. I'm beginning to think it's *your* fault!

            1. Doctor Syntax Silver badge

              Re: Clouds and Blue Skies

              What emailing customers achieves depends on circumstances but if the emails are effectively saying "place your orders with the opposition" then just sending emails doesn't really count as doing something about it.

        2. JimboSmith Silver badge

          Re: Clouds and Blue Skies

          The "cloud" is simply a short hand way of saying "someone else's computer"

          I was in a meeting where somebody spoke who assumed the "cloud" was the magic solution to everything. Another attendee then said "the cloud is just somebody else's computer permanently connected to the internet". This was a meeting about moving information, currently held securely on an air-gapped network and on DVD/tape backups on and off site, onto the cloud. The meeting concluded that having that info, encrypted or not, potentially accessible by anyone via the internet wasn't going to happen.

      2. Anonymous Coward
        Anonymous Coward

        Re: Where are the instructions?

        This is a bit like the "business continuity" planning my local authority employer adopted a decade or so back. Until then I'd had a plan for my service on 1 side or two of A4 paper with copies accessible from standalone PCs and on a CD or 5.

        But oh no. We had to adopt the new system, which was some kind of complex database with stems and branches containing folders that each held parts of the plan. In my head it always seemed like a kind of giant Christmas tree full of linked lights and individual baubles of information.

        I assume it was designed by someone thinking of a big corporate user with an admin or several to administer it and a centralised command and control system. But a local authority also has hundreds of front-line teams, each with too few admin staff to save front-line staff from wasting valuable time on routine low-level admin. And to help us use this mega-complicated, DB-like planning thing, one staff member (in this case me) was given a whole two hours of group training.

        But by the nature of the thing, should disaster strike we probably would have no access to the bloody database, and even if we could access the centralised system, it would have been a nightmare to actually use.

        So they eventually made two decisions: 1) have someone from central admin come and set it all up for us, enter the information etc., and 2) tell us to keep using the old paper-based system which, let's face it, was what we actually needed.*

        *To be fair, I'm sure they bought into this because a lot of sections just hadn't written a plan, or their plans weren't actually functional and checked/rehearsed. We had. What they should have done was come down hard on the idiots that hadn't got theirs done, and then have a repository with copies of our plans. But that wouldn't have given some Higher Up a chance at glory.

    2. Anonymous Coward
      Anonymous Coward

      Re: Where are the instructions?

      > Then they shut it all down again, and went through a proper restart.

      10/10 for that bit.

    3. Anonymous Coward
      Anonymous Coward

      Re: Where are the instructions?

      I think I've written this before, but it is reminiscent of a situation I found once when auditing an offshore oil platform.

      The oil company had a document control policy that said paper was only to be used for active jobs and procedures were to be printed off as needed and then shredded as soon as the job was complete - to ensure copies used would always be the latest. It makes sense to somebody sitting in an office onshore...

      The platform restart checklist (for when there had been a system shutdown and only safety-critical systems would remain online, powered by a muckle great stack of lead acid batteries feeding the main control room) was held on a PC in the main control room. Everyone assumed the PC would remain operational as it was in the CR. However, unbeknownst to those on site, the designers had recognised that its 13A ring main could also be used for "domestic" kit (like a kettle or microwave oven), and that such use could be an unnecessary drain on the vital emergency power, so the ring main was not fed from the emergency supply. Besides, the last thing you should be doing during such a restart is making a cup of coffee.

      No power to the PC, no restart checklist and a real risk of serious trouble. It presented the OIM (Offshore Installation Manager) with a conundrum, as he took great pride in enforcing the no-paper policy!

      When I returned the following year, I was shown a paper checklist that was now kept in a drawer in the CR.

      1. Will Godfrey Silver badge

        Re: Where are the instructions?

        So...

        When did it go wrong, prompting that?

      2. Tom 7

        Re: Where are the instructions?

        We kept paper copies of the latest restart instructions, passwords etc. in a safe. I came back after leave to discover the safe had been upgraded to an electronic one that wouldn't work when the power was off.

        1. Arthur the cat Silver badge

          Re: Where are the instructions?

          My sister-in-law lives in the wilds of Wales and lost power during Storm Arwen. No problem, she's had a generator since she spent 5 days without power the first year she moved there. Problem, the diesel had gone off. No problem, her neighbour, a farmer, just had a huge new diesel tank installed. Problem, the only way to get diesel out is via an electric pump (nope, no idea why no backup manual pump or simple tap).

          In the end he used a small petrol driven generator fuelled by petrol siphoned out of his car and lashed up electrics to get enough diesel out to run his generator and let my s-i-l have some for her generator.

          1. SImon Hobson Bronze badge

            Re: Where are the instructions?

            no idea why no backup manual pump or simple tap

            I can take a bit of a stab at that, being in the done-that, got-the-tee-shirt club.

            Going back maaany years when I worked on a farm, the tractors were small and we'd fill the tanks with a portable can filled by putting it under the tap of the storage tank. I think the biggest tank we had to fill was about 5 gallons. And the filler was low enough, despite being on top of the tank which was on top of the engine, to be able to lift the can up to pour it in.

            Then at some point the farmer bought another tank (an ex heating oil tank, as the village had got gas by now) and placed it high enough up that we could fill the tractors with a hose with a dispenser nozzle on the end. Such luxury.

            The tractors these days are "a bit bigger". And I know of a number of farms where the larger tractors can't be filled unless the storage tank is fairly full due to the height off the ground of the filler.

            The proper answer to that is to lift the tank up some more, but then you get into questions over stability and safety. Plus, to do that means emptying the tank so it's light enough to lift without falling apart, lifting it up, and then re-filling it. Of course, that takes planning, and sod's law means it's only empty enough when it's a busy time and you can't afford to be without your diesel supply - and can't afford to do without diesel while you wait for a new delivery and for all the shdirt to settle out of the diesel. One thing you can be sure of, when you move the tank to put some more supports under it, it'll stir up the dirt in the bottom.

            And another issue with lifting the tank up is that you then have to provide safe access for the delivery driver to fill it.

            So it may well be simplest to just buy a pump (they aren't expensive) and keep the tank at low(ish) level where it's safe - both from the stability PoV and for filling.

            As an aside, and yes, with the benefit of hindsight and knowledge it's something to cringe at: back in the early days I did rig up a pump, because we were lazy and lifting up a can of diesel once a week (or once a day when harvesting) was too much of a chore. I used some random bit of plastic pipe, a pump out of a washing machine, an electric drill, and a drum tap (the sort you screw into the end of a 25l drum) on the end of the hose. It did work, but Ex-rated it certainly wasn't!

            1. Yet Another Anonymous coward Silver badge

              Re: Where are the instructions?

              When the hurricane hit New York a few years ago, lots of companies had backup generators on the roof, added after 9/11.

              The fuel tanks were in the basement, because of weight, filling, safety etc

              Guess where the electrically driven pumps for the fuel were ?

              A lot of people trying to carry tons of diesel up 20 floors in buckets

              1. vtcodger Silver badge

                Re: Where are the instructions?

                "Guess where the electrically driven pumps for the fuel were ?"

                Serious question: I'm not an engineer, as this question will probably demonstrate.

                I think that a mechanical pump can PUSH fluids a long ways. 20 stories? Sure, why not? But I don't think it can SUCK fluids all that high. 10m (four stories?) max for water? A bit further for diesel which is less dense? But not 20 stories?

                OK, maybe the pumps can't be on the roof with the generators. And putting them at the bottom, where the water to be pumped is flowing in, possibly isn't a good idea either. What's plan B?

                1. Yet Another Anonymous coward Silver badge

                  Re: Where are the instructions?

                  Yes you could use a pressurised system where the pumps push air down into the sealed tanks which forces the fuel up, or you can have staging tanks above flood level with suck and blow pumps.

                2. Anonymous Coward
                  Anonymous Coward

                  Re: Where are the instructions?

                  What's plan B?

                  "A lot of people trying to carry tons of diesel up 20 floors in buckets"

                3. SImon Hobson Bronze badge

                  Re: Where are the instructions?

                  Yes, correct.

                  If you try and lift (i.e. suck from above), then you create a partial vacuum in the suction line. The absolute maximum you can lift is the equivalent of one atmosphere's worth of pressure on the liquid in the tank down below - so around 30ft for water, further for diesel as it's less dense. In practice, many pumps don't like negative inlet pressure - many aren't self priming, some will cavitate. And you are also limited by how much the liquid will evaporate under reduced pressure.

                  So a typical installation will try and have the primary lift pump close to the tank, and preferably lower than the bottom of the tank so there's positive inlet pressure at all times. Outlet pressure (and hence the head you can pump to) is limited by the pump and pipework, and I suppose, in the extreme, by the pressure that might cause the fuel to do "interesting" things.

                  The proper answer would be to run the pumps through dedicated cabling from upstairs - so that they can run off the generator's output. The genny itself should have a local buffer tank it can run from for at least a short time - so the lift pump doesn't need to work before the genny is running.

                  It's also important to cycle the fuel. Old fuel goes off in various ways - so a genny only run periodically on light load for testing may not run at all when needed as the fuel could have gone off. In the UK, there are companies that sell a service where they will add remote control to your genny, and sell it as emergency capacity to the grid (only worthwhile for large amounts of capacity - look up Short Term Operating Reserve) - with one of the selling points being that when called upon you get to test your genny under load, and cycle your fuel.
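The "one atmosphere's worth" limit on suction lift is a one-line formula: atmospheric pressure on the tank surface can hold up a liquid column of height h = P_atm / (ρ·g), and no more. A minimal Python sketch of that ceiling, assuming standard sea-level pressure and a typical diesel density of 840 kg/m³ (an assumed figure; real diesel varies a little either side):

```python
# Theoretical maximum suction lift: atmospheric pressure on the tank surface can
# support a liquid column of height h = P_atm / (rho * g), and no more.
# Real pumps manage less (vapour pressure, pipe friction, priming).

P_ATM = 101_325.0   # standard sea-level atmosphere, Pa
G = 9.80665         # standard gravity, m/s^2
FT_PER_M = 3.28084


def max_suction_lift_m(density_kg_m3: float, p_atm: float = P_ATM) -> float:
    """Height of liquid column (m) that atmospheric pressure alone can support."""
    if density_kg_m3 <= 0:
        raise ValueError("density must be positive")
    return p_atm / (density_kg_m3 * G)


# Typical densities; 840 kg/m^3 for diesel is an assumption (it varies).
for name, density in [("water", 1000.0), ("diesel", 840.0)]:
    h = max_suction_lift_m(density)
    print(f"{name}: {h:.1f} m ({h * FT_PER_M:.0f} ft)")
```

This gives about 10.3 m (34 ft) for water and about 12.3 m (40 ft) for diesel. So diesel does get a little further, as suspected, but nowhere near 20 storeys (roughly 60 m, assuming about 3 m per storey), which is why the lift pump belongs near the tank and pushes rather than sucks.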

        2. Phil O'Sophical Silver badge

          Re: Where are the instructions?

          You only had one copy?

      3. This post has been deleted by its author

        1. Terry 6 Silver badge

          Re: Where are the instructions?

          My swimming/life saving teacher said much the same thing. If you see someone drowning the first thing to do is take off some clothes.

          1. Yet Another Anonymous coward Silver badge

            Re: Where are the instructions?

            And change into pyjamas then retrieve a rubber coated brick

        2. imanidiot Silver badge

          Re: Where are the instructions?

          Always has been and still is my tactic in the (nowadays rare) cases where I'm involved in customer escalations (semiconductor lithography equipment). If I'm involved, 99% of the time it's already an extremely long down (more than 12 hours of downtime, with lost revenue measured in six figures an hour). It's not rare for things to go "more wrong" before they get better, and in my experience it's in these scenarios that people tend to do stupid things that make things go from "minor oops" to "that'll take a few weeks to sort out". So in those cases my usual reaction is to close my laptop, get up and loudly proclaim: "well that's borked, let's go get coffee while the system is rebooting" or words to that effect. And I won't take no for an answer.

          Especially when working on stuff that requires effective teamwork and coordination: that never happens when everyone feels "this has to be solved now". In reality, taking 30 minutes for a short break (usually spent discussing more calmly what just went wrong and formulating an effective plan that everybody agrees on, so everybody knows what to do next) usually saves time over people running off to do something without coordinating and losing time all over the place because nobody knows what anybody else is doing.

      4. The man who sold the moon

        Re: Where are the instructions?

        "the last thing you should be doing during such a restart is making a cup of coffee."

        Nonononono, that should be the *first* thing.

  4. This post has been deleted by its author

  5. Pete 2 Silver badge

    The one thing that is never planned for:

    > Although everything had been planned. Nothing could go wrong. Right?

    ... is a failure of the planning process.

    After all, if it was admitted that the planning process could fail, then the knock-on is that nothing could be trusted. Although that degree of paranoia realism is probably the secret of successful planning.

    And if the planning process is considered open to fault, the conclusion can only be that it is impossible to plan for that, since that plan would also involve the (faulty) planning process.

    1. Doctor Syntax Silver badge

      Re: The one thing that is never planned for:

      "that degree of paranoia realism"

      Paranoia is the required degree of realism.

  6. Anonymous South African Coward Bronze badge
    Coat

    bootstrapping any unknown process is always fun...

    ...when viewed from the safety of Another Place Far and Away...

    1. SW10
      Go

      Bootstrap

      [DTA] [2] [D] [0] [DTA] [ADD] [RUN]

      (Bootstrap for a custom Perkin Elmer I used in the late eighties, and which has remained in my wet ROM ever since.)

      1. knottedhandkerchief

        Re: Bootstrap

        "Wet ROM" - read only memory - is that a new definition for dementia?

        1. doublelayer Silver badge

          Re: Bootstrap

          No, I think it's honest. I have lots of stuff stored in my brain that is of no use or which I would actively like removed. I can tell you phone numbers for people I haven't called in a decade, and the numbers might not even work any more. I can remember instructions for operating equipment that is so obsolete that there probably aren't any examples still running. I can also remember embarrassing incidents which have no effect on my life today but which can cause small periods of discomfort any time my mind wanders to them. If my memory were read/write, I could easily press delete on those and put something else in their place (IPv6 addresses, perhaps, because I can never remember them). I think it qualifies as read only.

          1. vtcodger Silver badge

            Re: Bootstrap

            Useless stuff stored in the brain? Yep. I have this four-digit number stored there that I memorized about 60 years ago. Why? I have no idea. An employee number? A phone extension? An address? Who knows? I use it nowadays when a four-digit PIN is needed.

            1. Doctor Syntax Silver badge

              Re: Bootstrap

              It's probably the re-use as a PIN that keeps reinforcing it.

            2. Will Godfrey Silver badge
              Facepalm

              Re: Bootstrap

              I can still remember a telephone number from when I was a nipper in the 1950s (back in the letter/number days). It was BRI2479

    2. Herby

      Bootstrap from long ago...

      I remember it as if it were yesterday:

      3400032007013600032007024902402111963611300102 (then the R/S key).

      Boy, am I getting old! I was also lazy as I didn't want to feed a card into the hopper.

  7. Anonymous Coward
    Anonymous Coward

    The Bodge...

    A story I've told before, but I worked for a pharmaceutical company, and in one department we filled glass ampoules with various sterile liquids.

    The machine which filled them was ancient. It was always breaking down, resulting in delays to orders, and was temperamental which made even normal use unreliable.

    It fell upon me to sort out the customer arrears, so the first thing I did was find out why we couldn't do the job people were paying us to do. It turned out that one of the main problems with the machine was the turret - the bearing was so badly worn that the turret 'rocked', and that threw all the set-up tolerances out of the window.

    It turned out the periodic (and quite frequent) 'down for maintenance' weeks - which were a major factor in the arrears themselves, since the idiot department still took orders as if nothing was wrong - involved a bodge known as 'shimming the bush'. The engineers would dismantle the machine and insert shims around the bush to try and level it up and reduce play.

    This lasted for a few days until problems started again.

    After involving senior management, I got agreement to fix it once and for all with a new bush. This in itself is a long, long story, but suffice to say we didn't order an original bush from the German manufacturer of the machine - senior management said it would be too expensive. So it fell to the department engineer to source one himself.

    While it was being fabricated, I was speaking with the engineer and the winds of apprehension were triggered by something he said. It turned out we didn't have the precise measurements of the bush, since the engineer 'couldn't find' the CAD drawings that were supplied with all the German machines. When I asked why we didn't request a new set, the answer was... 'it would be too expensive'.

    I asked: 'so how does the fab company know the proper dimensions of the bush?'

    He replied: 'I've sent them the old one we took out.'

    I said: 'So they're going to copy a worn out bush and we're going to put that in?'

    He answered: 'Oh, no. They're going to estimate the correct dimensions'.

    The conversation involved quite a lot more.

    1. Sam not the Viking Silver badge
      Pint

      Re: The Bodge...

      I've been involved in a near 'International Incident' where a bearing-bush had been made incorrectly (the lubricant feed-holes had been omitted). The power station was going to have to shed load with highly political consequences. In the site workshop we made a new bush and the machine was up and running before the proverbial hit the fan, or worse.

      Although we and the end-user were not blameless, the customer's chief engineer was most gracious in his thanks. Neither side sought publicity over this event which could have had more serious repercussions. A bland invoice which was duly rescinded..... for good customer relations.

      Subsequently one of our best customers!

      1. Antron Argaiv Silver badge
        Thumb Up

        Re: The Bodge...

        Honesty, respect for the other person's position and a willingness to solve the problem correctly go an awfully long way.

    2. Dave314159ggggdffsdds Silver badge

      Re: The Bodge...

      Tbf, any machining company worth its salt ought to be able to get the original dimensions, unless the worn part is truly buggered. Usually things are made to whole-unit* dimensions, and by carefully measuring what you have, you can make an educated guess, then test for fit.

      ('Whole units' including things like microns, thousandths of an inch, etc, on very small scales.)

      For example, if you have something you know is worn and now measures, say, 9.7cm, it's possible it was originally 9.75cm, but much more likely it was 10cm.

      1. Anonymous Coward
        Anonymous Coward

        Re: The Bodge...

        "('Whole units' including things like microns, thousandths of an inch, etc, on very small scales.)"

        Which begs the question... which scale? Is that 1in or 2.5cm? 1lb or 450g?

        1. Sam not the Viking Silver badge

          Re: The Bodge...

          On machinery, there's generally some original dimension on which to base the guess.... It's the necessary clearances between running parts that can be difficult to assess. Getting it wrong leads to seizure or a rattle. For metals designed to run together, with lubrication, a good start is "a thou and a half per inch" (0.0015"/").

          The ratio is the same in linguine.
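          Taking the rule of thumb above at face value, and assuming "per inch" means total (diametral) clearance per inch of shaft diameter, the sums are easy to sketch. It is only a starting point, as the comment says; a real fit also depends on material, speed, load and lubrication. Because it's a ratio, the same function works in inches or millimetres.

```python
# Rule of thumb from the comment: "a thou and a half per inch" of shaft diameter.
# Assumption: this is total (diametral) clearance, not clearance per side.
CLEARANCE_RATIO = 0.0015  # dimensionless: 0.0015 in per in, or 0.0015 mm per mm


def running_clearance(shaft_diameter: float) -> float:
    """Suggested total running clearance, in the same unit as shaft_diameter."""
    return CLEARANCE_RATIO * shaft_diameter


def bore_diameter(shaft_diameter: float) -> float:
    """Bush bore to aim for: shaft diameter plus the running clearance."""
    return shaft_diameter + running_clearance(shaft_diameter)


if __name__ == "__main__":
    # A 2 in shaft and a 50 mm shaft, worked in their own units.
    print(f"2 in shaft:  clearance {running_clearance(2.0):.4f} in, bore {bore_diameter(2.0):.4f} in")
    print(f"50 mm shaft: clearance {running_clearance(50.0):.3f} mm, bore {bore_diameter(50.0):.3f} mm")
```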

        2. herman

          Re: The Bodge...

          Three eighths of a polony skin.

        3. imanidiot Silver badge

          Re: The Bodge...

          Since we're talking German (custom or low-scale) manufacture here, it's pretty safe to assume it'll have been designed in metric. I've yet to meet a German who'll voluntarily use Made-Up units.

      2. imanidiot Silver badge

        Re: The Bodge...

        The problem comes in when you need a running fit for a bushing or bearing that sits in what is probably no longer the correct bore size (because the original bearing has walked and "embiggened" the hole slightly) and has to fit smoothly on a now-worn axle. If possible, the best option is to make a new axle and set of bushings and fit the whole lot to a bore freshly bored out to size (and, if need be, relined or properly shimmed). If it's a lightly loaded bearing you might get away with "wet fitting" a (very) slightly undersized bushing in epoxy if the bore is too damaged and you can't get it skimmed.

        I most definitely would at the very least have taken a micrometer to the axle and bore to give them actual measurements of the parts that the bushing needs to have a tight and/or running fit to.

    3. yetanotheraoc Silver badge

      Re: The Bodge...

      'I've sent them the old one we took out.'

      Just send them the old part _and_ the usual shims.

      1. The Oncoming Scorn Silver badge
        Pint

        Re: The Bodge...

        Short version of a tale told to me from a friend at Siemens\Plessey Swindon.

        Arrived at work: "We are going to Roborough in Devon, they have an issue. Wanna tag along, as it's almost your neck of the woods?"

        Sure..

        Drive down in the pissing rain, arrive on site, everyone bar him piles into the conference room to discuss the issue at hand, he gets "distracted" for a few minutes & joins the meeting.

        OK the issue is that X machine keeps failing every few weeks, costing us hours in lost ti.........

        Ohh that ....I just fixed it

        Heads swivelling at a speed akin to car-crash whiplash!

        YOU WHAT!

        Fixed it... On ours we just put a rubber band on to provide tension on the Y part; every few months, when it fails, we replace the band. I noticed there wasn't one on it as we passed, so just asked if they had one kicking about & fitted it.

        He was both popular & unpopular at the same time for the undocumented\unapproved fix, & they managed to fit in a pub lunch (icon) en route back, in time to finish for the day.

        1. SImon Hobson Bronze badge

          Re: The Bodge...

          Ah, memories of the old Research Machines 380Z from the 80s. A backplane into which various cards were plugged to provide the required config.

          Only the support arrangement for the cards was notoriously poor - plastic guides that stuck up and were lacking in strength and rigidity. The result was that the cards could wobble and cause bad connections. The fix - a rubber band around the two card guides to hold them tight against the ends of the card. Leading to a diagnostic step - machine behaving erratically? Check the rubber bands!

          1. Martin an gof Silver badge

            Re: The Bodge...

            I have a home-built NAS in a low-profile case. Unfortunately the motherboard didn't have enough SATA ports so there are two on a PCIe card. If the card bracket is screwed to the case, the thing rocks just enough to make the PCIe connection unreliable. Cue lots of faffing around with pliers and loosening screws... and eventually realising that the computer is fastened into a rack and perfectly stable enough that the card doesn't really need to be screwed on.

            I thought these things were standardised?

            M.

            1. imanidiot Silver badge
              Trollface

              Re: The Bodge...

              "The great thing about standards is that there are so many to choose from."

              Andrew S. Tanenbaum

            2. Robert Carnegie Silver badge

              Re: The Bodge...

              A dimension standard specifies, very exactly, how wrong you can be.

              I own AA batteries in distinctly varying sizes. It might be wear and tear, but on some of them the knob on the positive end doesn't stick out very far from the battery: not far enough to reach a contact at the bottom of a little slot and actually work.

              1. Martin an gof Silver badge

                Re: The Bodge...

                Tinfoil is your friend here :-)

                M.

                1. Robert Carnegie Silver badge

                  Aha! Thank you!

        2. herman

          Re: The Bodge...

          A nameless military aircraft company had a problem with a resonance in a shaft which caused the gearbox of a nameless fighter plane to fail. I suggested that they use an old car trick and squirt a can of oil into the shaft, since that will somehow cause a noisy shaft to self-balance. The chief mechanical engineer went quiet. I could see him trying to visualize in his head how in hell that could work, but he did not dismiss the idea, since it sounded like a plausible old trick. Years later, I heard they solved the shaft balance problem by pouring a can of oil inside it.

  8. trevorde Silver badge

    Batteries not required

    When we travel, my wife *always* prints out everything, i.e. car park bookings, boarding passes, hire car bookings, hotel bookings. It's saved us on more than one occasion.

    1. Anonymous Custard
      Thumb Up

      Re: Batteries not required

      Have been a business traveler (current COVID and Brexit situations notwithstanding) for over 2 decades, and that was one of the first things I ever learned. Whenever I go away, I have a little folder tucked into my laptop bag with very much the list you mention.

      And likewise, more than once it's been more than useful in averting delay and disaster.

      So an upvote to your wife, which you can of course share.

      1. Flightmode

        Re: Batteries not required

        You shouldn't share your wife with others.

        Well, unless you're into that sort of thing. And she is too.

    2. Dave314159ggggdffsdds Silver badge

      Re: Batteries not required

      I generally save offline copies to multiple devices. Airports (at least in Europe) always have 'internet cafe' pcs and printers available if necessary as a last resort, but multiple devices are unlikely to all fail together.

      1. yetanotheraoc Silver badge

        Re: Batteries not required

        "multiple devices are unlikely to all fail together"

        Unless they all need AWS. (Your downvote was from someone else.)

        1. Dave314159ggggdffsdds Silver badge

          Re: Batteries not required

          Offline copies? Are there devices that won't even run if AWS is down?

    3. Anne Hunny Mouse

      Re: Batteries not required

      I do too, including photocopies of passports.

    4. Tom 7

      Re: Batteries not required

      We've got a couple of holiday cottages, and we send the details to customers and explicitly make sure over the phone that they have the documents on their mobile devices and a printed copy of the directions, as postcodes in the country can be quite large.

      Result - people finally getting in at 3am and phoning us to ask for the wifi password so they can download the documents!

    5. The Oncoming Scorn Silver badge

      Re: Batteries not required

      Granted, these days there's redundancy with booking info held on laptop & phone(s), but years back (before smartphones) on a trip to Canada we had a thunderstorm. Once the storm had finished & was travelling east, I shut down the laptop.

      Next morning totally dead & never computed again.

      Thank goodness for my foresight of printing out the itinerary & booking info, else time would have been spent trying to find a friendly PC shop to extract the Excel spreadsheet from the HDD.

      Recently had to go back to carrying paper copies with all the various Covid tests & vaccination proofs, in addition to copies held on laptop & phone.

      1. Anonymous Coward
        Anonymous Coward

        Re: Batteries not required

        Queuing in the Post Office - a young lad was having problems with an urgent licence renewal. He had the necessary official documentation on his smartphone - but the counter assistant insisted she needed to see it on paper.

        As I lived nearby I took him home. There he sent the document to me as an email. I printed it for him on my PC - and guided him back to the shop. One happy bunny.

    6. anothercynic Silver badge

      Re: Batteries not required

      Very smart woman, your wife. Paper is always useful, as I discovered even in this day and age. Especially now that vaccine passports, pre-departure tests and the like are all needed more often than not.

      :-)

  9. chivo243 Silver badge
    Go

    waiting for the other shoe?

    I was getting the feeling that the temp fix would come back and bite them... faking a heartbeat and all...

    1. Anonymous Coward
      Anonymous Coward

      Re: waiting for the other shoe?

      As the poster over my desk said - "Every solution breeds a new calamity".

  10. hayzoos

    Startup/Shutdown or vice versa co-dependencies

    In my experience, unless configuration changes are verboten, co-dependent systems can be configured to start/stop properly. I do have to confess that I supported systems which had to be started up and shut down daily, or even more frequently, by the end users themselves - for security. So these had to just work, or I had to be available to coax them into working. I took it upon myself to design startup and shutdown scripts that would pause on an incompatible system condition, give the user/operator intelligible feedback, and provide instructions if a manual task was required (such as closing that file). This needed some cellulose-based storage with toner applied to be posted in various locations, plus a modicum of training for the user/operators. It worked well enough to let me tend to other tasks rather than use the less efficient "only the admin may touch the power buttons" approach. I only needed to intervene on rare occasions.

    The seed of this mindset was planted when I was contracted to write an application for an Apple //e. The project liaison was a former comp. sci. professor who had written the specs, one of which was graceful recovery from unexpected power loss. It had to be written in the built-in Basic language. I can't now remember whether it was the native text-file read or write that was slow as molasses, but I believe it was the write, and the read was relatively fast. After a bit of research and testing I found the binary write was very fast but required two-dimensional (so to speak) coordinates, whereas the text write needed only one, so I had to implement a pointer and data map to support it. Since there was no longer an unacceptable disk-access penalty, I also implemented a write on each field entry. This combination withstood unexpected power loss very well. The workflow was to enter data from paper reports, calculate, and produce new reports; only the initial data was saved, and all calculations were made as needed to produce new reports.

    During the shakedown demonstration of the system, the power-loss requirement was brought up while the customer was typing. I pulled the plug while saying "Like this?" and held it up for all to see. The look of horror on both the customer's and the professor's faces was priceless. After power was restored, the prof. wanted to know how bad the data loss was. I told the customer to navigate to the record they had been working on. Only the field which had not been confirmed by pressing the Enter key was lost, and the customer just started typing from there without being prompted. The prof. was impressed.

    I did confess that if power was lost during an active disk write, the file could end up unreadable. Commercial data recovery software was available, and re-inventing that wheel at a higher cost made no sense. The project did specify a backup routine, which I had implemented on a per-data-entry-session basis, so even that was deemed acceptable by the prof.
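    As a rough modern analogue of the write-each-field-on-Enter approach (not the commenter's Basic code, which isn't shown), here is a Python sketch assuming a fixed-width record file; the field names and widths are hypothetical. Each confirmed field is written in place at record_no * RECORD_SIZE plus its offset within the record, then flushed with os.fsync, so a power cut should cost at most the field still being typed. As the comment says, a cut during the write itself can still damage the file; this sketch doesn't guard against that.

```python
import os

# Hypothetical fixed-width record layout: (field name, width in bytes).
FIELDS = [("name", 20), ("qty", 6), ("price", 8)]
WIDTHS = dict(FIELDS)
RECORD_SIZE = sum(width for _, width in FIELDS)


def field_offset(record_no: int, field: str) -> int:
    """Byte position of a field: start of the record plus the widths of the fields before it."""
    offset = record_no * RECORD_SIZE
    for name, width in FIELDS:
        if name == field:
            return offset
        offset += width
    raise KeyError(field)


def write_field(path: str, record_no: int, field: str, value: str) -> None:
    """Overwrite one field in place and ask the OS to flush it to disk before returning.

    Values longer than the field width are truncated; shorter ones are space-padded.
    """
    width = WIDTHS[field]
    data = value.encode("ascii")[:width].ljust(width, b" ")
    # O_BINARY only exists (and only matters) on Windows.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0), 0o644)
    try:
        os.lseek(fd, field_offset(record_no, field), os.SEEK_SET)
        os.write(fd, data)
        os.fsync(fd)  # push this one field to the storage device now
    finally:
        os.close(fd)


def read_field(path: str, record_no: int, field: str) -> str:
    """Read one field back, stripping the space padding."""
    width = WIDTHS[field]
    with open(path, "rb") as f:
        f.seek(field_offset(record_no, field))
        return f.read(width).decode("ascii").rstrip(" ")
```

    Writing one small field per keystroke-confirmed entry keeps each disk write short, which is the same trade the comment describes: more writes, but each one leaves the file in a usable state.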

    1. ColinPa

      Re: Startup/Shutdown or vice versa co-dependencies

      "In my experience,.. co-dependent systems can be configured to start/stop properly."

      Sometimes. I've worked with mainframe systems where you have a sysplex with one Db2 database spread across 10 massive hardware boxes, with 1000s of CICS, IMS etc., and transactions with two-phase commit across these transaction managers.

      As this sort of restart isn't done very often, you can get timing (or deadlock) issues if you do things in a different order.

      Develop a golden path which you have tested exhaustively, and always follow it, to minimise these sorts of problems.

      1. The Oncoming Scorn Silver badge
        Boffin

        Re: Startup/Shutdown or vice versa co-dependencies

        Security had the turnstiles at a Pharma running off a couple of aged HP Vectras, not supported by us.

        Short version: when there was an outage, both systems shut down & powered back up, but the gates were locked & had to be operated manually, giving security something to do (check badges & press a button) instead of sleeping/reading the paper or opening the gates to let pill presses be stolen by people removing equipment off-site (that's another story).

        Transpired that both systems came up at the same time & one of them was expecting a handshake from the other while the other was still booting; it didn't get it, so carried on booting.

        The fix was working out which one of the two had to be started up first.

    2. swm

      Re: Startup/Shutdown or vice versa co-dependencies

      We once had a power failure on a file server. The restore program was pretty good and marked all of the bad records. They made a neat spiral as the heads retracted while writing!

      We did have backups so one reformat and copy and we were back in business.

  11. Paul Hovnanian Silver badge

    The rooster

    ... of course.

  12. Paul 195
    Holmes

    But did they ever fix it...?

    I enjoyed this story and the ingenuity of the fix. But... circular dependencies are never good news. Was the root cause of the problem ever addressed?

  13. The Oncoming Scorn Silver badge
    Pint

    The UPS & Building Shutdown (Who Me!)

    So I received a notification that I was to be on-site prior to a weekend building shutdown at one of our office locations, where we had a floor.

    I checked with my site contact, as she had been there for years, about what I was required to be on-site for - nothing she knew of. I asked my managers etc. & in the end decided against boosting my mileage claim for the week, & informed everyone accordingly.

    Building shutdown happens as planned & power comes back up in the early hours (3am), when the remote technical support person starts ringing my office phone (unsurprisingly I'm not there), leaving voicemails as the servers are not coming up. I get to site as planned, to ensure everything is up & running for Monday, & discover the same. Turns out the reason I was supposed to be there, totally undocumented, was to switch the big UPS in the server room to bypass.

    A few hours waiting for the UPS batteries, which were totally depleted, to be checked before the servers could be powered up & testing of the office PCs & printers etc. could begin.

    Older & wiser heads may well downvote this of course, but this was my first exposure to a planned building shutdown, coupled with a lack of handover knowledge & any documented procedures. Still, I got an extra couple of hours at double time, as did the contractors with their call-out fees, & no real harm done.

  14. SImon Hobson Bronze badge

    Must admit I've also met this problem - but in our case decided it wasn't worth the effort to automate it, just know how to deal with it when required.

    If we had to do a cold start of our server room, then things didn't come up properly - a bit of A depends on B, B depends on C, C depends on D, and D depends on A. But at least one of those would come up well enough to satisfy the next one. So it was a case of starting systems in a particular order, knowing that some things would be suboptimal, then, once they were all up, rebooting each one in turn so it started up with a clean config.

    I do recall that one of the dependencies was DNS. If you use IP addresses in configs, things break every time you change stuff, the configs are unreadable, and it's generally a PITA. If you use FQDNs, you can move stuff and other stuff generally just needs a graceful restart or reload to pick up the change - and it's less of a PITA. IIRC I added a DNS secondary for certain zones onto certain servers (which you'd never expect to need their own DNS service) to provide the required DNS resolution when booting without the main DNS being up.
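    Working out a start order like this is a topological sort of the dependency graph, and a loop such as A -> B -> C -> D -> A is exactly the case where no valid order exists. A minimal sketch using Python's standard graphlib module (Python 3.9+); the system names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def boot_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return a start order in which every system comes after everything it depends on.

    depends_on maps each system to the set of systems that must be up first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


def find_cycle(depends_on: dict[str, set[str]]) -> Optional[list[str]]:
    """Return one dependency cycle (first node repeated at the end), or None if there is none."""
    try:
        boot_order(depends_on)
    except CycleError as err:
        # graphlib puts the offending cycle in the exception's second argument.
        return err.args[1]
    return None


if __name__ == "__main__":
    # Hypothetical server room with no loop: DNS first, the application last.
    healthy = {"app": {"auth", "storage"}, "auth": {"dns"}, "storage": {"dns"}}
    print(boot_order(healthy))

    # The A -> B -> C -> D -> A loop from the comment: no valid order exists.
    looped = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "D": {"A"}}
    print(find_cycle(looped))

    # Removing one edge (say D only needed A for DNS and now has a local secondary)
    # makes the graph orderable again.
    looped["D"] = set()
    print(boot_order(looped))
```

    Removing a single dependency, as the local DNS secondary did, is what lets a clean order exist at all; where no edge can be removed, the start-degraded-then-reboot-in-turn routine is the fallback.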

  15. DarkwavePunk

    Telcos and Bodge.

    One of my mentors at a Telco in Oz used to write CGI scripts in shell and C. I expanded on this to use sed and awk (mainly because I knew them, but also because Perl 4 was utter shite). It stuck with me as an ethos. I'm so sorry to every company I've worked for in the last 25 years. Sort of. Maybe. Lots of love - The Eternal Bodge.

  16. Stoneshop
    Headmaster

    Wrong adjective

    "We had never tested a total power down before, because we had redundant systems, so a total outage was unthinkable."

    inevitable.

    1. A.P. Veening Silver badge

      Re: Wrong adjective

      Those two words are synonyms as a corollary of Murphy's law.

  17. Tom 7

    Worth remembering to keep hold of the test harnesses

    you used to build the thing.

  18. Primus Secundus Tertius

    Out in the sticks

    Back in the 1990s, I worked at a place out in the countryside. (One day I saw a fox stroll past the window.) Frequently we would arrive Monday morning to find the electricity had gone down over the weekend, and all our computers were 'resting'.

    Our internal network had 'just grown', with many interdependencies. It took most of a Monday morning to restart the computers in the right sequence. Eventually we recruited an experienced network engineer to sort us out.

    1. Terry 6 Silver badge

      Re: Out in the sticks

      Err I live in North London. Bloody foxes are everywhere.

      1. FrogsAndChips Silver badge

        Re: Out in the sticks

        Same here, I remember the good ole times of watching them chase each other in the communal garden at $HOME[-1], while bottle-feeding little froggy at 2am...

  19. Bruce Ordway

    Remote connection and mapped drives

    Reminds me of the early 2000s, when people first started connecting remotely to a site.

    A user connected to a Windows domain OK, but... complained they were missing all of their mapped drives.

    Everything worked when the user was on-site, hard wired.

    Discovered that the script that was supposed to map the drives ran before a remote PC actually had access to the domain.
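    The usual cure for a logon script racing the network is to wait for whatever the script needs before running it. A generic sketch in Python, assuming "the domain is reachable" can be expressed as a check function; domain_controller_resolves and the host name inside it are hypothetical placeholders, not anything Windows itself provides.

```python
import socket
import time
from typing import Callable


def run_when_ready(
    is_ready: Callable[[], bool],
    action: Callable[[], None],
    timeout_s: float = 120.0,
    poll_s: float = 2.0,
) -> bool:
    """Poll is_ready() until it returns True, then run action() once.

    Returns True if the action ran, False if timeout_s passed first.
    The readiness check is always tried at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            action()
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Don't sleep past the deadline.
        time.sleep(min(poll_s, remaining))


def domain_controller_resolves() -> bool:
    """Hypothetical readiness check: does the (placeholder) domain controller name resolve yet?"""
    try:
        socket.gethostbyname("dc01.example.internal")
        return True
    except OSError:
        return False
```

    A login script would then call something like run_when_ready(domain_controller_resolves, map_drives), where map_drives is whatever already maps the drives; returning False on timeout lets the script tell the user what's wrong instead of silently leaving them without drives.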

  20. Anonymous Coward
    Anonymous Coward

    It is surprising no one has mentioned the ubiquitous phrase for these situations - "Catch 22".

  21. albegadeep

    "It transpired that the network system didn't need to see much more than a glorified heartbeat to confirm the customer service system was active."

    Have something similar at work. There's a system that absolutely insists that a tape drive be connected before it'll boot. But it's incredibly rare to actually need to USE the tape drive. So there's a tape drive with busted mechanical bits, but fully functional circuit boards, attached to it. (With a note, so nobody will actually put a tape in.) If ever needed, we can borrow a drive from another system for a few minutes.

  22. Malcolm Weir Silver badge

    From the article....

    "On my side of the world," he told us, not at all highlighting a worryingly siloed side to the business, "we had a customer service application, which is where the sales clerks entered the details of new customers.

    What the blue-blazes is "worryingly siloed" doing there? It may be a typo for "sensibly partitioned"?

    [ What idiot would give a rando in a high-street store any kind of access to the operational side of a production system? Nice logical (if not physical) data diodes are your friend... ]

  23. This post has been deleted by its author

  24. Potemkine! Silver badge

    Nobody expects the Spanish Inquisition

    because we had redundant systems, so a total outage was unthinkable

    unthinkable != impossible

    1. Terry 6 Silver badge

      Re: Nobody expects the Spanish Inquisition

      Exactly that. "Unthinkable" means, at best, that they are convinced that they've covered every possibility. It doesn't mean that they have covered every possibility.

      1. Anonymous Coward
        Anonymous Coward

        Re: Nobody expects the Spanish Inquisition

        Expressed as:

        "There are a finite number of things you can think of that may go wrong - and an infinite number of things that could go wrong".

        Often the "unknown unknowns" are constraints that people didn't know (or forgot) that existed in underlying components of a system.

        1. Terry 6 Silver badge

          Re: Nobody expects the Spanish Inquisition

          Yes, that "unknown unknowns" line by Rumsfeld got laughed at. But only by people who've never had to clear up the mess.

  25. Not Entered

    Which came first ?

    The rooster
