Gas supplier blames 'rogue' code for Channel Island outage

The natural gas supply on the small island of Jersey is still switched off five days after a software problem caused its main facility to fail over to a safety mode, leaving engineers struggling to reinstate supplies to homes and businesses. On Saturday, the island off the coast of northern France lost its gas supply. The following …

  1. KittenHuffer Silver badge

    Million to one chances ....

    .... happen 9 times out of 10! - PTerry

    1. Lee D Silver badge

      Re: Million to one chances ....

      But what if it's not EXACTLY a million to one?

      1. M.V. Lipvig Silver badge

        Re: Million to one chances ....

        Non million to one shots fail. Every single time.

    2. Anonymous Coward
      Anonymous Coward

      Re: Million to one chances ....

      Especially if you’re a plant in Gaza.

      Perhaps the hackers got mixed up between Jersey and Jaza…

      1. Excused Boots Silver badge

        Re: Million to one chances ....

        Easily done!

    3. Drat

      Re: Million to one chances ....

      I will now keep a lookout for anything coming from Mars

    4. jmch Silver badge

      Re: Million to one chances ....

      Love the Pterry reference......

      .... and I wanted to note, saying something has a "million to one" chance of occurring is meaningless without an event-related or time-related reference frame. For example, if software controlling a physical process is pinging some sensors and making adjustments once every minute, it goes through a million cycles in less than 2 years, so a "million to one" chance of any one cycle screwing up really means near-certain failure (and one would expect industrial equipment to be in operation for decades rather than years). Without a reference frame, "million-to-one" seems to imply "over the operational lifetime", which is wildly optimistic. (Quick back-of-envelope at the end of this comment.)

      Directly related to the above (and since I've been re-reading Feynman's "What Do You Care What Other People Think?"), with reference to his investigation of the shuttle disaster: his observation was that, because of certain characteristics built into human psychology and business organisational structures*, every step up in management level brings an order-of-magnitude shift in the believed probability of bad things happening. So if the official spokesperson from the C-level says it's a million to one, the VPs think it's 100k-to-1, the middle managers think it's 10k-to-1, the team leads think it's 1000-to-1, and the devs actually working on it *know* it's probably closer to 100-to-1.

      * which means it's highly likely that, unless the organisation recognises it and takes specific steps to counter it, this issue will be present 'by default' in any company, government or organisation
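
      The back-of-envelope, in Python, for anyone who wants to plug in their own numbers (the cycle rate and plant lifetime below are illustrative assumptions, not figures from the article):

```python
# Back-of-envelope for the "million to one" claim above.
# Assumed numbers (not from the article): one control cycle per minute,
# 20 years in service, 1-in-a-million chance of a bad cycle.

p_per_cycle = 1e-6                    # the quoted "million to one"
cycles_per_year = 365 * 24 * 60       # one cycle every minute
years_in_service = 20

n_cycles = cycles_per_year * years_in_service
p_at_least_one_failure = 1 - (1 - p_per_cycle) ** n_cycles

print(f"cycles over lifetime:    {n_cycles:,}")
print(f"P(at least one failure): {p_at_least_one_failure:.5f}")
# ~10.5 million cycles -> probability ~0.99997, i.e. near-certain failure
```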

  2. Anonymous Coward
    Anonymous Coward

    Clusterfsck

    1. cyberdemon Silver badge
      Devil

      Curious

      Obviously incompetence has to be considered before (and despite) malice (if you left the door open and your company got burgled, then the blame lies jointly between you and the burglar)

      But it is curious that this event happened so close to a second mysteriously exploding gas pipeline, this time between Finland and Estonia, this week.

  3. Dan 55 Silver badge
    Facepalm

    Well that was like reading an error report from a user

    I mean, who knows what "rogue code" is.

    The only thing you can be sure of is that if the main system fails and the backup system fails in exactly the same way, it's not because the chances are "like winning the EuroMillions" lottery, it's because the same software failed in exactly the same way on exactly the same input.

    1. KittenHuffer Silver badge

      Re: Well that was like reading an error report from a user

      Pretty much the same situation that took down NATS recently

      1. Strahd Ivarius Silver badge
        Trollface

        Re: Well that was like reading an error report from a user

        Could it be the same software?

        1. Androgynous Cupboard Silver badge

          Re: Well that was like reading an error report from a user

          Of course. Someone plugged the gas network into air traffic control, and the air-traffic network into the gas plant. Rookie error; they should have used different connectors.

          1. tip pc Silver badge

            Re: Well that was like reading an error report from a user

            Who, Me?

            or

            On Call

    2. John Miles

      Re: Well that was like reading an error report from a user

      If someone told me their system had "rogue code" I'd expect it to be an Easter egg, logic bomb or supply-chain attack

      This sounds like a bug and someone is worried about being sued

    3. Herring` Silver badge

      Re: Well that was like reading an error report from a user

      It's a mistranslation and a typo - from the French Code Rouge

    4. Roland6 Silver badge

      Re: Well that was like reading an error report from a user

      > “it's because the same software failed in exactly the same way on exactly the same input.”

      Agree; it is unusual for DR to be between systems from different suppliers, even in fail-safe environments. On British Railways, for example, it was only considered for Solid State Interlocking.

      It is this same and immediate failure which lends weight to the “rogue” software being part of the original build and not third-party stealthware.

      1. Ken G Silver badge

        Re: Well that was like reading an error report from a user

        Usual in utilities but maybe the small population/customer base drove a cut down solution. I've seen it happen in places bigger than Jersey.

    5. Excused Boots Silver badge

      Re: Well that was like reading an error report from a user

      Exactly; no, it's not a 'million to one chance', it's actually an absolute certainty, because both systems are obviously vulnerable to the same issue.

      Rogue code? Surely a misprint for 'rouge code' - see it's the Russians, Putin's fingerprints all over this!

      But seriously, I've seen this sort of thing before: a virtualised server is replicated to an offline system which will fire up and cut in if there is a failure of the primary server. All good, unless a dodgy update or some malicious event happens to the primary server and gets replicated to the secondary. So when the primary system falls flat on its face because of said dodgy update, the secondary fires up and promptly falls over as well, for the same reason.

      Cue my being in a meeting with the C-suite guys, saying to them, 'you do remember that time when I told you that this wasn't necessarily a good idea, that replication is not actually the same as backup........ oh, you don't remember. Luckily for me I keep copies of all the emails I sent you warning of such a possibility, oh, and the read receipts which you failed to disable, which would indicate that you had received and read said emails.'*

      * Always cover yourself. And yes, a read receipt doesn't prove that the person has actually read and understood the implications of what is being said. However, legally (at least on the right side of the pond, and I'll assume the same on the left as we share a common legal framework) it's going to be a hell of a lot harder for them to show they weren't aware.

  4. Pascal Monett Silver badge
    Stop

    "because of the code"

    No. The code is not responsible here. It may be the direct cause, but the one really responsible is the idiot who bungled the specifications and did not do sufficient testing to ensure that the code would work in extreme situations.

    Blaming the code is easy, but code is just the materialization of a list of scenarios. If you have a scenario that was not foreseen, then the code is likely to not solve the problem.

    Somebody didn't foresee this, or didn't put in the effort to check if the code was viable in that situation.

    It's still not the code that is at fault.

    1. DevOpsTimothyC

      Re: "because of the code"

      Not to mention that code is often a reflection of the corporate culture.

    2. Sandgrounder

      Re: "because of the code"

      It's not the code, the developers or even the analysts and designers. It clearly is the testers at fault for not running all one million scenarios, documented and undocumented.

      That's why we have test managers. Someone to blame when things go wrong.

    3. Roland6 Silver badge

      Re: "because of the code"

      > It's still not the code that is at fault.

      I would agree, as it seems the failed-over system also did a fail-safe shutdown: “the plant turned itself off to protect the network”.

      If the code was “rogue” I would not expect both systems to fail safe.

      However, I wonder whether the first system in turning the plant off also turned off the power supply to the failover/DR system…

      1. FirstTangoInParis Bronze badge

        Re: "because of the code"

        Teams I’ve worked in now routinely test far more for failure than success. Program a system to throw garbage input, test race conditions, 24/7. Then see what broke and fix it. As above, a simultaneous failure suggests the backups didn’t use different clean room code. This is therefore a rogue management problem, not code. Computers are just rule following idiots.
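
        A toy sketch of the "throw garbage at it" idea in Python; the controller function below is invented purely for illustration, not anyone's real plant code:

```python
# Minimal garbage-input harness: hammer a controller with junk and check
# it only ever keeps running or shuts down safely. The controller below
# is a made-up toy, not real plant code.
import random

def control_step(pressure_mbar):
    """Toy controller: return 'RUN' or 'SAFE_SHUTDOWN'."""
    if pressure_mbar is None or not (19.0 <= pressure_mbar <= 23.0):
        return "SAFE_SHUTDOWN"
    return "RUN"

def fuzz(iterations=1_000_000):
    junk = [None, float("nan"), float("inf"), -1.0, 0.0, 1e9]
    for i in range(iterations):
        value = random.choice(junk) if random.random() < 0.5 else random.uniform(-100.0, 100.0)
        state = control_step(value)
        # Anything other than these two outcomes is a bug worth finding now.
        assert state in ("RUN", "SAFE_SHUTDOWN"), f"iteration {i}: {value!r} -> {state!r}"

if __name__ == "__main__":
    fuzz()
    print("Survived a million cycles of garbage input.")
```

        (Race conditions need more than this, of course, but a dumb loop like that running 24/7 finds a surprising amount.)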

  5. Anonymous Coward
    Terminator

    A software problem caused the facility to fail over to safety mode?

    Meaningless pseudo techno babble.

    "All of them failed at exactly the same time because of the code"

    I don't believe it!

    “Rubis Énergie and Rubis Terminal continued to deploy their collaborative software for the preventive maintenance of facilities (computerized maintenance management system). Once the relevant information has been loaded into the database, these systems allow the planning of monitoring and preventive maintenance work.”

  6. MOH

    So exactly the same chance as anything coming from Mars?

    But still ...

    1. Evil Scot Bronze badge
      Joke

      Ohhhhhhh Laaaaaaa... La.

  7. disgruntled yank Silver badge

    One in a million

    From the early days of microcomputers, there is the story of a coder stating that something had a one-in-a-million chance of occurring, and another replying that, at the clock speeds of the day, the millionth time would occur "tomorrow morning".

  8. Anonymous Coward
    Anonymous Coward

    Euro-balls

    "All of them failed at exactly the same time because of the code," Cox said, describing the probability of the occurrence as "like winning the EuroMillions" lottery.

    Yeah, if you've stuffed the Euromillions draw machine with balls of all the same number.

  9. hairywill
    Facepalm

    statistics is just too hard for some people

    'the probability of the occurrence as "like winning the EuroMillions" lottery'

    and regularly people do

    1. Pascal Monett Silver badge

      Yeah but, those who deem themselves superior will counter with the proven fact that, mathematically, your chances of winning are negative.

      And all the people who have already won will get a good chuckle out of that.

      I really would like to be one of those who can chuckle.

      1. Benegesserict Cumbersomberbatch Silver badge

        0 <= Pr(anything) <= 1

        (Something happening twice is a thing. Something that happens then unhappens is also a thing.)

        1. Ochib

          Anything that happens, happens.

          Anything that, in happening, causes something else to happen, causes something else to happen.

          Anything that, in happening, causes itself to happen again, happens again.

          It doesn't necessarily do it in chronological order, though.

  10. DevOpsTimothyC

    Incentives

    I wonder if the supply issue would still be ongoing if every day the company isn't supplying consumers were a day the CEO didn't get pay, bonuses or similar incentives.

    1. Ken G Silver badge

      Re: Incentives

      I doubt that would speed up recovery, it'd just make it funnier to watch.

      1. CrazyOldCatMan Silver badge

        Re: Incentives

        I doubt that would speed up recovery, it'd just make it funnier to watch

        And give the techies a bit of leverage.. "You know, that pay rise of ours that you turned down while awarding yourself a 10% rise - you might want to re-think, because our low wages are very definitely causing low productivity. And good luck firing us and trying to find someone else to fix things.."

  11. Anonymous Coward
    Anonymous Coward

    Is there an agreed decent period of time before one can star in "Who, Me?"

    Asking for a friend.

    1. M.V. Lipvig Silver badge

      Long enough that nobody can figure out who you are or, barring that, a couple of years past the statute of limitations.

  12. munnoch Silver badge

    Days or even weeks to turn it back on?

    WTAF!!!

    1. Richard 12 Silver badge
      Mushroom

      Re: Days or even weeks to turn it back on?

      It takes a long time to repressurize all the tubes.

      It's not a big truck.

      All the gas appliances have a minimum safe operating pressure, they need to somehow get it back up to that pressure before each customer turns their gas supply back on.

      They'll also need to get all the air out of the system that will have inevitably leaked in via all the gas appliances across the network, as otherwise... see icon.

      1. PRR Silver badge

        Re: Days or even weeks to turn it back on?

        > Re: Days or even weeks to turn it back on? .... It takes a long time to repressurize all the tubes. ....gas appliances have a minimum safe operating pressure, .... get all the air out of the system that will have inevitably leaked in via all the gas appliances across the network, as otherwise... see icon.

        I dunno how they do it in Jersey, but in NEW Jersey most/all older gas burners had Pilot Lights, a small standing flame. When the gas goes to zero, the pilot goes out. When the gas comes back, all those pilot lights bleed UN-burned gas into their rooms: basement, kitchen, etc. We were always told if we smell gas to NEVER use a (wired!) telephone or an electric switch, or we go BOOM.

        The expectation was that a gas system would "never" go dry. When gas came to town, all houses started with gas OFF. Then they'd go around, knock on the door, turn on the gas, then run to all the gas appliances to light the pilots. The few minutes this takes doesn't release enough gas to explode.

        What could go wrong?

        My water heater is on a pilot. My new furnace has a hot-coil, no pilot. My gas fireplace goes both ways: in summer I can spark it with a battery; in winter I leave a pilot going because the heat is a benefit.

        1. Benegesserict Cumbersomberbatch Silver badge

          Re: Days or even weeks to turn it back on?

          Well, the gas appliances around here have pilot sensors in the flame which cool down when the pilot goes out and shut off the gas valve.

          Holding open the ignition valve button while it heats up again can make your fingers really tired.

          1. collinsl Silver badge

            Re: Days or even weeks to turn it back on?

            If you even have a button - lots of modern boilers operate entirely on sensors and don't have any way for you to control the gas at all.

            Most of these rely on a pressure switch though to determine if it's safe to attempt to light since they don't operate a pilot flame any more to save energy.

      2. Roland6 Silver badge

        Re: Days or even weeks to turn it back on?

        > “It takes a long time to repressurize all the tubes. It's not a big truck.”

        That is, however, the approach when dealing with water mains, as the big trucks/tankers permit the network to be filled from multiple high points before the pumps are turned on to pressurise it.

        This approach also helps to minimise the residual trapped air.

      3. localzuk

        Re: Days or even weeks to turn it back on?

        Tubes? I didn't realise they got gas over the internet in Jersey.

      4. munnoch Silver badge

        Re: Days or even weeks to turn it back on?

        How did the gas network suddenly find itself empty?

        If there was no new gas going into it then as you say all the appliances would eventually turn off due to low pressure. The low pressure threshold would be some amount above atmospheric, so the pressure in the network would be low but not zero.

        Eventually it might balance out at atmospheric pressure due to leakage, days or weeks later. Then fresh air might get in via percolation, but doesn't seem like this would be a massive effect.

        This is a pretty catastrophic failure mode. And if, as you say, the consequence of trying to black start it without all or the majority of your customers complying with instructions is a big fireball, then that's really hard to believe.

        <Gallic shrug>

        1. Roland6 Silver badge

          Re: Days or even weeks to turn it back on?

          Useful reference: “Gas supply emergencies - consumer self isolation and restoration (SI&R) of gas supply”:

          “The gas network cannot simply be switched back on since the order in which premises are restored must be coordinated to ensure that pressure in the network is maintained.”

          It would seem that, as the Jersey network will have been off for more than 24 hours and impacts circa 4,000 customers, safety concerns dictate a controlled restoration of supply.

          At the domestic level, if people haven’t isolated their home pipes from the mains, so they retain pressure, there is an increased risk of there being a combustible gas-air mix in the pipes, when mains supply is restored. So yes, turning gas back on does have an increased risk of explosion. When my local gas main was repaired recently, all houses on that segment had a first engineer visit to turn off their gas supply, and when the supply had been restored, a second engineer visited to check pressure and that all appliances (boiler, cooker, fire) were operational. Naturally the windows were left wide open for a few hours to permit vented gas to escape…

          Aside: This sort of answer is going to challenge AI, as it requires a deductive leap to link your query to the relevant HSE document.

  13. Strahd Ivarius Silver badge
    Devil

    Is it a plot by the government...

    ... to find radical ways of reducing the bill for end-users?

    1. Ken G Silver badge
      Angel

      Re: Is it a plot by the government...

      I love your confidence in the Bailiff's ability to plot in secret without the whole Bailiwick finding out.

  14. DS999 Silver badge
    Facepalm

    "All of them failed at the same time"

    That's not a million to one coincidence if they were all using the same sensors and running the same code to determine if the condition that causes a shutoff had been met.

  15. Ken Moorhouse Silver badge

    Cox called on local residents to pull the lever next to their gas meters

    In the upwards direction?

    That sounds like a cock up.

    1. TimMaher Silver badge
      Headmaster

      Re: Cock up

      But only if you turn the meter through 90 degrees.

      1. Ken G Silver badge
        Trollface

        meter?

        Mine's just a foot (but I seldom use it as a rule)

  16. Conundrum1885

    Black swan event

    An outage this large is a very unusual event.

    For it to be caused by a computer glitch is virtually unheard of.

    Makes me wonder if they tried to patch something and it all went Strange Loop.

  17. Tron Silver badge

    This is becoming way too common.

    Japan's banks' clearing system fell over the other day, reverting to back-up systems for two days. I'm not sure it even made it to El Reg's news pages. I may have missed it.

    https://japantoday.com/category/business/Japan-bank-payments-clearing-network-disruption-continues-for-2nd-day

    There is a real need for both a way of rebooting stuff easily and quickly when it falls over, and a recognition that everyone needs a viable Plan B. That may be an Alt-System that can supply barebones functionality after malware locks you out of your main kit, or paper, phones and human beings. Lazily relying on Plan A needs to cost serious money in compensation payments. Only the financial pain from that will nudge some companies into having a reliable alternative.

    The belief that a system will not fall over, and the inclusion of poor fail safes in the belief that they will never kick in, are amateurish.

    Instead of the endless whining about privacy, we need to focus on resilience.

    1. Martin-73 Silver badge

      Re: This is becoming way too common.

      Why not both?

    2. M.V. Lipvig Silver badge

      Re: This is becoming way too common.

      Corps have a really bad habit of assuming that if it never happened, and a backup plan costs money, then it's not worth spending money on a maybe. Then, when it does happen, they won't spend the money because, well, what are the chances it'll happen again?

      Now make the C-suite personally liable...

  18. Henry Wertz 1 Gold badge

    Airplane safety

    Just to point out: on Airbus and the like, this is why some of the safety-critical redundant systems are deliberately diverse; if there are 2 or 3 of some sensor, for instance, they'll be from 2 or 3 different vendors. It avoids the situation where, if a failure is due to some design flaw, software flaw, or manufacturing flaw, all your redundant systems succumb simultaneously to the same flaw.

    I don't realistically expect gas plants, power plants, etc. to get redundant systems from 2 different vendors, but in some limited cases that's actually done.

    Of course, maybe they really did ask some programmer "How likely is this to happen?" "Million to one chance" (but the code runs like every 5 minutes, so it'd run a million times in just over 9 years). There have been a few kernel bugs (in the unstable versions) that would essentially have a million-to-one chance of triggering, but if one is in a driver that's pushing like 10,000 packets or screen draws or whatever a second, that means a kernel panic within a few minutes.
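
    Same arithmetic in a few lines of Python, using the illustrative rates above (one run every 5 minutes vs. 10,000 events a second), not real plant figures:

```python
# Expected time to first failure for a "million to one" per-event bug,
# at two event rates. Rates are the illustrative ones above, not real figures.

def expected_seconds_to_failure(p_per_event, events_per_second):
    # Geometric distribution: on average 1/p events until the first failure.
    return (1.0 / p_per_event) / events_per_second

p = 1e-6
slow = expected_seconds_to_failure(p, 1 / 300)   # one run every 5 minutes
fast = expected_seconds_to_failure(p, 10_000)    # 10,000 packets/draws per second

print(f"one run every 5 minutes:  ~{slow / (3600 * 24 * 365):.1f} years to expected failure")
print(f"10,000 events per second: ~{fast / 60:.1f} minutes to expected failure")
# ~9.5 years in the first case, under 2 minutes in the second
```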

  19. Anonymous Coward
    Anonymous Coward

    Some blame the management, some the employees.

    Everybody knows it's the industrial disease.

  20. Zack Mollusc

    still wondering

    what "the power had seized in the plant" means.

    The plant lost its connection to the grid?

    The grid itself was down?

    The plant generates its own electricity from gas and the generator's bearings failed?

    Anyhoo, I have been running simulations on a global network of supercomputers and there is a possibility of maintaining gas supply and pressure even when the computers fail. This radical new concept takes the form of a large gas-tight storage cylinder which can telescope vertically. Even when the myriad of excel spreadsheets being used as databases, process control and mission-critical safety monitoring become unavailable, the weight of the gas storage unit's upper section bearing down on the volume of gas below will maintain gas pressure to the consumers until either the problem is resolved or an orderly shutdown can be initiated. Scale the volume of stored gas to the rate of consumption to give the amount of time needed to avoid loss of supply to consumers.

    Although the luddites infesting this site will mock this system as unworkable as it fails to use either blockchain or AI, I think it is worth trying wild crazy ideas just in case they work.

    1. Giles C Silver badge

      Re: still wondering

      You mean rebuild all the gas cylinders everyone has been ripping out over the last 30 or so years?

      They probably weren't seen as modern enough, as they worked on gravity. There used to be one outside where my grandparents lived and, depending on the time of year, it was anything from 20 to 100 feet tall.

      1. david 12 Silver badge

        Re: still wondering

        They were used because the system of generating Town Gas from coal had no excess capacity: supply was built up during hours of low demand, then used during hours of high demand.

        The natural gas wells have higher-than-demand capacity, and are buffered by long runs of transmission pipes. Gasometers are no longer required*.

        *(explain OP joke here)

      2. Roland6 Silver badge

        Re: still wondering

        From what I can gather, we are effectively rebuilding them, but out of battery packs, to support an electricity grid which will rely more on fluctuating generators.

  21. Anonymous Coward
    Anonymous Coward

    Script Kiddy From Outer Space

    A system always has a chance of failure. What's your point?

  22. Vlad
    Mushroom

    I wonder if they use SCADA?

    https://www.sciencedirect.com/science/article/pii/S0167404822004205

    I must admit that I haven't read it all.

  23. Snowy Silver badge
    Holmes

    Not much of a back-up

    If the same failure took out the main system and the back-up!

    Did a rogue engineer write the rogue code?

  24. Nifty

    It was an upload of bad French gas.
