Firewall upgrade linked to three deaths after Australian telco cut off emergency calls

Australian telco Optus says its staff may not have followed established processes when a firewall upgrade they conducted resulted in customers not being able to call emergency services for 14 hours – a period during which it is thought three of the carrier’s customers died after trying to seek help, according to the company's …

  1. sanmigueelbeer Silver badge
    Coat

    when a firewall upgrade

    "a" firewall. Singular, i.e. no redundancy.

    The things companies have to resort to just to earn a (in)decent buck.

    1. runt row raggy

      I'm not sure you can conclude from the use of the term "a firewall" that there is a lack of redundancy in implementation. It looks more like a monitoring failure to me. Always sad when technical failures have real consequences.

    2. DS999 Silver badge
      Facepalm

      You're conflating

      "A firewall" as in meaning a singular device and "a firewall" as in meaning the network definition of a firewall.

      They may very well (and I would bet money that they do) have redundant hardware (whether it is active/active with multiple paths or active/passive) and updated first one, then the other as part of this update. You really trust a press release to give you detailed enough information to deduce topology information about their internal network?

      We also don't know what the procedure was that wasn't followed. Did they bypass some change control procedures they normally have in place? Did they not do it during a defined maintenance window? Did they not follow some procedure about notification and testing? Were they supposed to upgrade only part of an active/active environment, so that if they identified an "intermittent" problem they would know that their update had some unintended effect?

      1. MiguelC Silver badge

        Re: You're conflating

        Whatever procedure was not followed, I can only hope they've added "verify 000 is working" to the book

        1. JoeCool Silver badge

          Re: You're conflating

          The other missing procedure is "Assume reports of 000 failures are correct until conclusively disproved"

  2. TrevorH

    What possible relationship is there between a firewall and calls failing to one specific number when all other calls connect correctly? A firewall might block ALL calls, but calls to one specific destination? No, I don't think so.

    1. Flat Phillip

      firewalls and voice are evil

      One possibility is that 000 calls look a little different and are handled differently to normal calls. For example, there could be location information which isn't normally there. The way the calls are routed (e.g. speak to THAT endpoint for those calls) is also slightly different.

      Firewalls can be finicky, but firewalls with voice traffic that is not your bog-standard arrangement (and inside a carrier classifies as not standard) are diabolical.

      Could a firewall block certain calls and not others? You bet it could. They fail in new and exciting and definitely non predictable ways.

      That being said, it could also have just been a typo'ed config line.

    2. DS999 Silver badge

      You might as well be asking "how could connections only to a single network port be blocked, but connections to other ports got through just fine". Yes I know phone numbers are not network ports, but unless you've worked in a telco networking environment you don't know how calls are routed and what sort of information is available for blocking or routing.

      I would guess that they use information about what number is being called to route calls differently, so a regular call goes one place, perhaps a toll free call goes another, "911" goes to another - because clearly it is handled very differently than typical calls. If the 911 calls got routed to the wrong place or had the wrong flags set or whatever, the router they were sent to might have dropped them on the floor as invalid.

      1. MonkeyJuice Silver badge

        Precisely- they will be popping things onto subnets behind the scenes.

        It does beg the question: if they knew the protocol and were able to follow it, why did they not have a better CI smoke test on the live 000 lines?

        Bet they add one pretty soon.

        1. collinsl Silver badge

          Perhaps that's the procedure which wasn't followed, i.e. a full test of all types/codes of dialling (regular home, toll free, mobile/cell, emergency, premium rate etc).
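
The full dialling test suggested in this sub-thread could be sketched roughly as below. This is only an illustration, not anything Optus is known to run: the category names, the placeholder destinations in `TEST_DESTINATIONS`, and the `place_test_call` callable are all hypothetical, and a real operator would use designated test numbers and an agreed procedure for checking emergency lines.

```python
from typing import Callable, Dict, List

# Hypothetical call categories mapped to placeholder test destinations.
TEST_DESTINATIONS: Dict[str, str] = {
    "regular": "TEST-REGULAR",
    "toll_free": "TEST-TOLL-FREE",
    "mobile": "TEST-MOBILE",
    "premium": "TEST-PREMIUM",
    "emergency": "TEST-EMERGENCY",
}


def run_smoke_test(place_test_call: Callable[[str], bool]) -> List[str]:
    """Place one test call per category and return the categories that failed.

    `place_test_call` is an assumed hook that returns True when the call
    to the given destination connected.
    """
    failures = []
    for category, destination in TEST_DESTINATIONS.items():
        if not place_test_call(destination):
            failures.append(category)
    return failures


def change_may_proceed(failures: List[str]) -> bool:
    """Any failed category blocks the change, emergency included."""
    return not failures
```

Running every category after the change, rather than only a sample of ordinary calls, is what would catch a fault that affects emergency calls alone.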

    3. Flying-Fossil

      Sorry I can't answer authoritatively. But there is a special process for 000 calls. For example, telcos have to forward them, even if your SIM has no coverage!! For example I used to live in a remote part of Western Australia where my Vodafone coverage was patchy and seemed to depend on a base station that was installed for a construction project and never decommissioned (Vodafone denied that they provided service in my area, despite me having 3 bars in my backyard). Although I had a Vodafone SIM, if I was out of town where there was no Vodafone service, a 000 call would be picked up by Telstra and connected. Long-winded way of saying there must be a special and separate protocol for 000 calls.

    4. Graham Cobb

      I have no idea what happened here, but it is easy to imagine scenarios. Modern phone networks use protocols where the signalling tells the network what sort of call it is - not the actual number. Signalling is provided by end user equipment (dumb phones, mobile phones, VoIP, etc) in many different forms and using different protocols, and is validated and heavily firewalled on entry to the telco network (for example, so that people can't pretend to be another operator delivering a call to avoid being charged or traced). Pretending your call is an emergency call might be used by hackers to avoid charges or to cause a denial of service attack, for example. So equipment and firewalls apply all sorts of validations.

      I could easily imagine that if a link had been incorrectly marked as "emergency calls are never carried on this link" a firewall might reject the call. Or if a software upgrade to a firewall broke the configuration somehow I could easily imagine this failure.

      Of course, with hindsight, there should have been (i) proper testing, (ii) high priority alarms generated when rejecting validation for calls claiming to be emergency calls, (iii) proper capture and very rapid escalation of the call centre reports of emergency calls failing.
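
Point (ii) above, raising a high-priority alarm whenever validation rejects a call that claims to be an emergency call, might look something like the following sketch. The `ValidationEvent` record, its field names, and the severity labels are assumptions made for illustration; real signalling firewalls expose their own event formats.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ValidationEvent:
    link_id: str            # hypothetical identifier of the ingress link
    claims_emergency: bool  # signalling marks the call as an emergency call
    accepted: bool          # whether validation let the call through


def emergency_rejections(events: Iterable[ValidationEvent]) -> List[ValidationEvent]:
    """Return every event where a call claiming emergency status was rejected."""
    return [e for e in events if e.claims_emergency and not e.accepted]


def alarm_severity(events: Iterable[ValidationEvent]) -> str:
    """Escalate as soon as a single emergency-marked call is rejected."""
    return "CRITICAL" if emergency_rejections(events) else "OK"
```

The threshold is deliberately one: a rejected ordinary call is routine, while a rejected emergency call is either an attack or an outage, and both need a human to look at them.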

  3. Anonymous Coward

    Optus

    A company with an ongoing bad reputation.

    This is their second time at this malarkey.

    People died.

    Time for another CEO to spend more time with the family?

    1. An_Old_Dog Silver badge

      Optus Out, Please

      At least the Optus CEO can spend time with their family!

    2. Anonymous Coward

      Re: Optus

      Second time? There have been numerous high-impact issues and outages within Optus over recent years.

      IIRC they were all due to fat fingers or insufficient testing.

      I wonder if this was an Optus or Nokia person on the hook for this one.

      From speaking to people in vendor land, apparently neither Optus nor Nokia would stump up the cash to automate this stuff properly.

      Optus view was Nokia should pay - they are paid for an outcome.

      Nokia view was there wasn't enough fat in the contract to cover it.

      Rinse and repeat a few times where the impact was inconvenient or commercially costly.

      Now people are dying due to culturally ingrained incompetence.

    3. Benegesserict Cumbersomberbatch Silver badge

      Re: Optus

      Recent reports indicate the death toll is at least six. They haven't completed their welfare checks on all of the missed calls yet, so that number might continue to rise.

  4. Yes Me
    Coat

    Is it too soon...

    ...to point out that this was a tragic 000PS ?

    1. UCAP Silver badge

      Re: Is it too soon...

      This "humour" is in very bad taste given that people probably died as a direct consequence of this incident.

  5. EricM Silver badge
    Thumb Down

    Optus obviously directly decided to shift blame to employees

    > In a Sunday statement Rue said the company is speaking to staff who performed the upgrade to understand why they did not follow procedures.

    Reads like: "Oh sure, dear public, OF COURSE we have safe procedures that do not allow things like this to happen, IF ONLY these pesky engineers would do what we told them to do. It was them, not us"

    A statement like this, accusing employees of being personally at fault for causing death or injury (and at least hinting at the corporation bearing no responsibility), should be issued at the earliest after a thorough investigation establishing that fact.

    Issuing this statement a few hours after an incident on a Sunday is nothing more than a despicable form of preemptive corporate damage control: a corporation throwing its employees under the bus ahead of any real investigation in order to deflect responsibility.

    Pathetic.

    1. Headley_Grange Silver badge

      Re: Optus obviously directly decided to shift blame to employees

      Came here to say the same. You can't start a proper corrective action and improvement review based on the assumption that your procedures are correct and therefore the staff must be at fault for not following them. If they are going into the review with prejudices then "arrogant management" might also be a better start than "staff didn't follow procedures."

      1. Anonymous Coward

        Re: Optus obviously directly decided to shift blame to employees

        A Coronial Inquest into the deaths might shine a very public light on just how broken Optus is...

    2. Doctor Syntax Silver badge

      Re: Optus obviously directly decided to shift blame to employees

      Irrespective of whether it was a case of staff not following procedures, a company like this, with a critical public service function, needs to have a public service ethos, for want of a better term, that ensures everyone is aware of the consequences of actions. Such an ethos starts at the top. If a thoughtless action at any level was the cause, that is ultimately due to the attitude of the board and senior management.

    3. Sudosu Silver badge

      Re: Optus obviously directly decided to shift blame to employees

      The truth may come out in the lawsuits.

    4. Concerned but optimistic

      Re: Optus obviously directly decided to shift blame to employees

      They are likely operating under a quality regime (ISO27nnn etc) that mandates audit of compliance with procedures, so they can't just deflect blame to employees, if they never set up a process to audit compliance. Company and management are still to blame.

  6. Anonymous Coward

    The actions of the corporation led to the deaths of three people. Is that corporate manslaughter? Is AU sensible enough to hold the management responsible, or is it like the US and UK, where they get paid the megabucks but have no responsibility?

    1. Like a badger Silver badge

      Give them credit, they've "expressed remorse".

      Presumably in the form of a final bill to those three customers, with "Sorry you're dead" printed at the bottom.

      1. webstaff

        Admitting liability.

        Hahahahahaha

        Don't be so deluded.

        The lawyers would be all over that one.

        They would go so far as to wait outside the person's door to intercept the bill and cut out any indications of sorrow.

        Best outcome here is health and safety equivalent gets them to court.

  7. vogon00

    Commitment required.

    Back in the day when I was playing with the UK PSTN, emergency calls were of paramount importance and were treated as such by any entity with a stake in the 'emergency' process. Things got tested to death during our development / test, and doubly so pre- and post- each upgrade. We all took this seriously - my lot as exchange / system manufacturers, our customers as exchange/network operators and the industry/country as a whole.

    In those days, we had very detailed call records with a Call Termination Reason field (Records were generally used for billing, but they were also very useful telemetry for us!). Standard procedure during our automated testing was to 'grep' call records for emergency calls and examine these CTRs in detail and investigate if anything other than a 'normal' call termination occurred. Us testing types were allowed to be far more creative and to take longer with the '999' tests than with any other feature.
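
The 'grep the call records' check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the CSV layout, the column names `dialled` and `termination_reason`, and the reason label `NORMAL_CLEARING` are all invented here, since the actual record format is not given.

```python
import csv
import io
from typing import Dict, List

# Assumed values; a real network would configure its own emergency codes
# and its own set of termination reasons that count as 'normal'.
EMERGENCY_NUMBERS = {"000", "112", "999"}
NORMAL_REASONS = {"NORMAL_CLEARING"}


def abnormal_emergency_calls(records_csv: str) -> List[Dict[str, str]]:
    """Return emergency call records whose termination reason is not normal.

    Expects CSV text with at least the (hypothetical) columns
    'dialled' and 'termination_reason'.
    """
    reader = csv.DictReader(io.StringIO(records_csv))
    return [
        row
        for row in reader
        if row["dialled"] in EMERGENCY_NUMBERS
        and row["termination_reason"] not in NORMAL_REASONS
    ]
```

Every record this returns would be investigated individually, which matches the approach of examining each abnormal emergency call rather than relying on aggregate success rates.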

    Admittedly, it was easier to do then, as things were simpler, moved slower, and people generally gave a shit, instead of treating things as an exercise without any real-world consequences for other people.

    Standard procedure both during and after any real-world upgrade included a period of monitoring (Statistically and from human feedback) to catch any outlying 'edge' cases and - here's the rub - explicitly check the operation of the '999' system. In fact, there always was a 'background task' running to monitor the performance of emergency calls. There probably still is :-)

    Reading the article shows someone in Optus-land got it wrong. They don't have the monopoly on cockups in this area, see this. The scale of these two seems similar to the Optus hiccup, and the causes (Human error, procedural error) look similar also.

    Why did I choose "Commitment required" as the title? Because you MUST be committed to avoiding stuff like this. IMO, there are way too many technological fuckups - ones with either actual or potential real-world serious consequences for others, that is - that can be avoided if people gave more of a shit and accepted that this giving more of a shit in the important areas wasn't a financial liability. There are times where it's more important to be altruistic than make a monetary profit for yourself or others. Allowing people to reliably call for help is one of them.

    Here endeth this too-verbose rant.

    1. Anonymous Coward

      Re: Commitment required.

      You're absolutely correct, and in general BT in those days did do proper testing.

      Even so, glitches occasionally happened, especially for non-essential services. One that comes to mind was the Directory Enquiries (DQ) service. One area was getting complaints that some people calling DQ were left hanging on a ringing phone for very long periods (no fancy 'you are position X in the queue' then) without having any idea how long they would have to wait. A rule was made that no-one should wait more than 5 minutes, and this was implemented by monitoring the queues and if they got close to 5 minutes new callers were given a busy tone, rather than being left in the queue. This was assumed to affect only a few callers, and those who got ring tone would know that they had no more than 5 minutes wait ahead. Others could try again later, without an indefinite wait on hold.

      Needless to say this all looked great, metrics showed that no-one queued for more than 5 minutes (obviously!), pats on the back all round. Until some months later someone looked at the call records, to discover that the load had been waayyy underestimated, and that 90% of calls to DQ in that area were being given busy tone. Ooops.
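
The trap in that story is that the wait-time metric only measured callers who were admitted to the queue. A small worked example makes the gap visible; the numbers are invented to match the 90% figure in the anecdote, and the `(admitted, wait_minutes)` record shape is an assumption.

```python
from typing import List, Tuple


def queue_metrics(calls: List[Tuple[bool, float]]) -> Tuple[float, float]:
    """Return (longest wait among admitted callers, fraction given busy tone).

    Each call is (admitted, wait_minutes); callers given busy tone are
    recorded as not admitted and contribute nothing to the wait figure.
    """
    admitted_waits = [wait for admitted, wait in calls if admitted]
    max_wait = max(admitted_waits, default=0.0)
    busy = sum(1 for admitted, _ in calls if not admitted)
    busy_fraction = busy / len(calls) if calls else 0.0
    return max_wait, busy_fraction


# Illustrative data: 10 callers queue for 4.5 minutes, 90 get busy tone.
calls = [(True, 4.5)] * 10 + [(False, 0.0)] * 90
print(queue_metrics(calls))  # (4.5, 0.9)
```

Reported on its own, the first number says the five-minute rule is being met; only the second shows that nine in ten callers never reached the queue at all.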

    2. Doctor Syntax Silver badge

      Re: Commitment required.

      "Here endeth this too-verbose rant."

      Not at all too verbose. It's all about what I termed in another comment "a public service ethos".

      One thing stands out in the example you linked - difficulties switching to a DR system. The things that people tend to do badly are those they do seldom simply because of lack of practice. DR rehearsals are important, initially to introduce plan to reality so that the plans, rather than reality, can be changed, secondly so that documentation can be filled out by recording what's actually done and finally to give practice to those who will have to carry them out.

    3. Throatwarbler Mangrove Silver badge
      Coat

      Re: Commitment required.

      In fairness, this firewall upgrade was also "tested to death." Unfortunately, the testing occurred in Production.

  8. Flying-Fossil

    Optus have blocked 000 before; there was a major failure to abide by the telco law in 2023. They clearly have their emergency spinners out and working hard: it was reported consistently on ABC as being a 'technical fault', as if checking stuff, having procedures and following them, etc., were not management responsibilities. Clearly a proactive attempt to shovel the blame onto some poor techie so the management can go away scot free, once again, with bonuses intact. By the way, several of the people who could not get to 000 rang Optus's call centre. However, being an outsourced overseas operation, they thought nothing of it and just logged the calls, leaving Optus none the wiser. As you might be able to tell from the tone of my comment, I am annoyed.

  9. DarkwavePunk Silver badge

    Optus

    I worked in the Australian telecoms industry back in the mid to late nineties. Optus were a bunch of cunts to deal with then. Sadly some things never change. Telstra aren't exactly saints either. Horrible industry as a whole.

    1. Anonymous Coward

      Re: Optus

      I worked in the Australian telecoms industry back in the mid to late nineties. Optus were a bunch of cunts to deal with then. Sadly some things never change. Telstra aren't exactly saints either. Horrible industry as a whole.

      Back then Optus was Cable&Wireless and full of the very worst US corporate crap which also seemed to have infected Telstra (remember Trujillo and his three amigos?)

      Unfortunately nothing in my experience has changed except for the worse since Singtel long ago acquired Optus but Telstra does seem to have regained a little customer focus.

      A lot of the nonsense around unsupported 4G phones when the 3G network was switched off was the mandatory VoLTE support on which 000 calls depended.

      Previously 112 worked fine even without a SIM in the phone but I am not sure whether that's still the case as I suspect it was a GSM/3G feature.

      1. Tom66

        Re: Optus

        112 hasn't worked without a SIM for many many years as it was vulnerable to crank callers. However, you don't need credit or an active 'account', and I believe it works even if the IMEI is blacklisted. Presumably the emergency call operator can blacklist "frequent flyers" of the service (or potentially schedule a visit by two nice officers who can relieve the individual of their freedom).

  10. Valeyard

    He also vowed to implement an escalation process for any reports of problems with calls to Triple Zero.

    surely this is top of the checklist stuff, even if the rest of the network shits the bed the emergency calls should still be working.

    This statement just shows it for the tacked-on afterthought it is with Optus

  11. Decay

    This is where an independent 3rd party review and audit of the process would be useful. Mistakes get made, it's terrible in this instance lives were lost, you figure out where the mistakes were made, update the process and at least ensure the same mistake doesn't get made again. If it turns out to be an honest mistake, let's say it was a unique edge case no-one ever even thought of, the testing didn't trap that scenario and for the sake of argument, it was humanly unforeseeable, then like most rules written in blood, you update the process, add it to the "thou shall not" list and ensure it is followed.

    However if the mistake occurred because testing was limited due to time/budget constraints or someone took a gamble that it will be fine, or some other foreseeable scenario, then severe punishments should follow. Not to punish the organization that caused this particularly, but to raise it up the risk threshold for other companies. If you want to stop these types of issues reoccurring, the risk of sufficient pain, be it monetary or reputational or even being put out of business, concentrates minds at the right level to ensure sufficient attention is paid to the problem. Pointing the finger at some lowly tech or even middle management person has about zero impact to the company from a commercial perspective.

  12. Grunchy Silver badge

    The phone number for 911 is secret

    I called up the cop shoppe on the non-emergency number this one time, to report some asswipes driving golf balls off the ridge. (They thought they could hit across the river and land on the driving range over there, which charges $7.50 per bucket of balls, and gives you the balls, but these jokers had some stolen ones or whatever and were trying to return them, but were so incompetent they could not even reach the river! And were menacing the hikers walking beneath the ridge!)

    So while explaining all this, I casually ask “oh and by the way, what’s the phone number for 911” to which they snorted, and said (as expected) well yeah that would be 911. So I explained to them, well no, I don’t use the phone anymore, I have the freephoneline.ca which is freeware voip, plus I have the “tablet plan” on the cellular, which has no sms or voice service, and all I got is free voip and they don’t have the 911, so can you tell me the phone number for 911? And the cop says, well there is no phone number only 911. So I said, suppose you phone “out” using the 911 phone, and it registers on someone’s call display, what number does that say? And he says he’s not allowed to say, and I say ok what does that mean, is it a secret? And he says no no, there’s no secret, but nobody is allowed to know, and I say well why is that, and he says they don’t want anybody calling that number, and I say what’s the difference if the phone rings either way, and he says they are not allowed to disclose the 911 phone number, and I say what if I come and do industrial espionage and steal the 911 phone number, and he said good thing I’m calling the non-emergency phone line because stealing the 911 phone number is probably a crime, and I’m like ???

    Meanwhile the golfers walked away, so I’m like, ah Forget The Whole Thing This Conversation Never Happened and he’s like, and what’s your number?

    But I don’t have phone service anymore!

  13. JoeCool Silver badge

    Security by obscurity.

    Don't forget, AT&T was hacked by a cereal box whistle toy.

  14. Soruk

    The son of a good friend of mine was involved in an incident that Thursday morning. While investigations are ongoing, Optus was his service provider.

    Suffice to say, he didn't make it.

    I believe as reports emerge that the death toll due to Optus's catastrophic failure will only increase.

    1. webstaff

      That's so sad to hear, my condolences to his family.

      I hope they can get justice for all those who have died.
