Comms watchdog to probe errors that left Brits unable to make emergency calls

Britain's communications watchdog is investigating former state telco BT over a "UK-wide disruption" that prevented some calls connecting with emergency services on 25 June. The technical glitches at the telecoms operator first showed up on Sunday morning at 0830 local time, and the dominant telco was forced to move to a …

  1. Headley_Grange Silver badge

    Dry Run

    It's just a dry run for what's going to happen when BT turns off the copper phone network and everything goes VoIP.

    1. Claverhouse

      Re: Dry Run

      I just renewed, for two years I think, my fibre to the cabinet with Vodafone [BT having recently confirmed that any better fibre etc. is not coming near here, despite a flood of offers from various firms over the last two years for fibre to the house as soon as available (please register your interest and we will call back asap)], which has been OK, just about... at a rather expensive £24 a month. Possibly could have got it a pound or two cheaper from another ISP, but it wasn't worth the hassle since nothing would change.

      No new router, but they posted me a gizmo to plug in the back of the old instead of plugging the landline into the wall, thus enabling VoIP instead of copper.

      I don't think it actually works, but didn't care as I have no interest in landlines.

      1. Richard 12 Silver badge

        Re: Dry Run

        Openretch told me there were no plans in the next three years, then ran fibre to the pole outside less than three months later.

        BT couldn't run a bath.

        1. Jim Whitaker
          Angel

          Re: Dry Run

          Rather similar to what happened round here. The Parish had been told to collect pledges of funding (~£1k for each household) to co-fund fibre rollout. Then one day I saw a guy climbing the poles at the foot of the garden and was told that we would all be FTTP in a few weeks. No explanation, no further information, just BT planning at its best. Still it works just fine so I'm grateful for that.

      2. David Hicklin Silver badge

        Re: Dry Run

        My Sky went "out of contract" so went up in price as the "in contract" discounts stopped.

        Their web renewal was broken so I spoke to a very helpful (!!!**) person on the phone who not only got me everything back for less than before, but also moved me to FTTP; Openreach were here yesterday fitting the external box.

        ** and yes she really was very helpful....still in a state of shock!

        1. Doctor Tarr

          Re: Dry Run

          @David H. Exactly the same for me. I'd recently received a letter (from Openreach, IIRC) saying that there were issues at my end of the road so no fibre for me.

          I called Sky as the landline had died and then had the same experience as you.

          I went from a patchy 12mb to a solid 350mb+

      3. Martin-73 Silver badge

        Re: Dry Run

        24 pound a month, expensive? That's half what I pay

  2. Anonymous Coward
    WTF?

    Ensure uninterrupted access?

    "Our rules require BT and other providers to take all the necessary measures to ensure uninterrupted access to emergency organizations as part of any call services offered. They also require providers to take all necessary measures to ensure the fullest possible availability of calls and internet in the event of catastrophic network breakdown or in cases of force majeure."

    This is the typical kind of bullshit spouted by people who have zero understanding of how anything works on a technical level. People who think because something is "regulated" that means everything will be ok.

    There seems to come a point very quickly where systems can (will, and do) fall over. Whether their application is something serious like emergency services or banking doesn't make them exempt from this premise.

    Whether it comes down to incompetence, mistakes or mis-management - and let's be honest if it involves BT that's entirely possible - is a separate matter. But I cannot stand this naivety that critical infrastructure is somehow 100% fault tolerant. It isn't, and no amount of policies, regulations or highly paid consultants are going to change that. Ever.

    It's quite scary how much masking tape and spreadsheets keep infrastructure running in the first place. Add in some human error and nothing is guaranteed.

    1. cantankerous swineherd

      Re: Ensure uninterrupted access?

      I can't remember 999 calls being unavailable any time over the last 40 years fwiw. BT have been cutting corners is my guess.

      1. blackcat Silver badge

        Re: Ensure uninterrupted access?

        There was a hiccup back in the early 2000s as I was working for the company that provided the kit for the emergency network at the time. It turned out to be a config change by BT that ballsed it up.

      2. Ferry Michael

        Re: Ensure uninterrupted access?

        There were a handful of major outages in the 18 months I worked on System X in the 1980s that resulted in loss of 999 calls. Some were hardware failures.

        One was a configuration error related to overload management, where 999 calls were impacted by a general overload. A bug report was raised related to the "New Faces" television programme. The overload caused by telephone voting for the programme knocked out 999 calls.

    2. Claptrap314 Silver badge

      Re: Ensure uninterrupted access?

      Certainly, "forcing regulation" is a bunch of political BS. But this isn't Microsoft, so serious reliability engineering can, assuming this was a software problem, keep the chance of something like this vanishingly low. It is NOT easy, however, and without a serious look under the hood, I could not venture to guess as to what exactly they did wrong.

      We had a global distributed key store at Google when I was there (2015-6) that only went down when the SREs took it down--one minute a quarter, so that people would not build apps that depended on it always being up.

      I consider six 9's to be theoretical as an SLA unless you are in something like a single-site manufacturing facility, but if you engineer for it, five 9's is quite doable by a competent team.

      1. Anonymous Coward

        five 9's is quite doable by a competent team

        You seem to have forgotten this incident involved BT. Which means no competent team could have possibly been involved.

        And we're only talking about three 9s anyway.

      2. Anonymous Coward

        Re: Ensure uninterrupted access?

        What BT mostly did wrong was not tell anybody. The NPCC was furious because they found out via Sky News, as did Whitehall. The backup system is not good at identifying the location of mobile phones, so calls take longer to assign to the correct force.

        1. 42656e4d203239 Silver badge

          Re: Ensure uninterrupted access?

          >>The backup system is not good at identifying the location of mobile phones

          Then it is not a backup, is it? It is a workaround at best or, more likely, given the fact BT are involved, it's a feat of arse...

    3. Stuart Castle Silver badge

      Re: Ensure uninterrupted access?

      Re: "This is the typical kind of bullshit spouted by people who have zero understanding of how anything works on a technical level. People who think because something is "regulated" that means everything will be ok."

      Regulation enables them to blame someone. All that will happen is that at the end of the inevitable inquiry, they'll announce "Lessons must be learned" and do nothing else.

      If it turns out someone has died, or been seriously injured as a result of this, the same will happen.

    4. John Robson Silver badge

      Re: Ensure uninterrupted access?

      There is 100%...

      and there is taking 90 minutes to switch to the backup system... that's not even 4 nines.
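      For anyone who wants to check the arithmetic behind the "nines" in this thread, here is a rough sketch. It assumes availability is measured over a single 365-day year and that the 90-minute switchover was the only downtime in that year; both are assumptions for illustration, not anything BT or Ofcom has stated, and the function names are made up.

```python
# Rough availability arithmetic, assuming a 365-day measurement window.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(nines: int) -> float:
    """Minutes of downtime per year allowed by an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


def availability(outage_minutes: float) -> float:
    """Fraction of the year the service was up, given total outage minutes."""
    return 1 - outage_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 7):
        print(f"{n} nines allows {downtime_budget_minutes(n):.2f} min/year")
    # Hypothetical: the 90-minute switchover is the only outage all year.
    print(f"90-minute outage: {availability(90) * 100:.4f}% available")
```

      On those assumptions, four nines allows roughly 52.6 minutes a year, so a single 90-minute outage lands between three and four nines.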

    5. Martin-73 Silver badge

      Re: Ensure uninterrupted access?

      When it goes fully VoIP there will be NO backup.... 999 calls won't work if your power is off. (unless you provide your own backup supply)

      1. Ferry Michael

        Re: Ensure uninterrupted access?

        People make most of their calls on mobiles.

        I can't imagine many 999 calls from VoIP at home. The use of AML in mobiles to provide location information in 999 calls will probably be a better option when landline numbers are not tied to a physical location.

        And from my experience of switching to FTTP on JohnLewis/Plusnet it is very difficult to keep a landline number and transition to VoIP. Their support staff did not even know how it could be done.

        1. stewwy

          Re: Ensure uninterrupted access?

          On Zen everything went incredibly smoothly, email a couple of weeks before to plug the phone into the router instead of the wall at a certain time, and it would work, it did.

          Keeping my number was simple, as was adding a DECT phone, which I had lying around in the box of tech crap that I might get round to using when I can be bothered to fiddle with it.

          Mind you, the modem they send is pretty decent, with mesh networking available (I rolled my own, which worked with the router, and I saved a few quid a month).

          Unfortunately, it doesn't have an OpenWRT image available for it, which is the only downside. But it's fast, gets updates frequently enough, and the usb3 port works as my home NAS. So there is that.

  3. Keith Langmead

    Government expectations

    I love the comment from the government reported by the BBC - "The government has said it took BT nearly three hours to alert ministers to the problems it was experiencing."

    which seems like a typical non-technical failure to grasp the idea of priorities. If the issue started at 08:30 on Sunday I'd take a wild guess that the only relevant BT staff working (either actively or on call) would be the engineers, and they'd be busy trying to fix the problem. They're engineers, so they're sure as hell not gonna be calling Whitehall who likely can't contribute anything practical to fixing the issue, they'll just be updating their immediate line managers. I imagine you'll have to go up a few levels of management (none of whom would be actively working on a Sunday morning, so possibly not immediately contactable) before you reach someone with the authority to speak direct to the government on behalf of BT. They don't indicate whether any status pages or similar were updated with information, but that's presumably more likely to happen in the short term, than finding someone willing and senior enough to place the call, especially when initially they'd likely have zero information to pass on anyway.

    Standard "I can speak to you about the issue, or I can fix the issue... I can't do both".

    1. Claptrap314 Silver badge

      "I can speak to you about the issue, or I can fix the issue... I can't do both"

      That's a reasonable attitude if your tech team consists of you and your manager. But this is BT, and we are talking about EMS.

      For even slightly mature incident management, external communications is generally the first responsibility that gets spun off by the incident commander, usually to the nearest manager. It keeps them out of your hair & lets them be the reassuring face to the clients while the workers are trying to figure out exactly what phase the moon is right now.

      For something like EMS, weekend staffing also shouldn't matter much. Given the scope of the problem, the pagers should have been hitting everyone within half an hour, including the first line manager, who, as mentioned above, is going to be responsible for communication. That includes understanding what the regulatory requirements are for communication, and calling up whomever as necessary.

    2. Anonymous Coward

      Re: Government expectations

      "The government has said it took BT nearly three hours to alert ministers to the problems it was experiencing."

      I would imagine they were all a little <cough> hungover partying at Glasto, aside from Sir Moggie who would have been marshalling his own flock in preparation for the first service of the day. Seriously, I would have expected OfCom and the Cabinet Office to be informed promptly, as per their checklist, and for OfCom/CO to alert ministers.

      "the only relevant BT staff working"... would be the network operation centre(s), who would be seeing any 999 alarms at the top of the priority list. I'm guessing there were no alarms until someone/something noticed that the call traffic to the emergency services had dropped to zero (assuming that they flag abnormal levels of failed and prematurely terminated calls as an alarm).

      1. Jellied Eel Silver badge

        Re: Government expectations

        "the only relevant BT staff working"... would be the network operation centre(s), who would be seeing any 999 alarms at the top of the priority list. I'm guessing there were no alarms until someone/something noticed that the call traffic to the emergency services had dropped to zero (assuming that they flag abnormal levels of failed and prematurely terminated calls as an alarm)

        How many Ministers do we have now, and how many would really need to be notified? Does the Min of Ag & Fish, or "Levelling Up" need to be notified? Or just the Home & Cabinet Office? And also why BT given they're a service provider, and the service users, ie Police, Fire, Ambulance and Coastguard would have hopefully been aware of the outage pretty quickly. And then what Ministers could do to help. Get the MoD to issue distress flares? Or worse, seize the opportunity to 'modernise' the service.

        Having done some work on this in the past, I know BT and every decent telco takes 999 services seriously, as do the customers because it's a safety-of-life issue. But it's been complicated a few times due to politics, like the infamous FiReConTrol* plan to shrink the number of fire & rescue ECC's down to five, and outsource a lot of it. That was fun because I refused to design a cheap solution based on ADSL to the fire stations. And a memorable quote from a senior fire officer telling me that nobody wanted this other than Prescott and the PFI bidders.. And then there were the E.911 conversations where VoIP operators would chuck the problem of accurately locating callers over the fence to BT, and expected them to figure it out.

        I suspect this will be down to an IP issue, and the important issue's really why it failed, and how long it was down for, not how long it took to notify ministers.

        *I may have capitalised this incorrectly, but it was capitalised weirdly. But it was modern!

        1. Richard 12 Silver badge

          Re: Government expectations

          Failure to notify is pretty serious, because it proves there were no managers paying attention.

          It also implies that BT themselves didn't know for rather a long time.

          Disaster response for this kind of thing is supposed to be two-track - the management tell all the users and internal support staff that the system is down and they are using the backup which has known limitations x, y and z, while the technical folk get to work fixing it.

          If you don't have that first track, two really nasty things happen:

          1) The system users don't know it's failed, only "It's been pretty quiet for a while", and/or despatching a few ambulances to completely the wrong place.

          2) When the users discover or start to suspect that it's failed, they start hammering in reports and complaints, distracting the technical folk and potentially even pulling them away from fixing it.

          This is also why service status pages are necessary.

  4. Mr. V. Meldrew
    Facepalm

    Local time?....

    Please enlighten me...

    "On Sunday afternoon, local time, the Metropolitan Police...."

    Reminds me of the joke about the local police station having their toilets stolen. The Police commented later that "they had nothing to go on".

    Toodle pip.

    1. sanmigueelbeer Silver badge
      Joke

      Re: Local time?....

      (This actually happened somewhere in the UK.)

      A pensioner called the nearby police station to report a break-in. The police replied with, "We're busy at the moment. Can you please call back a few hours from now?".

      A few minutes later, he rang the police again and said, "Don't worry (about the call). I have taken care of them." and hung up the phone.

      Within minutes, with sirens blaring from several vehicles, in bullet-proof vests, full riot gear, the police came screaming around the pensioner's property and arrested the burglars.

      "You said you took care of them," spluttered the police.

      "And you said you were busy," replied the pensioner.

  5. Anonymous Coward

    Finally TheReg posts an article...

    I know you've gone all American recently, but this was a pretty big story.

    As someone in the know about these things I value the commentards' opinions very much.

    This kind of thing is going to be increasingly common. Wouldn't have happened back in the heyday of Nortel.

    Anonymous, for reasons above.

    1. Martin-73 Silver badge

      Re: Finally TheReg posts an article...

      gah bring back SxS :)

  6. spold Silver badge

    Obviously....

    Your call is really important to us [plays muzak]

    1. R Soul Silver badge

      Re: Obviously....

      You forgot to include the adverts for BT Broadband interrupting the muzak.

      1. David Hicklin Silver badge

        Re: Obviously....

        Sky offered me several choices for the type of music whilst on hold, i.e. pop, classical, rock etc.

        1. Anonymous Coward

          Re: Obviously....

          Nice to know. Too bad they didn't offer an option to immediately put you through to someone who could fix your complaint. However that would mean the designers of these call centres got off their arse and did some proper work instead of finding new ways to annoy and frustrate their victims. Which simply cannot be allowed to happen of course.

  7. TimMaher Silver badge
    Coat

    66 years.

    That is how far back I remember our first telephone, $TownName $Number.

    I have never heard of this happening before.

    The management at BT should face very serious consequences.

    But, they won’t.

    Sigh.

    1. R Soul Silver badge

      Re: 66 years.

      I wonder how many useless twats from BT/Openretch/Ofcom/Westminster will emerge to assure us "lessons have been learned"?

    2. Headley_Grange Silver badge

      Re: 66 years.

      One failure in 66 years and you want someone sacked?

      1. Anonymous Coward

        Re: 66 years.

        FFS! Sacking whoever is responsible for this epic fuckup is reasonable. Though nobody here has actually suggested that. What sort of punishment would you consider appropriate for a massive outage of a nationwide life-or-death emergency service?

        It doesn't matter if there hadn't been a failure in 66 years either. The service did fail. Bigly. Heads must roll. But they won't - as usual.

  8. Kistelek

    999 not working? Just dial oh one one eight nine nine nine eight eight one nine nine nine one one nine seven two five three

    1. Anonymous Coward

      Then type # followed by three eight two five two seven *

      1. blackcat Silver badge

        Which country am I speaking to?

      2. Martin-73 Silver badge

        *looks at 746 on desk... what is this # and * you speak of?

  9. Anonymous Coward

    Hull?

    Was everything working fine in Hull?

    The one area that doesn't use BT I think???

    1. Anonymous Coward

      Re: Hull?

      If I recall, not being on BT is on the risk register for the local police force.

    2. Anonymous Coward

      Re: Hull?

      Kingston upon Hull is the one area in the UK that isn't BT/Openreach, yes. KCOM are the incumbent provider there; however, KCOM use BT for 999 services.

      KCOM connect to BT in Hull to enable BT to provide the 999 services for them. They got a large fine back in 2017 for a four-hour outage in Hull in 2015. It turned out that all the routes KCOM had into BT relied on a single BT exchange next to the river in York, which got flooded. KCOM rather than BT got the fine for not having the appropriate resilience in place; it will be interesting to see what Ofcom does this time, when it was a much larger outage.

      Details of KCOM fine and issue: https://www.bbc.co.uk/news/uk-england-humber-40860169

      As others have said I can't remember an outage like this before, South Yorkshire Police failing to answer the call when put through by the BT Operator yes but issues on the BT side no.

  10. Ochib

    It’s not DNS

    There’s no way it’s DNS

    It was DNS

  11. Will Godfrey Silver badge
    Unhappy

    Spooky

    LESSONS will be LEARNED

    -V-

    LIVES will be LOST

  12. Adam JC

    Obligatory IT Crowd Reference

    I wonder if the emergency services could still be reached via... "0118 999 881 999 119 725 … [long pause..] 3" :-)

    https://youtu.be/HWc3WY3fuZU - For those of you not familiar. (Shame on you!)

  13. Anonymous Coward

    Backlog of calls?

    If you need 999 and can't get through, then try again later causing a backlog hours later.....you didn't need 999
