Level3 switch config blunder blamed for US-wide VoIP blackout

Backbone provider Level3 says an outage that knocked out VoIP service for much of the US Tuesday morning was the result of improperly configured equipment. It seems the outage, which smashed call services offline for much of the country, was not the result of any fiber cuts or facility damage, but rather some classic bad …

  1. xeroks

    impressive demonstration

    It used to be difficult to take out a whole country's telephone network. Now you seem to be able to do it by accident.

  2. Anonymous Coward

    Update RFO - TL;DR:config change error

    Reason for Outage (RFO) Summary: On October 4, 2016 at 14:06 GMT, calls were not completing throughout multiple markets in the United States. Level 3 Communications' call center phone number, 1-877-4LEVEL3, was also impacted during this timeframe, preventing customers from contacting the Technical Service Center via that phone number. The issue was reported to the Voice Network Operations Center (NOC) for investigation. Tier III Support was engaged for assistance isolating the root cause. It was determined that calls were not completing due to a configuration limiting call flows across multiple Level 3 voice switches. At 15:31 GMT, a configuration adjustment was made to correct the issue, and Inbound and outbound call flows immediately restored for all customers. Investigations revealed that an improper entry was made to a call routing table during provisioning work being performed on the Level 3 network. This was the configuration change that led to the outage. The entry did not specify a telephone number to limit the configuration change to, resulting in non-subscriber country code "1" calls to be released while the entry remained present. The configuration adjustments deleted this entry to resolve the outage.
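The failure mode the RFO describes, an entry with no telephone number to scope it, can be sketched roughly as follows. This is only an illustration under assumptions: `ReleaseRule`, its fields, and the sample numbers are hypothetical and do not reflect Level 3's actual switch configuration, and the "non-subscriber" distinction is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReleaseRule:
    """Hypothetical routing-table entry: calls that match are released (dropped)."""
    country_code: str
    number: Optional[str] = None  # None means the entry has no number filter

    def matches(self, dialed: str) -> bool:
        # A call outside the country code never matches.
        if not dialed.startswith(self.country_code):
            return False
        # With no number given, every call in the country code matches.
        if self.number is None:
            return True
        # With a number given, only that one destination matches.
        return dialed == self.country_code + self.number


# Intended entry: scoped to a single (made-up) number.
intended = ReleaseRule(country_code="1", number="2125550100")
# Entry as described in the RFO: the number was left out.
unscoped = ReleaseRule(country_code="1")

for dialed in ("12125550100", "13035550199", "442079460000"):
    print(dialed, intended.matches(dialed), unscoped.matches(dialed))
```

In this sketch the unscoped entry matches every number beginning with "1", which is the kind of blast radius the RFO reports. A provisioning tool could refuse any entry whose number field is empty, though whether that fits a real switch's rule syntax is an assumption.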

    1. Lee D Silver badge

      Re: Update RFO - TL;DR:config change error

      Some lower being in their IT rolled out a duff config to mission-critical routers affecting some - what, millions? - of customers, because they didn't bother to pre-test, check, verify or anything else on their config change, and managed to take down - what? a million? - phone lines.

      Of course, none of this was caught by testing or configuration or change management, and it was only when it got to the top bod who actually knew what he was doing, who started shouting, that someone owned up to putting a stupid config on their main devices without testing.

      This obviously all took hours to happen and fix rather than someone pushing a change to a set of switches they manage, testing them immediately afterwards, and then immediately rolling back when they realised they weren't working as before. Because, nah, forget all that, our customers will tell us if something doesn't work.
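The push, test, roll back sequence described above might look something like this in outline. It is a minimal sketch under stated assumptions: the config is modelled as a plain dictionary, and `apply_with_rollback`, the `healthy` check, and the key names are all hypothetical rather than any vendor's tooling.

```python
import copy
from typing import Callable, Dict

Config = Dict[str, str]


def apply_with_rollback(live: Config, change: Config,
                        healthy: Callable[[Config], bool]) -> bool:
    """Apply `change` to `live`; restore the previous config if the check fails."""
    snapshot = copy.deepcopy(live)   # keep the known-good state
    live.update(change)              # push the change
    if healthy(live):                # test immediately afterwards
        return True
    live.clear()                     # roll back to the snapshot
    live.update(snapshot)
    return False


# Hypothetical switch state and health check: +1 traffic must stay allowed.
switch = {"route_plus1": "allow"}
check = lambda cfg: cfg.get("route_plus1") == "allow"

good = apply_with_rollback(switch, {"route_18005550100": "limit"}, check)
bad = apply_with_rollback(switch, {"route_plus1": "release"}, check)
print(good, bad, switch)
```

The hard part in practice is the health check itself, which here is a one-line stand-in for whatever test calls or traffic counters an operator would really use.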

      It doesn't matter WHAT scale of business, the same stupid junk happens all over.

      1. BlackKnight(markb)

        Re: Update RFO - TL;DR:config change error

        I'm yet to work in an environment that actually has a test environment for the comms guys.

        App developers, servers, sure. But all comms equipment I've ever touched has been "live".

        Makes "testing" difficult. And of course with a config change like that you need realistic load to test with.

        This also sounds like one of those standard changes that gets done multiple times a day and more or less waved through change management. And the "lower being" probably verified by making sure the line looks like his doco then went to the next job, and would have been the last to find out it had gone TITSUP.

  3. Anonymous Coward

    You can't expect the NSA to splice in the monitoring of voice calls....

    ... and not have some disruption to service!

    1. Anonymous Coward

      Re: You can't expect the NSA to splice in the monitoring of voice calls....

      They routed all the calls through to the NSA but forgot to route them back out again!

  4. Sloppy Crapmonster

    I have a couple of SIP trunks through Twilio and I could place calls to AT&T cellular (the only service I tried) throughout the outage but not receive them.

    1. Anonymous Coward

      The problem was with the 911 calls, which had to be routed to ECRCs over a tertiary route. Some of those sent over the PSTN received a fast busy as the lines became saturated. Everything was FCC reportable, so we shall receive an FCC notification soon.

  5. Version 1.0 Silver badge
    Coat

    Good excuse

    But the truth is the BOFH got hungry and went out to the pub with every intention of finishing the job afterwards ... but you know, that's a damn fine beer - I think I'll have another one - I've got the company credit card in my coat pocket - the next round's on me.

  6. EveryTime

    It sounds like a simple configuration change that was supposed to restrict the call volume to a single phone number (e.g. someone published the wrong phone number for a business, or there was a contest call-in that went bad). Instead they entered it so that it limited the call volume of everything in +1.

    To put it in Unix admin terms, they did

    FILE=tmp/foo

    rm -rf / ${FILE}
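Why the stray space matters can be seen by splitting the command into arguments the way a POSIX shell roughly would. The sketch below uses Python's standard `shlex.split` only to tokenize the strings; nothing is executed, and the variable names are just for illustration.

```python
import shlex

FILE = "tmp/foo"

# What was presumably meant: a single path argument, /tmp/foo.
intended = shlex.split(f"rm -rf /{FILE}")

# What was typed: the space makes "/" an argument in its own right.
typo = shlex.split(f"rm -rf / {FILE}")

print(intended)
print(typo)
```

With the space, `rm` is handed two targets, the root directory and a relative `tmp/foo`, so the intended scope is lost in much the same way as a routing entry with no number attached.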

    1. CrazyOldCatMan Silver badge

      > FILE=tmp/foo

      > rm -rf / ${FILE}

      Been there, done that. Mind you, this was in my very early days of Linux (Slackware pre-v1), at 3am, on my own PC.

      Fortunately, Ctrl-C worked and I only lost some binaries - the /etc directory hadn't been touched.

  7. Matt Bryant Silver badge
    Facepalm

    When your competition is your friend!

    ".....Level 3 Communications' call center phone number, 1-877-4LEVEL3, was also impacted during this timeframe....." Many years ago, I knew the managers for the NOC at a UK telecom, and they admitted to me that they had half their on-call phones on a competitor's network just in case theirs went down, and their competitor vice versa. I assume no-one at Level3 thought about that option.

  8. phuzz Silver badge

    It's hard to tell from their statements if they realise how important these services are to their customers, I mean, they barely mentioned it.
