Somebody plugged a 4 Mbit/s card into a 16 Mbit/s ring?
... turned off spanning tree protocol?
Wyoming is the latest US state to formally probe CenturyLink's network outage, which black-holed 911 calls over Christmas. America's comms watchdog the FCC, and regulators in Washington state, are also investigating the blunder – asking exactly how it happened, and why it took so long to resolve – along with Wyoming's Public …
If it was a spanning tree issue, they might not have root guard on the core. I have seen a customer's kit take the root away from an ISP / farm supply provider.
I don’t work for these guys, but I have seen them pull all-nighters just like the rest of us. I would like to see the equipment vendor in the hot seat as well. When you pay that kind of money for networking kit, “odd, we have never seen it do that before” just doesn’t cut it.
Time for the window, cattle prod and roll of carpet.
Vendor: You installed the kit, correct?
Tech: Yes.
Vendor: It was new in the package, correct?
Tech: Yes.
Vendor: So you opened the package, correct?
Tech: Yes.
Vendor: Your honor, we move for immediate dismissal of the court case. The box clearly states that by opening the package the customer agrees to the license agreement. The license agreement clearly states we are not liable for any cock-ups caused by the kit beyond the value of its original purchase cost.
Thanks to a little "mine is bigger" argument between two branches of government, some 15 agencies of the US government are in the process of closing due to no budget. The FCC is basically shutting down everything except oversight of life safety and the sale of spectrum.
Normally you'd wonder if this was a BGP route reflector doing stupid crap. These days it's just as likely to be an MPLS route reflector too. While it's been a long time since I had enable on what was then Global Crossing, they did suffer some MPLS traffic engineering issues in late 2000 caused by Cisco's inability to support more than 10,000 LSPs per router (as I remember it!). Perhaps RSVP was being signalled bad data and caused some links to overflow?
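The failure mode being speculated about above can be sketched in miniature. Everything here is illustrative (the `Router` class, the behaviour at the 10,000-LSP cap, the capacity numbers); it just shows how RSVP-TE-style admission control is supposed to refuse an LSP that would bust the per-router limit or overflow a link, rather than accept it and fall over later:

```python
# Toy model of RSVP-TE-style admission control. Names, the cap, and the
# error strings are all made up for illustration; this is not how any
# particular vendor's code behaves.

class Router:
    MAX_LSPS = 10_000          # per-router LSP cap (the limit from the anecdote)

    def __init__(self, name, link_capacity_mbps):
        self.name = name
        self.lsps = {}                       # lsp_id -> reserved bandwidth (Mbit/s)
        self.link_capacity = link_capacity_mbps

    def reserved(self):
        return sum(self.lsps.values())

    def signal_lsp(self, lsp_id, bandwidth_mbps):
        """Admit or refuse an LSP, mimicking admission control."""
        if len(self.lsps) >= self.MAX_LSPS:
            return "error: LSP limit reached"
        if bandwidth_mbps <= 0:
            # A well-behaved router rejects nonsense signalled data outright.
            return "error: bad bandwidth value"
        if self.reserved() + bandwidth_mbps > self.link_capacity:
            return "error: link would overflow"
        self.lsps[lsp_id] = bandwidth_mbps
        return "ok"

r = Router("pe1", link_capacity_mbps=10_000)
print(r.signal_lsp("lsp-1", 4_000))   # ok
print(r.signal_lsp("lsp-2", 7_000))   # refused: 4,000 + 7,000 would overflow the link
print(r.signal_lsp("lsp-3", -100))    # refused: malformed signalled bandwidth
```

If a router accepts the bad reservation instead of returning the error, you get exactly the "links overflow" scenario above.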
Q: Was the card made by Huawei?
A: No. They need their cards to work reliably ready to assist the Great Takeover(tm), when it comes.
Okay, that's a joke, but here's an anecdote: years ago, a colleague told me he did some work using a four-transputer card (that dates it) to generate Ethernet frames by driving the line voltages directly. He said it was remarkably easy to upset (i.e. completely bugger) most of the network cards available at the time simply by sending malformed Ethernet. This incident suggests that not much has changed since.
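For flavour, here is roughly what "malformed Ethernet" means at the byte level. A minimal sketch (the MAC addresses are made up) that assembles a runt frame far below the 64-byte minimum, the sort of thing a raw line driver can emit and a sloppy NIC may choke on:

```python
import struct

def build_frame(dst, src, ethertype, payload):
    """Assemble raw Ethernet II frame bytes (no FCS appended)."""
    return dst + src + struct.pack("!H", ethertype) + payload

dst = bytes.fromhex("ffffffffffff")      # broadcast destination
src = bytes.fromhex("020000000001")      # made-up, locally administered MAC
# The minimum legal Ethernet frame is 64 bytes including FCS. This one is
# a "runt" well under that: 6 + 6 + 2 + 1 = 15 bytes of header and payload.
runt = build_frame(dst, src, 0x0800, b"\x00")
print(len(runt))   # 15
```

A conformant NIC drops runts and counts them; the anecdote is about the cards that did something worse.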
Most 911 centers do not use the Internet -- that would be ridiculously foolish. The failure in this case was of the optical layer. You'd think that would be localized. But a nice leaked outage report in Telecom Digest gives some better clues. They were losing optical connections all over the place. What can do that? My suspicion is GMPLS, which applies Internet routing techniques to optics. A bad card sent out bad GMPLS packets and the other devices didn't discard them as they should have. Hilarity ensued. The vendor is not named... you might want to google around, though, to see who sells to CLQ.
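The "didn't discard them as they should have" failure is essentially a missing validation step. A toy sketch (the packet format, CRC trailer, and function name are all invented for illustration, not GMPLS) of a receiver that checks length and checksum and drops anything malformed instead of acting on it:

```python
import struct
import zlib

def parse_control_packet(raw):
    """Validate a toy control packet: 2-byte length + payload + 4-byte CRC32.
    Returns the payload, or None if the packet should be discarded."""
    if len(raw) < 6:
        return None                          # too short to carry header + CRC
    (length,) = struct.unpack("!H", raw[:2])
    if len(raw) != 2 + length + 4:
        return None                          # declared length doesn't match
    payload = raw[2:2 + length]
    (crc,) = struct.unpack("!I", raw[-4:])
    if zlib.crc32(payload) != crc:
        return None                          # corrupted in flight, or born bad
    return payload

payload = b"LINK-STATE-UPDATE"
good = struct.pack("!H", len(payload)) + payload + struct.pack("!I", zlib.crc32(payload))
bad = good[:2] + b"X" + good[3:]             # one mangled payload byte
print(parse_control_packet(good))            # b'LINK-STATE-UPDATE'
print(parse_control_packet(bad))             # None: dropped, not propagated
```

The whole point is that a `None` stops at the first hop; acting on the bad packet is what turns one sick card into a network-wide event.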
The failure in this case was of the optical layer.
Had something very similar with a bank's home office. The. Entire. Network. Was. Down. Hard.
Flooded with traffic. Set up a sniffer, went out and enjoyed a smoke, came back and read gibberish on a copper network. But I did get enough fragments to recover the MAC and traced it to one cheap, off-brand NIC. It was mangling packets in just the proper way to be broadcast to each and every switch.
Padded the time, of course, since they should have had packet examination switched on in the core, which would've meant only a small segment dropped, if that.
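Recovering the offending MAC from mangled fragments only needs the first twelve octets of a frame. A quick sketch (the captured bytes and the MAC are made up):

```python
def src_mac(frame):
    """Pull the source MAC out of raw Ethernet frame bytes (octets 6-11)."""
    return ":".join(f"{b:02x}" for b in frame[6:12])

# A broadcast frame as a sniffer might hand it over: the first six octets
# are the destination (ff:ff:ff:ff:ff:ff, flooded to every switch port),
# the next six identify the offending NIC. The address below is invented.
captured = (
    bytes.fromhex("ffffffffffff")    # destination: broadcast
    + bytes.fromhex("00d0c9123456")  # source: the cheap NIC
    + bytes.fromhex("0800")          # EtherType: IPv4
    + b"\xde\xad" * 20               # mangled payload, doesn't matter
)
print(src_mac(captured))   # 00:d0:c9:12:34:56
```

Even when the payload is garbage, the source MAC usually survives enough fragments to look up the OUI and walk the switch CAM tables to a port.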
So, what we really have is, yet another case of "critical services" having a single point of failure.
Because, public safety is number one...
Indicated via a raised third digit.
Yeah, Telstra in Australia have used that as an excuse recently too... Funny how a cheap network card becomes a single point of failure, bringing down their entire network...
Even stranger that it occurred after massive redundancies and the off-shoring of support to cheap third-world countries. Totally unrelated to any management decisions, of course...
The worst network problems are usually intermittent issues that cause repeated failovers and churn in the routing process.
Which is when the management software marks that carrier/route as deficient, switches to the properly functioning, reliable carrier, and triggers an alert, to be ignored by management.
But I'd have digitally signed e-mails notifying management of the single-point-of-failure mode by now, due to the primary carrier becoming unreliable. Something a plaintiff would find fascinating in any discovery case. And an absolute defense, as I'd re-warn management and their superior in the case of non-response. And preserve any "shut the hell up" response, proving a lack of due care and due diligence.
Giving me quite a legal shield, while the highest levels of management have their pantsless sacks hovering millimeters from the fan blade.
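Incidentally, the repeated-failover churn mentioned at the top of this thread is exactly what route flap damping exists for. A minimal sketch with made-up penalty numbers (real routers follow RFC 2439): each flap adds a penalty, the penalty decays exponentially, and a route that flaps too fast gets suppressed until it has been quiet for a while:

```python
# Minimal route flap damping sketch. Thresholds and half-life are
# illustrative, not any vendor's defaults.

class DampedRoute:
    PENALTY_PER_FLAP = 1000
    SUPPRESS_AT = 2000        # stop using the route above this penalty
    REUSE_AT = 750            # start using it again below this penalty
    HALF_LIFE = 15.0          # seconds for the penalty to decay by half

    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        dt = now - self.last_update
        self.penalty *= 0.5 ** (dt / self.HALF_LIFE)
        self.last_update = now

    def flap(self, now):
        """Record one up/down transition at time `now` (seconds)."""
        self._decay(now)
        self.penalty += self.PENALTY_PER_FLAP
        if self.penalty > self.SUPPRESS_AT:
            self.suppressed = True

    def usable(self, now):
        self._decay(now)
        if self.suppressed and self.penalty < self.REUSE_AT:
            self.suppressed = False
        return not self.suppressed

r = DampedRoute()
for t in (0, 1, 2):          # three flaps in quick succession
    r.flap(t)
print(r.usable(3))           # False: suppressed, traffic stays on the backup
print(r.usable(60))          # True: penalty has decayed, route reinstated
```

Which is the polite, automated version of "marks that carrier as deficient"; whether anyone reads the alert it raises is, as noted above, a management problem.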