The words...
...putting all your eggs in one basket spring to mind!
Colt Telecom customers might be finding themselves without internet access today after an undersea cable was cut. A spokeswoman was unable to give us any details beyond admitting there "were some issues". But Register readers have been told that a severed undersea cable running between the UK and mainland Europe is the cause …
The RIAA probably told Evil Lord Mandelson that there was an illegal copy of the latest Britney Spears MP3 flying down the cable so Mandelson ordered his Royal Navy to cut the cable.
I say his Royal Navy because well, he's clearly the one running the government nowadays, despite that being against the will of the entire population.
Wasn't one of the great points about T'Interweb being its resilience, that it could magically reconfigure itself to get round failure points ?
Of Course T'Interweb Two Dot Nowt won't have such problems. When your Cloud comes untethered and drifts off over the horizon you won't even notice. Honest.
Makes reading the PR on Colt's web site quite amusing; "There must be rock-solid resilience because downtime just doesn’t cut it". Ah, if only it were a perfect world.
The cable cut idea doesn't sound very likely to me. This is an outage affecting access to their Ethernet services around Europe. Businesses in the UK have lost their Ethernet IP access, as have those in Dublin, Brussels and Milan. Besides, the Colt network is peered with many other networks on both sides of the Channel, so the whole thing shouldn't go down.
Apparently the hardware vendors have been called in to try to diagnose the problem from traffic captures, which makes it sound like a multiple simultaneous hardware failure. Perhaps some kind of cascade failure or one of these "look how easily routing can be subverted" attacks that the doom-mongers have been predicting for years.
Whatever the reason, I don't envy the Colt engineers right now!
Does anyone really believe that a telecoms provider can be so dumb about the internet that losing one cable cuts off Europe?
Well, it might, if they buy capacity from two different providers that happens to be routed down the same cable, or just the same duct when a JCB digs a hole in the wrong place.
I wonder how many "independent" internet connections go through the Channel tunnel?
"Wasn't one of the great points about T'Interweb being its resilience, that it could magically reconfigure itself to get round failure points ?"
Indeed it is - but only if your ISP has spent the money to provide alternate paths to the rest of the internet.
Several years of experience in the operation of carrier level networks, including submarine, tell me that customers will happily sign up for cheap capacity, only starting to speculate on the deeper implications of words like "pre-emptible", "unprotected" and "non-restored" in their contract after a failure has occurred.
This, of course, is what happens when a technically ignorant salesperson sells capacity to an equally technically ignorant purchaser.
Nice to know that a premium managed service provider is just as crap at actually running a fault-tolerant network as the regular providers ;) Why is it everybody harps on about n+1/n+n resilience (and charges accordingly) until someone plugs in a hoover, trips the main breaker and takes all of Northern Europe's connectivity with them?
I'm looking at you too, certain British datacentres
If it's an Atlantic cable that was cut, how come I couldn't see El Reg or Auntie BBC? Even via static (cached) IP address? Or other unnamed UK sites where I know the servers are within a few hundred yards of my office?
Sorry, I don't mind a brief failure and hiccuping DNS, but taking 6 hours where you can't even be bothered to take advantage of simple re-routing, that's sheer incompetence. Or they are lying about the cause. Time for a new provider I think.
We've had about 5% of our unprotected Colt circuits affected by this issue and all affected sites have been installed in the last 12 months. None of our protected sites are affected which would suggest it is a vendor issue.
I've heard DDoS and broadcast storm mentioned but trying to get a firm answer doesn't really help.
I think Colt have been pretty good - I've been updated every 2 hours with "there is no further update" since about 4AM on the 8th...
From Colt yesterday...
<snip>
Dear customer,
COLT is currently experiencing a backbone network incident impacting services in locations around Europe. Current investigations by our Engineering teams and vendors indicate performance issues on our Ethernet over SDH platform are causing degradation on the IPVPN, LANLink, Ethernet and IP Access services.
A diagnostic plan is currently being worked through including the next key steps.
1. Identify the source of exceptional traffic into the network,
2. Correlate all affected services to the network topology to build a picture of the impacted network and identify common areas,
3. The platform vendor has implemented debugging systems to capture and analyse network traffic and management.
We sincerely apologise for the inconvenience caused by this outage.
</snip>
Doesn't sound like a cable cut to me...
I used to work for one of the major mobile comms companies in the UK (who may or may not sponsor McLaren F1)
One morning, some time ago, monitoring showed a loss of all global roaming. We tracked down a third party cable breakage but were puzzled as to why our protection (secondary) link was not carrying the traffic.
Long story short. The supplier of our secondary link ran our connectivity for about 6 miles and then branched in onto the same fibre as our primary. This company (who shall not be named) had a little bit of explaining to do once the final turd slid off the fan!
To stop me from being sued, I ought to add that this was addressed pretty damned sharpish!
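For what it's worth, the check that would have caught this is simple once the supplier actually hands over route data (getting that data is the hard part). A minimal sketch in Python, assuming each path can be written down as a list of physical segment IDs; the function name and every segment ID below are invented for illustration and have nothing to do with any real network:

```python
def shared_segments(primary, secondary):
    """Return the physical segments both paths traverse, in primary-path order."""
    secondary_set = set(secondary)
    return [seg for seg in primary if seg in secondary_set]


# Hypothetical segment IDs, made up for this example.
primary = ["site-duct-A", "fibre-17", "landing-station-X"]
secondary = ["site-duct-B", "fibre-09", "fibre-17", "landing-station-X"]

overlap = shared_segments(primary, secondary)
if overlap:
    print("Not diverse, shared segments:", ", ".join(overlap))
else:
    print("No shared segments found")
```

Any non-empty result means the "protection" link can die in the same incident as the primary, whatever the contract calls it.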
I was affected by this from about 1am to about 4pm yesterday, in Dublin. There was never an absolute loss of connectivity, it was possible to get the odd UDP packet through - I could make DNS lookups if I was very persistent, so I'm going with the DDoS theory.
When I reported the fault originally, the lady I spoke to told me that only customers connected via certain types of routers were affected, and that the problem was Europe-wide. The information then got vaguer as the outage continued.
It seems the COLT meltdown isn't related to a cable cut, nor a DDoS... but is a direct result of their own doing! This is what COLT has been telling us: Alcatel was brought in for some testing, flipped the wrong switch and overloaded their circuits with millions of requests... By the time they realised what had happened, it was too late, and most of their kit had been brought down...
I am so surprised by this, because we still don't know why they couldn't stop the test, or why it's now taking more than 24hrs to fix the issue. Although apparently most of their IT is outsourced to India, no sarcastic comments about that then..... oh alright then, you get what you pay for!
I agree with AC. I used to work at COLT and they managed to bring down their entire internal network for 20 minutes when someone plugged a switch into a Cisco core by mistake. The IT Director went mental, fired a bunch of guys just for being in the same room as the guy who plugged the switch in. Then someone had to explain to the American tw*t that you can't just fire people like that.....
I've just been talking to COLT support, there was another short outage for about 5 minutes this morning, and while they're not very forthcoming about the root cause of what's been happening, they tell me that they still have some issues ongoing in continental Europe - but maybe this is their way of telling me how lucky I am to have connectivity right now?
Anyhow, this incident has not been laid to rest yet, and the message that I read into what was said (this was not stated clearly) is that there may be more 'glitches' to come. They did say that connectivity in Ireland has been fully restored, despite this morning's 'glitch' - but yet they are keeping my ticket open. Maybe I'm being unfair to them and they're doing this because they don't wish to be hasty, but my natural cynicism tells me otherwise!
"Meanwhile thousand of users are unable to work, ... and again you cannot give any information to your users ...
Sorry but excellent example of how it shouldn't be done ...."
Petard. Hoist. Own. Do these words spring to mind? You have "thousands of users" but only one provider?
Sorry but excellent example of how it shouldn't be done ....
We've had 2 days of downtime now where I work (in Switzerland), and no sign of any recovery. It's getting beyond merely annoying, and starting to be a liability, as we are effectively cut off from the outside world (how did we manage 10 years ago?). It also makes it far more difficult to fill all those empty hours at work...
The only positive from this whole situation is that it's meant that our IT guy can probably persuade the management to pay for a second connection now :P
From my perspective (London) there was a blip of connectivity, as far as I noticed, and then they went dark again. One of our offices is in a building where Colt provides connectivity - you could easily tell who in the building is management, who's staff. First group was green and going mental, others were bored to death or going out for looong lunches. Colt will have to provide some serious explanation... and looking at the article from less than a month ago (lost link - sorry) that said Colt expects to grow as over 80% of customers are happy and plan to spend more with them... it seems at least outdated and inaccurate in my opinion.
Some stats say that after being out of business for several days a large proportion of companies have to close down within 6-9 months due to loss of profit, customer trust, etc. What do you reckon will happen with Colt? Will it survive?
Been out for nearly 48 hours now, no relevant updates... they're not even divulging the cause. Status on colt.net is worthless. I really like how they try to minimize the impact... Even more amusing are the Twitter updates they have set up. Why even bother? Already in discussions with an alternate ISP, will drop Colt as soon as practically possible.