21st Century Networks, resilient networks, self-healing networks, global re-routing, circular networks, mesh networks,...
That was the dream...
A raft of BT customers in the UK were knocked offline this morning due to a power problem at one of its web peering partners’ sites in London. According to outage monitoring website Down Detector, customers reported a spike in problems at around 9am today. One customer got in touch to report that all their BT Infinity …
Yes, that was the original idea. The problem is that the global internet is now many, many orders of magnitude larger than was originally foreseen (don't forget, the original "internet" only had 20 or so nodes on it), which makes operating it a whole lot harder. For example, the original routing protocols have long been pretty much abandoned (except in small networks, where they work just fine) and more complex protocols have been developed and deployed. However, these protocols can take longer to converge after a massive reconfiguration of the internet (such as a major node failure). On top of this, no protocol can compensate for stupid network design (e.g. running primary & backup cables through the same duct, having both primary and backup systems on the same power supply, etc).
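To make the shared-duct point concrete, here's a minimal sketch in Python. The topology, node names and duct labels are all made up for illustration (nothing here reflects any real network); it just tags each link with the duct it runs through and checks whether two sites can still reach each other once a duct is cut.

```python
from collections import deque


def reachable(links, src, dst, failed_groups=frozenset()):
    """Breadth-first search over links whose shared-risk group has not failed."""
    adjacency = {}
    for end_a, end_b, group in links:
        if group in failed_groups:
            continue  # every cable in a failed duct goes down together
        adjacency.setdefault(end_a, set()).add(end_b)
        adjacency.setdefault(end_b, set()).add(end_a)

    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# Hypothetical topologies: (end A, end B, duct the cable runs through).
same_duct = [("office", "pop", "duct-1"), ("office", "pop", "duct-1")]
diverse = [("office", "pop", "duct-1"), ("office", "pop", "duct-2")]

print(reachable(same_duct, "office", "pop", {"duct-1"}))  # both cables cut
print(reachable(diverse, "office", "pop", {"duct-1"}))    # backup survives
```

Two cables look like redundancy on paper, but if they share a duct (or a power feed) they count as one failure domain, and no routing protocol can route around that.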
Apparently BT are fixing it at the moment with their help desk.
So Far They've:
1) Disconnected all equipment from other sockets (this is what took down the BBC website)
2) Plugged their router into the master socket (this took down their trunks and backbone)
3) Rebooted their router
4) Taken delivery of another router and tried that
So they're now awaiting an engineer visit. Should be fixed by next Friday.
"So they're now awaiting an engineer visit. Should be fixed by next Friday."
However, the engineer will not show up but they will subsequently be told that they will be charged if the engineer is to return and that there is no availability of an engineering appointment for another couple of weeks.
It's been like this since about 7am this morning at my workplace. Currently at 10:30 it's still intermittent. Some websites will load without problems and others will take forever to load.
Which is annoying as I want to order some Hammerite paint from Amazon for my car.
AC because I shouldn't be shopping at work.
Had just the same, except in my case it's a hard drive caddy I'm after.
Just managed to get to Amazon though (11am) and ordered it, so give it a try..
Also had the irony of getting an RSS feed message from the BBC News about their article on the subject, but not being able to open it due to the issue...
We use SIP with BTNet. As far as I can tell, it hasn't been affected, whereas we are having problems accessing web sites and with remote users trying to access servers on-site.
That said, it hasn't helped me at all to try and get through to a live person at BTNet. Being told to call back later is not at all impressive.
"At the moment, BT would just appeal to the EU and it would be overthrown as there's a similar problem in Germany with Deutsche Telekom."
And if OR were to be split off how long do you think it would be before it was bought by Deutsche Telekom, or Telefonica - or maybe SoftBank?
Millions, if not indeed billions, are spent on (advertising) network resilience yet still server centres and other installations fall over, go "off grid", suffer "outages" or "unplanned downtime".
Is it simply impossible to prevent these occurrences? Is all the advertising about resilience etc complete dishonest bollocks?
Or are the PoP operators just lying to us on the grounds that it is so much cheaper to be a crook than try and actually build in genuine resilience?
And what about all these certificates they display so proudly on their websites? Are these all lies as well? Are the awarding bodies just in on the scam and taking the dosh while they can? Shouldn't an operator suffering one of these unexpected "inconveniences" lose their accreditation? And what about some com-pen-pay-shun?
I don't know particulars of BT, but you hit some good points there.
"Millions, if not indeed billions, are spent on (advertising) network resilience yet still server centres and other installations fall over, go "off grid", suffer "outages" or "unplanned downtime"." Indeed. Advertising brings in revenue. Infrastructure is just an expense. It's not uncommon to increase spending on the services (like advertising) while cutting expenses on the infrastructure that supports those revenue services. Years ago at a small chain retailer, the manager explained to me that because we were all paid on commission, "we polish the displays but nobody fixes the roof."
"Is it simply impossible to prevent these occurrences?" Not impossible, but it requires awareness and also decision-makers must be rewarded for solid planning over short-term results. "Is all the advertising about resilience etc complete dishonest bollocks?" Not exactly. I've seen very resilient designs get crippled by small decisions like using the redundant link to handle load spikes instead of renting a metered link. As so often in this world, people prefer data that supports their message and may not even be aware of how the facts have changed.
"And what about all these certificates they display so proudly on their websites? Are these all lies as well?" Yeah, sometimes. :) The certificates have very specific definitions. "Certified Malware Free" is much easier than "Scanned Every Hour According To OpSec 15(a) Which Has Been Due For Review For Two Years And Meanwhile We Changed Vendors And Our Tech Lead Left To Join A Startup So Nobody Really Understands It Any More But It Seems To Work Fine And We Are In Compliance With Our Accreditation." Again, not unique to IT. We probably all know someone who bought a very expensive car and then "saved money" by deferring maintenance. Or bought insurance but neglected to raise the limit after some major purchase.
Okay, you nailed the big ones. I just spent too much time in Operations!
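On the point above about the redundant link being used to soak up load spikes, a rough back-of-envelope check in Python shows why that quietly kills the resilience. It assumes all links have the same capacity and the numbers are invented; the function name is just for illustration.

```python
def survives_single_failure(link_capacity_mbps, link_loads_mbps):
    """True if the remaining links can carry all traffic after any one link fails.

    Assumes every link has the same capacity, so losing any one of them
    removes the same amount of headroom.
    """
    total_load = sum(link_loads_mbps)
    surviving_capacity = link_capacity_mbps * (len(link_loads_mbps) - 1)
    return total_load <= surviving_capacity


# Two hypothetical 1 Gbit/s links.
print(survives_single_failure(1000, [400, 400]))  # 800 Mbit/s fits on one link
print(survives_single_failure(1000, [700, 600]))  # 1300 Mbit/s does not
```

Once the combined load creeps past what the surviving links can carry, the "backup" is really just extra capacity, and a single failure becomes an outage.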
Power Outage you say?
So, let me get this straight, Harbour Exchange in London's Docklands, one of THE most important links in the whole UK network, does not have adequate provision for when there is a power outage, either internally or externally.
Effectively no resilience then?
GOD'S TEETH!!!!!!
Glad to see it's not just me who sees the red mist with this sort of wording. It doesn't matter if it is a small number of customers, for those customers it is a complete loss of service.
PlusNet has been having occasional lie-downs in a dark room with a damp cloth over its forehead for a week or so now. Again, the mysterious power issue might be causing some problems for a small number of customers (which always seems to include me). The falling over seems to have started round about the time they told me my bill would be increasing to pay 22 tattooed millionaires to kick a ball around on a TV channel I don't watch.
If you're seeing an unstable connection via Plusnet you may be suffering the same thing I've had. It seems IP addresses starting with 51. are seeing packet loss. Go to the add-ons in your account and add a static IP. They'll add it to your bill, but I'm planning on arguing for a refund as they've set a precedent in their forums by giving it to someone else for free for the same issue. My IP switched to an 81. address and my problems seem to have gone away.
Compared to, say, the number of particles in the known Universe, it was a small number of customers affected.
Perhaps the regulators should force companies to release an estimate of the percentage of customers affected together with any qualifying factors such as geographical limitations on the disruption.
If you choose BT you don't deserve good service. And I think you know that.
The simple test is: does your ISP advertise on TV? If so, they're shit and need to advertise on TV. (They also spent all the money for expanding capacity on advertising.)
All the best things in this world don't need advertising because people just know they're awesome.
> If you choose BT you don't deserve good service.
While I understand (and mostly agree with) your sentiment, this is nothing to do with the (home) ISP side of things. At work we are not with BT; we are with another carrier, on their own infrastructure and even (for a few more months) their own fibre into our office. No BT at all until some point waaaay back into the internet, and we're affected.
Spent the first bit of the morning trying to puzzle why we were getting so many complaints of intermittent access to our servers. Then got a report via one of our other providers about the BT outage. Their joke of a status page still says "a few customers" may be experiencing problems as though it's a localised thing to a few FTTC cabinets.
>>If you choose BT you don't deserve good service. And I think you know that
Yeah yeah, whatever.
Meanwhile, the rest of us will use a large ISP (like BT) which we generally have no problems with. I use BT at home and they're fine. I see no need to change to a small ISP no one's ever heard of and which only has 3 customers.
But feel free to feel superior to the rest of us...
http://www.ispreview...d-services.html
"UPDATE 11:07am
A private report from TeleCity, which was sent to certain providers earlier today and seen by ISPreview.co.uk, suggests that there was a power failure with one of their Uninterruptible Power Supply (UPS) systems at 8/9 Harbour Exchange (LD8) in London.
The fault hit a key router (edge4-tch), although this was technically resolved at 9:15am. However the knock-on impact for ISPs like BT and their clients (consumers, businesses and other ISPs etc.) can often take a little bit longer to resolve (computer networks are complicated animals).
Connections should be slowly getting back to normal."
> The report from bt I saw at 11:30 said power issues were still on going and they were unable to stabilise the power at that time
And BT would never try to blame someone else's power issues rather than admit that its own network fails to return to normal quickly enough.
Whilst I can resolve names to IP addresses and I can connect to some websites (gmail, the Register, Guardian, Telegraph), the routing for others seems to get lost in Telehouse (including my VoIP phone service with Sipgate). When connected via a VPN out of Italy, I can access most but not all of the stuff not accessible via BT.
“This is due to power issues at one of our internet peering partners’ sites in London. Engineers are working to fix things as fast as possible.”
Once the technical problem has been resolved perhaps someone could find the time to find a better term than "internet peering partners", which has an unpleasant ring of New Age Management Speak about it.
So when we lost our Internet and it came back on, sites were still broken, but I managed to get to the BT web chat. No go, as there were 297 people in the queue. "Submit your query by email". OK.
I kid you not, this was the form I had to fill in.
http://i.imgur.com/u5n9eNA.jpg
Since they don't exist anymore - bought by Equinix.
But actually I think your focus is better directed at speaking to the nice folk at Telehouse North.
Hell - if you drive past you'll observe opened windows - never a good sign in a DC - I assume the heat is getting to them ;-)
https://www.linx.net/tech-info-help/traffic-stats - LINX show a big drop in traffic earlier this morning - presumably that's what it looks like when BT stops peering in THN - but it's back to usual now.
Yeah, I'm currently sitting in a pool of sweat doing a pc upgrade in a windowless back office of a retail chain cafe and stuck at 24M of a 60M download. Not helped by the manager turning the power off half an hour ago to fix something that tripped. Probably not helped by the freestanding air conditioner with its exhaust pipe propped up against the wall pointing directly at said switchboard.
Zomg! I couldn't get on Twitter, or Amazon, or Flickr or my bank and some other pages for an hour or two from 8am. Slowly they came back online. Netflix was still up and running but I decided to read a book. For a while I wasn't sure if life would be worth living (insert sarcasm)
Get a grip. All ISPs go down occasionally. I've been with Virgin, Sky, Orange.
It's all running ok now and life didn't end as far as I can tell :)
Ditto here; trying to connect a VPN to my company's network and am seeing ping times of 70-80ms (normally about 10ms) and packet loss of 40-50%. Have run traceroute several times and found that the transmission times jump in BT's core network; not only that, but there are obviously huge routing instabilities, since the traceroute paths are changing on an almost second-by-second basis.
This is not good!
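For anyone wanting to put numbers on this sort of thing, here's a small sketch in Python. It assumes you've already collected ping results (with None marking a lost probe) and the hop lists from repeated traceroute runs; it doesn't parse real ping or traceroute output, and the hop names and figures below are invented.

```python
def packet_loss_percent(rtts_ms):
    """Percentage of probes that got no reply (None marks a lost probe)."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    lost = sum(1 for rtt in rtts_ms if rtt is None)
    return 100.0 * lost / len(rtts_ms)


def path_changes(traces):
    """Count how often consecutive traceroute runs returned a different hop list."""
    return sum(1 for prev, cur in zip(traces, traces[1:]) if prev != cur)


# Hypothetical samples: five pings and four traceroute runs.
pings = [72.0, None, 80.1, None, 75.3]
traces = [
    ["gw", "core-a", "peer-x"],
    ["gw", "core-b", "peer-x"],
    ["gw", "core-b", "peer-x"],
    ["gw", "core-a", "peer-x"],
]

print(packet_loss_percent(pings))  # share of probes lost, as a percentage
print(path_changes(traces))        # how many times the route flipped
```

A path that flips between runs a few seconds apart is a reasonable hint that routing hasn't converged yet, though it can't tell you why.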
Yep, have had many support calls this morning from people unable to connect to our systems (we're not with BT), all of which were from BT customers. The joy of trying to explain the difference between internet connection and internet routing to customers.
Was amused last night to get home to some junk mail from BT trying to sell me "Super reliable BT broadband".