Obligatory post-Snowden comment.
NSA accidentally screwed up their tap in Malaysia Telecom and called out the whole world rather than just SEA into their tap.
ISP Level 3's customers have been left without internet access since this morning, after the provider seems to have leaked routes to a Tier 1 transit provider in Malaysia. An incident report from CloudFlare said that while "the Tier 1 transit provider of the ISP leaking routes appears to have stopped accepting these …
If this was the issue, then Level 3 is totally at fault. One of the core principles of BGP policy is to accept only what you are expecting, and this is the obvious outcome of not having that route policy correctly defined.
This is such a basic error that the 'fat finger' has to have been involved.
However it's another example of how policy needs to be defined elsewhere, and the limits of BGP policy configuration done via the current OSS model....
Please god don't anyone say "hey we need another extended attribute for BGP that will solve this"..
"FFS Tier-1 ISP's have been using AS filter paths for 20 years - how the hell is this still happening in 2015?"
You must mean prefix lists, not a path filter, as a path filter would have allowed this through. As for why, your comment is a perfect example: people don't know the difference between prefix filters and AS path filters, among many other things.
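To make the distinction concrete, here is a minimal sketch of the two checks. It assumes a path filter that only matches the neighbouring AS at the head of the path, which is one simple reading of the point above; stricter path filters exist. The AS numbers, prefixes, and function names are made up for illustration (documentation ranges), not taken from any real configuration.

```python
import ipaddress

# Hypothetical neighbour AS and the prefixes it is expected to announce.
NEIGHBOUR_AS = 64500
EXPECTED_PREFIXES = {ipaddress.ip_network("192.0.2.0/24")}


def as_path_filter(as_path):
    """Accept any route whose AS path starts with the neighbour's AS."""
    return bool(as_path) and as_path[0] == NEIGHBOUR_AS


def prefix_filter(prefix):
    """Accept only prefixes that are on the expected list."""
    return ipaddress.ip_network(prefix) in EXPECTED_PREFIXES


# A legitimate route: the neighbour's own prefix, originated by the neighbour.
legit = ("192.0.2.0/24", [64500])
# A leaked route: someone else's prefix, re-announced through the neighbour.
leak = ("198.51.100.0/24", [64500, 64511, 64496])

for prefix, path in (legit, leak):
    print(prefix, "path filter:", as_path_filter(path),
          "prefix filter:", prefix_filter(prefix))
```

Under these assumptions the leaked route passes the path filter, because its path still begins with the neighbour's AS, but fails the prefix filter.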
"However it's another example of how policy needs to be defined elsewhere, and the limits of BGP policy configuration done via the current OSS model....
Please god don't anyone say "hey we need another extended attribute for BGP that will solve this".."
Umm, OK, so you know about BGP attributes, but you don't know about routing registration databases? You are supposed to build routing policies based on routes registered in a registration database. Interestingly, Level 3 runs its own routing registration database (http://www.irr.net/docs/list.html). What Level 3 is supposed to be doing is automatically building new route prefix lists and pushing them to its edge routers every day. Those prefix lists would contain all acceptable routes.
Since a large ISP may have several thousand prefixes, automatically generating them from a database is the only way to go.
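As a rough sketch of what "generating them from a database" could look like, the snippet below reads registry entries with `route:` and `origin:` attributes and collects the prefixes registered for one customer AS. The sample data, AS numbers, and function names are hypothetical, and this is not a description of Level 3's actual tooling; a real setup would query the registry and render the result into router configuration.

```python
import ipaddress

# Hypothetical registry dump: route objects reduced to two attributes each.
SAMPLE_REGISTRY = """\
route:  192.0.2.0/24
origin: AS64500

route:  198.51.100.0/24
origin: AS64501

route:  203.0.113.0/24
origin: AS64500
"""


def build_prefix_list(registry_text, customer_as):
    """Return the sorted prefixes registered with the given origin AS."""
    prefixes = []
    current_route = None
    for line in registry_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "route":
            current_route = ipaddress.ip_network(value)
        elif key == "origin" and current_route is not None:
            if value.upper() == customer_as.upper():
                prefixes.append(current_route)
            current_route = None
    return sorted(prefixes)


def accept(prefix, prefix_list):
    """Accept an announcement only if its prefix is on the generated list."""
    return ipaddress.ip_network(prefix) in prefix_list


customer_list = build_prefix_list(SAMPLE_REGISTRY, "AS64500")
for announced in ("192.0.2.0/24", "198.51.100.0/24"):
    print(announced, "accepted" if accept(announced, customer_list) else "rejected")
```

Regenerating the list on a schedule (daily, as described above) keeps the filter in step with what customers have registered, so anything unregistered is dropped at the edge.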
And, even if it were, it certainly wasn't to do so immediately, with zero downtime or with zero human intervention worldwide.
Else things like BGP would have been in the bin decades ago. I mean, seriously, just having routing tables hit certain sizes is enough to make many brands of high-end networking gear just fall over. BGP routing tables grow into the same kinds of fixed spaces. And, hell, BGP announcements do nothing to take account of CAPACITY of the system on either end (i.e. the preference of a particular route based on its response time etc.).
The Internet won't invisibly and automatically survive any kind of attack. However, it will be not-so-difficult for even a small bunch of humans to cobble it back together even if that means throwing out DNS, BGP or similar in some fashion to allow it to do so.
I beg to differ
If, instead of a stupid user error, the site had been hit with a nuclear device, the rest of the internet would have been fine. In fact, it may have been better off.
No one said the Internet could survive stupid users. Stupid users have been a problem since prehistoric man dropped a really big rock on his foot, and it has only gotten worse.
Specifically (and topically) on BGP issues like this one at Level-3 (but also the difficulty of moving people to BGPSEC):
And more generally, earlier history on why the original designers did not expect so many attacks:
... but as the articles say, early hardware could not do encryption easily, NSA probably objected, and people at the time never imagined we would be doing on-line banking or there would even be a YouTube to censor :(
Level3 is in the process of migrating the old Global Crossing (AS3549) networks into the Level3 AS3356. I suspect some config generation handled within 3549 was migrated to the 3356 script, which put in a permit all on the Telecom Malaysia transit port, which had always been "broken", but it didn't matter before.
Speaking as someone who works for a network that peers with both 3356 and 4788, we solved the issue once we depeered 4788.