Cloudflare explains how it managed to break the internet

A large chunk of the web (including your own Vulture Central) fell off the internet this morning as content delivery network Cloudflare suffered a self-inflicted outage. The incident began at 0627 UTC (2327 Pacific Time) and it took until 0742 UTC (0042 Pacific) before the company managed to bring all its datacenters back …

  1. Ben Tasker

    > This morning was a wake-up call for the price we pay for over-reliance on big cloud providers. It is completely unsustainable for an outage at one provider to be able to bring vast swathes of the internet offline.

    Multi-CDN is relatively easy to set up nowadays, and isn't even that expensive.

    Unfortunately, if you want to use Cloudflare then you need to have your DNS with them - at least, unless you're willing to pay $200/month for their business tier in order to unlock support for CNAMEing to them rather than giving them control of your zone.

    There's not a *lot* of point in setting up multi-CDN if your authoritatives are tied to one of the providers that you're trying to mitigate against.

    It's a business choice by Cloudflare, but it's part of the reason that outages there are so severe. If something happens to CloudFront, Akamai, Fastly etc. then there's the option to flip traffic away from them (it can even be done automatically) and serve via a different CDN until things settle down.

    It's a core part of why I neither use nor recommend Cloudflare: they might be huge, but they're still a single basket and mistakes happen. Not having an option to move traffic away during longer outages isn't really acceptable.
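    To make the multi-CDN point concrete, here's a minimal sketch of the kind of health-check-and-repoint logic involved. The hostnames and the update_cname() call are hypothetical placeholders; in practice you'd use your DNS provider's API or a traffic-steering service such as Cedexis:

    ```python
    # Rough sketch only: probe each CDN edge hostname and pick a healthy one
    # to point the site's CNAME at. The hostnames and update_cname() are
    # hypothetical placeholders, not any real provider's API.
    import urllib.request
    import urllib.error

    CDN_TARGETS = [
        "site.cdn-a.example.net",  # hypothetical primary CDN edge hostname
        "site.cdn-b.example.net",  # hypothetical secondary CDN edge hostname
    ]

    def cdn_is_healthy(hostname: str, timeout: float = 3.0) -> bool:
        """Return True if the CDN edge answers a simple HTTPS probe."""
        try:
            with urllib.request.urlopen(f"https://{hostname}/health", timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    def update_cname(target: str) -> None:
        """Placeholder: a real version would call the DNS provider's API
        (or a traffic-steering service) to repoint the site's CNAME."""
        print(f"would point the www CNAME at {target}")

    def failover() -> None:
        # Walk the candidate CDNs in preference order; repoint at the first healthy one.
        for target in CDN_TARGETS:
            if cdn_is_healthy(target):
                update_cname(target)
                return
        print("no healthy CDN target found; leaving DNS untouched")

    if __name__ == "__main__":
        failover()
    ```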

    1. Michael Wojcik Silver badge

      I have mixed feelings about Cloudflare, but they are generally quite good about explaining what went wrong. They also publish a lot of good technical content in general.

      Mark Boost, on the other hand, sounds like a spoiled brat. "Everything isn't perfect! My gratification isn't immediate! How dare you!"

      I've been using the public Internet since a few years after Flag Day, and I've managed to avoid panicking when I can't "access the online services that are part of the fabric of all our lives". Sometimes there are, y'know, network interruptions. Or power interruptions. Grow the fuck up, Mark.

  2. KOST

    "Automation: There are several opportunities in our automation suite that would mitigate some or all of the impact seen from this event."

    Translation: Dave in QA's just lost his job to a robot

    1. Anonymous Coward
      Anonymous Coward

      Dave in QA

      Got promoted and is working 60 hours wrangling the farm of bots doing automated regression testing. Being able to automate more just means being able to cover slightly more of your test cases. This is one of those cases where adding automation just adds another layer for the humans to manage. The laws governing math and primary logic make automating all of your automation problematic, so unless your company was the rare unicorn that actually had complete case coverage, it just makes it possible to do more of the work you weren't getting to with each release.

      Sure, slacker outfits might be able to squeeze headcount a little with better tools, but Cloudflare has to put on its big boy pants every day or it will blow up the internet (again). I do not want that job.

      1. Eclectic Man Silver badge

        Re: Dave in QA

        E. M. Forster got there a while ago (published in 1909):

        'The Machine Stops.' https://www.bbc.co.uk/sounds/play/m0018fs6

        (Note: login required to listen.)

        1. 1752

          Re: Dave in QA

          Someone (you?) linked a text version the other day; I read it. Recommended.

          1. Eclectic Man Silver badge

            Re: Dave in QA

            The BBC Radio adaptation differs from what I recall of the original story, but is well done, and Tamsin Greig is always good. Apparently Forster wrote 'The Machine Stops' as a response to H. G. Wells's claim that new technology would make the world wonderful. (Big debate on that one, I suspect.) One of the only things on which I actually agree with F. Nietzsche is his claim, in the late 19th century, that humanity was treating nature with contempt.* (However, I totally disagree with his approval of genocide and the subjugation of women as stated in the same book.)

            * 'On the Genealogy of Morals'

        2. PRR Silver badge
          Mushroom

          Re: Dave in QA

          > E. M. Forster ....1909: 'The Machine Stops.' https://www.bbc.co.uk/sounds/play/m0018fs6 (Note: login required to listen.)

          Free version: here

          Free PDF: here

          Free AudioBook: here

          Wiki explanation: here

          Literary analysis: here

          MAD Magazine(!) parody(?) from MAD #1(!) 1952: here (open this full-screen, it is worth it) (and skip to slideshow page 4.)

          There is a Keith Laumer short (Cocoon, 1962) which echoes this plot, but with a Laumer-esque protagonist.

      2. CrazyOldCatMan Silver badge

        Re: Dave in QA

        make automating all of your automation problematic

        Automation is fine for the regular run-of-the-mill stuff (do X and you get Y, feed Y to Z and get expected result). What it's really, really not good at is edge cases - and in a lot of cases it can make the issue worse.

        And over-reliance on automation also makes techie skills atrophy - there's a reason why 'practice makes perfect' is a reasonable saying. If your techies[1] don't know how the guts of things work (and not just from a 1-day course on the infrastructure, but from an 'I herd this stuff all day' level of familiarity) then any root-cause analysis is going to take longer, because the people responsible for fixing the mess are not as intimately acquainted with the guts of the system and have to explore as well as fix.

        [1] And not just the 'hero' types but the lower levels too. Having only one or two staff who know how everything works is a real risk in itself. Especially if (as most hero types are) they are spectacularly averse to documentation...

    2. Snake Silver badge

      Re: automation

      This is why I intentionally keep my 4 disparate home IoT automation systems independent, regardless of what internet forum PFYs/BOFHs tell you to do otherwise. My 2 separate heating systems back one another up rather than creating a single point of failure.

  3. Mike 137 Silver badge

    More than just their users affected

    DNS lookup for our web sites and email failed to resolve while this was going on, despite the hosting service we use not relying on Cloudflare. But the widely used 1.1.1.1 DNS is hosted by them, so that could at least partially explain the extent of the problem.

    1. alain williams Silver badge

      Re: More than just their users affected

      An important idea within DNS is to have multiple servers, so you should have had slave servers with other providers. 8.8.8.8 would do nicely, or even run a few of your own; it is not hard.
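      As a rough illustration of the multiple-resolvers point, here's a minimal sketch that asks several public resolvers for the same record, so a wobble at 1.1.1.1 is easy to spot. It assumes the third-party dnspython package, and example.com is just a placeholder name:

      ```python
      # Minimal sketch: query the same name against several public resolvers,
      # so an outage at any one of them (e.g. 1.1.1.1) doesn't leave you blind.
      # Requires the third-party dnspython package: pip install dnspython
      import dns.exception
      import dns.resolver

      RESOLVERS = {
          "Cloudflare": "1.1.1.1",
          "Google": "8.8.8.8",
          "Quad9": "9.9.9.9",
      }

      def lookup(name, server):
          """Resolve an A record via one specific resolver; empty list on failure."""
          resolver = dns.resolver.Resolver(configure=False)
          resolver.nameservers = [server]
          try:
              answer = resolver.resolve(name, "A", lifetime=3)
              return sorted(rdata.address for rdata in answer)
          except dns.exception.DNSException:
              return []

      if __name__ == "__main__":
          for label, server in RESOLVERS.items():
              ips = lookup("example.com", server)
              print(f"{label} ({server}): {', '.join(ips) if ips else 'no answer'}")
      ```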

      1. Lockwood

        Re: More than just their users affected

        I have a pihole with unbound - I used to use 8.8.8.8, then 1.1.1.1

        Not the ideal solution for everyone, but it has some advantages

        1. Hubert Cumberdale Silver badge

          Re: More than just their users affected

          I like NextDNS. It's also the last layer of defence in my DNS-based ad blocking.

  4. TRT Silver badge

    The more they overthink the plumbing,

    the easier it is to stop up the drain.

    1. David 132 Silver badge
      Thumb Up

      Re: The more they overthink the plumbing,

      Automatic upvote for any Scotty quote.

      And it's closely followed in the movie by one of my favourite Bones/Kirk exchanges:

      "Nice of you to tell me in advance."

      "That's what you get for missing Staff meetings, Doctor..."

  5. Coastal cutie
    Facepalm

    "One can imagine the panic at Cloudflare towers, although we cannot imagine a controlled process that resulted in a scenario where 'network engineers walked over each other's changes.'"

    Umm, left hand, right hand, have you been introduced? And have you been taught how to tell your derriere from your elbow?

    1. CrazyOldCatMan Silver badge

      Umm, left hand, right hand, have you been introduced?

      It all shows a pretty bad case of immature change control. JDI is *not* a valid change methodology.

      (I suspect that, in all their testing, they had not tested the effects of new and old topologies mixing.)

  6. dvvdvv

    Indeed

    "we cannot imagine a controlled process that resulted in a scenario where 'network engineers walked over each other's changes.'"

    It's the most sarcastic thing I've heard this morning.

    1. Anonymous Coward
      Anonymous Coward

      Re: Indeed

      I think they meant "network engineers REVIEWED each other's changes."

      As in "they did a walk through, a dry run" to see what might have gone wrong. Doesn't sound like a bad way of understanding the problem, but could probably have been described better.

      1. Anonymous Coward
        Anonymous Coward

        Re: Indeed

        They clearly weren't talking about reviews.

        The full quote says that this resulted in reverts being reverted and the issues recurring as a result.

        It's a sign of a lack of proper incident management discipline - everyone scrambles to fix the issue, but with no one actually coordinating the response you get chaos and conflicting changes.

  7. Claptrap314 Silver badge

    I'm curious

    How many of the stone-throwers here have ever worked in, let alone managed, an operation of Cloudflare's scale?

    I've not been inside, nor have I had any extensive dealings with them, but I will say this: getting this stuff right is **** hard. I've not read their post-mortem, but just from the article, they are showing FAR more transparency than we generally see.

    To address two specific points, one from the article and one from the comments. First, if $200/mo is enough to matter, then you are not spending enough on resilience for me to care what your opinion is. Resilience happens at every level of the stack, and it is a sick joke to suggest that it can be achieved on a shoestring budget.

    Second, as for that blowhard CEO's gripes: what is your company again? This is the sound of a small competitor whose many failings have not been in the news because they don't matter.

    Certainly, from just the article, it is clear that "mistakes were made". But the nature of these mistakes--insufficient QA before a change, unclear responsibilities during a major incident--are relatively easy to fix, especially when compared to, say, Google's inability to get quotas right, or Microsoft's inability to even have an inventory of their internal DNS servers.

    Yes, yes: this incident is another reminder that resilience happens at EVERY level of the stack. No one said otherwise. And big does not mean "perfect". No one said otherwise.

    1. gratou

      Re: I'm curious

      You missed the useful bit. It's not the 200 dollars. It's that there's not a *lot* of point in setting up multi-CDN if your authoritatives are tied to one of the providers that you're trying to mitigate against.

    2. Sam Liddicott

      Re: I'm curious

      Simply paying the $200 won't help if you don't understand what problem he was talking about.

    3. Ben Tasker

      Re: I'm curious

      > How many of the stone-throwers here have ever worked in, let alone managed, an operation of Cloudflare's scale?

      As you've specifically called out my comment (whilst missing the point), I'll answer.

      I have.

      In fact, I've also worked with customers who considered themselves too big to deliver via Cloudflare, and as well as building and managing global CDNs, I've built and integrated against a number of multi-CDN solutions, working with customers that you've definitely heard of (and on a balance of probability will likely interact with sometime today).

      I don't really do calls to authority, but if you're concerned I'm griping with no view into the industry, I'm not.

      > First, if $200/mo is enough to matter, then you are not spending enough on resilience for me to care what your opinion is.

      You've completely missed the point.

      It's not just the cost, it's the fact that their solution is architecturally flawed for no reason other than commercial gain. Cloudflare is the only provider who charges extra to be able to CNAME in.

      The true market leader, where real money is spent (Akamai) offers DNS services but does not mandate them: to do so would mean forcing your customers onto a possible SPOF.

      That $200/mo, by the way, may not give you much else extra that you care about (depending on your requirements); you're just paying a premium for something that's a basic feature on basically every other CDN.

      It'd be much better to be able to spend that $200 on Cedexis or similar so that you can move traffic about.

      > Resilience happens at every level of the stack, and it is a sick joke to suggest that it can be achieved on a shoestring budget.

      There's a cost, sure, but that doesn't justify Cloudflare's commercial choice.

      Perfect resilience costs a lot, but that's not what we're talking about here, we're talking about building multi-CDN: resilience against single provider failures.

      That absolutely can be achieved on a shoestring budget (though I wouldn't recommend it).

      You're letting perfect be the enemy of good, and even a few years within that segment of the industry, seeing the things that big companies actually use/do, would show you how wrong you are.

      > they are showing FAR more transparency than we generally see.

      Cloudflare generally do; their post-mortems are open and honest, something for which they should rightly be praised.

      1. Cowards Anonymous

        Re: I'm curious

        You forgot the mic drop at the end. Great reply though.

      2. Claptrap314 Silver badge

        Re: I'm curious

        It's $200/mo for something that their competition does for free. You attacked their business model over this. And I'm doubling down here--you're demonstrating an utter lack of proper prioritization on this point.

        You know (apparently first hand) that the monthly cost of building a resilient app runs at least 6 figures a year. Against that level of spending, you're going to weigh $2500 priced as an add-on? What if the cost of the base contract is $5000 less?

        Look, if you just don't like them, that's fine. But any evaluation of a solution has to be based on total cost, and you're talking about a rounding error.

        1. Ben Tasker

          Re: I'm curious

          > You know (apparently first hand) that the monthly cost of building a resilient app runs at least 6 figures a year.

          Again, we're talking about different things here.

          You're talking about end-to-end resiliency in an application context, I'm talking about a feature that makes it possible to cope with a failure in an edge provider (Cloudflare in this case).

          You're talking in the context of an application server, whilst a CDN's primary domain is serving (and caching) static content (the big money is in serving game artefacts and video). We're talking about completely separate problem domains with very different solutions.

          It's not nearly as expensive as you make out. In fact, it can even be achieved for as little as £60/year (though that price point doesn't come without issues; the risk is still lower than with your authoritatives tucked away inside CF).

          Fuck, you can do it nearly free if you don't want it automated:

          - Have DNS across providers

          - CDN1 goes down, update CNAME to point to CDN2

          Not that I'd recommend it, but your recovery rate will still be faster than without the ability to switch away.
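          To make that manual flip concrete, here's a tiny sketch (dnspython again, and the hostnames are made-up placeholders) that just reports which CDN a site's CNAME currently points at - i.e. the record you'd be changing by hand:

          ```python
          # Sketch of the manual approach above: check where the site's CNAME
          # currently points, so you can confirm the by-hand flip from CDN1 to
          # CDN2 has taken effect. Hostnames are hypothetical placeholders.
          # Requires dnspython: pip install dnspython
          import dns.resolver

          SITE = "www.example.com"                # hypothetical site hostname
          CDN_TARGETS = {
              "site.cdn-a.example.net.": "CDN1",  # hypothetical CDN edge names
              "site.cdn-b.example.net.": "CDN2",
          }

          def current_cdn(site):
              """Return which known CDN the site's CNAME currently points at."""
              answer = dns.resolver.resolve(site, "CNAME", lifetime=3)
              target = str(answer[0].target)      # e.g. "site.cdn-a.example.net."
              return CDN_TARGETS.get(target, f"unknown target {target}")

          if __name__ == "__main__":
              print(f"{SITE} is currently served via {current_cdn(SITE)}")
          ```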

          Something more production ready like Cedexis or Constellix doesn't break the bank either.

          The days of CDN being an expensive bespoke thing are long gone; it's a commodity nowadays. Multi-CDN support isn't much different in that respect.

          I've focused on resiliency because that's what the article is about, but there's a bunch of other knock-on effects too. In the early stage of the relationship it knackers a customer's ability to do A/B testing.

          Big customers also tend to use different CDNs in different parts of the world - if Cloudflare's coverage is pants in parts of Asia, you might route those users via AliCDN or similar. The lack of CNAME support hampers this.

          The key thing with both examples is that you generally want it to be transparent to the user, which generally means CNAMEs (I've seen DNAMEs get used too... the horror)

          > Against that level of spending, you're going to weigh $2500 priced as an addon?

          I think the thing you're missing is that that $2500 is an otherwise unnecessary charge unless you also want support from CF.

          It also comes on top of whatever you're paying to implement failover. If you're using something simple (like DNSMadeEasy's fairly limited auto failover) then that $2500 increases your costs many times over.

          If Cloudflare were head and shoulders above the competition you might ignore it, but in many locations, they're not.

          > But any evaluation of a solution has to be based on total cost

          I disagree.

          Evaluation of a solution is based on *value* not cost.

          CDN customers tend to have a spectrum of criteria. Some will pay the earth for the provider with the lowest latency (in whatever market they care about), some want to spend as little as possible.

          Most, obviously, fall somewhere in the middle.

          If you're charging extra for something basic that your competitors offer, you need to be able to justify it, either through sheer speed (attracting the left of the spectrum) or by being able to explain that its omission makes lower tiers cheaper (a hard sell with this particular issue).

          That's not something Cloudflare have achieved IMO.

          > You attacked their business model over this

          I think you've overlooked the context I posted under.

          My original comment quoted a bit from the article that said it was unacceptable that a single provider could take so much of the net down.

          The point in my comment wasn't to attack CF's business model, but to point out that that SPOF was there not because of technical reasons, but because of a business decision by CF.

          The guy quoted in the article is a little breathless, but he's right: this could and should have been avoided.

    4. CrazyOldCatMan Silver badge

      Re: I'm curious

      insufficient QA before a change, unclear responsibilities during a major incident--are relatively easy to fix

      Not in my experience - they speak of the root culture at a place and that's really, really not simple to fix.

      All those things are needed for operations at scale and, if CF don't have them, it's a miracle that this hasn't happened before.

  8. Ken Moorhouse Silver badge

    Knocked out a lot of stock market research sites...

    During the critical pre-LSE market start of trading period.

    1. Short Fat Bald Hairy Man
      Pint

      Re: Knocked out a lot of stock market research sites...

      Lovely

    2. IGotOut Silver badge

      Re: Knocked out a lot of stock market research sites...

      You mean it took the bots offline?

  9. JibberX

    The Internet...? Or the World Wide Web?

    I know they talk about BGP shenanigans, but still, it just affects people looking at animated GIFs in Netscape, right?

    1. jake Silver badge

      I was online and doing work in that timeframe. I didn't notice anything amiss.

      I was not using that new-fangled WWW-thingy. Do with that what you will.

    2. Michael Wojcik Silver badge

      Cloudflare is used for things that aren't exclusively web-related, such as DNS. Per comments above.

      But, yeah, if you weren't caught out by using Cloudflare-backed DNS, you probably wouldn't have observed too many issues with your SSH connections or whatever.

      I missed the outage, thanks to my time zone and working hours, but it probably wouldn't have troubled me too much since I'd still have the corporate network and there's no shortage of things I can be doing.

  10. AlanSh
    Pint

    It's not that big a deal

    OK - so we lost part of the internet for a couple of hours. At a time when half the world was asleep. Get a life, guys - it's not that big a deal. There are MUCH more important and worrying things going on in the world today. Cloudflare fixed it - and it won't happen again like that.

    So, sit and enjoy a beer.

    Alan

    1. First Light

      Re: It's not that big a deal

      This is a site reporting on the industry and the story is an example of an industry screwup. It's legitimate to report on and comment about it. At what point does it become a big deal? Is there a threshold number of hours for it to become one?

      It was daytime in India and the outage lasted until 1pm, so it had a bigger effect on that country of 1.2bn people. Not exactly a nothingburger.

      https://economictimes.indiatimes.com/markets/stocks/news/retail-investors-hit-by-global-cloudflare-outage-as-services-of-zerodha-other-brokers-affected/articleshow/92357392.cms

    2. Ken Moorhouse Silver badge
      Pint

      Re: So, sit and enjoy a beer.

      With a bit of imagination you could say this is a story about fermenting hops.

      Addressing the other points in the post:-

      Half the world might be asleep (which I doubt), but that means the other half is awake.

      How can anyone say this won't happen again? Are the theory and methods of calculating Risk dead? It is very worrying if people think this.

      1. jake Silver badge

        Re: So, sit and enjoy a beer.

        "With a bit of imagination you could say this is a story about fermenting hops."

        That'd take quite the imagination indeed ... not a lot of fermentables in hops.

  11. hayzoos

    CDNs are evil!

    Okay, got your attention. But the Internet's core design philosophy is completely subverted by global oligopolistic CDNs. Because of CDNs, your use of the internet is insulated from the real Internet. The servers people intended to contact were not down, yet they were down. The Internet design was to be able to route around "failures". I submit that CDNs as implemented are breaking The Internet.

    1. EBG

      same answer

      that I give to those who are shocked that bitcoin mining consolidated into a few groups in China. Economics 101. That's what commodity supply chains do.

    2. Wayland

      Re: CDNs are evil!

      IPFS, or Inter Planetary File System would have routed round the problem if used as a CDN.

    3. cyberdemon Silver badge
      Devil

      Re: CDNs are evil!

      The worst example of this problem, of course, being Google AMP. Pages auto-mangled by Google, and the original website doesn't even know that you looked at it.

      To Room 101 with it

    4. Michael Wojcik Silver badge

      Re: CDNs are evil!

      Well, yeah. And the web was ruined by Javascript (and arguably by CSS, and even graphical browsers, though those are useful for viewing graphic assets like charts and plots). And the web ruined Usenet. And really, with a few improvements to Gopher or WAIS, we could have dispensed with the web in the first place. And GUIs ruined UIs. And so on.

      As Nick said in Metropolitan, I'm not entirely joking.

      But, curmudgeonly as I am, even I make use of some online services that wouldn't be feasible without CDNs or some other edge-delivery mechanism. (Someone else mentioned IPFS, but I'm dubious.) Could I live without them? Absolutely. Would I miss them if they vanished? Eh, a bit, but to be honest I'd miss good old fashioned paper books far more if I lost those.

      Still, I can't pretend I get no value from CDNs. And I suspect that's true for the vast majority of people who do anything online.

  12. jake Silver badge
    Pint

    "I submit that CDNs as implemented are breaking The Internet."

    Change that from "are breaking" to "have broken". But quite.

    1. Phones Sheridan Silver badge

      I actually found this post helpful back when I was deciding to use Cloudflare or not.

      https://www.devever.net/~hl/cloudflare#:~:text=Websites%20should%20avoid%20using%20Cloudflare,the%20state%20of%20the%20web.

  13. jake Silver badge

    " ... Large cloud providers have to manage a vast degree of complexity and moving parts, significantly increasing the risk of an outage."

    To say nothing of the vastly increased size and scope of potential attack vectors.

    Clouds are snake-oil.

    1. M.V. Lipvig Silver badge

      You forgot to mention that the larger the cloud company, the further down you're likely to be on the recovery list when something bad happens that requires backups be loaded. Or how missing a payment or two because there was a problem with payments going through (a surprisingly large number of smaller companies use a credit card for such things) means your business space gets sold to someone else, and overwritten. There went your business.

      Cloud computing may have its uses, but nobody cares about your business data but you. Relying on another company to take care of it for you might be saving you an IT guy's salary and a couple of servers, but will that be enough when you walk into work in the morning to find that you have no business, and the cloud company says "Sorry, should have paid your bill, we stopped being responsible when your account lapsed."? I know my own company will email late notices, then will shut your service down without a care in the world about whether your business will run, and when you call to complain we tell you flat out "You were disconnected for nonpayment, call 800-givenofuck if you want it back." Someone may answer, if you call between 8:00 and 8:05 on Feb 30 of any year.

  14. The Morgan Doctrine

    Which means a bad actor could TAKE DOWN THE FREAKING INTERNET

    Holy Mother of Mercy! I'm surprised Russia hasn't pulled the plug on us all.

    1. jake Silver badge

      Re: Which means a bad actor could TAKE DOWN THE FREAKING INTERNET

      Well, yes.

      To be perfectly honest, perhaps not "take down the entire Internet", but it would be fairly easy to balkanize it for a while.

      Some say this has already happened, especially to the WWW.

  15. Toni the terrible Bronze badge
    Flame

    IT's a Pain

    Cloudflare is a pain in the proverbial. It has prevented access, when using Firefox (and even Chrome etc.), to websites that access has been paid for, beginning 3 months ago. Which is worse when the only way to contact the vendor is via the website, e.g. Crunchyroll.

    It has also prevented me from getting my takeaway via Just Eat, grumble, grumble - of course going direct is doable so I do not need a takeaway aggregator - but still, it is not convenient and it prevents actual business from happening.

  16. Anonymous Coward
    Anonymous Coward

    To be honest this isn't news - Cloudflare seems to have so many outages! Either these major foobared "self-inflicted" ones, or the random Cloudflare error pages you see around the internet when its servers don't seem to be able to cope! Often hidden by the fact that it is static content that's missing...

  17. 1752

    https://xkcd.com/908/

  18. Potemkine! Silver badge

    From my point of view, I would congratulate the engineers who were able to locate and fix the problem in such a short period, while under enormous pressure. Shit happens and will continue to happen.

    An interesting comment is the one showing how we are more and more dependent on an internet connection. Public works severing optical fibres leaves many unable to work, and that can last for several days and cause huge losses. I wonder if the bean counters took that into consideration before deciding to migrate everything to the lowest bidder?

  19. Version 1.0 Silver badge

    Internet vs Renitent

    The Internet was created to renitent (resist and survive) and it worked very well for years, but now we have "upgraded" so many things that this incident is no big surprise. This environment applies to everything I work in and with: new "features" making data easy are sold as "So Great", but then something stupid happens and we see this style of issue ... it's normal these days. We're busy creating problems and solving them, but originally the Internet was designed and built just to avoid problems, and it worked like that for years.

    1. jake Silver badge

      Re: Internet vs Renitent

      "The Internet was created to renitent (resist and survive)"

      Oft repeated, but simply not true. The (D)ARPANET was just a research network designed to research networking. The "survives nukes" myth came about much later ... The cold, sad reality is that the only reason it was built to be resilient is that the available hardware was really, really flaky.

      The networks that were designed to survive nuclear attack included the "Minimum Essential Emergency Communications Network", or MEECN, and the prior "Survivable Low Frequency Communications System" or SLFCS. If you use an ounce of common sense, it only stands to reason ... no military would design a command and control system that inherently wasn't securable, and the Internet was not then, and still isn't securable.

  20. IGotOut Silver badge

    Network engineer issue.

    The issue of stepping on each other is easy to explain, as I've had first hand experience (VoIP).

    Networks make a change. As far as they are concerned all is good, push it out....

    It hits VoIP and all hell breaks loose. Telecoms report the issue. They spot the issue and start emergency changes, getting back up and running in a few minutes, while networks run round like headless chickens with fingers up their arses.

    Networks panic without talking to anyone and revert changes (after asking if you can ping it 3000 times).

    Revert breaks telecoms fix. Telecoms revert back to old config.

    Rinse and repeat.

    Yes, it happened too many times to remember.
