Cloudflare fesses up to config change that borked internet access for all

There was a disturbance in the force on July 14 after Cloudflare borked a configuration change that resulted in an outage, impacting internet services across the planet. In a blog post, the content delivery network services biz detailed the unfortunate series of events that led to Monday's disruption. On the day itself, " …

  1. Anonymous Coward

    If they're idiots

    Seriously, El Reg?

    "As a Reg reader pointed out: 'Remember this is a DNS service. Every person using the service would have had no ability to use the internet. Every business using Cloudflare had no internet for the length of the outage. NO DNS = NO INTERNET.'"

    Only if they're idiots. Nobody who isn't stupid relies on a single DNS provider.

    1. Tubz Silver badge

      Re: If they're idiots

      I have 8 authoritative nameservers connected to my Opnsense Unbound DNS, they are monitored for response times and switch priority accordingly, not rocket science. I like overkill !!
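
      Ranking upstream servers by response time, as described above, can be sketched roughly as follows. This is a minimal illustration and not how OPNsense or Unbound actually select servers: the `LatencyRanker` class, its window size, and the sample addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean


class LatencyRanker:
    """Keep a short rolling window of response times per server (hypothetical helper)."""

    def __init__(self, window: int = 5) -> None:
        # Each server gets a fixed-length queue; old samples fall off the front.
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, server: str, millis: float) -> None:
        """Store one measured response time for a server."""
        self._samples[server].append(millis)

    def priority_order(self) -> list:
        """Return servers sorted from lowest to highest average latency."""
        return sorted(self._samples, key=lambda s: mean(self._samples[s]))


ranker = LatencyRanker(window=2)
ranker.record("192.0.2.1", 10.0)
ranker.record("192.0.2.2", 5.0)
first_order = ranker.priority_order()

# The second server slows down; with a window of 2 its old sample is forgotten.
ranker.record("192.0.2.2", 50.0)
ranker.record("192.0.2.2", 50.0)
second_order = ranker.priority_order()
```

      A short window makes the ranking react quickly to a server that degrades, at the cost of reordering more often on noisy measurements.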

      1. Anonymous Coward

        Re: If they're idiots

        I have zero upstream servers in my various cache server configurations. That's... kind of the whole point of running your own cache.

  2. dsch

    It's not DNS

    There's no way it's DNS

    It was DNS

    (credit: http://i.imgur.com/eAwdKEC.png)

    1. Anonymous Coward

      It's usually DNS.

      DNS issues are so incredibly common that they're among the first things I check when I hear somebody whine "The WiFi is out".

      And yes, that's the most common way I get an internet issue report. I don't even bother checking the WiFi first any more, because it's rarely the WiFi.

      1. Anonymous Coward

        Rule 1. It's always DNS

        Rule 2. If it isn't DNS it's a certificate

  3. Eye Know

    They are one of my forwarders...

    They are one of my forwarders but not the only one, and I use PiHole so didn't notice.

    Everyone who cares about staying connected should run a caching DNS server with multiple diverse forwarders.

    1. tfewster Silver badge
      Facepalm

      Re: They are one of my forwarders...

      Or maybe you could subscribe to an internet service provider that paid IT network professionals to do that shit for you?

      Being a "prepper" isn't a good use of my time or specialist skills.

      1. IGotOut Silver badge

        Re: They are one of my forwarders...

        @tfewster

        You mean put a secondary DNS entry in? It's not hard. If you don't know how to do that, go use Mumsnet or something.

        1. Anonymous Coward

          Re: They are one of my forwarders...

          Depends on the issue. If the primary is down, then the secondary will work, although resolution will be slower while the primary times out.

          The real problem is if the primary is up but providing invalid data. If it comes back with NXDOMAIN you will never query the secondary.

          Sometimes it is simpler just to have 1 resolver, then manually cut over to a secondary when you have established there is a problem.
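
          The failure mode described above can be sketched with simulated resolvers. This is a minimal illustration of ordered failover, assuming a client that only moves to the next resolver on a timeout; the function names, the `ResolverTimeout` exception, and the 192.0.2.10 address are hypothetical, and no real DNS traffic is sent.

```python
from typing import Callable, Optional, Sequence


class ResolverTimeout(Exception):
    """Raised by a simulated resolver that never answers."""


# Hypothetical marker: None stands in for an NXDOMAIN ("no such name") answer.
NXDOMAIN = None

Resolver = Callable[[str], Optional[str]]


def resolve_in_order(name: str, resolvers: Sequence[Resolver]) -> Optional[str]:
    """Try resolvers in order; only a timeout moves on to the next one."""
    for resolver in resolvers:
        try:
            # Any answer is treated as final, including NXDOMAIN.
            return resolver(name)
        except ResolverTimeout:
            continue
    raise ResolverTimeout(f"no resolver answered for {name}")


def down(name: str) -> Optional[str]:
    """Simulates a primary that is offline."""
    raise ResolverTimeout(name)


def bogus(name: str) -> Optional[str]:
    """Simulates a primary that is up but wrongly says the name does not exist."""
    return NXDOMAIN


def healthy(name: str) -> Optional[str]:
    """Simulates a working secondary."""
    return "192.0.2.10"


primary_down = resolve_in_order("example.com", [down, healthy])
primary_bogus = resolve_in_order("example.com", [bogus, healthy])
```

          With the primary down, the secondary's answer comes back. With the primary returning a bogus NXDOMAIN, the secondary is never asked, which is the case that makes troubleshooting harder.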

          1. cant

            Re: They are one of my forwarders...

            It depends entirely on the OS and what DNS client is running. Windows Native DNS actually parallelizes DNS requests, so primary versus secondary doesn't matter: the early bird gets the worm. So long as you have two distinct DNS resolvers, you should be good to go. Personally, I like running my own pihole with DNSSEC/DoH via Quad9.
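
            The "early bird" idea can be sketched as a race between simulated resolvers. This is only an illustration of querying in parallel and taking the first successful answer; it is not a description of how the Windows DNS client is implemented, and `make_resolver`, `resolve_race`, the delays, and the addresses are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence


def make_resolver(delay: float, answer: Optional[str]) -> Callable[[str], str]:
    """Build a simulated resolver that answers after `delay` seconds, or fails."""
    def resolver(name: str) -> str:
        time.sleep(delay)
        if answer is None:
            raise TimeoutError(f"no answer for {name}")
        return answer
    return resolver


def resolve_race(name: str, resolvers: Sequence[Callable[[str], str]]) -> str:
    """Query every resolver at once and return the first successful answer."""
    pool = ThreadPoolExecutor(max_workers=len(resolvers))
    try:
        futures = [pool.submit(r, name) for r in resolvers]
        # as_completed yields futures in the order they finish.
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
        raise TimeoutError(f"every resolver failed for {name}")
    finally:
        # Do not block on the slower resolvers still in flight.
        pool.shutdown(wait=False)


slow = make_resolver(0.2, "192.0.2.1")
fast = make_resolver(0.01, "192.0.2.2")
broken = make_resolver(0.01, None)

winner = resolve_race("example.com", [slow, fast])
survivor = resolve_race("example.com", [broken, slow])
```

            Listing order does not matter here, only speed. The trade-off is that a fast resolver returning a wrong answer still wins the race.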

            1. cant

              Re: They are one of my forwarders...

              If the Reg article isn't a wakeup call to *not* put all your trust in Cloudflare, let the links below do that. I know there's another blog post out there somewhere with a catchy name that details even more things they've done, but I can't find it, so these will have to do.

              https://www.itomic.com.au/the-downsides-of-cloudflare/

              https://git.disroot.org/cyberMonk/liberethos_paradigm/src/branch/master/rap_sheets/cloudflare.md

              https://github.com/argosopentech/argos-translate/issues/257

    2. Jamie Jones Silver badge

      Re: They are one of my forwarders...

      If you really cared, your local DNS servers wouldn't use forwarders at all.

      1. Jamie Jones Silver badge

        Re: They are one of my forwarders...

        Downvoted by someone who doesn't understand DNS.

        Sigh.

        1. Anonymous Coward Silver badge
          Facepalm

          Re: They are one of my forwarders...

          I know what you meant (use root servers) but disagree with you.

  4. Anonymous Coward

    Screwed Up Cloudflare !!

    This might be related, or it might not.

    But Cloudflare hosts a few UK concert ticket sellers and about 2 weeks ago, one of these resellers was doing a great deal on tickets for a band I like.

    So, I visited the relevant website and tried to buy tickets...but I got a Cloudflare error message that said that my IP address belonged to a scammer and it refused my custom.

    And there was no easy way to report this to either the ticket seller or Cloudflare tech support via this error page.

    If a Cloudflare "upgrade" or change to their hosted system was ongoing then this might explain my issue?

    1. stiine Silver badge
      Unhappy

      Re: Screwed Up Cloudflare !!

      Assuming you're on a residential ISP, and you don't have a static address, all it means is that your dad, or one of your neighbors, has been up to no good, and now you have the address that they were using.

  5. alain williams Silver badge

    Screwed up but confessed

    We all make mistakes. The impact of some people's mistakes is bigger than that of others'.

    At least Cloudflare admitted it and explained how they got it wrong. That earns some forgiveness (from me at least), unlike those who try to blame someone else.

    1. Mixedbag

      Re: Screwed up but confessed

      Cloudflare never fail to impress with how transparent they are when they screw up, and with how quickly the post-incident report appears and how well written it is. Very few other orgs get that this is the way to retain trust.

      Stuff is always going to go wrong, or changes will have unexpected consequences; it's how you handle it that matters.

  6. Ropewash

    That's one way to promote rapid resolution

    "Revolver alerts were cleared by 2254 UTC"

    Would that be a 38special alert or a 357magnum alert?

    1. Eecahmap

      Re: That's one way to promote rapid resolution

      It was a Nagant M1895, so easily suppressed.

  7. IGotOut Silver badge

    @ElReg

    "Every business using Cloudflare had no internet for the length of the outage. NO DNS = NO INTERNET."

    This is the 1.1.1.1 service; anyone using, say, Google's DNS would see a non-issue.

    As The Reg is a Cloudflare user, did your site go down? No.

    If a business does not have a secondary DNS set up then they really should be looking at their IT team and going WTF?

    1. Jamie Jones Silver badge

      Re: @ElReg

      If a business is using DNS forwarders at all, they should really be looking at their IT team and going WTF?

  8. Anonymous Coward

    I have 2 PCs. One is manually configured for 1.1.1.1. As soon as I realised it looked like there were DNS issues, I tested 8.8.8.8, which worked.

    The other machine is configured for DHCP and used the router for DNS. This was configured for the ISP DNS servers. They must be using 1.1.1.1 for upstream DNS as that stopped working at the same time. Again, switching to 8.8.8.8 was a temporary fix.

    1. stiine Silver badge

      Why not add both? Or use 1.1.1.1, 8.8.8.8 and 9.9.9.9; then Cloudflare, Google, and IBM would all have to go offline before DNS stops working for you.

      1. Anonymous Coward

        Because I have had problems in the past where the primary is still up but providing bogus info. That takes longer to troubleshoot than just knowing there is something up with DNS and cutting over to a different provider. Plus, if you are waiting for down servers to time out, DNS resolution is slower.

  9. captain veg Silver badge

    Why

    I really don't get why any reputable business uses, let alone relies upon, Cloudflare. And yes, I know that includes el Reg. Their offer is, essentially, bulletproof* hosting** to anyone prepared to pay, no questions asked. They might as well write an advertising poster 500 feet high "Spammers, crims, ne'erdowells, fill yer boots".

    -A.

    * Except this time.

    ** Yes, they claim that they don't actually host anything. This is sophistry.

    1. Autonomous Mallard

      Re: Why

      Because most organizations do not have a dedicated security team, nor do they have the infrastructure needed to mitigate large-scale DoS/DDoS attacks.

      Cloudflare does. For most, the practical benefits outweigh any philosophical concerns.

      That's without even getting into the reductions in bandwidth egress costs, host server load, and user latency that come from using a well-configured caching policy with a CDN.

      P.S: This was not a failure in the CDN or reverse proxy services. It was specifically the 1.1.1.1 DNS resolver.

  10. firstnamebunchofnumbers

    Tata

    > that Tata Communications India (AS4755) had started advertising 1.1.1.0/24: from the perspective of the routing system, this looked exactly like a prefix hijack

    This is the interesting bit. I bet Cloudflare discovered that Tata had been monster-in-the-middling 1.1.1.1 DNS for some of their customers for some time. It would be interesting to know. It's certainly not unusual to hear of the Indian Govt requiring ISPs to attempt snooping/intercept strategies.

  11. Anonymous Coward

    CloudFlare... you're not the one.one.one.one. anymore !

    Oddly that service on 53/udp hasn't worked for me for ages but I haven't been arsed to work out why.

  12. xanadu42
    Facepalm

    "The root cause was an internal configuration error and not the result of an attack..."

    Isn't that the "Standard" nowadays?

  13. Esso
    Holmes

    Advocating for the devil

    They do a great job of owning up to their mistakes and doing a breakdown of what happened, publicly.

    Me? I'm 9.9.9.9, among others. Your mileage will vary.

  14. Esso

    Were all 1.1.1.1 services affected?

    Like WARP, WARP+?

    Being routed differently, were they unharmed by the change?
