Firefox, you know you tapped Cloudflare for DNS-over-HTTPS? In January, it briefly knackered two root servers at the heart of the internet

A bug in software pushed out by Cloudflare resulted in failures at the heart of the web's infrastructure, according to a report published this week by the Internet Systems Consortium (ISC). ISC runs the so-called F root server, one of the world's 13 root DNS servers, labeled A through M. These are the central computers that …

  1. JohnFen

    The general trend

    While glitches like this can, and do, happen outside of for-profit corporations as well, I agree that the privatization of the internet is a general trend that does not bode well for the future.

    In the more immediate term, this gives me a little more reason to start looking at implementing a separate DNS service that isn't terribly reliant on the official one. Just in case.

  2. IGotOut Silver badge

    Devil's advocate.

    The issue is, if there is no Cloudflare, no Akamai, then the internet as we know it would collapse.

    You can forget any hope of streaming movies; websites would be knocked offline on a daily basis, and the amount of malware, spam and suchlike would get to the stage that life on the internet would become a real pain.

    So yes, indeed, look for weaknesses and resolve them, but at the end of the day we need these CDNs.

    1. zuckzuckgo Silver badge

      Re: Devil's advocate.

      >"forget any hope of streaming movies, websites would be knocked offline on a daily basis and the amount of malware, spam and suchlike would get to the stage that life on the internet would become a real pain."

      I think we must both use the same service provider.

  3. Anonymous Coward
    Holmes

    But

    If Cloudflare flames out, Firefox DoH with default settings will fail BUT a Firefox user can easily switch to NextDNS (and Firefox is working to get other domain services added).

    The privatization of the Internet is bad, very bad, BUT domain services cost money and I haven't heard of any plan to pay for them.

    Things change, things get worse.

    1. steviebuk Silver badge

      Re: But

      Easy for us to switch. But of the people I know, there are probably only a couple who would know how to, or what DNS is.

      1. NATTtrash

        Re: But

        And how many of those who don't know how or what a DNS is are using Firefox?

        1. JohnFen

          Re: But

          Firefox has shifted to primarily targeting the technologically inexperienced user now, so I imagine an increasing percentage of FF users fall into that category.

        2. steviebuk Silver badge

          Re: But

          A few of them, as I install it for them.

    2. Nate Amsden

      Re: But

      More likely, at least for the non-technical users, is that they would switch away from Firefox because it's not working and resort to another browser: IE or Safari or whatever is the default in their OS.

      Maybe Firefox could default to evenly distributing the load between all of the DoH providers it supports, with the option of using only one if you prefer. Or at least do automatic active/failover.

      Not that I intend to use this feature in any case, so it wouldn't affect me personally. I've run my own DNS for over 20 years now, and if I am out and about I connect with OpenVPN to my server at a co-lo facility and proxy through that.
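
      For what it's worth, the "spread the load, then fail over" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ResolverPool`, `broken` and `working` are made-up names, the providers are stand-in functions rather than real DoH clients, and nothing here reflects how Firefox actually picks a resolver.

```python
from typing import Callable

# A resolver is any callable that maps a hostname to an address string and
# raises an exception on failure. Real DoH lookups are out of scope here.
Resolver = Callable[[str], str]


class ResolverPool:
    """Hypothetical pool: round-robin across providers, fail over on error."""

    def __init__(self, resolvers: dict[str, Resolver]) -> None:
        if not resolvers:
            raise ValueError("at least one resolver is required")
        self._resolvers = resolvers
        self._names = list(resolvers)
        self._next = 0  # index of the provider that gets the next first try

    def resolve(self, hostname: str) -> tuple[str, str]:
        """Return (provider_name, address), trying each provider at most once."""
        start = self._next
        self._next = (self._next + 1) % len(self._names)  # rotate the first choice
        errors = []
        for offset in range(len(self._names)):
            name = self._names[(start + offset) % len(self._names)]
            try:
                return name, self._resolvers[name](hostname)
            except Exception as exc:  # any failure moves on to the next provider
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))


def broken(hostname: str) -> str:
    """Stand-in for a provider that is down."""
    raise TimeoutError("no answer")


def working(hostname: str) -> str:
    """Stand-in for a healthy provider; 192.0.2.1 is a documentation address."""
    return "192.0.2.1"


pool = ResolverPool({"provider-a": broken, "provider-b": working})
print(pool.resolve("example.com"))  # ('provider-b', '192.0.2.1')
```

      Note that the pool only ever fails over to another provider in the same list; whether it should also fall back to plain DNS is a separate policy decision.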

      1. Mark 65

        Re: But

        >"More likely at least for the non technical users is they would probably switch away from firefox because it's not working"

        Maybe, but given they don't know it's a DNS issue, many might just end up paying $$$ for support they don't (or rather shouldn't) need. They could at least make it default to a "use X if present, else behave as previously".

    3. iron Silver badge

      Re: But

      During the testing period they've been doing automatic failover to non-DoH lookups. So if that continues, problems at Cloudflare, or whichever provider you have set up, are not an issue.

      1. cdrcat

        Which defeats the purpose

        One reason for DoH is to prevent MITM attacks. If the MITM can downgrade the DoH to normal DNS, then the attacker can control your DNS.

        1. phuzz Silver badge

          Re: Which defeats the purpose

          If an attacker can interfere with traffic between you and Cloudflare then you already have problems.

  4. chuckufarley Silver badge

    "My ISP didn't have a problem...

    ...So whatever this "problem" is, it can't be that bad."

    That is me paraphrasing my neighbor's response to this story. It's his ISP. It's their internet. He is just renting it...

    "DARPA? 1960 what? Compound interest on the tax money my grandparents paid?"

    Great, now I'm a Communist. Again. His world view can't really be this obtuse. He just needs a reminder that all great science starts with an original thought...

    "What the hell do Einstein's daydreams have to do with my WiFi? Would you just let me pick up my dog's poo and go inside?"

    OK, one giant leap of logic for me, and one more person on the block that will cross the street twice just to avoid coming within 10 metres of me for the neighborhood.

    It's not his fault because it's not his internet. It's not my fault for the same reason. It's *our* fault because it's *our* internet. Just like every person in the world owns a small part of the Forschungs- und Gedenkstätte Normannenstraße.

    1. ForthIsNotDead
      WTF?

      Re: "My ISP didn't have a problem...

      Erm... ???

    2. Psmo

      Re: "My ISP didn't have a problem...

      Nurse! He's posting half-sedated again!

      1. phuzz Silver badge
        Coat

        Re: "My ISP didn't have a problem...

        Better bring the dried-frog pills and the big mallet cranial re-adjuster.

    3. steviebuk Silver badge

      Re: "My ISP didn't have a problem...

      Are we all sure that's not an AI bot in training?

  5. Joe W Silver badge

    "an improvement in character encoding"

    Just... wow. Most of the languages I speak (or in which I can at least ask for a beer or directions to the railway station) require more than plain ASCII (the first 128 chars). Still, I personally avoid anything with ø or ö or the n with the ~ (and things like 好き) in a computing context. I also force the C locale on anything data-related (decimal character confusion, anybody?). Extending the charset for domain names was a stupid idea then, and it still is, and it breaks stuff (such as in this case).
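
    As a side note on what "extending the charset" means in practice: internationalised names are still converted to plain ASCII ("xn--" Punycode labels) before they go into a DNS query. A minimal Python sketch, assuming the standard library's built-in `idna` codec (which implements the older IDNA 2003 rules); `to_ascii` is just a helper name chosen here, and none of this says anything about the specific bug in the article.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form."""
    # Python's built-in "idna" codec follows the IDNA 2003 rules.
    return hostname.encode("idna").decode("ascii")


print(to_ascii("bücher.example"))  # xn--bcher-kva.example
print(to_ascii("example.com"))     # example.com
```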

    1. Psmo
      Mushroom

      I seem to remember it was a nested regex issue.

      As in:

      1. You have a problem

      2. You add a regex

      3. You have no house, job and the cockroaches are munching your remains.

      1. CrazyOldCatMan Silver badge

        >"You have no house, job and the cockroaches are munching your remains."

        Have you been talking to my wife? - you've perfectly described her worry-escalation ladder.

        1. Psmo
          Boffin

          I know some who would be more worried by the regex.

      2. Michael Wojcik Silver badge

        >"job and the cockroaches are munching your remains"

        I know Job had some hard times, but I don't recall him resorting to cannibalism.

    2. Anonymous Coward

      "Still, I personally avoid anything ø or ö"

      Yes, you, personally, because that's neither important nor critical for you. For other people in other parts of the world it may be important or critical.

      Frankly, in my language the J is not used, our alphabet has only 21 Latin letters, not 26, so following your practice I would need to write your name like Gioe. And I would need to replace the W as well with a U.

      Would you like it?

  6. Temmokan

    "Extreme testing"

    That magical word "but": "extreme testing BUT we hadn't noticed this special case"

    It only means it's not that extreme. And it means those things can easily happen again, Cloudflare or not.

    1. iron Silver badge

      Re: "Extreme testing"

      Their testing can be as extreme as they want; that doesn't mean they have complete code coverage, or even worthwhile tests that check the right things.

      1. yoganmahew

        Re: "Extreme testing"

        Absolutely; "edge case" me hoop.

        Everything you don't test is an edge case; calling it that tells you nothing about either how common it is or what the impact of a failure is.

        FMEA should tell you that you don't load a quick fix to broken code in key infrastructure; you fall back to the original code, then fix it properly and test it properly. It also tells you your original code was insufficiently tested the first time, since it was loaded with a bug in it.

        1. Martin Gregorie

          Re: "Extreme testing"

          "Extreme testing" blah blah. If you publish mission critical code or make it available for public use you should have a regression test set that is:

          (a) written from the specs, NOT the code

          (b) updated whenever the specs change

          (c) used to check all code revisions, which MUST pass it with no exceptions before the code is released or goes live.

          If you publish or maintain mission critical code and don't have such a regression test suite for it, you're only playing at writing software. If this describes you or your organisation, you'd best start working on that regression test suite. Fast. Don't forget to include it if/when you publish the code and treat any reported omissions as seriously as you treat software bugs.
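
          To make (a) concrete, here is a minimal sketch of a regression table written from a spec rather than from the code. The spec used is the familiar hostname label rule (RFC 1035 / RFC 1123: 1 to 63 characters, ASCII letters, digits and hyphens, no leading or trailing hyphen). `is_valid_label`, `SPEC_CASES` and `run_regression` are illustrative names invented for this example, not anything from the software discussed in the article.

```python
import re

# Implementation under test (hypothetical): one label of a hostname.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_label(label: str) -> bool:
    return _LABEL.fullmatch(label) is not None


# Cases derived from the spec text, including the boundaries it names.
SPEC_CASES = [
    ("a", True),              # minimum length
    ("a" * 63, True),         # maximum length
    ("a" * 64, False),        # one over the maximum
    ("", False),              # empty label
    ("-abc", False),          # leading hyphen
    ("abc-", False),          # trailing hyphen
    ("a-b", True),            # interior hyphen
    ("xn--bcher-kva", True),  # Punycode labels are plain ASCII
    ("under_score", False),   # underscore is outside the allowed set
]


def run_regression(cases=SPEC_CASES):
    """Return every (label, expected) pair the implementation gets wrong."""
    return [(label, want) for label, want in cases if is_valid_label(label) != want]


failures = run_regression()
print("PASS" if not failures else f"FAIL: {failures}")
```

          The release gate in (c) is then simply "the failure list is empty", and (b) means editing `SPEC_CASES` whenever the spec changes.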

    2. phuzz Silver badge
      Thumb Up

      Re: "Extreme testing"

      Putting a change into production is testing it in a realistic environment, in my book.

      It's probably a good thing I'm not in charge of anything important isn't it?

  7. Anonymous Coward

    Is this really a problem?

    Those of us who know what we are doing can disable DoH. For the rest, the chances of Cloudflare lookups going down are probably about the same as the chances of their ISP's DNS going down.

    For people who are moderately clueful but not clueful enough to know about this whole DoH business, I'm sure it would be confusing if Firefox couldn't resolve names but Chrome could, or vice versa.

  8. Doctor Syntax Silver badge

    "Cloudflare also acted quickly: within 21 minutes it had identified that a specific code release, designed to fix a bug that it had introduced four hours earlier, was responsible."

    Release code. Watch everything go pear-shaped. Take 20 minutes to associate the two?

  9. Scott 53

    As Homer Simpson would say

    D'oh

  10. DougMac

    The author of the article doesn't quite seem to understand the nature of the setup of the root nameservers.

    There aren't only 13 root-servers. There are over a thousand root-servers arranged in 13 clusters, most of them run by different organizations (yay, diversity).

    They are anycast, so that your connection will reach the "closest" one as BGP routing determines. If an organization does maintenance on a few of the root-servers, it stops the BGP anycast announcements for them so that traffic no longer hits the server(s) under maintenance.

    So technology much like that behind the global content networks such as Cloudflare, Akamai, Edgecast, etc. runs the root-servers as well.
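
    A toy model of that behaviour, in Python. It is a deliberate simplification: real BGP best-path selection weighs many attributes, whereas this sketch collapses them into a single made-up `path_cost` number, and the site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    site: str                # hypothetical site label
    path_cost: int           # stand-in for whatever makes a route "closest"
    announcing: bool = True  # is this site announcing the anycast prefix?


def pick_instance(instances):
    """Return the announcing instance with the lowest path cost."""
    candidates = [i for i in instances if i.announcing]
    if not candidates:
        raise LookupError("no instance is announcing the anycast prefix")
    return min(candidates, key=lambda i: i.path_cost)


f_root = [Instance("site-near", 2), Instance("site-mid", 5), Instance("site-far", 9)]
print(pick_instance(f_root).site)  # site-near

f_root[0].announcing = False       # withdraw the announcement for maintenance
print(pick_instance(f_root).site)  # site-mid
```

    Clients keep using the same address throughout; only the routing decides which physical server answers.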

    1. Nick Stallman

      I came here to mention this too.

      The reason why BGP is involved is likely Cloudflare removing their contributing servers from the F root entirely.

      This probably took time because they were hoping to just fix the code instead of disabling all their F root servers, but they couldn't do it fast enough so they pulled the plug.

      Without Cloudflare F root servers in the pool, all the other F servers, which never had any issues, would pick up the slack.

  11. Wzrd1 Silver badge

    How 2008!

    As in, when a Pakistani ISP borked, via a BGP announcement, well... The internet, all to filter YouTube hosted videos talking smack about the official faith of said nation.

    So one court order to an ISP, plus a minimally skilled engineer at that ISP, worked together to bork a fair chunk of the internet.

    And now, a borked BGP announcement managed to harm something innocuous, after all, root servers are mere accessories, we all memorize host addresses of...

    Sorry, couldn't go on with that, even without a requirement to maintain a straight face.

    Perhaps, we should begin to enforce a global rule. If someone borks the global intertubes, said borker shall be fired - with very real fire.

    1. Michael Wojcik Silver badge

      Re: How 2008!

      This happens All. The. Time.

      One study of 6 months' worth of BGP updates in 2015-2016 found over 100000 BGP hijacks per month. That's a couple every minute, on average.
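
      The back-of-the-envelope sum, assuming a 30-day month and taking the 100000 figure as a floor:

```python
hijacks_per_month = 100_000
minutes_per_month = 30 * 24 * 60  # assuming a 30-day month
per_minute = hijacks_per_month / minutes_per_month

print(minutes_per_month)     # 43200
print(round(per_minute, 2))  # 2.31
```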
