Akamai Edge DNS goes down, takes a chunk of the internet with it

Akamai's Edge DNS service went down on Thursday morning, US West Coast time, knocking over its customers' websites as it fell. As of 0909 PDT (1609 UTC), the status page of Akamai – which sites around the world rely upon to deliver content among other services – said, "We are aware of an emerging issue with the Edge DNS …

  1. This post has been deleted by its author

    1. jtaylor Bronze badge

      Re: Tried to access my bank around 5pm

      "Tried to access my bank around 5pm ...Akamai never tested their systems properly"

      Akamai is pretty darn reliable. If you're unhappy that your bank (just their web site, I hope) has a single point of failure, take it up with them.

  2. chivo243 Silver badge
    Trollface

    Thankfully!

    Something breaks an hour after I check out for 2+ weeks on holiday? Wheeee! NOT my monkey, NOT my circus! Not that I can fix anything on the Web anyway??

    Glad the situation has a fix...

    Please add a Beer icon! We still can't have two?

    1. KarMann Silver badge
      Pint

      Re: Thankfully!

      Uh, what's this then? -->

      Or did you mean you want another beer icon? Or an icon with two beers in it?

      1. doublelayer Silver badge

        Re: Thankfully!

        I assume they wanted to use two icons, and they're only allowed to select one.

        1. chivo243 Silver badge
          Pint

          Re: Thankfully!

          Yes, I've wanted two icons for some time... troll drinking a beer!

          1. Lil Endian
            Pint

            Re: Thankfully!

            Here, have one on me!

            Happy hols :D

  3. Gene Cash Silver badge

    Downdetector?

    How come Downdetector is never down? Who the hell is their provider? Why don't we just all use them?

    1. usbac

      Re: Downdetector?

      Because, of all people, THEY know better than to have a single point of failure!!

      1. Anonymous Coward
        Anonymous Coward

        Re: Downdetector?

        "Taps head". If you have two points of failure then twice as much can go wrong.

        1. Lil Endian
          Joke

          Re: Downdetector?

          Which is the strongest advocacy against having a bit on the side!

          1. Anonymous Coward
            Anonymous Coward

            Re: Downdetector?

            "She's not my side hoe, she's my Hot Standby redundant system, to ensure maximum 'uptime'"

    2. Anonymous Coward
      Anonymous Coward

      Re: Downdetector?

      "Taps head again", you would never know Downdetector is down because there would be nowhere to report it.

      1. David 132 Silver badge
        Happy

        Re: Downdetector?

        Business opportunity: DownedDownDetectorDetector.com

        1. Jim Mitchell Silver badge
          Thumb Up

          Re: Downdetector?

          DoubleDownDetector.com is shorter

          1. David 132 Silver badge
            Thumb Up

            Re: Downdetector?

            Ah, but if you don't have a functioning "U" key on your keyboard it would be inaccessible.

            Imagine the scenario - yor day's already going badly because yor keyboard is acting p, everyone yo email thinks yo're American with yor talk of "neighbors" and "colors"... and to top it all off, yo can't even figre ot why yor favorite web RLs aren't fnctional...

          2. Anonymous Coward
            Anonymous Coward

            Re: Downdetector?

            and there was me hoping we could have gone down a 1999 dance rabbit hole.

            https://genius.com/Paul-johnson-get-get-down-lyrics

      2. Anonymous Coward
        Anonymous Coward

        Re: Downdetector?

        ...which led me to wonder: if a website goes down and there's no down detector to report it, does it make a noise?

        Which, for reasons understood only by my inner idiot, led me in turn to: if a bear shits in the woods but there's no one there to see it, does it still make a smell?

        Jeez, even _I_ think I'm weird.

    3. JBowler

      Re: Downdetector?

      > How come Downdetector is never down?

      IIRC Downdetector was down, but I tried quite a few of the outage detectors, not just the one targeted at fluff, so I can't RC for sure. Some of them certainly were.

  4. ElRegioLPL

    Cloudflare, Fastly, Akamai & AWS.

    It’s scary how much of the internet relies so heavily on these four providers without a plan B.

  5. Lil Endian
    Joke

    "Akamai said it had made repairs to address the outage."

    But only for IPv4, cos, well, bollox to it!

  6. Howard Sway

    Bad days happen to everyone

    Is that it? Rather than promising plans for vastly increased resilience, all we can expect is an "aw, shucks - what am I like!"

    I'd prefer, as just one example, an explanation of what type of release and deployment method they're using. Because the frequency of these outages strongly suggests an unwise faith in the Continuous Deployment religion in the most fundamental services the net relies on.

    1. Anonymous Coward
      Anonymous Coward

      Re: Bad days happen to everyone

      I sometimes wonder if these companies' CEOs have a poster in their office of a cat and the caption "Hang in there baby".

    2. jtaylor Bronze badge

      Re: Bad days happen to everyone

      "I'd prefer, as just one example, an explanation of what type of release and deployment method they're using. Because the frequency of these outages strongly suggests an unwise faith in the Continuous Deployment religion"

      A friend was involved in setting up early infrastructure at Akamai. They built an architecture that was redundant and heterogeneous to a level I've never seen before or since.

      It sounds like you know a lot about Continuous Deployment. If you've identified a flawed deployment process, I'd be interested to hear about it.

      My experience is more with networking. At any layer, there is always at least one single point of failure. To the consumer, DNS is layer 3 (routing). To a network design, it's layer 7 (application). The actual implementation is probably something like AnyCast using BGP, which is wonderful stuff but also as complicated as it sounds.

      tl;dr: I'm amazed that DNS works as well as it does.

    3. doublelayer Silver badge

      Re: Bad days happen to everyone

      If we consider how much of the internet uses these services and how infrequent these events are, despite the attention each one gets, you could come to the conclusion that the services are actually pretty good at keeping sufficient resilience such that you don't have to worry about them most of the time. I don't know if that's a positive or a negative though--if they were less resilient, then maybe people would have more than one and they could better withstand the failure of one such system.

    4. Anonymous Coward Silver badge
      Facepalm

      Re: Bad days happen to everyone

      Perhaps you missed this critical piece of information: "Cloudflare CEO Matthew Prince offered a "don't blame us" sympathy tweet"

      The "Bad days happen to everyone" message was from a different company - not the one affected. I'm not sure what sort of information you expect one company to give you about a completely different company's procedures???

    5. Graham Cobb Silver badge

      Re: Bad days happen to everyone

      "Because the frequency of these outages strongly suggests an unwise faith in the Continuous Deployment religion in the most fundamental services the net relies on."

      I used to work in Telecom Billing. Those guys looked at their colleagues running the network core, who were building CI/CD processes for evolving the cloud core, and just said "No. If the network goes down for 24 hours that's certainly a pretty bad day. But if Billing goes down for 24 hours that's real money we can't recover!"

      Some things are just too important.

  7. yetanotheraoc

    CTRL+F

    it's always DNS

    Phrase not found

  8. Nate Amsden

    took them a long time to acknowledge it

    As an affected customer, we had our stuff taken out at about 8:34am Pacific time. I checked their status page and everything looked fine, but DNS was not, as several manual attempts to query their systems showed. I tried to call support; the queue was full. I tried support chat and was immediately disconnected (that surprised me; I expected to be put in a queue even if I was #8590283 in line). I tried to file a support ticket and got an internal server error. Once I saw that, I hung up the phone; obviously others were reporting the issue to their support.

    Given that their support systems were overwhelmed, I'm surprised they were unable to update their status page to show an issue was going on.

    They have a community support page, and that didn't get a post until about half an hour into the incident. They didn't even email me about the issue until two minutes after it recovered (9:39am for us; the email came in at 9:41am Pacific time). Same with their status page: the outage had been going for about half an hour before it was updated.

    I don't mind the outage, but it would be nice if they could get their status page closer to real time. Shouldn't it be updated within, say, 5 minutes of a major disruption like this?

    If companies really care about a CDN provider going down, and it does happen, the obvious solution is multiple providers, but not many organizations are up to doing that. It is, though, significantly easier than using multiple data centers or, for those in public cloud, multiple cloud providers. The same goes for DNS providers: nobody is forcing you to use a single provider. If it means that much to you, use a second one (or a third); again, it's quite simple, but most orgs don't care enough to do it.

    I recall first noticing Amazon using Dynect about 11 years ago (they were UltraDNS-only before), and my Dyn rep at the time said they signed up one Q4 after UltraDNS had a big outage. Today they still seem to use both of those providers, at least for their main domain. Meanwhile, Microsoft is bold enough to rely on its own Azure DNS for its main domain.
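
    A rough way to check the multi-provider point for your own domain: list its NS records (for instance with `dig +short NS example.com`) and see whether they belong to more than one provider. The Python below is a minimal sketch under stated assumptions, not a complete tool. It guesses the provider from the last two labels of each nameserver hostname, which misclassifies names under multi-label suffixes such as `.co.uk`, and the `provider-a.example` / `provider-b.example` hostnames are hypothetical placeholders, not anyone's real delegation.

```python
from collections import defaultdict


def provider_of(ns_host: str) -> str:
    """Approximate the DNS provider from a nameserver hostname.

    Assumption (heuristic only): the provider is identified by the last two
    labels, e.g. "ns1.provider-a.example." -> "provider-a.example".
    This is wrong for multi-label public suffixes such as ".co.uk".
    """
    labels = ns_host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(ns_hosts: list[str]) -> dict[str, list[str]]:
    """Group nameserver hostnames by their (approximate) provider."""
    groups: dict[str, list[str]] = defaultdict(list)
    for host in ns_hosts:
        groups[provider_of(host)].append(host)
    return dict(groups)


def has_provider_diversity(ns_hosts: list[str]) -> bool:
    """True if the NS set spans at least two distinct providers."""
    return len(group_by_provider(ns_hosts)) >= 2


if __name__ == "__main__":
    # Hypothetical NS sets for illustration only.
    single = ["ns1.provider-a.example.", "ns2.provider-a.example."]
    dual = ["ns1.provider-a.example.", "ns1.provider-b.example."]
    print(has_provider_diversity(single))  # False
    print(has_provider_diversity(dual))    # True
```

    Two nameservers at the same provider still share that provider's failure modes, which is why the check counts distinct providers rather than distinct hostnames.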

    1. yetanotheraoc

      Re: took them a long time to acknowledge it

      Maybe Akamai's internal and support tools rely on their own DNS service.

  9. JWLong Bronze badge

    Akamai

    What happened was someone in the break room unplugged the coffee pot to clean it. That shorted out the outlet and popped the breaker, which killed the 4-port switch for the DNS server.

    Happens all the time, no problem. "Hey, bring that extension cord over here." so someone can trip over it next week.

    /s only, don't have a shit fit!

  10. Bodomit

    Service Consolidation

    It’s a pity that the robustness that made the earlier Internet so great is now disappearing. We now have multiple single points of failure that take entire chunks of the web down when they go TITSUP.

    They’re not monopolies obviously but this much consolidation is clearly (at least to me) having a negative effect.

    1. SecretSonOfHG

      Re: Service Consolidation

      I don't know what this "robustness" you're talking about is. The early internet was way more unstable, from the last mile (phone lines, yay!) to very little, if any, redundancy on the physical layer. The thing is, the early internet was way, way smaller than it is now, and thus incidents such as this one impacted very few people, services and businesses.

      You're right that the consolidation you mention has happened, but it was unavoidable. In any mature market (all of them, not only telecoms) there is just not enough room for more than a few big players. It is safe to assume that if one of these players starts to underperform in some crucial dimension (reliability, performance, cost...), another will quickly attempt to grab its place.

  11. Inventor of the Marmite Laser

    #hugops

    Liked that.

  12. JBowler

    Not a cyber attack?

    Acme:

    >We [* * *] can confirm this was not a result of a cyber attack on the Akamai platform.

    How would they know, unless they know exactly what caused it and aren't fessing up?

    This response would be produced in either case; either they don't know or they do know. Such "confirmations" are meaningless and any reasonable engineer and even a few lawyers know that they can't prove a negative.

    1. Richard 12 Silver badge

      Re: Not a cyber attack?

      Well, given that they fixed it, it's reasonable to assume they do know what broke.

      I look forward to a report on the failure in the next couple of days. If we don't get one then at that point we start to assume the worst.

    2. Lil Endian
      Pint

      Acme

      Hehe!

      I can see Wile E. Coyote running around now! If that's who they have managing their ops, they didn't stand a chance!

      Cheers!

  13. -tim
    Facepalm

    How?

    DNS was one of the first systems to cope with large-scale failure on the Internet. How do you break DNS at this scale? If all else fails, run two different systems.

Biting the hand that feeds IT © 1998–2021