Azure is on fire, your DNS is terrified

Microsoft Azure DNS suffered a wobble Thursday afternoon UK-time, taking down a number of services. According to a status update on Microsoft's site, the issues began around lunchtime. Customers using Azure DNS in multiple regions experienced difficulties connecting to their goodies due to the mysterious issues …

  1. hplasm Silver badge
    Holmes

    Blimey!

    Who would ever think such a thing could happen?

    1. disgustedoftunbridgewells Silver badge

      Re: Blimey!

      What? A football reference in El Reg?

      I'm surprised too.

      1. Alexander J. Martin
        IT Angle

        Re: Blimey!

        I've asked a lot for the chance to spend a week covering Will Grigg's life. Alas, management keeps bringing up that old (see image) chestnut.

        I've pleaded and begged. "Please," I've said, "We can look at what OS he's running. I heard he likes Arch!"

        Cameth then the order to get back to work. Nobody uses Arch.

    2. TheVogon Silver badge

      Re: Blimey!

      "Azure DNS, which currently is still in preview"

      So a beta service meant for testing...

  2. Greg 24
    Facepalm

    In preview

    Shirley no one would be using a preview feature for anything mission critical.....right?

    1. AndrueC Silver badge
      Meh

      Re: In preview

      Trouble is it seems to be affecting other services which are not in preview. Things seem to be coming back now though.

  3. ma1010
    FAIL

    The cloud strikes again

    Someday perhaps I'll understand why people and businesses want to put their own data on computers that belong to some corporation in some distant location which depend on the Internet to work at all.

    If something breaks on your in-house IT and you're the IT guy, you can do something about fixing it. If something breaks in the cloud, all you can do is whine about it. And wait for someone else somewhere to fix it, eventually. The cloud demotes your IT department to the "using classes," as the BOFH would put it. At the BOFH's company, those "cloud solution" sales people wound up in the subbasement with the PFY, a hole punch and a soldering iron. Good place for them.

    1. Anonymous Coward
      Anonymous Coward

      Re: The cloud strikes again

      2 reasons:

      1. You're a small company and simply cannot afford your own IT systems and people.

      2. You're a large company and you love OPEX over CAPEX.

      1. disgustedoftunbridgewells Silver badge

        Re: The cloud strikes again

        3: You reasonably assume that a large company who specialise in providing infrastructure will be better than you at providing infrastructure.

        4: You want to scale up without a fortnight's lag ( spec server, buy server, wait for delivery, arrange access to datacentre, drive to datacentre, install kit, drive home, find out you missed something, phone up, beg the guy to have a look, give up on that and drive back to datacentre, fix problem, drive home ).

        5: You want hosting in multiple countries without having a physical presence in those countries.

        6: Gmail is perfectly good and reliable. Why host something in house that's not quite as good?

        There are some decent reasons. Most apply mainly to small and very small companies.

        1. Dan 55 Silver badge

          Re: The cloud strikes again

          Number 6 is pushing it.

          1. Destroy All Monsters Silver badge

            Re: The cloud strikes again

            Number 6 is pushing it.

            Better call Rover, then.

          2. TheVogon Silver badge

            Re: The cloud strikes again

            "Number 6 is pushing it."

            Quite. Office 365 is much better...

        2. DonL

          Re: The cloud strikes again

          "3: You reasonably assume that a large company who specialise in providing infrastructure will be better than you at providing infrastructure."

          Large companies have large company issues and a complex and constantly changing infrastructure. Also, some people working there are good and some are less good (who will mess stuff up).

          If you don't need the complexity then chances are very good that you can build a more reliable and more cost effective infrastructure yourself.

        3. DonL

          Re: The cloud strikes again

          "6: Gmail is perfectly good and reliable. Why host something in house that's not quite as good?"

          If a customer has problems sending email to us, I can see exactly what is going on. Also I can often work around the issue even if it's a problem on their end like wrong SPF records or faulty TLS configuration. I can even prove a message was delivered to someone in case there is a dispute (like someone (falsely) claiming an email was not delivered, which could have otherwise had financial/contractual consequences).

          That can be difficult in the cloud.

          1. Anonymous South African Coward Silver badge

            Re: The cloud strikes again

            "If a customer has problems sending email to us, I can see exactly what is going on. Also I can often work around the issue even if it's a problem on their end like wrong SPF records or faulty TLS configuration. I can even proof a message was delivered to someone in case there is a dispute (like someone (falsely) claiming an email was not delivered, which could have otherwise had financial/contractual consequences)."

            So much this.

            Cloud-based email server means it is YOUR problem even though the problem may be with gmail itself.

            Naaah, I'd rather procure a good server, install a proper MTA on it, and stick it in a data centre where it'll have good bandwidth, security and best of all, company employees can access it anytime, anywhere.

      2. zkysr

        Re: The cloud strikes again

        "2. You're a large company and you love OPEX over CAPEX."

        If it is about money, for the large company, it is more like you love OPEX and couldn't care less about the total cost. Public cloud is ~2.5 * the cost for a large company when used alone. The real reason is: you know how to innovate, need to move fast, and are more concerned with making money than pinching pennies.

        1. Adam 52 Silver badge

          Re: The cloud strikes again

          "Public cloud is ~2.5 * the cost for a large company when used alone"

          I doubt that. It's about 2.5 times if you consider hardware only and ignore licences, power, bandwidth, cooling, rent and payroll.

          1. zkysr

            Re: The cloud strikes again

            ""Public cloud is ~2.5 * the cost for a large company when used alone"

            I doubt that. It's about 2.5 times if you consider hardware only and ignore licences, power, bandwidth, cooling, rent and payroll."

            You need to do the analysis instead of relying on gut feelings and doubt. The ~2.5 * figure is *all in* and can be greater if the large company really knows what they are doing. Cloud companies are in the business to make a lot of money. Many IT organizations 1) do not know how to do proper capacity planning so they under provision or severely over provision 2) cannot make it easy for their end users to get what they need in a timely manner. So for many, public cloud is worth every penny and if the primary motivation for IT decisions is saving money, more money will be lost than saved no matter what the deployment model is.

            1. Adam 52 Silver badge

              Re: The cloud strikes again

              "You need to do the analysis instead of relying on gut feelings and doubt"

              I have. For both a £200M turnover SME and a £5BN turnover FTSE 100. In both cases public cloud worked out cheaper.

              The real killer is people and opportunity cost. You need huge numbers of people to keep up with the public cloud providers if you want to be anything other than IaaS. Thousands of developers waiting for you to incorporate, for example, Kafka into your stack when they could just be buying Kinesis from AWS is massively expensive.

              We had the luxury of already having data centres in Europe, US and Asia with comms between them, if you didn't it'd be even slower and more expensive.

      3. Steve Davies 3 Silver badge

        Re: The cloud strikes again

        2. You're a large company and you love OPEX over CAPEX.

        And that means SFA if your biz can't take any orders for days or weeks.

        As CFO, you will soon be out on your ear (like everyone else as the biz goes TITSUP).

      4. CrazyOldCatMan Silver badge

        Re: The cloud strikes again

        > 2. You're a large company and you love OPEX over CAPEX.

        3. You are a medium-sized public body and have minuscule CAPEX..

        Our capex this year (and for the next few years) for the whole organisation is tiny - less than just our IT capex for previous years. So, no more tin for us..

    2. Tessier-Ashpool

      Re: The cloud strikes again

      That pesky internet strikes again. Why oh why can't we all just go back to using floppy disks and IBM desktop PCs.

      Shit happens. DNS gets messed up all over the place. Routers get hacked. Warehouses burn down.

      Shit happens whether you host your stuff in a foreign data centre or up the road in Barnsley. And I'd warrant that many a sysadmin has had a nervous breakdown when he, magically, becomes the guy who can just sort it out.

      1. hplasm Silver badge
        Happy

        Re: The cloud strikes again

        3Jane finds your lack of faith disturbing, young Tessier-Ashpool...

      2. DonL

        Re: The cloud strikes again

        "DNS gets messed up all over the place. Routers get hacked."

        Nonsense, if you run bind and place the primary server on-premise and secondary server off-site (cheap cloud VM) chances are extremely small that you will have any DNS issues.

        (Regarding routers: If you only allow management protocols to be reached by your own IP's, it's highly unlikely that it gets hacked.)

        It's only when you try to overcomplicate things that problems will arise. Probably Microsoft added a full blown SQL backend to it or something which will greatly increase the risk of issues.
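
        The primary-plus-secondary argument above can be put into rough numbers. This is only a sketch: it assumes the two servers fail independently (different site, different provider), and the availability figures are invented for illustration, not measured.

```python
def combined_availability(*availabilities: float) -> float:
    """Probability that at least one redundant server is up.

    Assumes failures are independent, which only holds if the servers
    do not share a site, network path or provider.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


# Hypothetical figures for illustration only.
primary_on_prem = 0.99      # on-premise primary
secondary_cloud_vm = 0.995  # off-site secondary on a cheap cloud VM

both = combined_availability(primary_on_prem, secondary_cloud_vm)
print(f"{both:.5f}")  # roughly 0.99995
```

        The independence assumption is doing all the work here. If both servers sit behind the same provider or the same broken update, the numbers no longer multiply so kindly.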

    3. druck Silver badge
      Stop

      Re: The cloud strikes again

      ma1010 wrote:

      Someday perhaps I'll understand why people and businesses want to put their own data on computers that belong to some corporation in some distant location which depend on the Internet to work at all.

      But even if you can conjure a justification for using the cloud, why that cloud? Some company with a massive IT infrastructure who decides to make some money on the side by renting it out as a cloud perhaps. But a company whose very foundation is a blood sucking leech like vendor lock-in, and continuous turning of the screw on licences, has now built a cloud - aren't the alarm bells ringing?

    4. TheVogon Silver badge

      Re: The cloud strikes again

      "which depend on the Internet to work at all."

      Azure doesn't. There are numerous ways to access with a direct connection including via MegaPort and ExpressRoute.

  4. Anonymous Coward
    Anonymous Coward

    Each new Blue Sky Of Death makes Azure seem less like a viable proposition for companies wanting to host applications developed on Microsoft platforms. Working at a company that uses Azure for a lot of application hosting today is triggering serious discussions about how viable it is for us.

    1. Anonymous Coward
      Anonymous Coward

      Can confirm this is the case at other organizations as well. I worked a short stint at a state public college and watched their uptime go to hell in a handbasket right after implementing Azure. They're not prepared to use anything else.

  5. TonyJ Silver badge

    But...

    The contract I am currently on have data centres. Their own data centres.

    They are, for fairly obvious reasons, not located at the head office. Nor any of the regional offices. They are in purpose-built facilities separated by several tens of kilometers from one another.

    They have multiple links provided by multiple vendors along with multiple electrical feeds of the same ilk and backup power etc etc etc.

    And yet things still go wrong. Humans (usually) still manage to cock things up. Kit breaks that wasn't configured quite the way it was believed to have been or was legacy and didn't support a higher availability/resilience type of build.

    Or things happen outside of the DC's.

    There are good reasons not to entirely rely on cloud based solutions, sure, but it's hardly any less troublesome than owned DC's.

  6. Anonymous Coward
    Facepalm

    DNS hosting in the Cloud

    Don't host your DNS server on a Virtual Machine running on top of someone else's Cloud. It adds an unnecessary extra layer of complexity on top that can lead to instability in the service.

    1. Anonymous Coward
      Anonymous Coward

      Re: DNS hosting in the Cloud

      Not really.

      The biggest complexity in a DNS server is the network stack and the path to the pipes. These are gonna show intermittent failure whether in the cloud or in the cellar.

      Unless you are running AD of course. Then it becomes mucho interesting. The spirit of Ballmer will make noises in the machine.

      1. Anonymous Coward
        Anonymous Coward

        Re: DNS hosting in the Cloud

        "The biggest complexity in a DNS server is the network stack and the path to the pipes. These are gonna show intermittent failure whether in the cloud or in the cellar."

        network stack + path to the pipes + cloud is more unreliable than network stack + path to the pipes
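
        That inequality is the usual serial-availability arithmetic: layers that must all work multiply together, so adding one can only lower the total. A minimal sketch, assuming independent layers and made-up figures, and assuming the network stack and pipes are equally good in both setups (which the parent comment would dispute):

```python
from math import prod


def serial_availability(*layers: float) -> float:
    """Availability of a chain in which every layer must be up."""
    return prod(layers)


# Hypothetical per-layer availabilities, for illustration only.
network_stack = 0.999
path_to_pipes = 0.999
cloud_layer = 0.999

in_the_cellar = serial_availability(network_stack, path_to_pipes)
in_the_cloud = serial_availability(network_stack, path_to_pipes, cloud_layer)

print(f"cellar: {in_the_cellar:.6f}  cloud: {in_the_cloud:.6f}")
```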

      2. Anonymous Coward
        Anonymous Coward

        Re: DNS hosting in the Cloud

        "Unless you are running AD of course. Then it becomes mucho interesting."

        AD DNS is pretty damn reliable - and you can still separate your DNS if you really wanted to...

  7. Stoneshop Silver badge
    Thumb Up

    Whose timezone?

    "According to a status update on Microsoft's site, the issues began around lunchtime, although there is no mention of when they are likely to be fixed."

    Time is an illusion. Lunchtime doubly so.

    1. Anonymous Coward
      Anonymous Coward

      Re: Whose timezone?

      Time is an illusion. Lunchtime doubly so.

      Yup. Ditto for reliability when it concerns anything made by Redmond.

      (yes, yes, I know, I'm merely measuring TTDBMPR - Time to downvote by Microsoft PR :) )

  8. Anonymous Coward
    Anonymous Coward

    Well, uh ..

    Engineers had only managed to identify "a possible underlying cause" as of the update and "are working to determine mitigation options."

    Cause: Microsoft. Mitigation: anything but, really. As we're talking about servers I guess Linux will do.

    Someone had to say it - I could no longer cope with the cliffhanger tension :).

  9. Dwarf Silver badge

    High availability

    Isn't that one of the selling features of the cloudy things ?

    So, we're not just looking at one server being a bit wobbly, but all of them.

    Kind'a says that some form of update was applied too quickly to confirm that nothing bad happened, or possibly that they only saw an issue when the last servers were updated, which means their monitoring isn't up to much.

    SCOM anyone ?

    1. AndrueC Silver badge
      Thumb Up

      Re: High availability

      Two hours of downtime during the six months we've been using Azure to host our product. Seems pretty reasonable to me. We don't need perfection and I doubt a team our size could set up and maintain the system we have using our own hardware. Our IT guy can barely keep up with office needs, let alone maintain a growing server farm.

      Anytime we need a new machine (to scale up or just for some devops work) it only takes ten minutes. Spin up half a dozen for a test, kill 'em off a day later.

      A couple of hours downtime..meh. Our client's work got backed up for a while then went through. No biggie.
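
      For scale, the downtime figure above can be turned into an availability percentage. A quick sketch, assuming "six months" means half of a 365-day year and that the two hours were the only outage in that period:

```python
def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Share of the period the service was up, as a percentage."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


# Assumption: six months taken as half of a 365-day year (4380 hours).
SIX_MONTHS_HOURS = 182.5 * 24

pct = availability_percent(2, SIX_MONTHS_HOURS)
print(f"{pct:.3f}%")  # about 99.954%
```

      Whether that counts as reasonable or devastating depends on what the service was promised to deliver and what an hour of outage costs the business.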

      1. Vic

        Re: High availability

        Two hours of downtime during the six months we've been using Azure to host our product. Seems pretty reasonable to me

        Really?

        I'd have been devastated if I were running that setup.

        Vic.

      2. P. Lee Silver badge

        Re: High availability

        Part of the problem with cloud is not just the complexity, it's the consolidation, which is damaging for the internet. DNS is supposed to be distributed so that it's resilient, but Azure (and Google and AWS) have consolidated so much stuff onto their systems that if they go down, they take a lot with them - and it isn't a question of "if," it's a matter of "when."

        My issue with the vendors is that they appear to have lost interest in anything but cloud. They aren't just offering it, they appear to be actively trying to kill off everything else.

  10. Anonymous Coward
    Anonymous Coward

    Ok. We're one of those "idiot" businesses that host on Azure and we were down for about 70 minutes today. All that time MS support were sending DMs on Twitter to us letting us know it was being fixed (and we are a very small business).

    Meanwhile our on-premise server has been down since Sunday and is showing little signs of life as the IT people struggle to get it running again.

    So tell me. 70 minutes downtime in the last 12 months or days of fingers crossed with our IT team. I know which one I'll take.

    1. Dwarf Silver badge

      Perhaps

      @AC (Usual MS employee)

      Do you really think MS were sending personal messages, or that they have a simple script that does this for them. Think of it as mail merge 2016 edition mixed with customer comms saying "we're doing stuff", just without the envelopes.

      As the cloud side of things is today's "must-have tick box" for those on the golf course, funds are made available and as maintenance is built in as part of the cost, so you don't get a choice in it.

      However, on the other side, I'd bet that the business people didn't provide the budget for on-site hardware maintenance. It's a bit like an old car - only goes on for so long before it breaks in a bad way. Don't blame the tech staff for not being magicians. If they had a maintenance agreement, then they would have had it fixed in a timely manner too.

    2. hplasm Silver badge
      Windows

      "I know which one I'll take."

      In house MS button pushers or Azure MS button pushers? Why choose- just close your eyes and toss a coin.

    3. Anonymous Coward
      Anonymous Coward

      Amazing how that works (or doesn't). On prem is insanely difficult to maintain but cloud is rainbows and unicorns. MS will kill all on prem in a bit, and then all will be well. After a few years everyone who knows better will have been let go, so there won't be anyone to question it.

      That server the home team is trying to restore wouldn't happen to be the on prem part of a hybrid Lync cluster, would it?

      1. Anonymous Coward
        Anonymous Coward

        Do you really think MS were sending personal messages, or that they have a simple script that does this for them.

        Well.... the counterpart from the onsite monkeys will be the answering machine saying to all and sundry that "we are working on it". So you are getting the friendly robo-announcement treatment in all cases.

        That server the home team is trying to restore wouldn't happen to be the on prem part of a hybrid Lync cluster

        That's "Skype for Businezz" now. Hosted in Cambridge, perchance?

      2. Anonymous Coward
        Linux

        On prem is insanely difficult to maintain

        No it isn't, once it's set up and configured, it just goes and goes, at least until the hardware conks out, and you don't have to worry about the latest patch or AV upgrade totally borking the system.

  11. Anonymous Coward
    Anonymous Coward

    Cloud hosted Windows AD domain controller

    I've been testing a public facing, cloud hosted Windows VM acting as a domain controller, terminal server, DNS, web and even FTP server for over a year. Patched every second Tuesday of each month and basic anti-malware. Backed up using whole disk image, it's proven way more reliable than anticipated. I honestly thought it'd last less than a week before it got hacked but maybe I'm lucky or my cloud host has decent firewalls and IDS. In any case try it yourself, you might be pleasantly surprised too...

  12. Anonymous Coward
    Anonymous Coward

    Really? Do you really believe Microsoft is putting SQL on their preview DNS?

    I don't believe that for a second. SQL is a huge service, and I would be shocked if they have their names right now on the preview DNS service. Same goes for any of the Azure services. This is most likely a problem with their internal DNS service, or with SQL itself. I am eager to read their report about the incident.

  13. nilfs2
    Windows

    Just Microsoft being Microsoft

    That's how Microsoft kit works, nothing new here.

Biting the hand that feeds IT © 1998–2020