Woohooo!
"Team building" day with the BBQ and the Xbox it is.
For at least the past hour or two, Microsoft's Azure cloud has been up and down globally due to a DNS configuration mishap. The platform-wide outage has knackered all sorts of Redmond-hosted systems around the world, from Azure SQL databases and App Services to multi-factor authentication, Microsoft 365 and Teams, Dynamics, …
Heading to the train in 10 minutes. I always ride with a manager of a company that went full-on Azure ("we want the stability of Microsoft" he always says, "we have no truck with some company that started out flogging books" he always says).
The morning coffee will taste extra delicious this morning!
Isn't that the service that makes it so easy to get up and going but costs you at least two legs and one arm to get all that lovely data out?
Azure isn't much better, with its record amounts of TITSUP in the past year.
Yet PHBs the world over are still falling for this cloudy 'snake oil'... {shakes head in disbelief}
Suddenly on-premises starts to look a whole lot better. At least there, you are in control (local JCB operators permitting, naturally).
Hmm, wasn't it the Amazon one where they were eventually forced to turn the whole thing off in a region (Western USA????) to stop their self-replicating fuckup taking over teh hole wurld....?
Of course, that was after their poor bloody customers had spent the thick end of two days living with it limping toward its final death....
Cockups are a given. It's how you deal with one that's key.
A lot of companies have SharePoint because MS told them it was going to be the next big thing. It wasn't, and companies paid loads for it. It's a pig to develop for too, isn't user-friendly and is way too complex for what it actually does. There are better alternatives for a lot less money.
All these PaaS and SaaS services, including the ones Microsoft uses to run Azure, run on real machines _somewhere_ and are subject to real-world on-prem style failures. Something foundational like DNS is horrible to lose because you basically have no way to get to anything to even start fixing the problem. Developers are used to all their abstractions and basically don't have any idea what to do when their call to a hostname fails. Azure AD would be another one...imagine not being able to even log in to systems to start troubleshooting without using some sort of emergency break-glass kind of access.
I'm not interested in 100 hour weeks, but if that weren't a requirement I'd love to work for one of the cloud providers. The systems they have in place to keep that massive tower of abstraction running must be amazing. But yeah, if you lose DNS your best bet is to get it back immediately.
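For what it's worth, here is a minimal sketch of what "knowing what to do when the hostname lookup fails" might look like in Python. It is only an illustration: the FALLBACK_ADDRESSES table, the hostname in it and the resolve() helper are all made up for the example, and the addresses come from the documentation ranges. It tries normal DNS first and only then falls back to a hand-maintained list of last-known-good addresses, which is the break-glass idea in miniature.

```python
import socket

# Hypothetical break-glass table of last-known-good addresses.
# The hostname and address are placeholders, not real systems.
FALLBACK_ADDRESSES = {
    "db.internal.example": ["192.0.2.10"],
}


def resolve(hostname, port, fallback=None, resolver=socket.getaddrinfo):
    """Return (addresses, source), where source is "dns" or "fallback".

    The resolver argument exists so the lookup can be swapped out in tests.
    """
    table = FALLBACK_ADDRESSES if fallback is None else fallback
    try:
        infos = resolver(hostname, port, type=socket.SOCK_STREAM)
        addresses = []
        for info in infos:
            addr = info[4][0]  # sockaddr tuple starts with the IP address
            if addr not in addresses:
                addresses.append(addr)
        if addresses:
            return addresses, "dns"
    except socket.gaierror:
        # DNS is down or the name is unknown; fall through to the table.
        pass
    if hostname in table:
        return list(table[hostname]), "fallback"
    raise LookupError(f"cannot resolve {hostname} and no fallback is recorded")
```

Of course a pinned address goes stale the moment the provider moves the service, so this only buys you enough access to start troubleshooting, not a replacement for DNS.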
I find that a good general troubleshooting tip. For such a simple thing, DNS has so many varied and exciting ways to bugger up your network.
It's OK; I'm sure going to IPv6, where the recommendation is to always use DNS instead of direct IP calls, will make this mess a whole lot easier...
Could that account for my inability to send email to an NHS address on April 29th? Or indeed the message that took about 24 hours to reach me from a friend on May 2nd/3rd, with most of that time spent on outlook.com servers?
The bounce from the NHS mail contained full diagnostic information. It was in a mail loop at outlook.com:
Received: from AM6PR10CA0087.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:209:8c::28) by AM5SPR01MB03.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:206:1b::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1835.13; Mon, 29 Apr 2019 14:44:48 +0000
Received: from HE1EUR02FT049.eop-EUR02.prod.protection.outlook.com (2a01:111:f400:7e05::207) by AM6PR10CA0087.outlook.office365.com (2603:10a6:209:8c::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1835.12 via Frontend Transport; Mon, 29 Apr 2019 14:44:48 +0000
Received: from EUR04-HE1-obe.outbound.protection.outlook.com (104.47.13.51) by HE1EUR02FT049.mail.protection.outlook.com (10.152.11.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1835.13 via Frontend Transport; Mon, 29 Apr 2019 14:44:48 +0000
Received: from DB6PR1001CA0034.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:4:55::20) by VE1PR10MB2879.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:803:10f::28) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1835.15; Mon, 29 Apr 2019 14:44:46 +0000
Received: from AM5EUR02FT030.eop-EUR02.prod.protection.outlook.com (2a01:111:f400:7e1e::209) by DB6PR1001CA0034.outlook.office365.com (2603:10a6:4:55::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1835.12 via Frontend Transport; Mon, 29 Apr 2019 14:44:46 +0000
Received: from EUR03-VE1-obe.outbound.protection.outlook.com (104.47.9.54) by AM5EUR02FT030.mail.protection.outlook.com (10.152.8.180) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1835.13 via Frontend Transport; Mon, 29 Apr 2019 14:44:46 +0000
Received: from AM6PR10CA0049.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:209:80::26) by HE1PR10MB1548.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:7:5d::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1835.13; Mon, 29 Apr 2019 14:44:44 +0000
Received: from VE1EUR02FT014.eop-EUR02.prod.protection.outlook.com (2a01:111:f400:7e06::208) by AM6PR10CA0049.outlook.office365.com (2603:10a6:209:80::26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.1835.12 via Frontend Transport; Mon, 29 Apr 2019 14:44:44 +0000
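If anyone wants to check their own bounces for the same pattern, a rough Python sketch follows. It assumes the Received headers have already been pulled out as a list of strings, and the heuristic is my own: drop the per-machine first label from each "by" hostname, treat what is left as a server pool, and flag the message if any pool handles it three or more times. The function names and the threshold of 3 are arbitrary choices for the example, not anything Microsoft documents.

```python
import re
from collections import Counter

# Captures the hostname that follows "by" in a Received header.
BY_HOST = re.compile(r"\bby\s+([A-Za-z0-9.-]+)")


def server_pool(hostname):
    """Drop the per-machine first label and lowercase the rest."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[1:]) if len(labels) > 2 else ".".join(labels)


def pool_visits(received_headers):
    """Count how many times each server pool accepted the message."""
    counts = Counter()
    for header in received_headers:
        match = BY_HOST.search(header)
        if match:
            counts[server_pool(match.group(1))] += 1
    return counts


def looks_like_loop(received_headers, threshold=3):
    """True if any pool handled the message at least `threshold` times."""
    return any(n >= threshold for n in pool_visits(received_headers).values())
```

On the eight headers quoted above, the eurprd10.prod.outlook.com and outlook.office365.com pools each show up three times in about four seconds, which is what you would expect from a message going round in circles rather than being delivered.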