What?
You mean we have an existence proof that backdoor keys don't always stay put and aren't only for use by good guys?
Remember that internal super-secret Microsoft security key that China stole and used to break into US government email accounts back in July? The Windows giant has, in its own words, today described how the Chinese spy team it tracks as Storm-0558 obtained that golden cryptographic key, which was then used to break into Uncle …
This seems to be the common refrain in the Ballad of the Microsoft Excuses.
And, of course, we'll never know that it has been corrected. Until the next refrain is sung on the same problem.
Perhaps their culture could embrace a better response than saying "sorry". Suggesting https://en.wikipedia.org/wiki/Seppuku
To be fair, this chain of events is actually reassuring in that (a) it needed a number of unlikely things to all happen before it became a problem, (b) the background info behind it all hints at far better general security practices than you'd usually see, (c) they've been open and honest about it despite the optics, and (d) they've fixed it all.
It was a bad thing to let happen, but its handling feels good.
> (a) it needed a number of unlikely things to all happen
They've made it sound a bit like this, but read it again while assuming: The software is shonky and crashes routinely because of the mentioned race condition. The two mentioned 'credential scanning methods' were rudimentary and could never be expected to detect this sort of key (see the sketch below). The "standard debugging process" thus allowed this scenario of the key ending up in a crash dump on the corporate network to happen maybe a dozen other times. Then the most unlikely bit might be some threat actor stumbling across it and recognising its importance. And we know that's not too unlikely.
It is a good lesson in how "isolated" networks never are, and can't ever actually be, isolated. The best you can do is reduce their bandwidth to the internet, increase latency, make their connection sporadic and unreliable, and try to perform filtering on the residual data flow. All tough to do well.
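To illustrate the "rudimentary credential scanning" point, here is a toy sketch of how such scanners commonly work (an assumption for illustration, not Microsoft's actual tooling): a textual scan looks for recognisable markers, and a signing key sitting in a dump as parsed in-memory key material carries none of them.

```python
# Toy credential scan over a crash dump: looks for textual secret markers.
# A parsed RSA key sitting in process memory has none of these, so a scan
# like this reports the dump as "clean". Illustrative only.
import re
import sys

MARKERS = [
    rb"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----",  # PEM-encoded private keys
    rb"AKIA[0-9A-Z]{16}",                          # AWS-style access key IDs
    rb"password\s*=",                              # plaintext credential assignments
]

def scan(path: str) -> list[str]:
    data = open(path, "rb").read()
    return [m.decode() for pat in MARKERS for m in re.findall(pat, data)]

if __name__ == "__main__":
    hits = scan(sys.argv[1])
    print(f"{len(hits)} marker(s) found" if hits else "clean (or is it?)")
```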
> (a) it needed a number of unlikely things to all happen
> They've made it sound a bit like this, but read it again while assuming: ...
Or break out the tinfoil hats and consider ... there is a deep mole operating inside the inner sanctum and they needed a plausible route to exfiltrate the key without revealing their presence.
I agree with the bulk of what you said, but...
> it needed a number of unlikely things to all happen before it became a problem
Describing them as "unlikely" seems strange to me, because they actually happened. I am reminded of Feynman examining NASA's safety estimates for the Space Shuttle; their "1 in 100,000" probability of a mission loss was immediately suspect from the moment there was a mission loss.
Maybe this particular set of circumstances was unlikely - but in a massive system with many moving parts written by probably thousands of different people faced with adversaries actively looking to exploit any unusual interactions between components, a compromise like this was an inevitability - and to repeat myself somewhat, we know it was an inevitability because it happened.
I would recommend the book "Normal Accidents".
The key had no place being in a production environment, no matter how isolated. Any key which Windows (or other OSs (but especially Windows)) can access is a key that will be leaked, one day, somehow. Important signing keys should only exist inside hardware security modules, which produce and validate the tokens.
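As a minimal sketch of what "the key only ever exists inside the HSM" means in practice, assuming an Azure Key Vault / Managed HSM-backed key (the vault URL and key name below are placeholders): the service hands the HSM a digest and gets back only a signature, so there is no private key in process memory to end up in any crash dump.

```python
# Sketch: sign a token digest with an HSM-backed key via Azure Key Vault.
# The private key never enters this process's memory; only the resulting
# signature does. Vault URL and key name are placeholders.
import hashlib

from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient
from azure.keyvault.keys.crypto import CryptographyClient, SignatureAlgorithm

credential = DefaultAzureCredential()
key_client = KeyClient(vault_url="https://example-vault.vault.azure.net", credential=credential)
signing_key = key_client.get_key("token-signing-key")    # public metadata only

crypto = CryptographyClient(signing_key, credential)
digest = hashlib.sha256(b"header.payload").digest()
result = crypto.sign(SignatureAlgorithm.rs256, digest)    # signing happens HSM-side
print(result.signature.hex())
```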
HSMs are expensive and Microsoft buys them in: if you're an Azure customer, Microsoft will charge you something like $3,500 per month for a dedicated HSM and you'll also need to have an annual Azure spend of at least $5M.
It's this level of cost that leads to the root certificate approach.
It's also worth pointing out that access to HSMs (or Key Vault) is mediated by AAD authentication so the ability to "forge access tokens for multiple types of Azure Active Directory applications" is a bit of a circular problem.
That's an interesting thought. But what is equally likely to have happened (IMHO) is that an engineer - a site-reliability engineer, perhaps - was targeted and crash dumps sought out to find bugs in Microsoft products and services that could be exploitable.
Just hunt through dumps that have some kind of memory corruption, and you may find flaws that can be exploited to achieve remote or local code execution rather than a crash.
Or they just got lucky.
Edit: They might also have scoured dumps for sensitive stuff that wasn't redacted -- keys and passwords, etc.
C.
What would you suggest instead?
If you have a crash that 1. only happens in production (reproducing a production environment in test isn't necessarily possible), and 2. existing diagnostics (logs etc.) don't provide any information beyond "it crashed", then what else is there to do?
Companies that care about security have been dealing with this for decades. I used to work for a university in their administrative computing center. One morning I caused my (well, the university's) mainframe system to crash (I was pretty good at this) and when I reported the bug and uploaded my information, one of their other customers tagged my bug report with 'the same as our bug #xxxxxx.' Their bug had been open with no notes, save for the 'it happened again' updates, for several months because it was a secure system. Their problem was that they were unable to reproduce the issue outside of their secure network, so they had been unable to provide support with any supporting information, dumps, etc.
If I were a Microsoft manager, I would move that engineer to QA or Sales so that their access to production systems would be removed. I'd also make sure that the method used to get the production dump onto a non-production system was removed.
Surely the problem was that the dump was removed -- copied -- from the isolated environment. The rule should be that whatever is in that environment stays in that environment. If you need more computing resources, bring them in. But whatever comes in never leaves (at least not before going through a very thorough clean and check, which may not be time or cost effective, so destroying used hardware might be a better option).
> Redmond assures us it has made changes to prevent them from happening again.
So they've implemented their equivalent of MAP_CONCEAL or MAP_NOCORE, which ensure some memory areas never even make it into a core dump in the first place so your secrets are safe even in the scenarios you didn't expect?
Because that's what "prevent" would mean: fail safe, not just fail.
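For reference, OpenBSD's MAP_CONCEAL and FreeBSD's MAP_NOCORE are mmap() flags; the closest Linux analogue is madvise(MADV_DONTDUMP). A minimal sketch of the idea (buffer size and contents are illustrative):

```python
# Sketch: keep secret material out of core dumps by marking its backing
# pages "don't dump". mmap.MADV_DONTDUMP is the Linux equivalent of
# OpenBSD's MAP_CONCEAL / FreeBSD's MAP_NOCORE; mlock(2) would additionally
# keep the pages out of swap (not shown here).
import mmap

SECRET_LEN = 4096

buf = mmap.mmap(-1, SECRET_LEN)              # anonymous private mapping
buf.madvise(mmap.MADV_DONTDUMP)              # exclude these pages from core dumps (Linux)
buf.write(b"...signing key material...")     # placeholder secret

# ... use the secret ...

buf.seek(0)
buf.write(b"\x00" * SECRET_LEN)              # zeroise before releasing
buf.close()
```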
Maybe the key didn't actually leak at Microsoft and it has just been told to take the hit.
Maybe what really happened was that the key was stolen from one of the many agencies it must be sharing the key with, but they can't afford the embarrassment. We're as used to Microsoft causing security problems as we are to Trump committing crimes or Musk overpromising, so it would not really register much.
After all, this is the same government that told you your luggage was still safe despite their mandated backdoor*, and you can now get TSA keys even from Amazon.
* If this sounds familiar, you're thinking of the insane idea to demand a backdoor into encryption that somehow seems to appear every 7 years or so, but in British politics.
I think your suggestion gains a bit of credibility from the fact that the US government seems not to care.
Whereas with someone like me in charge, I'd be asking the three-letter agencies to evaluate that company with a view to perhaps excluding them from all government procurement for the next 20 years. Which, yes, I know, would cost a lot, but I'd think even a half-competent government can probably build their own data centers and go fully open source for about the same price.
> I'd think even a half-competent government can probably build their own data centers and go fully open source for about the same price.
Do you mean "a government which is half-way along the list of governments sorted by competency"? Looking at the number of government-based IT disasters, I really doubt it.
Or do you mean "a government which is half-way to being fully competent" ? I'm not sure there are any of those.
Why can't they just cancel this super-secret golden cryptographic skeleton key?
“golden cryptographic key ..which in the wrong hands can be used to create forged authentication tokens and log into other people's Microsoft accounts”
As distinct to falling into the right hands ;)
> However, as per Microsoft's "standard debugging process," workers moved the crash dump from the isolated production network into a debugging environment on the internet-connected corporate network.
Sort of saying, then: "Oh yes, we have a crash dump full of supersensitive information. Let's take it somewhere insecure so that we can have a good look at it."
Really. No one thought that this was a really stupid idea in general?
Who was the dunce who thought it'd be OK that everything could go through the same converged API without actually checking whether the token had the privileges to act as a skeleton key? That is the most egregious error of them all.
Get your OAuth/OIDC right, for goodness sake. It's not that difficult either. OpenID Foundation has examples!
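The boring, well-documented fix is exactly that validation step: check that the specific signing key (kid) is one this service accepts for this token type, and check issuer and audience, before trusting any claim. A rough sketch using PyJWT, where the key set and identifiers are made up for illustration:

```python
# Sketch: don't accept a token just because *some* known key signed it.
# Validate that the specific key (kid) is allowed for this token's issuer
# and audience. Key material and identifiers below are illustrative.
import jwt  # PyJWT

# Keys permitted to sign enterprise tokens for this API; a consumer (MSA)
# signing key must NOT appear here.
ENTERPRISE_SIGNING_KEYS = {
    "enterprise-key-2023": "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----",
}

def validate(token: str) -> dict:
    kid = jwt.get_unverified_header(token).get("kid")
    public_key = ENTERPRISE_SIGNING_KEYS.get(kid)
    if public_key is None:
        raise PermissionError(f"key {kid!r} is not authorised for this token type")
    # Signature, expiry, issuer and audience are all checked here.
    return jwt.decode(
        token,
        key=public_key,
        algorithms=["RS256"],
        issuer="https://login.example.com/contoso-tenant/v2.0",
        audience="api://exchange-online",
    )
```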
It's an interesting measure of blame... because unless one has proof of who did it, how can blame be laid at a particular "hacker group"? Did this hacker group accidentally leave a business card when scarpering from the scene? Was it accidentally dropped, or a red herring?
The confident assignments of blame to "China" do sound like the previous "It was commies that done it" claims.
This doesn't mean that a Chinese state agency didn't do this, but I'd question any statement that they did without anything to back it up. Hell, there have been hacks where multiple groups have claimed that they did it and if it really was a Chinese state agency or a group sponsored by the Chinese state, I'd suspect that they'd take a dim view of those who did it showing off about it.
It's obvious that neither of you have worked with a Chinese citizen. Their goals will appear to be similar to yours, but in actuality, their sole goal is to advance China. They will take whatever they want, when they want it. If you think this is untrue or an unfair characterisation, you're wrong.
If you are not finished looking at a crash dump within a couple of days, why keep it? In contrast, why are access logs, which have long term value, regularly deleted?
What's the time between the original crash and the security failure? (Somewhere I read "2 years", but that may be unreliable information).
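On the retention question above: pruning dumps after a short window is trivial to automate. A minimal sketch, where the dump directory and the two-week window are made-up policy choices:

```python
# Sketch: delete crash dumps older than a fixed retention window.
# Directory and retention period are illustrative policy choices.
import time
from pathlib import Path

DUMP_DIR = Path("/var/crashdumps")
RETENTION_SECONDS = 14 * 24 * 3600   # e.g. two weeks, then the dump is gone

cutoff = time.time() - RETENTION_SECONDS
for dump in DUMP_DIR.glob("*.dmp"):
    if dump.stat().st_mtime < cutoff:
        dump.unlink()
```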