Re: Canary Deployment
There are at least three kinds of controls for releasing software safely:
1. Unit testing (partial testing)
2. End-to-end testing (full testing in a non-production environment)
3. Canary/staggered deployments to production
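One way to picture the three controls is as gates a release must pass in order, where a failure at any gate stops the release before it reaches later stages. The sketch below is a minimal illustration under that assumption. The `release` function and the gate lambdas are hypothetical stand-ins for real test suites and rollout tooling, not any vendor's actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class GateResult:
    name: str
    passed: bool


def release(gates: Sequence[tuple[str, Callable[[], bool]]]) -> list[GateResult]:
    """Run each safety gate in order and stop at the first failure (hypothetical helper)."""
    results: list[GateResult] = []
    for name, check in gates:
        passed = check()
        results.append(GateResult(name, passed))
        if not passed:
            break  # a failed gate blocks every later stage, including production
    return results


# Hypothetical scenario: unit tests pass, but end-to-end tests in a
# non-production environment catch a crash, so the canary stage never runs.
gates = [
    ("unit tests", lambda: True),
    ("end-to-end tests", lambda: False),
    ("canary rollout", lambda: True),
]
results = release(gates)
print([(r.name, r.passed) for r in results])
```

Because the end-to-end gate fails, the loop stops there and nothing is pushed to production machines, even canary ones.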
Many companies do all three. Previously, CrowdStrike did only #1, and a bug in their unit test meant it failed to catch a bug in the application. After the disaster, CrowdStrike vowed to add #3 going forward. Notably, they did not vow to do #2. To software engineers who work on critical systems and have been doing all three, this looks negligent and cheap.
Canary/staggered deployments traditionally target a random subset of systems, but they are usually combined with additional safety controls like #1 and #2; where #2 is skipped, canary deployments are run on non-critical systems. CrowdStrike appears to think either that their Falcon sensor doesn't touch critical systems, or that potentially crashing some customers' systems is worth the cost savings, since #2 would cost more money.
It's possible that some companies would volunteer to take on unnecessary risk for their vendor and be the guinea pig for no benefit to themselves, but in the long run, such risk-seeking companies would drive themselves extinct. Another possibility is that CrowdStrike would select low-value, low-visibility customers to test on, so that complaints wouldn't hurt their reputation or too much of their revenue. They did not explain whether their canary targets will be random, opt-in, or strategic.
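To make the difference between the three possible policies concrete, here is a minimal sketch of how a vendor might pick canary targets under each one. The `Customer` fields (`opted_in`, `annual_value`) and the function names are hypothetical, chosen only for illustration; CrowdStrike has not published how it selects targets.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    opted_in: bool     # hypothetical: customer agreed to receive early builds
    annual_value: int  # hypothetical revenue figure used by the "strategic" policy


def pick_random(customers: list[Customer], k: int, seed: int = 0) -> list[Customer]:
    """Random: every customer is equally likely to be a canary."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(customers, k)


def pick_opt_in(customers: list[Customer], k: int) -> list[Customer]:
    """Opt-in: only customers who volunteered are eligible."""
    return [c for c in customers if c.opted_in][:k]


def pick_strategic(customers: list[Customer], k: int) -> list[Customer]:
    """Strategic: the lowest-value customers absorb the risk first."""
    return sorted(customers, key=lambda c: c.annual_value)[:k]


customers = [
    Customer("a", False, 500),
    Customer("b", True, 50),
    Customer("c", False, 5),
    Customer("d", True, 900),
]
print([c.name for c in pick_opt_in(customers, 2)])     # ['b', 'd']
print([c.name for c in pick_strategic(customers, 2)])  # ['c', 'b']
```

The opt-in policy only ever exposes volunteers, while the strategic policy shifts risk onto whichever customers the vendor values least, which is the concern raised above.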