Never mind the quality, feel the bandwidth
So not only was there an untested code path in the code that was deployed, but the code that configures policy changes permits "unintended blank fields"?
Google Cloud has explained the massive outage it created last week and, as has happened many times previously, admitted that it broke itself. The outage struck last Thursday and meant that Google Cloud customers could not access their rented infrastructure for at least three hours. Among the customers impacted by the event was …
Rust would do the same, it's an invariant.
C# and C++ would throw an exception that may or may not be caught - realistically, that also means a crash in most cases.
"Memory safety" generally means "won't continue running after corruption occurs". So yes, crash out quickly.
100% code coverage on all code is unrealistic and generally not worth the expense. Obviously this should have been tested. More importantly it should have had a flag that could be simply turned off to mitigate the problem in minutes rather than hours.
Planning for failure is much better than assuming you can prevent all failures from ever occurring. Of course you should test as much as possible, but also assume it'll fail anyway and have a plan around that.
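To make the "flag plus a plan for failure" point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the flag name ENABLE_QUOTA_CHECK, the policy layout, and both check functions are made up for illustration and have nothing to do with Google's actual code. The new path only runs when the flag is on, and a malformed policy falls back to the old behaviour instead of crashing.

```python
from typing import Mapping


def flag_enabled(flags: Mapping[str, str], name: str) -> bool:
    """Return True only if the (hypothetical) flag is explicitly switched on."""
    return flags.get(name, "").strip().lower() in {"1", "true", "on"}


def legacy_check(policy: dict) -> bool:
    """Stand-in for the existing, known-good behaviour."""
    return True


def new_quota_check(policy: dict) -> bool:
    """Stand-in for the new code path; raises if 'quota' is blank or missing."""
    return policy["quota"]["limit"] > 0


def check_policy(policy: dict, flags: Mapping[str, str]) -> bool:
    # Kill switch: with the flag off, the new path is never executed.
    if not flag_enabled(flags, "ENABLE_QUOTA_CHECK"):
        return legacy_check(policy)
    try:
        return new_quota_check(policy)
    except (KeyError, TypeError):
        # Plan for failure: bad input degrades to the old behaviour.
        return legacy_check(policy)
```

With the flag on, check_policy({"quota": None}, {"ENABLE_QUOTA_CHECK": "1"}) falls back to the legacy result rather than raising; with the flag off, the new path is skipped entirely, which is the "turn it off in minutes" mitigation.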
Code that is not covered is code that is not tested. If it is not possible to achieve coverage, then one of the following must hold:
1) The test vectors are inadequate; or
2) The code has not been designed to be testable; or
3) The code is unreachable / on an infeasible path and should be removed.
100% code coverage is required for critical systems (including medical, automotive, avionics, ...) to ensure that "surprises" do not happen in service. Less than 100% may be acceptable for a non-critical system.
100% coverage does not need to be achieved for a library used within a project, but it does for the test environment used to validate that library.
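A small Python sketch of what "adequate test vectors" means for a blank-field branch. The function quota_limit and the policy layout are hypothetical, invented purely for illustration. The point is that the happy-path vector alone leaves the blank-field branch unexecuted, so the tests need at least one vector per branch.

```python
def quota_limit(policy: dict) -> int:
    """Return the configured limit, or 0 when the quota field is blank or missing."""
    quota = policy.get("quota")
    if not quota:
        # Blank-field branch: reached for a missing key, None, or an empty dict.
        return 0
    return int(quota["limit"])


def run_tests() -> None:
    # Happy path only: on its own this leaves the branch above uncovered.
    assert quota_limit({"quota": {"limit": "5"}}) == 5
    # Vectors that exercise the blank-field branch.
    assert quota_limit({"quota": None}) == 0
    assert quota_limit({"quota": {}}) == 0
    assert quota_limit({}) == 0


if __name__ == "__main__":
    run_tests()
```

Running this under a branch-coverage tool (for example coverage.py with `coverage run --branch`) should show the gap if the last three vectors are deleted.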
Edited to cover "expense".
We will improve our external communications, both automated and human, so our customers get the information they need asap to react to issues
"Reacting to an issue": "Hey, our SPOF - Google services - just failed. We're fucked 'til they fix it. So put a blurb on the answering machines, and it's beer o' clock for us!"
Oh, and don't forget a healthy (?) helping of technobabble gobbledygook. Pepper the excuse, er, explanation with enough officious-sounding gibberish so that the hoi polloi are left with the impression that this stuff is Really Hard™, so as to engender sympathy for the poor beleaguered tech entity that roundly fucked up by taking a shortcut and not testing their shit.
One of the things that surprised me when I was an SRE there (almost a decade ago) was how few checks they had for their config updates--and what there were, were hacked together, not the work of a SWE. So, I'm not super-surprised about the unchecked blank fields.
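For what a basic pre-rollout check on config updates could look like, here is a rough Python sketch. It is not based on any real Google tooling: the function name find_blank_fields and the sample config layout are assumptions for illustration only. It walks a nested config and reports the path of every field that is None, empty, or whitespace-only, so an update can be rejected before it is pushed.

```python
from typing import Any, List


def find_blank_fields(config: Any, path: str = "") -> List[str]:
    """Return the dotted paths of all blank values in a nested config."""
    # A value counts as blank if it is None or a string with no visible characters.
    if config is None or (isinstance(config, str) and not config.strip()):
        return [path or "<root>"]
    blanks: List[str] = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            blanks += find_blank_fields(value, child)
    elif isinstance(config, list):
        for index, value in enumerate(config):
            blanks += find_blank_fields(value, f"{path}[{index}]")
    return blanks


def validate_update(config: dict) -> None:
    """Refuse a config update that contains any blank field."""
    blanks = find_blank_fields(config)
    if blanks:
        raise ValueError("blank fields in config update: " + ", ".join(blanks))
```

Calling validate_update({"name": "p1", "quota": {"limit": None}}) raises a ValueError naming quota.limit, which is the kind of gate that stops a blank field from ever reaching the code path that cannot handle it.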
Not using a feature flag? !!!???!!! Google wrote the book on feature flags! That's incredibly sloppy.
Never mind adding an untested code path to prod--this is also a deep failure.
Someone is making bad life choices.