
"stronger guardrails 'to prevent human errors' "
Sounds like better procedures should be used.
But that sounds like work . . .
Microsoft has blamed "operator error" for the multi-hour outage of its cloud SQL Server in Europe last week. "Between 03:47 UTC and 13:30 UTC on 21 July 2022, customers using SQL Database and SQL Data Warehouse in West Europe may have experienced issues accessing services," said Microsoft. The issues were severe for affected …
Poor process control. Just like the famous leap year Azure collapse. When you operate on the basis of least effort for maximum return, such events are to be expected.
The whole point of a cache is you should be able to just trash it -or any part of it - and carry on as normal, albeit with temporarily degraded performance. If it were a true cache, you'd just push the query upstream and save the response for later reuse.
It appears that by "cache" they actually mean "replica" and if they can't get the basic terminology right, it's hardly surprising some operator gets confused about what they're doing.