Reply to post: Working within tolerance?

Monitoring is simple enough – green means everything's fine. But getting to that point can be a whole other ball game

Mike 137 Silver badge

Working within tolerance?

"The basic thing you want from your monitoring is to show that, when everything is working within tolerance, all indicators are green"

As long as the dashboard shows green right across, it's generally assumed that this represents normality. But it doesn't. It represents someone's assumptions about normality that got locked into the machine at some point in time. Thereafter the machine is trusted implicitly. For example, when Equifax let its scanner decryption certificate expire, all lights remained green because all data (including the malicious attack data) passed through unchecked. They're not alone. In my several decades of consulting, I've practically never met an organisation that was aware of what normality actually looks like.

In order to monitor successfully, you have to monitor the changing operation of your systems, the changing external environment and the changing internal environment - all the time. So you're mostly monitoring change, but to do that you have to know what the starting point is. The big problem, obviously, is that today's starting point may not be the same as yesterday's. So the normality you need to find is not a set of static values for monitored points, but a typical rate of change for each point, so that you can identify the rate at which change is actually happening and compare it with the "normal" rate. The "tolerance" will then be (for example) an acceptable variation in the rate of change at each monitored point, one that reflects the cycle of normal business activities round the clock.
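To make that concrete, here's a rough sketch of the idea in Python. It's an illustration only, not a recipe: the function names, the hour-of-day bucketing and the "k standard deviations" tolerance are all my own assumptions, and a real system would need a much longer history and a more careful model of the business cycle.

```python
from statistics import mean, stdev

def rates_of_change(samples):
    """Turn (hour_of_day, value) readings into (hour_of_day, delta) pairs.

    Each delta is attributed to the hour of the later reading.
    """
    return [(h2, v2 - v1) for (_, v1), (h2, v2) in zip(samples, samples[1:])]

def build_baseline(history):
    """Per hour of day, the mean and spread of the observed rate of change."""
    by_hour = {}
    for hour, delta in rates_of_change(history):
        by_hour.setdefault(hour, []).append(delta)
    # stdev needs at least two observations, so thinner buckets are skipped.
    return {h: (mean(d), stdev(d)) for h, d in by_hour.items() if len(d) >= 2}

def within_tolerance(baseline, hour, delta, k=3.0):
    """True if this hour's change is within k standard deviations of normal.

    k is an assumed tolerance setting, not a recommended value.
    """
    mu, sigma = baseline[hour]
    return abs(delta - mu) <= k * sigma

# Hypothetical readings for one monitored point over three days.
history = [
    (0, 100), (1, 110), (2, 130),
    (0, 131), (1, 142), (2, 161),
    (0, 160), (1, 169), (2, 190),
]
baseline = build_baseline(history)
print(within_tolerance(baseline, 1, 12))  # a typical rise for hour 1
print(within_tolerance(baseline, 1, 0))   # the point has stopped moving
```

Note the second check: a value that has stopped changing when it normally climbs is flagged, even though a static threshold on the value itself could quite happily stay green.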
