*Thunk* No worries, the UPS should spin up. Oh cool, it's in bypass mode

Anonymous Coward

Re: Infestation-

"give an estimate of likelihood and potential cost to business (eg 1 site goes down, 100 users are twiddling thumbs for 2 days, that’s 200 days worth of wages, if it’s a warehouse add costs for late deliveries during peak season and so on and so forth). Put it all in a nice excel file with estimates from all and sundry"

In my experience, unless your likelihood figures are realistic, you need to have a chance of failure >33% before anyone from beancounting and/or senior manglement will take the risk remotely seriously. I've seen plenty of people who should know better assume that a 10% chance is the same as "will never happen". Even if you've got accurate ways of measuring the risk (because a lot of this is frequently finger-in-the-air wizardry), too many people will ignore the risk entirely anyway, because those IT guys are always pessimistic and grumbling.
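For what it's worth, the sums from the quoted post are easy enough to sketch. A minimal Python example, assuming a made-up daily wage (the 150 is purely illustrative, as are the function names) and using the 10% and 33% likelihoods mentioned above. Expected loss here is just likelihood times cost, which is the crudest possible model:

```python
def outage_cost(users, days, daily_wage, extra_costs=0.0):
    """Direct cost of an outage: idle person-days times wage, plus any extras
    (e.g. late-delivery penalties for a warehouse)."""
    return users * days * daily_wage + extra_costs


def expected_loss(likelihood, cost):
    """Crude expected loss: probability of the outage times its cost."""
    return likelihood * cost


# Figures from the quoted post: 100 users idle for 2 days = 200 person-days.
USERS, DAYS = 100, 2
DAILY_WAGE = 150.0  # hypothetical figure, not from the original post

cost = outage_cost(USERS, DAYS, DAILY_WAGE)
for likelihood in (0.10, 0.33):
    loss = expected_loss(likelihood, cost)
    print(f"{likelihood:.0%} chance -> expected loss {loss:,.0f}")
```

Whether anyone upstairs reads the output is, of course, a separate problem.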

Of course, as Pterry so rightly pointed out, million to one chances come true nine times out of ten.

At a previous job we had several racks in DR, each using only one UPS, and we said the chance of one of the UPSs failing during an actual DR was about 20% due to the increased load (the hand-wavery here was that we didn't know what the actual max load was going to be during DR, nor exactly what load the aged UPSs could withstand before falling over). Beancounters and senior management said that this was an acceptable failure risk and that, if a UPS failed, we should just plug some or all of its servers into the UPS in the next rack to take the load off the overloaded UPS. This was added to the DR plan by the beancounters without knowledge or signoff from the techies, because who needs their opinion?

Guess what happened come the next DR (exercise thankfully, not an actual disaster). The business learnt the hard way that "cask aiding" doesn't mean assisting out a forlorn barrel that's down on his luck.

Of course, it turned out it was actually IT's fault for not correctly identifying the risk, because the chance of failure in hindsight was actually 100%, and the beancounters couldn't have been expected to allocate budget accordingly if IT gives incorrect information. If half the racks hadn't died, the chance of failure would have been 0%, and thus IT would have been at fault for incorrectly saying there was a risk of failure, and the beancounters would have been entirely right in denying IT's frivolous request.

<need a Catch 22 icon>
