partially entailed doing ops on a room full of VAXen: a 780, two 785s, an 8600, some MicroVAXes and lots of drives.
One day, while changing a tape over, I thought 'it feels a bit warm behind the 8600', so I go and tell the ops manager. We go in and agree it feels a bit warm, then go out and look at the aircon control panel. In those days it wasn't a nice LCD panel; it was a grey metal box with panel lights and a power switch. None of the 'unit failed' lights were on. There were three units cooling the area.
Still, we were coming out of winter. Then, a week later, I'm thinking 'this is too hot, it can't be right'. I tell the ops manager and we do the same thing again.
A fortnight later it was early May and the first lovely warm day of the year. I'm in a different building and can't seem to log on to anything in the cluster. I rush over to the computer room to find all the loading doors open and people wafting air in. The aircon had failed, and the console printers were going mad. As it was a VAXcluster, everything major that happened came out on all the printers. The heat was affecting the connections and machines were dropping out, so every machine reported not just the status it saw but also the status the other machines in the cluster saw. Paper was just churning out.
Eventually they took the decision to just turn the power off, as they couldn't regain control for an orderly shutdown. Not good.
When the aircon engineer arrived, he found that a blown fuse in the control panel had stopped the 'unit failed' lamps from working. To the best of his knowledge, we had been running on two units for most of the winter; one failed about a month ago (when I noticed it getting warmer), and the last one failed that day when it struggled in the heat.
The aircon panel was rewired so that instead of 'unit failed' lamps we had 'unit running' lamps. If a lamp was off, we had a problem (or a blown bulb!).
The lesson to be learned: if you've got a hunch there is a problem, don't trust your instrumentation.