Re: Loyalty and experience
A lot of this stuff can't ever be written down, because the people who know it don't necessarily know what they know until a problem arises that has to be solved asap.
I recently wrote a document describing the steps in bringing up a particular set of machines after a total power outage. There's (obviously) a specific sequence of switching on gear, and checking whether minimum system functionality, connectivity requirements etc. are met before proceeding to the next step in the sequence.
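For what it's worth, the "check before you proceed" idea can be sketched in a few lines of Python. This is only an illustration: the step names and check functions are hypothetical placeholders, not anything from the actual platform or the document I wrote.

```python
from typing import Callable, List, Tuple

# A step is a (name, check) pair. Both are hypothetical placeholders:
# the check returns True when the minimum requirements for that step are met.
Step = Tuple[str, Callable[[], bool]]


def bring_up(steps: List[Step]) -> List[str]:
    """Walk the steps in order and stop at the first failed check.

    Returns the names of the steps whose checks passed, so the operator
    can see exactly how far the sequence got.
    """
    completed: List[str] = []
    for name, check in steps:
        if not check():
            # Do not move on: later steps depend on this one being healthy.
            break
        completed.append(name)
    return completed
```

Calling `bring_up` with a list in which the second check fails returns only the first step's name; everything after the failure is left untouched.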
The product manager for the platform complained that a) I hadn't put a 'shutdown' chapter in the document (err, this is about powering up after a catastrophic power failure. Meaning. There. Is. No. Graceful. Shutdown. Sequence.) and b) my examples of the check results that decide whether or not to proceed were trimmed down to show the essentials ("you're supposed to see something like [few lines of output] and NOT [some other lines of output]").
Sorry, dude, this stuff is supposed to be used by people WHO KNOW THOSE SYSTEMS, not the first PFY you grab from wherever it is they're hatched. If the command used to check actually outputs 50 lines of info, they don't need all 50 lines in the document, just a handful that show things are OK, and a few that show errors, with a comment on the severity of those errors and whether you can still proceed or not. The full output on any of the 15 clusters would differ from whatever is printed in the document anyway, so apart from cluttering up the document and obscuring the essentials, it would be of no actual use whatsoever.
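To make the "handful of lines that matter" point concrete, here is a rough Python sketch. The marker strings and severities are invented for illustration; a real check would use whatever the actual command prints on the actual cluster.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    PROCEED_WITH_CARE = "proceed with care"
    STOP = "stop"


# Hypothetical markers, not real output from any particular tool.
OK_MARKERS = ["link up", "quorum reached"]
ERROR_MARKERS = {
    "degraded": Verdict.PROCEED_WITH_CARE,
    "no quorum": Verdict.STOP,
}


def assess(output: str) -> Verdict:
    """Judge command output by a few known lines and ignore the rest."""
    lines = [line.lower() for line in output.splitlines()]
    worst = Verdict.OK
    for line in lines:
        for marker, verdict in ERROR_MARKERS.items():
            if marker in line:
                if verdict is Verdict.STOP:
                    return Verdict.STOP
                worst = verdict
    # A missing "good" line is as telling as an explicit error.
    for marker in OK_MARKERS:
        if not any(marker in line for line in lines):
            return Verdict.STOP
    return worst
```

The other 40-odd lines of output never enter into it, which is exactly why they don't need to be in the document either.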
This document isn't so much an instruction sheet telling people in detail what to do for every step as a way to help them remember the right sequence, and the checks, in a situation where things have gone utterly haywire and several levels of managers may be breathing down their necks. And there are plenty more circumstances that just can't be solved by a PFY with reams of documentation (presuming it is all complete and up to date in the first place), but only by someone who can correctly interpret the (lack of) messages a system is spewing.