A production server? I raise you a datacentre!
One of a pair of UPS units had failed and was constantly on internal bypass. To carry out the repair, Health & Safety demanded (reasonably) that the faulty unit be hard bypassed at the wall. This meant we had to have the appointed electrical engineer on site to flip the switch.
He did, and the entire data centre went silent. "Was that meant to happen?" came the cry from the electrical engineer as all the disks spun down and bleeps started going off all over. I just looked at him, bewildered. Without another word he powered it back on about 20 seconds after power off, and I died a little inside.
NetApp arrays, VMware hosts and everything else booted up. We lost some disks and some volumes, and one fibre switch reset to an older config (as someone hadn't saved the current one). Thing was, we lost the only two read-write domain controllers, as they were both stored on the same volume, and that volume got corrupted. All the DCs on other sites were read-only. To top all of this off, this was an extra-secure system separated from our main network, and the company had skimped on the DR options and only given us 2 days of snapshots and no tape backups. We also couldn't invoke a full DR quickly, as the disks on the main site were Fibre Channel while the DR site was iSCSI.
We had to get the RW DCs up again (which we did from old SnapMirror copies of the disks, not system state backups), take a system state backup of them in the crap, old state we'd got them running in, and then do an authoritative restore of each one into itself. There were so many conflicts.
It turned out the labels on the electrical switches were wrong (or the labels on the UPS were). Either way, it was 5 days' work to get it all up and running.
Things I learned:
Never store CRITICAL passwords for a system within the system (especially if it's super secure/isolated). Have "break-glass" accounts.
Always take a system state backup of at least one DC. Consider storing it offline.
Have more than two read-write DCs. Never store them on the same volume. Consider one offsite.
Save your switch configs religiously.
Authoritative restores of domains aren't too scary. You'll have to sacrifice some account and object changes, but that's worth it!
Don't be afraid to tell your non-technical boss to piss off if he's breathing down your neck looking for answers every 3 minutes. Give him a time for your next update and make sure you actually update him then.