As well as testing the systems for failure, test the people too. Keep the experts who know what they are doing out of it: they can watch while the other members of the team manage the failovers. You can watch as much as you like, but you only learn when you have to do it yourself. If the team is screwing up, let them, up to a point. Part of learning is how to dig yourself out of the hole.

The expert may know to do A, B, C, but the documentation may only say to do A, C. You only find the missing step when someone other than the expert follows the documentation.

In one situation, someone had to get into a locked box to press the reset button, but the only person with the key was the manager/senior team leader.

Having junior staff do the work while the senior staff watch is also good, as it means someone who knows the system is watching for unusual events.

