Wrong wrong wrong
This article irritated me greatly. Now I can't comment on what happens at the very top level of such a large business, but as someone who has recently jumped ship from being a "geek like you" to a service continuity manager, I have to take issue with this bit in particular:
"If the DR site was working, why didn’t it take over? Can the LSE put paid to the rumour that they were running exactly the same software for both live and standby? If you are Clara Furse reading this, here’s a hint, two copies of the same software will probably crash at the same time, given the same inputs. That’s why grownups use multiple versions.”
Did Accenture tell you that? Did it sound like a luxury to the media beancounters you appointed? What a load of rubbish. Are you seriously suggesting that the reason you have a DR system is merely in case the software crashes? And that the way to recover from a software crash is to have some "different" software on your DR system? If so, how different? A newer version? An older one? Should my production systems run Oracle on Unix, and the DR system DB2 on a mainframe?
No, the DR system should be identical to your production system in its design and functional behaviour wherever possible. OK, depending on the size of your business and the importance of the system in question, the DR environment may have lower capacity or other limitations as part of its design. Those differences should be carefully considered, and may be chosen for reasons of cost, complexity or any number of other things, but in a nutshell: what you choose to make available in a DR situation has to mirror the functionality, if not the specification. You can't make arbitrary changes to second-guess potential future problems.
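To make that concrete, here is a rough sketch (in Python, with entirely made-up names, versions and numbers) of what "mirror the functionality, if not the specification" means in practice: the functional attributes of the DR site must match production exactly, while a non-functional attribute like capacity may deliberately differ as a considered design decision.

```python
# Purely illustrative sketch - hypothetical names and values throughout.
# The DR site mirrors the primary's functional configuration exactly;
# only non-functional attributes such as capacity are allowed to differ.

PRIMARY = {
    "app_version": "4.2.1",      # same software build as DR
    "db_engine": "Oracle 10g",   # same database engine as DR
    "schema_version": 37,        # same schema as DR
    "node_count": 12,            # capacity: full production sizing
}

DR_SITE = {
    "app_version": "4.2.1",      # identical functional stack...
    "db_engine": "Oracle 10g",
    "schema_version": 37,
    "node_count": 4,             # ...but deliberately reduced capacity
}

# Attributes that must never drift between production and DR.
FUNCTIONAL_KEYS = ("app_version", "db_engine", "schema_version")

def dr_mirrors_primary(primary, dr):
    """Return True if the DR site matches production on every functional attribute."""
    return all(primary[key] == dr[key] for key in FUNCTIONAL_KEYS)

if __name__ == "__main__":
    assert dr_mirrors_primary(PRIMARY, DR_SITE), "DR site has drifted from production"
    print("DR configuration functionally mirrors production.")
```

The point of the check is that any divergence in functional configuration is treated as drift to be corrected, not as a clever hedge against bugs.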
Deliberately choosing to run a different system in case some bug or error causes the design in question to fail is both impossible to do effectively and not a function of service continuity at all; it is a function of system design. This is important!
I would like to see what happens when there is a real "disaster" (fire/flood/power cut/sysadmin-gone-mental), where failover to the DR systems would have worked but didn't, because the software was different. Your comment "but it’s obvious from this event that if the pathetically vulnerable St. Paul’s site is taken out, we can have no confidence in when the market will be back on line." misses the point: it's entirely possible that the DR site would have worked perfectly if the cause of the fault had actually been the primary site being "taken out", rather than it suffering from potentially poor design or unexpected "input" (whatever you mean by that).
I note that there hasn't been much press coverage in sufficient detail for me to understand what actually went wrong, but I know that if these were my systems, I would not want you working to fix them: not only do you misunderstand what service continuity is and how to effect it, but your "holier-than-thou" attitude would surely waste people's time and misdirect their efforts. You may understand when you become one of us "grownups". Until then, stop publishing these childish rants and behave like a (fully) responsible journalist...