
Software update borked radar, delayed hundreds of flights, says US FAA

swm

Re: Testing methodology

Forty-five years ago I was the chief architect of a 100-user (eventually 200-user) time-sharing system. One of the design goals was to make sure that users couldn't "hack" the system and interfere with other users. Another was for the system to be up more than 99% of the time. Any crash (even one that rebooted in less than a minute) was counted as 15 minutes of downtime. Testing was done during experimental periods when we would attempt to crash the system, put large loads on it, and so on. We eventually logged a scheduled uptime of over 99%, and that figure includes failures from all causes, such as power failure and operator error.

We also had some users who would attempt to hack the system, and they were partially successful a couple of times. Once they installed a Trojan that did interesting things when run by a privileged user. They also found a hole in our disk-quota system (a bad compare against the maximum integer value).

Sometimes the system (after a year of pretty solid operation) would crash, and we would find a bug that had been in the system for years. Examining the bug, we couldn't understand how the system had ever worked - but it did.

My takeaway from all of this is that it is extremely hard to make a large system completely airtight. Testing cannot be complete. Users change their behavior over time, exposing bugs. Updating a system while it is running is even more difficult. I think one important thing is to recover quickly and not lose important data.
