Reply to post: Fixing errors

Tolerating failure: From happy accidents to serious screwups … Time to look at getting it wrong, er, correctly

swm Silver badge

Fixing errors

When I was writing the Dartmouth time sharing exec there were errors (of course). Sometimes an I/O operation would complete successfully but, because of a bug in our code, the error path was taken. Rather than fix the code immediately, we would first check that the "error" was handled correctly. It is difficult to test events that happen very rarely, so anything that sent the program down a little-used path was quite helpful.

I also learned not to trust a "good" status after an I/O operation. Once, due to a hardware fault, the peripheral returned a good status even though not all of the words had been transferred. So I checked everything I could (the last data control word, the word count, etc.) and declared an error if anything I could lay my hands on did not match my expectations.
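For anyone wanting to apply the same idea today, here is a minimal sketch of "don't trust the status bit alone". The `Completion` record and its fields (`status_ok`, `word_count_remaining`, `last_dcw_address`) are hypothetical stand-ins for whatever a device reports on completion; they are not the actual Dartmouth exec or hardware interface.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    # Hypothetical completion record; field names are illustrative only.
    status_ok: bool            # the "good"/"bad" status the device reports
    word_count_remaining: int  # words the channel did NOT transfer
    last_dcw_address: int      # address of the last data control word executed


def transfer_succeeded(done: Completion, expected_last_dcw: int) -> bool:
    """Declare an error unless every check we can make agrees with expectations."""
    if not done.status_ok:
        return False
    if done.word_count_remaining != 0:
        return False  # status said "good", but some words never arrived
    if done.last_dcw_address != expected_last_dcw:
        return False  # channel stopped at an unexpected data control word
    return True
```

For example, `transfer_succeeded(Completion(True, 5, 0o1000), 0o1000)` returns `False`: the device claimed success, but five words are unaccounted for.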

I also learned that in the exec there were no "errors": everything had to be handled. On a disk error I retried three times, logging the first failure on the console typewriter even if a retry recovered it, and, if the error was not recoverable, returned an error status to the user (with another log entry on the console typewriter).
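The retry policy above could be sketched roughly like this. It assumes "retried three times" means one initial attempt plus three retries; `read_block` and `log` are hypothetical callables standing in for the disk driver and the console typewriter.

```python
from typing import Callable


def read_with_retries(
    read_block: Callable[[], bool],  # hypothetical: True if the disk read succeeded
    log: Callable[[str], None],      # hypothetical: writes to the operator console
    retries: int = 3,
) -> bool:
    """Return True on success, or False as the error status for the user."""
    if read_block():
        return True
    # The first failure is always logged, even if a later retry recovers.
    log("disk error, retrying")
    for _ in range(retries):
        if read_block():
            return True
    # Unrecoverable: log again and hand an error status back to the user.
    log("disk error not recoverable")
    return False
```

Logging the first failure even when a retry succeeds is what lets the operators spot a drive that is slowly going bad before it fails outright.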

Even so, mistakes were made.

Another policy was to tell the operators that if they made a mistake they should talk about it, and they would not get into trouble.
