Reply to post: Bah!

How do you call support when the telephones go TITSUP*?

Stevie

Bah!

Not phone-related.

I transferred into a new department and was asked to replace their "expert scripter" who was retiring. One of the jobs he had written was represented to me as "vital, if this doesn't run we are in big trouble".

Said script ran at regular intervals and sent an email if there was a problem of a certain type. The problem was detected by examining the output of a "ps" command and using "cut" on the output to extract the pid, which was typically 3-4 digits on that hardware.

We had deployed a new Unix infrastructure from another manufacturer and the expert had ported this script to the new hardware, but had never checked it was working.

The expert had also usefully redirected stderr to the bit bucket, because he never figured out how to make his dot profiles work for logon shells *and* batch shells, and the script would otherwise fill the server mailbox with "can't do stty keyboard configuration stuff in batch mode" error messages. Said dot profile had a truly staggering amount of code that I think was trying to find out whether it was running in a logon shell or not. It certainly had no other purpose, but didn't work anyway. I surmised it was the work of several people. I digress.
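For what it's worth, telling a logon shell from a batch shell needs very little code. A minimal sketch, assuming a POSIX-style shell; the function name and the PROFILE_MODE variable are made up for illustration and are not from the original profile:

```shell
# Hypothetical dot-profile fragment; names are illustrative only.
is_interactive() {
    # tty -s prints nothing and exits 0 only when stdin is a terminal
    tty -s
}

if is_interactive; then
    # Terminal-only setup (stty keyboard configuration and the like) goes here.
    PROFILE_MODE=interactive
else
    # Batch or cron run: skip anything that needs a terminal.
    PROFILE_MODE=batch
fi
```

With a guard like this in place there are no stty complaints in batch mode, so there is no reason to throw stderr away. The shell's built-in test "[ -t 0 ]" makes the same check.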

As part of another project I sorted out the problem with the dot profile so that it *would* work in both use cases ("if tty -s" etc., of course), and that is when I discovered that:

When the expert had deployed the script on the new servers, he forgot to also deploy the mailing list file with the addresses for that "vital" email, and the script was failing on line 2 as a result.

Smiling to myself, I fixed that, and discovered that:

The new hardware was super virtualized. One side-effect was that pids were now 6-8 digits long rather than three or four. The "cut" command presented only the most significant of those digits to the rest of the "logic" and so was not working. At all. The "vital" email would never go out.

So I replaced the "cut" part of a massive pipeline with "awk" and the script started doing what it was supposed to.
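A minimal sketch of that failure and the fix. The sample line, the column positions and the function names are all assumptions for illustration; the real pipeline and the real "ps" layout on that hardware are not shown in this post:

```shell
# Made-up ps-style line: user, pid, ppid, command. Real ps layouts differ.
line='stevie   1234567       1 /opt/app/daemon'

pid_by_cut() {
    # Fixed character columns 10-13: room for a 4-digit pid and no more.
    printf '%s\n' "$1" | cut -c10-13
}

pid_by_awk() {
    # Second whitespace-separated field, however wide it happens to be.
    printf '%s\n' "$1" | awk '{ print $2 }'
}

pid_by_cut "$line"   # prints 1234
pid_by_awk "$line"   # prints 1234567
```

Splitting on fields rather than character positions means the pid can grow without the extraction quietly truncating it.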

And that afternoon the condition it was built to detect came about and fifty bajillion emails went out to the man who had told me how important it was he get said emails.

And BOY was he pissed. "Stop these G_D_ emails!"

So I descheduled the "vital" script.

All-in-all, an avalanche of suck.
