
Lesson learned
Indeed.
I would even say, lesson engraved.
Come inside from the swimming pool, dear reader, and put away that sunscreen, for yet again it is Monday and time to return to the grind of the office and/or remote workspace. Thankfully The Register is here to cushion the blow, with another instalment of Who, Me? – the weekly column in which readers recall the times they would …
Who with more than a few days' experience in IT hasn't learned that kind of hard lesson?
My own was back in the DOS and NetWare days (thankfully also on a workstation and not a server):
Format c:
Are you sure?
Of course I am bloody sure! I know what I'm doing ffs!
Oh hang on...that was C: not D:
Ah crap.
Memories of a colleague back in the day, when we used 3.5" floppies quite extensively. He was muttering and grumbling and someone asked him what the problem was. He said that all he was trying to do was format a floppy disk so he could re-use it, but his computer had started asking loads of "are you sure?" type questions. Unfortunately he made the final confirmation a split-second before someone asked "are you sure you didn't type 'format c:' ?"
Ah, the good old days of DOS and floppies. A common thing that happened where I worked was that someone would ask if they could check the contents of a floppy on a workstation. From the "C" prompt they would type dir a:, up came the directory of the floppy, followed by "that's a load of crap, I can get rid of it all". The next command was del *.*, then they'd remove the floppy and walk away, leaving the workstation user with a machine that now had no files in the root of "C". Problems for the user soon followed, even worse if they rebooted it.
Of course, if the machine was rebooted it failed with a "missing command.com" error, as the root of "C" had been deleted. I was the one who had to sort it out (I had a boot floppy with the necessary files to restore the root of C).
I myself never used dir a: from the "C" drive, as it was all too easy to forget that the machine is still logged into "C" and any further commands will run on "C", not "A".
If I remember correctly, I once fully recovered (i.e. the machine recovered, I still wear emotional scars) from a "del *.*" in the C:\DOS\ directory. The memory's a bit blurry as this was deep in the 90s, but it involved playing a bit of floppy DJ to get the files back from another machine.
"someone asked "are you sure you didn't type 'format c:' ?""
Worse, on Apricot computers, the A: drive was the floppy until you added a hard disk, which then became A: and bumped the floppy up to B: or C: or whatever came next after each hard disk and/or partition had been allocated a drive letter. Not fun if you were working with both those and "standard" PCs. There may have been others that did it that way, but IME it was only Apricot.
I've recounted this before: back in the days when MS-DOS <> IBM compatible, the Apricot used A: for the HD and then B: for the floppy. Switching between different machines, you'd go to format the floppy, type in FORMAT A:, Y to Are you sure, then oh sh*t (or other expletive). This was followed by frantic hitting of Ctrl-C and then reaching for the Norton UNDELETE utility, which, fortunately, worked at that stage of the format. You had to know the first characters of the deleted filenames, but the systems were a fairly standard setup.
Forty years ago - so memory is hazy - I was very inexperienced with the Basys newsroom system. I mistook a couple of commands and instead of just restarting the single user I had intended, I reset the entire system: newsroom, gallery, control room... at about five minutes before the broadcast.
Fortunately scripts were always printed, and most of them had been, and also fortunately the system restarted - just - in time. But the screams were loud...
flak jacket--->
The Unix/Linux device-to-device copy program, dd, is blazingly fast, asks no questions, and has no prompts.
dd if=/dev/sdb of=/dev/sdc bs=32768 [Enter]
[10 ... 20 ... 30 seconds pass] ^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C
"(/dev/)sdb was the target drive! Shit-shit-shit-shit!"
(Icon 'cause I couldn't believe I did that.)
Decades ago, in my first job as a software engineer, I was also asked to administer my company's Sun-3 workstations, since I was the only person in the company who had any Unix experience. One day I started to get complaints about one of the machines not working properly, and when I investigated I found that /tmp was pretty much full (the SunOS install assigned /tmp to its own partition, but when it filled up all sorts of applications stopped working). So, metaphorically putting my System Administrator hat on, I logged on as "root" and entered the following command: "rm -rf / tmp/*".
Note the critical space in an unfortunate place. I certainly did, and hit Control-C within a couple of seconds, but too late to save the OS. Fortunately, this was early in the working day and I had performed a routine backup the previous night, so I was able to get the installation tapes (yes, *tapes*) out of the cupboard, re-install the OS and re-load the backup. Downtime was only a couple of hours, but tense ones for me.
Lesson learned - always double check the commands you are entering as "root". Also, the next time this happened I used the command "cd /tmp && rm -rf *" - much safer!
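For scripted clean-ups, the same caution can be built in. Below is a minimal Python sketch, assuming you want to refuse an rm -rf style delete whenever any argument resolves to the directory itself or to anything outside the directory you meant to clean. The function safe_clear and its allowed_root parameter are hypothetical, and the sketch does not expand globs. Note that the mistyped "/ tmp/*" reaches a program as two arguments, and the first one is "/".

```python
import os
import shutil


def safe_clear(targets, allowed_root):
    """Delete each target, but only if it resolves to a path strictly inside allowed_root.

    Hypothetical helper: every target is checked before anything is deleted,
    so one bad argument (such as a stray "/") aborts the whole batch.
    """
    root = os.path.realpath(allowed_root)
    resolved = [os.path.realpath(t) for t in targets]
    for original, path in zip(targets, resolved):
        # Reject the root itself and anything whose common prefix with root is not root.
        if path == root or os.path.commonpath([root, path]) != root:
            raise ValueError(f"refusing to delete {original!r} (resolves to {path!r})")
    for path in resolved:
        if os.path.isdir(path):
            shutil.rmtree(path)
        elif os.path.exists(path):
            os.remove(path)
```

Used as safe_clear(["/", "/tmp/junk"], "/tmp"), it raises ValueError before touching anything, which is the point: validate the whole command line first, delete second.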
I dislike having certain "dangerous" commands in the history, in case a little bit of lag, or jitters on the arrow keys, causes them to reappear at an inconvenient time. I tend to do things like:
# mv important-sounding-dir xyzzynosuch
# rm -rf xyzzynosuch
Or
# bash +o history ## new shell with no history retained
# dd if=/dev/zero of=/dev/sdb
# exit
Of course I've still had several ohnoseconds over my ${too_many} years in this game. That's why I'm quite keen on backups...
No, because it will simply annoy anyone who needs to delete a lot of files. Every time you do an rm -r and it gets turned into an rm -i -r which asks for your permission for every single file there, it will annoy the users. It won't take long for them to realize that -f cancels out -i, so now every rm command will be an rm -f command which means you will lose the warnings for files that rm would normally warn about. Forcing that only makes things worse.
The other useful parameter is -i, when you need to do something more complex and you're not entirely sure if it will be safe. For example, when I was trying to clear out temp files, but no other files, from a nested directory. I was very careful to specify "rm -i */*/*.tmp". Anything with that many asterisks seemed a bit too dangerous to do without at least seeing some of what would get wiped.
That double & is going to bite you, and some day you will make a typo... It happened to me a couple of times, but fortunately I was not doing rm -rf but 'yum clean all & yum upgrade'
... and had a few moments of confusion wondering why yum was seemingly waiting on... itself, until I realized I had typed a single & instead of &&.
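In a script you can make the && semantics explicit, so a typo can't quietly push a step into the background. A minimal Python sketch, assuming each command is given as an argument list; run_chain is a hypothetical name. Each command runs in the foreground, and the chain stops at the first non-zero exit status, as a && b && c would.

```python
import subprocess


def run_chain(*commands):
    """Run commands one after another, stopping at the first failure (like a && b && c).

    Each command runs in the foreground, so nothing is left running in the
    background the way a stray single & would leave it.
    Returns 0 if every command succeeded, else the failing command's exit status.
    """
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

For example, run_chain(["yum", "clean", "all"], ["yum", "upgrade"]) would only start the upgrade once the clean-up had finished successfully.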
We all need to have had that sudden panic at least once in our lives, preferably on our own/test systems.
It teaches the fallibility of the human condition in the clearest terms.
It is where we get our fear of things we don't fully understand; it has us reaching for the manual and making n+1 backups, and only when forced do we ignore the sage wisdom "Draco dormiens nunquam titillandus".
The Ohnosecond, that mysterious time-dilation quantum effect. When measured by any measuring device it registers as only a tiny fraction of a second, while the human, just realizing the full gravity of his fuck-up and the full consequences of the action just performed, experiences it as an eternity, with all the time he or she needs to contemplate the meaning of life and their place in the cosmos.
"And one of the only other times you will see it, outside of IT, is during a car crash."
You've never been involved in robotic programming then?....... Being called over to cell #5 today because the operator loaded the magazine with castings and put them in the wrong way up, with the hole at the bottom.
Then pressed start.... The arm grabbed the first casting and stuck it in the lathe, and then he noticed the tool heading for a non-existent hole.........., to his credit he did e-stop the cell... too bad it was after the tool and part died (we've had worse than that..... that's when the ohnosecond goes straight to brown alert)
> You've never been involved in robotic programming then?
Maybe not an "ohnosecond" event, more a case of never telling someone not to pull that lever or press that button.
Fresh out of school and at my first job we were all marking time a bit until the tech schools started (we got a one year off the job training - those were the days!), so we got given various jobs around the factory (electrical engineering firm).
Mine was on a large automated lathe turning motor/generator casings that were cut from lengths of pipe. Had a small crane to do the lifting, set into the 3 jaw chuck and press the start button. There was a big drum on the end of the machine with pegs that changed the tools and started the next process. Over the drum was a big lever - NEVER PULL THAT LEVER ! (oh and precious few guards but this was the late 70's in the UK)
So one day the machine seemed to stop part way through the program. I waited for a few minutes and - yes, you guessed it - rather than seeking help I pulled the lever. The drum rapidly did a 90 degree rotation and the next tool fast-traversed straight into the casing, leaving a lovely deep spiral groove until the whole thing jammed up.
Then I went and sought some help!
I always use a script to dd a download onto an SD card, usually duplicating the last line* of the script, commenting it out and then editing the if=value to be the new file name. Nevertheless between pressing enter on ./script and the red LED starting to flash on the SD adapter there's an enormously elongated couple of seconds of thinking "Could I have gone wrong and be doing this to /dev/sda?"
* The script ends up being a series of commented out lines and one non-commented line. From time to time it gets pruned.
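A script like that can also refuse obviously wrong targets before dd ever runs. A minimal Python sketch, assuming Linux and an explicit allowlist of devices you have deliberately designated as SD-card targets; build_dd_command and the allowlist are hypothetical. It only builds the command (using the real GNU dd options bs=, status=progress and conv=fsync); it does not run it. The contents of /proc/mounts are passed in so the check can be tested.

```python
def build_dd_command(image, target, allowed_targets, mounts_text):
    """Return a dd argument list, or raise if target looks like the wrong disk.

    allowed_targets: devices deliberately listed as SD-card targets (hypothetical config).
    mounts_text: contents of /proc/mounts, e.g. open("/proc/mounts").read().
    """
    if target not in allowed_targets:
        raise ValueError(f"{target} is not in the SD-card allowlist")
    for line in mounts_text.splitlines():
        fields = line.split()
        # /dev/sda1 being mounted means /dev/sda is in use: refuse to overwrite it.
        if fields and fields[0].startswith(target):
            raise ValueError(f"{target} has a mounted filesystem: {fields[0]}")
    return ["dd", f"if={image}", f"of={target}", "bs=4M", "status=progress", "conv=fsync"]
```

With /dev/sda mounted as the root filesystem, a slip that points the script at /dev/sda fails the mount check even if it somehow got onto the allowlist.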
I've been working on getting a certain cloud file sync tool working on Linux. Of course, I needed an account to test with, so of course I'd been using my own. As my testing was mostly concentrated on the install and first setup, I kept reverting back to a fresh state and syncing for the 'first time' again. At one point I must have just deleted the synced files...without making sure I'd disabled the sync tool. Then went back to my main computer, and wondered why all of my documents had disappeared.
Fortunately there was a 'restore deleted files' option.
The other side of this is that the prompt absolutely must, no exceptions, contain enough data to know what it's asking you about. Not just trusting that you can scroll up to see the command that launched it, not assuming that the user knows which script they clicked on, the prompt must print that again. If it's really dangerous, maybe make the user type it again to be careful, but that part is at least optional.
Ideally the command should pretend it has been executed and take the user to a honeypot where all the data appears to have been wiped (or whatever the worst-case scenario might be). Then, once it detects the despair with which the user tries to find the extent of the damage, it should display:
"See what could have happened had I listened to you? Do you STILL WANT TO PROCEED or do you want to stop crying and get your data back?"
I'm fond of estimating scope before running. "Do you want to delete 5 rows" vs "do you want to delete 5,000,000 rows". Ideally, give a guide - I have a tool to clear old data that knows roughly how many files are created per day, so the prompt is "do you want to delete 3000 items (expected: 300 per day)?"
Not guaranteed to catch every mistake, but it gives the user more reason to check themselves.
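A minimal sketch of that style of prompt, assuming the tool knows how many days its selection covers and roughly how many items a normal day produces. The names build_delete_prompt and confirm_delete, and the "more than twice the expected volume" warning threshold, are illustrative choices rather than anything described in the comment.

```python
def build_delete_prompt(count, days_covered, expected_per_day):
    """Return a confirmation prompt that shows both the actual and the expected scope."""
    expected = days_covered * expected_per_day
    prompt = (f"Delete {count:,} items covering {days_covered} days "
              f"(expected: about {expected:,}, {expected_per_day:,} per day)?")
    if count > 2 * expected:
        # Far more than normal: make the anomaly impossible to miss.
        prompt = "WARNING: more than twice the expected volume. " + prompt
    return prompt + " [y/N] "


def confirm_delete(count, days_covered, expected_per_day, ask=input):
    """Ask the user; anything other than an explicit 'y' counts as no."""
    answer = ask(build_delete_prompt(count, days_covered, expected_per_day))
    return answer.strip().lower() == "y"
```

build_delete_prompt(3000, 10, 300) reads "Delete 3,000 items covering 10 days (expected: about 3,000, 300 per day)? [y/N] ", while five million items for the same ten days gets the WARNING prefix.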
You mean like Microsoft GPOs? "Do not allow client printer redirection" -> Enable or Disable or Not Configured? You always have to think twice on that. I know examples of triple-negatives in their GPO logic, so you have to think three times, like "Allow disabling of XY". In the end you test which gives the result you want, export it with a sensible file name or check which registry setting is changed and re-use it to stop your brain from curling more than needed.
The second question should be something like:
Do you want to exit and skip this?
Which would prevent the operator from just hitting the 'Y' key over and over again. Then again, this is similar to the CAPTCHA we see from time to time, and it is a challenge to figure out whether the "operator" is a machine (in the form of a biological unit) or a human (with some thinking ability!).
Once upon a time I had a SCSI drive with removable disc packs. To format a disc pack you needed to answer three prompts chosen randomly from a pool of six. Three of the pool questions needed a 'Y' to proceed, the other three needed 'N', so you actually had to check the prompt before clicking through.
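That randomised challenge is easy to reproduce. A sketch assuming a pool of six questions, three needing Y and three needing N, with three drawn at random per attempt as described. The question wording below is invented for illustration, since the original drive's prompts aren't known, and challenge is a hypothetical name.

```python
import random

# Hypothetical pool: (question, answer required to proceed)
QUESTION_POOL = [
    ("Format the disc pack now?", "y"),
    ("Do you want to keep the data on this disc pack?", "n"),
    ("Have you checked the label on the disc pack?", "y"),
    ("Do you want to exit and skip the format?", "n"),
    ("Is this the pack you intend to erase?", "y"),
    ("Should the existing files be preserved?", "n"),
]


def challenge(ask=input, rng=random, rounds=3):
    """Ask `rounds` randomly chosen questions; proceed only if every answer matches."""
    for question, required in rng.sample(QUESTION_POOL, rounds):
        if ask(question + " (Y/N) ").strip().lower() != required:
            return False
    return True
```

Because the required answer varies, hammering Y passes only when all three drawn questions happen to want Y, which is 1 chance in 20 with this pool.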
Was with someone in the early days of DOS. She put in the dreaded "del *.*" on her C: drive. But it's OK. She realised the error: "OH, God, I didn't mean to do that".
DOS prompts "Are you sure?"
And she replies, whilst talking to me, "Yes I am sure I didn't mean to do that".
Sorry - But I did laugh
I'll admit it ...
I have done this with ROBOCOPY /MIR.
The /MIR (mirror) feature is very dangerous, especially if you make any kind of mistake in the source/destination paths.
I ended up chewing the guts out of a colleague's system32 folder and killed the laptop, much like in the story, except worse (and more embarrassing) as it was someone else's workstation.
Ditto the presence or absence of a trailing / on rsync paths, which changes what gets transferred and where; it has probably caught out every *nix user at some stage.
Also, for those of us using PIP on CP/M and moving to MS-DOS, discovering that source and destination are reversed. Although, to be fair, PIP using destination=source feels like a bit of an outlier, since I think pretty much every other OS I ever used, from TRS-DOS on up, used source:destination format. I never really used mini or mainframe OSs, so I'll have to assume that CP/M did things the way they were expected on some previous OS.
A thing I "like" about rsync: you cannot prevent it, on the server side, from saving the ACLs of the source. This can result in inaccessible files on the server, so you have to use root/root or BUILTIN\SYSTEM to access the stuff. Maybe it is possible now, but the only way I found to prevent it is to run the rsync server as its own user, which does not have the right to set or change any ACLs, let alone the owner of the files.
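To make the trailing-slash rule concrete: with a trailing slash on the source, rsync copies the directory's contents into the destination; without it, rsync copies the directory itself, creating dest/<name>. A trailing slash on the destination makes no difference. The small Python model below only computes where a file would land (rsync_target is a hypothetical helper); it does not run rsync.

```python
import posixpath


def rsync_target(src, dest, relpath):
    """Model where src/relpath lands for `rsync -a src dest` (where only, not how)."""
    if src.endswith("/"):
        # "src/" means "the contents of src"
        base = dest
    else:
        # "src" means "the directory src itself", so its name is recreated under dest
        base = posixpath.join(dest, posixpath.basename(src))
    return posixpath.join(base, relpath)
```

So rsync -a photos /backup puts a.jpg at /backup/photos/a.jpg, while rsync -a photos/ /backup puts it at /backup/a.jpg.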
I may have some of the product names or versions wrong, but basically this is what happened: Back in the late 90's I was working at a site where my colleague was writing a VB6 system for a weighbridge. The system had some issues which he corrected using his Windows 95 computer and using InstallShield to create an installer for deployment. InstallShield automatically detected DLL dependencies and included them in the distribution package. Then he went over to the server, which was running Windows NT 4.0, and fired off the installation of the upgraded software. After rebooting ... the server was trashed. Turns out that NT 4.0 won't boot with Windows 95 versions of the system DLLs in the Windows\System32 directory.
Made mistakes with Amiga hard disk partitioning
Whatever tool I was using to do it (may have been inbuilt one) didn't compare the size of the disk against the size of the partitions.
From memory, you had to specify the start/end block for each partition.
Somewhere along the line, I got confused and managed to create a partition, or extend one, beyond the end of the disk. What this ended up doing (unknown to me at the time) was wrapping the partition around past the beginning of the disk (as that's what this partitioning app did); this particular tool didn't check for validity, it seems!
Only noticed some days later, when weird things started happening and files got corrupted. It took ages to sort out that mess; I think I actually had some backups that I managed to restore (done to VHS tapes, no less). No idea what I lost, but I don't think it was much, as I was only about 19-20 at the time so didn't have much of importance!
Fun times!
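The check that tool was missing is only a few lines. A sketch assuming partitions are given as inclusive (start, end) block numbers, as in the comment above; validate_partitions is a hypothetical name and the block counts in the example are made up.

```python
def validate_partitions(disk_blocks, partitions):
    """Return a list of problems with inclusive (start, end) block ranges on a disk."""
    problems = []
    for i, (start, end) in enumerate(partitions):
        if start < 0 or end < start:
            problems.append(f"partition {i}: bad range {start}-{end}")
        elif end >= disk_blocks:
            # Exactly the case that wrapped around on the Amiga: end past the last block.
            problems.append(f"partition {i}: ends at block {end}, disk has {disk_blocks} blocks")
    ordered = sorted(partitions)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            problems.append(f"overlap: {s1}-{e1} and {s2}-{e2}")
    return problems
```

A partitioner would refuse to write the table unless the returned list is empty.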
If you search for a few minutes, I'm sure you can find a copy on good old HTTP somewhere. If you search better, you might find an instruction manual that's less likely to cause you a fatal accident. Whichever one you use, if you're found making explosives, that'll be the major problem and yes, they'll cite your possession of instructions on how to make weapons when they try you, because most of society thinks that making explosives isn't great. The possession might also be brought up if they have a different reason to suspect you, but it's not as if downloading that file will bring the police to your door.
One Tuesday morning, many moons ago, as a fairly green system manager of a debtors system, it was my job to start the annual debtors run. So, locked the system for update, started everything off ... all went swimmingly, until I checked the output. Guess who mis-typed one parameter, with the net result that every single account (90,000) had double charges.
There was nothing to do except fess up to my manager; luckily he was more interested in fixing the mess than in executing the guilty. Unfortunately, I had to restore from Friday's backup, meaning everyone in the office lost all Monday's work, and I had to walk the gauntlet of a very disgruntled office on Wednesday morning.
The only long-term damage was the footnote in the documentation (with my initials): "Always back up before starting this process, unless you fuck up like I did". The punchline, though, is that 15 years later I was asked to return to the same area (due to the then system manager jumping ship without training a successor) and, despite the system having been replaced several times, my footnote was still there in all its glory.
That which was hard to learn, will be even harder to forget.
I recently had to modify a script I wrote to extract files from disc images, after I mistyped a filename and overwrote something important. It was an old script, and it was only by sheer, blind luck that I had managed not to do this sooner.
Still, adding an -e test and a yes/no prompt at least provided a pleasant diversion before the unenviable task of recreating the changes lost since the last backup (so only a morning's work, but it had been a busy morning).
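The Python equivalent of that -e test plus a yes/no prompt might look like the sketch below. write_extracted and its ask parameter are hypothetical names; the point is simply that an existing file is never overwritten without an explicit yes. (There is a small window between the check and the write, which is acceptable for a personal script but not for anything concurrent.)

```python
import os


def write_extracted(path, data, ask=input):
    """Write bytes to path, asking first if the file already exists (like [ -e "$f" ]).

    Returns True if the file was written, False if the user declined.
    """
    if os.path.exists(path):
        answer = ask(f"{path} already exists. Overwrite? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            return False
    with open(path, "wb") as f:
        f.write(data)
    return True
```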
Years ago, my boss came up with the idea of operating an internet cafe in one of our buildings. We needed a way to limit the sessions on each machine, but the Boss did not want to pay for any kind of software to do this.
Being a keen C++ coder, I offered to write something. I hit upon the idea of writing a screensaver to restart the machine when it triggers. After all, Windows already has a timing mechanism for triggering screensavers, so I wouldn't need to write my own.
Being the astute person I am, I realised that people would just leave documents open, so a normal restart would stall on the save prompts instead of logging them out. So I altered the screensaver to do a forced restart. Because the restart was forced, it would not prompt the user to save anything.
Then I tested it... and realised, as the machine rebooted and lost me some very important other work, that I'd forgotten to disable the shutdown API call. I don't remember why, but for some reason the screensaver would not compile after this. By then I had realised it was a stupid way to do this, so I deleted what I'd done, and ultimately the internet cafe was shut down (it was costing too much to staff).
I've seen an internet cafe where the guy selling access to the machines had kitchen timers with a number corresponding to each computer. Once you paid, he'd set the timer.
He also had a baseball bat, and would walk to your desk once the timer went off and firmly inform you that your time was up, whilst angrily pointing the bat at the computer screen.
On a slow day he would be napping until the alarm, followed by a juicy "oh ffs, who is it now" as he tried, still half asleep, to find which timer was making the noise.
Some years back I wrote a macro for MS Access. It would basically take a series of reports that had already been generated by Access and attach them to generated Outlook mail messages. I easily could have made the script automatically send these messages without any user intervention, but there was always something in the back of my mind saying maybe I should give the user a last chance to review the message before sending. Sure, most of it was probably a CYA type of thing, so if one of my coworkers using the script hit send when they shouldn't have, it's on them and they can't blame my script for anything. Still, it did also have the added benefit of not creating a situation vaguely similar to our "hero" here.
My PowerShell scripts contain a line near the beginning:
$WhatIf=$true
All commands which might cause problems have
-WhatIf:$WhatIf
So unless $WhatIf is explicitly set to $false, all commands only pretend to do something. As you rightly guessed, I did not always do this. My scripts also got more -Verbose over time, showing the computer name and a few other things to make sure I run in the right context.
PowerShell has inbuilt support to do this with $WhatIfPreference
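The same default-to-dry-run pattern carries over to other languages. A minimal Python sketch, assuming a module-level WHAT_IF flag in the spirit of $WhatIf; remove_file is a hypothetical wrapper, and unlike PowerShell's $WhatIfPreference nothing here is built into the language. Including the host name in every message is the equivalent of the -Verbose context mentioned above.

```python
import os
import socket

WHAT_IF = True  # default to pretending, like $WhatIf=$true at the top of the script


def remove_file(path, what_if=None):
    """Delete path, or only report what would happen while in what-if mode.

    Returns True if the file was actually removed.
    """
    if what_if is None:
        what_if = WHAT_IF
    # Show which machine we are on, so a run in the wrong context is obvious.
    action = f"[{socket.gethostname()}] remove {path}"
    if what_if:
        print("What if: " + action)
        return False
    os.remove(path)
    print(action)
    return True
```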
For a particular application we bought a special valve controlled by simple 'open' and 'close' buttons. Taking the finger off the button stopped the valve mid-way.
My eager assistant wanted to perform this operation via the PC and so he wrote a simple little procedure. This made the task no easier but it kept him out of my way for a bit.
"Don't forget to test it" was my sole guidance.
"It works brilliantly" he said later on returning to the office for a cuppa.
Much later, one of the engineers came in and said "Your valve is making a funny noise. It seems to be running all the time."
"What did you test it on?"
"The valve."
"Did you see it?"
"Yes. It worked both ways."
"And did it stop?"
"Er. I think it might have done..... Probably....... Not sure..... Perhaps not..... "
The drive gears were destroyed. I made him work the valve manually.
I performed more than one recovery from similar situations based on two bits of knowledge:
- Unlike DOS FDISK, Linux fdisk doesn't overwrite any data in the partition itself, so if you know the partition sizes you can use it to re-create a missing MBR
- FAT32 stores a spare copy of the boot sector in sector 6.
Fix the partition table, then use DEBUG (or a sector editor, if you've got one handy) to copy sector 6 to sector 0, and you've resurrected the filesystem.
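The copy step can also be sketched in Python against a disk image rather than a live disk. Assumptions: 512-byte sectors, the partition's starting LBA is already known (for example from the rebuilt partition table), and the backup is at sector 6 of the partition. Sector 6 is the usual value, but the actual location is recorded in the boot sector's BPB_BkBootSec field at offset 50, so the sketch checks it. restore_fat32_boot_sector is a hypothetical name; run it on a copy of the image first.

```python
import struct

SECTOR = 512  # assumed sector size


def restore_fat32_boot_sector(image_path, partition_lba, backup_sector=6):
    """Copy the backup boot sector over sector 0 of a FAT32 partition in an image file."""
    with open(image_path, "r+b") as img:
        img.seek((partition_lba + backup_sector) * SECTOR)
        backup = img.read(SECTOR)
        # A valid boot sector ends with the 0x55 0xAA signature.
        if len(backup) != SECTOR or backup[510:512] != b"\x55\xaa":
            raise ValueError("backup sector lacks the 0x55AA boot signature")
        # BPB_BkBootSec (2 bytes, little-endian, offset 50) should point at the backup itself.
        (bk_boot_sec,) = struct.unpack_from("<H", backup, 50)
        if bk_boot_sec != backup_sector:
            raise ValueError(f"BPB_BkBootSec is {bk_boot_sec}, expected {backup_sector}")
        img.seek(partition_lba * SECTOR)
        img.write(backup)
```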
Don't put something like "install new OS? (Y/N)"; instead, grab as much useful information about the environment as possible to make the question better. So before you ask, grab the name of the machine it is running on, who else is logged in, the current drive usage, and the running processes. In particular, take note of whether there are other mounted volumes that this script would not be expected to run on, whether other users are logged in, and whether processes not owned by you or the system are running.
Then you can not only have something better like "install new OS on 'workstation24c'? (Y/N)" but after you type Y you may have another prompt "Are you sure? There are other users logged in, and/or background programs running, and/or unexpected mounted volumes. Type YES if you are sure:"
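A sketch of that two-stage prompt, in Python for illustration and assuming a Unix-like system for the who command (a standard Unix utility). confirm_install and logged_in_users are hypothetical names; unexpected mounted volumes are passed in rather than detected, and the drive-usage and process checks are left out to keep it short.

```python
import os
import socket
import subprocess


def logged_in_users():
    """Users reported by `who`, minus the current user (empty if `who` is unavailable)."""
    try:
        out = subprocess.run(["who"], capture_output=True, text=True).stdout
    except FileNotFoundError:
        return set()
    me = os.environ.get("USER", "")
    return {line.split()[0] for line in out.splitlines() if line.strip()} - {me}


def confirm_install(ask=input, hostname=None, others=None, unexpected_mounts=()):
    """Name the machine in the first prompt; demand a typed YES if anything looks off."""
    hostname = hostname or socket.gethostname()
    others = logged_in_users() if others is None else others
    if ask(f"Install new OS on '{hostname}'? (Y/N) ").strip().upper() != "Y":
        return False
    warnings = []
    if others:
        warnings.append("other users logged in: " + ", ".join(sorted(others)))
    if unexpected_mounts:
        warnings.append("unexpected mounted volumes: " + ", ".join(unexpected_mounts))
    if warnings:
        return ask("Are you sure? " + "; ".join(warnings) + ". Type YES if you are sure: ") == "YES"
    return True
```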
I never had to learn this the hard way, I saw others learn it the hard way around me and didn't want to follow in their footsteps!
Right… I mean, the prompt is not the solution. Surely the script should have set a marker when it executed, and refused to do anything if it found one. That way it would run on new laptops, but not repeatedly unless the marker was cleared.
Adding further checks for “real” laptops too, like the existence of user folders, would have been the way to go, IMO.
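The marker approach, plus a user-folder check, could look like this Python sketch. The marker file name, the set of built-in profile names and the function names are all hypothetical; the users directory (for example C:\Users on Windows) is passed in as a parameter so the logic can be tested anywhere.

```python
import os

MARKER_NAME = ".provisioned"  # hypothetical marker file name
# Hypothetical list of built-in profile folders that don't count as a real user.
BUILTIN_PROFILES = {"Public", "Default", "Default User", "All Users", "desktop.ini"}


def looks_like_fresh_laptop(users_dir):
    """True if the only profiles present are built-in ones."""
    try:
        entries = set(os.listdir(users_dir))
    except FileNotFoundError:
        return False
    return not (entries - BUILTIN_PROFILES)


def provision_once(state_dir, users_dir, do_provision):
    """Run do_provision only on a fresh machine, and only once until the marker is cleared."""
    marker = os.path.join(state_dir, MARKER_NAME)
    if os.path.exists(marker):
        return "skipped: already provisioned"
    if not looks_like_fresh_laptop(users_dir):
        return "skipped: real user profiles found"
    do_provision()
    with open(marker, "w") as f:
        f.write("done\n")
    return "provisioned"
```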
I was writing a disk maintenance program in Turbo Pascal 6 that was going to put X-Tree and PC Tools out of business. So I'm happy to report that my implementations of Interrupt 21H Service 65 and Interrupt 21H Service 58 worked flawlessly the first time around because I tested it on my source code directory and everything was gone. Yes backups, we've heard of those.
I cleaned the wrong database and wiped a 40-person department's entire day's work.
Luckily the DB admin could do a rollback to the last backup and then roll forward to the exact point of my delete operation.
But we lost 40 people x 20 minutes of productivity, because there were no cases in the queues coming from the mainframe.
In this case it was IBM's DB2, and a very nice DB admin who saved my career.
(He actually got a cake from management for fast problem solving)
When I was trained as a Unix admin (for historical context, 486 chips were still rumours) the guys who ran the courses used to demo how to trash a system (admittedly not a system that was doing much).
They'd merrily rm away everything but, miraculously, the system stayed up unless you rebooted.
ISTR you could drop it to single user and then it was possible to reinstall/recover from backup while it was still up and once done it would reboot happily.
Ricardo needs to learn "not to run as root".
Because if it wiped out his boot sectors, that means he was working as an administrator; I'm not sure you could do that otherwise, even in the 90s, when you were inside the OS itself.
Also... always have a confirm script and/or an "if this is my test computer, then don't actually run these commands" in the script.
I caught one from my team a few weeks back where they were trying to use a script they'd copy-pasted to deploy disk encryption (rather than just group policy it!) to a bunch of machines... and the script meticulously:
- Generated a highly secure random key.
- Encrypted the disk with the key.
- Backed the key up to a file on the server.
I think you can see the problem with the order there.
To top it off, the script was supposed to be used to encrypt multiple machines and the "backup" involved echoing the computer name and key to a text file on a shared network location.
Bad enough in and of itself, but it used > instead of >>.
So now every machine that had the script run would permanently overwrite the only record of all the previous computers' keys anyway.
The script never hit a real machine, tripping up on my very first eyes-on review and was immediately condemned.
In the space of a few minutes, we deployed an alternative that was vaguely sane and also checked to make sure the key was stored in a secured area before it then proceeded to encrypt.
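The corrected ordering, sketched in Python. The actual disk encryption is a hypothetical callback (the comment doesn't say which tool the script drove, and this is not a BitLocker API). The key store is modelled as a file opened in append mode, the equivalent of >> rather than >, and the key is read back and verified before anything is encrypted. A plain-text file on a share is still not a secure key store; this only shows the order of operations.

```python
import os
import secrets


def backup_key(store_path, computer, key):
    """Append (never overwrite) one 'computer key' line, like >> rather than >."""
    with open(store_path, "a") as f:
        f.write(f"{computer} {key}\n")
        f.flush()
        os.fsync(f.fileno())


def key_is_stored(store_path, computer, key):
    """Read the store back and confirm this exact key is present."""
    with open(store_path) as f:
        return f"{computer} {key}" in (line.rstrip("\n") for line in f)


def encrypt_with_escrow(store_path, computer, encrypt_disk):
    """Generate a key, back it up, verify the backup, and only then encrypt."""
    key = secrets.token_hex(32)
    backup_key(store_path, computer, key)
    if not key_is_stored(store_path, computer, key):
        raise RuntimeError(f"key for {computer} not found in store; not encrypting")
    encrypt_disk(key)  # hypothetical callback standing in for the real encryption tool
    return key
```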