ElReg asks ...
... "Remember when September seemed so far away?"
Yeah, but then I'm old ... It's been September for nearly 10,000 days.
Today is Monday, September the 9862nd 1993.
Remember when September seemed so far away? Those of you still working from your bedroom since March should probably have changed your pyjamas by now. We'll wait. When you're ready, enjoy a tale from the Who, Me? vault courtesy of a reader who knows all about unplanned undergarment changes. "O" spent the early part of this …
Looking after my company's collection of Sun 3 workstations 30 years ago. Someone had been doing some development work that had resulted in /tmp filling up, with all of the assorted problems this gives. So one day when I came in (I was often the first in the office in those days) I logged on to the server using the root account and issued the immortal command rm -rf / tmp.
Note significant space.
It took me about 2 seconds to realise what I had done and hit <Ctrl-C>, but by then most of /usr was toast.
'fessed up to my manager and spent the rest of the morning re-installing from scratch. Fortunately I had done a backup just before the screw-up, so I could restore everyone's user directories with no loss.
> 1) before doing any rm -fr do a pwd just in case.
> 2) become very anal about doing total system backups before any upgrade
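Lesson 1 can even be mechanised. A minimal sketch (the function name and approach are mine, not the poster's) that refuses to run a recursive rm unless the working directory checks out:

```shell
# Hypothetical guard: only nuke the contents of a directory after
# cd-ing into it and confirming that pwd says what we think it says.
nuke_contents() {
    target="$1"
    cd "$target" || return 1        # refuse to continue if the cd fails
    case "$(pwd)" in
        "$target")
            rm -rf -- ./* ;;        # ./* can never expand to "/"
        *)
            echo "pwd mismatch, aborting" >&2
            return 1 ;;
    esac
}
```

Note the `./*` glob: even if everything else goes wrong, it can only ever match entries below the current directory.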
My personal favourite is to filter what goes into the history on systems where I tend to use the ^R search to repeat commands a lot.
A few things such as a bare 'ls', 'df', or 'du' serve little useful purpose when regurgitated, and it becomes *very* difficult for a typoed search term to pull out a destructive `rm ...` when there aren't any there.
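In bash that filtering can be done declaratively - a sketch for ~/.bashrc (the exact patterns are illustrative):

```shell
# Keep noise and duplicates out of the history, so a fat-fingered
# Ctrl-R search can't dredge up something destructive.
export HISTCONTROL=ignoreboth            # skip duplicates and space-prefixed commands
export HISTIGNORE='ls:ll:df*:du*:history*'

# Bonus: because of "ignoreboth" (which includes ignorespace), any command
# typed with a leading space is never recorded at all - handy for the
# occasional deliberate rm.
```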
""Is there a non-worst time to do that on a production server?"
Yes. Scheduled shutdown/maintenance windows come to mind. Immediately after a full system backup, of course.
One that happened to a friend down at IBM Almaden ... Running late to get out the door (the baseball game was due to start), he accidentally entered a similar command at approximately 5:04 PM local time on the 17th of October, 1989. About one millisecond later he realized what he had done. About one microsecond after that, the magnitude 6.9 Loma Prieta earthquake hit, with the epicenter approximately 14 miles to the south-southwest.
The SCSI drive, which, in his words, "was happily losing it's tiny little mind, and destroying mine alongside it", suffered a hard crash before the power went out. Seems that even high-end SCSI drives don't like imitating a pogo stick while the heads are moving around. DriveSavers in Marin managed to salvage most of the drive, thus saving a high-temp superconductor project over a year of data. DriveSavers didn't volunteer that the command had been run, so he didn't lose his job ... but his entire department got yelled at for not having a proper off-site backup strategy in place.
>"proper off-site backup"
>And there's the words of wisdom to live by. What's more, use a differential backup, e.g. rdiff-backup, then you can rewind back to the moment before your files were encrypted by the ransomware.
On a training course, probably around 20 years ago, in the centre of Manchester. During one of the lunches, talk turned to how significantly the city centre had changed with all the rebuilding work since the 1996 bombing. It turned out that the bomb warning had given the admin enough time to dial in and copy as much data as possible to the servers at other sites before it all went boom - including from their original Manchester base.
I never understood the logic of firing someone who makes a mistake. If they repeatedly make the same mistake, sure, but all you're basically doing otherwise is replacing them with someone who hasn't learned not to make that mistake yet.
And if the rest of the company doesn't put procedures in place to mitigate that mistake from happening in the future then nobody has learned anything and you're back to square one.
Well, you're right that you should learn from your mistakes. However, I've found that it's much cheaper to learn from other people's mistakes. For example, it takes about ten minutes to learn how to cut down a tree with a chainsaw. But it's worth spending a couple of hours watching the videos on the YouTubes to learn about all the things that can go wrong. (ladder chainsaw is a good search combination!)
My contribution to the hall of shame involved accidentally recursively deleting /usr/bin, or most of it at least. We were a poor university in the 90s, so only user directories were backed up. Luckily the system was one of four HP 700 series boxes running HP-UX that we had scattered around the department, so I managed to restore from one of the other boxen and all was good again. Pretty sure everything was restored; no one ever complained, at least.
It's amazing how sudo makes one think about what one is about to do, most days.
Who said that user's directories were on /usr? They were actually on /user.
However Sun's lovely installation system (booting the workstation from tape) did not give you the option to preserve existing partitions and the contents thereof; instead it worked on a simple "nuke-it-from-orbit" basis. Hence having to reinstall the OS meant that the user directories would have been automatically scrubbed.
In the very early days (1970s), the user's home directories were on /usr ... Economy of typing further shortened this to /u on very early BSD. At some point, primarily due to splitting the filesystem over multiple disks, the user home directories were moved to /user, leaving /usr for shared read-only executables and their attendant tat.
/user became /home much later ... and just to confuse things, MacOS (which is a bastardized BSD) uses /Users ...
Yes, Sun's "stock" installation media wasn't exactly friendly. However, being based on BSD it was fairly easy to make your own to suit yourself. You could boot from tape, CD or floppy ... depending on your system ROM, of course. And again, changing the ROM wasn't really all that difficult. Personally, I think Sun's biggest sin in the early days was shipping each and every system with the same default root password ... They didn't even suggest changing it in the installation guides! And this for a machine that was designed from the ground up to connect to TehIntraWebTubes ... The 'N' in Sun stood for network.
been there done that
restart in 10
do my change
check my change
get distracted by someone.
$%^&! That switch I was just working on has gone offline; 3 minutes later it's back, minus my changes. The help desk ignores the monitoring tool showing my mistake and starts trying to understand why some people in the corner of one floor of some office are complaining their PCs and phones don't work, swiftly followed by "it's all back now". I then have to do my change again, before fessing up that I caused the outage.
One thing Juniper does much better: commit confirmed, with rollback not needing a reboot.
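From memory - so treat this as a sketch and check the docs for your versions - the two dead-man's-switch idioms being compared are:

```
! Cisco IOS: schedule a reboot as insurance, cancel it once the change sticks
reload in 10
! ...make and verify the change...
reload cancel

# Junos: commit with an automatic rollback unless confirmed in time --
# the config reverts on its own, no reboot needed
commit confirmed 10
# ...verify...
commit
```

The Junos version is kinder because a missed confirmation costs you only the config change, not a full reboot of the box.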
I am usually firm enough with my tone of voice that when I say that I am busy and come back in half an hour, people leave me to what I am doing.
And when they come back, I explain to them what I was doing and why I could not be interrupted.
And I found out that just saying "not now, come back in 30" does not distract me nor make me lose track of what I am doing. Probably because I did not give them time to present their request.
Much better than skipping the "reload in 10", getting distracted by someone, and borking a router that's a three-hour drive away.
...or so I've heard.
More offices need a convention like a sign, or a glowing red light to indicate "I'm busy doing something that requires full concentration, and screwing up can cost the company a shitton of money, so don't bother me unless the building is on fire".
A graduated scale could work too, indicating what level of disturbance is ok at the moment. Levels could include:
* I forgot how to print on A4 instead of letter (reverse per local custom)
* can you reset my password?
* where we going for lunch?
* the boss's computer locked up and he has a presentation in 10 minutes.
* building's on fire!
* where are we going for drinks after work? (Essentially an NMI).
Whenever such systems have been tried, they are roundly and routinely abused by basically everyone. There's the poser who claims to be mission-critically busy for weeks on end, and the chancer who sets that status when they want some, ahem, uninterrupted Internet time.
And, of course, the huge majority of people who completely forget/ignore the whole thing - who neither set their own status, nor pay a smidgen of attention to anyone else's.
Had a variation of that happen to me ages ago.
rm -rf *
Noticed some files refused to go
Realised those needed superuser permissions
su -
rm -rf *
Realised that this was taking rather longer than expected
Noticed the - after the su....
Realised the backup tape was fortunately still in the tapedrive but ejected, which indicates last night's backup had run.....
Managed to stop the rm in time to keep the system alive enough that I could restore from tape
For a moment there, I thought I knew you....
When I first joined Sun in '87, the sales database in the local offices was Ingres. During a demonstration of how to run a task on a remote machine via the new "on" command, the office became noticeably busier, with increasingly raised voices.
Upon asking what the issue was, it transpired that the Ingres database had disappeared; cue some detective work. It appeared that "on" mounted the whole directory containing the programme being run on the remote machine under /tmp. Read/write too ... by anyone.
At the remote site, fed up with people putting files in /tmp, a colleague had a cron job that periodically scanned /tmp and removed extraneous files (and directories).
Hindsight is a wonderful gift.
Yeah, I've seen those "capable hands" running at a small software house where I was temping back in 2005. I don't know who set up the server but they had to reboot it at least twice a week because it would just curl up and die for no logical reason. What was worse was that it had done so since it had first been installed. No one ever investigated any further - they'd just reboot it without question.
Near the end of that temping stint (about 5 weeks), they did offer me a full time job there. Having been there long enough to see first hand how the way the place was (mis)managed - and I don't just mean the IT side of it - I politely declined. They went bust a couple of years later. No surprise.
Maybe you are remembering the time Microsoft bought the Hotmail service around that time. It ran on FreeBSD, and continued to run that way for a long while even after the purchase, which was a bit embarrassing for MS, when this tidbit leaked out, as it was in the middle of the "Open Source is Cancer" era of Microsoft. (demon icon for FreeBSD, but the real one looks much friendlier.)
If memory serves, at one point Microsoft used Xenix to write the final "gold" copy of their Windows install disks before sending them off to the duplicator house. It was ironic that even Microsoft couldn't guarantee their internal Windows machines didn't have latent virii.
"It was ironic that even Microsoft couldn't guarantee their internal Windows machines didn't have latent virii."
At a small but high-profile Amiga developer I once worked at, one release's master (floppy) disk was given a virus *by* the duplication house. As I recall, part of their QA was to compare a sample output copy against the original master -- for which they used an infected Amiga. I no longer recall whether the whole run was bad (vs. just those two diskettes), and if it was, whether any of them made it out into the wild.
Somehow, I then became the mastering guy for all of our products, not just the one I was assigned to. I presume my level of outrage at the series of screwups that led to the above debacle is what landed me the task. My checklist included such items as: make N copies of the master, and keep some of them back; do a comparison ourselves; do it byte-wise, not merely a recursive file comparison -- I wrote a utility for that. Last but far from least: flip the $&(! write-protect tab before you send the master off to the duplicators.
Mass distribution of floppies infected with a virus happened several times over the years. There used to be a web site listing them, but today's useless search engines won't tell me anything about the history of viruses, they will only tell me where I can purchase an AV product for Windows. How fucking useless is that?
To be fair, in the early days, CD mastering and pre-mastering software was fairly esoteric, and ran on proprietary hardware. This hardware usually ran whatever OS the lead engineer used/preferred at University ... AIX, HPUX, SunOS, BSD, and yes, Xenix.
But WinDOS? Not so much. It was nowhere near stable enough for such critical tasks. (NT didn't exist yet, at least not in the early days.)
Authoring tools used to create the master CDs, from which the mass-production copies were later made, ran on Windows and DOS; that's how a CD could get infected by a Windows or DOS virus despite the other systems being different.
The hilarious thing is that F-Prot for DOS was free and worked up to Windows 98 SE for detecting and erasing viruses for Windows and DOS. And there was also the Norton AntiVirus scanner, which at least let you detect the things, was free to use, and ran in Windows.
We had a QA group that did something similar, a long test job that ran for hours, and cleaned up with something like "rm -rf ${LOGDIR}/".
Inevitably some unexpected failure (is there any other sort?) in the test job resulted in this being run with LOGDIR unset. As a user in the same GID group as the staff. On a lab system which had the default automounter config which NFS-mounted all the user home directories, many of which seemed to have 775 permissions...
The sysadmin finally twigged when he realised that the calls from people saying "some of my files have disappeared" were coming in in alphabetical order of username. A hasty shutdown of the home directory NFS system was followed by some forensic network access to find the guilty system.
Fortunately the overnight backups & regular ZFS snapshots meant that the QA team responsible got away with an apology, and buying a few beers.
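A cheap defence against exactly that failure mode is the shell's ${var:?} expansion, which aborts the command instead of silently expanding an unset variable to nothing. A sketch, borrowing the variable name from the story:

```shell
# With plain $LOGDIR, an unset/empty variable turns "rm -rf $LOGDIR/"
# into "rm -rf /". The :? form makes the expansion itself fail, so rm
# never even runs.
LOGDIR=""      # simulate the unset/empty case
( rm -rf "${LOGDIR:?LOGDIR is not set -- refusing to rm}/" ) 2>/dev/null \
    && echo "rm ran" || echo "rm blocked"
# prints "rm blocked"
```

The subshell is only there so the expansion error doesn't kill the whole script; in a real cleanup script you would usually want it to die right there.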
Of a completely unfortunate accidental sort of course.
A departing employee completely accidentally left a floppy disk in their laptop, which was set as the boot drive and which by complete accident had its autoexec.bat consisting of just one command that would delete the contents of the C: drive if it was completely accidentally booted from.
The great thing about booting from a floppy is that it is the sort of thing that you can hear and stop with a magic word and a mystical gesture before it does any of that completely accidental damage for which the git knew full well I would be the one blamed.
On the confessional front, have I deleted stuff I didn't mean to delete? Yep. On the other hand I still feel the stinging lesson of a disk failure that happened the day before the dvd blanks arrived upon which the much put-off backup of said disk was to be stuck.
> Yep. On the other hand I still feel the stinging lesson of a disk failure that happened the day before the dvd blanks arrived upon which the much put-off backup of said disk was to be stuck.
Oh fudge ... I knew there was something I had to do Friday before beer time ...
Departing employee should have included a line to re-write the autoexec.bat, or otherwise hide the evidence.
I hope your company sent the disk to him with his last paycheck. I'd have included a note that the (completely accidental) effects of the disk resulted in the deletion of a perfectly glowing letter of recommendation for anyone enquiring as to his employment history.
I'm writing a backup/restore script now, and completely paranoid that the backup won't backup or the restore won't restore or the person using it won't see the error message or someone will try to run it as a cron job with half the environment variables missing or something. There are more ifs with quotes and $? than there are lines that actually do stuff.
I was doing a temp stint at a place that shall remain nameless to protect them from the hordes of angry IT folks that would surely set fire to the HQ if said name were made known. Suffice it to say it was one of those huge juggernauts that we all love to eviscerate with vitriol at every opportunity.
I was in a server room tending to an old cluster that needed some TLC. Archaic disks that hadn't been backed up in aeons, software so old it had probably been given first drafts on clay tablets by Egyptian clerks wondering how to spell all the buzzword bingo bullshit, managed by monkies in feisty knickers.
I had verified that the server was no longer connected to the internal network, by order of the manager I had been assigned to. I made sure that it wasn't running any jobs that hadn't been marked as non-critical, temporary, or otherwise able to be sacrificed without need for panic. At which point I started searching the disks for where the largest concentration of files (sizes, numbers, etc) was to be found.
Imagine my surprise when the largest by an exponential margin turned out to be a personal directory full of porn. I dutifully made an offline copy for *cough* Reasons and began backing up the entire system to the specific NAS unit dedicated for just that cluster & purpose.
I get done, detach the NAS, lock it in the drawer the manager indicated, and began cleaning up/out said server for repurposing to other tasks.
I'm about halfway through when some guy I don't know barges into the server room in a wide-eyed frizzy-haired state & heads straight for me like a laser beam.
"What have you done to my server?!" he roars as if he were Zeus & I'm about to get smote with lightning.
I explain what I'd been brought in to do, show him the paperwork from my manager giving me authorization to do it, & explain that I've just given the machine a fresh, legally licensed copy of the OS (complete with drivers) to prep it for reuse elsewhere.
"You can't DO that! That's MY server!" he roars again.
Unimpressed I show him the paperwork that expressly says I most certainly can & have $Manager's orders to do so.
Back & forth, back & forth, him roaring, me not giving a shit. I'd made *damn* sure I was on the machine $Manager specified to prevent me from fucking anything else up, so to have a different manager berating me for doing my job leaves me wanting to smack him upside the head with a NAS.
He shouts he'll talk to $Manager, I nod & say to go ahead, & while he's off to go have some more shouting, I'll finish the job I want to get paid for.
Turned out that Old Yeller had been some up-and-comer nepotistic bugger that had hit his Peter Principle limit. He was used to coming in to work, vanishing into his office, & surfing porn all day. Actual work? Don't make him laugh.
My having taken down his personal porn server was seriously putting a crimp in his pseudo-productivity & he was Having Words with $Manager about why said interference Was Not Allowed.
Except Old Yeller really should have talked to whomever he'd been related to first. Because that person no longer worked at the company, upper management was sick & fekkin' tired of the useless dolt, & this had been their shot across the bow to get him to actually DO something for a change.
$Manager showed their paperwork authorizing said work & sent Old Yeller up the ladder. At some point Old Yeller was told to stuff it.
$Manager thanked me for not giving in to the idiot, for having had the forethought to keep all the paperwork I'd needed to deflect said idiot's anger, and for giving him back seriously needed resources.
I was quite pleased with the trip to the pub for lunch & a pint on $Manager.
I was even MORE pleased with the fat brown envelope he offered for his own copy of what I'd found on the machine.
"It's all on that NAS you locked in the cabinet." made him grin like a shark swimming through a cloud of fresh chum.
I was even MORE pleased when I uploaded the entire trove to my various torrent accounts (Demonoid FTW!) & watched my street cred go through the roof.
Ahhhhhh... fun times!
> one of those huge juggernauts that we all love to eviscerate with vitriol at every opportunity
I'm all for colourful phraseology, but in the interests of factual accuracy I would point out that evisceration = disembowelling cannot commonly be done with vitriol = sulphuric acid...
(I leave the matter of monkies in feisty knickers for the attention of another commentard.)
> I would point out that evisceration = disembowelling cannot commonly be done with vitriol = sulphuric acid...
I don't really see why not. I mean, yes, you'd have to be careful, and it would take a while. But you could get there in the end. Much sooner if you weren't worried about a bit of collateral damage, which if you're disembowelling someone is quite possibly the case.
We had a sysadmin whose script backed up a folder of DB backups to a locally attached drive. It was a secondary backup, as these files were the crown jewels. It also only ran once a month.
It worked fine for years, until both she and I were off on leave, someone messed with the server, and the drive letter of the target drive was changed to a network drive instead. This could only have been one of the other IT staff ... Anyway, the end result was a script frantically trying to copy a huge set of large files across a network that wasn't designed for it, to a remote location with a PDC that was already low on space.
The script would restart if it couldn't verify the files had copied successfully, so it basically looped for 2 days, taking all of the remote sites offline.
In a previous incarnation I was the security person on a bid to run the IT of an anonymous (as always) Public Body. We went to the potential client's site and got a presentation with the other bidders' teams. The IT manager was asked how many incidents at each level they had in a typical week.
The answer somewhat surprised me. In a typical week they had two level one incidents. That is, twice a week, the system was unusable by most staff and there was no workaround. Twice a week. Every week.
After viewing the 'security' manual I hoped the staff were sensible enough to subvert and ignore it and do their work securely in spite of it.
We ended up bidding for it and losing out to another well-known systems integrator / outsourcer. Who walked away after a month complaining that the client was impossible to work with.
Some System Admin people make mistakes, some are just incompetent.
And so has everybody who has worked in IT for a few decades, particularly when one of the customer's senior manglement lusers insists on "doing it now".
But my personal favourite was when I was asked to delete customer order data from the previous year in a live SQL database. I knew that a cascade delete would also run on the orderdetails table, because I had designed the database. Obviously I checked that they had a backup (from the night before) and what was going to be deleted, with a:

SELECT COUNT(*) AS NumOrders FROM Orders WHERE OrderDate < "01/01/1999"

Then the phone rang as I was copying part of the line, just before I was about to paste it into a new command that I had started with DELETE FROM Orders. About fifteen minutes later I pasted what I thought I had stored in the paste buffer, just as the manglement luser came in and pestered. The paste added only the semicolon, but I still pressed [Enter]. The statement was therefore "DELETE FROM Orders;". Fortunately I was able to roll it back from the transaction log, in spite of their system admin's new policy of saving storage space by truncating logs before they "got too big"...
The SQL Tool we used at work has an auto-commit function. I was often mocked for turning that off because 'I should have faith in my SQL'. But it always struck me as an accident waiting to happen and I preferred to commit as a separate step; that way I had the option of doing rollback as a separate step if something unfortunate had occurred.
It is always good to understand the RDBMS (or specific client) you are working with. Someone used to Oracle would be in for a shock if they had to work with for example Sybase where default is unchained mode. No rollback for you if you didn't explicitly begin transaction (or change chained mode for your session).
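The habit these posters describe - explicit transactions with a manual commit - looks something like this, reusing the table names from the story above (exact syntax varies by RDBMS):

```sql
-- Do the destructive work inside an explicit transaction,
-- sanity-check the damage, and only then make it permanent.
BEGIN TRANSACTION;

DELETE FROM Orders WHERE OrderDate < '1999-01-01';

-- Check how many rows survived before committing. If the count looks
-- wrong (say, the WHERE clause got lost in a paste), nothing is final yet.
SELECT COUNT(*) AS NumOrders FROM Orders;

COMMIT;    -- or ROLLBACK; if the count is a surprise
```

On a client that auto-commits, the DELETE above would already be permanent by the time you ran the COUNT - which is exactly why turning auto-commit off is worth the mockery.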
I made a (windows) temp cleanup tool quite a while ago. Deletes all but the 10 newest entries - some programs rely on things in TEMP during installation after a reboot.
For known reasons I wrote it this way in the for /f loop:
del /q "%TEMP%\..\temp\%%A"
rd /s /q "%TEMP%\..\temp\%%A"
Yes yes, trust environment variables.... NOT.
In the early days of Unix on PCs (Interactive Unix, pre-Linux) my team had 386 workstations. My colleague asked me to remove my user account from their machine to free up space. I did that, but left just a login whose home directory was / (the root). That should have been that, but the owner then decided to completely remove my account, blithely answering yes to questions such as "remove home directory?". The re-install involved a box of floppies.
Ever created what you thought was the neatest utility ever, only to realise that you have unleashed a data-destroying monster?
Hmmm? And what of those others who would/could be able and emboldened to tell you the obverse/converse/reverse ...... they have created what they thought was a data-destroying monster only to realise what is unleashed too, is the neatest utility ever?
If one has any sense at all, makes sure they be playing on your team, and for your side, for they can rightly be thought of as a Messi Equivalence.
A while ago, I was on a Netbackup update to a newer solution.
The situation was this:
- dude was in charge for the last 15 years, alone and unmanaged
- no documentation to speak of
- he was using a method which is a NO-NO: pre and post scripts installed on EVERY one of the 300 clients
- those pre/post scripts were gigantic 5000+ line shell scripts, hostnames hardcoded, therefore a new system = an update of the script
- basically, the backup was triggered by Netbackup, which was doing only scheduling and receiving the data. All the rest was handled client side
- he never used any Netbackup plugin AT ALL
- he was doing insane things like shutting down DBs if a secret naming convention was used in the policy (remember, the scripts are on the servers and the policy is on Netbackup). So if you set up the wrong naming convention, shit happens.
Of course the dude left, and our project was to clean up this huge mess and upgrade Netbackup. Of course, the guy knew an update in this state would not be possible without breaking everything. He even told me "good luck" on his last day! Nice.
It took time but we finally did it. To hell with the cathedral !
Once got blamed by the client for half trashing a system. Detailed examination of the suspect script, which I had not written, revealed that a certain programmer had used the Unix Internal Field Separator shell variable IFS, thinking that he could shorten Image File Server to the same moniker. A nightly clear-down consisting of rm -r /$IFS wiped out a lot of the server before it was discovered. The programmer, who had been given root access for emergency overnight support purposes, then tried to prevent accidental over-writing of important data by a restore job by renaming tar to tar.dont. I think they took away his root access after that. Trying to panic-hack failed jobs at 3am against a deadline causes chaos the day after.
This didn't happen to me, but I know the person it happened to rather well. I'm not sure quite what went wrong, but the end result was that a script got run in the process of a backup which at some point did something like `rm -rf /$DIR_TO_REMOVE` and, of course, DIR_TO_REMOVE was not set, and the -u option was probably not thought of (and probably did not exist in the shell then).
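For the record, that option did eventually arrive: set -u (a.k.a. set -o nounset) makes referencing an unset variable a fatal error instead of an empty string. A quick demonstration, with a harmless echo standing in for the rm:

```shell
# Without -u, $DIR_TO_REMOVE silently expands to nothing and the command
# below would have operated on "/" alone. With -u, the expansion itself
# is a fatal error and the command never runs.
unset DIR_TO_REMOVE
( set -u; echo "rm -rf /$DIR_TO_REMOVE" ) 2>/dev/null \
    && echo "would have run" || echo "stopped by set -u"
# prints "stopped by set -u"
```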
This happened on what I am sure was an 11/750 running BSD 4.2 or 4.3 with one or more Fuji Eagles. These were ... quite slow machines. And the person concerned was in the machine room next to the machine, because that's where you had to be to change tapes during backups.
With great presence of mind they realised what they needed to do: pull the power from the machine or, more precisely, the disk, immediately (it may be that in fact all they did was toggle the write-protect switch on the drive, but I like to imagine the fantastic noise old machines made when they lost power and the drives spun down), so that most of the actual writes to the disk did not happen.
Of course the machine would not boot, and there was then a saga involving working out how to cold-boot an 11/750, which I am fairly sure involved understanding how to get it to come up off its DECtape (TU58) drive and then getting enough of BSD to run (from tape? or a spare drive? not sure). But once that was done, of course, almost everything was still there on the disk: except that (after suitable incantations of `fsck` and mucking around with `fsdb`) a lot of files had lost their names and now lived in /lost+found in the way that important files usually do.
All this was done from a paper console. I have to confess that when I've used a paper console I've been glad that `vi` would fall back to open mode (ie to `ex`): the person who did this despised, and still does despise, `vi` and will use nothing but `ed` (and, of course, emacs, I mean, everyone uses emacs, right?): this attitude is probably why they succeeded in putting the machine back together at all: it's rather unlikely that something as vast and overcomplicated as `vi` would have been present on whatever minimal Unix they got up to recover the system.
> and will use nothing but ed (and, of course, emacs, I mean, everyone uses emacs, right?): this attitude is probably why they succeeded in putting the machine back together at all: it's rather unlikely that something as vast and overcomplicated as vi would have been present on whatever minimal Unix they got up to recover the system.
Erm... I wouldn't put emacs and minimal in the same sentence.
Biting the hand that feeds IT © 1998–2020