C'mon SPARCky, it's just an admin utility update. What could possibly go wrong?

Hey hey hey, it's Monday! The new week is but a caffeinated beverage away. Come join us in celebrating another Register reader's flirtation with career-ending disaster with a morning dose of Who, Me? It was the late 1980s, and our contributor, fresh out of university, had inexplicably landed a job as system administrator in a …

  1. werdsmith Silver badge

    Who in this trade has not felt that sinking feeling followed by cold sweat when you realise that you've done something you shouldn't?

    Me more than once. But thankfully so far, never a clear my desk and get my coat one.

    1. jake Silver badge
      Pint

      As usual ...

      ... two kinds of people. Those who admit to it, and liars.

      A little experience doesn't make brain farts and/or fat fingering go away completely, but it helps.

      Beers all around. We've earned it.

      1. Sir Runcible Spoon
        Black Helicopters

        Re: As usual ...

        I've been trying to train a new guy on a (fairly) complex security system, when the powers that be decided to give him full admin access.

        So he pings me for some advice on how to do a particular thing, to which I duly advise him on the most sensible approach - one he didn't happen to agree with.

        Since he was now on his own, I simply told him that if he breaks it, he fixes it - it's the only way to learn. He has tried to lie to me several times about having broken the damned thing, claiming something else happened that caused the error - little knowing that I edited the logging script to create a duplicate on a remote server he doesn't have access to. I don't even need to trawl the log file, I just need to perform a diff to see what he's been up to and then tried to cover up. Silly twat might as well just send me a report :)

        He still hasn't worked out how I know. Even if he does discover my edit, he would still have to discover the slight modification to a standard cron job that replaces my code every night :D

        1. jake Silver badge
          Pint

          Re: As usual ...

          See my reply to your other post. And have another pint, it's 5 o'clock somewhere.

        2. werdsmith Silver badge

          Re: As usual ...

          It's actually only the people with the permissions and ability to do the work that usually end up with the problem, because they are the ones with their heads above the parapet.

          It's really important for IT departments to look after people who have such genuine accidents and there should be no consequence for them as long as they own up and alert everyone immediately. Because trying to keep something quiet can make a small problem huge as time goes by.

          But well done to the author of the article. Well recovered.

          My incident, under pressure, was to repoint an ODBC DSN in Windows. It was a while back and I repointed it OK, but there are two.

        3. Doctor Syntax Silver badge

          Re: As usual ...

          "He still hasn't worked out how I know. Even if he does discover my edit, he would still have to discover the slight modification to a standard cron job that replaces my code every night "

          Are you sure he doesn't know about el Reg? If he doesn't, by the time he's worked out what you've done he'll be up to speed.

          1. Sir Runcible Spoon

            Re: As usual ...

            I'm pretty certain he doesn't read El Reg, so that was my thinking exactly. If he can work out what I've done and undo it without breaking anything else, then he no longer needs to be watched like a hawk.

            However, his current thought processes are still more focused on preventing anyone finding out about his botch-ups so I'm expecting the heat death of the Universe to occur first. It would never occur to him to put himself in my position and then work out what *he* would do in such circumstances, because it would require him to acknowledge his own role in the play in order to view it objectively.

            1. vogon00

              Re: As usual ...

              Oh, that last paragraph is just 'class' - beautifully constructed!

              We've all had a colleague like that at one point or other. Mine was quite a while ago and the ignoramus was so far up his own chuff that everything he did was brilliant, everything anyone else did was flawed.

              The empirical evidence was the reverse, and everyone apart from him knew it. I'm glad to report that he did get caught out in the end, for the same reason (syslogs and command audit trails were forwarded to a third machine overseeing the entire shooting-match).

              1. Sir Runcible Spoon
                Pint

                Re: As usual ...

                "Oh, that last paragraph is just 'class'"

                Thanks, although coming from a Vogon I'd be wise to not let your praise go to my head ;)

            2. werdsmith Silver badge

              Re: As usual ...

              Maybe he does read Register. That original post of yours has a single downvote.

              1. Sir Runcible Spoon
                Joke

                Re: As usual ...

                There, you see, you broke the rule and now it's 2!

            3. Intractable Potsherd

              Re: As usual ...

              @Sir Runcible Spoon: "... it would require him to acknowledge his own role in the play in order to view it objectively."

              I have my faults, but denying my own agency isn't one of them. Those who can't see their part in the world are little better than stoats (lightly grilled and served on a bun, of course).*

              *One of my favourite Douglas Adams concepts - I still remember being reduced to a giggling puddle on the floor the first (and second, and third, and....) time I read it!

      2. Version 1.0 Silver badge

        Re: two kinds of people ...

        There are users who have lost all their data with no backups; and there are users who are going to...

        1. jake Silver badge

          Re: two kinds of people ...

          The third type are those who have observed other people lose all their data without backups, and so resolve not to have that problem. And to date, I haven't. I've got all my personal stuff dating back to the 1960s. The most I've lost has been a couple hours here and there ... but always at an inopportune time, of course.

          Proper backups are a vital part of properly running any computerized system. However, I can make a case for simply having multiple copies (off site is good, cloud maybe not so much) of all your important personal files being all that's needed for the average single-user/family home system. The OS can be reinstalled, your pictures and personal correspondence (etc.) cannot.

      3. Andrew Moore

        Re: As usual ...

        Only ever did it once. After that, I just move the files to a temporary folder and if everything still appears to work, delete the temporary folder after a couple of days.

        1. Michael Wojcik Silver badge

          Re: As usual ...

          Like many people, I'm sure, I used to have "rm" aliased to do just this: move the target(s) to a "~/.undelete" directory, from which old files were cleaned after a couple of weeks by a cron job. In fact I still have this arrangement for many of my Linux and UNIX accounts, though not, I see, for Cygwin. I guess it's been so long since I accidentally deleted the wrong file that I've never gotten around to setting it up on my Windows machines. I should do that...

          I had another alias ("rrm", for "real rm"), which bypassed this, for situations where moving the targets under my home directory wasn't viable. I was always very careful before using it, though. Generally I started with an "echo rrm ..." first to verify that the globbing result was what I expected; then recall the line, delete the "echo", and hit enter.
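
          A minimal sketch of the soft-delete arrangement described above, assuming a shell function rather than an alias. The name "del", the UNDELETE_DIR variable and the timestamp prefix are illustrative choices, not the poster's actual setup, and the cron cleanup job is left out.

          ```shell
          # Hypothetical soft delete: move targets into a holding directory
          # instead of unlinking them. UNDELETE_DIR is an assumed variable name.
          UNDELETE_DIR="${UNDELETE_DIR:-$HOME/.undelete}"

          del() {
              mkdir -p "$UNDELETE_DIR" || return 1
              for target in "$@"; do
                  # Timestamp prefix so that files with the same name do not
                  # overwrite each other (unless deleted within the same second).
                  stamp=$(date +%Y%m%d%H%M%S)
                  mv -- "$target" "$UNDELETE_DIR/$stamp-$(basename -- "$target")" || return 1
              done
          }
          ```

          Typing "echo del *.log" first prints the expanded file list without touching anything, which is the same preview trick the post describes for "rrm".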

    2. chivo243 Silver badge

      That "OH Shit" moment, followed by fsck fsck fsck! and possibly a bloody spot on the desk where your forehead went BOOM!

      1. Uncle Slacky Silver badge

        Also known as the "ohnosecond": https://en.wiktionary.org/wiki/ohnosecond

      2. Mark 85

        That "OH Shit" moment, followed by fsck fsck fsck! and possibly a bloody spot on the desk where your forehead went BOOM!

        Every admin has the scar on the forehead. It's a rite of passage and badge of honor.

    3. Baldrickk

      Only once for me... Build task on a networked file system, unset environment variable in a different script than the one I was editing.

      There was a "rm -rf $PATHVAR/*" line. That went... as well as you might expect. It only wiped out half of the file system before I stopped it...

      Luckily, backups were taken of the filesystem, and were restored, and the offending line is now protected by guard statements, so it won't happen again. Unfortunately, said script, for which the only modifications I have made are the guard statements themselves, is now referred to as the "<Baldrickk> script" - sigh.

      1. whitepines
        Coat

        Oohh...in the running for replacing the venerable "Molly Guard" lingo from hacker days of old? In the domain of scripts and screwups?

        1. Baldrickk

          They're just standard guards - on function/script entry, check variables and if any are not set, throw a big fat wobbly and refuse to work until someone fixes them.
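
          A guard of the kind described might look like the sketch below. The function name is hypothetical; PATHVAR is the variable from the story. It relies on the standard ${VAR:?message} expansion, which makes a non-interactive shell abort when the variable is unset or empty.

          ```shell
          # Hypothetical cleanup step guarded against an unset or empty PATHVAR.
          clean_build_dir() {
              # Abort with an error if PATHVAR is unset or empty, so the rm below
              # can never expand to "rm -rf /*".
              : "${PATHVAR:?PATHVAR is unset or empty, refusing to delete anything}"
              # Also refuse to run if PATHVAR does not name an existing directory.
              [ -d "$PATHVAR" ] || { echo "PATHVAR is not a directory: $PATHVAR" >&2; return 1; }
              rm -rf -- "${PATHVAR:?}"/*
          }
          ```

          The second ${PATHVAR:?} on the rm line is deliberate redundancy: the check still applies if someone later copies that line elsewhere on its own.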

      2. Soruk
        Facepalm

        I'm reminded of an occasion at work when we did an out-of-hours upgrade to our CRM system several years ago. Some unfortunate engineer forgot that the out-of-the-box setting for the payment platform on the Windows component was to point at the test instance, which doesn't actually take any payments. This needs to be manually edited to point to the live instance when the software package is installed or upgraded. This is usually done, but is forgotten from time to time. This particular occasion was just the first time, but the one that bit us in the bum rather hard.

        Unfortunately, this oversight wasn't discovered for a few days (may have been a couple of weeks, I can't remember exactly), until it was noticed a handful of live customers weren't being billed.

        The practical upshot of this is I modified the start script of the main Linux package to query the config file on the Windows box and refuse to start if the setting was wrong. This has saved our collective arses a number of times since.
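
        A rough sketch of such a pre-flight check, assuming the setting has already been fetched into a local key=value file. The key name "payment_endpoint", the function name and the expected value are illustrative assumptions; the post does not say how the real script reads the Windows config.

        ```shell
        # Hypothetical pre-flight check for a start script.
        # Usage: check_payment_config <config-file> <expected-value>
        check_payment_config() {
            config_file=$1
            expected=$2
            # Strip Windows carriage returns, then take the last
            # "payment_endpoint=value" line in the file.
            actual=$(tr -d '\r' < "$config_file" | sed -n 's/^payment_endpoint=//p' | tail -n 1)
            if [ "$actual" != "$expected" ]; then
                echo "Refusing to start: payment_endpoint is '$actual', expected '$expected'" >&2
                return 1
            fi
        }
        ```

        A start script would call it as "check_payment_config fetched.conf live || exit 1" before launching anything.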

    4. whitepines
      Flame

      My favourite was when a little-known OS bug silently trashed the encryption headers of every single disk in a very critical production file server. The worst part is that the machine kept churning away for some weeks afterward, and the bug was found during a routine early morning restart.

      I was the one that executed the routine restart. Then found out none of the unlock keys would work...

      Very sweaty palms, and an indescribable sinking feeling as some of my own rather critical work was on the machine. About 60% of the data was recovered from backup, the rest was just gone. Backup schedules to removable media were increased in frequency and disk to disk backup in the same server was banned. To this day the org in question doesn't consider anything actually backed up until it's been written to tape, reread with matching checksums, and sent off site. Anything with a spindle is verboten for use in backing anything up.

      Took a long time to regain any trust in said OS after that fiasco. Icon for status of data. Still gives me a twinge of something not entirely unlike PTSD thinking about it.
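
      The "reread with matching checksums" step can be sketched as below, assuming the backup has been read back from the media into a file. The function name is hypothetical, and cksum is used only because it is in POSIX; it is a CRC, not a cryptographic hash, so a stronger tool is preferable where one is installed.

      ```shell
      # Hypothetical check: succeed only if the copy read back from backup
      # media has the same CRC and size as the source file.
      verify_copy() {
          # Reading from stdin keeps the file name out of cksum's output,
          # so the two results can be compared as plain strings.
          src_sum=$(cksum < "$1") || return 1
          dst_sum=$(cksum < "$2") || return 1
          [ "$src_sum" = "$dst_sum" ]
      }
      ```

      For example, "verify_copy data.tar restored.tar || echo 'backup does not match'" flags a copy that does not read back identically.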

      1. KarMann Silver badge
        Headmaster

        You do know that tapes have spindles too, right? They're the bits that spin up the reels.

        1. whitepines
          Happy

          The cartridge doesn't have spindles though. It only has reels.

          1. TSM
            Paris Hilton

            So you can use a tape cartridge for your backup, as long as you don't actually put it in the drive? I'm not certain that this is helpful.

            1. AIBailey

              Simply place the backup tape next to the drive array, and wait for some kind of data-osmosis to occur.

              Of course, verifying the backup will be difficult, but I'm sure it wouldn't be the first time that backup integrity checks have basically consisted of crossing your fingers and hoping really hard.

              1. Anonymous Coward
                Anonymous Coward

                Our company backed-up to tape overnight, every working day. Tapes rotated on a strict schedule providing daily, weekly and monthly recoverable positions.

                It was some time before it was realised that the drive wasn't working.....

          2. 's water music

            The cartridge doesn't have spindles though. It only has reels.

            So are you saying I just need to crack the disk enclosure, remove the platters and my removable drive backup is golden?

      2. MCPicoli

        Blanket bans aren't healthy

        Simply banning "something with a spindle" from being used in backups isn't a good practice. It's just a measure out of fear and misunderstanding.

        I know, we get traumatised and scarred from past mistakes and disasters. Been there, done (and suffered) that.

        Understanding each backup medium and realising its strengths and limitations and planning accordingly is much better!

        I do not think this is your case since you've already stated following many other good practices, however once someone starts putting too much faith - without constant reasoning about "why" - in some well established procedure, disaster follows.

        1. whitepines

          Re: Blanket bans aren't healthy

          Yes, that's true (and I had condensed the ban down to a simple phrase for comedic effect) but each time active media (spinning rust drives, SSDs, even early R/W optical systems) have been evaluated they've come up fairly short on a number of key characteristics.

          The retention criterion is fairly simple: if you assume only one copy of the data has survived a disaster, potentially after sitting on a shelf for the past decade (retention of static data, data may not be continually rewritten) do you want to hope that all of the delicate (static, EMP, power stability and corrosion sensitive) electronics integral to accessing that data have also remained intact? Or would you rather have a DR plan that basically states "if the drive is bad, use a spare"?

          Sure, you can compensate by duplicating the active media in multiple locations, but then your costs start to spiral uncontrollably compared to good old fashioned cold magnetic/optical storage in a secure off-site location or two.

          I've had stacks of SSDs just up and die over the years. Many of them would "work" until powered off for an extended period -- powering them up reveals no data. Same with hard drives -- if the platters didn't outright stick, the electronics tended to be unreliable.

    5. Stuart Castle Silver badge

      Thankfully, (and thanks largely to the amount of fuckups I’ve had to deal with from others), I’ve not actually done really bad stuff to anyone else but myself. We have shared computers, and had a problem with users logging onto them, then pissing off for hours, leaving the computer unavailable to others. My boss wanted a solution. Preferably one that cost as little as possible. I wrote a small screensaver that when it was activated, did a forced shutdown and restart. Then, while testing it, I realised I hadn’t saved anything. So, I lost several hours' work.

      For the second, I need to explain something. Because of some partition format problems we had with our install of Windows, we had a bootable disk that when run, wiped the partition sector of the internal hard drive. I booted my work PC up one day, not realising that not only was this floppy in the floppy drive, but my machine was (rather unusually) set to boot from drive A. I realised as I lost several years' work, not all of which was backed up.

    6. Michael Wojcik Silver badge

      Been there, done that, got the t-shirt.

  2. GlenP Silver badge

    As soon as I noticed rm -r * in the text I could see what was coming!

    I've narrowly avoided that one, but back in the late 80s I was working for an Apricot dealer supporting a mixture of IBMs, clones and Apricots. The problem was the latter had the HD as drive A not drive C. A couple of times I went to do a high level format of a floppy and started formatting the HD instead.

    Fortunately it was easy to spot and stop, Norton Undelete was effective provided you knew the first letters of the filenames and as the software installed was our own I could figure it out.

    1. Headley_Grange Silver badge

      Same here - saw the rm and knew what was coming.

      I have a non-critical but annoying problem on my Mac where an app crashes occasionally due to a problem with directory access. The Dev isn't really interested in fixing it so after putting up with it for a while I decided to try to get to the bottom of it. It relates to log files which the app periodically clears out with an rm. My first thought was to try to change the location of the log files so there wouldn't be any access problems. My second thought was "never f**k with an rm command.", and I left it alone. Better the odd bit of annoyance than lots of unexpected disc space.

      1. Anonymous Custard
        Headmaster

        I think we all knew as soon as the command was quoted.

        Sadly that still doesn't mean it doesn't happen on an all too frequent basis anyway, despite such dire warnings...

        1. Doctor Syntax Silver badge

          We all know what not to do. The problem is that sometimes our fingers think they know better.

    2. Sir Runcible Spoon

      This is why I always prepare my rm statements in a text file and always, always, use an absolute for the directory.

      1. phuzz Silver badge

        I usually start to type rm -r /blah/blah, then realise what I'm doing and put a 'z' at the start, so it reads zrm -r ..., so even if I accidentally hit enter, no harm will befall me. Hopefully.

        Of course, the other day I ran an rsync with the --dryrun flag. Those paying attention will notice that it really should have been --dry-run. Fortunately it gave me a syntax error instead of running.

        1. GrumpenKraut
          FAIL

          Try echo: echo rm -rf /${hope_this_variable_is_set}

          Btw. I have done more damage using mv. Move several files into directory blah: mv foo blah, mv bar blah, etc. And then I realize blah is NOT a directory. So I repeatedly overwrote the file(!) blah. Well, you still have the last file... Me ------>
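
          One way to avoid the overwrite described above is to check the destination before moving anything. The wrapper below is a sketch; the name "mv_into" and its argument order (destination first) are made up for illustration.

          ```shell
          # Hypothetical wrapper: move files into a destination only if the
          # destination really is a directory.
          # Usage: mv_into <directory> <file>...
          mv_into() {
              dest=$1
              shift
              if [ ! -d "$dest" ]; then
                  echo "mv_into: '$dest' is not a directory, nothing moved" >&2
                  return 1
              fi
              mv -- "$@" "$dest"/
          }
          ```

          With this, "mv_into blah foo bar" refuses to run when blah is a regular file, so foo and bar stay where they are and blah is left untouched.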

          1. Sir Runcible Spoon

            I copied a directory the other day and forgot the /. at the end of the destination path - fortunately I realised straight away that I'd just created a single file from the rather large directory that I was about to prune due to high disk usage on that partition :)

          2. Grooke

            I was today years old when I learned you could move to a regular file. I will try to use this newfound power responsibly. Thank you.

    3. big_D Silver badge

      I had an Xi, now that was a great little machine.

      10MB hard drive, GUI, C interpreter (!!), C compiler, BASIC compiler, dBase, Multiplan, WordStar and still 5MB of space for data.

      The A:/C: thing nearly caught me out a couple of times, but I never did. On the other hand, Norton Undelete was a tool no serious PC user was without back then. That and Xtree.

      1. GlenP Silver badge

        Overall the Apricot architecture was better than the IBM I believe, but the PC clones started flooding the market and became the only option.

        1. TimMaher Silver badge

          Also, the way their keyboards butted up against the chassis and the phone extension.

          So elegant they could be mid-period Apple.

      2. Anonymous South African Coward Bronze badge

        Oooh, XTree Gold... another name from the past...

        1. Tim99 Silver badge
          Windows

          Some of us preferred Norton Commander. I still use its descendants including the *NIX Midnight Commander.

          1. Zippy´s Sausage Factory

            My favourite is Double Commander, which while it's rather "rough and ready" compared to some of the other NC descendants, has at least the advantage of being the same across Windows, Mac and Linux, which when you have to work across multiple operating systems on a daily basis saves some daily wear and tear on the grey matter.

          2. jake Silver badge

            I too use mc, on both Linux and BSD. It's a useful tool, and a lot more powerful than it looks on first glance. Recommended.

        2. big_D Silver badge

          My favourite was Directory Opus on the Amiga. It is still around, but doesn't wake those memories somehow.

      3. Who's Nicked My Wireless Mouse?

        "Norton Undelete was a tool no serious PC user was without back then"

        So true! I never went anywhere without my complete set of Norton Utils back in the day.

        Happy to say those very same disks are still in the same box and have been resting for many a year. I daren't throw them out, you never know....

    4. Pascal Monett Silver badge

      Ah, Norton Undelete

      Those were the days of truly useful and powerful little tools that have certainly saved dozens, if not hundreds, of hides from the roast.

      And mine too, once or twice, I must confess.

    5. Muscleguy
      Boffin

      I have been the wielder of the Norton disc in the labs I've worked in and at home. Called in to undelete things after people unwittingly held down the shift key to select things to delete which included the item they had selected before. All get dragged to the Trash and it is told to Empty.

      I got my users pretty well trained to do NOTHING until Undelete could get working. I had the disc because expecting IT to help in a timely manner and on a Mac was a pipe dream.

      I'm still not entirely sure how I ended up in the role. Being interested I think and being bothered enough to get informed. I also have a problem solving brain which likes solving problems, other people's? Fine.

      You have been warned. I have learned that sometimes people don't want their problems sorted. Their problems are their crutch and their excuse.

      1. Sir Runcible Spoon
        Thumb Up

        "You have been warned. I have learned that sometimes people don't want their problems sorted. Their problems are their crutch and their excuse."

        This is a true pearl of wisdom and deserves many upvotes.

        I'm embarrassed to say it took me many years to learn this about my useless, idiotic, in-laws. All those wasted years and effort on trying to help them sort out their problems, only to wonder why they would undermine my efforts the moment my back was turned, or simply develop new problems with which to fuck up their life.

        Trouble is, one of them's now dead, the other is in a home with dementia, and I'm *still* sorting out their shit. Still, their ability to create new problems is now limited.

        1. Sir Runcible Spoon
          Mushroom

          I made a terrible mistake in that last sentence this morning. I have since been told we have a letter from the OPG regarding a complaint from one of my wife's siblings that we are mismanaging my mother-in-law's money.

          Let that be a lesson to all who would tempt fate!

          1. Intractable Potsherd

            Sorry to hear that - it is a very uncomfortable feeling. The odd thing is, the sibling that complained will be really surprised that you don't act towards them as you used to, and get upset when you refuse invitations to family gatherings, etc.

            They'll also scream like hell when you say, "Fine, you deal with the finances from now."

            /bitter experience mode

            1. Sir Runcible Spoon

              "They'll also scream like hell when you say, "Fine, you deal with the finances from now." "

              That has seriously crossed my mind. I've spent countless unpaid hours on all the admin, not to mention cleaning and fixing up their house so it can be sold to pay health care costs etc. and all the sibling seems to do is create situations that can be twisted to suit her antagonistic viewpoint - which she then uses in a smear campaign.

              Thankfully we saw this coming and have already appointed a solicitor to help us out.

    6. Sykowasp

      Amazing that the default for rm wasn't to query when in interactive mode, and you are doing recursive/force or are root in a 'sensitive directory'.

      "Are you sure you wish to recursively remove all files in / as root? Y/N"

      But then people would run 'rm -rf --do-not-ask *' or something instead.

      1. GloomyTrousers
        Mushroom

        --no-preserve-root

        rm does, now, protect you from recursive deletes of your root FS. You have to pass it "--no-preserve-root" in order for total destruction.

        https://www.gnu.org/software/coreutils/manual/html_node/Treating-_002f-specially.html

        I wonder how many stressed-sysadmin-hours that feature has saved :-)

      2. YTC#1

        Solaris 10 introduced a "that is / , don't be silly" built in override to reduce the numptyness.

        1. jake Silver badge

          "Solaris 10 introduced a "that is / , don't be silly" built in override"

          And the skiddyadmins immediately started using the --DoItAnyway switch out of reflex, if they didn't alias it to save the typing.

          Trying to protect humans from themselves never works as intended.

          1. Anne Hunny Mouse

            Solaris 10

            It won't stop you running rm * in /usr/bin

            Thankfully it will still let you run most of the OS commands you have just deleted...

            ... Giving you a chance to retrieve most of them via FTP (symbolic links don't work well) from its sister server on another site until you can do a restore from tape.

            Good test to show your backups for the backup server work

    7. hmv

      Without intending to anger the gods and invoke Murphy's law, I've never fallen victim to the 'rm -rf' 'accident'. Probably because of tales like this.

      Plenty of other mistakes.

      In one case (not my mistake although I ended up fixing it), half a BIND master zone file got removed through a vi accident; unfortunately the result was valid and half the names disappeared! Fortunately I knew the name and address of the backup server, so we were able to rollback that single file. I've shown a strange obsession with filesystem snapshots ever since :)

      1. mdubash

        Same here: the horror stories around R-M have made me very wary.

        On the other hand, I once gaily installed an experimental data compression NLM on my NetWare server, as the 20MB full height Seagate was filling up, and I couldn't afford the squillions it would have cost to buy a bigger one.

        Guess what? It was very good at compression, decompression, not at all. Lost everything before 1991. I still have an ARC file somewhere that nothing has ever been able to decompress...

  3. jmch Silver badge

    Haven't we all?

    "Let the person who has ever forgotten where they were before running something horrifically destructive cast the first stone..."

    Exactly this! As long as we clear up the smoking wreckage, then rather than casting stones we can have a good laugh about it later!

    1. doublelayer Silver badge

      Re: Haven't we all?

      Well, I haven't yet done an rm that killed my own files, although I can take the blame for someone else running an rm that lost them their files. When at university, I was helping a younger student in the second programming course who was getting disk quota errors. The reason was that their code was not working very well and had been dumping a lot of cores, which had not been deleted. We used a couple tools that produced different core filenames, so "rm core.*" wasn't enough. So, of course, I spoke the required command for the user: "rm, then a space, then asterisk core dot asterisk". Unfortunately, another space got entered, and not in a good place. And now I no longer read code or commands aloud.

      For the record, I had some extra access to things and I was able to get the student a relatively recent copy of their work. I'm not sure how they felt about me after all was said and done, but as this was the due date for the assignment, I believe there was much panic from everyone.

  4. Chris G

    I am guessing the article title refers to this : https://youtu.be/2xIHVrTc0ps a great short.

  5. Korev Silver badge
    Coat

    It's On Call, looks like it's time for someone to SPARC some bad puns...

    1. Korev Silver badge
      Coat

      They'd be the favoured Sun...

    2. Stumpy

      I guess I could RISC one or two.

      1. Korev Silver badge
        Coat

        I'm rooting for you

    3. Sir Runcible Spoon
      Coat

      Do that sudo that you do so well

    4. Sgt_Oddball
      Linux

      That's just asking for...

      Someone to su you for that

    5. Anonymous South African Coward Bronze badge

      Bah, just rm all the bad puns

    6. Roger Kynaston

      #!/bin/bash

      me=`/bin/who`

      echo "Who $me?"

  6. defiler

    Breaking servers with routine maintenance

    Well I'd just like to say that recent patching on some Windows 2016 DC Hyper-V hosts has left them utterly unusable. We have some servers that decided either not to see the SAN after patching, or just not to play nicely in the cluster(s). Even better, uninstalling the most recent patches has left them in a boot loop.

    Now I have to go onsite to the datacentres to rebuild them from cold because they're too stupid to acknowledge that the internal USB ports aren't actually removable drives. Luckily I need to do some storage work imminently so I'll roll the whole lot together, but I've burned most of a week already on this pish.

    I think it's time to petition for new servers.

    1. Anonymous South African Coward Bronze badge

      Re: Breaking servers with routine maintenance

      I think it's time to petition for new servers.

      Non-Windows, hopefully...

      I also don't like to update my hosts. They're running without issues, so the only thing that can get updated, is windows defender.

      And before I get lambasted for not applying security updates : I don't trust Microsoft and their gung-ho approach to windowsupdates either, see what's happened to Suck10

      1. defiler

        Re: Breaking servers with routine maintenance

        I have to say, I tend to try to update the hosts. Not as often as the VMs because it's a real headache, but periodically.

    2. Jr4162

      Re: Breaking servers with routine maintenance

      Or petition for VMware esx and relevant enterprise licences. Keep an old system with the free version to test vmware related updates before installing them.

      1. defiler

        Re: Breaking servers with routine maintenance

        We dumped VMware over the cost / complexity of the licenses. There are a couple of things it does better, though...

  7. ColinPa

    Just following instructions....

    I find it amazing that installation instructions (from software companies that should know better) start with "go into su mode.... and issue the following instruction". The developers have their own sandbox machine, ("just spin up a cloud instance for the test so if you break it, it doesn't matter") and have clearly never been near a production environment.

    1. Korev Silver badge
      Windows

      Re: Just following instructions....

      At the time of the story, being able to "spin up a cloud instance" was decades away. I guess back then each Dev would have had multiple physical machines to work on.

      I agree that the current vogue for a Dev spinning up a VM somewhere and then assuming a "clean build" and/or making big changes to libraries, config etc. is a serious PITA!

      1. jake Silver badge

        Re: Just following instructions....

        We were spinning up VMs for dev work on mainframes in the early 1970s. This whole "cloud" thing is just another name for service bureaus renting out centralized computing.

        1. Julz

          Re: Just following instructions....

          So true. Have an up vote.

        2. Doctor Syntax Silver badge

          Re: Just following instructions....

          This whole "cloud" thing is just another name for service bureaus renting out centralized computing.

          I'm waiting for someone to reinvent a PC (no, not a phone) to be followed by on-prem servers to be followed yet again by yet another name for bureau service.

          1. Julz

            Re: Just following instructions....

            What goes around... I guess we will have to wait for the realisation of the true cost of using someone else's computer to outweigh the trendiness of using the current hip thing. Perhaps when the whole edifice collapses when a hapless fool innocently changes something which is replicated endlessly around the globe by aggressive dependency policies it might be cause for thought. Or perhaps not.

            1. jelabarre59

              Re: Just following instructions....

              Perhaps when the whole edifice collapses when a hapless fool innocently changes something which is replicated endlessly around the globe by aggressive dependency policies it might be cause for thought

              And that gives me a possible solution for a problem in a SF story of mine, how a particular attack vector hits computers worldwide when originally meant for localized targets.

              1. jake Silver badge

                Re: Just following instructions....

                jelabarre59, might want to look up the Morris worm.

                1. J. Cook Silver badge

                  Re: Just following instructions....

                  BEHOLD THE MIGHTY RTM WORM, DESTROYER OF THE VERSION .5 INTERNET

                  1. jake Silver badge

                    Re: Just following instructions....

                    Not the 0.5 Internet. That would have been a pre OSI model IMP version (See 1976's BBN Report 1822), which most agree culminated in Internet 1.0 and included the use of TIPs ... The Morris Worm ran on the later TCP/IP Internet which went "live" on January 1, 1983. This can be considered Internet 2.0, and was fairly mature by 1988.

                2. jelabarre59

                  Re: Just following instructions....

                  I'm familiar with the Morris worm, my story/series has an *accidental* release (the particular malware was meant to be used on very specific targets).

        3. Korev Silver badge
          Pint

          Re: Just following instructions....

          Thanks for the correction Jake -->

        4. Joseba4242

          Re: Just following instructions....

          You had a mainframe where devs could spin up VMs, managed databases, message buses etc through a self-service API?

          Impressive!

          1. jake Silver badge

            Re: Just following instructions....

            Yes, but the names have been changed to protect the guilty.

    2. Daedalus

      Re: Just following instructions....

      As noted in "Zen and the Art of Motorcycle Maintenance", instructions are generally written by the least valuable member of the team, one who can easily be spared from regular work, and indeed one whose absence will improve the progress and quality of said work.

      1. Daedalus

        Re: Just following instructions....

        Or, as per a recent Reddit, the instructions haven't been updated because the third-party writer is being mushroom-managed by the actual vendor of the device for which instructions are needed.

        1. J. Cook Silver badge

          Re: Just following instructions....

          Spooky; just read that post too....

  8. Olivier2553

    Around the same time in France

    SPARC and Solaris were so new that most of the time Sun engineers would not know where things had migrated to (many configuration files and other bits had changed location between SunOS and Solaris). Calling maintenance was always a gamble.

    1. jake Silver badge

      Re: Around the same time in France

      The only real difference (other than GUI) between SunOS and Solaris was the difference between BSD and SysV ... not all that difficult to move between for most UNIX hackers of the era.

      1. Alan Brown Silver badge

        Re: Around the same time in France

        " not all that difficult to move between for most UNIX hackers of the era."

        The hardest part was understanding how rc.[S23456] worked - once past that the rest was easy.

        1. Anonymous Coward
          Anonymous Coward

          Re: Around the same time in France

          <shudder> drop to rc.s, then enter "sync" three times, then hit the power, according to a yellowed piece of paper attached to an old SparcStation that I recall from long ago.

  9. jake Silver badge

    "Stir in the fact that changing a 1Gb drive back in the day was a two or three-person job"

    Nah. By the time the SPARCs arrived, that was an easy one person job. The first SPARCs were Sun-4 models, with full height CDC Wren 5.25 SCSI2 drives. Not lightweight by modern standards, but hardly the 8 inch monsters that CDC had made a couple years earlier. Some early machines even had 3.5 inch half height drives. I can't remember if they made it past Pilot build and Alpha test prior to Sun actually labeling the machines SPARC though ... lot of water under the ol' bridge.

    1. Korev Silver badge
      Joke

      Wouldn't the Alpha test have been DEC and not Sun...

    2. TonyR

      Don't forget the 1G IPI drives. With Power Supply they are a challenge for one person.

      1. jake Silver badge

        "Don't forget the 1G IPI drives."

        How many of those did Sun sell with SPARC systems? Maybe a dozen IPI drives total?

        Also, for a lot of the Sun SPARC product line, the "hard drive" was really just a steel chassis that a power supply and one or more HDDs fit into. For example, my own pre-SPARC 3/470 "Pegasus" is what they called a "dual pedestal deskside system" ... One box holds the VMEbus+cards and its power supply, a full height 5.25 CDC Wren drive, a floppy drive, and a tape drive. The other, connected via SCSI cables, contains four more CDC Wren drives and a couple power supplies. The individual drives are easy for one person to swap out, as are the (hot pluggable!) redundant power supplies ... but the entire contraption takes a couple people to lift. The 19" rack mount systems were built the same way. If they were mounted properly, you could swap out dead drives single handedly.

        1. Down not across

          If they were mounted properly, you could swap out dead drives single handedly.

          Unlike the 7914 drives for my HP3000. Lifting one up and trying to also line it with the pins on the sliding rails, is not a single person job. Well, it can be done, I've done it.

          It would be amusing to see the look on the H&S person's face if that was being done in today's corporate environment.

          1. jake Silver badge

            Those 7900 drives were early '80s, while SPARC was late '80s ... amazing how much shrinkage storage experienced in those few years, no?

            I wonder how far behind we'd be if today's elfin safety nazis had come into existence in that decade. Probably still stuck with the VLB bus and half height 5.25 drives ... with silicon being built on 6" (~150mm to you euro-types) cookies.

            1. Down not across

              Yes they were. I have 7970E and 2 7914Rs in the rack. Just found copy of the install manual...

              The disc drive weighs approximately 67kg (148lb); more than one person may be required to install it in the subsystem cabinet

              May? Well I guess they were right. I have no idea how I managed to wrangle the drive into the cabinet, let alone align it with the pins in the rails.

              If I recall correctly the drive is 132MB or so. Hmm, wonder if the 3k still works, or more to the point if the tapes are still bootable. Inrush current for the storage rack is quite impressive. Always flickered lights. Once spun up, it was safe to turn on the 3K itself. Joys of single phase in home environment.

              Yes, and it didn't take long for storage to shrink to half height and then down to 3.5".

              I reckon you're right that we'd be nowhere close to where we are now.

            2. pirxhh

              I distinctly remember a hard drive urging the user to wear safety shoes (with steel toe caps) during installation. Cannot remember the brand; may have been HP.

  10. Will Godfrey Silver badge
    Thumb Up

    Beneficial Disasters

    This kind of (eventual) benefit is always good, and works right across all trades and industries. Fortunately I learned early on that when faced with a non-fault you should go and quietly discuss what to put in the report with the 'victim'.

  11. Anonymous Coward
    Anonymous Coward

    Really?

    It's not a directory and it wasn't me, but I remember well sitting at my desk eating my morning porridge and imbibing caffeine when a systems bod complains his web management interface is down. After a quick dig around it looks like internal DNS is down. Or at least it's not resolving.

    Check the DNS box, and there's no root entry. In order to tidy up DNS and make it standards compliant a new tech removed it, after getting the OK from the change management group.

    ITIL, Schmitil.

    Anon to protect the not so innocent.

  12. Korev Silver badge
    Pint

    Favours

    The piggy bank of favours was cracked open: Scott and the engineer had a pretty good working relationship stemming from Scott's willingness to overlook an occasional miss of the contracted call-out times

    Reminds me of the time one of our lab Macs needed to have its OS upgraded by a field engineer so they could install the latest instrument software; the hard disc in the Mac then chose a seriously bad time to die! The scientists were furious but luckily I was around to point out that it was very bad luck on the Field Engineer's part and it could have easily been me. I managed to find a new spare compatible disc from our spares cupboard and saved the poor guy's bacon.

    I really should have got a few of these from him... -->

  13. Anonymous Coward
    Anonymous Coward

    I just got into trouble with terminals

    I can't recall the exact details (I'm either getting old or my brain wilfully suppressed it), but I recall having the odd problem with the Wyse terminal I had with me to work on mainly the pizzabox SUN server variety we used for creating firewalls. One particular instance was driving across the UK for 6 hours or so and then return the same day, only to hear the day after that it didn't work - if I recall correctly switching off the terminal before disconnecting it would send a STOP signal to the box so it would basically sit there suspended until someone hooked up again and told it to continue.

    I can't for the life of me imagine why that was implemented, but it taught me never to walk offsite without testing a client machine's access to the Net.

    1. Phil O'Sophical Silver badge

      Re: I just got into trouble with terminals

      I can't for the life of me imagine why that was implemented,

      At the time it was pretty commonplace for a <BREAK> signal from the console terminal to perform the sort of non-maskable interrupt that we associate with Ctrl-Alt-Del today. It was the "stop everything and give me back control" command, useful if the server was hard hung. <BREAK> was sent by having the RS232 transmit line held at 0 (low) for a longish period (IIRC it was between 1 and 2 character times).

      The problem was that many terminals would stop sending data and take the transmit line to a low value for a time when powered off, and the server saw that as a <BREAK> signal. I think that there was eventually a patch for Sun systems so that you could disable the <BREAK> response on the console.

      1. StevieD

        Re: I just got into trouble with terminals

        Ahh, bloody decservers had a nasty habit of doing just that on reboot.

        A bit of a ball-ache when used as terminal servers for a rack of U60s which would immediately lock up unless you remembered to disconnect them first. Of course then you had the problem of remembering to reconnect them all again afterwards, or be faced with a dead console when you really Really REALLY needed one !

  14. swampdog

    You can get spectacular results with a cheap keyboard

    #mv *.foo /bar

    ..mutates into..

    #mv *>foo /bar
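
    For anyone wondering why that one character hurts so much, here is a minimal sketch, run entirely inside a throwaway directory made by mktemp. The directory names src and dest and the three file names are made up for the demonstration. It relies on the POSIX rule that word expansion happens before redirection, so the mistyped command behaves like "mv * ../dest > foo".

```shell
#!/bin/sh
# Everything happens inside a temporary directory; nothing else is touched.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/dest"
touch "$demo/src/a.foo" "$demo/src/b.foo" "$demo/src/keep.txt"

(
    cd "$demo/src" || exit 1
    # Intended:  mv *.foo ../dest   (only the two .foo files move)
    # Typed:     mv *>foo ../dest
    # The shell treats '>foo' as a redirection, so this is really
    #   mv * ../dest > foo
    mv *>foo ../dest
)

echo "left in src:"; ls "$demo/src"
echo "now in dest:"; ls "$demo/dest"
```

    On a POSIX shell this should leave src holding nothing but an empty file called foo, with keep.txt swept into dest along with the .foo files.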

  15. IT's getting kinda boring

    I have a few:

    1) Developer managed to do an rm -rf * from root, just not *as* root. Still causes a lot of damage. After we recovered the machine he very helpfully showed us what he had typed - and then ran the damned command again!

    2) Took a call from someone in the US. Their database was down. I asked them to go to a directory and found it was empty. After a few minutes of questioning, I finally managed to get to the bottom of the issue. Apparently the filesystem was full, and the person had done an rm -rf on the directory structure that held the database! To make matters worse, the last backup was over a week old.

    3) Had a developer decide to delete glibc on a Linux machine so they could install a new one. Hint. Linux really needs it!

    4) Had someone do a chown -R from root as root. That was an amusing one to try and resolve.

    And my personal cock-up.....

    Problems with a disk under VMWare. I deleted disk 5 instead of scsi id 5. In my defence, it was on a cluster with shared disks and at the end of a very long and stressful day :)

    1. Jay 2

      Been there with a dev doing something strange to glibc libs. Fortunately not a complete disaster, but a bit annoying to have to sort out. At least another time when a dev suggested they wanted an updated glibc (and helpfully pointed to a tarball available online) another dev pointed out that wouldn't be happening for several reasons.

      1. Korev Silver badge

        We seem to get a lot of software in from academics or startups running a bleeding edge version of Fedora or Ubuntu which then won't run on our RHEL/CentOS systems because of the "old" glibc versions running.

        1. Anonymous Coward
          Anonymous Coward

          "What do you mean you don't like dependencies on nightlies?"

          Cue application that has never run stable on anything but the developer's computer. (The developer who refused to ever really try the application, or sit down with people who did, and insisted on reproducible bug reports before doing anything about crashes.)

          Anon, just in case.

    2. hmv

      Disk work is ... entertaining.

      I once had a mirrored volume set up with the intention that one half would be in one data centre ("A") and the other half in the other data centre ("B"). Worked fine except for the one mirror where I'd managed to set up both LUNs in the same data centre; and to maximise the stupidity, both were in the other data centre ("B") with the server in "A".

  16. Sir Runcible Spoon
    Alert

    It wasn't me!

    This isn't a new story as such, but on topic. It also wasn't me what did it, but a colleague, honest.

    Back in the day, working for a large ISP in the UK that still ran the *.co.uk name servers. Said colleague was adding a new customer domain to the file using vi. Since the customer domain began with an 'n' he was about half-way down in the zone file. All seemed well after saving and exiting the file, but reports started to (at first) trickle in that some domains were unavailable in the DNS.

    The trickle turned to a deluge, and it seems that my colleague managed to 'delete to end of file, save' the zone file.

    Obviously we restored from a backup, but it still took over 4 hours for all the domains to trickle down through the secondary servers to update.

    1. jake Silver badge

      Re: It wasn't me!

      The existence of DNS is a very good reason for not passing root access about willy-nilly. Not even when somewhat sanitized with su ... It's absolutely astonishing how many people think they know better than the admin who set it up, and thus can improve a system that has been running flawlessly for a year or more.

    2. Anonymous Coward
      Anonymous Coward

      Re: It wasn't me!

      To be honest, I never got on with vi. Allegedly I should have been able to get used to emacs (being a former BRIEF user), but that wasn't a default install on most variants of Unix, whereas vi was.

      Hats off to vi power users!

      1. ma1010

        Re: It wasn't me!

        You had vi? Luxury! We had to write COBOL programs in EDLIN. (Really, truly, for a class - except before long I got a copy of WordStar and never looked back.)

        1. jake Silver badge

          Re: It wasn't me!

          I wrote a simple screen editor for MS-DOS 0.96 in EDLIN, creating a bunch of text files full of pseudo-assembler commands that, when concatenated together and redirected into DEBUG, produced the editor as a .COM file. No need for linking with .COM files. Why? Curiosity, of course. I was learning the internals of a new OS program loader.

          Primitive? Absolutely! But try to remember that DOS was tiny ... It ran from 160K floppies. Most early machines didn't have hard-drives, and if they did they were probably only 5 megs. DOS was mostly useless as a program loader, until ver. 3.1 enabled the networking hooks ... But it was a hell of a lot better than dragging card-decks to the glass house and waiting days for the result!

          As a side-note, I had already been using UNIX for several years (BSD on DEC, mostly) when the IBM-PC came out. We looked at each other & asked "What is IBM thinking? Thank gawd/ess it can't do networking!" ... the rest, as they say, is history.

        2. el_oscuro

          Re: It wasn't me!

          From the way-back machine, I had a Commodore 64 and as a teenager had no money for the $35 macro assembler cartridge. So instead, I wrote a very simple one in C64 BASIC that supported JMP labels and such. But since the C64 didn't have a built-in editor, I also had to write one in BASIC. And it was based on EDLIN.

    3. Sykowasp

      Re: It wasn't me!

      That calls for an addDomain.sh script that requires no manual editing of the actual domain file (obviously these days the domain file would be generated from a database of domains and blah blah blah).

      It's amazing how poor most practices were in the past (and still now, unfortunately). Critical files simply should not be hand-edited, and indeed should have machine validation prior to deployment.
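
      A rough sketch of what such a wrapper might look like, purely as an illustration: the function name add_record, the .bak suffix and the line-count check are all assumptions of mine, not anything the ISP in the story used. It assumes the zone file ends with a newline. A real deployment would more likely run a proper zone checker (BIND ships named-checkzone, for instance) at the validation step instead of counting lines.

```shell
#!/bin/sh
# add_record ZONE_FILE RECORD
# Appends one record to a copy of the zone file, checks that the copy is
# exactly one line longer than the original, keeps a backup, and only then
# swaps the copy into place. The live file is never opened in an editor.
add_record() {
    zone_file=$1
    record=$2

    tmp=$(mktemp "${zone_file}.XXXXXX") || return 1
    # -p carries the original mode and timestamps over to the working copy.
    cp -p "$zone_file" "$tmp" || { rm -f "$tmp"; return 1; }
    printf '%s\n' "$record" >> "$tmp"

    old_lines=$(wc -l < "$zone_file")
    new_lines=$(wc -l < "$tmp")
    # Guard against the "deleted to end of file" class of accident.
    if [ $((new_lines)) -ne $((old_lines + 1)) ]; then
        rm -f "$tmp"
        return 1
    fi

    cp -p "$zone_file" "${zone_file}.bak" && mv "$tmp" "$zone_file"
}
```

      Usage would be along the lines of add_record ./example.zone 'newcustomer IN A 192.0.2.10', where both the file name and the record are placeholders.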

      1. Down not across

        Re: It wasn't me!

        Critical files simply should not be hand-edited, and indeed should have machine validation prior to deployment.

        If editing by hand, at least use RCS or some other version control. Enforce its use by alias/wrapper script if necessary.

  17. chivo243 Silver badge

    different colors

    call me new, but I use different colored terminal windows...

    1. GlenP Silver badge

      Re: different colors

      That wasn't an option back in the day, the choice of colours was generally green, green or green!

      1. Fred Flintstone Gold badge

        Re: different colors

        I had an orange one :)

        1. OveS

          Re: different colors

          Ambra?

      2. jake Silver badge

        Re: different colors

        I had a few amber ones along the way.

      3. Kubla Cant

        Re: different colors

        Back in the day? There was no green back in the day! Black and white if you were lucky, grey and slightly lighter grey if you weren't.

        1. jake Silver badge

          Re: different colors

          The very early Tektronix displays were green ... 1972ish. My 1976 ADM-3A has the green phosphor option. There was also a white phosphor option.

          1. Anonymous Coward
            Anonymous Coward

            Re: different colors

            I had a college class taught by the inventor of the plasma screen. Originally, it was orange or black (off). He told us that the media was going nuts over it - "you could make a TV out of this!" to which he responded, "if you really like orange..."

          2. Down not across

            Re: different colors

            My 1976 ADM-3A has the green phosphor option. There was also a white phosphor option.

            My LSI ADM-3A has white phosphor. And uppercase only (which makes me doubt if it was 3 and not 3A). Luckily it at least has 24 lines instead of 12. It also has CTRL in the more convenient for unix location. Never did come across one with the Tektronix 4014 option.

            1. Will Godfrey Silver badge
              Happy

              Re: different colors

              I managed to get a nice green TV display on a 'scope hung on the end of the IF strip (Line OP was dead). it was watchable... just - contrast was... interesting!

    2. Anonymous South African Coward Bronze badge

      Re: different colors

      call me new, but I use different colored terminal windows...

      I also had a lucky recovery from doing The Wrong Bloody Thing on the wrong RDP session.

      Nowadays every server I remote in to has a tiled bitmap set displaying the server name and also which site it is. Helps a lot if you have got a lot of RDP sessions open.

  18. Alan Brown Silver badge

    I had a script

    famous last words

    This one walked the /home directory, removing no-longer-existent users with rm -r

    It also found .. and went merrily rm -r ing down that.

    Ewps.
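
    A sketch of a more cautious version of that kind of script, under a few assumptions: the function name list_orphan_homes is invented here, "user still exists" is taken to mean that id can look the name up, and the script only prints candidates instead of deleting anything. However the original came to see "..", a plain shell glob like the one below never matches "." or "..", which is the point of the exercise.

```shell
#!/bin/sh
# list_orphan_homes HOME_ROOT
# Prints directories under HOME_ROOT whose names are not known users.
# Dry run only: review the output before removing anything by hand.
list_orphan_homes() {
    home_root=$1
    # The glob "*" skips dot entries, so "." and ".." are never visited.
    for dir in "$home_root"/*; do
        [ -d "$dir" ] || continue      # plain files are ignored
        [ -L "$dir" ] && continue      # never follow symlinks
        name=${dir##*/}
        if ! id -u "$name" >/dev/null 2>&1; then
            printf '%s\n' "$dir"
        fi
    done
}
```

    Running list_orphan_homes /home and eyeballing the list first costs a minute; the alternative is described above.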

  19. ortunk

    He realized he had to rewrite the boot block as well

    Haha been there with an already EOLed 5 years ago HP-UX :)

    Not a pretty date I tell you

    1. Anonymous Coward
      Anonymous Coward

      Re: He realized he had to rewrite the boot block as well

      Ah, HP-UX. My fondest memories of HP-UX stem from when I was on a project in Singapore and we had to interface with Windows. We also had a Red Hat box in what we were building, which gave me the idea to see if there was a HP-UX version of Samba as a possible solution.

      After some digging I found a HP authorised CIFS variant, but our office at Orchard Road didn't have enough bandwidth (one line divided over tens of developers tends to saturate quickly), so I had to get a cab back to my apartment where this newfangled invention called WiFi (then delivered over a card I had to insert into my laptop) gave me about 10x the speed I had in the office.

      Those were interesting days :)

    2. Phil O'Sophical Silver badge

      Re: He realized he had to rewrite the boot block as well

      I remember a colleague who accidentally deleted the on-disk kernel image file on a running Solaris box. Didn't have any immediate effect, the system still had the file open for paging even if the directory entry was gone, but he had a tense few moments hunting round the systems on the network for one with exactly the same OS version. He then FTPed it back to the boot directory, and after a reboot at a convenient time he heaved a sigh of relief when it rebooted OK.

      1. ibmalone

        Re: He realized he had to rewrite the boot block as well

        Wonder if there's some way to re-create the directory entry in that situation, the data's still on disk. Maybe could be achieved by hard-linking the process's file handle? (I'll admit the copy-known-good-version approach sounds less risky.)
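
        A small sketch of the underlying idea, with the caveat that it is a toy: here the same shell holds the file open, whereas in the story the handle belonged to the running kernel. The file names are made up. It copies the data back out through the still-open descriptor instead of trying to hard-link it, which works on any POSIX shell.

```shell
#!/bin/sh
# Demonstrates that a deleted file's data stays readable through an
# already-open descriptor. Runs in a throwaway directory.
work=$(mktemp -d)
printf 'kernel image bytes\n' > "$work/image"

exec 3< "$work/image"     # hold the file open on descriptor 3
rm "$work/image"          # directory entry gone, blocks still allocated

# Copy the contents back out through the open descriptor.
cat <&3 > "$work/image.recovered"
exec 3<&-                 # closing the last handle finally frees the data

# On Linux, another process's open descriptors are visible under
# /proc/<pid>/fd, which is the usual route when you do not own the handle.
```

        This gives you a new file with the old contents, not the original inode back, so it is closer to the FTP-a-known-good-copy approach than to re-creating the directory entry.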

      2. jake Silver badge

        Re: He realized he had to rewrite the boot block as well

        A friend somehow managed to delete /vmunix on a largish Sun system back in 1988 ... fortunately on a Saturday afternoon. Also fortunately, he had enough sense to call me (and tell me the truth!) before he started "fixing it". All was well come Monday morning, and I had free beer for the rest of the month.

  20. Chloe Cresswell Silver badge

    rm -rf *

    Never wiped out a filesystem with rm.

    Now, the time I cloned a new fresh HD over the boss' actual HD instead of the other way around? Yeah...

    1. Anonymous South African Coward Bronze badge

      Re: rm -rf *

      Disk clone...

      Back in the days I was stationed at a toll plaza. Things were fairly quiet most of the time, and I was playing around with OS/2 Warp v4 (yes, long ago).

      Then a dev asked me to clone one HDD of a lane over and configure it for another lane (the clone).

      It was IDE drives, set one as master and one as slave (I was never a fan of cable select) and run Norton Ghost. I also made sure the correct HDD (original lane HDD) is master and the drive with OS/2 on it was the slave.

      A quick <tappity><tap> and away goes Norton Ghost and clones the HDD.

      Removed the master HDD, set the slave to master, reboot... and up comes OS/2

      Suffice to say I made another clone from another lane, but it was a success, and I fixed my own boo-boo.

      Lesson learnt. If possible, use different sized HDDs.

      Also, with Clonezilla you can identify the HDD a bit better as it gives you a more verbose description of the HDD you want to clone to/from.

      I miss those gay, carefree days without spam, cryptomalware and shouty bosses/clients.

      1. Chloe Cresswell Silver badge

        Re: rm -rf *

        This wasn't even that new. This was a clone of a dos/windows using laplink.

        160MB scsi to a *gasp* 420mb ide.

        I got the remote and local sides mixed up. On the same screen. Where one was empty. Whoops!

      2. Anonymous Custard

        Re: rm -rf *

        I miss those gay, carefree days without spam, cryptomalware and shouty bosses/clients.

        Wait, there were days without shouty bosses and clients?!?!?!

  21. baud
    FAIL

    A few years ago I was in charge of keeping an eye on an ELK stack which was centralizing logs across a few dozen services. Thing was a bloody waste of processing power and network packets most of the time, but I still had to keep it running because the logs were used to draw shiny graphs for the PHBs. We only kept 7 days' worth of logs, with a cron job cleaning up every night.

    Then one day I had a request to keep the logs of a certain date on hand for one of the devs, so I stopped the cron job. A few days later I had to clean up the extra logs, so I connected to Elasticsearch, the storage layer, and ran a delete command for the first time. When it took way too long and I saw errors starting to appear on the dashboard, I knew I'd made a mistake. I realized I had deleted everything in Elasticsearch, since I hadn't fully read the documentation (I would have learned that the delete command is not search-then-delete, it's just a delete everything, for example). So I fessed up to my boss, warned the devs we were having a few teething issues that day and did a rollback from the previous day's backup (I was happy that day that the automatic backups on AWS were enabled). In the end we didn't lose much, just a few logs that weren't that important anyway.

  22. Anonymous Coward
    Anonymous Coward

    Sunsite

    I remember the sysadmin (Ornan the Destroyer) at Sunsite in the 80s was a total hack. Used to pull all sorts of shenanigans. (x-googly-eyes appearing everywhere, running mud servers on the science systems.) Good times.

  23. Anonymous Coward
    Anonymous Coward

    Haha many moons ago I was planning a system upgrade. Being the diligent type that I was I restored a fully functional replica of the live system in the lab from backup, went back to my desk and added a static route back to the lab environment rather than the real environment on my work laptop. As I only expected this to take an hour or so I didn't bother to make the route persistent. Cue a Windows update whilst I popped out for a crafty smoke break, followed by me uninstalling the software package from the live devices instead of the lab devices.

    I look back with some fondness now given that it was 15 years ago, it was a very different feeling at the time!

  24. calmeilles

    Haven't we all…

    On one occasion a $software_vendor was in to do an upgrade. Upgrade successful… then clean-up with rm -rf * as root in /

    It was a very clever system that replicated data very fast (for the day) to a partner machine for failover in case of failure. Sadly the deletes were replicated just as fast.

    Okay, so the good news was that we had backups of the data and we knew they were good because they were tested. But the OS needed a bare metal install first. The OS was SCO Unix. It came on floppy disks. Boot from the first and see the message…

    "Please insert disk 2 of 96"

    It was a long night.

  25. IJD

    PDP11/44 at uni in the early 1980s which (as a PhD student) I somehow ended up (sort of) running because nobody else would. Users who had to access image capture hardware needed superuser rights (or whatever this was called, it was a *long* time ago). Some users had multiple IDs. One user -- no, definitely not me -- who had hogged too much disk space (2 x 20MB drives shared between a dozen or so users!) decided to save his data to tape and clear up his disk areas, DEL [*,*]*.*;* deleted all files for all users (not just him) including the OS (RSX-11?) -- or at least, deleted all the file allocation tables, the data was still there, but of course no OS commands worked any more since the commands ran from disk. And there were no proper backups, tape drives were mainly used for data storage, people were supposed to save programs on 8" floppies but rarely did. Months of work for multiple postgrads circled the digital drain...

    Luckily the system debugger just happened to be loaded (in 64k of RAM!), and could talk to the disk and printer, and could print out (to the line printer) the absolute block address, owner and filename for each disk block. Said idiot user had to sit there and manually reallocate all blocks by hand to rebuild the FATs, took him most of a weekend, sweating all the time because if anything happened or the debugger crashed there was no way back except reformatting the disks, reinstalling the OS, and losing all the data.

  26. pakman
    FAIL

    I was told about a variant of this many years back, where an inattentive sysadmin typed 'chmod -R 777' as root while in /. I wasn't there at the time, or involved in sysadmin, but I would have loved to have heard what they did next.....

    1. GoldCoaster

      I had someone do this on a box I was admining (not me this time).

      Luckily I had a similar un-wrecked system. I captured the correct perms from the unborked machine and wrote a script to re-permission the broken ones, had to boot off tape first as the setuid perm had been removed from /bin/login and so you couldn't actually login.
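
      Roughly what such a capture-and-reapply script could look like today, as a sketch and not the script from the story. The function names are invented, it assumes GNU coreutils stat (the -c option) and path names without newlines, and it restores modes only, not ownership. Symlinks are skipped on purpose: chmod follows them, so replaying a link's own 777 would clobber its target.

```shell
#!/bin/sh
# capture_perms GOOD_ROOT
# Prints "MODE PATH" (octal mode, path relative to GOOD_ROOT) for every
# non-symlink under a known-good tree.
capture_perms() {
    ( cd "$1" && find . ! -type l -exec stat -c '%a %n' {} + )
}

# apply_perms BROKEN_ROOT
# Reads "MODE PATH" lines on stdin and applies each mode to the matching
# path under the damaged tree, skipping paths that do not exist there.
apply_perms() {
    ( cd "$1" && while read -r mode path; do
          if [ -e "$path" ] && [ ! -L "$path" ]; then
              chmod "$mode" "$path"
          fi
      done )
}
```

      In use: capture_perms /mnt/good > perms.txt on the healthy box, copy the file across, then apply_perms /mnt/broken < perms.txt, with both mount points being placeholders. The octal mode from stat includes the setuid and setgid bits, which is what gets login working again.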

    2. herman

      I once fixed a 777 screw up by making a virtual machine of the same version of the screwed up system on another machine, then I think I used a tar command to copy the permissions of everything over.

  27. John H Woods Silver badge

    Sticky shift key...

    If the shift key doesn't bounce up quickly enough you've got rm -f *>o instead of rm -f *.o --- instead of deleting all your intermediate compiler files, you've deleted everything and got a new single-character filename containing a single character (^J).
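
    To see why one shifted character changes the meaning so much, here is a small sketch that splits both command lines into shell-like tokens using Python's standard shlex module. It only approximates how a real shell parses input, and the helper names shell_tokens and has_redirect are invented for this example.

```python
import shlex


def shell_tokens(command):
    """Split a command line roughly as a POSIX shell would, keeping operators."""
    return list(shlex.shlex(command, posix=True, punctuation_chars=True))


def has_redirect(command):
    """True if the command contains an unquoted > or >> output redirection."""
    return any(tok in (">", ">>") for tok in shell_tokens(command))


if __name__ == "__main__":
    print(shell_tokens("rm -f *.o"))  # the glob *.o stays one argument
    print(shell_tokens("rm -f *>o"))  # a bare * plus a redirection to "o"
```

    In the second case the argument to rm is a bare *, and >o is treated as an output redirection to a file called o.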

  28. Gnoitall
    Facepalm

    Not as severe as "rm -rf", but I rebooted the production server thinking it was the test server.

    I was at the test server console. But I was reusing a terminal window that I didn't realize was SSH'd to the prod server. The first inkling I had a problem was when the test server didn't shut down its display. SSH closing was the second. The dismayed phone calls from the prod server's user community were the last.
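
    One cheap guard against this kind of mix-up is to make a destructive script state which host it expects to be on. A minimal sketch, assuming short hostnames are enough to tell the machines apart; require_host and WrongHostError are hypothetical names, not part of any standard tool.

```python
import socket


class WrongHostError(RuntimeError):
    """Raised when a script runs on a host other than the one intended."""


def require_host(expected, actual=None):
    """Return the current hostname, or raise if it is not the expected one."""
    actual = socket.gethostname() if actual is None else actual
    # Compare short names only, ignoring case and any domain suffix.
    if actual.split(".")[0].lower() != expected.split(".")[0].lower():
        raise WrongHostError(
            f"refusing to continue: expected {expected!r}, running on {actual!r}"
        )
    return actual
```

    A reboot wrapper would call require_host("testbox") first (the hostname is a placeholder) and only then go on to the actual reboot.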

    1. ibmalone

      At least it was a reboot.

      Accidentally shutting down the home directory server.

      Good news, have access to server room. Bad news, that's irrelevant as it's a VM and you don't have hypervisor access. Fortunately the person who did, although on holiday, a. was checking email, b. had five minutes to connect and press start on it.

      (Other good news, things are set up so this doesn't cause data loss, although it does cause a certain amount of thumb twiddling.)

  29. Anonymous Coward

    I'm afraid your support contract does not cover this, please provide a PO.

    After doing two years of support at a VAR in the '90s, and spending many a Friday afternoon on the phone with some hapless department manager who thought they would just clean up the chaff and inadvertently hit enter with root@doomedserver:/#rm -rf * on the line on the production server, I've always managed a pwd before running the poison rm as root. Now, if you want to talk about borking remote networks, that I have stories about... I don't always remember to do a "restart in 5". Luckily, never at a site that required a plane or helicopter ride to get to.

  30. Anonymous Coward

    That's nothing...

    There was once a Solaris bug where running "zpool create" on an existing zpool would wipe it and start fresh.

    And I was building two large clusters with a lot of zpools.

    And I'd built dev and was in the process of building prod, with the same script and same names.

    And I got asked to make a small change in dev just as I was about to run the script on prod.

    And you can guess the rest.

    But I did fess up, Sun (as they were then) admitted it was a bug as well as my own stupid fault, and I got to stick around to fix it, which took a few days.

  31. Anonymous Coward

    Another one

    Two node Oracle RAC Solaris Cluster. Indirectly serving tens of thousands of users.

    Junior admin decides he needs to boot one of the nodes (during the day, it was that kind of place).

    What he didn't realise was the quorum disk was offline (that kind of place).

    So, before boot cluster has 2/3 votes and therefore quorum.

    As soon as one node goes offline the cluster has 1/3 votes, and therefore no quorum.

    What happens next is obvious to anyone who knows clusters, and is termed a "split brain", or at least a mechanism to protect against one.

    Remaining node "takes one for the team" and shoots itself in the head.

    Database offline, many many many people disadvantaged and a specific procedure needing to be followed to bring it back online.
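
    The vote arithmetic above can be written out as a short sketch. It assumes the simplest possible setup, one vote each for the two nodes and the quorum disk, with a strict majority needed to keep running. The member names are placeholders and real cluster software is considerably more involved.

```python
# One vote per member: two nodes plus the quorum disk (names are illustrative).
VOTES = {"node-a": 1, "node-b": 1, "quorum-disk": 1}


def votes_online(online):
    """Total votes held by the members that are currently reachable."""
    return sum(VOTES[member] for member in online)


def has_quorum(votes_present, votes_configured):
    """A strict majority of all configured votes is required."""
    return votes_present * 2 > votes_configured


if __name__ == "__main__":
    total = sum(VOTES.values())
    before = votes_online({"node-a", "node-b"})  # quorum disk already offline
    after = votes_online({"node-b"})             # node-a taken down for a reboot
    print(before, has_quorum(before, total))
    print(after, has_quorum(after, total))
```

    With the quorum disk already gone, the two nodes hold exactly the minimum majority, so losing either one drops the survivor below it.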

  32. herman

    I messed up my own server this way and eventually made a virtual machine of the same Linux version to copy some system files back.

  33. Wexford

    A while back I performed an in-place upgrade on a client's information management system. They were a not-for-profit and had invested little in IT; consequently the "server" was a random desktop, host A, with an at-capacity HDD, and the file store itself was physically stored on another hard drive in host B, UNC accessed in the software. Host B's HDD was also close to full.

    Upon completing the upgrade, I wrote up a report along with nice drawings showing their system architecture, and a strongly worded caution that the file store was NOT on the "server" and to make sure host B was always switched on and DO NOT TOUCH THE FILE STORE FOLDER.

    Whilst on call over xmas some time later, I received a panicked phone call from the client. He'd run out of space on the server, needed to store some things, and found another computer with a big hard drive that he decided to UNC map to... once he'd deleted some "file store" folder which obviously wasn't important, so there'd be enough space to use. You can imagine why he'd called our emergency number.

    I gave him the bad news and suggested he go to his backups, after which I would assist in fixing the app side. The good news was, he DID have backups, it was xmas and nobody was there except him so there'd been no changes in several days, and he managed to recover everything. I then got to work fixing a bunch of in-app links - scriptable - and went back to watching the cricket while thinking about my on-call pay ticking over as the script ran.

    I don't know if he was ever required to explain the bill my company would have charged (or, more likely, a big chunk of time deleted from our balance of paid-for-in-advance support hours).

  34. dl1

    I have an unpleasant Pavlovian response to wildcards or variables on the same line as an rm.

    Now that I am old, I always do a recursive rm in two stages. Move the files to be removed into a deletion directory with a nice clear, unfancy name (no spaces etc). Then delete that single directory completely explicitly, using tab to complete the name.
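
    For anyone who prefers to script that habit, a rough sketch of the two stages follows. The function names are invented for illustration, and it assumes the staging directory sits on the same filesystem, so that the move is a cheap rename.

```python
import os
import shutil


def stage_for_deletion(paths, staging_dir):
    """Stage 1: move paths into a fresh, plainly named staging directory."""
    # os.mkdir raises FileExistsError instead of reusing an existing directory.
    os.mkdir(staging_dir)
    for path in paths:
        name = os.path.basename(os.path.normpath(path))
        shutil.move(path, os.path.join(staging_dir, name))
    return sorted(os.listdir(staging_dir))


def delete_staged(staging_dir):
    """Stage 2: remove exactly one explicitly named directory, no wildcards."""
    shutil.rmtree(staging_dir)
```

    The listing returned by stage_for_deletion is the chance to look at what is about to go before calling delete_staged.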

    1. whitepines
      Alert

      It sounded great right up until the tab completion. More than once I've tried to tab complete a key command (stopping just in time) and not noticed the tab didn't actually complete.

      So, for instance (greatly simplified, this tends to be a problem in cluttered directories versus simple example ones):

      $ ls -R
      delete:
      rubbish
      morerubbish

      demonstration:
      cooldemo

      de:
      source.c
      key.dat

      $ rm -rf de [TAB]

      oops....
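
      The trap in that listing is that "de" is both a common prefix of all three names and a complete directory name in its own right. A small sketch of what a basic completer does with it (tab_complete is a made-up helper, and real shells differ in the details):

```python
import os


def tab_complete(prefix, names):
    """Return (text after pressing TAB, matching candidates) for a prefix."""
    candidates = sorted(n for n in names if n.startswith(prefix))
    if not candidates:
        return prefix, candidates
    # A completer can only extend the text to the longest common prefix.
    return os.path.commonprefix(candidates), candidates


if __name__ == "__main__":
    names = ["delete", "demonstration", "de"]
    print(tab_complete("de", names))   # nothing gets added to "de"
    print(tab_complete("del", names))  # unambiguous, extends to "delete"
```

      Pressing TAB after "de" adds nothing, so hitting enter at that point removes the de directory, the one holding source.c and key.dat.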

  35. staringatclouds

    A very long time ago in my first job, fresh out of university, I was hired as a software engineer and I was writing code in Fortran on a PDP11/34 in an electronics lab for a bunch of engineers.

    I deleted all of my new boss's program files, just his files, in the first week of starting work.

    There were no backups. Despite the PDP having removable hard drives, no one had ever thought to make 'A' backup, let alone regular backups.

    There were, however, printouts. Fortunately these were the early days of computers and disks were only a few megabytes in size, so while the deleted files were important they weren't so massive they couldn't be typed back in.

    So I spent the next week, laboriously typing them back in, making sure they compiled & produced expected results.

    Then I sorted out a backup regime.

    After I'd done all this, my boss informed me that not all of his programs had compiled before & he was pleasantly surprised they did now.

  36. Luiz Abdala
    Windows

    Trash Bin is a saviour of fat fingers.

    I love those OSes where the login prompt can be duly customized with RED BOLD fonts... so you apply them to ROOT.

    While in Windows, 99.8% of users run the thing as admins, out of the box, where even the simplest DELETE keystroke can bork your system... so MS invented (or hell, they stole, most probably) the Trash Bin.

    The trash bin saved me more than once, must admit.

    1. Anonymous Coward

      Re: Trash Bin is a saviour of fat fingers.

      they stole, most probably) the Trash Bin.

      Considering I recall seeing a trashcan on a Mac back in the Win3.1 era, I'm guessing "stole".

      EDIT: Yep. Wikipedia says Apple Lisa had a "Wastebasket" 1982, MS-DOS 6 (1991 at earliest) had "delete sentry", then Win95 actually had a Recycle Bin. Definitely "stolen".

  37. tip pc Silver badge
    Mushroom

    adding a Cisco ASA back into the cluster circa 8.5

    Simple job: one of the cluster members was unwell, had hung on a reboot and had been like that for months. Me being dutiful, I decided to fix it.

    I visited the DC, disconnected its sync and data cables, rebooted it, rebooted into an older image, binned the broken image, copied across the same image as on the live one and booted it. It came up, and I checked to make sure it would not become live when I rejoined the cluster. I then connected the sync and data cables and observed on the console the formerly broken ASA copying its out-of-date config to the running ASA while it remained in backup mode.

    Massive Oh $%^&. Luckily I had taken a copy of the original live ASA config before I started work and had plugged in its serial cable. The console session was still live and I pasted the former live config in. Write mem, and the config from the once-broken ASA overwrote it again. I then pulled the other ASA's sync and data cables, pasted the config into the live box again, wr mem, consoled into the other ASA, wr erase and reload, booted it and brought it back into the cluster.

    Never seen that before or since; it was the kind of unbelievable event you'd expect from a newb. Luckily I resolved the issue quickly enough that no one noticed. It turned out the ASA was broken before the cluster was moved from another DC, and it had been installed faulty. The new DC had the ports rearranged, so connectivity was definitely broken, as the old config had the wrong port assignments.

  38. UKHobo

    not file system this time

    It was only last week that I hit F5 on:

    delete from table

    where field=x

    ..but had accidentally only highlighted the delete line in SQL Server Management Studio. 200,000 rows were gone in the blink of an eye.

    Luckily this was only on a dev database and our dev environment is such a mess that no-one really uses it. Anyone who noticed anything whilst I frantically scripted the data back into existence probably accepted the weirdness as normal.
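
    A common defence against the half-highlighted DELETE is to run it inside a transaction and check the affected row count before committing. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for SQL Server. The table name items, the column field and the helper guarded_delete are all assumptions made for the example.

```python
import sqlite3


def guarded_delete(conn, sql, params, max_rows):
    """Run a DELETE, rolling back if it touches more rows than expected."""
    # sqlite3 opens a transaction implicitly before a DELETE, so nothing
    # is permanent until commit() is called.
    cur = conn.execute(sql, params)
    if cur.rowcount > max_rows:
        conn.rollback()
        raise RuntimeError(
            f"DELETE would remove {cur.rowcount} rows, limit is {max_rows}"
        )
    conn.commit()
    return cur.rowcount
```

    Called with the WHERE clause missing, the helper rolls back and raises instead of quietly emptying the table. Called with the intended statement, it commits and returns the number of rows removed.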
