When a deleted primary device file only takes 20 mins out of your maintenance window, but a whole year off your lifespan

The weekend has been deleted. Pause a moment before you start your own workplace odyssey and enjoy another's trip to Oopsville courtesy of Who, Me? Today's story comes from "Jim", and concerns the time he and a colleague were performing an all-night hardware, OS, database and application upgrade of a daily newspaper's …

  1. Korev Silver badge

    Good creative thinking, I know it's only Monday morning but have one on me -->

    1. A.P. Veening Silver badge

      I think they needed something a wee bit stronger after that.

      Where is the whisk(e)y icon?

      1. Rich 11

        "Where is the whisk(e)y icon?"

        I drank it. Sorry. Got a bit desperate on a Monday morning.

    2. Fruit and Nutcase Silver badge

      "and accidentally deleted the Sybase master device file while it was running."

      I did something similar on a DB/2 for OS/2 installation on a customer site, ironically whilst doing a roll-forward recovery after a database corruption. Some quick thinking got me out of a hole: I connected to another of the customer's sites and copied the relevant file off that server, as the builds were identical. The connection... LAN NetView Management Utilities to the remote server at 9600 baud.

      1. chasil

        Another way to do this

        There might have been a less traumatic way of accomplishing this.

        As I remember, Sybase was able to mirror device files, and the free version was capable of doing this.

        Assuming a mirror operation could be launched that could read the unlinked file, Sybase itself would copy the device file to a new location.

        Oracle has the ability to "alter database rename file," and Sybase device file mirroring was the way to accomplish the same thing.

        1. MjjASE1

          Re: Another way to do this

          Or try a bcp out of the usual key tables (sysdatabases, sysdevices, sysusages, syslogins), then boot the server with the blank master device option in the runfile and do the bcp recovery method.

      2. tony trolle

        DB/2 for OS/2.....

        did you delete the log file to cover your tracks at the second site?

        was the second site an outer London location?

        want to guess the man hours spent looking for a hacker...(assuming it was

    3. druck Silver badge

      There are much better ways of recovering a deleted open file than crashing the system and hoping fsck recovers it. I did it the other day on Linux when I deleted an open log file; it wasn't very important but I got it back anyway. I believe even on Solaris the file handle will be under /proc/<pid>/fd/<fd>, and a quick google shows the fsdb command will help in this situation.
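
      A minimal sketch of the /proc route on Linux, assuming the process that holds the file is still running and that /proc/<pid>/fd/<fd> can be reopened after the last directory entry has gone. The function name, the file names and the demo are illustrative only; nothing here is taken from the original incident, and other systems expose /proc differently.

```python
import os
import shutil
import tempfile


def recover_open_file(pid, fd, dest):
    """Copy the data behind an open descriptor of process `pid` to `dest`.

    Assumes a Linux-style /proc where /proc/<pid>/fd/<fd> can still be
    opened after the file's last directory entry has been removed.
    """
    src = "/proc/{}/fd/{}".format(pid, fd)
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def demo():
    """Delete a file that is still open, then pull its contents back."""
    workdir = tempfile.mkdtemp()
    victim = os.path.join(workdir, "master.dat")    # hypothetical file name
    rescued = os.path.join(workdir, "rescued.dat")  # hypothetical file name

    handle = open(victim, "wb")
    handle.write(b"precious rows\n")
    handle.flush()       # push the data out of our own buffer
    os.unlink(victim)    # the "oops": the name is gone, the inode is still open

    try:
        recover_open_file(os.getpid(), handle.fileno(), rescued)
        with open(rescued, "rb") as f:
            return f.read()
    finally:
        handle.close()
        shutil.rmtree(workdir)
```

      The copy only works while some process still has the file open; once the last descriptor is closed the blocks are released and this route is gone.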

      1. BenDwire Silver badge

        Impending doom

        Assuming this is a warning from the gods about impending doom, I'm going to have a play with debugfs to see how this is done (hopefully before I need to use it in anger!)

      2. big_D Silver badge

        This is assuming that a) there is such a thing as Google when you do this, and b) you think about looking in /proc/<pid> etc.

        In the middle of the night, in a time before Google or other major search engines, you were left to your own devices and what you could remember from reading the f'ing manual.

        1. Chairman of the Bored

          Speaking about the f*ing manual...

          I think there is some sort of physical law that says something to the effect of, "If you're in deep ship when working on a production system, you will not find the required manual in your documentation wall. Nor will the requisite man pages be installed. After the disaster, you will find the resources with ease."

          1. Anonymous Coward

            Re: Speaking about the f*ing manual...

            I had this issue many years ago, we knew where the manual was, in the locked filing cabinet in the tech support room. Luckily we had a hardware engineer on the team complete with toolkit. He removed the lock from the cabinet in about 30 seconds. The filing cabinet was not quite as secure once we replaced it but hey ho

            1. Alan Brown Silver badge

              Re: Speaking about the f*ing manual...

              Filing cabinet locks are only effective at stopping the drawers rolling open during earthquakes (and sometimes not even then)

            2. NorthIowan

              Re: Speaking about the f*ing manual...

              Some locks are not very good.

              In an emergency, see if you have a key that fits. I unlocked a minivan at church with my pickup's key from the same manufacturer.

              In the US, most camper storage compartments use the same key. One of my camper locks froze up so I replaced them all to get a new key. Will keep the old key in case someone else at a campground loses theirs.

              1. DougMac

                Re: Speaking about the f*ing manual...

                It makes me chuckle to see the colo people in other cages oh so carefully label and string up the server keys, paired with each server, to make sure they don't get them mixed up...

                When Dell only changes out the lock/key type every 5-10 years or so and I have a bucket of keys that would fit any of their servers depending on how far down deep you want to dig for it.

              2. Luiz Abdala

                Re: Speaking about the f*ing manual...

                Taking this off-topic and running with it, my mom opened an old Ford car that looked identical to hers, *completely by accident*. It was the same make and model, parked right next to hers.

                I happened to notice because that car was running on fumes, while we had just filled ours.

                And the tires were bald. And it had 100.000 more miles on the odometer. And the radio was not set on her station.

                The kicker: her keys could open both, but the guy arrived soon enough to catch us closing his car, and his set of keys could not open ours.

                1. DCFusor

                  Re: Speaking about the f*ing manual...

                  I've had the same thing happen with a '66 Chevy station wagon, when it was around 4 years old (giving away my age) as a teenager.

                  Got half a mile in the wrong car, noticed some things weren't quite right, came back to the grocery store to see the other fellow trying to start my (dad's) car and failing....

                  At least back in the day in pastoral USA, it was only an occasion for some laughter.

                  Nowadays, it'd be charged as auto theft or something.

                  Fans of Deviant Ollam (he uses that name on YouTube and at conferences) know that virtually all Ford Crown Victoria police cars, and hence taxis, are keyed the same....and you can buy that key on eBay.

            3. Anonymous C0ward

              Re: Speaking about the f*ing manual...

              Was it labelled "Beware of the Leopard"?

            4. Olivier2553

              Re: Speaking about the f*ing manual...

              I don't lock every rack, only those located in places where some public may be walking by, mostly to prevent accidents. I made myself a bunch of all the matching keys: I have only 2 keys on the bunch and could open most of the racks in the country.

            5. GBE

              Re: Speaking about the f*ing manual...

              "I had this issue many years ago, we knew where the manual was, in the locked filing cabinet in the tech support room."

              Typical office desk/file locks are almost always cheap wafer locks that are easy to pick — even if all you have to work with is a couple paper clips.

          2. tony trolle

            Re: Speaking about the f*ing manual...

            We found the manual, had missing pages as did the backup mini wall...lucky thing was I knew where a spare set was due to my night shift exploring practice.

            Few weeks later I found the old erratum instructions that said remove the pages....turn that erratum page over and it DID say replace with the new versions. And the difference ? two 50% bigger blank lines. ok...

        2. DJV Silver badge

          "you were left to your own devices"

          "devices" - hah, well played!

      3. AVee

        I was thinking the same thing. However, if you need 45 minutes to figure all that out it's risky. Anything can happen in that time, causing more permanent damage. If you know how to do it off the top of your head it surely is the better option, but if it's going to take time to figure out it quickly becomes scary...

      4. ovation1357

        I remember trying to use fsdb - not only did it come with a massive "here be dragons" warning but it also needed the user to have an intimate knowledge of the internal workings of UFS. Certainly not for the faint-hearted! :-/

        I certainly share the concern about the approach of deliberately crashing the system - it could very well have caused more problems than it solved. I'm going to guess that this was a fairly early version of Solaris, as the later versions (starting from Solaris 8 I think) had journalling enabled, which in essence meant that it would have logged the file deletion and automatically committed it on boot after the crash.

        Later versions of Solaris included the 'pfiles' command which could be used in a similar way to 'lsof' (which was considered a Linux tool and only available as an unsupported extra on the Sun Freeware CD).

        A tricky problem here for sure, and one which I think we've all experienced in our careers.

      5. Anonymous Coward

        fsdb :)

        fsdb... IMHO only to be used carefully when desperate. "With great power goes great responsibility (and poor documentation)"

      6. whbjr

        Not Then, You Couldn't

        In the days of Sun / Solaris, there was no /proc/anything at all, so no, this was not an option.

    4. Anonymous Coward

      At least it was an old enough system to be not running ZFS. No fsck there, ZFS is too perfect to need it...

      1. Anonymous Coward

        I remember the moment I realised that the way you check whether a ZFS thingy (pool? I forget) is good is to import it into the system which then checks it for consistency, in the kernel, where any kind of error is going to nuke the whole machine. Because, obviously, having any kind of userspace checker was beneath the people who wrote ZFS. So, here, I'm just going to run the equivalent of fsck on a pool I don't completely trust is not broken inside the kernel of the machine that's running <large financial institution>'s account database. Yes, of course I'm going to do that. Somewhere around then was when it became apparent that the reason to move to Linux was because some of the people involved in it had heard of the real world.

        1. katrinab Silver badge

          zpool scrub pool

          Then zpool status to see how the scrub is getting on.

          While it is doing it, it will show something like

          pool: pool
          state: ONLINE
          scan: scrub in progress since Tue Jul 7 18:42:43 2020
          520G scanned at 1.10G/s, 43.4G issued at 436M/s, 15.25T total
          0 repaired, 0.81% done, 0 days 03:28:36 to go

          pool ONLINE 0 0 0
          raidz2-0 ONLINE 0 0 0
          da1 ONLINE 0 0 0
          da2 ONLINE 0 0 0
          da3 ONLINE 0 0 0
          da4 ONLINE 0 0 0

          errors: No known data errors

          When it is finished it will hopefully show something like

          pool: pool
          state: ONLINE
          scan: scrub repaired 0 in 3 days 22:55:48 with 0 errors on Fri July 11 06:32:59 2020

          pool ONLINE 0 0 0
          raidz2-0 ONLINE 0 0 0
          da1 ONLINE 0 0 0
          da2 ONLINE 0 0 0
          da3 ONLINE 0 0 0
          da4 ONLINE 0 0 0

          errors: No known data errors
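
          For anyone who would rather poll than watch, here is a rough sketch that classifies the "scan:" line of zpool status text. It is only matched against the two sample outputs quoted above; the wording of that line differs between ZFS releases, so the regular expressions are assumptions to adjust, and scrub_state is a made-up helper name.

```python
import re

# Sample "scan:" sections copied from the status output quoted in the comment.
IN_PROGRESS = (
    "scan: scrub in progress since Tue Jul 7 18:42:43 2020\n"
    "520G scanned at 1.10G/s, 43.4G issued at 436M/s, 15.25T total\n"
    "0 repaired, 0.81% done, 0 days 03:28:36 to go\n"
)
FINISHED = (
    "scan: scrub repaired 0 in 3 days 22:55:48 with 0 errors "
    "on Fri July 11 06:32:59 2020\n"
)


def scrub_state(status_text):
    """Return ('in progress', percent), ('finished', errors) or ('unknown', None)."""
    if re.search(r"scan:\s+scrub in progress", status_text):
        m = re.search(r"([\d.]+)% done", status_text)
        return ("in progress", float(m.group(1)) if m else None)
    m = re.search(r"scan:\s+scrub repaired \S+ in .*? with (\d+) errors", status_text)
    if m:
        return ("finished", int(m.group(1)))
    return ("unknown", None)
```

          In practice the text would come from running zpool status against the pool; the parsing is kept separate here so it can be tried without a pool to hand.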

          1. Anonymous Coward

            "zpool scrub pool

            Then zpool status to see how the scrub is getting on."

            Assuming that the system doesn't panic, or the scrub gets refused, that is (I've had both happen).

            ZFS assumes that its internal copy-on-write mechanism means that corruption is always detectable and recoverable, simply by tracking checksums & pointers. That is true, IF (and it's a big IF) the problem is just a ZFS operation that didn't complete properly.

            If the underlying problem is a disk going bad, or a driver that's corrupted data, there's no recovery mechanism. If the pool isn't consistent enough to be imported or scrubbed, ZFS just reports "Nope, shan't.", and you've lost all access to all the data in the pool, even data which hasn't been damaged.

            It has many nice features, but regular snapshots sent to a remote system are an absolute necessity.

          2. Anonymous Coward

            And that's the whole problem: the pool being scrubbed is online. I don't want that: I want a userspace program which I can point at the devices which make up an exported pool (or a pool which has never been imported on this machine, I don't just mean a pool I've just exported...), and which will go and check it, either for basic sanity (are all the devices even there? etc) or in detail, in the same way that zpool scrub does. I want that because I want to minimise the chance that nasty problems will cause the machine to crash, and that is a lot less likely to happen with userspace code than it is with code that runs in the kernel. I also want to be as sure as I can be that the pool is good before I import the thing, so I can back out whatever change I'm making before being half-committed to it. There is absolutely no reason why such a utility should not exist: it could probably even share most of its source code with zpool scrub. Indeed I know for a fact (because some of the papers on ZFS said so) that a fair amount of the early development of ZFS happened with the whole bloody filesystem in userspace.

            And I'm sure ZFS is more robust now than it was when I used it, but it certainly was the case a while ago that bad things could happen to pools which would cause awful results on the machines which imported them. I know this because I've watched it happen. That's why I want userspace checks: it's not academic nerdiness, it's because I've watched the smoke rising from the remains of some machine which ZFS bugs had just killed and been in the meetings where we decided to go back to UFS (or VxFS, I forget) as a result. And I worked for Sun, this was a personally embarrassing recommendation to have to make!

            Right up until the end Sun suffered from having too many very clever people with too little contact with reality.

  2. ColinPa

    Read and understand the instructions first

    I got called out to help a "production down" problem during an upgrade. I was trying to help over the phone, and not being able to see what was going on was a major problem.

    The instructions were clear.

    1 Delete the following files config1,config2 etc

    2 Recreate the system

    3 Enter the config data when asked.

    What could go wrong?

    I got called at step 3. "Where is the config info we have to enter?"

    "It is in config1"

    "You mean the file we just deleted?"

    They could not recover the file from the backups - because they were not authorised.

    We eventually found the data because someone had copied all of the config files into one place for education.

    We changed the instructions from "delete..." to "rename... " and added "step 0 - print out...".
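
    The revised procedure could be scripted along these lines. This is only a sketch: retire_config, the archive directory and the ".old-<timestamp>" suffix are invented for illustration, and the copy stands in for the "print out" (or copy off box) step.

```python
import os
import shutil
import tempfile
import time


def retire_config(path, archive_dir):
    """Keep a copy of `path` in `archive_dir`, then rename it aside instead of deleting."""
    os.makedirs(archive_dir, exist_ok=True)
    # "Step 0": preserve the contents somewhere else before touching anything.
    shutil.copy2(path, os.path.join(archive_dir, os.path.basename(path)))
    # "Rename", not "delete": the old file stays next to where it used to live.
    retired = "{}.old-{}".format(path, time.strftime("%Y%m%d-%H%M%S"))
    os.rename(path, retired)
    return retired


def demo():
    """Retire a throwaway config1 and report what is left behind."""
    base = tempfile.mkdtemp()
    try:
        config = os.path.join(base, "config1")
        with open(config, "w") as f:
            f.write("listen_port=1234\n")   # made-up setting
        retired = retire_config(config, os.path.join(base, "archive"))
        with open(retired) as f:
            kept = f.read()
        archived = os.path.exists(os.path.join(base, "archive", "config1"))
        return os.path.exists(config), kept, archived
    finally:
        shutil.rmtree(base)
```

    After the call the original name is free for the rebuild, and the data needed at step 3 is still readable in two places.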

    1. tip pc Silver badge

      Re: Read and understand the instructions first

      Surely step 1 was rename and also copy off box?

      1. DJV Silver badge

        Re: Read and understand the instructions first

        ...and also print it out multiple times and have a monk recreate a copy by hand using a quill pen and some gold leaf!

        1. Alan Brown Silver badge

          Re: Read and understand the instructions first

          Monks are only acceptable if they illuminate using rabbits and come from Antioch

    2. juice

      Re: Read and understand the instructions first

      It is mildly worrying how many things are resolved thanks to someone having a copy of the data sat in their personal filesystem.

      At one point, I had a minion come up to me (while I was talking a new minion through their first day on the job, entertainingly/ironically) and sheepishly announce that they'd deleted something which Really Shouldn't Have Been Deleted.

      And while there were daily backups, the data was being constantly updated, so would require a fair amount of work to rebuild from the "last known good" state.

      Thankfully, someone had just that very morning cloned the data for testing purposes, so we were able to use that to restore 90% of missing data, leaving my very subdued minion with just an hour or two of hard graft to finish the cleanup.

      Fun, fun fun! Thankfully, it didn't scare the new minion off :)

      1. Marshalltown

        Re: Read and understand the instructions first

        I never had any issues with any unix or linux system but I worked for a company that were religiously faithful users of Micro**** products. These would not infrequently eat their own young, including the servers. The Boss would every so often insist that we "clean up" our hard drives, specifically of old project files. I never did so, and oddly he would always ask me for help in recovering lost or strayed project files. Once, an entire year's worth of work vanished including gigabytes worth of images which were critical (maps, property photos, ...). Unhappily in that instance, the only recoverable data was what I had cached on my system - and since these were "team" projects and other team members dutifully deleted everything they were told to, it was pretty ugly for a while. The bad part was that somehow, the backups were corrupted as well. That really set some hearts beating. The boss never ever again advanced as company policy that closed project files be removed from machines of the individuals that created them.

  3. tip pc Silver badge

    Seems like a proper who, me

    The drama is palpable in this one, likely because I’ve been in a few (differing technologies each time) similar situations myself.

    Great save!!!

    1. BebopWeBop

      Re: Seems like a proper who, me

      And the relief can be sensed even this far, geographically and temporally, from the incident. There but for the grace of... I suspect many readers will be thinking.

      1. Will Godfrey Silver badge

        Re: Seems like a proper who, me

        Which reminds me, I was supposed to do a backup run yesterday

        1. A.P. Veening Silver badge

          Re: Seems like a proper who, me

          A backup isn't complete until it is also successfully restored.

          1. Doctor Syntax Silver badge

            Re: Seems like a proper who, me

            I discovered a client's system was set up to do a backup from the live system to the hot standby overnight. I also discovered that the rcp, ftp or whatever it was would be terminated if not complete by start of business next day. I also discovered that although it probably worked when first set up, by the time I came on the scene there was no way it could be completed overnight, and it probably hadn't been for months or years.

            Fortunately there were also tape backups.

            1. grumpy-old-person

              Re: Seems like a proper who, me

              Many years ago when I was still young I travelled from Johannesburg to Windhoek to upgrade the operating system on an ICL SYSTEM 4, taking THREE copies of the necessary stuff on 9-track magnetic tapes.

              The mainframe could not read ANY of them!

              Eventually got a colleague to plead with a passenger about to board a Windhoek flight to nurse a removable disk pack (quite large in those days!) as hand luggage, which I very gratefully accepted from him on his arrival in Windhoek.

              Seems that the 'skew tape' used to align the heads on the tape drives was not well - but simply remedying this with a 'valid' skew tape was not an option as many years of the site's tapes would have become unreadable!

          2. donk1

            Re: Seems like a proper who, me

            Dave's Rule 1: You do not test backups, you test restores!
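
            One way to act on that rule is to restore into a scratch area and compare it with the source. A minimal sketch, assuming both trees are reachable as ordinary directories; the helper names are invented, the copytree call merely stands in for whatever restore tool is really in use, and whole files are read into memory, which is fine for a demo but not for large data.

```python
import hashlib
import os
import shutil
import tempfile


def tree_digests(root):
    """Map each file's path relative to `root` to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digests[os.path.relpath(full, root)] = hashlib.sha256(f.read()).hexdigest()
    return digests


def restore_problems(original_root, restored_root):
    """Return sorted relative paths that are missing or differ after a restore."""
    want = tree_digests(original_root)
    got = tree_digests(restored_root)
    return sorted(p for p in want if got.get(p) != want[p])


def demo():
    """Check a good restore, then a damaged one."""
    base = tempfile.mkdtemp()
    try:
        live = os.path.join(base, "live")
        os.makedirs(os.path.join(live, "db"))
        for name, data in (("a.dat", b"alpha"), ("b.dat", b"beta")):
            with open(os.path.join(live, "db", name), "wb") as f:
                f.write(data)
        restored = os.path.join(base, "restored")
        shutil.copytree(live, restored)      # stand-in for a real restore
        clean = restore_problems(live, restored)
        with open(os.path.join(restored, "db", "b.dat"), "wb") as f:
            f.write(b"bet")                  # simulate a truncated file
        damaged = restore_problems(live, restored)
        return clean, damaged
    finally:
        shutil.rmtree(base)
```

            An empty list means every source file came back byte-for-byte; anything listed is a file the backup could not actually give back.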

          3. Stuart Castle Silver badge

            Re: Seems like a proper who, me

            "A backup isn't complete until it is also successfully restored."

            I am not religious, but Amen to that.

            Too many people have fallen foul of assuming their backups are OK. They need to be tested regularly. I've had to explain to quite a few people that their backup media is fallible, and has failed. Of course, working with students, I've also had to explain why it's important not to leave work to the last minute, but hey ho. Students left things to the last minute before I was born, and I suspect they'll still be doing so when I die.

            Even big corporations have fallen foul of bad backups. My old Computer Science lecturer at college liked to tell a story of a failure at a major bank (NatWest IIRC). They had an issue with their computer systems. They lost a lot of transactions, and when they went to restore the backup, they couldn't read it. There was a 24-hour period they couldn't restore and apparently had to cancel any transactions during that period. A potentially costly problem.

        2. Stevie

          Re: Seems like a proper who, me

          From a dictionary of Computer Terms, Datalink, circa 1979: "Backup: Something no-one has any time to do because of all the head crashes".

      2. Anonymous Coward

        Re: Seems like a proper who, me

        "There but for the grace of God go I."

  4. Sgt_Oddball

    Could be worse...

    I managed to forget which drives I had my OS installed on for my home server and happily nuked the RAID array whilst finishing up adding all 12 600GB drives I'd bought (they're getting pricey secondhand now, not impressed).

    Thankfully, it was just Windows Server 2012 R2 and I hadn't finished playing with it enough yet to put anything sensitive on it.

    So it's now running Ubuntu server and couldn't be happier with only cli commands.

    1. BenDwire Silver badge

      Re: Could be worse...

      You can do just as many daft things with Ubuntu, but there is much less clicking involved!!

  5. Pascal Monett Silver badge

    Good article, because it outlines the two types of goofs

    On the one hand, you've got the technician that knows the system, makes a mistake, analyzes the situation correctly, finds the loophole and re-establishes functionality without any major hiccup. Hair-raising to be sure, heavy implications for failure, but in the end his in-depth knowledge allowed him to gracefully recover from the error.

    Then, on the other hand, you've got the blithering idiot that knows just enough to make himself dangerous, has no idea of the consequences of his actions, and will be totally incapable of recovering anything.

    I know who I'd prefer working with.

    Good article.

    1. lglethal Silver badge

      Re: Good article, because it outlines the two types of goofs

      The moron, right? That way you've got someone to blame when the sh&t hits the speedily rotating device, and everyone will just assume it's their fault...

      What do you mean I've been reading too much of the BOFH ?

  6. Doctor Syntax Silver badge

    Oxymoron alert

    "overconfident DBA"

    The first requirement of a DBA is paranoia.

    1. Anonymous Coward

      Re: Oxymoron alert

      Though this is true, I have been at the end of a paranoid rant from the DBA at a certain (no longer exists) insurance company. In the course of this he accidentally gave away that his paranoia was due to the company having provided totally inadequate disk space and backup capability.

      The next day I moved my car insurance to another company.

    2. Anonymous Coward

      Re: Oxymoron alert

      You do realise DBA stands for Dead Bloody Arrogant don't you?

      1. Juillen 1

        Re: Oxymoron alert

        Hard not to come across as arrogant when most of the people you talk to don't realise how absolutely wrong they are. :)

    3. Bruce Ordway

      Re: Oxymoron alert

      >> first requirement of a DBA is paranoia

      I learned how totally oblivious I was with my very first PC running MS-DOS.

      While cleaning up a directory, I was annoyed by two files persistently listed, "." and "..".

      I finally typed del .. & pressing enter...... nothing. It was a hard lesson.

      Luckily, back then, PC manufacturers provided 24 hour phone support.

      So I was able to recover the OS that night. Several days to reload everything else.

      Possibly due to that rocky beginning, I am an obsessive planner for disaster now.

      Even today, when VMs have rendered backups (mostly) obsolete, I continue to maintain multiple copies/locations.

      1. Dog11

        Re: Oxymoron alert

        "While cleaning up a directory, I was annoyed by two files persistently listed, "." and "..".

        I finally typed del .. & pressing enter...... nothing"

        I had a client who was a neatnik that did that. I had no idea it was even possible (surely there would be klaxons blaring and dire warnings, no?... no). We did recover, but it was a long time ago and my memory is dropping bits. Probably with UNERASE and a good deal of puzzling over what the first letter of each filename should be (MS-DOS would replace that character with a placeholder to signify that the file was erased).
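
        The placeholder is the byte 0xE5 written over the first character of the name in the file's 32-byte FAT directory entry, which is why the first letter had to be guessed. A small sketch of just that naming step, using a hand-built entry; make_entry, parse_entry and undelete are invented helper names, and a real recovery also has to rebuild the file's cluster chain, which this ignores.

```python
DELETED_MARK = 0xE5   # first name byte of a deleted FAT directory entry


def make_entry(name, ext):
    """Build a minimal 32-byte entry: 8-byte name, 3-byte extension, rest zeroed."""
    return name.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii") + bytes(21)


def parse_entry(entry):
    """Return (deleted?, display name) for a 32-byte directory entry."""
    if len(entry) != 32:
        raise ValueError("FAT directory entries are 32 bytes long")
    deleted = entry[0] == DELETED_MARK
    stem = entry[0:8]
    if deleted:
        stem = b"?" + stem[1:]   # the original first character is not stored anywhere
    ext = entry[8:11].decode("ascii").rstrip()
    name = stem.decode("ascii").rstrip()
    return deleted, name + ("." + ext if ext else "")


def undelete(entry, first_char):
    """Put a guessed first character back into a deleted entry."""
    if entry[0] != DELETED_MARK:
        return entry
    return first_char.upper().encode("ascii") + entry[1:]


def demo():
    """Erase a made-up REPORT.TXT entry, then restore its name with a guess."""
    entry = make_entry("REPORT", "TXT")
    erased = bytes([DELETED_MARK]) + entry[1:]
    return parse_entry(erased), parse_entry(undelete(erased, "r"))
```

        The demo shows the erased entry reading as ?EPORT.TXT until a first letter is supplied.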

    4. Zippy´s Sausage Factory

      Re: Oxymoron alert

      I've noticed how SQL Server DBAs tend to be paranoid and quick-tempered, while Oracle DBAs are more laid-back. As opposed to the developers, which is totally the other way round.

      Why? No idea...

      1. DemeterLast

        Re: Oxymoron alert

        Oracle DBAs know that, if a problem occurs, the SOP is to explain to the money people that a check must be cut to your Oracle sales rep. I'm pretty sure that's Chapter 1, page 1 of the How to Be an Unleashed Dummy Oracle DBA in 24 Hours handbook.

        On the other hand, the first thing SQL Server does on installation is hurl a rock at your face and insult your mother. It will then proceed to convert all system DLLs to kiddy porn and notify the FBI.

  7. DrBobK


    /dev

    A friend of mine who was doing his PhD and programming PCs in Prolog to do something to do with analysing people's understanding of skin diseases (shades of The Singing Detective) took it on himself to 'tidy up' a SPARCstation 2 that was about to replace something ancient we'd been using as a fileserver and host for some early experiments in website design (this is 1990 or so). For some reason he decided (as root) to delete /dev as it seemed to be full of lots of useless empty files. Not a good idea. The machine was connected to a network and the console was running the SunOS 4.1 GUI with a terminal open. I did not know much more than my friend, but I did have my own Sun 3/60 and I'd been on a short course for scientists who had to deal with new-fangled workstation things. I have forgotten how I did it, but armed with my trusty SERC 'how to be a unix system admin' manual that came with the two day course I'd done, I managed to retrieve everything. In the land of the blind the one-eyed man is king.

    1. Antron Argaiv Silver badge

      Re: /dev

      'how to be a unix system admin'


      Why, when we got our SPARCstations, we could only *dream* of a manual. We were lucky to get all the pieces, and the mouse pad.

      Data General, ca. 1990. The decision had been made that the in-house MV machines and their in-house written CAD system were an expensive dead end for the engineering staff, so Suns and Viewlogic, it was. Got to name my own system, was my own sysadmin, and it was visible from The Internet, because -- no nasties. Mr Morris and his worm, Canter and Siegel were all far in the future. Learn UNIX or sink, and learn, we did. Among other things, we found Usenet, and comp.os.minix

      Thanks for the memories!

      1. Stuart Castle Silver badge

        Re: /dev

        Re: "'how to be a unix system admin'


        Why, when we got our SPARCstations, we could only *dream* of a manual. We were lucky to get all the pieces, and the mouse pad."

        Mousepad? You were lucky.. We were given a sheet of tinfoil and we had to draw our own grid on it. For our younger readers, early optical mice (such as those used on early Sun workstations) required an aluminium mousepad that had a very fine grid of lines on it.

        1. Anonymous Coward

          Re: /dev

          Optical mouse? You were lucky, we used a real mouse, and your partner translated the annoyed squeaks into keystrokes!

          1. EVP

            Re: /dev

            Bah, we had to breed our own mice first. We were a bit anxious to see if the first brood would boot up. You see, the vendor was in rush to get their loaners back.

            1. Herby

              Re: /dev

              Four .....

      2. UCAP Silver badge

        Re: /dev


        "Why, when we got our SPARCstations, we could only *dream* of a manual. We were lucky to get all the pieces, and the mouse pad."

        Former company (same one as my message below) received a brand shiny new DEC VAXstation for a project. Thing arrived on a pallet with loads of boxes, quite a few of which were documentation. All of the document ring binders were put on a shelf still in their shrink-wrap (we could not be bothered to unwrap them - too knackered after getting the workstation out and set up).

        A few weeks later the programmer using the system wanted to find out the VMS system call required to print a message on the console. Took four of us best part of an afternoon to solve that one, and we had to consult pretty much all of the manuals to do so (Manual X says "see Manual Y" which says "see Manual Z" which says "see Manual X" ...).

        I've hated the thought of VMS ever since.

  8. MarkET


    Sybase

    Otherwise known as Microsoft SQL Server from 1993.

    Remember when it was delivered on two or three 5 1/4 diskettes for some flavour of OS/2...

    1. Tim99 Silver badge

      Re: Sybase

      I remember it as a joint venture from Sybase, Ashton-Tate, and Microsoft from 1989. It was possibly also a lesson on why companies should not partner with Microsoft. Although for Ashton-Tate their dreadful dBase IV was what probably finished them off.

  9. UCAP Silver badge

    / tmp

    Once, many decades ago, I was sysadmin for my company's three Sun-3 workstations (one diskless node, and two disked nodes). One morning, fairly early before my brain had fully booted, I decided to clean up the cruft in /tmp (it had got pretty full and the OS was complaining on occasion). So I logged in as root and typed the immortal command "rm -rf / tmp". I then noticed the significant space in the command and hit "control-C" PDQ, but not quite Q enough to stop half of /user from having been deleted. I subsequently 'fessed up to my manager and spent the rest of the morning rebuilding the workstation.
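
    Why one space matters: the shell splits the line into words before rm ever sees it, so "/ tmp" arrives as two separate things to delete. A tiny sketch using shlex to mimic POSIX-style word splitting; operands is a made-up helper that simply drops the command name and anything starting with a dash.

```python
import shlex


def operands(command_line):
    """Split a command line like a POSIX shell and return the non-option words."""
    words = shlex.split(command_line)
    return [w for w in words[1:] if not w.startswith("-")]


intended = operands("rm -rf /tmp")    # one operand: the temp directory
typed = operands("rm -rf / tmp")      # two operands: the root, and a relative "tmp"
```

    With the space, the first thing handed to rm is the root directory itself.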

    1. Chairman of the Bored

      Re: / tmp

      Aye! I've shot my toes off in much the same way.

      Tip: put a zero length file called -i in the root directory. It will force a rampaging rm -rf back to interactive mode. Then the trick is to not hit 'y' when prompted...
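
      A sketch of why the guard file can work, and when it cannot. It assumes the shell expands a wildcard such as * into sorted names (plain code-point order here, whereas real shells follow the locale), and it models only the documented rule that the later of -f and -i wins. expand_star and rm_would_prompt are invented names, and no rm is actually run.

```python
import getopt
import os
import shutil
import tempfile


def expand_star(directory):
    """Mimic a shell expanding `*`: sorted names, dotfiles skipped."""
    return sorted(n for n in os.listdir(directory) if not n.startswith("."))


def rm_would_prompt(argv):
    """Parse argv GNU-style; whichever of -f / -i comes last decides."""
    opts, _operands = getopt.gnu_getopt(argv, "rfi")
    interactive = False
    for flag, _value in opts:
        if flag == "-i":
            interactive = True
        elif flag == "-f":
            interactive = False
    return interactive


def demo(with_guard):
    """Would `rm -rf *` prompt in a directory with or without a -i file?"""
    d = tempfile.mkdtemp()
    try:
        for name in ("etc", "home", "usr"):
            open(os.path.join(d, name), "w").close()
        if with_guard:
            open(os.path.join(d, "-i"), "w").close()
        return rm_would_prompt(["-rf"] + expand_star(d))
    finally:
        shutil.rmtree(d)
```

      The file only helps when a wildcard is expanded in the directory that contains it; a literal path such as / is passed through untouched, so a command like "rm -rf / tmp" never picks up the -i.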

      It's a zero length file, but at least once has covered my entire posterior.

      1. logicalextreme

        Re: / tmp

        I like this idea a lot, but it's not working for me on Ubuntu 18.04 so I'm thinking the GNU utils have had some trickery added to prevent this sort of thing.

        1. Dan 55 Silver badge

          Re: / tmp

          The GNU utils should not wipe out / anyway unless you were to accidentally include --no-preserve-root as an option (keep cats away from the keyboard).

          1. logicalextreme

            Re: / tmp

            This seemed like a saner way of handling it to me. If it happens enough that it's the butt of a joke, that's probably reason enough to build a safeguard into the software.

        2. Anonymous Coward

          Re: / tmp

          Modern rm doesn't allow you to do rm -rf / without confirmation.

          1. Throatwarbler Mangrove Silver badge

            Re: / tmp

            "Modern rm doesn't allow you to do rm -rf / without confirmation."

            Insert ranting here about how kids today can't do anything without training wheels and how back in my day, Unix commands ran without confirmation because if you didn't know what you were doing you shouldn't be let in front of a computer and now everyone thinks they can use one and deserve protection from their own idiocy and September 1996 was the death of the Internet and why do people even use this new World Wide Web thing when gopher was just as good and nothing is as good as it used to be like it was when only furry-toothed geeks ruled the computer labs and don't even get me started on systemd and GNOME and who even uses vim instead of vi, linking /bin/vi to vim is heresy . . .

            Anyway, have some homebrew.

            1. Bruce Ordway

              Re: / tmp

              >> instead of vi.... heresy

              Oh yeah... vi, I was never very good with it.

              I still have my tri-fold command cheat-sheet tucked next to a copy of regular expressions.

        3. DS999 Silver badge

          Re: / tmp

          The '-i' thing would only work if you did "rm -rf *" in the root directory, it does nothing if you do "rm -rf /".
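
          A minimal sketch of why that is, assuming a POSIX-style shell that replaces a bare * with the sorted, non-hidden names in the directory before rm ever runs. The helper names (expand_star, build_rm_argv) and the byte-wise sort order are assumptions made for illustration; real shells sort according to the locale. Nothing here calls rm, it only builds the argument list rm would see:

```python
import os
import tempfile


def expand_star(directory):
    """Mimic shell expansion of a bare *: all non-hidden names, sorted.

    Byte-wise sorting is an assumption (roughly what LC_ALL=C gives).
    """
    return sorted(n for n in os.listdir(directory) if not n.startswith("."))


def build_rm_argv(user_args, directory):
    """Return the argument list rm would receive after * has been expanded."""
    argv = ["rm"]
    for arg in user_args:
        if arg == "*":
            argv.extend(expand_star(directory))
        else:
            argv.append(arg)
    return argv


with tempfile.TemporaryDirectory() as fake_root:
    # A pretend root directory containing the zero-length guard file "-i".
    for name in ("-i", "etc", "home", "usr"):
        open(os.path.join(fake_root, name), "w").close()

    star_argv = build_rm_argv(["-rf", "*"], fake_root)
    slash_argv = build_rm_argv(["-rf", "/"], fake_root)

print(star_argv)   # ['rm', '-rf', '-i', 'etc', 'home', 'usr']
print(slash_argv)  # ['rm', '-rf', '/']
```

          With the glob, the file name "-i" lands in the argument list after -rf and is parsed as an option; with a literal "/" no expansion happens, so the guard file never gets a say.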

          1. logicalextreme

            Re: / tmp

            Ah, this probably explains my quick test (in a non-root directory) failing. For some reason it had never even occurred to me that filenames were a perfectly cromulent vector for command arguments (presumably the risk inversely correlates with the prescriptiveness of the argument order).

            1. Chairman of the Bored

              Re: / tmp

              Thanks, all, for the corrections on my -i fixation. I've been doing that since SunOS 2 and it definitely helped me way back then.

              I'm getting old enough to cargo cult MYSELF? Guess that explains all the gray hair!

  10. Anonymous South African Coward

    Way back in the 90's I did POS support for a couple of clients in Pretoria.

    One difficult client had all the bells and whistles - shiny new Novell 3.12 plus a couple of DOS workstations and a Windows 3.1 workstation for himself. And a 120MB tape drive.

    Laughably small in today's terms of Giga- and Terabytes. Anyway.

    He made a backup for the day, put it in the safe with the other backup tapes, locked the safe and went home for the weekend.

    Come Monday morning we received a frantic call from him - ne'er-do-wells happened during the weekend, they took the file server, workstations and safe (including the backup tapes). So he had nothing to fall back on, and he was due for a SARS (income tax) revenue. Ouch.

    1. Androgynous Cupboard Silver badge

      That happened to a mate a few years ago, also South African (although he was in Fulham at the time). Scrotes broke into his flat and took his backup disks as well as his laptop.

      1. logicalextreme

        Misread as "Socrates broke into his flat…". Looks like giving up caffeine is finally starting to pay dividends!

        1. Quinch

          I don't know about that.

          1. logicalextreme

            As long as a side-effect of something amuses me, I consider it a payoff. I'd get my eyes lasered but some of my distance misreadings are particularly hilarious (if you're me).

    2. logicalextreme

      Funnily enough, the first time I learned that backups need to be tested was the same day I learned that maybe keeping magnetic backup tapes inside a fireproof steel safe wasn't the best policy for a company to have.

  11. Zog_but_not_the_first

    Did you really say...

    "A master device file"?

    It's six of the best and off to PC reeducation camp for you.

    1. DavCrav

      Re: Did you really say...

      " Did you really say... "A master device file"? "

      Out of interest, what does MBR stand for nowadays? Main boot record?

      1. the spectacularly refined chap Silver badge

        Re: Did you really say...

        Out of interest, what does MBR stand for nowadays? Main boot record?

        It still stands for Master Boot Record. As opposed to Partition Boot Record. It never referred to IDE master/slave relationships if that is what you are thinking.

    2. TRT Silver badge

      Re: Did you really say...

      I think you'll find it says primary device file, and also contains a detailed prediction of the COVID pandemic.

  12. Anonymous Coward

    I am cursed with having to work with people who, when a filesystem goes full, go gzip happy.

    Java file? Ohh, that's big. I'll gzip it. Cue me looking around trying to see what was done because now the billing system is falling over with a Java error.

    Log file? Ohh, that's big. I'll gzip it. Log rotation breaks.

    LOG file? Ohh, there's a few. I'll gzip the lot. Who needs LDAP anyway?

    Pesky files under /usr/lib64? I have people who can deal with that. Who needs a running system anyway?

    Not as destructive as deleting, but still.....

    1. Anonymous Coward

      Eons ago, we had the monitoring system reporting Oracle log file systems to be partly full.

      And operators zipping the 2 files in there.

      And Oracle DB not being happy about it :)

      LOL, Oracle DB was quite robust back then !

    2. juice

      > I am cursed with having to work with people who, when a filesystem goes full, go gzip happy.

      Nah. The fun one is when someone deletes a log file (or similar) which a process is still hanging onto.

      At that point, you're left with a large chunk of space which can't be reallocated, and which will resolutely stay as-is until you find and kill/restart the parent process.

      It's the dark-side equivalent of the little trick used to save the day in this article :)
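
      For the curious, a rough sketch of the bright-side version: copying the data out of a file whose name has been deleted but which some process still has open. This is only an illustration of the general idea, not a reconstruction of what was done in the article. It assumes a POSIX system; the /proc/self/fd path is Linux-specific, and the function name recover_open_file and the file names are made up for the example:

```python
import os
import shutil
import tempfile


def recover_open_file(fd, dest_path):
    """Copy the data behind an open descriptor to dest_path.

    Works even after the original name has been unlinked, because the
    inode stays alive for as long as a descriptor refers to it.
    """
    proc_path = f"/proc/self/fd/{fd}"  # Linux-specific view of open descriptors
    if os.path.exists(proc_path):
        with open(proc_path, "rb") as src, open(dest_path, "wb") as out:
            shutil.copyfileobj(src, out)
    else:
        # Fallback for systems without /proc: read via the descriptor itself.
        size = os.fstat(fd).st_size
        with open(dest_path, "wb") as out:
            out.write(os.pread(fd, size, 0))
    return dest_path


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "server.log")
with open(log_path, "wb") as f:
    f.write(b"important data\n")

fd = os.open(log_path, os.O_RDONLY)  # stands in for the long-running process
os.unlink(log_path)                  # the "oops": the name is gone
name_gone = not os.path.exists(log_path)

restored = recover_open_file(fd, os.path.join(workdir, "server.log.recovered"))
with open(restored, "rb") as f:
    recovered_bytes = f.read()

os.close(fd)  # only now can the filesystem reclaim the original blocks
shutil.rmtree(workdir)
```

      For another process you would look under /proc/<pid>/fd/ instead, and the space stays allocated until that process closes the descriptor or exits.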

    3. logicalextreme

      It's also super-sustainable. Presumably when no more can be gzipped they intend to gzip the gzips à la Xzibit?

    4. Doctor Syntax Silver badge

      gzip /dev/*

    5. RichardBarrell

      > Java file? Ohh, that's big. I'll gzip it

      Please tell me they didn't gzip a .jar file? Those are already zip files, compressed with deflate!

      (You may get a little more compression out of them because zip compresses files one at a time, but rarely an interesting amount.)

      > Log file? Ohh, that's big. I'll gzip it. Log rotation breaks.

      Log rotation should already have been gzipping yesterday's rotated copy. ;)
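
      On the .jar point, a quick way to see why a second pass buys so little. This is a sketch under assumptions: zlib's deflate stands in for a single compressed jar entry, and the input is made-up pseudo-Java text generated with a fixed seed, so the exact sizes will differ from any real archive:

```python
import gzip
import random
import zlib

# Fake "source code": random words from a small vocabulary, fixed seed.
rng = random.Random(0)
vocab = ["class", "public", "static", "void", "return", "import", "final",
         "String", "int", "new", "if", "else", "for", "while", "try", "catch"]
raw = " ".join(rng.choice(vocab) for _ in range(50_000)).encode()

deflated = zlib.compress(raw, 9)            # roughly what a jar entry holds
gzipped_again = gzip.compress(deflated, 9)  # the "I'll gzip it" step

print("raw bytes:          ", len(raw))
print("deflated once:      ", len(deflated))
print("gzipped a 2nd time: ", len(gzipped_again))
```

      The first pass shrinks the text considerably; the second has almost nothing left to squeeze and has to add its own header and trailer on top.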

  13. Remy Redert

    Recently had a fun time with backing data up. A semi competent computer user asked me to help him get a Linux install going on one of his PCs. Having set up a previous machine of his with Linux in the past, I said sure, helped him decide what distro he wanted and came over armed with a live USB.

    I checked with him that he had everything important backed up, he had. So I plug the USB stick in and let it run the pre configured install while we talk.

    I walk him through the steps so he can do it himself next time, including the part where the drive gets repartitioned.

    At the end I said we were ready to put his backed up stuff on the machine and asked him for it. D:\backup he says. He'd created a new partition to put his back up on. It was of course nuked during the install.

    I've learned to always ask for the backup before even starting the machine.

    1. Antron Argaiv Silver badge

      My approach for this is to tell the aspiring Linux user to buy another disk off Amazon. They're well under $100 and will arrive the next day.

      I then carefully disconnect and remove his existing OS disk, replacing it with the newly purchased one, on which I do a clean install of Linux.

      We can (or not) copy his files off the old drive, which then remains "on the shelf", in case he has a change of heart and decides Linux is not for him. This also provides him with the comforting knowledge (he can see it right there on the shelf) that his decision to try Linux is completely reversible. Very handy when converting a less-than-knowledgeable friend or family member from Windows to Linux (for reduced incidence of service calls)

      I do this for myself as well, when upgrading every few years. Handy to have a complete backup drive, which was perhaps getting long in the tooth, and a fresh new, (often faster and larger) drive for the new install.

      1. The Oncoming Scorn Silver badge

        I For One Welcome This Strategy

        Yep - Did this for my massage therapist (The tales arising from that told elsewhere) & others in the past (Along with myself) replacing the spinning rust with an SSD.

        In the corporate world I always tried to ensure a re-image was done on a replacement system; in other places, where your machine was assigned for life (or at least for the term of your employment), I'd rob a drive from elsewhere & keep the originals back for a week.

    2. Antron Argaiv Silver badge
      Thumb Up

      ...and with one of these excellent products: don't even need to open the box up again to copy his files over!

  14. Anonymous Coward

    ""We said a small prayer, crossed our fingers, booted the server..."

    Don't fib. That wasn't a SMALL prayer.

    1. Morrie Wyatt

      Re: ""We said a small prayer, crossed our fingers, booted the server..."

      Not at all.

      If the small prayer didn't work, they never stood a prayer in the first place.

  15. Blackjack Silver badge

    Windows 95 was still new...

    It was 1996, one of those "fix windows" programs deleted a file Windows needed to boot the GUI. I went into "DOS MODE" and found a backup. I renamed the "*.bak" file to "*.bat" then rebooted. It worked and saved us having to buy a Windows 95 install CD, since the machine didn't have one as it came with Windows 95 pre-installed.

    I was 13 at the time and a total noob with computers, who do you think had run that "fix windows" program in the first place?

    1. heyrick Silver badge

      Re: Windows 95 was still new...

      "I was 13 at the time and a totally noob with computers"

      And, yet, you likely knew more about what was necessary than the authors of those "fix Windows" programs.

      Before letting anything near my real machine, I'd give it a whirl on an old box running a basic installation (era of 98SE ~ XP). My god, I don't think I found one single registry cleaner that didn't make things worse. One of them made such a mess Windows threw a BSOD on booting.

      Accordingly, my machine's registry might be cluttered and suboptimal, but it hasn't been wrecked by some half-assed attempt to "repair" it.

      1. J. Cook Silver badge

        Re: Windows 95 was still new...

        Yep. I've had to whip out the voodoo beads and rubber chicken to recover from a few things (malware before they got stupid complex, virii in the same manner, and the occasional HP driver* that decided to soil its underpants).

        I generally leave the registry the hell alone unless the machine is broken or complaining about something that's in the registry, and even then only on the advice of the author/publisher of the program or the OS.

        *and almost always for a crappy inkjet printer that had ZERO BUSINESS installing kernel level drivers. WTF HP?

        1. A.P. Veening Silver badge

          Re: Windows 95 was still new...

          I also mostly leave the registry alone except for adding a couple of small things when installing Windows:

          To force Home systems to use <Ctrl><Alt><Del> to log on and not show the previous user:

          Windows Registry Editor Version 5.00





          To show seconds in the system clock (handy to detect a hanging system):

          Windows Registry Editor Version 5.00



          1. logicalextreme

            Re: Windows 95 was still new...

            I've got a PowersHell script full of 'em, and GP tweaks etc., to put Windows into "barely competent computer user" mode. Labels on taskbar buttons, file extensions shown, all that crap. I like the idea of seconds to easily check for hangs, but I worry that it would use an entire CPU core just to do so.

      2. Blackjack Silver badge

        Re: Windows 95 was still new...

        I actually didn't have that much problem with registry cleaners because I made backups. And because I mostly just used them to delete leftover registry keys from uninstalled programs.

        Nowadays I don't bother with the registry unless I absolutely have to. And I always backup first.

  16. Stuart Castle Silver badge

    A few years back, my main PC was running Windows 7. The OS was actually running well, but over the years, had built up a lot of cruft. I tend to use my home PC for mucking around with different packages, so it builds up a lot of crap.

    To combat this, at the time, I wiped and re-installed Windows every few months. While I do currently back up the important stuff online, I didn't at the time, just relied on the fact I had multiple drives on the machine, with the OS being installed on an SSD, and applications and data being stored on separate HDDs, with my various network shares all stored on one dedicated drive.

    Just as I had got to the disk management page on the Windows 7 setup, my housemate walked in and started talking to me. I happily hit the "delete" and "new partition" buttons, thinking I'd selected the SSD. Then realised I suddenly had 1TB of unpartitioned space (the SSD is only 256GB). After a couple of seconds, I realised I'd just wiped one of the HDDs. Thankfully, it was just the network shares, which did contain some things I needed, but nothing I couldn't download again or otherwise re-create.

    1. logicalextreme

      This is why I live alone. Clearly your housemate's fault that one.

  17. This post has been deleted by its author

  18. logicalextreme

    You know you've been reading El Reg too long…

    …when the sentence "all fsck knows is the inode number" doesn't scan properly.

    1. heyrick Silver badge

      Re: You know you've been reading El Reg too long…

      You know you've been reading too long when you just nod in sympathy and mumble "uh-huh".

      1. Chris 239

        Re: You know you've been reading El Reg too long…

        The real measure you've been reading Register forums too long is when Amanfrommars posts start to make sense!

  19. Anonymous Coward

    Oh, the joys of dd!

    Many years ago, I was working for a small company. On this fateful day, as the last one scheduled to be out of the office, I was tasked with starting the backup job of our file server, to avoid disruptions to the users.

    So at 6pm on a Friday evening, well after the last user had left, I fired up a terminal, ssh'ed in, ran cat on my cheat list file of useful commands which I kept in my home directory, selected the line I needed with the mouse, copied and pasted the command into the shell along with the following blank line I helpfully inserted after each command, because, you know, hitting return was so much unnecessary effort.

    So the command started and I took one last look to make sure it was running fine before shutting off my monitor and then I realized, to my horror, that I had actually entered the wrong command:

    dd if=/dev/zip of=/dev/hda5 bs=4M

    I had mistakenly copied the restore command instead of the backup command. I panic-pressed Ctrl+C as fast and as often as I could, enough to make me - in retrospect - appreciate the durability of the keyboards we had back then. I'm happy to report that for whatever reason, dd hadn't actually started writing before I was able to stop it, and even more surprising, it actually responded to Ctrl+C - I seem to remember that not always being the case with the dd command. A satisfying 0 Bytes Read+Written message helped return my blood pressure and heart rate to a more sustainable level.

    Sometimes it's better to be lucky than smart.

    Obviously, I prefer to remain anonymous.

    1. Paul Crawford Silver badge

      Re: Oh, the joys of dd!

      Which is why dd is nicknamed 'destroy data'

    2. whitepines

      Re: Oh, the joys of dd!

      I'm happy to report that for whatever reason, dd hadn't actually started writing before I was able to stop it

      Very likely because it was busy filling its buffers before starting the write process, and if your tapes are as slow to respond to the initial position request as some of ours, that could be several minutes before dd would even have enough of a buffer to start writing anything.

  20. heyrick Silver badge

    Serious question from a non Unix person

    "The file space isn't reclaimed as long as the file is held open by some process."

    How does the system actually manage to delete a file that something has open and active? Surely any sensible filesystem should refuse to delete an open file?

    1. hugo tyson

      Re: Serious question from a non Unix person

      "Delete" only deletes an entry in a directory (folder) and decrements the reference count on the actual bag-o-bits file. When the file has no references and no open handles on it, *then* the file is deleted.

      Normally, in the simple case, of course "rm foo" just removes folder entry foo, and also the bag-o-bits underlying it at the same time....
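
      A small sketch of that reference counting, assuming a POSIX system and an ordinary local filesystem (network filesystems such as NFS can behave differently). The file names are invented for the example:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "foo")
alias = os.path.join(workdir, "foo-hardlink")

with open(path, "w") as f:
    f.write("bag o' bits\n")
os.link(path, alias)  # a second directory entry for the same inode

fd = os.open(path, os.O_RDONLY)             # an open handle on the inode
links_before = os.fstat(fd).st_nlink        # two names point at it

os.unlink(path)                             # "rm foo": one name removed
links_after_first = os.fstat(fd).st_nlink

os.unlink(alias)                            # last name removed
links_after_second = os.fstat(fd).st_nlink  # no names left, handle still open

still_readable = os.read(fd, 100)           # the data is all still there

os.close(fd)  # last reference dropped: now the blocks can be freed
os.rmdir(workdir)

print(links_before, links_after_first, links_after_second, still_readable)
```

      The link count falls with each removed name, yet the contents stay readable through the open descriptor until it is closed.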

    2. donk1

      Re: Serious question from a non Unix person

      To avoid name clashes with temporary files you create a temporary file and then immediately delete it.

      The file handle can be passed to child processes and even to an unrelated process via a unix domain socket!
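
      A sketch of the create-then-delete pattern with the handle inherited by a child process, assuming a POSIX system and Python 3.7 or later. The child here is just another Python interpreter told which descriptor number to read; passing a descriptor to an unrelated process over a unix domain socket is not shown:

```python
import os
import subprocess
import sys
import tempfile

# Create a uniquely named file, then remove the name straight away.
fd, path = tempfile.mkstemp()
os.unlink(path)
name_exists = os.path.exists(path)

# The file now has no name, but the descriptor works as usual.
os.write(fd, b"scratch data nobody else can name\n")
os.lseek(fd, 0, os.SEEK_SET)  # rewind; the child shares this file offset

# The child reads from the inherited descriptor number given in argv[1].
child_code = (
    "import os, sys; "
    "sys.stdout.buffer.write(os.read(int(sys.argv[1]), 4096))"
)
result = subprocess.run(
    [sys.executable, "-c", child_code, str(fd)],
    pass_fds=(fd,),       # keep this descriptor open in the child
    capture_output=True,
    check=True,
)
child_output = result.stdout
os.close(fd)  # once every holder has closed it, the space is reclaimed

print(child_output)
```

      Because no directory entry exists, nothing else can open the file by name and there is nothing to clean up if the program crashes.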

    3. Paul Crawford Silver badge

      Re: Serious question from a non Unix person

      While it seems like a liability in some cases (i.e. you can remove the directory entry of an in-use file) it also is the reason that UNIX like systems can do updates with far fewer reboots and less trouble compared to Windows (that will not allow this on an in-use file).

      The typical approach in UNIX is you write out the new files to something like 'foo.tmp', sync the file system so it is fully committed to disk, then rename 'foo.tmp' to 'foo' which is an atomic operation (and works in the same way that removing an in-use file works - on the directory mapping to inode, not on the actual file contents). Thus any process will only ever see the old file (via an already-open handle) or the new file, never an in-modification file, even if a system crash occurs around that time.

      Of course any running process using the old 'foo' won't be updated but many processes and background daemons can simply be restarted (or are short lived) and the new version is now in use without disrupting anything else.

      Doing the kernel is trickier as it has to be rebooted for a new kernel image, but some Linux distros support in-use kernel patching by other means.
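
      The write-then-rename approach described above can be sketched as follows, assuming a POSIX system with the temporary file in the same directory (and so on the same filesystem) as the target. The helper name atomic_write is hypothetical, and a fully crash-safe version would also sync the containing directory, which is left out here:

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data so readers see the old or the new file only."""
    tmp_path = path + ".tmp"  # e.g. 'foo.tmp' next to 'foo'
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())  # push the new contents to disk first
    os.replace(tmp_path, path)  # atomic swap of the directory entry


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo")
atomic_write(target, b"version 1\n")

old_reader = open(target, "rb")  # a process already using the old file
atomic_write(target, b"version 2\n")

old_view = old_reader.read()     # still the old inode's contents
with open(target, "rb") as f:
    new_view = f.read()          # a fresh open sees the new file

old_reader.close()
os.remove(target)
os.rmdir(workdir)

print(old_view, new_view)  # b'version 1\n' b'version 2\n'
```

      The already-open reader keeps its old inode, which is the same mechanism that lets a deleted file live on while something holds it open.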

  21. J.G.Harston Silver badge

    One thing I learned with early Unix: if you mkfs onto a mounted device, then sync before logging off, it writes the cached filesystem back onto the fresh blank disk you've just created. I have memories of tearing my hair out wondering why my clean new 720K disk kept claiming it had 2M free on it. I was in the habit of typing 'exit' to log off, and my exit script included sync to make sure everything was nice and tidy....

  22. Luiz Abdala

    This is pretty much like killing an engine in a 45 ton lorry by accident, drop it in neutral, and then slam the 9th gear in, hoping to $deity the clutch can survive, the crankshaft won't leave the bowels of the truck in a hurry while it jump-starts itself to life....

    ...while careening downhill at ever increasing speeds, you know the brakes won't work alone, and there is a sharp turn ahead.

  23. Stevie


    THE Unix file system?


  24. Anonymous Coward

    Some two decades ago or so, most of our computing resources were maintained by researchers as a side job. I had the responsibility of keeping a couple of Solaris and HP-UX machines up and running - including the large Sun server that hosted almost all our home directories and lots of important research data at our department.

    One time we had an in-house office party, and as I thought the program was quite boring, I got a brilliant idea - as no one would be using any of our computing resources, I could shut down that Sun server and upgrade Solaris from 2.5.1 to 8 (if I remember correctly). All backups were up-to-date, the OS upgrade should touch only file systems containing OS directories, and I could just sip wine while the upgrade progressed - what could go wrong?

    As soon as the system booted to Solaris 8, it was easy to find out what went wrong - all the home & data file systems were missing. That was one of the most terrifying moments that I have ever had with computers. I was not able to find anything that could have helped in that situation - except the note that I did not read before starting the upgrade - that one should disassemble all software-based RAID arrays before the upgrade.

    So, it looked like all the metadata etc. that was required to run those "RAID0" mirrors containing /home & /data went missing.

    What next? Should I return to the party and tell everyone that most of you won't be working for a week or so because I had to painstakingly restore all the file systems from backup tapes? My last idea was: what if I just try to recreate the metadevices and disk arrays using the very same commands that I used to create them in the first place? If the only other option was to restore everything from tapes, then why not try?

    As if by some strange magic, the trick worked - all the missing file systems came back on line and there was no corruption whatsoever. The feeling of being run over by a 40-ton truck vanished at the same moment...

  25. Anonymous Coward

    mySQL Slave - oh bugger

    Relinking a broken mySQL slave to the master via SSH, windows open to both. Restore 3-hour-old version of the DB to the slave as a seed before I sync them (it's half a world away behind a slow connection) - then realise I'm in the "master" SSH session, and 80-odd staff have just lost 3 hours of notes and tracking data. Almost fainted.

    Deep breath, blood flow restored - and hourly backups were working fine, only lost about 15 minutes' data in the end. Lessons were learned!


    That feeling ...

    Hit CR key

    2ms later bowels turn to water

    1 sec in, can’t focus and it feels like head is in a clamp.

    2 secs start thinking like you’ve never thunk before ...

    1. tfewster

      The "Oh No" second - the time between hitting Enter and realising your mistake.

      With experience, time slows down as your finger descends towards the Enter key, as a sixth sense that something is wrong kicks in, enabling you to stop yourself before the fateful commit.

      (Yes, I do still make mistakes, just not as many these days.)

  27. Anonymous Coward

    When rm -rf * is not what you meant

    I somehow managed to delete the entire java folder of a running WebSphere server when doing some maintenance. Oh this is going to hurt when I tell management the server is dead. Except it kept merrily chugging away. Now what to do? Well, it was a node, and it had brethren, so, why not just scp the java dir from one and "restore" this one? Worked a treat and never had to tell anyone.

    Anonymous, cause it's not that good a story.

    1. Anonymous South African Coward

      Re: When rm -rf * is not what you meant

      Well, it was a node, and it had brethren, so, why not just scp the java dir from one and "restore" this one? Worked a treat and never had to tell anyone.

      A-ha... So this is what happened at Barclays then!

  28. aki009

    Reminds me of that day...

    I had an interesting experience deleting old user accounts some 30+ years ago as root (the only choice for the system).

    For convenience I was using a system provided script that also removed the home directory for a deleted user at the same time.

    Everything was going well until I hit that one user who had his home directory set to "/". I have no idea why a regular joe would have his home set that way, but the consequence was that the OS happily ran "rm -rf /" for as long as it could.

    Fortunately the system was being repurposed and only required an OS reinstall from -- gasp -- floppies. I forget how many there were, but it was quite a massive pile.

  29. Bibbit

    Just reading that made me feel sick.

  30. Anonymous Coward

    Feeling lucky yesterday

    Yesterday, I had to upgrade a linux 2 instance in AWS EC2. In the interest of time and against usual procedures I took a shortcut and just created an AMI, terminated the old instance and spun up a new one... and voilà! Of course I couldn't access it. Fortunately, everything could be restored easily enough using the attached volumes, but what could have been a 10 minute task ended up taking almost an hour.
