Oh, sugar! Sysadmin accidentally deletes production database while fixing a fault

An unfortunate sysadmin has deleted the production database of diagramming outfit Gliffy, ironically while attempting to fix a problem with backup systems. An unspecified “issue” was discovered by the company in its backup systems last Thursday. However, on working to resolve the problem, during the scheduled weekend …

  1. Sykobee

    What we need is production databases that require 2FA or 2 user auth to run DELETE and DROP commands :p

    1. Alister

      What we need is production databases that require 2FA or 2 user auth to run DELETE and DROP commands :p

      Or possibly Sysadmins who stop and check, and then check again, before deleting anything, ever.

      My thought is that he restored a duff backup over the top of the live database, instead of creating a copy.

      1. Anonymous Coward

        uguu luckily never killed anything production, have trashed a server I'd been building in a fit of rage command typing.

        I'd have imagined Gliffy was all devops? They even have proper sysadmins? Where's the change control? Naughty people.

        1. This post has been deleted by its author

        2. swampdog

          Duff keyboard

          Dodgy right shift key. Been waiting days for a replacement. This scenario..

          # rm -v *.bak

          ..became..

          # rm -v *>bak

          ..only after this was I allowed to liberate the keyboard from the unused machine next to me.

      2. Steve Gill

        Probably thought he was doing a test restore but had the wrong database active

      3. allthecoolshortnamesweretaken

        - "What we need is production databases that require 2FA or 2 user auth to run DELETE and DROP commands :p"

        - "Or possibly Sysadmins who stop and check, and then check again, before deleting anything, ever."

        True. But you'll have to admit it - a sleek console with a two-key-thingy like in a missile silo would be totally cool.

    2. Nate Amsden

      I was at a company once where we had JMS queues backed by an Oracle DB (WebLogic). We had a queue-related outage lasting 24h+. The end solution was to truncate the tables with the queue data. I still remember to this day, back in 2004, the Oracle DBA saying something along the lines of "I'm not truncating shit until you get the company president on the conference call to approve". The VP wasn't enough. The president approved and things got back to normal until the next outage, at which point we were more comfortable truncating those tables. So many bugs in WebLogic JMS back then (haven't used it since that job).

    3. FuzzyWuzzys
      Facepalm

      When I get to do work on prod DBs I get a slightly nervous, sickly feeling that keeps me on edge and makes sure I triple check exactly what I'm doing before hitting RETURN!

  2. Downside

    it's so easy

    ..to type DELETE * FROM TABLE everything and forget to add WHERE index_id=nnn

    It's taking a bit longer to restore because they've just discovered the backups have been broken for the last year.

    {cough} checks backups on own system {cough}

    1. JimmyPage Silver badge
      Boffin

      Re: it's so easy

      Isn't it DELETE FROM, not DELETE * FROM ?

      1. splodge

        Re: it's so easy

        Aye. It *is* very easy to type DELETE * FROM, but luckily, you get a syntax error rather than an empty table.

        It is almost as easy to delete the * when changing SELECT to DELETE, but that's a whole different story...

    2. zanshin

      My team runs our company's incident/problem/change solution. (Yes, the irony burns.) A few years back, we had tables in a QA DB that we no longer needed. DB administration at that level is managed by a dedicated DBA team, not the application team. We sent them a request for the table drops and, knowing our prod and test DBs had nothing to do with one another, thought nothing more of it.

      Except, unbeknownst to us, the tool the DBAs use to perform such tasks connected to both test and prod systems alike, and the over-eager person involved issued DROP CASCADE against the tables in question *everywhere*. In the middle of the US morning / EU afternoon.

      The only reason this did not completely destroy our production OLTP DB was that there were locks in play because of our level of user concurrency. (Logs later showed that the DBA actually tried the deletes several times when they failed.) Our prod reporting DB instance had no such protection and critical tables were wiped out. Restoring that took a long time because the tables were huge and, at the time, the reporting DB schema was not a 1:1 match with the OLTP system. (You can do that with fancier replication tools.) The reporting instance had to be restored from remote backup, which literally took days. Fortunately, for the duration, we were able to point most of our BAU features that relied on the reporting instance to the OLTP instance instead, accepting the modest risk of OLTP performance impact to keep important things working.

      Happily, this event did produce both process and architecture changes in the way the DBA support tools were used and set up. And, probably, at least one staffing change. o_O

    3. Tom 38
      Happy

      Re: it's so easy

      Because it is so easy, MySQL has a command line option for its shell called '--safe-updates', which disallows UPDATE or DELETE without a where clause.

      It originally had a much better name, '--i-am-a-dummy'. Gladly, it still accepts both variants.

      1. Anonymous Coward

        Re: it's so easy

        MySQL disallowing updates or deletes without a where clause is all very well...

        But imagine trying to type:

        DELETE FROM table WHERE ID >= 1234;

        And mishitting the "equals" key to get the key on the left. Hey presto:

        DELETE FROM table WHERE ID >- 1234;

        That's a whole different clusterf*ck...
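
        A minimal sketch of why that typo is so nasty, using Python's built-in sqlite3 as a stand-in for MySQL (the table name `t` and the 2,000 rows are made up for illustration). Both WHERE clauses are valid SQL, so nothing complains; the mistyped one is simply read as "greater than minus 1234". Counting first shows the difference without deleting anything.

```python
import sqlite3

# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 2001)])
conn.commit()


def count_matching(where_clause: str) -> int:
    """Count the rows a WHERE clause would hit, without deleting anything."""
    sql = f"SELECT COUNT(*) FROM t WHERE {where_clause}"
    return conn.execute(sql).fetchone()[0]


intended = count_matching("id >= 1234")  # ids 1234..2000
mistyped = count_matching("id >- 1234")  # parsed as id > -1234: every row

print("intended:", intended)
print("mistyped:", mistyped)
```

        A guard that only checks for the presence of a WHERE clause would let both statements through, which is the point the post is making.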

    4. Phil Endecott

      Re: it's so easy

      My approach is to type

      SELECT * FROM IMPORTANT_TABLE WHERE ATTRIBUTE = SOMEVALUE

      cursor up ^a^d^d^d^d^d^d^d^dDELETE

      1. werdsmith Silver badge

        Re: it's so easy

        BEGIN TRAN

        START TRAN

        SET TRAN

        IMPLICIT TRAN

        (choose as appropriate)

        Then DELETE FROM WHATEVER WHERE THING = OTHERTHING

        If oh shit 1234434554453 rows deleted (I was expecting 1)

        then ROLLBACK

        otherwise if you are happy

        COMMIT
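
        The same idea can be sketched in code. This is only an illustration using Python's sqlite3 with a made-up table `whatever`; the helper name `delete_expecting` is hypothetical, and the transaction keywords differ between database products as the list above says. The delete runs inside an explicit transaction and is committed only if the row count matches what the operator expected.

```python
import sqlite3


def delete_expecting(conn, sql, params, expected_rows):
    """Run a DELETE in a transaction; commit only if the row count matches."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.execute(sql, params)
    deleted = cur.rowcount
    if deleted != expected_rows:
        # "oh shit" branch: more (or fewer) rows than expected, undo it all
        cur.execute("ROLLBACK")
        return False, deleted
    cur.execute("COMMIT")
    return True, deleted


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE whatever (id INTEGER PRIMARY KEY, thing TEXT)")
conn.executemany("INSERT INTO whatever (thing) VALUES (?)",
                 [("a",), ("b",), ("b",)])

# Expected 1 row, but two rows match 'b', so this is rolled back.
ok_b, hit_b = delete_expecting(
    conn, "DELETE FROM whatever WHERE thing = ?", ("b",), 1)
# Exactly one row matches 'a', so this is committed.
ok_a, hit_a = delete_expecting(
    conn, "DELETE FROM whatever WHERE thing = ?", ("a",), 1)

remaining = conn.execute("SELECT COUNT(*) FROM whatever").fetchone()[0]
print(ok_b, hit_b, ok_a, hit_a, remaining)
```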

      2. JLV

        Re: it's so easy

        As a dev, not a DBA, my approach when writing manual update commands for a live database that other folks will execute tends to be something like:

        select *

        -- delete

        from important_table

        where condition ...

        The instructions ask you to first execute the whole command & check that the select returns plausible values for what will be nuked. Then you are asked to select just past the -- comment on the second line, exposing the delete proper to execution.

        Slightly more complicated to do with updates.

        But I've bookmarked the wrap-in-transaction suggestion. Which is not exclusive with mine.
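
        A rough sketch of the "select first, then delete with the very same WHERE clause" habit, again in Python's sqlite3. The helper `preview_then_delete`, the `confirm` callback and the sample table are all hypothetical. Building SQL from strings like this is only reasonable for trusted, operator-typed input.

```python
import sqlite3


def preview_then_delete(conn, table, where, params=(), confirm=None):
    """Show what a DELETE would hit, and run it only if confirm(rows) is true."""
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE {where}", params).fetchall()
    if confirm is None or not confirm(rows):
        return 0  # nothing deleted
    with conn:  # commits on success, rolls back if the DELETE raises
        cur = conn.execute(f"DELETE FROM {table} WHERE {where}", params)
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE important_table (id INTEGER PRIMARY KEY, attribute TEXT)")
conn.executemany("INSERT INTO important_table (attribute) VALUES (?)",
                 [("x",), ("x",), ("y",)])
conn.commit()

# No confirmation supplied: only the preview runs.
skipped = preview_then_delete(conn, "important_table", "attribute = ?", ("x",))

# Confirm only if exactly one row would be removed.
deleted = preview_then_delete(conn, "important_table", "attribute = ?", ("y",),
                              confirm=lambda rows: len(rows) == 1)

left = conn.execute("SELECT COUNT(*) FROM important_table").fetchone()[0]
print(skipped, deleted, left)
```

        Because the SELECT and the DELETE share one WHERE string, there is no chance of retyping the condition differently between the check and the real thing.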

    5. Storage_Person

      Re: it's so easy

      Old habit but whenever writing a DELETE statement I always write the WHERE part of it first, then go back to the beginning of the line to add the DELETE. Same as whenever killing a process I always cut 'n' paste the PID. Little things that help to avoid issues like this.

    6. Alan Brown Silver badge

      Re: it's so easy

      "It's taking a bit longer to restore because they've just discovered the backups have been broken for the last year."

      In the case of a NEAX61M telephone exchange, the backups were fine, but what they were backing up was corrupted (but didn't get discovered until the system was rebooted after y2k updates had been loaded in)

      Cue having to go to 2+ year old backups, then replay every transaction from that point up to date into the system.

      On a live telephone exchange....

      With 50,000 lines on it.....

      In the days before mobiles were universal...

      The replay took 7 weeks to complete.

      The telco wasn't popular for some reason. I wonder why.

    7. Down not across

      Re: it's so easy

      It's taking a bit longer to restore because they've just discovered the backups have been broken for the last year.

      Which is why, if the data is important at all, you do fairly frequent test restores.

      Depending on your flavour of database it may even support some sort of validate without actually having to restore, in which case you do frequent validations (and still the occasional actual real restore to ensure it really can be restored).
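
      As a toy illustration of a scripted test restore, here is a sketch using SQLite through Python's sqlite3 module. The function name `backup_is_restorable`, the `customers` table and the row threshold are assumptions for the example; a real check for another database would use that product's own restore and validation tooling.

```python
import os
import sqlite3
import tempfile


def backup_is_restorable(backup_path, table, min_rows):
    """Restore a backup file into a scratch database and sanity-check it."""
    if not os.path.isfile(backup_path):
        return False
    src = sqlite3.connect(backup_path)
    scratch = sqlite3.connect(":memory:")
    try:
        src.backup(scratch)  # the actual "restore" into a throwaway copy
        intact = scratch.execute(
            "PRAGMA integrity_check").fetchone()[0] == "ok"
        rows = scratch.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        return intact and rows >= min_rows
    except sqlite3.Error:
        return False  # unreadable file, missing table, and so on
    finally:
        src.close()
        scratch.close()


with tempfile.TemporaryDirectory() as tmp:
    live_path = os.path.join(tmp, "live.db")
    backup_path = os.path.join(tmp, "backup.db")
    junk_path = os.path.join(tmp, "junk.db")

    # Build a small "live" database and take a backup of it.
    live = sqlite3.connect(live_path)
    live.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    live.executemany("INSERT INTO customers (name) VALUES (?)",
                     [("a",), ("b",)])
    live.commit()
    dest = sqlite3.connect(backup_path)
    live.backup(dest)
    dest.close()
    live.close()

    # A "backup" that is really just garbage, like a tape full of nothing.
    with open(junk_path, "wb") as fh:
        fh.write(b"not a database" * 100)

    good = backup_is_restorable(backup_path, "customers", min_rows=1)
    bad = backup_is_restorable(junk_path, "customers", min_rows=1)

print(good, bad)
```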

  3. GlenP Silver badge

    Not quite the same but had a PC engineer kill a 486 SCO Unix server whilst trying to fix the external DAT backup drive.

    I was on holiday, I told them not to touch it until I got back but they thought they knew better ("they" in this case being the parent company) so called in their tame maintenance company who'd clearly never worked on a Unix box before. Engineer waltzes in, decides he needs to shut the server down but doesn't have a clue how so just turns it off.

    It took me about a week to get it fully back up and running with a mix of file recovery and restores from the last successful backup.

    The parent company wondered why I declined to go and work for them.

  4. gv

    If it's during the scheduled maintenance window, why was a full backup of the database not the first item on the schedule?

    1. werdsmith Silver badge

      If it's during the scheduled maintenance window, why was a full backup of the database not the first item on the schedule?

      The fact that they were trying to fix a problem with the backup system might have something to do with it.

  5. Johnny Canuck

    In a similar vein

    I once Ghosted a blank drive over a client's hard drive instead of the other way around - oops.

    1. psychonaut

      Re: In a similar vein

      yup, done that too! aargh! only the once though...years ago.

      1. Ken 16 Silver badge

        Re: In a similar vein

        I've seen it done once, years ago.

        Have we met?

        #CMS

    2. phuzz Silver badge
      Facepalm

      Re: In a similar vein

      Back when I was a young 'un, a friend had lent me a bunch of Amiga games along with a copy of "White Lightning", a fast disk copier that would work on a lot of copy-protected disks (yes, I was pirating them; I was young and knew no better).

      My Amiga only had one disk drive, I think you can all guess how this one is going.

      The worst part was that I realised halfway through copying some game over the copy program and tried to stop it, but too late.

      I somehow tried to make out that it was all my friend's fault for leaving the copy tab open on the White Lightning disk.

    3. Andy Non Silver badge
      Coat

      Re: In a similar vein

      Holds hand up embarrassed. One Friday afternoon, a number of years ago, after pub-o-clock I did the twice daily backup of my employer's live production database on a DEC 8250 to a removable disk platter, but somehow fumbled the live database platter with the backup platter and overwrote the files on the live platter. ARGHHHHHH! After an hour or two of sheer horror I restored the live platter back to the close of the day before. Thankfully the files on the live removable platter were more or less static, and it didn't appear subsequently that any data was missing, nor did anybody complain of missing data, which was a huge relief.

      Lesson learned: After pub-o-clock, don't do any mission critical stuff on a computer!

      1. Triggerfish

        Re: In a similar vein

        Worked on a hell desk where a newbie was supposed to delete a folder's contents in a credit card processing server running SCO, that was causing it to fall over. He was actually a few levels up in the directory when he typed rm *.*

      2. Crazy Operations Guy

        Re: "Lesson learned: After pub-o-clock, don't do any mission critical stuff on a computer!"

        I've learned that the hard way so I built a script to connect to the ticketing system to push any changes planned for later than 4 pm or on a Friday to 9 am the following day or Monday morning, as appropriate. I ran a 10,000+ machine dev/test Datacenter so no one actually did anything outside of work hours.
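
        The poster's script isn't shown, so this is only a guess at the scheduling rule it describes, written as a small Python function. The name `next_allowed_slot` is hypothetical, and it assumes that Friday and weekend changes are never allowed, so a Thursday-evening change lands on Monday rather than Friday.

```python
from datetime import datetime, time, timedelta


def next_allowed_slot(planned: datetime) -> datetime:
    """Defer changes planned after 16:00, on a Friday, or on a weekend."""
    # weekday(): Monday=0 ... Thursday=3, Friday=4, Saturday=5, Sunday=6
    if planned.weekday() < 4 and planned.time() < time(16, 0):
        return planned  # Mon-Thu before 4 pm: leave it alone

    # Otherwise move to 09:00 the following day, skipping Fri/Sat/Sun.
    slot = datetime.combine(planned.date() + timedelta(days=1), time(9, 0))
    while slot.weekday() >= 4:
        slot += timedelta(days=1)
    return slot


# 22 March 2016 was a Tuesday; 24 and 25 March were Thursday and Friday.
tuesday_morning = next_allowed_slot(datetime(2016, 3, 22, 10, 0))
tuesday_evening = next_allowed_slot(datetime(2016, 3, 22, 16, 30))
thursday_evening = next_allowed_slot(datetime(2016, 3, 24, 17, 0))
friday_morning = next_allowed_slot(datetime(2016, 3, 25, 10, 0))

for slot in (tuesday_morning, tuesday_evening,
             thursday_evening, friday_morning):
    print(slot.strftime("%a %Y-%m-%d %H:%M"))
```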

    4. David 132 Silver badge
      Devil

      Re: In a similar vein

      I once Ghosted a blank drive over a client's hard drive instead of the other way around - oops

      I did the same, about 10 years ago. I had a gold master HDD full of product demos that I'd spent days carefully assembling, and I had to Ghost it to a blank HDD.

      I made exactly the same mistake as you, with two additional enhancements all of my own:

      1) My boss was sitting next to me in the lab, watching his "star engineer" at work,

      2) I had just said something like, "Now to copy this disk and we can both go home. Wouldn't it be funny if I got the source and destination disks mixed up! Ha, ha, ha!!"

      As I recall, Boss just shook his head sadly and left me to it; he knew from experience that I am not a nice person to be around when under stress.

    5. John Robson Silver badge

      Re: In a similar vein

      "I once Ghosted a blank drive over a client's hard drive instead of the other way around - oops."

      I have seen a RAID controller do that automatically...

      Mirrored disks, one fails - alarm goes off, everyone carries on.

      Pull it out, all good.

      Pop in a new disk, all good

      Array starts churning, excellent - copying data from one to the other, tea time.

      Erm, where are all the files?

      Why do we have two disks with identical unformatted data?

    6. Benno

      Re: In a similar vein

      Back in the late '90s I had a staff member accidentally run a ghost multicast onto an entire subnet! Thankfully WOL wasn't implemented - so a bunch didn't start, plus we managed to turn a heap of systems off before they ran the client (thankfully Pentium 233MMXs don't boot that fast!)

      It did take a few days to sort out the remaining carnage though!

    7. Anonymous Coward

      Re: In a similar vein

      Never done it myself but someone at our firm did the same. I am completely paranoid about this so always triple check when using Ghost or when writing Robocopy commands.

    8. Kurt S

      Re: In a similar vein

      Who hasn't? ;)

      1. Skoorb

        Re: In a similar vein

        Back early on in the Windows XP to 7 migration we needed to test the automated deployment tool, so we could see how the SCCM would actually deploy the installer image.

        So, a simple, near default, Windows 7 build with no software installed was created as a test by the project team to deploy to a couple of systems in a lab.

        Unfortunately, it was deployed to the early adopter test group, where all the monthly desktop updates go to be tested before being pushed out organization wide. This is the majority of IT, including the helpdesk.

        To get things moving along quickly, it was pushed out as mandatory, immediate and requiring a forced immediate installation including restart.

        Thus, the entire helpdesk and most systems in IT were simultaneously trashed in the middle of the working day before someone managed to kill the deployment.

        :-/

  6. Alien8n

    Haven't done it in a production environment but have trashed my own web server a few times when the upgrade path wasn't being nice and decided a clean install was a safer bet. Did take a backup of one particular table prior to trashing it though as it contained the guestbook for my brother's memorial page (no way I was losing that data)

  7. Anonymous Coward

    An ex colleague of mine...

    ... once accidentally deleted the whole CVS repository. The root partition had filled, and in an attempt to recover space he obliterated the code repository - actually, the reason the root partition had filled was that he had incorrectly created the repository there.

    Of course his backups weren't working.

    1. David 132 Silver badge
      Devil

      Re: An ex colleague of mine...

      A friend of mine was responsible (circa 2003) for building drivers for a well-known brand of network card.

      He wrote a cron job to compile the latest build overnight, then e-mail him the result log.

      He made some elementary mistakes, and what actually happened was:

      1) build process failed, repeatedly re-trying and re-failing,

      2) his C: drive filled up with an ever-growing log file,

      3) eventually, when the disk was full, his script progressed to the next stage: trying to email the log to him.

      He came into work the next morning to find a very angry IT dept; he'd crashed the company's Exchange servers by trying to send a 2.1GB attachment.

  8. Anonymous Coward

    Some people are just born lucky

    I used to work in support for an IT supplier. Shortly after I left the support team I'd been hauled back in to run some training and got chatting with an ex colleague. He'd had a case a bit like this about a week or so before the training, when he got a panicking call from a customer (from an ex institution somewhere in London). Their DBA had been working on a problem and decided the best approach would be to dump the whole database out to some files, drop the tables and then reload it all. What could go wrong, hey? The only problem was that this database of all their customers was quite big, so dumping it all to files and then reloading it was going to take time, and the pubs were open. So he wrote a quick script to do it while he sodded off to the pub.

    It's really only 3 simple steps after all

    Guess which of the 3 steps he got wrong?

    It was shortly after this that they realised that the safe was full of useless tapes which didn't contain the database (why doesn't anyone ever check their backups?).

    Cue panicking call to my mate in support on Monday morning.

    Luckily for the guy in question, my mate had been out on site the week before to investigate some problems on the database and just happened to have a tape sitting on his desk with their entire database (less the last couple of days) on.

    Me I'd have sent the tape to someone at the Bank of England, the directors might have had some explaining to do.

  9. psychonaut

    the feeling of your balls retracting into your body

    and a small "phhht" escapes from your anus as you silently mouth...."err, oh shit..."

    dont worry, we've got it all backed up.....

    right?

  10. Anonymous Coward

    No sh*t, Sherlock award of the week

    “We feel like we have failed you, our customers, and you expected better from us,” a statement from the company said.

    With incisive appraisal like that, it's only a matter of time before everything is back up and perfect

    1. Doctor Syntax Silver badge

      Re: No sh*t, Sherlock award of the week

      Well at least they're letting marketing do the PR, presumably the techs are getting on with the fix. Much better than the other way around.

    2. John Brown (no body) Silver badge

      Re: No sh*t, Sherlock award of the week

      My first thought too. There's no "feel like" about it. They HAVE failed their customers.

  11. Miss Config
    Holmes

    Found Out The Rule The Hard Way

    That rule says that any data that does not exist IN THREE SEPARATE PLACES

    does not exist full stop.

    1. waldo kitty
      Paris Hilton

      Re: Found Out The Rule The Hard Way

      That rule says that any data that does not exist IN THREE SEPARATE PLACES

      does not exist full stop.

      Please define "THREE SEPARATE PLACES".

      Izz'at three separate partitions on the same device?

      Izz'at three separate partitions or devices in the same machine?

      Izz'at three separate partitions or devices in two or more machines?

      Izz'at three separate and distinct devices?

      Izz'at three separate and distinct devices in three separate and distinct machines?

      Izz'at three separate and distinct devices in three separate and distinct machines in three separate and distinct buildings?

      Paris because everyone cries when they realize their horrible mistake could cost lives or possibly just millions of $$$...

      1. Dazed and Confused

        Re: Found Out The Rule The Hard Way

        > Please define "THREE SEPARATE PLACES".

        There is a reason why the Veritas Volume Manager supports 32-way mirroring. They had a customer ask for it. Said customer has 8 Data Center's (SIC) buried under 8 mountain ranges in 8 corners of their continent. At each data centre you need 4 copies: 2 for mirroring, a 3rd to split off for backup and the fourth so you can alternate the merge back, so there is always yesterday's data available too.

    2. Terry 6 Silver badge

      Re: Found Out The Rule The Hard Way

      Yes, and there should always be a copy (even if it's an extra one) of mission critical data that is never overwritten or replaced by the next oldest back-up until the latest back-up has been verified. (Years spent guarding a database of at-risk pupil records)

    3. Solmyr ibn Wali Barad

      Re: Found Out The Rule The Hard Way

      Also, having a good set of backups tends to increase the likelihood of never needing them.

      1. psychonaut

        Re: Found Out The Rule The Hard Way

        haha! this.

        the number of customers i get with dead hard disks

        "but youve got a back up right?"

        "yes"

        "oh great. where is it?"

        "well i saved them in a folder"

        "where"

        "on the computer"

        "this one here? the one that doesnt work?"

        "yes"

        oh dear

    4. Anonymous Coward

      Re: Found Out The Rule The Hard Way

      I don't care about the amount of places, I'll go one further:

      BACKUPS ARE USELESS

      What you really need are RESTORES. You can have a million backups, but if none of them restore you still have exactly zip. This is the lesson you learn when your backup medium is something that you cannot play back during your recovery, for instance if you've been using a new fancy tape device that you don't have a spare of.

      As long as you have not proven that you have a RESTORE (i.e. tested it) you have in my book nada. You'll get at best a 7 for effort, and a 0 for continuity checking.

  12. Jason Bloomberg Silver badge
    Pint

    Been there. Got the T-shirt

    Back in the day when not all computers had an OS I once saved memory to the system sector of the disk rather than the user sector it should have gone to. An all-nighter ensued restoring that from punched tapes - yes, it really was back in the day.

    I am sure I'm not the only one who has used MS-DOS 'COPY' to accidentally move a whole directory into a single file, then deleted that directory before realising. Or more simply deleted the wrong directory. I still occasionally get caught out by "COPY . .\BACKUP" where I don't press the keys hard enough so the first dot is missing and I overwrite the latest from the backup.

    With the best will in the world we all do something stupid from time to time. But there's nothing better than being screamed at that things have to be fixed quickly or heads will roll to make things worse than they already were.

  13. Mark 110

    Anonymisation script

    Reminds me of the DBA at a large financial institution that took a copy of the prod DB to use for testing and then ran an anonymisation script against production instead of the test copy.

    Took them two days to get the business back up and running.

  14. cd / && rm -rf *
    Alert

    It's easy to take the piss...

    ... but having come very close to doing the same thing* one day, I can only say "but for the grace of $DEITY there go I".

    * on a filesystem containing ten years' worth of scientific data. Yes, there were (proven) backups, but it would still have been highly embarrassing. Many a slip 'twixt brain and finger hovering over the Enter key**.

    ** ohnosecond, n: the shortest interval of time distinguishable by the human brain. It is thus named because it's exactly the length of time that elapses between hitting the Enter key and saying "Oh no!" (Thanks to Henry Law)

    1. Triggerfish

      Re: It's easy to take the piss...

      No way I'd class as a sys admin, but having worked with a few techy types, I came to the conclusion all decent sys admins, techs, engineers end up having a moment at some point where they have buggered up something for some daft reason. The decent ones are those who learn from it.

      I have certainly had my late night alone in the office moment of ooooh fuck, let's phone a friend and see if he can help me keep my job by morning.

      1. Solmyr ibn Wali Barad

        Re: It's easy to take the piss...

        Those important lessons of humility. That's what'll separate boys from men.

      2. werdsmith Silver badge

        Re: It's easy to take the piss...

        Yes, we've all experienced that feeling when you realise the mistake and you get a kind of sinking feeling, followed by a hot flush and then mind in overdrive as whatever the brain equivalent of adrenalin kicks in.

        But in my experience, 9.5 times out of 10 there is a way out, and you've just got to find it, and find it quick if you are in a pre-arranged downtime window (always ask for 5 X more minutes than you think you'll need).

        My colleague who re-configured an ODBC DSN on a 64 bit windows system, checked it, double checked it, triple checked it, then ran an upgrade process through it...... with a 32 bit application...... knows that feeling so well.

        1. Anonymous Coward

          Re: It's easy to take the piss...

          Doing a little work on prod one day when someone asked me to restart the UAT DB, I log into the UAT DB, get distracted for a moment, go back and restart the DB.

          After about 30 seconds the support guy turns around and goes "Any reason all the (workflow) engines have gone red"

          "Oh dear" in my mind, *looks at the terminals*

          "Yes, I just restarted the production database."

          An apologetic email and talk with the head of the production floor, it was only about 3 minutes of downtime, but heh. Felt bad man.

          This is why all my production databases are now in the dark green terminal with light green font.

      3. Anonymous Coward

        Re: It's easy to take the piss...

        "lets phone a friend and see if he can help me keep my job by morning."

        Yeah, those calls are always good for a laugh.

        A good long laugh.

        By the recipient.

        1. Triggerfish

          Re: It's easy to take the piss... @Meldreth

          They are also good for ending up having some really good steaks, pies and an ample supply of beer being brought down. Being the recipient of laughter is acceptable and possibly deserved, especially if the error was caused by hubris.

    2. VeryOldFart

      In a similar vein.

      from personal experience:

      p(fu) ∝1/n^2

      where p(fu) = probability of losing the live data

      n=number of backups

      therefore, no backups = data already gone

      one backup = it's certain to go

      two backups = still a 25% chance of losing it

      etc.

  15. Anonymous Coward

    Quit drinking !

    "An unfortunate sysadmin has deleted the production database of diagramming outfit Gliffy, ironically while attempting to fix a problem with backup systems."

    Well, as the subject says, bloke really needs to quit drinking ! When you're a storage admin, working on fixing any problem on backup systems, you need to be sharp, not fuzzy.

    Geez.

  16. BigWomble
    Facepalm

    Been there done that.

    Been there done that.

    Playing with MySQL Master - Mater replication.

    Dropping a table anywhere suddenly became bad news.

    We lost a few days of posts on a support forum and got one or two confused customers as the backup I had to hand was older than I realised.

    I'll stick with Master Slave for now. And keep better backups.

    1. Rich 11

      Re: Been there done that.

      Master - Mater

      Well, there's your problem, Mum.

  17. Anonymous Coward

    I was asked once to resize a virtual machine image.

    So I powered down the virtual machine, and typed:

    qemu-img resize image.qcow2 163840

    thinking it was measured in MB. It was measured in bytes. Anyone want to guess how much of a Windows 2008 R2 installation fits in 160kB?

    1. Anonymous Coward

      Re: I was asked once to resize a virtual machine image.

      Poor design on Qemu's part. It should check if data is going to be lost as a result of the command and require a '-force' or something like that. At the very least be smart enough to know that no useful VM can be 160K in size.

      Not that this excuses any admin from using a command he's obviously not familiar with without checking the man page first to be sure of what he's doing.

      1. Anonymous Coward

        Re: I was asked once to resize a virtual machine image.

        Poor design on Qemu's part. It should check if data is going to be lost as a result of the command and require a '-force' or something like that.

        Indeed. However, that might break scripts; another option would be to add a "grow" command that sanity-checks the input. I had read --help at the time, but was so used to moving between it, Ceph commands and OpenNebula, the latter two of which specify megabytes as the unit.

        Luckily, the VM was not mission critical, and we had an old copy somewhere that we were able to press into service.
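
        The sort of sanity check being discussed could look something like the sketch below. It is a hypothetical pre-flight wrapper, not anything qemu-img itself provides: `parse_size` and `check_resize` are made-up names, sizes must carry an explicit unit so nobody has to remember whether a bare number means bytes or megabytes, and shrinking is refused unless explicitly allowed.

```python
import re

# Binary multiples; the suffix set is an assumption for this sketch.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> int:
    """Parse sizes like '160G'; reject bare numbers so the unit is never guessed."""
    match = re.fullmatch(r"(\d+)([KMGT])", text.strip().upper())
    if not match:
        raise ValueError(
            f"size needs an explicit unit suffix (K, M, G or T): {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def check_resize(current_bytes: int, requested: str,
                 allow_shrink: bool = False) -> int:
    """Return the new size in bytes, refusing accidental shrinks."""
    new_bytes = parse_size(requested)
    if new_bytes < current_bytes and not allow_shrink:
        raise ValueError(
            f"refusing to shrink from {current_bytes} to {new_bytes} bytes")
    return new_bytes


current = 40 * 1024 ** 3  # pretend the image is currently 40 GiB

grown = check_resize(current, "163840M")  # what the poster meant: 160 GiB

errors = []
for attempt in ("163840", "160K"):  # no unit given; accidental shrink
    try:
        check_resize(current, attempt)
    except ValueError as exc:
        errors.append(str(exc))

print(grown)
print(errors)
```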

    2. psychonaut

      Re: I was asked once to resize a virtual machine image.

      is it the bit that says "pxe boot - failed"

      (alright i know thats the bios)

    3. John Brown (no body) Silver badge
      Coat

      Re: I was asked once to resize a virtual machine image.

      "Anyone want to guess how much of a Windows 2008 R2 installation fits in 160kB?"

      All of the good quality, useful bits?

    4. Dazed and Confused

      Re: I was asked once to resize a virtual machine image.

      I always like the way HP-UX's LVM had lvextend and lvreduce as separate commands, just to avoid this issue.

  18. R0man

    one of my first contracts, came from desktop support, started to work with servers, other staff away for the day on some course.. A problem with the backup server.. hmm i think i know what it is . i just need this tool from AltaVista.. oh oooo.. malware knocked out the servers network, i didn't know enough to just clean it and get it back up.. Called the admin...good guy but didn't tell him what i did. said.. err you'll have to rebuild it.. Never built a server, didn't even know what smart start was.. got it all back up same day and the backups working.. fastest learning experience in my life.. got kudos for rebuilding the backup server in a day.. and no one was any the wiser that me downloading dodgy software was the cause.. . Now responsible admin.. I think we've all got one.. oh shit.. that was a f*ck up.. as long as it's just the one.. all good.

    1. Rich 11

      [Deep breath] Would you please repeat that in English?

      1. R0man
        Happy

        Naa you'll have to copy and paste and put new lines in if you need the gaps to catch ya breath Rich ..

  19. Throatwarbler Mangrove Silver badge
    FAIL

    Bye, homies

    One mistake I made as a young sysadmin was, while trying to clear out some hidden directories in a user's home directory, running "rm -rf .*". Did you know that ".." falls into that wildcard? I sure found out quickly, once I realized that the rm was taking longer than expected and killed it. Only lost a few users' home directories and was able to recover them from NetApp snapshot, but it was definitely a brown trousers moment.
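
The wildcard behaviour described above can be illustrated without touching a filesystem. This is a sketch using Python's fnmatch module, which only approximates shell globbing; the directory listing is made up, and whether a given shell or rm actually acts on "." and ".." varies by version.

```python
from fnmatch import fnmatchcase

# Made-up directory listing, including the two special entries.
entries = [".", "..", ".cache", ".config", "notes.txt"]


def matches(pattern, names):
    """Return the names that match a shell-style pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]


# ".*" means "a dot followed by anything", so "." and ".." qualify too.
naive = matches(".*", entries)

# ".[!.]*" requires a second character that is not a dot, which skips
# "." and ".." (and, as a side effect, names such as "..foo").
safer = matches(".[!.]*", entries)
```

Here naive contains ".", "..", ".cache" and ".config", while safer contains only ".cache" and ".config".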

  20. Midnight

    "Hey, guys. We really should test our restore procedures."

    "Not now. We're busy."

    "No, really. We need to test our restore procedures."

    "You already said that. We have too much going on. Maybe we can put aside some time for it around September."

    "You're not listening. We really, REALLY need to test the restore procedures. NOW."

    "Why is that so important?"

    "Because I just had a little accident with the production database. And it's kind of gone now."

  21. Tikimon
    FAIL

    My so-called supervisor killed an alarm processing server

    Working for a company that made wireless backup comms for fire and burglary alarm systems. My supervisor was a guy who thought having worked for a rocket company in the 60's made him smart. Our data lived in an SQL database which held all customer and traffic data for 24/7 alarm monitoring and call dispatching.

    One day he's playing with queries in SQL, in spite of knowing almost nothing about it. He types "delete" to clear his query, so he thought. So SQL obligingly deleted the WHOLE DATABASE. When the system crashed, I saw him in the SQL console and knew instantly what had happened.

    They had always refused to let me test the existing backup plan, citing the 24/7 thing. So the backups were useless. They had to spend three days rebuilding and reindexing the database from scratch, flying in two people from out of state, and paying an outside SQL expert massive overtime.

    Brilliant.
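
One common guard against this kind of accident is to run destructive statements inside a transaction and check how many rows were hit before committing. A minimal sketch, using Python's built-in sqlite3 as a stand-in for the (unnamed) SQL product in the story; the alarms table and the guarded_delete helper are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a small, made-up alarms table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE alarms (id INTEGER PRIMARY KEY, site TEXT)")
    conn.executemany(
        "INSERT INTO alarms (id, site) VALUES (?, ?)",
        [(1, "north"), (2, "south"), (3, "east")],
    )
    conn.commit()
    return conn


def guarded_delete(conn, sql, params=(), max_rows=1):
    """Run a DELETE, but roll back if it touched more rows than expected."""
    cur = conn.execute(sql, params)  # sqlite3 opens a transaction implicitly
    if cur.rowcount > max_rows:
        conn.rollback()
        raise RuntimeError(
            f"statement affected {cur.rowcount} rows, limit is {max_rows}; rolled back"
        )
    conn.commit()
    return cur.rowcount
```

Calling guarded_delete(conn, "DELETE FROM alarms") on the demo database raises and leaves all three rows in place; adding "WHERE id = ?" with a single id deletes one row and commits. It is no substitute for a tested restore.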

    1. Anonymous Coward

      Re: My so-called supervisor killed an alarm processing server

      ".a guy who thought having worked for a rocket company in the 60's made him smart.."

      Isn't Rocket Science the benchmark for smart?

      1. John Brown (no body) Silver badge

        Re: My so-called supervisor killed an alarm processing server

        It is, if you're the one doing the rocket science. Not so much for the guy cleaning the bogs.

        1. Scott 29

          Re: My so-called supervisor killed an alarm processing server

          Hm, what's bog?

          A bog is a mire that accumulates peat.

          Frig. What's a mire?

          There are two types of mires: fens and bogs.

          No, what is it really?

          A stretch of swampy or boggy land.

          This Googling is haaarrrdddd.

          1. John Brown (no body) Silver badge

            Re: My so-called supervisor killed an alarm processing server

            "Hm, what's bog?"

            Netty, shitter, karzy, privy, crapper, throne, big white telephone, the reading room.

  22. Anonymous Coward

    Cisco ASA on an older release, a few years ago: sometimes, after dicking about with VPNs and crypto, it gets a bit frustrated, and removing crypto from an interface and reapplying it usually works a treat to clear the issue.

    Was working on a customer's VPN one evening from home, had strange crypto issues so thought I'd remove crypto and start again . . . too tired to realise I was accessing the device via a VPN and shut myself and all VPN customers out until data centre staff could restart it for me.

    1. Down not across

      That's why I like "reload in X", or "conf t revert timer X" (if on IOS 12.4 or later on a supported device (or better yet use the IOS' archive feature to archive configuration versions))

      JunOS of course has "commit confirmed X"

      Yes, of course I've locked myself out editing an ACL remotely. Once. Hence the above.
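
The idea behind "reload in X" and "commit confirmed X" is that a change reverts itself unless someone who can still reach the box confirms it. A toy sketch of that pattern in Python; the ConfirmedChange class is hypothetical, the clock is passed in by hand, and real devices implement this very differently.

```python
class ConfirmedChange:
    """Apply a config change that rolls back unless confirmed in time."""

    def __init__(self, config, timeout_s):
        self.config = config        # dict holding the live settings
        self.timeout_s = timeout_s  # seconds allowed before auto-rollback
        self._saved = None
        self._deadline = None

    def apply(self, changes, now):
        """Snapshot the current config, apply the changes, start the timer."""
        self._saved = dict(self.config)
        self.config.update(changes)
        self._deadline = now + self.timeout_s

    def confirm(self):
        """Keep the change: discard the snapshot and stop the timer."""
        self._saved = None
        self._deadline = None

    def tick(self, now):
        """Roll back if the deadline passed unconfirmed. True if rolled back."""
        if self._deadline is not None and now >= self._deadline:
            self.config.clear()
            self.config.update(self._saved)
            self._saved = None
            self._deadline = None
            return True
        return False
```

If the change cuts off your own access, you cannot call confirm(), the deadline passes, and tick() restores the snapshot.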

  23. Herby

    Backups, what backups, Oh, that one...

    Back in my PFY days (it was the 70's, forgive me), we were going to (eventually) upgrade the OS to the next version. A few months earlier we had ordered some more memory (expensive in its day), and I knew that the old (currently running) OS would use this new memory as soon as it saw it. Since the new OS took more memory (but not as much as the add-on we had just purchased), I decided to patch the current OS to limit its scan of memory and just use the fixed size of the existing memory. Fast forward to the installation of the memory, and I chime in "see, we need the new OS to use the added memory". I was hoping to get it installed because it had nice features I liked.

    Well, the powers that be decided that it would take training to get users up to speed (not really, but OK), so the installation was delayed a while. A little while later, while experimenting, I wiped out the old OS and, being a good guy and not knowing where the backup system tape was (or if it existed), just loaded up the new OS. Posted a few instructions (actually pretty simple ones) and left for the day (it was late).

    Surprise surprise, I come in the next day and tell what happened, and was asked why I hadn't used the backup tape (it was in a filing cabinet, safe keeping and all that). It was handed to me and I looked at its date, which was before I had buggered up patched improved the OS to only accept the old memory limits. OOOPS!!

    I reloaded the old OS and everyone wondered why they had more memory. I flustered a bit and said, well the new operating system was MUCH better!!

    Eventually we did go to the new OS, but I did a whole lot of dancing that day.

    Ah, youth.

  24. Mark Exclamation

    I remember the time the senior IT person from our parent company used scripts to copy the registry from the AD primary server to all the secondary ones (so they all had exactly the same registry). Once he realised his mistake, his solution? - reboot them all! It was a Friday morning so our network was down all day, and it took us local IT people all weekend to fix it up. Expecting to get a "thank-you for working all weekend" from the IT manager of the parent company, all we got was a "Why did it take you so long to fix up our error?". And the person who made the error? - he's now a senior manager!

    1. Anonymous Coward

      "And the person who made the error? - he's now a senior manager!"

      Yes. Yes, I am.

      Even though your dawdling fix could have made me look bad.

      Thankfully, I controlled that flow of information.

    2. Trixr

      "Senior IT person"? What, the senior Mac desktop support person?

      Also, if it was actually Active Directory (and not NT domains), there ain't such a thing as an "AD primary server" (yes, PDC Emulator, but that's not the same).

      As for making him a manager, well, safest place, I suppose.

      1. Mark Exclamation

        Yeah thanks, Trixr, I remembered after I had posted that it was actually the Primary Domain controller and secondary DCs, not AD controllers.

  25. Kepler
    Facepalm

    Ouch!

    "ironically while attempting to fix a problem with backup systems."

    Reminds me of the time in the Summer of 1985 when, while attempting to make a backup copy of a paper I'd already been working on 'round the clock for two full weeks, I accidentally destroyed my only copy and had to start the damn paper all over again.

    My goof? Instead of a blank floppy diskette, as I intended, I inserted the disk that already contained my only copy of the paper into my Tandy 1000's B drive, and then typed "Format B:"!

    Oy!

  26. Anonymous Coward

    I think the worst I've managed....

    Was emptying the company email unsubscribe list... Thankfully the morning's backup saved everything before the powers that be ever noticed....

    And they still don't to this day. (I have since moved jobs but it still gives me squeaky bum time thinking about it....also makes me very wary of live data).

    Anon to save the innocent and protect the guilty.

  27. Anonymous Coward

    Just hit enter

    So, the other day an email popped up from our endpoint protection server:

    "Malware detected on one of the workstations in your environment...

    bla, bla bla,

    Malware Name: win32/TesCrypt

    Computername: bla bla bla

    MalwarePath: c:\users\blablabber

    Action: Quarantine; succeeded"

    Quarantined. Okay we're good... Wait a sec. Tescrypt?!? Security guy next to me almost has a stroke, calls the user and tells them to pull the power cord. "I know you're not supposed to. Please, do it now!"

    Apparently the user had tried to click 'No' on the UAC prompt several times and finally put in a ticket cause it wouldn't go away. Helldesk promptly called back and advised to just click 'Yes'.

    "... clickety. My credentials don't work."

    "Let me try, it's probably just windows updates."

  28. Chairo
    Devil

    dd ?

    what do you mean - "if" should be the backup volume, not "of" ?

  29. John R. Macdonald
    FAIL

    Things seen in a distant past

    Reminds me of a gig I did as a contractor in the mid 1970's. Client company was running a small IBM mainframe (370/135?) with removable disks.

    To make things 'easier' for everyone (operations and programming staff) TPTB decided the production and test disk packs would have the same volume serial numbers.

    They did until the day the production disks were overwritten during a test run.

  30. Oengus

    SCO Unix

    Yes, we used to run a SCO Unix server. We had an administrator who was responsible for the backups and insisted on logging in as root.

    One day on the production server he entered

    rm -r *

    and pressed enter (he was in the / folder).

    About 10 minutes later he came out of the computer room and said that he was having a problem with the server.

    When I went in and looked at the screen I started laughing and he couldn't work out why... I called a mate in and he laughed as well (we were not in any way responsible for this system). They learned how good their backups were and the administrator was quickly moved on.

  31. Picky
    Unhappy

    Users can do it as well

    In the 80's I installed publishing systems for about 15 local papers. Each site was given a few boxes of 720k 3.5" floppies for backing up after each edition.

    Then one day I noticed that to "speed" things up the Editors were inserting disk 1, then disk 2 then disk 1, then disk 2 etc - to save using disks (needed a box for a full backup)

  32. I Am Spartacus

    VAXen - it was easy then too

    I watched a guy who had just come back from a VAX/VMS system admin course trying to set up mirrored disk - what DEC called RAID-1.

    Before I could stop him, he had mirrored the system disk with a blank disk, but had the blank disk as the master. We watched as the system slowly evaporated and crashed.

    Ahh an evening with TU45's loading VMS again.

  33. People's Poet

    http://www.taobackup.com/

    They should have read this first! In fact if you haven't read it yourself you probably should!

  34. Alien8n

    Crashed macs

    Due to an issue with one of our macs I had to completely wipe the hard drive of the mac and re-install.

    Cue the issues with re-installing, for whatever reason it would not accept the apple id and password to re-install. Solution? Time Machine to a fresh portable drive of the new macbook I'd just built and then went over the old macbook with the new Time Machine image. Result! (and as an added bonus meant it didn't have to wait several hours downloading the company data onto the old machine).

  35. Anonymous Coward

    test system and disk

    Reminds me of the time I worked for a bank that used to accumulate trades on one system and transfer them by 5.25" floppy to another system to be executed. Oh woe was I the day that I used the floppy full of test data. Ah well, 600 million dollars' worth of trades CAN be reversed, but it ain't easy. I felt my career prospects were blunted somewhat. Talk about dead man walking. Thinking about it now, I'm not sure how much was my fault, but somebody had to carry the can.

  36. Anonymous Coward

    It's pretty much a required mistake..

    .. to get your sysadmin creds, because it lets you experience what an adrenaline rush is, followed by a feeling of dread and panic. When you move on to BOFH stage you learn to ensure you keep the rush and let the users get stuck with the dread and panic part, but I'm getting ahead of myself :).

    Experience is something you get AFTER you need it, but in this case I think it ought to be part of anyone's learning stage - better to screw up something that will only get you a chewing out or mass derision for a while than to lack the experience and do this in production.

    Mistakes are always made. You can recognise the professional by her/his ability to plan for them, even if all looks well and functional.

  37. This post has been deleted by its author

    1. Down not across

      @1980s_coder

      "Are we supposed to laugh at this? Seems like an extreme case of simply not being up to the job or a company hiring idiots instead of professionals to save a few pennies.

      Pathetic."

      You never made a mistake in your life?

      These things happen. They shouldn't, but they do. It's a fairly safe bet he/she isn't likely to repeat that mistake any time soon.

      Much more of an issue is how the company dealt with the incident and communicated it.

  38. Stuart Castle Silver badge

    A few years back, we needed a new equipment logging and tracking system at work. The system needed to interface with the existing inventory system, and offer facilities for booking equipment between given dates and also tracking who has that equipment.

    Myself and a couple of colleagues designed a system that would enable us to do this. It was a simple system: a user website that would enable users to book items of equipment; an admin website that (amongst other business admin functions that were nothing to do with equipment) enabled us to print out the bookings, ban users from booking equipment and change its status; and a small utility that would enable us to scan equipment barcodes in and out, and perform stock takes. All of these used a custom-designed SOAP service to access a database on SQL Server. The justification for this was that, with several mission-critical systems accessing the database, my colleague who designed the database thought it a good idea to route all access through the service to prevent problems and manage the accesses correctly. I suspect the real reason was he'd spent a lot of time researching SOAP and wanted a chance to put his research into practice.

    A few months after the first version of the system was released, the backend SOAP service fell over and would not start. It took out all the attached websites and the utility, which caused massive problems as, by that time, we relied on it.

    My colleague investigated, and after a couple of hours found the fault. Someone had logged into SQL Server Manager, gone to the database holding the tables used by the system, and renamed the transaction table with a full stop.

    Now, I have no idea why it took him two hours, but I suspect he couldn't believe someone would be so stupid as to directly access a production database, so checked everything else first. The annoying part (for me) is that there were three of us with access to the tables. I genuinely didn't do anything (partly because I wouldn't anyway, and partly because I didn't think I had access), but when our department head found out, no one would admit to doing it, so all three of us got a bollocking. The two of us who didn't need direct access to the database also lost our direct access (which is how it should have been).

    Perhaps the ironic thing is that all three of us have Computer Science degrees, so should know not to make changes directly to production systems. For the record, I don't make changes directly to production systems. I always maintain a development and testing version of a system, then when the changes are properly tested, copy them over to the production system.

  39. Anonymous Coward

    He must be fired

    He must be fired for that.. he cannot be trusted.

  40. Scaffa

    I must admit I've done something shameful on a production environment before.

    After the initial "you fucking idiot!" calls were out the way and the issue was fixed, I actually got thanked for owning up.

    I got the trust back in the end, but I think people appreciate knowing that they can trust you to own up to a mistake - instead of hiding and praying it isn't discovered.
