A server apocalypse can come in different shapes and sizes. Be prepared

I run into the same misconceptions about business continuity on an almost daily basis. “We’ve already got backups, so why would we need to have a disaster recovery site as well?” comes up with alarming regularity, as does: “We spent tens of thousands on a disaster recovery site, so why did we have that four-minute outage – why …

  1. Paddy Fagan

    Generally a well-written article, but I have one bugbear

    >You can have the most rigorous backups in the world, but if you don’t have somewhere to restore them (i.e. a disaster recovery site) then you may as well not have them at all.

    This isn't 100% true: in this case your time to recovery is the time it takes to source and install a replacement piece of hardware. If you've nothing in the cupboard, and no DR, that's down to what you can get out of your suppliers (or eBay).

    I think this is important because, to me, all of this is a continuum of business impact versus cost that starts from this point and runs all the way to HA with BC, etc. For some data, a backup may be enough: the business cost of not having access to it for a few days may not be high enough to justify the cost of DR.

    Paddy

    1. Robert Carnegie Silver badge

      Re: Generally a well-written article, but I have one bugbear

      Good luck making purchases, even on eBay, when the server computer that your business runs on died, exploded, got stolen. (Company credit card? Really?) Also, good luck operating your business without it.

      If you're just running a window cleaning round then you may not have a separate server at all, but then this article is not for you.

      1. Paddy Fagan

        Re: Generally a well-written article, but I have one bugbear

        @Robert Carnegie

        To me at least, not every system needs much beyond a plan to buy replacement parts (notwithstanding provisos about availability). Some systems your business can't live without forever, but can work around for weeks or even months.

        Paddy

      2. Anonymous Coward

        Re: Generally a well-written article, but I have one bugbear

        > Good luck making purchases, even on eBay, when the server computer that your business runs on died, exploded, got stolen. (Company credit card? Really?) Also, good luck operating your business without it.

        Any sane business would likely just go back to doing the accounting by hand on paper. It'd be like it was before computers, only temporary: a new machine would be purchased and installed, and the data loaded back. It happened to Sony a little while back, if you recall. If they can do it, so can we.

        If we absolutely needed that server up before any purchase could be made (i.e. purchases couldn't be done on paper), I'd be reaching into my own pocket and buying a server to loan to the office temporarily. For short-term "pull ourselves out of the excrement" situations, AU$500 goes a long way towards getting something half-reasonable that will get us through.

        Once the "server" is up, we can then make the necessary requisitions for a replacement, do things properly, then it's simply one more outage as we shift the data over and decommission the temporary box (which I then take custody of … they can keep the hard drive if so desired).

        1. Pascal Monett Silver badge

          Yeah, but then accounting refuses your requisition on the grounds that the service is already running, and you have a devil of a time getting your expenses paid when all the accounting is done manually.

          1. Anonymous Coward

            > Yeah, but then accounting refuses your requisition on the grounds that the service is already running

            Soon fixed.

          2. Keith Langmead

            Business continuity isn't just tech orientated

            @Pascal Monett: That's why you plan for that kind of scenario in advance, in your documented plan. For instance, you could have it in writing that in the event of a disaster, select staff WILL have their expenses repaid (no questions asked, within reason) for money spent getting things operational again. Or you could ensure ahead of time that staff other than the Directors (in case they're not immediately available) have access to a company credit card or have limited purchasing authority.

            IMHO the biggest issue with DR is that, done properly, it has to be business-led, but the business tends to think of it as an IT-only issue. Given enough budget, we in IT can effectively get a customer's system running 24/7 with zero downtime, but if a day's downtime doesn't massively impact the business, there's little sense spending thousands to protect against rare events. It's the business that needs to decide what the maximum allowable downtime is, and to determine the financial implications of any downtime that does occur, for instance in £s per hour. Only then can we plan and budget a solution that's appropriate to the requirements and, perhaps most importantly, to the justifiable costs.
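To make the £-per-hour reasoning concrete, here is a minimal Python sketch. The option names, yearly costs, downtime figures and incident rate are hypothetical placeholders, not numbers from the thread; the idea is only to discard options that exceed the business's maximum allowable downtime and then pick the one with the lowest combined yearly cost (running cost plus expected downtime cost).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryOption:
    name: str
    annual_cost_gbp: float              # hypothetical yearly cost of running this option
    downtime_hours_per_incident: float  # hypothetical expected outage per incident


def choose_option(
    options: list[RecoveryOption],
    max_downtime_hours: float,
    downtime_cost_gbp_per_hour: float,
    incidents_per_year: float,
) -> Optional[RecoveryOption]:
    """Return the cheapest option (running cost plus expected downtime cost)
    that stays within the business's maximum allowable downtime, or None."""
    acceptable = [
        o for o in options if o.downtime_hours_per_incident <= max_downtime_hours
    ]
    if not acceptable:
        return None  # nothing meets the requirement: revisit budget or requirement

    def total_annual_cost(o: RecoveryOption) -> float:
        outage_cost = (
            incidents_per_year * o.downtime_hours_per_incident * downtime_cost_gbp_per_hour
        )
        return o.annual_cost_gbp + outage_cost

    return min(acceptable, key=total_annual_cost)


# Hypothetical figures for illustration only.
options = [
    RecoveryOption("backups only", 2_000, 72),
    RecoveryOption("cold standby", 10_000, 24),
    RecoveryOption("hot standby / HA", 60_000, 0.1),
]
```

With a 48-hour limit, £500 per hour of downtime and roughly one incident every two years (`choose_option(options, 48, 500, 0.5)`), the sketch picks the cold standby. Allow 96 hours at £50 per hour and plain backups win, which is the point: the business's numbers, not IT's preferences, decide.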

            1. VeryOldFart

              Re: Business continuity isn't just tech orientated

              I agree completely. Our approach uses a slightly different definition of Business Continuity from the one in the article. For us, BC is the starting point and is done by the business. They determine how they will survive in the absence of the IT systems and what they will need to do to merge the results of this back into the IT systems once they have been recovered. This focuses the business on how they will continue to operate while the tech is unavailable, and consequently helps them identify the maximum acceptable RTO and RPO and balance these against how much they are willing to invest to meet them. I have led several desktop exercises taking people through this process.
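Once the business has set an RPO, checking a proposed backup schedule against it is simple arithmetic. A minimal sketch, assuming the worst case is a failure just before the next copy reaches safe storage, so the worst-case loss is the backup interval plus the time a copy takes to get there (the "shipping lag"). The function names and figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta, shipping_lag: timedelta) -> timedelta:
    """Worst case: the failure hits just before the next backup (or log copy)
    reaches safe storage, so everything since the last safely stored copy
    is lost."""
    return backup_interval + shipping_lag


def meets_rpo(backup_interval: timedelta, shipping_lag: timedelta, rpo: timedelta) -> bool:
    """True if the schedule's worst-case data loss is within the agreed RPO."""
    return worst_case_data_loss(backup_interval, shipping_lag) <= rpo


# Hypothetical figures: 15-minute log backups vs. nightly backups only, one-hour RPO.
rpo = timedelta(hours=1)
log_backups_ok = meets_rpo(timedelta(minutes=15), timedelta(minutes=5), rpo)
nightly_only_ok = meets_rpo(timedelta(hours=24), timedelta(hours=1), rpo)
```

Here 15-minute log backups that take up to 5 minutes to reach safe storage lose at most 20 minutes of changes, inside a one-hour RPO; nightly backups alone cannot meet it.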

    2. Groaning Ninny

      Re: Generally a well-written article, but I have one bugbear

      If popping out to the shops to get a replacement PSU/disk is part of the plan, then the written plan should say so. It should also look at the risks (downtime, stock availability, zombies roaming the streets) and explain how they are justified. Once senior management signs off on it, it's okay.

  2. Peter Gathercole Silver badge

    Disaster Recovery?

    Your definition of "Disaster Recovery" does not match what I've traditionally worked with, although I'm prepared to admit that things have moved on a bit.

    Depending on your allowed down time, a disaster recovery solution may include a rental company delivering pre-arranged configurations of machines to a known site. It does not mean that you have to have those servers already installed.

    If you have a nominated site, with the correct communication and power already laid on, and you have an agreement with a rental company that can deliver predefined configurations of kit within an agreed timescale, and this (and the rebuild) fits into the allowed window, this can also count as Disaster Recovery (in fact, this used to be what Disaster Recovery was all about).
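Whether a rental arrangement "fits into the allowed window" comes down to adding up the recovery steps. A minimal sketch, assuming the steps run one after another; the step names and durations are hypothetical, and a real plan would also allow for call-out time and slippage.

```python
from datetime import timedelta


def recovery_time(steps: dict[str, timedelta]) -> timedelta:
    """Total elapsed time, assuming the recovery steps run one after another."""
    return sum(steps.values(), timedelta())


def fits_window(steps: dict[str, timedelta], allowed_downtime: timedelta) -> bool:
    """True if delivery, rebuild and restore together fit the allowed window."""
    return recovery_time(steps) <= allowed_downtime


# Hypothetical rental-based DR plan; the durations are illustrative only.
rental_plan = {
    "rental kit delivered to the nominated site": timedelta(hours=24),
    "rebuild operating systems and applications": timedelta(hours=8),
    "restore data from backups": timedelta(hours=6),
}
```

With those made-up numbers the plan needs 38 hours. It would count as DR for a business that can tolerate two days down, but not for one that must be back within a day.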

    It all depends on how long you can be down for. Nowadays, many businesses really require full-blown geographically separated high availability or multi-site concurrent solutions, as even a few hours' downtime can seriously hurt some businesses.

    You've also not mentioned Fault Tolerance, although you allude to it with the redundant power supply example; redundancy and fault tolerance are not exactly the same thing. Maybe people simply expect fault tolerance by default now, but I'm sure the term is still used in documentation.

  3. Fortycoats

    You forgot the most important point

    TESTING!

    No matter what your recovery scenario is, be it just restoring from tapes or near-instant failover to a secondary site, it's not worth anything unless it is tested, and tested regularly. Both the systems and the people operating them need to know what to do when the s**t hits the fan, and it should be written down somewhere, preferably in dead-tree format.
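A restore test can start small. The sketch below, using only Python's standard library, records SHA-256 checksums of a directory at backup time, writes the files to a tar archive, then restores the archive into a scratch directory and checks every file against the recorded checksums. The function names are hypothetical, and this is only an illustration: a real test would restore onto the actual recovery target and exercise the application as well.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def backup(source: Path, archive: Path) -> dict[str, str]:
    """Archive every file under source and return a checksum manifest."""
    manifest = checksums(source)
    with tarfile.open(archive, "w:gz") as tar:
        for p in sorted(source.rglob("*")):
            if p.is_file():
                tar.add(p, arcname=p.relative_to(source).as_posix())
    return manifest


def restore_and_verify(archive: Path, manifest: dict[str, str]) -> bool:
    """Restore into a scratch directory; True only if every file matches."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(scratch, filter="data")
            else:
                tar.extractall(scratch)
        return checksums(Path(scratch)) == manifest
```

Comparing against a manifest taken at backup time, rather than against the live data, means the test still works after the live system has moved on (or died).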

    A backup that you can't restore is useless, as is a database cluster with mirrored storage that won't switch over when you need it.

    Or you could just use the Dilbert version:

    http://dilbert.com/strip/2000-08-15

    1. Anonymous Coward

      Re: You forgot the most important point

      > Or you could just use the Dilbert version:

      Darn, you've completely saved me the hassle of writing anything since that perfectly describes our BC and DR plans :-(

      At my last place, after an audit, our director came to us and said "write some disaster recovery plans for IT". That was about it. I was lucky enough to get a last-minute discounted place on a BC course, which was ... quite illuminating.

      Needless to say, when I got back and asked what the BC plan had to say about recovery point and recovery time objectives (without which it's pretty meaningless to start talking about technical plans, which would either not meet business need or would waste money), I just got told to "stop being difficult".

      Of course, there was zero budget in any case, just like here really!

      Anon for obvious reasons.

      1. Captain Scarlet

        Re: You forgot the most important point

        Hmm, but it's missing a vital item: a scapegoat!

    2. Groaning Ninny

      Re: You forgot the most important point

      But even the Dilbert one is a plan. It's researched and defined given the resource constraints, and that's an awful lot more than many people have. Showing it to the management system is a way of then covering your own posterior. Job done.

      Definition and communication really are key!

  4. Captain Scarlet

    "perform a bare-metal restore"

    I remember these being an absolute pain to configure and test (ex-ArcServe user; I think these came in at version 10). Thankfully, I never had to perform a bare-metal restore on a live system.

    1. Pascal Monett Silver badge

      Um, if you have to perform a restore, I do believe it would be a dead system at that point.

  5. Mark 110

    Clustering does not eliminate need for backups and logs

    "Take databases as an example. In the dim, dark past, a database might be reliant on a technology like log-shipping to offer redundancy in case of an outage.

    This relies on nightly backups plus up-to-date transaction log copies for the intervening period, and ultimately will result in some data loss (for any logs that had not completed transmission prior to the outage) and some service downtime while your DBAs work to bring the system back online.

    Some people might refer to this as a warm standby. At best, it’s probably tepid.

    With database clustering on the other hand, the storage between the primary and secondary nodes is shared – perhaps with a SAN volume or replicated disks – so there is no requirement for backups and logs to be shipped."

    Database clustering does not eliminate the need for backups. Assuming you are talking about a geo-cluster, it does eliminate the need for backups in a site-failure scenario, but there are other scenarios.

    Take the example of the admin who was asked to anonymise a test database and ran the SQL script against the prod database instead. All data was effectively destroyed, and the destruction was synchronously replicated to the DR site by block-level storage replication.
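Why replication is no substitute for backups can be shown with a toy model. The sketch below is a simplified, hypothetical stand-in for synchronous replication, not real block-level storage: every write to the primary is applied to the replica at once, while the backup is an independent point-in-time copy. After a mistaken "anonymise" runs against the primary, the replica is just as damaged; only the backup still holds the original data.

```python
import copy


class ReplicatedStore:
    """Toy model of synchronous replication: every write is applied to both
    the primary and the replica before it is acknowledged."""

    def __init__(self) -> None:
        self.primary: dict[int, str] = {}
        self.replica: dict[int, str] = {}

    def write(self, key: int, value: str) -> None:
        self.primary[key] = value
        self.replica[key] = value  # good and bad changes are copied alike


store = ReplicatedStore()
for customer_id, name in [(1, "Alice Smith"), (2, "Bob Jones")]:
    store.write(customer_id, name)

# Nightly backup: an independent point-in-time copy.
nightly_backup = copy.deepcopy(store.primary)

# The anonymising script is run against production by mistake.
for customer_id in list(store.primary):
    store.write(customer_id, "ANONYMISED")
```

Afterwards `store.replica` holds only `"ANONYMISED"` values, exactly like the primary, while `nightly_backup` still has the original names.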

    Without daily backups and 15-minute log backups, this would have been even nastier than it actually was (two days' loss of business).
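Daily full backups plus 15-minute log backups make it possible to restore to a point just before the bad script ran. The sketch below only shows the selection logic, under a simplified model in which each log backup holds the changes since the previous one: take the latest full backup at or before the target time, then every later log backup up to and including the first one taken at or after the target. The schedule is hypothetical; the database engine's own point-in-time restore option would replay that last log only up to the target.

```python
from datetime import datetime, timedelta


def restore_chain(
    full_backups: list[datetime],
    log_backups: list[datetime],
    target: datetime,
) -> tuple[datetime, list[datetime]]:
    """Return the full backup and the log backups to replay to reach target.

    Simplified model: each log backup holds the changes made since the
    previous log backup, so replay every log after the chosen full backup,
    stopping at the first one taken at or after the target."""
    candidates = [f for f in full_backups if f <= target]
    if not candidates:
        raise ValueError("no full backup taken at or before the target time")
    full = max(candidates)

    chain: list[datetime] = []
    for log_time in sorted(t for t in log_backups if t > full):
        chain.append(log_time)
        if log_time >= target:
            return full, chain
    raise ValueError("no log backup covers the target time yet")


# Hypothetical schedule: full backups at midnight, log backups every 15 minutes.
start = datetime(2024, 1, 1)
fulls = [start, start + timedelta(days=1)]
logs = [start + timedelta(minutes=15 * i) for i in range(1, 104)]

# Suppose the bad script ran at 00:41 on 2 January, so aim for 00:40.
full, chain = restore_chain(fulls, logs, target=datetime(2024, 1, 2, 0, 40))
```

For that target the chain is the midnight full backup followed by the 00:15, 00:30 and 00:45 log backups, so at most a few minutes of legitimate work would need to be re-entered.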

    1. Anonymous Blowhard

      Re: Clustering does not eliminate need for backups and logs

      "Take the example of the admin asked to anonymise a test database and who ran the SQL script against the prod database"

      This is a good example of why a "prod database" needs to be on a different server from the "test database", ideally one that can't be remotely accessed by the tools that can access the test DB. Then your anonymising script (or whatever) won't be in the production environment, and your admin will realise he's in the wrong system because the anonymising script isn't there, rather than realising only once the production DB is AFU.
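A cheap extra safeguard in the same spirit is for destructive maintenance scripts to refuse to run anywhere but a known test host. A minimal sketch: the host names, the exception class and the function name are all hypothetical, and this supplements rather than replaces proper separation of environments and permissions.

```python
# Hypothetical allow-list of non-production database hosts.
TEST_HOSTS = frozenset({"db-test-01", "db-test-02"})


class WrongEnvironmentError(RuntimeError):
    """Raised when a destructive script is pointed at a non-test host."""


def ensure_test_environment(target_host: str) -> None:
    """Refuse to continue unless target_host is a known test host.

    An allow-list fails closed: a host nobody thought to list (production
    included) is refused rather than permitted."""
    if target_host.strip().lower() not in TEST_HOSTS:
        raise WrongEnvironmentError(
            f"refusing to run anonymisation against {target_host!r}: not a known test host"
        )
```

Calling `ensure_test_environment(host)` as the first line of the anonymising script turns "ran it against prod" into an immediate error instead of a restore job.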

      But your point that backups are still required stands.

  6. Anonymous Coward

    Come back VAX/VMS, all is forgiven

    In what can only be some sort of cosmic karma, a pointy-haired type told me the other day,

    "Our VMs are bulletproof"

    ...which was a ludicrous claim. If only he had said

    "Our VMS is bulletproof"

    ...it would have been true. What a difference an s/S makes. Does anyone still use OpenVMS clustering?

    1. Anonymous Coward

      Re: Come back VAX/VMS, all is forgiven

      I know of a few systems from a past life that are still running for governments in Europe and Africa, hence the anon ;)
