Can you recover your data if disaster strikes? Sure?

Disaster recovery is complicated and usually expensive. It comes in many forms, and many companies mandate a minimum of off-site backups for various reasons, from regulatory compliance to risk aversion. Disaster recovery planning is considered to be part of standard IT best practices today, but which solutions are appropriate …

  1. Anonymous Coward
    Meh

    "Though precious few systems administrators will admit it, every one of us goes to bed never completely sure that if all of the IT under our control was turned off it would all come back up again."

    I'll happily tell you many of our systems won't come back online, because the powers that be won't invest in them. So I go to bed in the knowledge that it all won't come back online, and I sleep soundly because of it.

    1. Callam McMillan

      Sleep soundly knowing you can recover the emails that cover your arse!

    2. Peter Gathercole Silver badge

      Ultimately, though...

      ...whether you know that it won't come up, even if your arse is covered by safely filed but unheeded warnings from you to the Management, you still have to piece it all together using whatever is available.

      Unless, of course, the first thing that will happen after a disaster is your resignation hitting the temporary desk of your manager.

      Which is exactly what a number of sysadmins at a major UK financial institution told me, a number of years ago, would happen if their primary data site was destroyed. They knew the plan would work (it had been tested pretty well, but piecemeal), but they did not fancy the long nights, the location disruption, the bickering about what sequence the business workloads needed to come back in, and the almost complete inability to fail back to the primary site if it was resurrected.

      Of course, professional system administrators would not do this, would they?

    3. Anonymous Coward

      > I'll happily tell you many of our systems won't come back online, because ...

      At this point in time, I'm "fairly confident" that it'll all come up. But that's only because it's not all that long since we had a "black start" due to said lack of investment in the UPS and a power cut. That weeded out some of the older kit.

      On the other hand, I currently have a stack of recently retired kit to take my pick of for "upgrades". At my last upgrade, I looked up in our purchasing system how old the "new" machine was - as my manager pointed out, about 9 years past its refresh time!

      Says a lot about how the services I'm responsible for are valued when I'm putting into service, as an upgrade, hardware which most places would have scrapped years ago. Needless to say I'm on the lookout for another job - posted anon for obvious reasons.

      1. Lee D Silver badge

        I've worked in a lot of schools: state, private, primary, secondary.

        Equipment refresh in them is 4 years at most, from what I've seen. That might be "4 years on reconditioned but warranted equipment", but once it comes in it's only 4 years before it leaves again. And then to the scrapheap (I've tried for years to convince schools to allow resale of their old kit, but they are never interested).

        The worst I've ever had was when I started a new job, and they said their Internet was dog-slow. When I stuck Ethereal (Wireshark) on the outgoing connection, I wasn't surprised: so much junk going in and out completely unnecessarily. I kind of put my neck on the line and told the head I could triple the speed he saw on Speedtest once that traffic was sorted out (purely because the line was SO busy through not being managed properly).

        They took me up on it. I grabbed an old desktop from the "Someone bin this when you get around to it" pile that had previously been an office PC for many years. I put two network cards in it, installed Linux and - as a proof of concept only - put in transparent Squid proxying for web, decent firewalling and blocked all the junk leaving to the Internet. Needless to say, the "speedtest" result jumped up enormously, nearly 5-10 times the speed depending on the time of day. I'd got the job already, but that cemented the relationship for years.

        The reason I did it that way was to prove my point quickly without having to buy anything, and also so that whatever I did could be undone in seconds if it broke anything important (literally, pull the two cables out, put a Cat5 coupler on them, and you were back to how you were before).

        Five years later, the staff had all moved on, I had moved on from that school, and the Internet connection was still running through that desktop. Why? Because it worked. It had something like a 200GB web cache on it, and it was even running our fax-to-email setup and numerous other tasks (hell, it was there, it was near the telephone lines, it was plugged into the net, it cost nothing...). A DansGuardian filter was in use every day (and doing a damn good job in combination with redirecting all external DNS to OpenDNS, which also filtered) and its logs had been queried several times to determine who was wasting school resources, etc.
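
        The guts of a box like that are only a handful of firewall rules plus one line of Squid config. Roughly this, as a sketch rather than the actual setup - the interface names and Squid port are placeholders, and the resolver address is just OpenDNS's public one:

          #!/usr/bin/env python3
          # Sketch of the two-NIC transparent proxy box: eth0 = LAN, eth1 = ADSL/WAN,
          # Squid listening on 3128 with "http_port 3128 intercept" in squid.conf.
          import subprocess

          LAN, WAN, SQUID_PORT = "eth0", "eth1", "3128"
          OPENDNS = "208.67.222.222"

          commands = [
              # Let the box route between the two cards.
              ["sysctl", "-w", "net.ipv4.ip_forward=1"],
              # Divert LAN web traffic into the local Squid instead of straight out.
              ["iptables", "-t", "nat", "-A", "PREROUTING", "-i", LAN, "-p", "tcp",
               "--dport", "80", "-j", "REDIRECT", "--to-port", SQUID_PORT],
              # Force all outbound DNS to the filtering resolver.
              ["iptables", "-t", "nat", "-A", "PREROUTING", "-i", LAN, "-p", "udp",
               "--dport", "53", "-j", "DNAT", "--to-destination", OPENDNS],
              # NAT whatever is allowed out of the WAN side.
              ["iptables", "-t", "nat", "-A", "POSTROUTING", "-o", WAN, "-j", "MASQUERADE"],
              # Allow DNS and HTTPS through, then drop the rest of the junk trying to leave.
              ["iptables", "-A", "FORWARD", "-i", LAN, "-o", WAN, "-p", "udp",
               "--dport", "53", "-j", "ACCEPT"],
              ["iptables", "-A", "FORWARD", "-i", LAN, "-o", WAN, "-p", "tcp",
               "--dport", "443", "-j", "ACCEPT"],
              ["iptables", "-A", "FORWARD", "-i", LAN, "-o", WAN, "-j", "DROP"],
          ]

          for cmd in commands:
              subprocess.run(cmd, check=True)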

        When I left, some consultant was brought in to advise the school on IT, given that I was leaving, and on his list was a Smoothwall box to do exactly the same (I don't begrudge that decision as such, because someone needs to manage whatever it is that's doing that job, but... thousands of pounds down the drain on a solution to replace an almost identical solution that had worked for years!). The Smoothwall UTM was something like a dual-core with 2GB of RAM as well, so god knows how it compared.

        But that original old-office-desktop box was the oldest (and probably the busiest) machine in the place.

        I understand not upgrading when there's nothing wrong with the solution you have. But you shouldn't be running on computers that old. As it was, from the first year onwards, we bought a replacement server and sat it next to that machine, with the intention of waiting for it to die before we replaced it. We imaged it across to the "new" server on a regular basis so we had a "warm spare". We even tested it once, just pulling the cables and putting them into the other machine - the worst you needed to do was change the network parameters (because of the device detection order) and you were good to go. But we never actually ended up replacing that desktop.
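
        Nothing clever is needed for the warm spare side either. An image-level copy does the job, but even a cron-driven pull roughly along these lines gives you the same safety net (the hostname and paths here are made up for illustration, not the real box):

          #!/usr/bin/env python3
          # Illustrative warm-spare pull: run from cron on the spare machine,
          # e.g. "0 2 * * * /usr/local/bin/warm_spare_sync.py".
          import subprocess
          import sys

          PRIMARY = "root@old-desktop.example.lan"          # placeholder hostname
          PATHS = ["/etc/", "/var/spool/squid/", "/srv/"]   # placeholder paths
          DEST = "/srv/warm-spare/"

          for path in PATHS:
              result = subprocess.run(
                  ["rsync", "-aHAX", "--delete",
                   f"{PRIMARY}:{path}", DEST + path.lstrip("/")],
                  capture_output=True, text=True)
              if result.returncode != 0:
                  # Fail loudly: a silently stale spare is worse than no spare.
                  sys.stderr.write(f"sync of {path} failed:\n{result.stderr}\n")
                  sys.exit(1)

          print("warm spare in sync")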

        Old and established is good when it's not hindering you in any visible way.

        As soon as it hinders, you should be ditching it.

        And you should be planning on how to replace it from the day you buy it.

  2. This post has been deleted by its author

    1. Peter Gathercole Silver badge

      Whilst I agree with you, and don't condone cloud services myself, it is becoming quite clear that the cloud pundits are singing a song that the beancounters of this world want to hear, even while not understanding it.

      It is inevitable that the steam-roller of this technology will flatten a large part of the corporate IT world, whether we want it to or not. It is happening quicker than I am comfortable with, because it is affecting my livelihood in a way that means I will need to change what I do, something I don't relish at my age.

      But the article does make some quite valid points. If you find yourself working in a cloudy environment, then a lot of the advice in the article makes a lot of sense.

      I hope that some of the wisdom gets as far as the people holding the purse strings.

      Is there a "Cloud Services for Dummies" yet? Because we sure as hell need one.

    2. Anonymous Coward

      The author's bio includes: "As a consultant he helps Silicon Valley start-ups better understand systems administrators and how to sell to them."

      I guess he didn't know how to sell to you. ;)

  3. Anonymous Coward

    Cloud services are fine, just don't expect the pigeons to carry more than a USB stick at a time.

    Trevor, like you we manage the IT of a manufacturing plant. The big problem is that it is out in the countryside and about 10km (as the cables run) from the nearest exchange. ADSL speeds are 'problematic' to say the least - sometimes they get a better service using dial-up. There is no way they can use cloud services in any meaningful way.

    I have a lingering doubt about cloud services being the general panacea for the IT industry. They might be in the cities with all the high speed cable services but it is going to be a long time before that type of service reaches out into the countryside - we even have a problem trying to get a 3G signal out here in many places.

    1. Trevor_Pott Gold badge

      Never said cloud services are a panacea. Not even close. Very specifically, I said DRaaS is now a viable tool for use. It's come of age, and there are multiple viable providers. Like all tools, DRaaS won't be right for all use cases or for all shops, but then again absolutely nothing is.

  4. Duncan Macdonald

    Second server room

    For a company large enough to be in two buildings - put a file server room in each building. (With modern kit even a garage can house a fairly large backup file server without needing air conditioning.) Use a dedicated high-speed link between the main server room and the backup file server. The cost of the backup file server and link will be far smaller than the cost of a cloud backup by the time the network costs to the cloud provider are taken into consideration.

    Examples:
    Low end - 4TB effective storage: a PC with five 1TB SSDs in RAID5 and a 1Gb Ethernet link - change from £3,000.
    Medium - 20TB effective storage: a server with 28 1TB SSDs (25 in the RAID5 group and three spares to hot-swap a failed SSD) and a 10Gb fibre link - change from £20k.
    High end - a dedicated building holding as many PB as you need - price HIGH!
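
    (For anyone checking the arithmetic: RAID5 gives up one drive's worth of capacity to parity, so usable space is (drives in the group - 1) x drive size. A quick sketch of the low-end case, purely illustrative:)

      # RAID5 usable capacity = (drives in the group - 1) x per-drive capacity.
      def raid5_usable_tb(drives_in_group, drive_tb=1.0):
          return (drives_in_group - 1) * drive_tb

      print(raid5_usable_tb(5))   # low-end build: five 1TB SSDs -> 4.0 TB usable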

    Use SSDs for the small and medium cases, as they have better survivability (heat, water, vibration) and longer lifespans than most HDDs (e.g. the 5-year warranty on the Samsung 850 EVO).

    1. Bloakey1

      Re: Second server room

      I really do do a belt-and-braces backup at one large company. I have the cloud for unimportant stuff - effectively what I do not need to recover. I replicate servers between two sites, and I also have hardware-encrypted hard drives that are effectively my daily 'tapes'. Topping all that off, I have a server in the sun that is my fly-back-with-goods-in-hand option.

      I had some practice this week when there was a big fire and power outage in London; the data lost was one contract being written in some specific software. The software crashed out and created a 2k database. My backup was two-hourly, so all it had was a copy of the 2k file, but more to the point my version control allowed me to go back 32 minutes to the main database and recover it.

      Lots of backups, lots of version control (good for Cryptolocker, Zeus etc.) and all is well. Crap in the cloud so the buzzword box can be ticked when asked.

      My arse was covered and my reputation was as good as ever.

      The cloud is effectively useless as far as I am concerned, and I have no respect for journalistic puffery that implies it is OK. Latency, sovereignty, security, time to recover (5 terabytes down a broadband line? How many months will that take? Or talk to the host, buy a drive and wait for them to copy the data and send it!).

      I would be more likely to have Michael Jackson babysitting my children than have Microsoft looking after mission critical data.

      1. Lee D Silver badge

        Re: Second server room

        As someone who is in their current job because the previous guy did not back up (yes, you may now pass out in disbelief!) and their RAID5 went screwy and they lost almost everything (literally, all they had was what came back from data recovery and what could be recovered from client PCs' roaming profile caches!):

        I'm with you.

        I work for a school. My "prime directive" when I was hired was to ensure that kind of data situation can never happen again. By the time I came in, they'd got some data back and got as far as throwing some storage onto the network to get operational, and backing up to that every day, but my remit was to do whatever was necessary to ensure business continuity if the worst happened again.

        So, although we don't have two sites, we are one LARGE site separated into many buildings. Which are all joined together with fibre. So why were we messing about? Stole a cupboard. Slapped a rack full of server blades into it, replicated everything across. That's our "Whoops, a cable has been cut" live copy. Always available, always idle, does nothing but replicate the main servers. Don't fall into the trap of USING those servers to do things because they are idle - they are THERE to do nothing at all until needed, and you don't want to find out that you don't have the room/memory/capacity to spin up your secondaries because all the other junk you loaded is taking precedence!

        We already had a backup NAS device, so we bought five more. One pair goes off-site. Now you have daily, weekly and monthly too, and off-site. Now we're in negotiations with other schools to mirror VMs across to entirely separate locations, so long as we provide the same in kind. With encryption, VLANning, VPNs, etc. there's no reason not to. And I'm still considering a "cloud" backup provider as another fallback. Yeah, it may take days to get my data back from them. But knowing it's there is invaluable.

        The one thing that's absent? Tape. We don't use it. I'm not sure I understand its purpose in a modern system, to be honest. As you state, so long as the version control is available (even if that's by rotating a pair of monthly off-site disks and PHYSICALLY TAKING THEM OFFLINE), there's little to use tape for.

        Cloud is just "remote servers". If that's your own remote servers on someone else's network, that's still the same - whether that network is another site, another company, or a datacentre that you host with. So long as you have other backups, and access to those backups somehow, it's just another part in the same plan.

        Backup strategies are a product of multiple tiers of speed, latency, and network "distance". You have things you can get backups from in a second, and others that might take a few days. You have things that are only 15 minutes "latent" compared to your system, and things that are months or even years latent in case you need to go back that far. And you have things that are on-hand with multi-gigabit connections, and things that are on the other side of the world/country/city with slower connections.

        Mix and match to get one of everything and you have a backup system.
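
        If it helps to picture it, each copy really boils down to three numbers: how stale it is, how long it takes to get back, and whether it survives the building going up in smoke. A sketch, with the tiers and figures invented purely for illustration:

          from dataclasses import dataclass

          @dataclass
          class BackupTier:
              name: str
              staleness_hours: float   # worst-case age of the data (RPO-ish)
              retrieval_hours: float   # how long before you can restore from it
              offsite: bool            # survives losing the building?

          # Invented example tiers - mix and match until every column is covered.
          tiers = [
              BackupTier("warm spare on site",    0.25,  0.1, offsite=False),
              BackupTier("nightly NAS copy",      24,    1,   offsite=False),
              BackupTier("monthly rotated disks", 24*31, 4,   offsite=True),
              BackupTier("cloud/remote provider", 24,    72,  offsite=True),
          ]

          def best_copy(max_retrieval_hours, need_offsite):
              """Freshest copy you can actually get back in the time allowed."""
              usable = [t for t in tiers
                        if t.retrieval_hours <= max_retrieval_hours
                        and (t.offsite or not need_offsite)]
              return min(usable, key=lambda t: t.staleness_hours, default=None)

          print(best_copy(2, need_offsite=False).name)   # -> warm spare on site
          print(best_copy(8, need_offsite=True).name)    # -> monthly rotated disks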

        So long as it's as properly managed as any other part of the plan, there's really no particular reason to insist on cloud - but equally no reason to dismiss it.

  5. Anonymous Coward

    On The Cloud Front

    To be fair, it seems to be horses for courses.

    Plus, I used to be sceptical about it myself, until I came across someone who is doing it right. In their case, the difference is that their system has been designed from scratch taking disaster recovery into account from step 0, so that it's woven into the very fabric of their infrastructure.

    These guys use cloud services because that's economical and brings other advantages, but they don't *rely* on them - they assume those will also fail. It is impressive to see how they can keep running (well, more like walking) even in the event that *one* *crippled* server is all that's left. I'd never seen anything like that before. At the same time, even an idiot like me could understand their architecture, it was so cleanly designed.

    Bottom line: the most resilient systems seem to be those where disaster recovery was designed into them from the very start, rather than as an afterthought.

  6. Mayhem

    Version Control is really important too

    As Bloakey mentions above, Cryptolocker and its ransomware ilk are getting increasingly common, bringing us full circle back to the destructive child viruses of the 90s, which would corrupt everything.

    We recently redid our whole company backup solution because, while it provided wonderful protection against hardware failure, physical disaster and accidental deletion, it noticeably wasn't good enough to protect us 100% against deliberate sabotage. And our daily/weekly/monthly backup times were set to be minimally disruptive to staff, which meant a problem that hit us on the last Thursday of a month and wasn't spotted by Friday could be too late to recover from by Monday.
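
    To put a number on that sort of gap, a toy example (the retention figures and dates are invented for illustration, not our real schedule):

      from datetime import date, timedelta

      # Toy schedule: three rotating daily backup slots, damage done on the last
      # Thursday of the month, nobody notices until the following Monday.
      DAILY_SLOTS = 3
      damage_done = date(2015, 7, 30)   # a Thursday
      noticed = date(2015, 8, 3)        # the following Monday

      # Daily backups still on disk by the time the problem is spotted:
      surviving = [noticed - timedelta(days=i + 1) for i in range(DAILY_SLOTS)]

      clean = [d for d in surviving if d < damage_done]
      if clean:
          print("newest clean daily backup:", max(clean))
      else:
          print("every surviving daily already contains the damage - "
                "fall back to weeklies/monthlies or version history")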

    You need to make sure you have air gaps in your backup scheme, whether that be physical gaps of backing up to tape, or virtual gaps like changing the underlying platform to limit the spread.

    Version control means you can effectively ignore the impact and roll back to before it hit.

    Cryptowall spreads across any mapped network drive or attached USB drive, so if your servers are set with permanently mapped connections, it *will* use them. I know other variants will use commercial exploit kits to search for open shares and spread there. And while AV may run on servers, it probably isn't running on your NAS.

    One of our clients got hit pretty badly a couple of weeks back, so it's something I'm very aware of at the moment.

    1. This post has been deleted by its author
