The only verified backup is one that you have restored from
I was told to make backups, not test them. Why does that make you look so worried?
Each week at work creates memories many are happy to forget, but some are willing to share with fellow Register readers in On Call, our Friday column that tells your tales of tech support. This week, meet a reader we'll Regomize as "Lionel" who recounted a story from a moment in his career that saw him serve as "senior …
COMMENTS
-
-
Friday 7th February 2025 08:17 GMT Joe W
The only backup is one that you have successfully restored from.
The things I have seen... also in my current job. Thankfully there is (at my current job) a team dedicated to backups, their implementation, testing etc. Things fall down (oh so hard) when people decide "they know better". Ugh.
-
Friday 7th February 2025 08:48 GMT Headley_Grange
The only verified backup is one that you have restored from
When I got my first Mac nearly 20 years ago I was amazed at the simplicity of setting up Time Machine backups after doing battle with Windows, Acronis and a NAS. It worked fine and invisibly. It saved my neck a couple of times by allowing me to roll back a couple of days on files that I messed up. Fast forward too many years and when I finally got round to buying a new Mac I selected the option to transfer all my old stuff from a TM backup. It failed because (in those days) the OS was also backed up along with user data. The new machine wouldn't restore from a TM run on an earlier OS. The solution was to update the old machine to the latest OS, but it was too old. That taught me to always have a plain copy backup of important stuff so now, as well as TM I use rsync to just copy my data to a couple of drives, one of which is stashed in the shed.
-
-
Friday 7th February 2025 14:50 GMT Snake
Re: my backups on DVD and RAID-5
@Pascal Monett
But RAID-5 isn't a redundant backup, it's redundant speed to put it in a way that makes sense. RAID-1, RAID-10, RAID-50 and RAID-60 are actually redundant; RAID-5 does not handle hardware failures well, contrary to common mythos, because IRL your drives will fail during the extended rebuild process.
Change over now.
-
-
Tuesday 11th February 2025 07:18 GMT Brian Bixby
Re: The only verified backup is one that you have restored from
Speaking from experience: create a dumb text file, back it up, and do a restore as a test. When I started doing backups at one job they were proud of having a year of backups off-site. The first time tapes rotated out of storage I tested one to make sure I understood the restore process. I understood the restore correctly, but for a year Backup Exec had been reporting an almost-full tape, verifying the backup, and then formatting the tape. They had a year of empty tapes.
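The "dumb text file" test above can be sketched roughly as follows. This is only a minimal illustration: the `backup` and `restore` hooks are hypothetical placeholders for whatever the real backup product does, and a plain directory copy stands in for the tape here.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def canary_round_trip(backup, restore, workdir: Path) -> bool:
    """Back up a small known file, restore it elsewhere, and compare digests."""
    workdir.mkdir(parents=True, exist_ok=True)
    source = workdir / "canary.txt"
    restored = workdir / "restored" / "canary.txt"
    restored.parent.mkdir(parents=True, exist_ok=True)
    if restored.exists():
        restored.unlink()  # never accept a stale copy from an earlier run

    source.write_text("canary: if you can read this, the restore worked\n")
    expected = sha256_of(source)

    backup(source)     # hypothetical hook: hand the file to the backup system
    restore(restored)  # hypothetical hook: fetch it back to a new location

    return restored.exists() and sha256_of(restored) == expected


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    vault = base / "vault"  # stand-in for the real backup medium
    vault.mkdir()

    # A working "backup": copy into the vault, copy back out.
    ok = canary_round_trip(
        backup=lambda src: shutil.copy2(src, vault / src.name),
        restore=lambda dst: shutil.copy2(vault / dst.name, dst),
        workdir=base / "good",
    )

    # A "backup" that reports nothing and stores nothing, as in the story.
    broken = canary_round_trip(
        backup=lambda src: None,
        restore=lambda dst: None,
        workdir=base / "bad",
    )
```

The point of comparing digests on a file restored to a different location is that it only passes if data actually made the round trip, whatever the backup software's own log says.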
-
-
Friday 7th February 2025 10:32 GMT Ishura
Re: The only verified backup is one that you have restored from
Although Time Machine has some cleverness, there's a folder on the drive (I think it's called "Latest") that just contains a plain copy of all your files. You can manually restore any files you need via Finder without needing to do a formal TM restore.
-
Friday 7th February 2025 10:50 GMT Tony W
Re: The only verified backup is one that you have restored from
This is the difference between a backup and an archive. A backup is for emergencies, it can be restored to the system it was backed up from. An archive can be restored to any current system. As I learnt when I got a new PC with a later version of Windows, can't remember which one but quite a long time ago. I'd backed up my Outlook Express email files with the backup facility provided, thinking the backup would form an archive, but the new version of Outlook Express was different and there was absolutely no way to convert the old backups to the new system. Not surprisingly there were complaints, and MS recommended a solution. This was to keep a PC running the old version of Windows, for as long as you think you might want the old emails.
I presume the programs were licensed from third parties and MS had found a cheaper source, with zero concern for the users. After all, only home users used Outlook Express and they have no clout.
-
Friday 7th February 2025 12:21 GMT phuzz
Re: The only verified backup is one that you have restored from
I've always preferred to concentrate on making sure it's quick and easy to rebuild a machine on fresh hardware (or a VM) and restore the data, rather than backing up an entire system. It's a more flexible approach because it forces you to be aware of any unusual quirks (eg 'you have to use this version of libexample'), which don't just help you during a restore, but they're also important to know for troubleshooting.
Of course, you then have to actually write the documentation.
-
-
Friday 7th February 2025 11:20 GMT Jou (Mxyzptlk)
Re: The only verified backup is one that you have restored from
You _can_ have that on Windows too, Shadow Copy (aka snapshots) was introduced in 2003. You know, the "previous version" tab on the properties of a file or folder.
Albeit deliberately limited a bit for client OS since Windows 7 or 8, but with a bit of scripting you can have it comfortably.
-
Sunday 9th February 2025 23:05 GMT Persona
Re: The only verified backup is one that you have restored from
No. Just because you have restored from a backup doesn't mean that it's really there.
Back in the early 90's our new SA implemented tape backup and we knew it "worked". When we asked him to restore a recently deleted file he would load the previous day's tape and have it back in seconds. Oddly there was no evidence of it working with files that had been deleted or modified further back in time. Eventually I got suspicious and took a look at his backup/restore script. This in turn led me to look in /dev and see that one of the devices there was not a device but a very large file. What he had done was to make a typo with the tape device name, so instead of backing up to the tape drive through /dev/rmt0 he was backing up to and restoring from the regular file /dev/rmtO
All the backup tapes he had been cycling through for months were blank. Our sole project backup was a snapshot from the previous day and stored on the same disk drive as our live project data.
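A guard against that kind of typo might look like the sketch below. It assumes a Unix-like system where tape drives appear as character devices; the function name and the temporary "rmtO" lookalike file are made up for illustration.

```python
import os
import stat
import tempfile


def is_character_device(path: str) -> bool:
    """True only if path exists and is a character device, not a regular file."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing path: certainly not a usable device
    return stat.S_ISCHR(mode)


with tempfile.TemporaryDirectory() as tmp:
    # A regular file with a device-like name, as in the story above.
    lookalike = os.path.join(tmp, "rmtO")
    with open(lookalike, "wb") as fh:
        fh.write(b"not a tape")

    lookalike_is_device = is_character_device(lookalike)
    missing_is_device = is_character_device(os.path.join(tmp, "rmt0"))
```

A backup script could refuse to run unless this check passes for its target, which would have turned months of silent writes to a file into an immediate error.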
-
-
Friday 7th February 2025 09:55 GMT werdsmith
I keep telling people that you don't have a recovery process if you haven't rehearsed it and drilled it and revised the docs at least every six months. That is to take into account changes in personnel and changes in infrastructure, software, and business processes.
They are not interested because cost vs risk and it seems that to a manager, having the backups is sufficient to tick the box.
-
Friday 7th February 2025 13:29 GMT Doctor Syntax
"it seems that to a manager, having the backups is sufficient to tick the box."
This is where the Holywood Protocol comes into its own. The surprise auditor comes along and asks the manager for the ticked box for the last successful restore from backup. Extra points for the last successful full-scale DR exercise.
-
-
Sunday 9th February 2025 10:39 GMT Roland6
A side effect of this is it encourages people to read the log…
Back in the 80s I remember a system failing, no problem it’s backed up…
Naturally the recovery failed. Someone then looked at the log: for the last 3+ months the entry was “system too big to back up” (i.e. it needed more than the single QIC tape for the “unattended” overnight backup).
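Reading the log does not have to be a manual job. A rough sketch of a scanner is below; the marker strings and the sample log lines are invented for illustration and would need to match whatever the real backup software writes.

```python
def failed_runs(log_lines, failure_markers=("too big", "error", "failed")):
    """Return the log lines that contain any failure marker (case-insensitive)."""
    hits = []
    for line in log_lines:
        lowered = line.lower()
        if any(marker in lowered for marker in failure_markers):
            hits.append(line.rstrip("\n"))
    return hits


# Made-up sample log, one entry per nightly run.
sample_log = [
    "night 1: backup complete",
    "night 2: system too big to back up",
    "night 3: system too big to back up",
]

problems = failed_runs(sample_log)
```

Anything returned by `failed_runs` is a reason to alert a human the next morning rather than three months later.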
-
-
Friday 7th February 2025 10:15 GMT mickaroo
>> The only verified backup is one that you have restored from <<
Back in the day, we ran an application on OS/2 that generated sequences of files with OS/2 long filenames. Our backup department (off in another building) ran backups of our data daily.
One day, we needed to restore some data. The restore was successful with one caveat... all the files restored with DOS 8.3 filenames and were completely unusable.
After faffing around for a couple of weeks trying different things with different backups (same result), one of the ladies in the backup group called and said "There's a checkbox in the top right corner to restore long filenames. Should I try that?"
That's a big "YES"...!!!
-
Monday 10th February 2025 11:13 GMT John Brown (no body)
After faffing around for a couple of weeks trying different things with different backups (same result), one of the ladies in the backup group called and said "There's a checkbox in the top right corner to restore long filenames. Should I try that?"
Why didn't "the backup group" already know about that and how the backup s/w worked? Or was that not their problem because there is another team called "the restore group"? :-)
-
-
-
Friday 7th February 2025 08:28 GMT Yorick Hunt
Ah, memories...
About a quarter of a century ago, I installed a painfully expensive tape backup system for a customer and left instructions on best practices, including regular manual trial restorations.
About a year down the track, disaster struck - as luck would have it, when I was interstate with no easy/quick way to get back.
I had a local agent pop out with a replacement drive, but he said there were no valid backups to be found. Quizzing the customer yielded something to the effect of "we started getting tape errors so we removed the tape and the errors stopped."
They hadn't performed a proper backup in months, had only done one trial restore about a week after I set the backup up, and didn't even think to get a replacement tape or simply roll onto the next day's tape - nor even to call me for advice.
No skin off my nose as they were only a casual customer, but it really makes you wonder just which cereal packet some people got their brains from.
-
Friday 7th February 2025 08:52 GMT Mishak
Here are the copies
A friend who used to work for a long-defunct mainframe company got a callout to help a customer after a disk crashed (in the days when it was a spectacular event).
After replacing the hardware he asked "do you have the backup copies so that I can restore from them?"
He was handed two sheets of paper with "See, I remembered to copy both sides"!
At which point the cause of the crash became apparent - the "operator" had taken out the platter so that it could be (photo)copied...
-
Friday 7th February 2025 09:27 GMT Prst. V.Jeltz
Re: Here are the copies
Cmon ,
I have heard nearly every dumb IT support story going but that is just cartoon level dumb
I've heard of the cleaning lady pulling the plug out
I've heard of the home user ringing the shop only to reveal there's a power cut
I've heard about 5.25 disk stuck to filing cabinets with fridge magnets
I've heard of same disks being hole punched for a ring binder
these are all "user" issues
.
An actual "I.T. professional" , a mainframe operator , photocopying a platter ??
I've heard those mainframe things were pretty expensive , only to be maintained by people with some idea what they are doing surely !?
"A friend who used to work ..." is how all urban myth stories start.
-
-
Saturday 8th February 2025 00:57 GMT cosmodrome
Re: Here are the copies
I've seen the same with 3½" floppies. I didn't even realize why these "copies" were made and put into a ring binder at the time. (I was a young and naive apprentice back then, assuming every professional would be at least rudimentarily competent.) It came to me years later when I read one of the many anecdotes on the net about "copied disks".
-
Friday 7th February 2025 10:05 GMT werdsmith
Re: Here are the copies
Many years ago a developer stapled a 5 1/4 inch floppy to a piece of paper and tried to blame an admin lady.
It was suspected that he had done it because he was way behind on his project and wasn't going to meet the deadline.
I remember him blanching when told a specialist company had managed to recover most of data from the unaffected tracks.
-
Friday 7th February 2025 10:21 GMT MrBanana
Re: Here are the copies
Our standard method for "gotta ship something, anything, fast" was to open the drive door (QIC tape or 5 1/4 floppy) while writing a few blocks of something plausible to the media. Put media in a jiffy bag, put an obvious footprint on the package and generally make it scuffed up. Post to customer. This would buy you a couple of days extra development/test time.
-
-
Friday 7th February 2025 13:00 GMT BenDwire
Re: Here are the copies
I found my stash of QIC tapes from the Win3.11 days, and for a laugh tried to restore one. It soon became apparent that the pinch roller in the drive had liquified over the years, which of course wrecked the tape too. Black goo everywhere.
These days I use a portable hard drive for backups, and Blu-rays for archives. So far, so good ...
-
-
Saturday 8th February 2025 23:21 GMT Anonymous Coward
Re: Here are the copies
Be careful with VERY old CDs as they were not made for the speed of modern drives.
I discovered that the hard way: an old CD wouldn't read, spun up to max and the centrifugal force shattered it into thousands of pieces, taking out most of the drive's mechanism as the shrapnel jammed it. I never found a way to prevent the spin-up, but as I now no longer use CDs and DVDs (I have a USB drive around somewhere but that's it) it's no longer an issue :).
-
Wednesday 12th February 2025 06:52 GMT Zarno
Re: Here are the copies
I remember a tool on Linux that would tell most ATA optical drives to limit their speed. I used it at one point to avoid vibration and subsequent bad data reads when a disc was off-kilter.
On a quick search, it was "eject -x N /dev/cdrom" where N is the "x number" that you want to limit to.
-
-
-
-
Friday 7th February 2025 17:28 GMT Neil Barnes
Re: Here are the copies
I did once transport (personally) a copy of the entire uncompressed radio 2 current music archive from London to Birmingham... several days/weeks faster than sending it over the existing data links.
(And some years later, PM'd the replacement of all the existing analogue and digital circuits with high-speed digits - 16GB/s from memory - for a third of the UK)
-
Tuesday 11th February 2025 07:18 GMT Brian Bixby
Re: Here are the copies
I worked on the AWS Snowmobile project, a cargo container full of drives to migrate exabyte-sized data lakes from physical storage to AWS. We parked the thing, the customer drained their backup library into it (which failed three times in the process), trucked it to the DC, and poured it into the AWS system. (There was a lot more to it, of course.) The customer said that using the fastest data connection available it would have taken 2 1/2 years of uninterrupted use of that pipe, assuming no failures.
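The arithmetic behind "years over the wire" is easy to sketch. The data size and link speed below are assumptions picked for illustration, not figures from the project, and the calculation ignores protocol overhead and failures.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def transfer_years(data_bytes: float, link_bits_per_second: float) -> float:
    """Years needed to push data_bytes over a link running flat out."""
    seconds = (data_bytes * 8) / link_bits_per_second
    return seconds / SECONDS_PER_YEAR


# Assumed inputs: 100 PB of data over a fully used 10 Gbit/s link.
years = transfer_years(data_bytes=100e15, link_bits_per_second=10e9)
print(f"{years:.2f} years")
```

With those assumed inputs the result lands at roughly two and a half years, which shows how quickly a truck full of drives becomes the faster option.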
-
-
Monday 10th February 2025 18:00 GMT IanRS
Re: Here are the copies
My father was involved in the early days of HDD development. Prototype drives that worked in the UK were shipped to the US, where they did not work. Closer inspection found that US customs had opened the package up, then opened what they found inside, and thoroughly poked around - there were fingerprints on the platters.
-
-
-
-
-
-
-
Friday 7th February 2025 12:56 GMT Anonymous Coward
Re: Here are the copies
Nope, I've personally seen a secretary at a Xerox machine with a stack of 8" floppies, saying "boss told me to make copies of these disks"
I knew several "machine operators" on our IBM 4381 at uni that barely knew more than how to log on and enter commands from the instruction binder.
A malicious roommate of mine figured this out and would do things like call and ask to have his account session killed because it was hung.
"What account?" "RSCSN" "OK - wait, why did the printers and network just die?"
Technical stuff doesn't always get done by the technical people.
-
Friday 7th February 2025 13:28 GMT Jou (Mxyzptlk)
Re: Here are the copies
This type is the reason why RBAC (Role Based Access Control, or Role Based Admin Control) got invented and implemented. If done right, he couldn't have killed that session.
Your IBM 4381, did it have that capability for such fine grained admin control? (And did the Admins have the capability for such fine grained admin control, if avail?)
-
-
-
-
Friday 7th February 2025 11:04 GMT Dr Dan Holdsworth
Re: Ah, memories...
Possibly the same cereal packet that a UK police station derived their technical knowhow from.
They were backing up data to 3 1/2 floppy disks, and when a new admin went to check on them, he found that the people doing this had no conception of how much data they had or how much would fit on a floppy. They had enough data to fill a stack of 20 or so floppies but somewhere in the backup instructions something had been missed out.
For months these people had been taking one new floppy disk from the pack of new ones, formatting it and then believing that this was their data all backed up.
The admin's name was cursed forever afterwards because he disabused them of the notion that this was a backup, and instead made them do it properly, which took a long, long time to do.
-
Friday 7th February 2025 18:37 GMT James O'Shea
Re: Ah, memories...
I remember the days of backing up to 3.5" floppy. I spent many a fun-filled Friday night backing up my home machines, swapping floppy after floppy. It got old real fast and I got an external SCSI CD burner. After a while I borrowed a tape drive (also SCSI) from work and used tape. (I put the drive back on Monday mornings and used my own tapes, of course; I later bought my own tape drive.)
I had boxes and boxes and boxes of 3.5" DSHD floppies covering years of backups of Mac and Windows plus stacks of CDs and DVDs, and tape of various types, mostly DAT. I dumped most of them; the only working floppy drive I have is in an ancient beige PowerMac G3, the only SCSI system I have is that same G3, and the only internal CD/DVD drives I have are that G3 and a Pentium 4 machine from 2004-5 or so. I have a few external USB optical drives; the aged G3 and Pentium are allergic to USB 2 or later. (Officially the Pentium works with USB 2; in reality, it's picky.) I still use tape and DVDs and BRs. No floppies.
-
-
-
-
-
Monday 10th February 2025 16:28 GMT Anonymous Coward
Re: Been there...
Ditto. At an office I worked at in the late 1990's, the IT Manager was adamant that he had full control over the office IT - a dozen PCs (running NT4) and a single server for email and shared files. When he left, the Managing Director assigned the job to the office secretary, sending her on a 3-day "IT" course. I was told, in no uncertain terms, that IT was nowhere in my job description. When the secretary left, without any thought of a handover, I was told that IT was now in my remit, and I was allowed to timesheet no more than one day per week to it.
The server had been set up to run daily (overnight) backups to tape, Monday to Thursday tapes being recycled every fortnight; Friday tapes were kept in a safe. One of the first things I looked at, when passed the chalice, was the backup logs - and they all reported errors. Digging a bit deeper I found that none of the daily tapes had anything useful, nor any of the weekly tapes (going back to when the secretary had taken over the task). Further digging revealed that one of the things she'd been told on her "IT" course was to immediately change passwords. That she'd done, but she had not realised that, for the backup software to be able to access the files (to copy them over to tape), it needed to be updated with the new passwords! Fortunately, we had never needed to restore. My next task was to run two backups, put one in the safe, and test restore from the other - success. A later task was updating the AV software - something she hadn't done as, on her "IT" course, she was told to test all updates before applying them (she decided the easiest option was to avoid any updates); since the AV software was also installed on each desktop, a spare one became my test m/c.
The list went on - providing a good case study on why short courses can be a dangerous way to bypass education and experience.
-
-
Friday 7th February 2025 08:57 GMT Lazlo Woodbine
For many years I worked for a large retailer, we had over 300 stores in the UK.
Each store had a SUN server with a DAT backup unit.
Each store ran a backup to DAT every night. The first job each morning was to pop out last night's tape and slot in the next tape.
Each store was supplied a box of 10 DATs with the shiny new server, one tape each for Mon - Thur, Sat & Sun, and 4 for Fridays.
When I left the company a good 4 years after they'd installed the Sun gear, I'm not aware of any store replacing any of the tapes, even though they were only supposed to be used 10 times each. Apart from the Friday tapes, each of the other tapes had been used at least 200 times.
The store staff would be completely unaware of the backup status, as the servers had no local monitor, they were only accessed remotely, or by a visiting tech guy.
I remember the system was so nasty, one year the tech support company quadrupled their fee because they wanted out, and the retailer carried on using them, as no other company had tendered for the job.
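The "at least 200 times" figure can be checked with a quick back-of-envelope sketch. It assumes a backup ran every single night for the full four years, with no missed weeks, and takes the ten-use rating from the post at face value.

```python
WEEKS_PER_YEAR = 52
YEARS_IN_SERVICE = 4
RATED_USES = 10      # per the post: tapes were meant to be used 10 times each
FRIDAY_TAPES = 4     # four Friday tapes share the Friday slot in rotation

# Mon-Thu, Sat and Sun each have one tape, so each is written once a week.
weekday_tape_uses = WEEKS_PER_YEAR * YEARS_IN_SERVICE

# Each Friday tape only comes round every fourth week.
friday_tape_uses = weekday_tape_uses // FRIDAY_TAPES

# Weeks until a tape reaches its rated number of uses.
weekday_tape_worn_after_weeks = RATED_USES
friday_tape_worn_after_weeks = RATED_USES * FRIDAY_TAPES

print(weekday_tape_uses, friday_tape_uses)
```

On those assumptions the daily tapes passed their rated life after about ten weeks, and even the Friday tapes were past it within the first year.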
-
Friday 7th February 2025 11:11 GMT Bebu sa Ware
Each store was supplied a box of 10 DATs
DDS tapes† might have been better. ;)
DAT/DDS drives for me were pretty finicky but when the backup worked it was fairly reliable on restoration. I had reason to restore an accidental 10+ year old archive that I had made with gnu tar on a DDS4 / DAT120 tape. Rather surprised everything restored - the official backup was unreadable because there were no functional drives left that could read the media.
DLT and LTO were nicer technologies. I recall the QIC-40 tapes on early Suns were close to indestructible.
† I never did understand the difference between DAT and DDS and have never seen a DAT player.
-
Friday 7th February 2025 12:11 GMT Cessquill
Re: Each store was supplied a box of 10 DATs
I used to sell DAT players back when they were part of hi-fi components, and have had a few gigs recorded to DAT (which meant I needed to find somebody with a player for transferring). I remember demoing a Sony separate (and maybe others), but don't recall ever selling one. An interesting, but largely unnecessary format for the general public, but a great uncompressed format for the industry.
AFAIK, DDS started off with the same design as DAT, and I think initially the tapes were interchangeable - the only difference being the players / recorders themselves and what they're able to read/write. As DDS developed in capacity (thinner, longer, broader tape), they forked off *. Would love to know more though, as I've not had any experience with DDS.
* my patchy memory, not fact
-
-
Sunday 9th February 2025 11:19 GMT Michael Strorm
Re: Each store was supplied a box of 10 DATs
Yeah, Wikipedia seems to confirm that DDS was originally the data storage version of DAT.
Later versions apparently used narrower tracks etc. and presumably went beyond what the regular audio DAT hardware could have read/written. Also, after the first couple of generations, the physical tapes weren't backward-compatible with regular DAT machines:-
https://www.tapeheads.net/threads/which-dds-data-tapes-will-work-on-dat-recorder-deck.45773/
-
Wednesday 12th February 2025 19:30 GMT Andy A
Re: Each store was supplied a box of 10 DATs
DAT was hated by the (US) recording industry. Because it could create recording copies IDENTICAL to the source, thus avoiding the need for people to buy yet another album when their previous copy wore out, they were a threat to the profit stream. Buying a DAT drive in the US became very difficult, and manufacturers found it difficult to get the benefits of large production quantities. The main users for DAT seem to have been the broadcast industry.
-
-
-
Friday 7th February 2025 16:09 GMT Anonymous Coward
Re: Each store was supplied a box of 10 DATs
"† I never did understand the difference between DAT and DDS and have never seen a DAT player."
Originally they didn't have any difference at tape level; the recorders of course were different, one being part of a computer and the other a stand-alone unit on the desk. DAT cassettes had playtime in minutes, DDS tapes capacity in gigabytes (2GB). It was rumored that DDS tapes were higher quality, so many people used them for recording audio.
I've actually used a DAT ... and it wasn't a very positive experience: it used stripe-sweeping technology similar to VHS, which meant load/unload time was in the 20-second class. Not very handy. Once it got running, sound was good, CD quality or (in some models) even better. Being cumbersome to use and very expensive, it was not a commercial success.
Data density on DDS, later known as DDS-1, was suitable for audio use but for data storage it wasn't enough (what ever is?) and there relatively soon came DDS-2 with whopping 4GB of uncompressed data. I still have some DDS-2 tapes from ancient backups .... more or less useless as even if I find a recorder able to read them, those are backups made in XP, not a trivial task to read.
DDS was eventually pushed to 160GB capacity, but that was the absolute maximum the tape could handle and it was replaced by LTO using a lot larger tape.
-
Friday 7th February 2025 22:30 GMT Richard 12
Re: Each store was supplied a box of 10 DATs
DAT was very popular in professional entertainment.
8 tracks plus timecode on a single tape was very handy. Very reliable too.
They vanished completely once computers got fast enough to run good multitrack sound. Zero rewinding and tape loading time was a true killer feature.
-
-
Friday 7th February 2025 23:45 GMT rcxb
Re: Each store was supplied a box of 10 DATs
DDS worked fine, but the capacity and speed wasn't great, and the drives sure did fail after a relatively short lifetime. And with the 160 (not 120) and 320 variants, they got quite unreliable, and considerably more expensive.
LTO-1 was a big upgrade.
As far as naming, they couldn't very well call it Digital Audio Tape when it was backing-up data with no audio, so DDS it is. They were DAT tapes, at least until DDS-160 switched to larger tape widths.
-
Saturday 8th February 2025 07:02 GMT PRR
Re: Each store was supplied a box of 10 DATs
> have never seen a DAT player.
I used to go through DATs like candy. I recorded audio on them, watching the mini-reels turn, hundreds of hours of live musical performances. We actually rarely played them, cuz the only machines were in the studio. When we first moved to DAT, good old cassette was the handy-if-hissy medium. We stopped buying DAT tapes when CD-R got more convenient, cheap, and (like cassettes) playable in the car on the way home. I understand most DAT drives aged poorly, but I was easing/being-eased out the door by that time.
-
-
Sunday 9th February 2025 16:32 GMT el_oscuro
Ah yes, DAT tapes. About 25 years ago, I piloted a new backup process at a remote site which backed up a database to a DAT tape. Fully tested restores, all of the rest. The site set up a two week rotation with 14 tapes, one for each day.
About 6 months later, the site was hit by a hurricane and the database server got the BSOD. The site contacted the central office for assistance and they tried to restore the backups to an older server. I got a panicked call from the central office admin who was saying that my backups were bad. They had tried to restore and said "bad tape".
So I asked them: That is an older server. What is the capacity of its DAT drive? 4GB. I said: "you do realize that you can't restore these on that drive as they were written by a newer 20GB drive, correct?" Silence. Then they informed me that they had tried this with every backup tape. A terrifying thought occurred to me: What happens if you stick a 20G tape in a 4G drive? Did these idiots completely destroy all of my tapes before bothering to call anyone?
Fortunately, we didn't have to find out. The vendor of the original server had a BIOS recovery disk. We provided instructions to the remote site, which was able to use that to bring the server back up.
-
-
Friday 7th February 2025 09:01 GMT Mishak
It takes too long
Another friend used to work as a developer for a well-known automotive OEM, with all of their work being stored on a central file server.
All the "important stuff" was backed up on daily basis, which was fortunate as a major failure trashed the content of nearly the whole server.
Hardware was replaced and the restore process initiated - followed immediately by "Error - tape is blank". Not a problem as the tapes from the previous day could be used - which were also empty. As were all the tapes in the rotation for most of the previous year.
On investigation it was found that the backup+verify cycle had grown to the point it was taking over 24 hours, so a "temporary" measure was introduced to drop the "verify". That temporary measure was, of course, forgotten about and became permanent.
At some point the wires to the write head on the tape drive broke, leaving just the erase head operational - leading to a lot of well-erased tapes in the rotation set.
-
Friday 7th February 2025 09:09 GMT Anonymous Coward
Been there
IT department didn't need to test backups because "we're asked to restore so often".
Come the day of the failure the backup was corrupted, and the previous day, and the previous day. In fact every backup was FUBAR for the previous month.
The backups had been trying to write all data to the first 1kb of the tape then proudly stating 'backup successful'.
So obviously they started doing test restores after that....no they didn't, don't be stupid they stuck to the "we restore so often we don't need to". Backed up eventually by "we've got volume shadow copies anyway" and ending with, "365 replication is the backup".
I've left since then.
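Short of a full test restore, even a crude size sanity check would have caught a backup squeezed into 1kb. The sketch below is an illustration only: the function name is made up, and the 10% threshold is an arbitrary assumption that would need tuning for data that compresses very well.

```python
def looks_too_small(source_bytes: int, backup_bytes: int, min_ratio: float = 0.1) -> bool:
    """Flag a backup whose size is implausibly small relative to its source."""
    if source_bytes == 0:
        return False  # nothing to back up, so nothing to compare against
    return backup_bytes < source_bytes * min_ratio


GIB = 1024 ** 3

# 5 GiB of source data that "backed up" into 1 KiB: clearly suspicious.
tiny_backup_flagged = looks_too_small(5 * GIB, 1024)

# The same data compressed down to 2 GiB: plausible.
normal_backup_flagged = looks_too_small(5 * GIB, 2 * GIB)
```

This is no substitute for restoring from the backup, but it is cheap enough to run after every job.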
-
Friday 7th February 2025 09:24 GMT 42656e4d203239
Once upon a time...
I was working for a company doing IT/Network support, including backing stuff up on disparate systems and, yep, testing restores.
Then I was encouraged to leave. So with good grace I did - it was the right decision for everyone.
A few years later I heard that one of the systems I was responsible for backing up had a problem; disk failure or some such. The company I left hadn't made provision to hand over to anyone else (they couldn't, but that is another tale) so the system hadn't had any backups but had kept on trucking... until it didn't, several years down the line.
On hearing of the system's demise and subsequent discovery of "oops! no backups for n>3 years" I smiled sweetly and wondered about reaping what you sow... did they learn from the experience? You would have to ask them; the matter is in the big bucket containing things that are "Not my problem"
-
Friday 7th February 2025 09:28 GMT Michael H.F. Wilkinson
When I was doing my PhD research, I had a habit of doing weekly back-ups of my development (MS-DOS) machine in duplicate on a pile of 3.5" floppy disks (talk about a tedious chore). I would then restore one of these on a bigger "production" image processing machine, thus testing and verifying the back-up. After that, the whole shebang was copied to tape. I am not sure if the tape jockey verified anything, but for good measure I took the other back-up home, and restored that on my home machine. Paranoid? Perhaps, but I didn't lose any data during that time.
-
Friday 7th February 2025 09:29 GMT Christoph
Due to the boss keeping multiple copies of absolutely everything, the backup tape filled up completely and the backup failed. So when the system crashed our latest backup was a month old, and we lost a lot of stuff.
A year later, I asked the hardware guy what our latest backup was. That same tape, now 13 months old.
But the boss saved some money by not fixing the backup system!
-
-
Friday 7th February 2025 09:42 GMT ColinPa
Key - what key?
A customer told me that they had had a problem. They were taking backups, and doing a test restore - which worked fine.
The tapes were then sent to the backup site for long term storage.
They had a major problem, and needed to restore from a tape at the backup site. Unfortunately, the data was encrypted on the tapes. The primary site had the key, but the backup site didn't.
There was a quick panic while they found out how they could get the key exported from the primary system, and entered on the backup system.
They didn't know if the encryption was on the tape hardware or in the software.... which added to the confusion. I think it was both.
I had a check list when I went to customers. I added: when did you last test that the backups can be processed at the remote sites?
-
-
Saturday 8th February 2025 15:24 GMT druck
Re: Key - what key?
An encrypted or even just compressed backup has more risk. If you have a single bit flipped due to a disk or tape error, you may get an incorrect character in a text document or database, or a corrupted pixel in an image. But if it is encrypted or compressed, you lose the ability to recover the rest of the file from the point of corruption onwards.
That's why it's not a backup, but backups: you need to ensure you have many copies, then even if encrypted or compressed, you have a good chance of recovering each piece of data.
-
Sunday 9th February 2025 19:09 GMT rcxb
Re: Key - what key?
But if it is encrypted or compressed, you lose the ability to recover the rest of the file from the point of corruption onwards.
A single bit-flip in an encrypted or compressed data stream does not necessarily corrupt everything. With something like LUKS2, flipping one bit will affect a block of 16 bytes. gzip may be recovered if you're willing to take some time/effort.
Many compressed/encrypted formats have a little bit of ECC or at least a checksum, so they may be able to correct a few bitflips, or at least it will tell you when you've got one, while the uncompressed versions will be of unknown integrity with NO WAY to find or fix the issue. More importantly, it is NOT OKAY to have "an incorrect character in a text document or database". Your bank wants to know whether you had $1 in your account, or if you had the $4,000,000,001...
See:
https://unix.stackexchange.com/a/684105
https://github.com/enssec/squashfs_bitflip_repair/
https://web.archive.org/web/20180708075208/http://www.gzip.org/recover.txt
https://www.urbanophile.com/arenn/coding/gzrt/gzrt.html
https://github.com/arenn/gzrt
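The checksum point can be tried out in a few lines. This is only a minimal sketch, assuming Python's standard zlib module as a stand-in for "a compressed format with a checksum"; the sample data and the helper names flip_bit and decompress_checked are made up for illustration. It shows detection only, not repair.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def decompress_checked(blob: bytes):
    """Return (True, payload) if zlib accepts the stream, else (False, None)."""
    try:
        return True, zlib.decompress(blob)
    except zlib.error:
        return False, None


if __name__ == "__main__":
    original = b"balance=1\n" * 50  # hypothetical sample data
    packed = zlib.compress(original)

    # Plain copy: nothing complains; you only notice if you still have a good copy.
    plain_damaged = flip_bit(original, 64)
    print("plain copy changed silently:", plain_damaged != original)

    # Compressed copy: damage the final byte, which is part of the stored checksum.
    ok, _ = decompress_checked(flip_bit(packed, len(packed) * 8 - 1))
    print("zlib accepted the damaged stream:", ok)
```

The flipped bit here sits in the stream's trailing checksum, so the mismatch is certain to be reported; flips elsewhere in the stream would normally be caught too, either as invalid data or as a checksum mismatch.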
-
-
-
Friday 7th February 2025 09:55 GMT Evil Auditor
Backup to /dev/null
I've surely posted this here before...
Early 2000s I asked a client, a small bank, whether they performed restore tests. Same answer: no, we get the daily "backup successful" message. At least, I managed to convince them that restore tests are a rather necessary task. And a couple of weeks later I get a phone call from that client after they found that all of their backup tapes were empty.
While setting up and testing the backup procedure, someone didn't want to wait for an hours-long backup to finish and directed the data stream to /dev/null. And then never changed it to write to the tape.
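For anyone wondering why the "backup successful" message proved nothing: the write side is perfectly happy with a null device, and only reading the archive back exposes it. A minimal sketch, assuming a tar stream and Python's standard tarfile module; the function names and the "data" prefix are illustrative, not anything the bank actually ran.

```python
import os
import tarfile
import tempfile


def write_backup(source_dir: str, destination: str) -> None:
    """Write source_dir as a tar stream to destination (a file or a device path)."""
    with open(destination, "wb") as out:
        with tarfile.open(fileobj=out, mode="w|") as tar:
            tar.add(source_dir, arcname="data")


def backup_is_readable(destination: str, expected_names) -> bool:
    """Read the archive back and confirm every expected member is listed."""
    try:
        with tarfile.open(destination, "r") as tar:
            names = set(tar.getnames())
    except (tarfile.TarError, OSError, EOFError):
        # Nothing readable came back, e.g. an empty file or a null device.
        return False
    return set(expected_names) <= names
```

write_backup() returns without complaint whether the destination is a real file or os.devnull. Only backup_is_readable() tells the two apart, which is the whole argument for restore tests.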
-
Friday 7th February 2025 10:04 GMT Anonymous Coward
I have 3 instances.
First one was a backup that I was not allowed to run as root (this is going back to the mid 80's, and for some reason access to root was extremely hard to get). What I ended up with was a Frankenstein of a backup. It would write a header, write my data, then wait for a piece of code to backup a database (written by the DBAs) before writing a trailer. To a tape device, which used a no-rewind device (nst) until the trailer was written, at which point it would rewind the tape. I would check that a trailer file had been written as it signalled to me that the backup had finished. What could go wrong?
Well, the DBA decided to change the tape device to a rewind device in his part of the backup sequence - so all that actually got saved was the trailer! I went to do a restore one day and that's when I found the problem. Now in my defence, I was not a sysadmin. I had no way of testing a full restore (only a couple of ICL Team servers and I was the first in Ops to work on UNIX). It was a hard lesson to learn early on in my career.
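For those who never met rewind and no-rewind tape devices, the failure can be modelled with a toy. This is a deliberately simplified sketch, not how any real tape driver works: it assumes a tape is just an ordered list of files, that writing at a position discards everything from that position onwards, and that a rewind device returns to the start of the tape when closed. ToyTape and its method names are invented for illustration.

```python
class ToyTape:
    """A tape modelled as an ordered list of files plus a head position."""

    def __init__(self):
        self.files = []
        self.position = 0

    def write_file(self, label: str, rewind_on_close: bool) -> None:
        # Writing at the current position discards whatever followed it.
        del self.files[self.position:]
        self.files.append(label)
        self.position += 1
        if rewind_on_close:
            # A rewind device goes back to the beginning of the tape on close.
            self.position = 0


def run_backup(database_rewinds: bool):
    """Header and data use the no-rewind device; the trailer always rewinds."""
    tape = ToyTape()
    tape.write_file("header", rewind_on_close=False)
    tape.write_file("data", rewind_on_close=False)
    tape.write_file("database", rewind_on_close=database_rewinds)
    tape.write_file("trailer", rewind_on_close=True)
    return tape.files
```

With database_rewinds=False the tape holds all four files; with True the trailer lands at the start of the tape and is the only thing left, while the "trailer was written" check still passes.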
The second was a few years ago. There was a request to restore a very large database (nothing I dealt with this time - there was an entire offshore backup team that managed backups). Trouble is, it had been failing for months and no-one said a thing. There were some very worried people running around, and I believe some serious words were had.
Third one was when I was a support engineer for an app back at the turn of the century. Received a panicked call from a customer in the USA. She was complaining that the app would not start. Found out that it was unable to start the database. Then, after questioning her and checking a few things, realised that the database was missing. Turns out she was running out of space and the best course of action was to delete the database. I felt sorry for her (she was nearly in tears), as the last known good backup was the last monthly backup about 2 weeks old. They lost a LOT of work.
-
Friday 7th February 2025 10:35 GMT KittenHuffer
I have only 2 instances.
One I was directly involved in, where the HD in a PDP11-73 decided to turn up its toes between completing the backup and verifying the backup. DEC were called to replace the drive and get us to the point of attempting the restore. The monthly full system restore went fine ....... then the unverified backup from the night of the crash also went fine! No data lost, full restore achieved!
And the one I was only indirectly involved in. A system crashed and it was then discovered that due to 'an unpublished Oracle issue' the last good backup was 6 weeks old. And it was necessary for everyone to scramble to reinput 6 weeks' worth of work. My involvement was that even though my team were scrabbling to help with the problem I was the only member of the team that was not only not involved in the scrabble, but had not actually been told that there was an issue to scrabble about! To this day I believe that this happened because my mangler correctly assessed that I would be the one person to stand up in a meeting and say 'the Emperor has no clothes' concerning the reason that was given for there being no recent backup.
-
-
-
Monday 10th February 2025 08:08 GMT Nitromoors
Re: test, test, test
OK. So you do the backup to a tape; it does not matter whether it's DC300 or DLT or one of the many things that existed in between. Then you do a verify and compare and check the logs for file count, blocks written, errors reported and all is good.
How can you be certain that the tape will still be readable and not damaged on the last rewind ? Tapes are fickle things.
I did this for 35 years. First as the ‘oik’ and the only one who understood the computer and finally as IT Director with a staff and distributed systems all over London. Despite test restores and occasional disk failures it was always a clenching moment when someone called ‘Let’s restore it from tape’. And that is without the random variations that an Exchange Server restore might add.
God bless HP NASes, VMware and Veeam of blessed memory. We all slept better when that came along.
-
-
Friday 7th February 2025 10:26 GMT Anonymous Coward
Backing up the Internet
I spent a couple of years working for a decent sized boarding school.
Much of this time coincided with covid lockdowns, so we spent a lot of the lockdowns updating systems, installing new servers etc.
One day, the director of IT popped his head around the door and mentioned in passing that the Governors had been looking at disaster recovery and decided they didn't trust OneDrive not to lose data.
This prompted one of the Governors to offer up space in a data warehouse he had shares in, which was nice of him.
A colleague wrote some code and set in motion the backup of 300TB or so of data our lovely students had amassed, made up of over 90 million files (probably mostly movies, music and game saves), to a bit barn somewhere in Buckinghamshire.
6 weeks later, when I changed job, the backup was still in progress...
-
-
-
Friday 7th February 2025 15:50 GMT Anonymous Custard
Re: Backing up the Internet
I can empathise with that live.
We've recently started moving in that exact way (some edict from on-high that PST files are now to be verboten for some reason) and so I'm piloting such a move as all my stuff is in nicely organised PSTs that our IT crowd can easily grab and upload in the background directly.
One got missed though, so I've spent today copying it within Outlook. At the start (around 9:30am) it said 3-4 hours remaining, and as of now (3:45pm) it's about 1/3 done and is saying 12 hours remaining (and it's locked out Outlook, so I'm doing mails via the webmail portal instead).
Gotta love those elastic Microsoft minutes - guess whose laptop is going to spend the weekend locked to the desk and slowly chugging through the task...
Somehow I'm expecting to come into the office on Monday to find it 95% done, and have either crashed or the remaining time to be just about enough to reach the heat death of the universe...
I feel more sorry for the IT guys, as apparently in our office there are some 4,500 or so PST files to be moved.
-
Friday 7th February 2025 16:33 GMT Anonymous Coward
Re: Backing up the Internet
".... PST files are now to be verboten for some reason"
There has been a nasty suspicion that Microsoft wants to own your emails, literally. That's why local storage is a no-no.
And anyway, they feed everything to their own AI: You can't do that if emails are stored locally.
-
Friday 7th February 2025 22:40 GMT Anonymous Coward
Re: Backing up the Internet
I lost a huge amount of work email due to something like that.
Don't know what the IT dept did or didn't do, but half my folders are completely empty since we moved to "the cloud".
Had a reminder today, as I ran into an issue we'd worked around a decade ago. The Sharepoint file describing it is long gone, as is the Confluence page it was copied to, but I had an email trail with the draft...
Ah.
-
-
Friday 7th February 2025 16:57 GMT FrogsAndChips
Re: Gotta love those elastic Microsoft minutes
This one never gets old: https://xkcd.com/612/
-
-
-
-
Friday 7th February 2025 10:31 GMT ralphh
No backup existed
Decades ago I started a new job and created a new team.
IT supplied drive space and we dutifully wrote source code and saved it to our drive. Two years later IT came to me and explained there had been a drive failure and they discovered that they'd not added my department's drive space to the backup schedule.
Being head of department I'd had my own backup schedule with off-site backups. Lost nothing. Had most of the QA department's data too.
-
Friday 7th February 2025 10:53 GMT Anonymous Coward
AC for reputation
Worked for a Bank in Live Support on call, was called into a Sev 2 Major Incident one night, circa 2am. Middleware server had had a newbie project resource make a disaster zone of failed changes and finger ferkups... can I fix it.. it was in such a state I didn't even know where to start, he had no notes, no recollection of what he had changed etc so I did the dutiful thing and asked Unix support to start working out a restore from backup (Tivoli). Restore was approved on the call and so off they went to crack on.. whole file partition was about 30 GB so not huge even back then.
1 hour went, 2 hours went, 3 hours went.. "How long is this going to take?" I ask.. Well... it's like this: as we are on day 29 of our monthly backup process, which is a full on the 1st then incrementals, we need to work through each one, but the tape system has an automated tape switching process, and the way it works it needs to switch tapes to do each file, which takes 6 mins each time..
17 sodding hours that one restore took.. since then I always ensure I am in full control of every single change I do, if I am changing anything I have full backups myself that I control. I rely on no other team or tech to be able to just put it all back as it was and walk away if necessary. It's saved me a few times when deployment files were not checked before being passed to me to deploy.
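Why day 29 is the worst day to ask for a restore can be shown with a small sketch. It assumes the scheme described above (one full backup, then one incremental per day) and a made-up representation of the backup catalogue as a dict of day-of-month to "full" or "incr"; restore_chain is a hypothetical helper, not a Tivoli feature.

```python
def restore_chain(available: dict, target_day: int) -> list:
    """Return the backups to apply, in order, to rebuild the state as of target_day.

    available maps day-of-month to "full" or "incr". Raises ValueError when the
    chain is broken: no full backup to start from, or an incremental missing.
    """
    fulls = [day for day, kind in available.items()
             if kind == "full" and day <= target_day]
    if not fulls:
        raise ValueError("no full backup at or before the target day")
    start = max(fulls)
    chain = [(start, "full")]
    for day in range(start + 1, target_day + 1):
        if available.get(day) != "incr":
            raise ValueError(f"missing incremental for day {day}")
        chain.append((day, "incr"))
    return chain
```

For a full on the 1st and incrementals on days 2 to 29, the chain for day 29 has 29 entries, each of which has to be found, mounted and read before the restore is complete. Take away the full and there is no chain at all.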
-
Friday 7th February 2025 11:18 GMT firu toddo
Where are the tapes?
A few years ago now, in a job I'm trying to forget, a wondrous bit of kit was installed in a server room. Although it was in the IT server room it was the responsibility of another department.
Part of the wondrous machine was a SAN. There was also a pair of DAT drives for daily backups. The tapes were stored in the same server room, on top of the DAT drives and were on a seven day cycle. Over the first twelve months the tapes weren't changed, ever!
IT techs were deemed too stupid to do the job and the wondrous machine owners couldn't be arsed.
And this was discovered when a restore was requested. 365 data backups tend to wear out the tapes.......
-
Friday 7th February 2025 11:23 GMT stungebag
Yes, I've taken a backup...
I used to support a few small schools. They all ran a product that made running a Windows domain fairly easy. You could add users, push out applications and so on with just a smidgeon of training, and each school generally had a single person whose job it was to administer the system. It was never their day job. Usually they were teachers, sometimes administrators.
The systems were configured with a tape drive and a daily backup was scheduled. One of the admin's tasks was to manage the tapes, swapping the tape each morning and keeping the old one nice and safe. Except some of the admins were a tad forgetful. I found that at several of my sites the same tape had been left in the drive for months or even years. The admins couldn't see how that could be a problem.
You can lead a horse to water ..
-
Friday 7th February 2025 11:32 GMT Kevin Johnston
Sadly common issue
A company I was at were about to completely replace the backup system as they were told it was not compatible with some new blah blah blah.
I made myself very unpopular by demanding that they retain the original system until they could show the new one was able to recover files from existing backups. 6 weeks and a lot of red faces later it was discovered that while the new system COULD do that they needed extras which they had not purchased.
The various managers who had been shouting at me for costing time/money delaying the project suddenly had a new target in their sights.
-
Friday 7th February 2025 11:36 GMT GeekyOldFart
So horribly familiar...
I can't count the number of places I've been when test restores weren't done.
I admit I've even been guilty of it myself once. I had quite a nice little network at my home office, all of it built from previous-generation hardware purchased cheap from various employers. Along the way I'd also managed to acquire a license for a decent network backup suite, a small broken-but-repairable 5-slot DLT robot (repairing that thing was pure geeky heaven and generated a significant amount of smug when successful, but that's another story) and a couple of boxes of unused tapes that the company just wanted gone - Full backup of my entire network on the first Sunday of each month and incrementals every Sunday night thereafter. Just needed to remember to swap the tapes out each month.
I did partial test restores to one machine per month, just enough to make sure my backups were ok... Then I acquired another server, set it up, got it into the backup system... and had a brainfart and didn't add it to the script that kicked off after each full backup and selected the next machine on its list, restored the backed-up /etc to /backup-verify/etc and emailed me if the directory contents differed.
Of course, we all know which machine decided to commit spectacular suicide and its backup then proved to be garbage.
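The two checks in that story (does the restored directory match, and is every backed-up machine actually on the verify list) can be sketched roughly as follows. This is an illustration using only the Python standard library, not the original script; the function names are invented, and it ignores details a real /etc would raise, such as symlinks and permissions.

```python
import filecmp
import os


def relative_files(root: str) -> set:
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def tree_differences(original: str, restored: str) -> list:
    """List relative paths that are missing, extra, or differ byte-for-byte."""
    left, right = relative_files(original), relative_files(restored)
    problems = set(left ^ right)  # present on one side only
    for rel in left & right:
        same = filecmp.cmp(os.path.join(original, rel),
                           os.path.join(restored, rel), shallow=False)
        if not same:
            problems.add(rel)
    return sorted(problems)


def unverified_machines(backed_up, verify_list) -> list:
    """Machines that get backed up but never get a test restore."""
    return sorted(set(backed_up) - set(verify_list))
```

An empty list from tree_differences() means the restore matched. A non-empty list from unverified_machines() is the brainfart detector: anything it returns is being backed up on trust alone.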
-
Friday 7th February 2025 12:14 GMT An_Old_Dog
Red Light
I once was asked to help get data off an IBM System/32 and convert it to something usable by a PC.
I walked up and sat down at the console, never having seen one before, and gave it a good look.
Me: "Excuse me, how long has this red light" (on the empty eight-inch floppy drive) "been on for?"
Secretary: "It's been on as long as I've worked here."
Me: "Which is how long?"
Secretary: "About eleven years."
Me: "Do you have any backups?"
Secretary: "Yes, they're right here in this box." She handed me an open box of ten dust-covered floppy diskettes.
Me: "How long have you been using this box of floppies?"
Secretary: "As long as I've worked here."
-
Monday 10th February 2025 11:57 GMT John Brown (no body)
Re: Red Light
Similar experience here. Called to a site whose 3-drive array had failed. I rock up, take a look and ask how long the two red LEDs have been flashing. "The first one has been flashing since I started here 5 years ago, the other one started flashing the same time the system stopped working".
-
-
Friday 7th February 2025 12:23 GMT keithpeter
Random thoughts
Not an IT person but it's raining.
1) I used to print out a few key documents on paper for each of the projects/processes I was implicated in. These printouts were initialled and dated and stored in lever arch files in one of those cupboards with a roll down door. Never had to use them to cover an IT failure as the local techs did do their backups and not much went wrong. These contemporary records did come in handy when goalposts decided to go for a stroll however.
2) Tapes that should only be written over 10 times... could some kind of indicator not be devised that turned red on the 10th insertion? Thinking of a mechanical ratchet/escapement thing that gets turned through a certain angle on each insertion and moves a plastic card printed with a green/red gradient under a window? Or just even an insertion number and then a stop after 10 so tape won't insert?
-
Friday 7th February 2025 13:00 GMT Sceptic Tank
Re: Random thoughts
This whole "only good for 10 writes" claim seems to be a bit unfounded. I did a bit of goooogling and the more trustworthy sources state that tapes can be rewritten indefinitely, although there will be a little bit of wear and tear. They should last up to 30 years. Tapes are chiefly damaged by incorrect handling or faulty recording equipment. This makes sense because there is little difference between playback and recording and the magnetic media could not care less how many times you changed the orientation of the magnetic field.
-
Friday 7th February 2025 16:57 GMT Anonymous Coward
Re: Random thoughts
" This makes sense because there is little difference between playback and recording and the magnetic media could not care less how many times you changed the orientation of the magnetic field."
False logic: It never was about that: Tapes literally wear out, mechanically.
Tapes are very, very thin and soft plastic and the magnetic coating on them is even thinner. There are a lot of mechanical parts moving tape around in any tape drive. The dust that accumulates in them isn't coming from outside, it comes from the tapes themselves.
DDS especially was bad as it used a rolling drum to read/write tapes, LTO is better.
-
Monday 10th February 2025 13:13 GMT DJohnson
Re: Random thoughts
To amplify your point, allow me to introduce the young'uns to the behavior we called "shoe shining". The DDS and DLT systems in use at my old employer would only read or write data at one speed. If the attached system couldn't keep up the tape would stop, rewind a bit, wait for the buffer to fill (or empty as the case may be), then try to resume where it left off. If you had some systemic bottleneck this could be happening every minute or so through a multi-hour backup/restore.
So when I read "ten uses" I'm not thinking of ten full passes + ten full rewinds, I envision ten days of the tape running back and forth, starting and stopping hundreds of times because someone left the target on a 10mbps hub...
It was a happy day when I learned LTO drives have variable streaming speeds, and if you stayed in their allowed range they would never exhibit this behavior. I made a neat (to me) script to manage backups to an autoloading LTO4 over a 1gbps iSCSI link that used 'mbuffer' to keep the rate acceptable to all components. If anyone needs to repeat that stunt get in touch, I kept a copy for posterity.
-
-
-
Friday 7th February 2025 17:07 GMT Kevin Johnston
Re: Random thoughts
The simplest way to manage tape use count is to have a human readable reference number on each tape and record that number in a register with brief details on when it was used and a high level of what went on it. This is obviously just for small companies as the big ones will have robots which do all this automatically. While the mechanical method you suggest would be possible it would add a lot of complexity and probably not be backwards-compatible.
An old favourite is to use a 'Tower of Hanoi' type system where a tape is used on a set day of the week for 3-4 weeks then it rotates to the Friday/Saturday night weekly backup for a couple of weeks before being used for a monthly backup on very long retention. Requires a regular supply of new tapes but that is part of the cost of ensuring data stays available.
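For comparison, the textbook Tower of Hanoi rotation (not the day-of-week variant described above) picks the tape from the binary pattern of the day number: tape A every 2nd day, B every 4th, C every 8th and so on, with the last tape catching everything beyond that. A minimal sketch, with tape letters and the function name chosen purely for illustration:

```python
def hanoi_tape(day: int, num_tapes: int) -> str:
    """Return the tape letter for a given day (days count from 1)."""
    if day < 1 or num_tapes < 1:
        raise ValueError("day and num_tapes must both be at least 1")
    # Number of trailing zero bits in the day number: 0 for odd days,
    # 1 for days divisible by 2 but not 4, and so on.
    trailing_zeros = (day & -day).bit_length() - 1
    # The last tape in the set takes every day the earlier tapes do not cover.
    return chr(ord("A") + min(trailing_zeros, num_tapes - 1))


if __name__ == "__main__":
    print("".join(hanoi_tape(day, 4) for day in range(1, 17)))
```

The appeal is that the most-used tape only ever holds very recent data, while the least-used tape holds the oldest copy and sees the least wear. The paper register of tape numbers suggested above still applies.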
-
-
Friday 7th February 2025 12:38 GMT Bebu sa Ware
"Unfortunately, even today, test restores are sadly very uncommon"
It's even worse than that.
When pointing out that, with the proposed backup process, restoration was not even possible†, I could hear the Bell, Book and Candle‡ being grasped in the face of my apparently gross heresy.
The idea of working backwards from the objective is apparently so alien that it appears that the disciplines of Operations Research have largely been forgotten.
I had mistakenly believed that clearly identifying what would need to be restored in the light of various contingencies should guide the processes by which the restoration might be made and those processes in turn ultimately guide the design of the backup process itself.
As for trying to communicate the difference between an archive and a backup - clearly a bridge too far.
I am certain a great deal of so called shadow IT is purely created as an act of self preservation to remedy just such management myopic shortcomings.
(The job you save might be your own but it's also miserably true that messengers suffer an unaccountably large number of gun fatalities even outside the US.)
Even now for me the go to guy is W. Curtis Preston whose "Unix Backup and Recovery" and "Backup and Recovery" are still on my bookshelf augmented by "Modern Data Protection."
† "There's a hole in my bucket, dear Liza"
‡ the anathema is rather theatrical but I am fond of the Kim Novak/Jimmy Stewart movie, which inspired Bewitched as The Brass Bottle did I Dream of Jeannie
-
Friday 7th February 2025 13:06 GMT Anonymous Coward
Re: "Unfortunately, even today, test restores are sadly very uncommon"
> I am certain a great deal of so called shadow IT is purely created as an act of self preservation to remedy just such management myopic shortcomings.
Yes, I have a dozen cheatsheets of dearly-won information in plain text files. I'm always asked why I don't keep them in a wiki or SharePoint or Confluence or JIRA.
And I then go well what happened to [wiki that shit itself and had no backups] or [SharePoint that shit itself and had no backups] or [Confluence that shit itself and had no backups] or well, you get the idea...
How about JIRA? OK, how about a system that shits on my code snippets or command examples because it thinks it's markup? Hell, today I couldn't send a command to our DBA because Teams kept shitting on the wildcards trying to interpret them as markup. I don't have time to babysit that.
-
Sunday 9th February 2025 21:54 GMT Terry 6
Re: "Unfortunately, even today, test restores are sadly very uncommon"
I am certain a great deal of so called shadow IT is purely created as an act of self preservation to remedy just such management myopic shortcomings.
You betcha. When after long years of nagging my service was given a shared area, then later a proper server, many of my staff made their own back-ups to floppy.
Quite rightly so. It took several more years to get the higher ups to action a backup device, and they never did sort out secure off-site storage for a backup tape.
-
-
Friday 7th February 2025 12:42 GMT Anonymous Coward
My stories
2 of them:
Once, in the 90s, I was introduced to a company (showbiz, erm ...) where the main hub was where we worked and some satellites (Paris intra-muros) where we rarely would go (traffic, weirdos in the site, etc ...)
Upon coming there I was told that on each sat. site, with NO IT people, some good will lady would rotate the local server's tapes daily+weekly+monthly as per process.
Of course, I wanted to check by myself by going on sites. The first site went like this:
She showed me the local server
me: ah good, the daily tape is inside (was ejected after every daily backup)
her: yes, the tape ...
me: where are the others ? 5 daily, 4 weekly etc ...
her: which others ? There is a single tape, and I push it inside back every morning
me: ...
Turned out a single DAT 8 mm tape had been used for daily backups for some years now. And no, this very system never showed up any error (too old techno) and would not even stamp the tapes ! Of course, there was no useable data on THE tape ...
Second one was upon dismantling a big DC. We had a STK 9840 lib full of old tapes + some vault also full of tapes.
Some 10 emails to mgmt to request instructions on what to do with half a ton of tapes, and 0 response later, I decided we'd securely destroy all that stuff.
5 years later, I received a call from an ex ops team colleague from this company (I had left) asking me: pal, we need to restore data for a legal proceeding, what did you do with all the tapes 5 years ago ?
I guess my honest answer didn't meet his expectations, lol :)
-
Friday 14th February 2025 14:35 GMT CrazyOldCatMan
Re: My stories
some good will lady would rotate the local server's tapes daily+weekly+monthly as per process
My wife's first job was as a data entry clerk but, because she'd done a NCSC computer course, she got to be the part-time operator of the on-site minicomputer. The majority of the job consisted of swapping the backup disk platter (spin drive down, unlatch drive cover, lift out disk pack, put on shelf, take out the next day's pack, insert, close drive cover, spin back up, type command into console to check it's all OK - otherwise phone support company who supplied the computer).
They had a series of 6 backup drive packs (10MB each? Something of that astounding capacity) that backed up all the day's transactions and inventory. And one for use over the weekend that she changed to last thing on a Friday (which meant leaving 30 minutes after everyone else - the backup got done early).
Every month she would select a random one to test (same procedure except after the 'verify' step there was a 'test the backup' which essentially copied the data into a set of temporary files then parsed them to make sure they were valid). If she didn't do that step then the support company would send a stern note to her.
This was in the mid 1980s..
-
Friday 7th February 2025 13:06 GMT chivo243
My backups
I worked in a place that had 3 different backup solutions. I never had to restore for users more than a folder or document. I had to restore a few servers, successfully. But one sticks out in my mind... A C level's PA had been working on a document, and somehow deleted it, and emptied the trash? Couldn't find the temp file either... She asked "you can restore it?" I said no problem, I'll restore it for you, give me the name. I hit my desk, fire up the backup console, start searching for the name, and nothing found. I called her back, and asked when she created the document, she replied today! There's a bedum tiss moment!
-
Friday 7th February 2025 13:11 GMT that one in the corner
A rod for my own backup
At New Year, I had a think about the pain of restoring our systems at home: a random collection of servers that do whatever we find useful/interesting around the house (from NextCloud to 'Pi BirdNet).
The data and archives are on a NAS, which syncs to a duplicate box for a basic backup. So reasonably happy with that. But restoring an individual one of the machines is a PITA.
Ah Ha! Move them all into VMs, with a common hypervisor (dead easy to install, not "restore", as needed) and then just use the admin tools for the VMs. Oh, and see if the R'Pi's SD card can be replaced by network boot. Don't want to worry about high-availability and live VMs moved from host to host (just a simple home setup - and the random weird peripherals don't auto-unplug themselves, so not worth the bother).
Second week of Feb: oh, if only I was running a Data Centre, or at least a "Home Lab" that I pretended was a DC! All the hypervisors (Proxmox, xcp-ng are the bestest candidates so far, 'nuff said) are geared up to single-click manage all the Big Boy Heavy Lifting, but I don't *want* to merge everything into one pool! The boxes are all "whatever I got my hands on that year", no two the same...
I *have* made progress: I have now added two new-to-me PCs to the pile, so that I can trash those learning how (please be "how") this can work!
But when it *is* working, I *will* be able to easily make backups of the VMs, can easily check they'll restore and run.
Won't I? Please.
-
Friday 7th February 2025 13:36 GMT Anonymous Coward
I remember visiting a warehousing site where the stock control ran on a PDP system managed by a single person with IT support from a larger site in the next town. He had a detailed set of backup instructions (on paper) to follow, together with a log of when he ran each backup, the tape id, and the tape location in one of two large metal cupboards with roll-up doors. Also a schedule of when the 2400ft tapes should be physically rotated in the cupboards ( this was a thing at the time, I was never convinced it was necessary ).
I asked about restores, and he explained that the backups were just for the stock data and were actually written in an ICL format so they could be restored onto an ICL mainframe at the larger site. So when was the last time this was checked ? Oh, well he wasn't sure, but it couldn't have been recently because the other site had got rid of their mainframe a couple of years ago.
-
Friday 7th February 2025 14:31 GMT tweell
Brought in to look at a Novell server that had lost a drive from its RAID 5 array. Replaced the drive, set it up to rebuild, and during the rebuild a second drive crashed. Well then - replaced that drive, but no rebuild this time. After reformatting and loading the Novell system back on, I went to run the backup from the DAT tape they had. "Um, this is an incremental backup, where's the full backup? I need those tapes first, then I can use the incremental." "This is the only tape we have."
Turns out they'd wiped the tape because it had gotten full, and then continued with incremental backups on that tape. Oops.
-
Friday 7th February 2025 17:54 GMT CA_Diver
So many examples
The customer who chronically "forgot" to load her site's tapes. I escalated to my manager, who wrote letters to her manager. Until the day they lost a drive, and had to go back a week to find a backup. Sometimes you have to burn your fingers to learn.
The customer who loaded their tapes as documented. Sometime in the last few months, the tape drive failed, the support staff who failed to check logs screamed for help. By some miracle, a year old tape had been write protected, the last time a restore was required. That tape, still in rotation, let us recover the system, and use transaction files to bring data to current day. Two weeks later, called by support staff trying to make a tape to migrate data for a new system. Backup was hung, holding tape drive allocated. Some people won't learn even when their fingers are burned.
I completed a risk review on a vendor who would be hosting company data. They used a third party hosting site to stage the application, and used disk-to-disk backups to the next rack over. "What happens when an aircraft falls on your data center?" Made Purchasing require remote backups for our data. Annoyed vendor who believed data center would always be there.
-
Friday 7th February 2025 18:33 GMT ColinPa
Backups mean you take them off site
Someone reminded me that there was a University department in Terrapin sheds out in the car park.
They regularly took backups, and left them in a box in the cupboard.
They had a fire, and the building burned down, including the cupboard and the tapes. They lost over 10 years' worth of work.
-
Sunday 9th February 2025 16:46 GMT gnasher729
Re: Backups mean you take them off site
Small company I worked for bought a fireproof safe very cheaply. The reason it was so cheap was that the lock was broken. Most people think a safe with a broken lock is kind of useless.
But you could open the door, put your backup tapes inside, close the door, not lock it, and the tapes were in a nice fire proof place.
-
-
Friday 7th February 2025 18:42 GMT Confused of Tadley
Regularly testing the restore process can be a bit inconvenient and time consuming, but does have its uses.
I heard a story from a consultant about a bank that had trouble with a restore. It was a mainframe shop, and they called in the consultant after the in-house people could not get a particular file to restore that was desperately needed.
The bank showed the consultant the script, on one of those old 24x80 monitors. Indeed the file name was in the backup script, somewhere on page 3, so it should be on the tape and it should be possible to restore it.
The consultant had another look. At the foot of page 1 of the list of files, there was a /*. Mainframe folk will know this as an end-of-file mark. All of the files listed after this had the status of comments. The consultant then had to explain to the bank exactly what 'non-recoverable' meant. The good news was she still got paid.
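A toy model of that failure, for anyone who has not met it: the sketch below treats a list of names as input that stops being read at the first line starting with /*. It is not a JCL parser, the dataset names are made up, and the two function names are just for illustration.

```python
def active_entries(lines):
    """Return the entries that are read before the first '/*' line."""
    active = []
    for line in lines:
        if line.startswith("/*"):
            break  # nothing after the delimiter is ever read
        if line.strip():
            active.append(line.strip())
    return active


def ignored_entries(lines):
    """Return the non-blank entries that sit after the first '/*' line."""
    for index, line in enumerate(lines):
        if line.startswith("/*"):
            return [entry.strip() for entry in lines[index + 1:] if entry.strip()]
    return []


# Hypothetical file list; the names are invented for this example.
file_list = ["PAYROLL.MASTER", "LEDGER.DAILY", "/*", "CUSTOMER.ACCOUNTS"]
print(active_entries(file_list))   # the two names before the delimiter
print(ignored_entries(file_list))  # the name that never reached the tape
```

Running something like ignored_entries over a backup list and expecting an empty result is a cheap check that the list means what it appears to mean.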
-
Friday 7th February 2025 20:02 GMT Anonymous Coward
Many years ago now I worked for a company that had a number of backups for a UNIX system. The backups were scripted and would go through a list of areas to copy. The script would copy each individual area to tape, then read the TOC back from the tape and compare the two lists of files. Then it would fast forward the tape and do the next area.
There was only one issue. The device used was slightly incorrect[1] so instead of _leaving_ the tape fast forwarded, it would rewind again and so the next area to be backed up would start from the beginning of the tape and overwrite everything and then the next area would do the same! The only thing on the tape would be the _last_ area to be backed up.
This wasn't noticed for _years_ until unfortunately someone needed something off one of the later "areas" and there wasn't anything there!
[1] It was missing an "n" off the end of the device name so for example instead of /dev/ios0/rstape003chn it was /dev/ios0/rstape003ch
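To make the rewind problem concrete, here is a toy simulation. It assumes nothing about the real driver: ToyTape is a made-up stand-in that keeps a list of records and a head position, and the area names are hypothetical.

```python
class ToyTape:
    """A list of records plus a head position; a stand-in for a real tape drive."""

    def __init__(self):
        self.records = []
        self.position = 0

    def write_area(self, name, rewind_on_close):
        # Writing at the head position discards whatever followed it on the tape.
        del self.records[self.position:]
        self.records.append(name)
        # A rewinding device goes back to the start when it is closed.
        self.position = 0 if rewind_on_close else len(self.records)


def run_backup(areas, rewind_on_close):
    """Write each area in turn and return what ends up on the tape."""
    tape = ToyTape()
    for area in areas:
        tape.write_area(area, rewind_on_close)
    return tape.records


areas = ["/home", "/var", "/opt"]  # hypothetical areas
print(run_backup(areas, rewind_on_close=False))  # all three areas survive
print(run_backup(areas, rewind_on_close=True))   # only the last area survives
```

Note that a per-area verify straight after each write would still pass in the rewinding case, which fits with the problem going unnoticed for years.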
-
-
Tuesday 11th February 2025 09:42 GMT Anonymous Coward
Re: Let's Thank Various Unix Vendors
I'm somewhat torn because while they are overly-complex they are consistent and logical. That device path gives you which SCSI bus it hangs off, what LUN it's on and allows you to specify compression, high density and rewind (or not).
Less so for AIX however...
-
-
-
Friday 7th February 2025 21:39 GMT Anonymous Coward
DR ? DR ? We don't need no stinking DR !
Long ago my primary responsibility in IT was a 3 year refresh of file & print servers in 100+ remote offices when the hardware leases were up. Like clockwork every 3 years (had done 4 refreshes when this happened). In fact when a refresh was done, almost immediately I would start thinking about the next refresh. Took a lot of planning and Disaster Recovery was always a main concern because these systems were very remote. I was going nuts trying to come up with a reliable backup and recovery solution that could be handled easily by inexperienced remote staff. Had a middle manager question me on why I was bothering with any disaster plan at all. "We don't need a DR plan as modern systems are so reliable". Idiot. I related the stories of the office that got flooded with 3 feet of water, the office that had a fire and the office that used ONE tape for 2 years before the HD controller failed and fried the RAID array drives. I got overruled and we shipped the first batch of servers with a crap backup solution and NO DR. Fortunately I was outsourced before any blew up, but I did hear through the grapevine one did crash. Never did hear what they did about it :)
-
Friday 7th February 2025 22:24 GMT DS999
It's so easy to verify too
Make a script to choose a few random files to restore to a test directory, and do a checksum compare with the actual files on disk.
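A minimal sketch of that kind of script, assuming the restore itself is done by whatever backup tool is in use and lands the chosen files under a test directory with the same relative paths. The function names and the restored_root layout are assumptions made for this example.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pick_sample(live_root, count, seed=None):
    """Choose up to `count` files at random, as paths relative to live_root."""
    live_root = Path(live_root)
    files = sorted(p.relative_to(live_root) for p in live_root.rglob("*") if p.is_file())
    return random.Random(seed).sample(files, min(count, len(files)))


def verify_sample(live_root, restored_root, sample):
    """Return the relative paths that are missing or differ in the restored copy."""
    failures = []
    for relative in sample:
        live = Path(live_root) / relative
        restored = Path(restored_root) / relative
        if not restored.is_file() or sha256_of(restored) != sha256_of(live):
            failures.append(relative)
    return failures
```

Files that changed after the backup ran will show up as mismatches, so a real script would probably sample only files older than the last backup or treat a small number of differences as expected.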
Yes it is more work to verify a full restore can be done since you won't have that amount of disk space lying around, but even if you only do those as part of your DR exercises you can still have that daily test script total up the amount of disk space in each filesystem being backed up and comparing it with what your backup reports say was backed up. If it is within a few percent then at least you know you're backing up SOMETHING of appropriate magnitude.
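The size comparison could look something like the sketch below. The 5% tolerance is an arbitrary stand-in for "a few percent", and getting the reported figure out of the backup tool is left to the caller because every product reports it differently.

```python
import os


def tree_size_bytes(root):
    """Total size of regular files under root, skipping symlinks."""
    total = 0
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def size_plausible(on_disk_bytes, reported_bytes, tolerance=0.05):
    """True if the backup's reported size is within `tolerance` of the live data."""
    if on_disk_bytes == 0:
        return reported_bytes == 0
    return abs(on_disk_bytes - reported_bytes) / on_disk_bytes <= tolerance
```

The reported figure needs to be the amount of data read, not the compressed size written, otherwise a healthy backup will look far too small.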
In my experience the fault with backups isn't even "nothing is backed up" or "backups quit prematurely due to lack of space", but "oh I didn't know that directory was supposed to be backed up!" That one can't be fixed with scripts or even full system restore tests (unless you have people try to use the fully restored system long enough that they would notice that missing directory was not restored because it wasn't backed up). These "oops" cases are what gets you every time, and most often it is because the user assumes that EVERYTHING is getting backed up. Even some random directory they created under /usr or C:
-
Saturday 8th February 2025 00:36 GMT cosmodrome
Automated backups, verified, tested and...
...worst expectable disaster after an actual headcrash. How did I do it? Verified and tested my automated backups, tweaked configuration files until everything was perfect and then - relied on my perfect, tried, rock solid and daily performed backups. For too long. So I didn't realise that my backups had all been exactly 0 bytes long for six months until a head crash took down a complete volume group. I had changed compression algorithms to one I firmly believed to be supported. Which it wasn't. Unfortunately the error within the compression algo did not escalate to the main backup process and I got my daily "backup completed successfully" notice. No need to look into the details ("0 files and 0 directories backed up...") because everything was fine, wasn't it?
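A check along these lines, run after every backup, would have caught the 0-byte case. The sketch assumes the backup is a tar archive (possibly compressed), which the story does not actually say, and check_archive is a made-up name.

```python
import os
import tarfile


def check_archive(path, minimum_members=1):
    """Raise ValueError if the archive is 0 bytes, unreadable, or has too few members."""
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is 0 bytes")
    try:
        # "r:*" lets tarfile detect the compression format by itself.
        with tarfile.open(path, "r:*") as archive:
            count = sum(1 for _ in archive)
    except (tarfile.TarError, EOFError) as error:
        raise ValueError(f"{path} is not a readable archive: {error}") from error
    if count < minimum_members:
        raise ValueError(f"{path} holds {count} members, expected at least {minimum_members}")
    return count
```

The point is that the success message comes from reading the archive back, not from the exit status of the job that wrote it.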
-
Saturday 8th February 2025 15:03 GMT Anonymous Coward
It’s all very well saying…
…a backup is only as good as the restore and that is absolutely true, but when you’re running the backups for a large enterprise (as I do) of 1000s of servers and NAS, there's only so much test restoring you can do. You can’t practise DR for an entire application if there are multiple links between other applications.
Realistically you have to pick and choose some random datasets for restore out of place and test those have recovered successfully. The rest you have to put faith in the vendor, the software and take steps to ensure the storage platform is the right type and properly tested and maintained. In an old job years ago I got badly burned by the well-known backup software claiming to have written to DDS but when it came to restore, somehow they were not readable and we had to send them to Vogon for specialist recovery. They succeeded at significant cost, and we changed backup software quickly.
-
Saturday 8th February 2025 15:13 GMT old_n_grey
That reminds me ...
of my last role as a beancounter. The company I had joined ran their accounts/SOP on what was referred to as a network. However, this was in the mid-late 1980s and the server was an IBM XT, albeit with an external 28MB disk with integrated tape drive. Also on the network were an IBM PC and a Compaq portable. I soon discovered that if one person was using the system, no other bugger could! It didn't take long for me to threaten to throw the server out of the window if we didn't upgrade to a proper multi-user system. Happily the software house informed me that they also did a Xenix version that would work fine on the XT. I also bought a number of VDUs so all the staff could access the system. Come the upgrade day and the s/w house took our data and converted it for use on the new version.
It all went really well until a couple of months or so later when the system decided that our data was bad and refused to work. A quick phone call to the support team and quick as a flash came their response - restore from backup. Happily we backed up every night so I didn't anticipate any issues. Being an inquisitive chap, I took a quick look at the restore script and noticed that the very first thing it did was to delete all existing data files. And then, for some reason I checked the previous night's backup tape. We backed up using tar with -v so we could watch the file names scroll up the screen. So I was a little surprised when I listed what was on the tape to find the answer was - nothing! I tried the previous night's tape. Again nothing. In fact none of the tapes we had contained a single bit of data. Had I just run the restore script I would have deleted the only copy of our data (I don't recall whether undelete was a thing in Xenix back then).
After some unsatisfactory telephone support I told them I was driving down with the server. Not only did they fix the data, they also identified some kind of device conflict that Windows didn't mind but that Xenix objected to. So a happy ending, although I was disappointed not to be given a huge bonus for saving the business.
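The delete-first restore script in that story is the dangerous part. A safer ordering, sketched here with Python's tarfile rather than the original Xenix tools, is to extract into a staging directory and only touch the live data once something has actually come off the archive. The safe_restore name and the ".old" suffix are choices made for this example.

```python
import os
import shutil
import tarfile
import tempfile


def safe_restore(archive_path, live_dir):
    """Extract into a staging directory first; touch live_dir only if that worked."""
    live_dir = os.path.abspath(live_dir)
    staging = tempfile.mkdtemp(prefix="restore-", dir=os.path.dirname(live_dir))
    try:
        with tarfile.open(archive_path, "r:*") as archive:
            # Use the safer "data" extraction filter where this Python provides it.
            options = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(staging, **options)
        if not os.listdir(staging):
            raise ValueError("archive restored nothing; live data left untouched")
        if os.path.exists(live_dir):
            # Keep the previous data alongside instead of deleting it.
            os.rename(live_dir, live_dir + ".old")
        os.rename(staging, live_dir)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return live_dir
```

With a blank tape image this raises before the live directory is touched, which is the property the original script lacked. A production version would also need to deal with ownership and permissions on the new directory and with an existing ".old" copy.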
-
Saturday 8th February 2025 20:51 GMT An_Old_Dog
Hardware Problems Ignored by MS-DOS and MS-Windows
There are many hardware failures and misconfigurations which are ignored by MS-DOS and MS-Windows which *nix/Linux will not.
I had been given the assignment to fix an AST 286 running SCO Xenix, whose symptoms were random speed changes and random reboots. Everything worked fine under MS-DOS. What I had to do to fix it:
* Change jumpers so that each card got a unique device port, DMA channel (if used), and interrupt assignment (if used).
* Replace clone video card with a genuine Western Digital Paradise VGA card.
* Replace Diamond Flower Industries clone extended memory card with a genuine AST extended memory card.
-
Saturday 8th February 2025 21:48 GMT Jou (Mxyzptlk)
Re: Hardware Problems Ignored by MS-DOS and MS-Windows
Sounds like the classic problems which non-multitasking OSes can get away with. Interrupt and DMA sharing is no problem, if those are not talked to at the same time (parport and sound card are the best known example). For the video card and memory card probably the same: Shared resources in case of the video card, and for the memory cards inability to handle some access pattern which never can happen under single-tasking OS like DOS. Man, you woke up old memories here! (Edit: Windows for 286 cannot really be called multitasking capable, and handled sound card and parport in a way interrupt sharing worked - i.e. not really at the same time)
Since PCI made all that automatic in a very good way, today's variant of the "shared medium problem" is different:
I hear more and more often from relatives: "The wireless headset works well in Teamspeak until I start a game, and then it's crackly like a bad internet connection. But my Internet is fine! Do you know what that could be?".
Solution most of the time: "Yeah, move the wireless dongle for the headset from the back of the PC to the front, line-of-sight between those two."
"What? Huh? For real? That cannot be!"
"Oh, it can. Just do it and you will hear."
-
-
-
Saturday 8th February 2025 15:38 GMT druck
The boss thought he knew everything
Small company back in the 90s, one desktop machine used as the network file server and Exchange email server, and a boss who thought he knew everything. Being the 90s, the machine inevitably died, but the boss wasn't worried as he fastidiously backed up every week, never failing to rotate the tapes and keep them securely in the fire safe. The restore claimed success, and all the files (except anything newer than a few days) were back, but Exchange wasn't happy despite its database file being present. It was only when he consulted the backup program's logs for the very first time that he discovered the warning message that the Exchange database couldn't be backed up as the running process had it locked, and very helpfully it had stored a zero byte file on the backup every week for the last two years.
At that point he asked for help, but I was unable to suggest anything other than reconvening to the pub to drown his sorrows.
-
-
Sunday 9th February 2025 10:42 GMT Jou (Mxyzptlk)
Re: Verification
Surprised me: Used it for such a long time, but now I understand why LTO is so reliable and, practically, the only one that survived. There is a nice animation about that. Didn't know those heads are always doubled until now, including being able to switch the role. Better analogue tape recorders use(d) the same technique so you could tune input vs. output to really match, though they did not switch their role.
-
-
Sunday 9th February 2025 11:15 GMT Nematode
I used to do onsite opsys updates on an MTOS-based system. First thing we had to do of course was to back up (yes, tape unfortunately) the current opsys and config. Then test the backup to a spare hard drive. One site took 4 attempts to get a successful restore. Tapes were not the only thing dodgy about that system. I started that job with hair. Left 4 years later with distinctly less hair than before.
-
Sunday 9th February 2025 17:24 GMT TWB
LTO wear out
Where I used to work we had a large 10,000 tape LTOx library with 8 or 16 drives (can't remember) and a very fast robot to swap the tapes in and out. It was "commissioned" to copy video files to and from video servers. The video servers would prioritise realtime operations such as playing or recording video clips stored on their RAID sets. File transfers were therefore low priority and so could be relatively slow. After a few months we were getting several LTO tapes wearing out. After lots of investigation - because this was new technology to us broadcast engineers - we found out that the LTO drives did not like running "slowly" - they wanted a sustained data feed of around 270Mbps but sometimes the feed to them was as little as 25Mbps - depending on how busy the video server was at the time with realtime stuff. The LTO drives were effectively having to run up to speed, record a short burst of data, then slow down, wind back and do the next burst and so on. This also meant that the overall copy speed was even lower. The winding back and forth was wearing the tapes. This was also an issue when copying the file(s) back to the video server(s).
The solution was to buy a load of intermediate servers which cached the files and allowed sustained fast speed to the LTO or a slower feed to the video servers.
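Some rough arithmetic on why the intermediate cache helps, using the two rates quoted above and taking "Mbps" to mean megabits per second. The 10 GB clip size is a made-up example, and the helper names are only for illustration.

```python
def transfer_seconds(size_megabits, rate_mbps):
    """Time to move a file at a steady rate."""
    return size_megabits / rate_mbps


def can_stream(source_mbps, drive_mbps):
    """True if the source can keep the drive fed without stop-start operation."""
    return source_mbps >= drive_mbps


DRIVE_RATE_MBPS = 270   # sustained feed the drives wanted, from the comment
SLOW_FEED_MBPS = 25     # feed from a busy video server, from the comment
FILE_MEGABITS = 80_000  # hypothetical 10 GB clip: 10,000 MB x 8 bits

print(can_stream(SLOW_FEED_MBPS, DRIVE_RATE_MBPS))              # False
print(round(transfer_seconds(FILE_MEGABITS, SLOW_FEED_MBPS)))   # 3200
print(round(transfer_seconds(FILE_MEGABITS, DRIVE_RATE_MBPS)))  # 296
```

Fed directly, the drive spends the best part of an hour starting and stopping on that one file. With the file cached first, the slow 3200-second transfer happens against disk and the tape only runs for about five minutes at full speed.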
-
Monday 10th February 2025 09:39 GMT koborn
Many moons ago I was running a shop with a motley collection of Intel Unices. One of them, running Interactive SVR3, had the tape drive.
I was very lucky when I found that the backup script would gaily report success when there was no tape in the drive.
Mind you, the backups were *really* fast.
-
Monday 10th February 2025 12:44 GMT anthonyhegedus
Just one teensy weensy thing...
My colleague and I once - in the 2000s - went to see a potential new customer. It was a small company with a handful of PCs and they had a server with a backup drive. You'll be glad to hear that unlike in the main story, they had not only cleaned the drive regularly, but actually bought new tapes every two years.
They explained that they had a system with five daily backups, and several weekly backups, and monthly backups. This had been explained to them by their previous IT support company, who had set it up. They religiously inserted the correct tape every night, let the thing do its backup, and then swapped it the next evening. They even had a system for taking tapes off site.
This was all well and good except for one thing. One very important thing. In the backup software that they used, it was full of profiles and schedules for all this - great. And every single profile did not have the "enabled" box ticked - not so great.
So despite their adherence to the backup policy, what with replacing the tapes with new ones, head cleaning and so on, not one single backup had ever been done. We checked a few of the tapes to be sure - and yes, they were all blank.
We pointed this out to the customer, but they clearly didn't understand the significance of this, because we didn't get the job.
-
Monday 10th February 2025 13:37 GMT TBi
Very old Nixdorf backups
When I was very young I was hired in to "manage" a data centre of a small oil company. We had every sort of computer including some inherited Nixdorfs with some large tape reels. Every evening a small team used to come in to do the backups etc. They installed the tapes and then went off to enjoy their evening meal, paid for by us because the backups took so long. After some time they took the tapes off the system and stored them somewhere. Once I happened to be in the computer room and noted that the tapes were making a lot of backwards and forwards movements as if the tape drive was having problems. When the backup was finished I took the tape and had a look at it. It was very nearly transparent. Thus after persuading the computer centre boss to buy some new ones, we were able to do the backups in such a short period of time that no evening meal was required. The hired in group was not pleased.
-
Monday 10th February 2025 17:34 GMT Anonymous Coward
NHS Organisations required two data centres
Of course. So the outsourcers assured several hospitals in the South that their medical and admin records would be doubled up on two data centres.
They were.
One centre was in one wing of the building, the other was in the other building, the building was by the perimeter fence of the very large Buncefield oil depot.
One week there was a sudden interruption in services.
Easily visible from orbit.
More recently GPs, who used to keep records and automation on their own servers in their own buildings, and take backups generally seriously, have been pressed or persuaded to rely on systems shared between many practices, and held centrally.
One large system is now owned, I gather, by United Healthcare in the USA, whose business record has been recently discussed, after an American is alleged to have shot the CEO dead.
Another generated sufficient profit from contracts during the previous government for its owner to be able to support the righter end of the Party of Government greatly.
-
Monday 10th February 2025 20:12 GMT Andrew Scott
years of no backup
was asked to restore a system that had been backed up on tape. Nothing had been backed up for 3 years i believe. the backup logs had been complaining that the backup had failed but no one had been looking at them. When i first started working on a file server i was in charge of backups. 80 meg file server being backed up to a 40 meg disk drive using a dos backup system. it fit because i only backed up user files, the directory system and account information. backup system created a file containing filename translation permissions and directory tree. it got so big that fragmentation slowed things to an almost unusable rate. never had to restore from it fortunately. Seen a lot of people use small usb disks and flash drives. usually after they realize the backup hasn't been working for a while. usually too late to do anything. check the backup process, logs etc. if you don't, you'll regret it.
-
Monday 10th February 2025 20:30 GMT Jou (Mxyzptlk)
Re: years of no backup
Sounds like a bigger cluster case! If my calc is right: FAT12 with 16 KB Cluster. Optionally 32 KB cluster if 1MB = 2^20 instead of 1 million bytes. If you know you only store one file that is the way to get fragmentation down :D.
(I know 'cause I once inherited an Olivetti PC with MS-DOS 2.11 and 20 MB HDD, which was formatted FAT12 to my surprise - discovered by using interlnk/intersrv over parport and checking from a DOS 6.22 PC with DiskEdit)
-
-
Tuesday 11th February 2025 07:41 GMT Snowcat
Reminds me of a company I worked for. They had 2 different teams, one called backup, one called restore...
Think you may guess the next, the 2 didn't communicate at all. Even worse if another team wanted to have backups, they would separately have to ask both the backup team and the restore team.
If you forgot restore, you would have backups but you couldn't know if your backups were any good. Even worse, if you needed your backups restored, restore wouldn't do it because restore was never requested. If you had requested restore but forgot backup, same thing: they would do a restore, but since they had no backups to restore they would do nothing.
That they are 2 teams shouldn't have been a problem, if only they had communicated, or if the rule had been that when you request one or the other, the other one is automatically requested. Or that if only 1 of them was requested, the request would be denied...
And then to consider this was and still is one of the biggest system integrators in Europe...
-
Friday 14th February 2025 14:28 GMT CrazyOldCatMan
We once had..
.. an operator sacked for attaching an untested DASD string to the live mainframe. It *looked* like it was working but, absent proper read/write testing, didn't actually work. Sure, data looked like it was written successfully but, once out of the cache, was gone forever.
Which meant that for instant transactions, everything worked. But data that was written for later use was gone, gone, gone - something that would have shown up in testing as the read/write test deliberately used a dataset that was bigger than the cache (we are talking about 4k disk blocks here so it took a fair few to overflow the cache!).
He'd turned it on, got green lights and just connected it to live.
2 days later the brown stuff hits the air impeller as vital data couldn't be retrieved. He gets to accept an instant P45 (gross misconduct), we get egg on our faces contacting all the users to request that they re-enter all the data for the last two days. Lost us a lot of goodwill and reputation!
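The idea behind the read/write test mentioned above can be sketched in a few lines: write more known data than the cache can hold, then read all of it back and compare. This version works on an ordinary file path and is an illustration of the principle only, not the mainframe test itself. The readback_test name and its parameters are assumptions.

```python
import hashlib
import os


def readback_test(path, total_bytes, block_size=4096, seed=b"readback"):
    """Write total_bytes of known data to path, read it back, and compare digests."""
    written = hashlib.sha256()
    pattern = hashlib.sha256(seed).digest()
    with open(path, "wb") as handle:
        remaining = total_bytes
        while remaining > 0:
            pattern = hashlib.sha256(pattern).digest()  # new 32-byte pattern per block
            size = min(block_size, remaining)
            chunk = (pattern * (size // len(pattern) + 1))[:size]
            handle.write(chunk)
            written.update(chunk)
            remaining -= size
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the data to the device
    read_back = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(block_size), b""):
            read_back.update(chunk)
    return read_back.hexdigest() == written.hexdigest()
```

For the test to mean anything, total_bytes has to exceed every cache between the program and the medium, which on a modern system includes the operating system's own page cache.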