"data migration of a network drive caused [...] deletion"
It's a little-known fact that copying data rarely destroys it. Maybe someone should have told them the difference between copying and moving.
A bungled data migration of a network drive caused the deletion of 22 terabytes of information from a US police force's systems – including case files in a murder trial, according to local reports. Dallas Police Department confessed to the information blunder last week, revealing in a statement that a data migration exercise …
"A move to a different drive was actually a copy, with no delete of the old file."
No, a move to a different volume is a copy followed by a delete of the original. If you catch it before the newly freed space gets overwritten, you might be able to undelete it, although that's a slightly more complex operation than it was on the old FAT drives. Left click and drag to a different drive is copy only, so you end up with two copies. Right click and drag and you get a context menu that lets you choose copy or move.
robocopy \\serverX\Share \\serverY\Share /MIR
Oops, \\serverY\Share should have been the source! /MIR has just mirrored the empty \\serverX\Share onto it, and now \\serverY\Share is empty too!
I've seen this one more times than I want to remember as I'm always the guy who has to pull out the backups and hope they worked!
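For what it's worth, robocopy has a list-only switch, so a safer habit is a dry run first. A rough sketch, reusing the hypothetical share names from the example above (the log path is made up too):

rem /L = list only: reports what /MIR *would* copy and delete, without touching either share
robocopy \\serverY\Share \\serverX\Share /MIR /L /NP /LOG:C:\Temp\migration-preview.log

rem Only once someone has read that log (and confirmed the direction!) do you run it for real
robocopy \\serverY\Share \\serverX\Share /MIR /NP /LOG:C:\Temp\migration.log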
I suspect it was someone who thought that having copies = having a backup was good enough. Problem is, if you are doing a migration, it's a copy-then-delete. If you don't delete the original, it's not a migration, it's adding capacity.
Sounds like DPD IT learned a valuable lesson on why offline backups are necessary. It's not just about protecting from bad guys, it's also about preventing showing up on a future edition of the "Who, Me?" column.
If you don't delete the original, it's not a migration, it's adding capacity.
Depends on the nature of the migration. I wouldn't expect a DB migration (e.g. MySQL -> Postgres) to impose a deletion of source data.
But we can defo surmise that:
1. They didn't follow backup procedures prior to migrating (i.e. full backup with verification);
2. They didn't verify the target platform before flattening the source;
3. They were negligent in reporting the clusterfuck within a reasonable timeframe.
or 4: They assumed that 1-3 were good enough.
What I hinted at re: offline backups was that even if you do 1-3, and you test in a test environment, if your backups are not also copied offline, they're still vulnerable to "who, me?"
I've seen at least 4 examples in my IT career where companies thought having an off-site replication was a backup. Then either a hardware bug or a human error resulted in corruption or deletion of both online copies.
To put it through your steps 1-3:
1. Did we replicate the data (dallas3) to the new storage medium (dallas2)? Check.
2. Is the new copy online and did we run 100 tests to verify that it's online, performant, and fit for purpose? Yep, we're 100% certain the new system is perfect.
3...
Well, I _intended_ to sensibly type 'flatten Dallas3' and then wait for a second set of eyes to confirm my command; what actually happened was that my fat fingers bumped 3 and Enter at the same time, because they're right there together on the keyboard.
Whelp, there goes that data, and I just reported it to you, my boss, right away - so you could sit on it for four months.
A backup is not a mirror site. The backup is performed. The backup is separated from the source, then verified on a discrete system, then taken offline. (Put into a fireproof safe off-site and guarded by <insert superheroes of choice>.)
The migration is performed and the target is verified against the source.
Only when the new system is shown to be working is the source even considered for zeroing. IMO I'd keep the source for as long as is practical (to the heat death of the universe).
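A cheap way to do that source-vs-target check with nothing more exotic than the robocopy mentioned upthread: run the mirror again in list-only mode and insist on an empty report before anyone even thinks about zeroing. Share names and log path here are hypothetical:

rem After the migration has finished, ask robocopy what it would still need to do.
rem /L = list only, so nothing is copied or deleted; if source and target really match,
rem the log should show no files to copy and no EXTRA files it would purge.
robocopy \\oldserver\Share \\newserver\Share /MIR /L /NP /LOG:C:\Temp\verify.log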
Taking the backup off-line prevents the fat fingered cock-up, as you say "if your backups are not also copied offline" - so keeping the BU on-line is numptyville.
they're still vulnerable to "who, me?" <-- I like that :)
One of the interesting details is just how many people (and I include a lot of highly skilled professionals in that) believe that having a copy is the same as a backup. I have argued this time and time again. Replication is the worst offender because there is this notion that it is as close to real time as you can get.
Delete the source and guess what, the replica magically vaporises as well. The same goes for "copies" hosted in the cloud. If there is any form of synchronisation, or access to the data using the same tools as the primary copy, then it is vulnerable.
Now, as data volumes increase, backups become more challenging and expensive, so manglement protect their budgets, take the path of least resistance, and compromise on the backup they believe costs too much: well, we have a snapshot, replication or a synchronised copy in the cloud, so we can do without.
Agreed. Snapshots, RAID, replication etc are great but they don’t replace proper backups*. I’d argue at a minimum the one thing you need is backups, after that all the other things are extras to help reduce the chance you’ll ever need to use that backup to restore everything and minimise downtime, but they’re never a replacement.
*If you’re doing a daily “backup” but can only recover the most recent version rather than a few days/weeks ago before an issue happened then it’s not a backup, it’s just a copy, whether it’s offsite or not… eg most cloud storage systems aren’t backups.
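Even something as crude as dated copies gets you that point-in-time recovery. A rough sketch of the difference, with made-up paths and a hard-coded date purely for illustration:

rem A synced "backup": yesterday's version is gone the moment the next sync runs
robocopy D:\Data \\backupbox\Data /MIR

rem A versioned backup: each run lands in its own dated folder, so last week still exists
robocopy D:\Data \\backupbox\Data\2021-08-30 /E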
Sorry, the budget we have is allocated to buy surveillance equipment. Priorities demand that we keep the people safe from each other first. That is why we need to listen, peek and poke into all corners physical and digital without you knowing. How we use the data and what we keep or save is not part of our department's budget and therefore not our concern. Please redirect your future inquiries to the department responsible.
If you think any NAS device you can purchase for 4K is suitable for mission critical data availability that can literally mean life or death for a suspect, you know, you might have a future at DPD IT!
Edit: I know you weren't making a serious proposal, but I had to use your comment; it was just like you teed up that straight line just for me.
but I had to use your comment
Absolutely fine by me :)
...mission critical...
I wanted to mention the significance of this, but I'll condense it to: what fucking muppets!
Following your original post's title "This is why we still use tape libraries" then yeah, if it's critical data then use multiple media stored in multiple locations.
Here, I think we deserve one or two of these --->
You have the right to remain silent. Anything you say can be used against you in court. You have the right to talk to a lawyer for advice before we ask you any questions. You have the right to have a lawyer with you during questioning. If you cannot afford a lawyer, one will be appointed for you before any questioning if you wish. If you decide to answer questions now without a lawyer present, you have the right to stop answering at any time.
>You have the right to remain silent.
Remember, you can't just remain silent; you have to state that you are remaining silent!
>You have the right to talk to a lawyer for advice before we ask you any questions.
And don't say "I want a lawyer, dog" because the court will rule that you were demanding a lawyer of the species "Canis Lupus Familiaris" and since those don't exist - you weren't actually asking for a lawyer and so they can question you without one.
" ... case files in a murder trial"
They need to check the names of some other suspects due for trial. Like a buddy of someone in IT.
It wouldn't be the first time a convenient deletion occurred.
On the surface, it sounds like quite an impressive fuck-up.
However, we don't know the total size of the data migrated. We hear of 22TB lost, with 14 recovered. Does that mean that 22TB was the total of the data being migrated (in which case 100% fuck-up - well done!), or was it a small part of the total volume (1% fuck-up would look a lot better)?
Agreed. There are unknowns. But if they could store the data in the first place then there must be the ability to store the same data elsewhere. I really cannot imagine that budget would impede this considering the nature of the data.
As an aside, I remember looking at holographic storage decades ago (IIRC gallium arsenide & lithium niobate). I calculated that a cubic metre of the stuff would store all data ever generated in the universe. It was a bit of a bitch to actually get the data in there at the time though. Anyone know how that's progressing?
Back to reality though, as Cybersaber says above, tape is good, cheap, and it's never ending (ish).
Reading the PDF, the total migration size is unspecified, but 22TB of data was deleted over 6 days. Now, if we assume a general level of incompetence here (not unjustified given the outcome), some sort of process was set up to copy the data prior to the loss. This ran and completed, but nobody checked that the source and destination volumes were roughly the correct size (a check like the one sketched below). Even at petabyte scale 22TB would be noticeable, but the total is more likely to be in the terabytes, which would make the gap even more obvious.
Maybe they were running multiple jobs in parallel but the tool was running in a "simulation" mode for some jobs.
Maybe the storage was using archiving and they just copied the stubs over for the older files rather than the files themselves. This would make the file count correct, but not the size.
Jobs are left running and are checked, all jobs have completed, all is well, new copy live, delete the original. The problem comes to light when someone tries to access a file, 6 months later, and whoops......
Backups of older data are maybe overwritten or aged so there is your 14TB before the 20th July lost. It is all conveniently close to 12 months before the date the error was discovered.
Just musing as a backup & storage specialist at petabyte scale.....
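To put a number on "roughly the correct size": even a quick two-liner before any delete would have flagged a 22TB hole. A PowerShell sketch, with the share names invented for illustration:

# Total up file count and bytes on each side, then eyeball the difference
Get-ChildItem \\oldserver\Share -Recurse -File | Measure-Object -Property Length -Sum
Get-ChildItem \\newserver\Share -Recurse -File | Measure-Object -Property Length -Sum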
Just having tape backups off-site does not constitute a complete restoration process. You need not only to have the capability to restore files from those tapes, but to do it without damaging said back-ups.
I heard somewhere of an organisation which did all the right things regarding creating and storing backups, but messed up the restore process with a tape drive that damaged the FAT (or whatever it is at the start of a 1/4" tape cassette that says what is on the tape and where it is), making them pretty much unreadable.
Cynically, I do wonder whether there might be a touch of 'Yes Minister' here. Now that we cannot blame floods for destroying paper records, we have to rely on electronic means to get rid of unfortunate information.
... and that's why every tape cartridge system I've ever seen or used in the past 25-30 years (DAT, DLT, LTO, Travan/QIC, 8mm, and a few exotics) has a write protect flag on the cartridge. All it does is tell the drive that the tape is supposed to be read only, though, so there is a possibility that something got fouled up in the process.
@J.Cook
----
All it does is tell the drive that the tape is supposed to be read only, though,
---
I'd love to know when that particular cock-up occurred. Old-school mag tape drives (at least from IBM, like the 729 and even the budget 7330) had a groove in the back of the reel, which could be plugged with a "Write Ring" if one intended to allow that reel to be written. "No Ring, No Write". It was not advisory, but actually was needed to apply power to the write head drivers.
In the (two) installations I was familiar with, the ring was _ALWAYS_ removed as the tape was taken off the drive and put back in storage, only to be added when a tape was mounted and the "run sheet" specified it. If the Boss caught you carrying around a reel with a write-ring, you were due for an ear-roasting.
My recollection of 8-inch floppies is similar. At least some drives had the "Write enable" sensor physically control the ability to write. Later 5.25-inch drives did not (IIRC), and switched from "put the little sticky-tab on if you want to write" to "put the little sticky-tab on if you want to prevent writing, and pray it doesn't fall off on its own". I assume this attitude extended to also making the sensor merely advisory.
So, the movement from write protection to "pretty please don't erase my data" probably started in the early 1970s.
Epstein files are among all those missing... kinda like the Enron case years ago: all the files (no backups) were in the basement of building #7 (that supposedly had both Bu__ and Cli___ named), never mind, it's in the past, like every case that won't get prosecuted in Dallas...
If you were looking for a quirky angle on this, the 8TB of data covered anything pre-July 2020; that's an eye-opening date range for US cops to lose data.
It is curious that Democrat leadership lost all the data from when the police were battling & processing both COVID & the Floyd riot arrests, when there were, err... likely to be 1000s of cases of unfortunate behaviour, from both police and protestors, clogging up the court system while they could not physically attend court due to COVID.
There is a cynical part of me that wonders if it was a tactical deletion that solves a number of problems and "wipes the slate clean". Dallas is a Democrat stronghold within a notoriously Republican state and virtually all court justice & police leaders are Democrats.
Ironically, they apparently lost all the data when copying the server from a cloud solution back onto a physical server in their own building. OOOOPSIE! Apparently they still have most data from "serious offenders" (including criminal charges, drunk driving, avoidance of arrest, and illegal possession of firearms). But they lost most "low level offences", and I have no idea how they lost the files for a murder case.
When I hear 'data migration', I assume the old drives and data are going to remain as they are and be copied to a new location. I have never in my career been involved in a migration that did not leave the original data intact, because you always want to be able to roll back if things go pear shaped.
Once the data migration is complete and verified, you have the option of repurposing or decommissioning the old system and its data. And, of course, there are still backups, right? Right?
It would be interesting to find out what their process was.
Speaking from bitter experience and a stuffup by usually competent TLA (3 letter acronym) firm staff: a copy can delete the source if the appropriate flags are not included in the command. Nearly lost a large FedGov production database in a test of the migration process when "special" software was used instead of something common, well tested, mature etc. It deleted the source files and volume (an AIX virtual disk), and only AIX's ability to reconstruct a volume, if one knew which physical disks it lived on, allowed the destroyed volume to be regenerated. An fsck on the volume returned the files in a stable state. No data lost AFAIK, but the worst early morning I ever had.
We could have restored from tape, but it would have meant half a day of new processing lost, compensation to clients, interrogations by mostly incompetent manglement focused on managing by screaming at tired techies, etc. I really began to prefer AIX over its competitors after that, despite its quirks. And yes, I know about the rsync flag to delete the source. The software used was something I have never heard of before or since, which seemed to claim the same advantages as rsync. In an earlier life I had used rsync to work around buggy firmware in Sun network cards when migrating between storage networks. It worked faster than anything else I tried. Have a few brews of choice, Tridge, I never got a chance to thank you. In short, it would not be surprising if a similar stuffup happened here. Insufficient testing as root cause.
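For anyone wondering which rsync flags are meant, as far as I remember them it's these; the paths are made up:

# --remove-source-files deletes each source file once it has transferred - the "move" behaviour
# -n / --dry-run shows what would happen without touching anything
rsync -av --dry-run --remove-source-files /old/volume/ /new/volume/

# --delete is the other foot-gun: it removes files on the DESTINATION that no longer exist in the source
rsync -av --delete /old/volume/ /new/volume/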
For a migration of the size described, we always used third party file utilities designed for the purpose that allowed an actual copy, a comparison after the copy, and an update for files that may have been created or changed in the meantime.
But as I said before, I'd love to know what their process was. It seems to have gone wrong in so many ways.
Nonono. It was an understandable mistake.
It seems like when you buy cheap thumb drives from China they sometimes have reject chips in them and code that simply overwrites data to pretend that eg 8TB has been saved when in fact 13 or so bytes have.
Could happen to anyone.
Just read the Amazon reviews.