Clearly not tested correctly, perhaps the Queen of Carnage was in charge?
Or did anyone state that your data isn't backed up until you've done a restore of all data?
An "error" in a "standard housekeeping process" on the UK’s controversial Police National Computer (PNC) database has led to the deletion of more than 150,000 DNA, fingerprint and other records, the Home Office has confirmed. The PNC - the national law enforcement DB that holds personal info on people arrested by the police as …
"Or did anyone state that your data isn't backed up until you've done a restore of all data?"
Surely it'd take a while to get specific data off a backup though? I mean, your standard backups should also be purged of irrelevant data under GDPR (remember the conversations on these forums about that?) so maybe you'd have the data on tape as part of an archive. In that case you have to restore the whole archive backup onto a separate computer (as your usual computers are currently being used...) and then retrieve the individual files there.
"GDPR has specific exemptions for law enforcement."
Certainly you don't need consent (obviously) to process PII, but I thought they still couldn't do whatever they fancied, and indeed they were deleting this information off the PNC database because of GDPR (maybe it's a slightly different law then?).
They were deleting after three years*. That's not GDPR. That's other regulations. Specifically, it appears to be from the Protection of Freedoms Act 2012. At least the biometric information is covered there. That is what the article talked about, but there's other information the police get, such as full images of computers and phones which they like to extort out of victims for reasons I don't understand. I'm not sure where that's stored or which laws the police use to set the data retention policy for that.
*Well, they claim to delete after three years.
Yep, there is specific legislation that describes in detail what data the police have to delete and after what time period. It is nothing to do with GDPR.
Personal [nominal] data depends on a number of factors as to if/when it can be deleted. Data on category 1/2 offenders cannot be deleted. Data on offenders in categories 3 and 4 is deleted after 10 years (IIRC).
Unsolved crimes obviously have no offenders. Nominals may still be linked with these as suspects, witnesses, victims and so on. Similar rules apply as for offenders, the data cannot be deleted for category 1/2 crimes.
Most police systems follow a POLE model, people, objects, location, events. Personal data is not limited to people. Locations can be classed as personal, eg. home addresses of nominals or non-personal, eg. crime scenes.
Objects can also be in scope. These can include items held as evidence and as such, deleted personal data can trigger destruction of physical evidence, warehouse records and so on. This is usually only an issue for object data held by individual police forces and not on the PNC.
The seriousness of this incident is getting overhyped by the press. Whilst crimes occurring across force borders will not be easily linked, it is completely false to suggest that criminals will become unknown to the police, their fingerprints no longer available. Data is held in a multitude of other systems by police forces, some off the shelf, some custom built. There is some level of automation to transfer data from some of the packaged systems to the PNC but, to my knowledge, there is no linking of data for deletion to/from the PNC. Each force is responsible for deleting its own data, and the PNC is managed separately. The deleted data will still exist in the original source systems. Depending on the system, it may be possible to identify the data in the force systems that was transferred into the PNC and is no longer there. This may be an option if other attempts to restore the data are unsuccessful.
Guidance from the ICO suggests there’s little risk to retaining the data on backups as long as it is just used for backups and will be expired on a defined schedule https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/individual-rights/right-to-erasure/#ib5
"Guidance from the ICO suggests there’s little risk to retaining the data on backups"
There's a hazard that nobody seems to have allowed for. If records have been deleted as a result of data subjects exercising their right of erasure, and this has been done since the last backup, in event of a restore from that backup the deleted records will re-appear. In order to be fully compliant (is there really any other kind?), an organisation will need a means of identifying those records and purging them from the restored data set.
One wonders if marketing groups who get such requests are periodically restoring old tapes and not reprocessing the deletions - I'm seeing a bunch of such stuff reappear months/weeks/years after having unsubscribed....
It makes a nice excuse to give to the ICO when caught, doesn't it?
Databases are good at replaying transaction logs onto backups, it's true. They're not so great at retrieving those transaction logs from a room full of smoke particles if your server has gone up in flames though, which is one of the purposes of backups.
Admittedly, if you want that sort of level of protection, what you need is a mirror of your server, and log shipping to keep it in synch. Again, not so useful if your DR mirror has gone up in flames because it was situated in Buncefield. This does, of course beg the question of why anyone would situate their DR facility next to something as potentially explodey as an oil refinery.
Yes, there are typically transaction logs, but you still need business processes [a] to ensure they are applied, [b] to verify that they've done the job, and [c] to generate evidence of success.
Commonly no such processes take place. This is comparable to the common situation where network and AV logs are gathered, but never examined until after the accident.
"This does, of course beg the question of why anyone would situate their DR facility next to something as potentially explodey as an oil refinery."
"Hey, look at this really cheap deal I got us!"
Plus, of course, you can't be sure what someone has put in the office building next door, a self-storage facility or a building on an industrial estate, but you would like to think that something as important and explodey as an oil refinery (or even just oil storage tanks) would have more-than-adequate fire safety precautions...
With some establishments, those "more-than-adequate fire safety precautions" are secondary to "more-than-adequate insurance". There's only so many precautions you can take with something that is full of boiling hydrocarbons, before those safety precautions take the form of specifying a minimum safe distance.
"There's a hazard that (struck through: nobody seems to have) everyone doing it properly has allowed for."
Backups always get raised in GDPR discussions. I've always been given the same guidance - If you hold non-compliant data because it's not technically feasible to remove it, you MUST have processes in place to prevent it being utilised. In the case of data backups that means ensuring if the data is ever restored the non-compliant elements are removed as part of the restoration.
I would hope the ICO would take a very dim view of anyone using data restoration as an excuse for non compliance. Technical explanation of what happened, yes, but not an excuse. Either the restoration process was non compliant, or was not followed (I.e. You failed to look after the customers information properly)
This is, in my experience, exactly the correct method of operation. An offline backup (the only good sort...) may indeed contain bit patterns that encode for personal data that the Data Controller is not allowed to process. That's fine, because unless and until the DC causes the backup to be restored the data are not being processed. Processing personal data is what the Data Protection Acts regulate. I would expect that the sequence (i) restore from backup, (ii) repeat compliance deletions, (iii) make database live, would satisfy the ICO.
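For what it's worth, here is a minimal sketch of that restore, re-erase, go-live sequence. Everything in it is an assumption for illustration: the in-memory `Database` class, the `restore_with_erasures` function and the idea that the erasure log is held outside the backup so it survives a restore. It also returns a small report, since someone will want evidence that the replay happened.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy in-memory stand-in for a real database (hypothetical)."""
    records: dict = field(default_factory=dict)
    live: bool = False


def restore_with_erasures(backup_records, erasure_log):
    """Restore a backup, re-apply logged erasures, then go live.

    backup_records: dict of record_id -> data taken from the backup.
    erasure_log: ids erased on request; assumed to be stored outside
                 the backup so that it survives the restore.
    Returns the database and a report that can be kept as evidence.
    """
    erasure_log = list(erasure_log)
    db = Database(records=dict(backup_records))   # (i) restore from backup
    removed = []
    for record_id in erasure_log:                 # (ii) repeat compliance deletions
        if db.records.pop(record_id, None) is not None:
            removed.append(record_id)
    # Verify that nothing named in the erasure log survived the replay.
    leftovers = [r for r in erasure_log if r in db.records]
    if leftovers:
        raise RuntimeError(f"erasure replay incomplete: {leftovers}")
    db.live = True                                # (iii) make database live
    report = {"restored": len(backup_records), "re_erased": sorted(removed)}
    return db, report


backup = {"p1": "Alice", "p2": "Bob", "p3": "Carol"}
erasures = ["p2", "p9"]  # p9 was created and erased after the backup was taken
db, report = restore_with_erasures(backup, erasures)
print(report)
```

The point of the ordering is that the database is never marked live while a record named in the erasure log is still present.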
It means that you can't actually delete any records - the point about backups is that they're write once, thereafter read only. The data is erased only when the medium is erased for reuse.
To delete a record under those circumstances would be to flag it as 'deleted' until you're sure that all backups prior to that date have been destroyed / reused. Since this is government/law enforcement, it's a good bet that nothing ever really gets thrown away.
Unless you have inside knowledge that you are not sharing, I am calling bullshit.
The database at the core of the PNC is not the complete system. It is linked to a wide number of ancillary databases, such as DNA database, Fingerprint Database, etc. It is quite possible that the PNC was backed up. However, when the purge job goes ahead it will tell each of these other databases to delete the records as well. So they go ahead, in the sure and certain knowledge that the "Are you sure?" question has already been answered by the PNC operator.
Probably about a second after saying "Yes", the PNC operator has that "oh shit" moment we have all had in our past. Normally we would fake some form of error, shut the system down, restore the last backup and roll forward to about a minute before the mistake. Simples.
This is a real problem in any distributed system, because now you have to get each of those systems to restore and roll forward to the same point in time. And this almost never happens because the backups, transaction logs, even the timebase, are not synchronised across all these systems. So you get left with dangling references between systems.
It's a problem for distributed systems. It's a real problem when you have distributed systems run by different organisations, on different hardware, OS, databases, and most probably with different backup strategies.
I agree it is a monumental cock-up. But a backup would probably not have been the simple solution you suggest.
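To make the dangling-reference point concrete, a small sketch. The store names, ids and restore times are invented, and a real check would run against each system's own export rather than Python dicts.

```python
def dangling_references(core_ids, linked_refs):
    """Return entries in a linked store that point at missing core records.

    core_ids: set of record ids present in the core system after its restore.
    linked_refs: dict of linked record id -> core record id it refers to.
    """
    return {lid: cid for lid, cid in linked_refs.items() if cid not in core_ids}


# Hypothetical: core restored to 10:00, fingerprint store restored to 10:05.
core_after_restore = {"n1", "n2"}
fingerprints = {"fp1": "n1", "fp2": "n2", "fp3": "n3"}  # n3 was created at 10:03

print(dangling_references(core_after_restore, fingerprints))
```

Here `fp3` survives in the linked store but the nominal it points at no longer exists in the core, which is exactly the mismatch you get when two systems roll forward to different points in time.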
> that "oh shit" moment we have all had in our past.
The problem with having been a commercial pilot is that (for the most part) we don't tend to have those moments. Typical reaction is more like "OK, this will be interesting".
Not sure whether that's good or deeply worrying.
At least you get to train for those deeply worrying (but mostly rare) events.
With the advantage of decades of hindsight I do wonder if my hands-on HND shouldn't have included a module describing techniques for handling 'oh shit' moments and for covering up any consequences or reallocating blame. Would the 17-year-old me have been intrigued to see that in the college prospectus?
Maybe advanced user management (blackmail and cattle prods) should also have been in the module.
"The problem with having been a commercial pilot is that (for the most part) we don't tend to have those moments."
This line reassured me that I am in good hands when flying.
"Typical reaction is more like "OK, this will be interesting"."
This line dashed my hopes and instructed me that pilots simply have a different vocabulary to IT.
Given that the deletions have been going on for months, some may never be restored. Also, for the same reason, you cannot just restore and roll forward. The time lapse is just too great. Probably the only thing that can be done is a separate restore of the affected bits, extracting the records that should not have been deleted for import into the live system.
Recovery from these sorts of problems, where there is a significant time lapse, is a right pain in the arse. The bigger the system, the more difficult the task.
Automation is great until it goes awry. Humans are equally capable of the "Oh shit" moment but usually it is a point-in-time calamity that is easier to recover from. This has been purging stuff for ages, some probably legitimate and some clearly not.
"So why are the police keeping those records anyway?"
If the police delete all files relating to arrests that result in no further action, then one important casualty would be the ability to hold the police to account for said arrests. Because, of course, there would be no record of them happening.
So some information needs to be kept, at least.
At the very least, you need to keep a record of who has previously been interviewed on an ongoing case, and then released, so that someone doesn't come along later, look at the case and say, "hey, it looks like this guy has been involved, let's bring him in for questioning". Because, you know, that person would then have a pretty strong case for police harassment.
The pendulum swings both ways, and such.
Reading between the lines, it seems the data that was deleted was data that should have been kept, so pertains to ongoing investigations. At the very least, plod is going to have to do some extra work to eliminate people from enquiries. For example, if they have accidentally removed fingerprints of a person who has been burgled, used to eliminate them, then it's going to mean that if they do have prints from the scene-of-crime, then that person could be incorrectly identified as a suspect and have to be eliminated from the investigation again. Given how overstretched the police are from systemic underfunding, that is going to be a ball-ache for someone.
Posting anon, because I work for the company that is responsible for the PNC, although in a completely different division, so technically I should probably keep my opinions to myself.
That was my thought. Either they want to 'accidentally' lose the fingerprints lifted from the door handles at Barnard Castle or someone in a camel hair coat and a fedora dropped a monkey or two to make a problem 'go away'.
I'm not a copper, but I'm pretty sure the normal course of an enquiry goes along the lines of:
1) Find any leads.
2) Follow those leads to find suspects.
3) Eliminate any that obviously didn't do it.
4) hopefully end up with one, and find enough evidence to prove they did it.
If you "accidentally delete" the records of having done steps 2 and 3, and you haven't got to step 4 yet, then you pretty much have to go and do those steps again. If nothing else, it's a waste of police time, and a potential massive inconvenience for those people who had previously been eliminated.
The process you describe applies only to reported crimes (that the police bother to investigate), which have no obvious suspect.
A great many cases *start* with the arrest of someone "suspicious", and the next step is in looking for any crimes that they may have committed. Even if it is found that no crimes had in fact been committed and the person is released with no further action, their fingerprints, DNA, mugshot and other personal details remain on the PNC database, ostensibly for 3 years if the person has no criminal record (but no guarantee that they will in fact be deleted after that time).
Plod has to delete the data after 3 years anyway. What's more, keeping a backup after this time would mean the data wasn't deleted, which would be illegal. To be honest, I'd be more worried if they _were_ able to restore a backup after this cock up...
What if I was never Charged or Convicted?
Charged but not convicted
If you were charged but not convicted of an offence, at any age, then your DNA and fingerprints can be retained for three years, plus a two year extension if granted by a District Judge, or indefinitely if you have previously been convicted of a recordable offence which is not “excluded”
Very likely they're violating the law by keeping them. Who is going to enforce the rules if no one is checking.
Another possibility is that 150k records are for the last three years - now that'd be even worse.
I suspect, but have no evidence, that the normal process had had to be "adjusted" to cope with the changes to data sharing with the EU and that "adjustment" was done in a cack-handed fashion, most likely because the requirements weren't known until the very last moment, and it had to be done at some point between Christmas and the New Year by some poor sod who would have preferred to be drunk.
This would be a perfectly reasonable opinion on the Guardian comments as everything is either the fault of Brexit, Cummings or both. I suggest it is almost certainly not that. We in Britain had laws and systems of knowledge recording long before our European brethren gathered together to invent civilisation and we continue to have them.
I too have no evidence to support such a belief, but opinion trumps [pun intended] facts every time.
> long before our European brethren gathered together to invent civilisation
Civilisation arose in the Near East (and possibly India and China) and was only imported into Europe thousands of years later. It most definitely wasn't "invented" here.
Whether the "b" word is a good or bad thing is entirely moot in this context. The PNC will almost certainly have held data obtained from EU data sharing agreements, to which we are no longer party. I find it very unlikely that the EU agencies that provided that data would not have required it to be no longer held as part of both the "oven-ready" withdrawal agreement, the transition period for which expired at the end of December, and also the trade agreement rushed through parliament with very little scrutiny in the dying days of 2020.
So, the only question here, is whether someone screwed up applying those new rules, which given their likely complexity and the short timescale between when the exact details were known and when they had to be applied, seems like a reasonable hypothesis.
We've probably all done something similar, but then we've probably just restored the deleted data from a backup taken prior to the issue (using our DR system). Surely it's possible (if perhaps not cost effective) to reinstate the data; in fact I'm surprised they don't have a procedure for doing so, just for instances like this or cases where data has been deleted with malicious intent. The other option is of course to mark the data for deletion so it is in effect in the wastebasket and can't be seen/used for a period, so you can recover it when you realise you have got it wrong; once the safety period has passed, it's deleted.
Bit worrying it's that easy to lose data from the PNC.
I guess the EU data deletion went a bit too far.
It's not that simple, because of the links to distributed systems. Having to restore and roll forward one system is bad enough. I've done it on mainframes and fun is not the word I would use. Add in that this is not a relational system, it's ADABAS, and you have a whole new circle of hell to navigate.
Then, it's not one database. It's many, with the PNC at the heart. You would have to restore and roll forward all the linked databases. Now you REALLY are in the dark and smelly stuff.
Depends on numerous things. Perhaps the Home Office procedure was correct and the vendor/maintainer did the wrong thing, or, as it's likely something designed just to cleanse EU data since Brexit, the Home Office may have correctly stated what they wanted to happen and the vendor/maintainer got it wrong. In either case the design and implementation of the system seems flawed, for something so important, in that data cannot be easily offlined and recovered.
I hold a lot of contempt for the organisations that UK Gov seems to prefer, especially the likes of Crapita and DXC, but Fujitsu go far and beyond incompetence in my professional opinion and experience.
From having staff, presumably groomed by their own or the Post Office's legal teams, prepared to lie and send people to prison, through to suing the NHS because they didn't win a project, to a lot of the staff I've come into contact with (having been forced to work with them, sorry, I mean carry them).
Their perm and temp staff (and I expect more from contracted staff than permies due to contractors) for the most part are effectively users with admin rights. They don't receive training and the contractors are paid nowhere near market rates, so instead of getting a DBA or messaging SME, you get someone with an SC clearance who's a friend of someone, who last week was working on a helpdesk, but is now running the installation of a technology they have no comprehension of.
With a few exceptions, most of their staff would not survive outside the world of Gov contracts where expectations and deliverables need to be met, and can't be just explained away and further cash thrown at them to fix the issue that they caused.
It is misleading to describe the event as "a Buncefield oil depot fire". A colleague of mine was at home in bed when it happened, several miles away, and his entire house shook with the force of the explosion. According to the BBC it was the largest explosion in the UK since the second world war, and was audible in parts of the Netherlands:
"On the morning of 11 December 2005, the UK experienced its largest explosion since World War Two. The huge blast at the Buncefield fuel depot in Hemel Hempstead was heard as far away as the Netherlands and shrouded much of south-east England in smoke. "
"[...] and shrouded much of south-east England in smoke. "
Looking out of my bedroom window in a nearby town - the smoke was visible as a horizontal plume in the distance. Unusual yes - but certainly not sufficient to justify that BBC report's description.
Didn't somebody a while back say they could not comply with GDPR and delete information on people innocent of any wrongdoing, as it was too complicated and time consuming, and they got a whitewash get-out-of-jail card? Seems they can delete very easily!!
More lies and deceit from the Police?
The PNC is old and creaky - I mean the clue is there in the word mainframe. I'm entirely unsurprised that it has proven to be impractical to retro-fit GDPR into something written several decades ago in a niche language.
Blame the government for failure to provide adequate funding for a replacement though. "The party of law and order". Pffft.
"the clue is there in the word mainframe"
I'd have thought that having all the data in one place, rather than scattered across the cloud, should make it easier to manage, not harder, in principle - but, admittedly, that's not much help if you can't find anyone who still has a clue how it works.
Presumably the volume of entries increased significantly after the Blair government decided to make every offence "arrestable". There had been too many complaints previously about officers making a formal arrest for an offence that was not legally "arrestable". Can't remember the criteria that distinguished offences previously.
It could be argued that then gave the police free rein to arrest anyone who was claimed to be suspicious - or just loosely associated with someone else.
If the backup system was destroyed in the Buncefield oil depot fire[1] (Dec. 2005) then it's an impressive level of useless management to go 15 years without a replacement.
[1] A series of massive explosions starting with several thousand cubic metres of petrol vapour which produced quite a large pressure wave.
"If the backup system has been replaced and is operational, what's the point of mentioning the fire in the story?"
1. It demonstrates the age of the system I.E. "it was around in 2005 to have this happen".
2. It demonstrates that the operators are likely to have backups I.E. "we know they had backups in 2005, so they probably have them now even if they don't give us the details".
3. It demonstrates that there has always been a problem with maintenance of this system I.E. "as far back as 2005, it's been known that recovery will be difficult".
4. It is intended to suggest that the police should have learned by now I.E. "they should have learned in 2005 that massive data loss events will happen and built a faster recovery system accordingly".
5. It is intended to suggest incompetence I.E. "at one point, they built their backup system near a place that can explode. Who knows if they've done something as risky now".
6. It's an interesting event that happened, and they think the story might intrigue the readers.
More options are available.
I suspect that they have the backups but they need to be done as a full restore and it's probable that it took a few days to notice and now they have more data added.
So now some poor individual is trying to work out if they do a delta on the data added, restore from backup and add the deltas which would mean having to halt data import at some point. The other option is to find some similar hardware, configure it so they can do a restore, work out the deltas and import it back into the PROD system. As the Irish saying goes, when asking someone in remote Ireland how to get to Dublin, "well, you don't want to be starting from here".
Either option is a world of pain and because of this the "leaders" will be avoiding making a decision, like they do. At some point when everyone has forgotten about it they will decide that it's too much work to do (as so much time has passed not making a decision) and will sack it off, hoping that the Official (covering our mistakes) Secret Act will stop people talking about it.
I suppose that if you are legally required to delete data, then you would be legally required to remove it from backups as well. (For example, my Mac has a command "delete from all backups", that you would use for things that you really, really, really don't want, and especially for things that you were legally not supposed to have.)
If I had to design that system, I'd design it so users can remove something from live data, and at the same time enter a date in the future where the data has to be removed from backups, for legal reasons. So if you are wrong, you can restore the backup, if you're right it will be gone from the backup when you ask for it. And a very strongly worded message, e-mailed to you and your boss, if the "delete from backup" date is too close in the future.
Our backup system makes a complete system backup every night, and only keeps a week's set of backups. The system administrator has access to the data and can restore anything if we need it. A separate backup system makes a complete backup every week and only keeps six weeks of backups but logs everything. A third backup system is inaccessible on the network but has access to the daily backup and purges itself once a year. In twenty years, two floods, multiple hurricanes, and a few disk failures we've never lost anything - the backups' backups' backups mean that nobody worries.
I'm widely acknowledged in certain circles to be Britain's leading IT guy, and for a small fee I am willing to act as an expert witness in any subsequent court case.
This happens all the time, a standard procedure to free up resources due to defragmentation, consolidation and bygones being bygones.
(Two thumbs up to whoever burned the data, now that my thumbprints aren't on record.)
All that sensitive data should be encrypted everywhere, the best way to delete that data is to delete the keys, then the live and backup data is irretrievable regardless of where it is stored, effectively erased.
Obviously you could back up the keys, but if you're going to legitimately delete the data then you’d delete the backup keys as part of the process.
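A rough sketch of the delete-the-keys idea, with loud caveats: the `KeyStore` class is hypothetical, and the XOR keystream below is a toy built from SHA-256 purely so the example runs with the standard library. It is NOT secure; a real system would use a vetted cipher (AES-GCM or similar) and a proper key management service.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode. Illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the keystream (symmetric)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


class KeyStore:
    """Hypothetical key store holding one key per data subject."""

    def __init__(self):
        self._keys = {}

    def key_for(self, subject_id):
        # Create the key on first use, reuse it afterwards.
        return self._keys.setdefault(subject_id, secrets.token_bytes(32))

    def get(self, subject_id):
        return self._keys.get(subject_id)

    def shred(self, subject_id):
        # Deleting the key is the erasure: every copy of the ciphertext,
        # live or on backup, becomes unreadable at the same moment.
        self._keys.pop(subject_id, None)


keys = KeyStore()
ciphertext = _xor(b"fingerprint-data", keys.key_for("subject-42"))
backup_copy = ciphertext                  # backups only ever hold ciphertext

recovered = _xor(backup_copy, keys.get("subject-42"))
keys.shred("subject-42")
print(recovered, keys.get("subject-42"))
```

The catch, as noted above, is that the key store itself (and any backups of it) now has to be deleted reliably, so the problem gets smaller but doesn't vanish.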
If their deletion process is credible then that data should be irretrievable. The deletion process should be something like, Mark as deletion candidate for x amount of time, remove from active access for x amount of time (allowing for data to be remarked to not delete), irretrievably delete data.
IF they can restore it then what’s the point of deleting it?
Backups should be for restoring the data in event of a problem to the primary, not because the deletion process was followed by mistake as the deletion process should have built in safeguards against mistakes.
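The mark, hide, then delete process described above could look something like this. It's only a sketch: the `Record` class, the stage names and the 30-day grace period are all assumptions, not anything known about how the PNC works.

```python
from datetime import datetime, timedelta
from enum import Enum


class Stage(Enum):
    ACTIVE = "active"
    CANDIDATE = "candidate"   # marked for deletion, still visible
    HIDDEN = "hidden"         # removed from active access, still recoverable
    DELETED = "deleted"       # irretrievably gone


class Record:
    def __init__(self, data, grace=timedelta(days=30)):
        self.data = data
        self.stage = Stage.ACTIVE
        self.grace = grace    # the "x amount of time"; 30 days is an assumption
        self.changed_at = None

    def mark(self, now):
        """Flag the record as a deletion candidate."""
        self.stage, self.changed_at = Stage.CANDIDATE, now

    def unmark(self):
        """Rescue a record that was marked by mistake, if it still exists."""
        if self.stage is Stage.DELETED:
            raise ValueError("too late: record already deleted")
        self.stage, self.changed_at = Stage.ACTIVE, None

    def advance(self, now):
        """Move one stage forward if the grace period for this stage has passed."""
        waiting = self.stage in (Stage.CANDIDATE, Stage.HIDDEN)
        if waiting and now - self.changed_at >= self.grace:
            if self.stage is Stage.CANDIDATE:
                self.stage = Stage.HIDDEN
            else:
                self.stage, self.data = Stage.DELETED, None
            self.changed_at = now
        return self.stage


start = datetime(2021, 1, 1)
rec = Record("prints")
rec.mark(start)
rec.advance(start + timedelta(days=10))   # grace period not yet elapsed
rec.advance(start + timedelta(days=30))   # moves to HIDDEN
rec.unmark()                              # mistake spotted in time
print(rec.stage, rec.data)
```

Only the final transition destroys anything, so a bad purge run gets two grace periods in which someone can notice and undo it.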
"Backups should be for restoring the data in event of a problem to the primary, not because the deletion process was followed by mistake as the deletion process should have built in safeguards against mistakes."
Logical corruption and data loss (whether due to accidental deletion or otherwise) are problems at the primary, and DR won't help as the issues would likely have replicated to any secondaries.
In modern replicated DR environments backups are more to safeguard against logical rather than physical issues, as for physical issues you can fail over to a standby and then resolve issues in primary.
"It was early January, when Jambat (not her real name) decided that it was finally time to do some housekeeping. 'I need to get rid of all this old vinyl' she shouted, looking at her sprawling collection of Elgar concertos, failing to realise that her ever-attentive but somewhat sub-standard home hub assistant subsequently relayed this to her Justice Hotline as 'delete all records'. Trained never to question Orders From Above, the minimum-wage intern duly set the process off, and then went off to start his evening Deliveroo shift..."
To be continued...
Everyone here is working on issues around databases (ADABAS) and backups and distributed systems. Oh...and the assumption that the report provided to the press by the authorities is factually accurate. But sometimes there can be much simpler explanations. For example:
$ sudo rm -f -r *
I think we should be told!
Logic would say that a system that revolves about individuals needs to be able to save, update, delete and backup each individual in a format that allows each individual's associated data to be restored at the same time.
When you delete a master record you cannot just restore that record using the same index. If you only have an infrastructure backup, restoration still requires an application-level dump and restore.
What's worrying is that terms like Fujitsu mainframe and ADABAS take me back to work I did on disaster recovery (business continuity) in 1996. Which suggests that the technology that drives the PNC is 25 years out of date.
Today we are delivering cutting-edge large data solutions, not via AS400s or ADABAS, but via Docker, Elasticsearch, Kubernetes, and cloud, internal and external.
Viewing the PNC today from a professional point of view is like walking into the science museum.
Don't know about anyone else, but when I've purged old data from the current system, the last thing I do before I punch the button is take a full backup, just in case things go wrong. Seems like the kind of basic precaution anyone would have scribbled down in the CAB request.
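As a small illustration of backup-then-purge, here's a sketch using SQLite's online backup via Python's `sqlite3.Connection.backup`. The `records` table, its `year` column and the `purge_with_backup` function are made up for the example, and an in-memory copy stands in for a proper backup target.

```python
import sqlite3


def purge_with_backup(live, cutoff_year):
    """Copy the whole database first, then purge rows older than cutoff_year."""
    backup = sqlite3.connect(":memory:")
    live.backup(backup)                   # full copy before touching anything
    with live:                            # commit the purge as one transaction
        live.execute("DELETE FROM records WHERE year < ?", (cutoff_year,))
    return backup


def count(db):
    return db.execute("SELECT COUNT(*) FROM records").fetchone()[0]


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, year INTEGER)")
live.executemany("INSERT INTO records (year) VALUES (?)",
                 [(2015,), (2019,), (2020,)])
live.commit()

backup = purge_with_backup(live, 2018)
print(count(live), count(backup))
```

Because the backup is taken inside the same function as the purge, there's no way to run one without the other.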
The purge has been happening over time. That is much of the problem. There is no single point of reference.
What you describe is dead easy, but in the event you discovered in even 1 week's time on a busy system that something had not behaved as expected, your point-in-time backup is not a great deal of help.
You have 3 choices:
You can either restore to the point-in-time and then roll forward all the transactions since the backup, excluding your purge.
Restore out of place, export and collate the missing data, then import.
Do nothing and hope.
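The second choice (restore out of place, collate, import) is roughly the following. It's a sketch with invented record ids, and it assumes you can say which deletions were legitimate, which may be the hard part in practice.

```python
def collate_missing(restored, live, legitimately_purged):
    """Find records present in the out-of-place restore but missing from live.

    restored / live: dict of record_id -> data.
    legitimately_purged: set of ids the purge was supposed to remove.
    """
    return {
        rid: data
        for rid, data in restored.items()
        if rid not in live and rid not in legitimately_purged
    }


restored = {"r1": "a", "r2": "b", "r3": "c"}  # snapshot from before the purge
live = {"r1": "a-updated", "r4": "d"}         # live kept changing afterwards
legit = {"r3"}                                # r3 was due for deletion anyway

missing = collate_missing(restored, live, legit)
live.update(missing)                          # import only what was wrongly lost
print(sorted(live))
```

Note that `r1` keeps its newer live value and `r3` stays deleted; only `r2` comes back.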