
Blockchain ?
Worth noting that once data is in a blockchain, it's there forever, unless the blockchain was designed to remove data before creation.
The best you can do is post a correction/addendum later on.
"The right to erasure is not absolute," the UK Information Commissioner's Office told us as the question of the backup tech industry's exposure to the EU's General Data Protection Regulation was raised in the week after it came into force. The concerns Just last week, Curtis Preston, chief technical architect at Druva, raised …
Worth noting that once data is in a blockchain, it's there forever, unless the blockchain was designed to remove data before creation.
Yes, all the banks and other people getting excited by blockchains recently mostly haven't considered this at all. Other than suggesting that blockchains should be exempt from GDPR!
https://www.theverge.com/2018/4/5/17199210/blockchain-coin-center-gdpr-europe-bitcoin-data-privacy
No. A majority of miners can rewrite the chain. This was done, for instance, in response to the Ethereum hack. For an internal blockchain, a rewrite is more-or-less trivial.
Deleting data would be very similar, I believe, to purging proprietary data out of a git repo.
Hard, but finite.
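To make the "hard, but finite" point concrete, here is a minimal Python sketch of a toy hash chain. It is not how any real blockchain is implemented (no mining, no consensus), and the function names and sample data are made up for illustration. It only shows why editing one block in place breaks every later link, and why a rewrite means rebuilding everything from the edited block onwards.

```python
import hashlib

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash: str, data: str) -> str:
    """Hash a block's payload together with the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode("utf-8")).hexdigest()


def build_chain(items):
    """Build a toy chain where each block commits to its predecessor."""
    chain, prev = [], GENESIS
    for data in items:
        digest = block_hash(prev, data)
        chain.append({"prev": prev, "data": data, "hash": digest})
        prev = digest
    return chain


def is_valid(chain) -> bool:
    """Check that every block's hash and back-link are consistent."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["data"]):
            return False
        prev = block["hash"]
    return True


chain = build_chain(["alice pays bob", "fred's address", "bob pays carol"])

# Editing a block in place breaks the chain...
tampered = [dict(block) for block in chain]
tampered[1]["data"] = "wibble"

# ...so "erasing" means rebuilding every block from the edit onwards,
# which changes all the later hashes that other parties may rely on.
rewritten = build_chain([block["data"] for block in tampered])
```

On a public chain the rebuilt version also has to be accepted by the majority, which is the expensive part. On an internal chain it is just the rebuild.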
A minor point but doesn't Article 17 use the word erase, not delete?
My understanding is that just changing the data to something useless ("wibble" would be my weapon of choice) complies with the letter and spirit of the regulation.
I can't think of a situation where erasure would be accomplished better by obfuscation than deletion but my point is the rules don't say you have to delete.
In the context of data, erase == delete
erase
VERB
[WITH OBJECT]
1 Rub out or remove (writing or marks)
‘graffiti had been erased from the wall’
1.1 Remove all traces of; destroy or obliterate.
‘over twenty years the last vestiges of a rural economy were erased’
‘the magic of the landscape erased all else from her mind’
1.2 Remove recorded material from (a magnetic tape or medium); delete (data) from a computer's memory.
‘the tape could be magnetically erased and reused’
‘the file has been erased from the hard disk’
Not really feeling it, are you?
In computing jargon, erasure is as simple as unlinking. The data's still there, but there's no direct path to its retrieval. It's the electronic version of 'in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard.”'
In legal jargon, erasure has a very definite meaning, closer to the common sense concept of erasure: the data's gone. This type of erasure in computing jargon means scrubbing; whether that's overwriting with zeroes, or "dd </dev/urandom >/that/guy" in a loop fifteen times with an upside-down chicken on your left elbow.
Of course if you scrub something, you should log it...
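For what it's worth, the unlink-versus-scrub distinction looks roughly like this in Python. This is only a sketch: `overwrite_file` and `scrub_file` are names invented here, and an in-place overwrite is not guaranteed to reach the old blocks on SSDs, copy-on-write or journaling filesystems, or anything with snapshots (and certainly not on the backups this thread is about).

```python
import os


def overwrite_file(path: str, passes: int = 1) -> None:
    """Overwrite a file's contents in place with random bytes."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1 << 20)  # write at most 1 MiB at a time
                handle.write(os.urandom(chunk))
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk


def scrub_file(path: str, passes: int = 1) -> None:
    """Scrub: overwrite first, then unlink. Plain 'erase' is only the unlink."""
    overwrite_file(path, passes)
    os.remove(path)
```

Calling `os.remove(path)` on its own is the "unlinking" sense of erase; `scrub_file(path)` is closer to the legal sense. Upside-down chicken not included.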
Erase-on-restore is probably a nonstarter because it is technically trivial to *not* erase-on-restore, so the PII is still definitely available and identifiable. Likewise you get to the root of the problem in that you need to be storing a unique (i.e. not anonymous) identifier to perform the erase-on-restore in the first place.
Anonymisation of your backups through something like tokenisation or classic data mastering techniques is really your only option. If you delete the tokenisation key or the master record, the record in the backup becomes (to some extent) anonymous. However even this is thorny because simply removing explicit PII is not necessarily enough to anonymise the data. Depending on the data context it may be trivial to reconstruct the identity, even if all of the unique keys and identifying fields are now random garbage.
Yes, this is hard. I suspect that, based on what the guidance eventually says, static/cold backups will have to be strictly time limited to a period less than what we're currently used to and justified as legitimate business purposes. As long as we're all perfectly clear with our data subjects that we're doing that, we should be fine.
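A rough Python sketch of the tokenisation idea mentioned above, assuming the token-to-identity mapping lives only in the live system and the backups only ever see the token. `TokenVault` and the sample record are hypothetical, and as noted, stripping the explicit identifier may not be enough if the remaining fields allow the person to be reconstructed.

```python
import secrets


class TokenVault:
    """Hypothetical live-only store mapping random tokens to identities."""

    def __init__(self):
        self._by_token = {}

    def tokenise(self, identity: str) -> str:
        """Issue a random token that stands in for the identity."""
        token = secrets.token_hex(16)
        self._by_token[token] = identity
        return token

    def resolve(self, token: str):
        """Return the identity for a token, or None if it has been forgotten."""
        return self._by_token.get(token)

    def forget(self, token: str) -> bool:
        """Drop the mapping; records carrying the token can no longer be resolved."""
        return self._by_token.pop(token, None) is not None


vault = TokenVault()
token = vault.tokenise("fred@example.com")

# What goes to backup carries only the token, never the identity itself.
backup_row = {"customer": token, "order_total": 42.50}

vault.forget(token)
```

After `forget`, the row in the backup still exists but its `customer` field no longer resolves to anyone. This only holds if the vault itself is not sitting in a long-lived backup somewhere.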
Storing a list of erased people is legitimate. There are plenty of reasons to do it (protection from non-compliance claims is the obvious one).
Just because it's trivial not to erase on restore doesn't make it non-compliant. It's technically trivial to make your S3 bucket publicly visible but as long as you don't do it you're OK.
"Erase-on-restore is probably a nonstarter because it is technically trivial to *not* erase-on-restore"
It's equally technically trivial to not act on the request in the first place. No difference.
"If you delete the tokenisation key or the master record, the record in the backup becomes (to some extent) anonymous."
How do you handle the restoration of the backup of the key?
Keep a log of those people who have successfully requested deletion.
If you restore a backup, re-run deletions from the time of the backup.
That log would be covered by legitimate interest.
Not sure your last point applies but I note only someone restoring data needs to be able to read the log and entries can be removed after the retention period for the data is reached.
Seems like a pragmatic solution to me.
> My only question is, once you've "forgotten" about somebody, how do you remember to forget them on a restore?
One suggestion elsewhere that sounds reasonable, is to one way hash their identifiers (email, etc) and store those. The original identifiers can't be recovered, however newly restored records could then be hashed and compared. If any records match, nuke those ones.
On a side note, it would be kind of amusing (not in a good way) to see a hash collision play out through various systems. And if someone's details have a hash collision on one system, there's a reasonable chance they'll also collide elsewhere too.
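The hashed forget-list suggestion could look something like the Python sketch below. One deliberate swap: it uses a keyed hash (HMAC-SHA-256) rather than a bare hash, because a bare hash of an email address can be reversed by simply hashing candidate addresses. The key, the function names and the sample rows are all assumptions for illustration.

```python
import hashlib
import hmac

# Assumption: a secret held outside the backups. Anyone with this key and a
# list of candidate addresses can test them against the forget list.
SECRET_KEY = b"replace-with-a-real-secret"


def fingerprint(identifier: str, key: bytes = SECRET_KEY) -> str:
    """One-way, keyed fingerprint of an identifier such as an email address."""
    normalised = identifier.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def purge_restored(rows, forgotten, field="email"):
    """Drop restored rows whose identifier matches the forget list."""
    return [row for row in rows if fingerprint(row[field]) not in forgotten]


# Stored at erasure time: only the fingerprint, not the address.
forgotten = {fingerprint("Fred@Example.com")}

# Later, after restoring an old backup:
restored = [
    {"email": "fred@example.com", "name": "Fred"},
    {"email": "wilma@example.com", "name": "Wilma"},
]
kept = purge_restored(restored, forgotten)
```

Normalising before hashing matters, otherwise `Fred@Example.com` and `fred@example.com` would not match. Whether a keyed fingerprint still counts as personal data is a question for the lawyers, not the code.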
In some cases it is possible, say you've identified a row in a RDBMS that was covered by the RTBF request, if that row has a unique reference number (say customer or order ID), then you could add that unique reference number to your "future_forget" list. If the only way of identifying the row is by using the person's personal information, adding that to your "future_forget" list would have its own obvious GDPR problem, although you might be able to argue that that information was necessary in order to comply with GDPR and therefore lawful as long as you weren't using it to influence decisions. If the law requires you to retain info, then a GDPR request cannot compel you to delete it. Of course in this instance the data only exists because of the GDPR request, but surely you need to track RTBF requests, to show you have complied with them, and to do that you have to store the requester's personal info in your RTBF tracker. I think it's fair to say that this whole area is somewhat unclear.
@3 hrs Aladdin Sane
"But deleting on restore would seem to be the most logical way to go about this. "
That is what a GDPR readiness group told me a few weeks ago, and that a log of what's been deleted should be kept, then the remaining data should be backed up again...
Lovely!
I'm currently writing GDPR cleardown procedures for people that no longer need processing.
When the cleardown is run, the unique id for the person is stored, and the date it was deleted. This cannot be tied to any personally identifying data by itself. Nothing else is retained.
In the very unlikely event where a customer accidentally sets a person to have left years ago, dated some time ago, and the data are mistakenly deleted we can tell them when it was deleted. Can't get the data back, of course.
"When the cleardown is run, the unique id for the person is stored, and the date it was deleted. This cannot be tied to any personally identifying data by itself. Nothing else is retained"
This only works for very, very simple organisations. Most organisations will be worried about being sued or regulator investigations. Those organisations will have to store somewhere the identity associated with the unique id in order to defend themselves.
As soon as you do that your data subject can be reidentified from information which can reasonably be expected to be... whatever the precise wording is.
This is for anything but a simple organisation. Data are held for third parties. They define personal data that are no longer required, the age of data to be cleared down, and when it should be activated. If the third party has a need to retain the data for their own internal or legal purposes, that is their choice. They know that once the clear down is activated, historic data cannot be recovered. The minimum retention period should be sufficient to comply with potential legal requirements, HMRC requests, and so on, if the third party has any sense.
Remember the 'right to erasure' does not apply 'for the establishment, exercise or defence of legal claims.'. Neither does it apply if the personal data is necessary 'for the purpose which you originally collected or processed it for'.
I'm not involved in the legal side of the business, but the above seems to indicate that if there is a potential legal liability, data retention up and until the limit of that liability is fine.
The solution is pretty obvious to me. Deletions are only performed on live (current) data, BUT a record or log of all such deletions is kept. If and when it is ever necessary to restore from a backup archive, that deletion log is used to immediately delete the same data on the newly restored records before going live with the restored media.
Something that could be trivially automated so that it is applied automagically after any restore script is run.
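As a sketch of how that post-restore step might be automated, here is a small Python example using SQLite. The table names (`customers`, `forget_log`), the columns and the timestamps are invented for illustration, and a real system would have far more than one table to clean.

```python
import sqlite3


def apply_forget_log(restored, forget_log, backup_taken_at: str) -> int:
    """Re-run deletions that were logged at or after the backup was taken.

    Timestamps are ISO 8601 strings in UTC, so string comparison orders them.
    Returns the number of rows actually removed from the restored database.
    """
    ids = [
        row[0]
        for row in forget_log.execute(
            "SELECT subject_id FROM forget_log WHERE forgotten_at >= ?",
            (backup_taken_at,),
        )
    ]
    removed = 0
    with restored:  # commit on success, roll back on error
        for subject_id in ids:
            cursor = restored.execute(
                "DELETE FROM customers WHERE id = ?", (subject_id,)
            )
            removed += cursor.rowcount
    return removed


# A restored backup taken on 2018-05-25: Fred (id 1) is still in it.
restored = sqlite3.connect(":memory:")
restored.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
restored.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Fred"), (2, "Wilma")]
)

# The deletion log holds only identifiers and dates.
log = sqlite3.connect(":memory:")
log.execute("CREATE TABLE forget_log (subject_id INTEGER, forgotten_at TEXT)")
log.executemany(
    "INSERT INTO forget_log VALUES (?, ?)",
    [(1, "2018-06-01T09:00:00Z"), (3, "2018-04-01T09:00:00Z")],
)

removed = apply_forget_log(restored, log, "2018-05-25T00:00:00Z")
remaining = [r[0] for r in restored.execute("SELECT name FROM customers ORDER BY id")]
```

Since the deletes are idempotent, replaying the whole log regardless of date would also be safe and avoids trusting the backup's timestamp. The important bit is that it runs before the restored copy goes live.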
Well this IS my field of expertise, so I'll chime in. :)
Many backup systems have added (or are in the process of adding) features to delete personal data from the backup -- to a certain extent. For example – depending on how your backup is stored – it is technically feasible to delete all spreadsheets, word processing files, or PDFs with a certain person's identifier in them. But asking that same backup product to delete the person's data from within a file or database – while keeping the rest of that file intact – is venturing into extremely dangerous waters. If that's what we're asking, I'll have to agree with the quote in the article from Linus Chang, "deleting data from a backup is a terrible idea because it risks corrupting the backup, breaking referential integrity, breaking applications that were expecting that data to be present, and importantly, breaking any checksums on the data that would prove that a restore was successful."
That leaves the "delete on restore" option. My opinion is that a RTBF journal/database that stores ONLY the unique identifiers (and no other data) – while it sounds on the face a direct defiance of the RTBF – is the best way to ensure the person "stays forgotten" if there is ever a restore from older backups. It's even possible to have the backup system trigger the "make sure these people stay forgotten" process after a restore.
The RTBF article of the GDPR says you can keep data required to defend against a legal claim. In addition to being used for this "make sure they stay forgotten" process, this database I'm proposing can also be used to prove when someone asked to be forgotten, when they were forgotten, etc, in case of a GDPR claim. In addition, the use I'm proposing is also to protect against a legal claim – that you said you forgot somebody that you ended up restoring from backup. Ergo, I think it should be OK to have a RTBF journal/database. I am not a lawyer and am not giving legal advice on GDPR. I'm just spitballing here.
My only question is, once you've "forgotten" about somebody, how do you remember to forget them on a restore?
GDPR allows you to keep PII which is being held for a good reason. You couldn't, for instance, forget the delivery details of an order which is yet to be despatched. On this basis one should be able to hold the forget request until all the backups that the real data may be on have been superseded and wiped.
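The "hold the forget request until the old backups are gone" idea comes down to a date comparison. A minimal Python sketch, assuming a single fixed backup retention period and a forget list keyed by some opaque identifier (both assumptions, as are the names):

```python
from datetime import datetime, timedelta, timezone


def prune_forget_list(forget_list, backup_retention: timedelta, now: datetime):
    """Keep only entries whose pre-deletion backups may still exist.

    A backup taken just before the deletion expires `backup_retention` later,
    so an entry is still needed while forgotten_at + retention > now.
    """
    cutoff = now - backup_retention
    return {
        subject_id: forgotten_at
        for subject_id, forgotten_at in forget_list.items()
        if forgotten_at > cutoff
    }


now = datetime(2018, 6, 1, tzinfo=timezone.utc)
forget_list = {
    "a": datetime(2018, 5, 1, tzinfo=timezone.utc),  # backups may still hold this
    "b": datetime(2018, 1, 1, tzinfo=timezone.utc),  # all such backups expired
}
still_needed = prune_forget_list(forget_list, timedelta(days=90), now)
```

With 90-day retention, entry "b" can be dropped from the forget list because no surviving backup predates its deletion. With multi-year tape archives the list obviously lives a lot longer.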
"how do you remember to forget them on a restore?"
Ironically, you'll probably need to "remember" them in a "forget" data set. Google will have to "remember" them in a "forget" algorithm, too, so that search results don't show whatever it is that's supposed to be "forgotten". And don't even get started on www.archive.org [it's a GREAT resource for 'forgotten' data files and web pages that you might want to research, but are no longer available].
I guess www.archive.org could just make it impossible for EU people to access it... or "remember" to "apply the 'forget filters'" if you have an EU IP address or something.
I like the idea of GDPR in principle, but I think the actual mechanics of it will have too many unintended consequences [like for backups].
>>> once you've "forgotten" about somebody, how do you remember to forget them on a restore?
Tag the data subject records with a unique identifier (a meaningless but unique number - MBUN). When I forget someone and delete the records, that MBUN is no longer linked to an identity. Keep a list of all the MBUN identifiers whose records you forgot - when you restore, delete any records on my MBUN list.
Of course encrypt my backups - many authorities consider that in the event of a breach encrypted data is not a disclosure since the risk of re-identification is very small. Whether all this is sufficient is open to guidance and findings by individual DPAs but may well be a defensible position and unlikely to incur a huge fine.
I had a person reply to one of my twitter comments that this exact thing happened to him. The company "forgot" him, then a while later he started getting emails from them. Upon investigation, they realized they had restored the marketing database to before he was forgotten. They had to institute the kind of process we're talking about.
I don't understand the problem. If a person's data is deleted then subsequent backups will not contain it. If it's ever necessary to restore from a backup taken prior to the deletion then later transactions, including the deletion, will be reapplied. Yes. it's theoretically possible to restore the backup and then do something nefarious, but if you're that sort of organisation you won't care about complying with GDPR in the first place.
This is all being overthought, even though the Information Commissioner has repeatedly made it clear that enforcement will be appropriate to the organisation and circumstances and that those making an honest effort have nothing to fear.
Maybe - just a thought - there are those trying to stir up GDPR FUD for financial gain. Oh, surely not?
This is another version of erase on restore. The problem is that an old copy of a database can be restored WITHOUT applying the later transactions (and this may well be the case for debugging a problem) in which case the person's data is accessible again.
The question that the "right to be forgotten" legislation has to take into account is whether a commitment to delete a person's data if restored from a backup is sufficient.
There is also the problem that an old backup of a user's files may contain personal data that was not identified in a search for such data because it was only in backups. (An example - a user had a spreadsheet with names and addresses that was deleted before GDPR came into force but which still exists on old backups.)
@Stungebag
People keep backups. That's the bit you're missing.
Not everyone is on one week/month retention then overwrite.
We have old backups going back years. Deleting something from *all of them* would require significant time, and probably 5 or so old types of tape drive somehow being connected and brought back to life.
"do you really have a backup?"
It depends on the requirements. If it's for a legal requirement with very infrequent and non urgent access then yes. There are specialist data recovery firms that maintain infrastructure to support just this type of access.
There is a world of difference having to retrieve something specific from a particular time, to having to restore data from all of the backup tapes over a period of time. I've known multiple tape drives being worn out during massive data disclosure exercises. Incidentally if the tape movement logs show a bunch of your old tapes have been retrieved and updated, data disclosure becomes even more of a world of pain.
Where I work we're on an 11 year retention policy for every single piece of data. We have data stored on DDS DAT tapes, some of it is so old and has been deemed applicable to audit and must be kept. I remember an episode 2 years ago where the purchasing team had to source a working DAT tape drive from eBay or Craigslist or some such nonsense.
This raises the question, how can I erase your data if I cannot read it back? I appreciate the argument that, "Ignorance is no excuse." and "It's your prerogative to ensure you can read backed up data." but that argument will be tested at some point. You have Fred's data on a tape backup that you know you cannot dump in the bin but at the same time you can no longer read. What do you do? Break the law on retention or break the GDPR law? I suppose you pick the cheapest in terms of the fine and hope none of your customers find out! Ha ha!
"You have Fred's data on a tape backup that you know you cannot dump in the bin but at the same time you can no longer read."
This raises questions about the sanity of the audit or about your failure to migrate the old data to new media once the old one becomes obsolete. It also raises the question of whether you have effectively forgotten everything on the old media already.
"Where I work we're on an 11 year retention policy for every single piece of data. We have data stored on DDS DAT tapes, some of it is so old and has been deemed applicable to audit and must be kept."
I see this kind of misinterpretation of "kept" all the time.
Keeping the data is not the same as keeping the media. Migrating backed up data to new media is critical to ensure that your backups actually remain accessible.
If someone had to go find a drive on eBay, that's prima-facie evidence that you haven't bothered actually periodically checking the integrity of those backups - which is a necessary requirement for any properly working backup system. As such the system should have a big red FAIL stamped on it.
Two worst cases I can think of off the top of my head:
1: The BBC micro and their Domesday book.
2: The academic I work with who has a garage full of 1970s-era NASA 9-track tapes full of raw data from earth observation satellites he wants restored one day - for shits and giggles I found an outfit which can do it. They want over £250 per tape - not "because they can", but because the equipment is so fragile (and head wear such an issue) that it costs about that much just to keep it running (people scrounging old electronics to find working bits have to be paid)
When I told him how much it'll cost, it put a dampener on his restoration plans. He'll never afford to be able to do it but he won't admit defeat and bin the tapes either. Every year he delays his decision the per-tape cost continues to climb and one day they won't be able to be restored at all (My suspicion is that the original data is still online inside NASA somewhere anyway, they seldom discard things)
"I don't understand the problem. If a person's data is deleted then subsequent backups will not contain it. If it's ever necessary to restore from a backup taken prior to the deletion then later transactions, including the deletion, will be reapplied. Yes. it's theoretically possible to restore the backup and then do something nefarious, but if you're that sort of organisation you won't care about complying with GDPR in the first place."
GDPR means you shouldn't have that information any more. Not 'well, it's all the way at the back of the filing room, and I never go that far, so no worries, right?'
GDPR and backups, without case law to guide, is an issue.
I'm not sure the filing room analogy works here. In this analogy, the filing room represents the main "production" copy of the data. And yes, you WOULD have to delete it in that case.
But imagine, if you will, you had a scanned JPG of every piece of paper in each drawer, stored in a completely separate system. Continue to imagine that you don't have OCR, so you can't scan the contents of each JPG without physically pulling each one up, reading all the words in it, then moving on to the next image. Now imagine being asked to redact info from those JPGs without the ability to search them. Now you're a bit closer to what we're talking about here.
I agree with your last comment. I look forward to further guidance from the ICO.
"I'm not sure the filing room analogy works here. In this analogy, the filing room represents the main "production" copy of the data. And yes, you WOULD have to delete it in that case.
But imagine, if you will, you had a scanned JPG of every piece of paper in each drawer, stored in a completely separate system. Continue to imagine that you don't have OCR, so you can't scan the contents of each JPG without physically pulling each one up, reading all the words in it, then moving on to the next image. Now imagine being asked to redact info from those JPGs without the ability to search them. Now you're a bit closer to what we're talking about here."
Just because it's difficult, maybe even impossible without deleting your backups, doesn't necessarily mean you don't have to do it under the law. If your backups are required by law, then you might have to restore them, delete the data, then re-archive.
Nobody said that new legal requirements would be easy, but you know, people had two years to think about this. Why has it taken until a week after the law started for someone to say 'what about backups'?
We brought it up before. It's getting coverage now because the law is now in effect. Such is life with news.
There are actually sections in the GDPR that speak to technical infeasibility and undue burden as a defense against certain requirements of the law. In addition, the need to keep the data for other valid business purposes is also a defense.
As to what you're proposing (restore, delete, backup again) for every single request? The cost is so high that most companies would just pay the fine if the law were to be enforced that stringently. We're talking costs in the tens of millions every single time you get a request. My opinion is that is never going to happen. Not to mention the risk of doing something wrong and doing damage to the company.
The ICO said they will provide guidance on this soon, and I for one am looking forward to it. I'm willing to bet the advice is going to be closer to what Robert Wassall said in the article. The data needs to not be accessible to production systems, not be used for any decisions, etc. To that I would add that a company must commit to deleting it if it ever DOES come out of the backup system via some kind of restore.
My opinion so far.
"As to what you're proposing (restore, delete, backup again) for every single request? The cost is so high that most companies would just pay the fine if the law were to be enforced that stringently. We're talking costs in the tens of millions every single time you get a request."
For now, it would cost a lot. My guess is that new backup software will be developed with extra functionality for GDPR and RTBF-based queries (which are of course different), and it will become 'best practice' to have that sort of backup, with the largest organizations being the first to be punished if they don't, and trickling down as technology is refreshed in companies.
And all this will happen again (will it?) when ePrivacy Mk2 comes into force.
I can only speak for myself. My blog was a response to comments I was seeing out there that suggested that GDPR RTBF was absolute and everyone should be able to delete personal data from production AND backup systems. So I suggested that wasn't possible given current technology (nor do I think full RTBF from backups is coming any time soon), and suggested the kind of process you mentioned in your comment. So I feel like I'm trying to clear up FUD, not create it.
...people will be asking the question: "Why exactly are we doing this? What is the cost/benefit ratio for this kind of work? Is this the best way to improve humanity's lot, or is it a completely excessive response to something which was a non-problem in the first place..."
Because your non-problem has via Facebook and CA contributed to Brexit, Trump and god knows what else. Because we never gave these people consent to record our personal information, track and profile us for the purpose of showing adverts. Because given free rein the ethically challenged, Randian CEOs running most big IT firms will exploit you as far as they can just to gain an extra dollar. I could go on.
You may not have recognised the problem but that does not mean there was no problem.
@Dodgy Geezer "cost/benefit ratio"
Pretty simple - for each offence face being fined up to 4% of your global turnover. That's the sort of numbers that concentrate the mind of even the most gluttonous data hoarding spammers. The carrot approach didn't work, now time for the big stick.
Consider a school.
Current government advice for various data held is published in a nice compact table with type of data and years you need to keep it by law.
Some of it brings the limit up to legally requiring storing certain personal data for 25 years after they were last a pupil. And, no, you can't anonymise it.
As such, "right to be forgotten" for many such pieces of data is basically non-existent.
Even financial records tend to hover about the 4-7 year mark for even the smallest business, and no, you can't just anonymise them (the taxman may have something to say about that should you be audited for, say, VAT or income tax for private contractors as in IR35).
The right to be forgotten is a way off for most people and requests handled on an individual basis. But GDPR hasn't really considered it in terms of practical solutions.
Depends what you read.
Going by:
https://www.education-ni.gov.uk/publications/disposal-records-schedule
Then, yes. But other places give differing advice as that's the MINIMUM required (and some of that goes up until the pupil is aged 30!).
If the table in that document isn't enough to convince government to set a single data retention standard, then nothing will.
In regards to Social Services retention periods...
The following records are subject to statutory requirements:
Case records relating to children who have been placed, to be retained until the 75th anniversary of the child’s birth or for 15 years after death if the child dies before age 18 (Arrangements for placement of Children (General) Regulations 1999). GDPR that one!
1. Inform user you have encrypted backups that may hold data on them
2. You can't delete the data in those backups as it's not technically feasible or practical, you can refuse a delete on these grounds.
3. You will remove them after X months/years or whenever the backups go into rotation.
4. If you do restore a backup that was taken before the user was removed from the data, just re-run the delete function again.
However to re-run the delete function again, you need to keep some personal data of who to delete, so in theory you can't delete them if you've forgotten them... Now I'm pretty good at forgetting things according to my wife, so I've deleted her and I now get excited to see a strange woman in my bed at night
Surely you wouldn't make them read/write and modify and go all horribly wrong like that as suggested?
You'd do the delete on the primary instance, then take a new snapshot to replace the existing one with. Most places would do snapshot creation automated on a schedule anyway so it would probably just sort itself out overnight. You'd then just delete the old one.
I believe at a time up until around the late 80s or early 90s the examination boards for schools, colleges and universities just deleted all their records and binned the old exam papers a couple of years after the student did their exams. And this led to the situation where people were claiming that they had achieved a particular qualification but had lost their certificates, and the examination boards had to take their word for it and issue a new certificate with whatever grade they wanted on it.
Back at the time I knew people who did this and went from average students to star pupils overnight.
How come this is being presented as though it were a new problem? There's been a data protection act in Britain since, since, ..... was it 1983? I can't remember any more. That was 35 years ago, and it also contained the right to have false or outdated data deleted.
Organisations and manufacturers have been arrogantly violating this legislation for decades, since respecting it would be expensive and inconvenient, and the legislation was toothless.
Now I look forward to massive penalties to be suffered by those who've been holding unlawful backups for all these decades. It won't happen though. It never does with data protection. :-(
I love the 'just delete on restore' answers - implying that the posters are working in wonderfully organised IT shops, with everything QA'd and stored on one monolithic central IT system. Maybe come down from the tower occasionally and meet the real world of personal data scattered in Excel spreadsheets, Word documents, pdfs and for all I know coded into C# objects.
Now a sane person probably wouldn't lose much sleep over someone finding a list of invitees to a conference held 5 years ago, stored on a CD marked 'My C:\drive backup' - but so far there's little evidence of sanity around. The asylum's latest suggestion is a 'clear desk' policy, in case personal information is compromised, applied to workers who don't ever handle personal data - but I guess you can't be too careful - after all that scientific report you are referring to to do your job was written by a person, and that person's name is written on the cover, and where they work...won't anyone think of the children?
"meet the real world of personal data scattered in Excel spreadsheets, Word documents, pdfs and for all I know coded into C# objects."
If that is how a company handles personal data, they will soon meet the real world of massive GDPR fines.
That's why the more forward-looking organizations have spent the last two years changing from the "real world" you sketched to a world in which GDPR compliance is actually possible.
If it was HR data spread around the enterprise then any GDPR grief would be deserved - but your 'forward-looking' organisations either haven't been around for more than 2 years, have amazingly simple and tied down IT systems or are wilfully self deluded if they think they are on top of the 'list of names and emails in a spreadsheet/printout/filing cabinet' that seems to exercise the GDPR purists so much. Or they just took a 'nuke it all from orbit' approach, which might work if the past means nothing to you - might not work so well if your business is all about managing data and knowledge that has been collected in the past.
"Maybe come down from the tower occasionally and meet the real world of personal data scattered in Excel spreadsheets, Word documents, pdfs and for all I know coded into C# objects."
If this is the primary data storage then they have other problems already. If this is secondary storage - look for it particularly in Sales and Marketing or possibly HR - it needs to be dealt with. Audit the business and delete any of it you find. Permanently. Even if it means going through old file system backups (not the same problem as RDBMS as regards data integrity). In the real world it's this sort of secondary storage in the hands of users that's most likely to cause damage.
This reminds me of a time, recently, when my partner showed me some video or photographs of an event held in the assembly hall of the School in which she works. I asked what the white rectangles were on every single piece of child's artwork on the walls of the room.
Her answer blew my mind; "It's the name and class of the child-artist, we have to cover them for GDPR and child-protection reasons..."
...wtf?
In recent weeks I've seen quite a lot of people pointing out that "data necessary to provide the service" is exempt from GDPR. A reliable backup system is *definitely* fairly and squarely in that category, particularly if there are legal consequences to delivering wrong answers. So I can see no GDPR angle on this.
As for the right to be forgotten, well, IANAL but wasn't all this discussed at length some weeks or months ago?
"As for the right to be forgotten, well, IANAL but wasn't all this discussed at length some weeks or months ago?"
Weeks and months ago. And still we have numpties crawling out of the woodwork asking about which law trumps which when storage is legally mandated.
I'm comfortable with people wiping my data from a production system, and keeping it in backup, provided those backups don't stay around forever and aren't sent to the NSA at end of life...
Paranoid? Moi? Well, I prefer "Sabotage", or "Volume 4" actually, but "War Pigs" is still a great song...
I first wrote about the issue of data backups re GDPR a year or so ago and many times since. Amazing how many times I wrote about it and yet it is only getting attention now. I mentioned it in forums here on El Reg (a well read tech site), on Facebook (a well read social site), in my own blog posts on my own sites (sites that attract techies), on Quora in answer to GDPR and EU related questions, in other places around the web, to my clients (who understood my advice), and to everyday people when GDPR came up for discussion. I doubt I am the only person to consider the implications of GDPR. What did I get for my reward? Take a guess.
This is not a newly recognised issue. It is an obvious issue that has been willfully ignored. How did people miss this?
If the ICO is now of the view that compliance being technically difficult "is not going to wash" isn't it about time she acted against the Home Office's refusal to remove mug shots of innocent people from Police Databases?
see https://www.theregister.co.uk/2017/02/25/custody_images_review/
Really they're not. Well, not technically. Legislatively, perhaps.
It's restoring them that is the problem. Or if it's backup to disk, then mere access can be a problem.
Luckily, some bright spark mentioned in the article thought of that:
The only practical thing to do is to detect and erase the information on restore, he suggested, which would be a big task but, in principle, doable.
Erm, yeah, but I've deleted everything about Joe Bloggs of Wankstain, Essex, including his request to be deleted. So how do I know not to restore him?
And once you've worked that one out, my favourite backup tool is rsync. Because it's bloody fast. You can even backup/restore an 80G server remotely over a shitty ADSL line in an hour (as long as the data on the server doesn't change much). If you want me to filter out Joe Bloggs from the restore then that is going to turn something fast into something slow, or at least require me to access his details in the backup so I can delete them manually before I do the restore, which legally I probably am not allowed to do. Also can I do a full restore and delete him before I make the data live?
The devil's in the details.
"Erm, yeah, but I've deleted everything about Joe Bloggs of Wankstain, Essex, including his request to be deleted."
Two points. If you have some central record ID and that gets used as a foreign key in every other table affected then retain that foreign key. Otherwise retain the request. It will be needed to re-delete on restore. Without it you can't do as he asked, so if you deleted it you were doing it wrong.
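For what it's worth, a minimal sketch of that "re-delete on restore" idea. It assumes a suppression list of erased record IDs is kept outside the backup; the table layout, the `customer_id` key and the function name are all made up for illustration, not anyone's real schema.

```python
# Hypothetical sketch: re-apply erasure requests after restoring a backup.
# Table names, the "customer_id" key and all data are invented for illustration.

def restore_with_suppression(backup_tables, erased_ids, key="customer_id"):
    """Return the restored tables with every row tied to an erased ID dropped."""
    restored = {}
    for table_name, rows in backup_tables.items():
        # Keep only rows whose foreign key is NOT on the suppression list.
        restored[table_name] = [row for row in rows if row.get(key) not in erased_ids]
    return restored


# A backup taken before the erasure request arrived.
backup = {
    "customers": [
        {"customer_id": 1, "name": "Joe Bloggs"},
        {"customer_id": 2, "name": "Jane Doe"},
    ],
    "orders": [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 2},
    ],
}

# The retained list holds only the bare ID, not the person's details.
erased_ids = {1}

live = restore_with_suppression(backup, erased_ids)
```

The point of keeping only the ID is that the suppression list itself stays about as uninformative as it can be while still letting the restore job do what was asked.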
As an American, I've pointedly avoided most of the GDPR discussions. But this discussion has most of the commenters sounding like LEAs on encryption.
Consider the following scenario:
Company A has PII on individual X.
Company A dutifully keeps backups of data.
Company A is merged into company B.
Company B dutifully maintains company A's backups.
Company A's databases are migrated to company B's schemata.
Individual X applies to company B to be forgotten.
The data in company A's backups is not indexed in any meaningful way in the current schema. A restore of this data cannot be automatically purged.
Or how about this?
A company acquires a dataset, and backs it up.
The company merges the dataset into its existing databases.
A de-dupe process is run on the merged data.
Someone demands erasure.
Again, the de-dupe and merge processes make automatic deletion of restored data effectively impossible.
And these are about perfectly run shops. Real world is going to have much more trouble. Criminalizing less-than-perfect behavior is not going to encourage innovation. "Best effort" is really the only standard that can work. Unless you love selective enforcement.
"The data in company A's backups is not indexed in any meaningful way in the current schema"
You've merged the data into B's schema. Why are you keeping backups you can't use?
"Again, the de-dupe and merge processes make automatic deletion of restored data effectively impossible."
Why is it impossible? Haven't you indexed it? On de-dupe you already deleted an entry so why should deletion of another be a problem?
Both your examples are, in fact, the same: merged data sets. If the merged data set is usable it would need proper indexing and should, therefore, be possible to delete as required.
I'm talking about backups of the pre-merged data. Identification of pre-merged data relevant to a deletion demand requires tracing the current data back through the merge process. But the merge process itself is not a process that is supposed to be reversible. Scripting an ill-defined process...is problematic.
Usually it is only applicable if processing is based on consent.
So provided you are storing data on a legal, contractual or other allowed basis (not requiring the individual's consent) and only retaining it for as long as is necessary, RTBF does not apply.
The statement by the ICO spokesperson "data protection law is technology-neutral" is ingenuous.
The European clots who put the Regulation together are, at best, technologically ignorant, and at worst not doing their job properly by creating regulations which cannot be complied with and enforced.
Our new Policy says we won't delete backups because they are our DR lifeline, and as a micro-business we don't have the ability or resources to do anything different.
So we've made it perfectly clear to anyone who wants to be forgotten. What more can we do?
I can think of an obvious way to handle this:
Step 1: when backing up a user's data, encrypt it with a unique key for that user.
Step 2: when a request-to-forget comes in, delete the key.
Your backups will remain immutable but the user's data is inaccessible.
Of course you'll need some system to manage all the encryption keys and that system is going to need its own backups, so this isn't a completely trivial solution, but encryption keys are small and it should be much easier to design a non-immutable backup system for them (especially since the keys themselves aren't PII, and you don't need to keep historical backups of them, just backups of the currently-valid set).
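A rough sketch of the two steps above, using only the Python standard library. The cipher here is a toy stand-in (HMAC-SHA256 run in counter mode and XORed with the data) so that the example runs anywhere; a real system would presumably use a vetted authenticated cipher. The `KeyStore` class and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only: XOR data with an HMAC-SHA256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class KeyStore:
    """Hypothetical per-user key manager; this is the only mutable part of the system."""

    def __init__(self):
        self._keys = {}

    def key_for(self, user_id):
        # Create a key on first use, reuse it afterwards.
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def get(self, user_id):
        return self._keys.get(user_id)

    def forget(self, user_id):
        # Step 2: a request-to-forget deletes the key, not the backup.
        self._keys.pop(user_id, None)


def backup_record(store, user_id, plaintext: bytes):
    # Step 1: encrypt the user's data with that user's own key.
    nonce = secrets.token_bytes(16)
    return nonce, _keystream_xor(store.key_for(user_id), nonce, plaintext)


def restore_record(store, user_id, blob):
    key = store.get(user_id)
    if key is None:
        return None  # key was shredded, the data is unreadable
    nonce, ciphertext = blob
    return _keystream_xor(key, nonce, ciphertext)


store = KeyStore()
blob = backup_record(store, "user-42", b"Joe Bloggs, Essex")
before = restore_record(store, "user-42", blob)  # readable while the key exists
store.forget("user-42")
after = restore_record(store, "user-42", blob)   # None once the key is gone
```

The backup blob never changes; only the small key store does, which is the part that would need its own mutable, current-set-only backups.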
The industry sources may have been somewhat limited. Was very surprised to read the summary position: "Vendor software across the board doesn't know what it is backing up and won't necessarily easily or practicably find the data the subject has requested to be erased, according to industry sources."
Some vendors, such as Commvault, have considerable discovery and insight into the contents of data being backed up, including personal data, across their enterprises. They also have the ability to use that profiling to
* enact and automate data policies
* flag content for review
* support prioritization of risk management according to various risk profiles and criticalities
* perform proactive risk mitigation on the data from backups and directly from the data source
* and support data subject requests including the right to access and right to be forgotten (with removal from backup sets and data sources).
Points well taken from the article though. Many customers are challenged with how to deal with both the discovery and remediation of personal data within their environments. The market has some interesting options, so don't lose hope!
I think that most people miss the key distinction between backups and archives. As other comments point out, backups are generally not indexed at a granular level. Archives are.
Backups are an operational tool that allows for recovery in a disaster. As such, there should be no need to keep backups for longer than you would need to recover from in such a disaster.. say a month or two.
Archives are generally kept online and provide fast access to specific records.
If this regime is followed then the right to be forgotten is simple to implement. You delete or mask records in the archive and wait for backups to expire. And the records are gone in the 1 to 2 month period.
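To put numbers on that, a small sketch assuming a fixed backup retention period in days. The function names, the 60-day figure and the dates are illustrative assumptions only.

```python
from datetime import date, timedelta


def fully_erased_by(erased_on: date, retention_days: int) -> date:
    """Date by which the last backup that could still hold the record has expired."""
    # The newest backup containing the record was taken no later than the erasure date.
    return erased_on + timedelta(days=retention_days)


def expired_backups(backup_dates, today: date, retention_days: int):
    """Backups older than the retention window, i.e. the ones due for deletion."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in backup_dates if d < cutoff]


# Record masked in the archive on 1 June 2018, backups kept for 60 days.
gone_by = fully_erased_by(date(2018, 6, 1), retention_days=60)

# On the day the window closes, only the pre-erasure backup is due to go.
to_delete = expired_backups(
    [date(2018, 5, 1), date(2018, 6, 15), date(2018, 7, 20)],
    today=gone_by,
    retention_days=60,
)
```

This only holds if expiry is actually enforced; a backup that lingers past its retention date keeps the record alive with it.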
"Backups are an operational tool that allows for recovery in a disaster. As such, there should be no need to keep backups for longer than you would need to recover from in such a disaster.. say a month or two."
You'd think so, but we get a constant trickle of people asking to restore a file they realise has been deleted and find that it went away a year ago, or need to go back to an old dataset for some reason.
None of this is PII stuff, but it raises another problem in many backup environments where policy is set to avoid backing up PII, but users do stupid things and PII data gets placed in the areas which _are_ being backed up.
The counterpoint to this is when non-PII/personal data (such as statistical data or source code) is placed in personal space and someone else needs to access it long after the person in question has left. Not enough organisations have procedures in place to ensure that "business" data is not locked away or lost in this manner.
It amazes me that people are posting here without understanding the GDPR rules.
"right to be forgotten" applies only to data that are held because the data subject has consented.
It does not apply to data for which the organisation has (and has notified) another legal basis for processing the data.
Thus complying with a request to delete personal data is not as simple as deleting all that subject's personal data (or making it anonymous by deleting the subject's name from the master index as some have suggested); it requires deleting a subset of data held. That also means that maintaining a list of requests to remove personal data IS allowed, because it is necessary to allow audit to show that you have complied with the request (and hence complied with the law, should it come to that).
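As a sketch of what "deleting a subset" might look like, assuming (as this comment does) that only consent-based records fall under the request. The record layout, the `legal_basis` field and the audit log format are hypothetical.

```python
from datetime import datetime, timezone


def handle_erasure_request(records, subject_id, audit_log):
    """Drop the subject's consent-based records, keep the rest, and log the request."""
    kept = []
    removed = 0
    for rec in records:
        if rec["subject_id"] == subject_id and rec["legal_basis"] == "consent":
            removed += 1  # held only because the subject consented: erase
        else:
            kept.append(rec)  # other subjects, or another legal basis: retain
    # The log entry records that the request was handled, not the erased data itself.
    audit_log.append({
        "subject_id": subject_id,
        "records_removed": removed,
        "handled_at": datetime.now(timezone.utc).isoformat(),
    })
    return kept


records = [
    {"subject_id": "X", "legal_basis": "consent", "data": "newsletter signup"},
    {"subject_id": "X", "legal_basis": "legal_obligation", "data": "invoice 2017-118"},
    {"subject_id": "Y", "legal_basis": "consent", "data": "newsletter signup"},
]
audit_log = []
remaining = handle_erasure_request(records, "X", audit_log)
```

The same audit log doubles as the list of requests that has to be replayed after any restore from backup.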