UK Data Protection Bill tweaked to protect security researchers

The United Kingdom has revealed amendments to its Data Protection Bill to de-criminalise research into whether anonymised data sets are sufficiently anonymous. The legislation, first floated in August 2017, gave rise to worries that researchers would commit a crime if they broke whatever measures were used to anonymise …

  1. A K Stiles
    Joke

    So essentially..

You manage to identify people from a supposedly anonymised dataset, but as long as you didn't do it with malicious intent, and you then flag it to the ICO or the publishing organisation's data controller (or both simultaneously) within 3 days, you probably don't get (successfully) prosecuted for doing so. Best not do any of that sort of research just before a bank holiday weekend then!

    At first glance it does seem to be a more sensible and pragmatic approach...

    1. Paul Kinsler Silver badge

      Re: within 3 days

      ... one thing that strikes me is that it might not be clear when the counter starts. A researcher might start off with a nagging suspicion, run a few trials, getting more sure over a period of weeks or months, until eventually setting up the test that nails the problem. Does the three days run from "nailed it!", or from "I'm 90% sure" a week before, or what? What would a hostile prosecution case like to claim, and how might that differ from the researcher's view?

    2. David Gosnell

      Re: So essentially..

      (b) where feasible, not later than 72 hours after becoming aware of it

      Guess it depends on interpretations of feasible.

    3. Christoph

      Re: So essentially..

      Yes, you 'probably' don't get prosecuted and jailed. As long as you haven't made the slightest slip up that a malicious prosecutor can seize on and blow up out of all proportion.

      And this is for professional security researchers. What happens if someone else notices by chance that due to someone's oversight, some data can be trivially de-anonymised? Do you know exactly what procedure to follow and who to notify, having never thought about this before? Can you find that out with certainty, within 72 hours of first going "Oh, that's odd"? Or would you do better to keep quiet and let the information leak out, rather than risk jail for trying to warn people?

    4. Doctor Syntax Silver badge

      Re: So essentially..

      "Best not do any of that sort of research just before a bank holiday weekend then!"

Write yourself a memo coming to the conclusion that re-identification is possible and date it. Include your suspicions that it might be possible, plus your explanation of why you've only just come to that final conclusion. Send the ICO a message, email or letter, of the same date. If you're worried about the effects of non-working days, do it on a date that gives the message sufficient time to be delivered. With documentation it becomes difficult to claim you were definitely aware earlier.

  2. Adam 52 Silver badge

Seems reasonable. Although I now expect the creation of the Google Reidentification Research Team, because there was nothing in there about discarding the research results.

  3. Anonymous Coward
    Anonymous Coward

    Crucial point here, it doesn't become public knowledge. Hush, hush now.

    Does the ICO react to anything in 3 days? Or ever?

The fact they allowed Microsoft's 'Get Windows 10' programme to go unpunished (I mean that in the sense of programme, not program) shows their complete lack of a coherent approach to data consent for end-users on a mass scale.

And of course, it's all worked out swimmingly for Marcus Hutchins...in making his knowledge available to authorities. Head above parapet and all that...drilling through into the wrong coalface seam, that of the security services.

Let's face it, Amber Rudd (Elmer Fudd), with or without her own technical incompetence, essentially let him travel to America, knowing he was on their watch list. Booking the flight would have flagged him on the watch list; they let him travel so they could snare him. He'd embarrassed the UK Government.

However honest and moral one is, blabbing to the ICO will do you no favours; this is Catch-22. Either way they have you in a noose, ready to kick the chair, if they so wish.

    1. teebie

      Re: Crucial point here, it doesn't become public knowledge. Hush, hush now.

      It doesn't say you can only tell the Commissioner or the controller.

      Although I'm sure someone will have fun with "intending to cause ... damage ... to a person", which doesn't specify that the person has to be the subject, it can be the person who failed to anonymise the data.

    2. ArrZarr
      Devil

      Re: Crucial point here, it doesn't become public knowledge. Hush, hush now.

      If the data from the Get Windows 10 programme was sufficiently anonymised, stored correctly and documented properly with the ICO, what's the problem?

      I'm not defending the programme, but this is outside of the ICO's remit, surely?

      Icon: Devil's advocate

    3. Doctor Syntax Silver badge

      Re: Crucial point here, it doesn't become public knowledge. Hush, hush now.

      "Does the ICO react to anything in 3 days?" It's up to you to inform within 3 days. After that it's up to them and nothing to do with you.

      "Or ever?" Shall we have a check to find out? Oh, look https://www.theregister.co.uk/2018/01/10/carphone_warehouse_slapped_with_400k_fine_after_hack_exposed_3_meeellion_customers_data/

      1. Anonymous Coward
        Anonymous Coward

        Re: Crucial point here, it doesn't become public knowledge. Hush, hush now.

        £320,000. CPW get a 20% discount for fast payment (let's hope the ICO doesn't lose their bank details).

For all the regulatory pen-pushing bullshit, I bet those Carphone Warehouse customers, whose card payment details were exposed, will be pleased to know all their personal data is worth a mere 10p, to anyone and everyone.

        Let's face it - £400K doesn't go far in terms of securing data.

        Easier not to bother, hey?

Most of the data is already out there; how will they prove otherwise, given the level of pointless, ineffectual ICO regulation you'll get for leaking it in the first place?

  4. Anonymous Coward
    Anonymous Coward

    This is all a bit silly.

I would have thought the best way to ensure anonymous data is anonymous is to talk to someone that knows data and determine what data you can't include and what data you can't store together. When analysing the data, split it into two sections: the granular data, which is what you want to see, and the identifiers, which are not important. Then strip the identifiers out and replace them with a randomly generated unique identifier for that data. Store locations as the initial letter(s) of the postcode and no more.

    Just my two penneth.
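The split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a hardened implementation, and all the field names (`name`, `nhs_number`, `postcode`) are invented for the example:

```python
import secrets
from itertools import takewhile

def pseudonymise(records, id_fields=("name", "nhs_number"), postcode_field="postcode"):
    """Strip direct identifiers, keep granular fields, and map each
    identity to a random token. The token map would be held separately
    under tight access control (or discarded entirely)."""
    pseudonyms = {}  # identity -> random token
    out = []
    for rec in records:
        identity = tuple(rec.get(f) for f in id_fields)
        token = pseudonyms.setdefault(identity, secrets.token_hex(8))
        anon = {k: v for k, v in rec.items() if k not in id_fields}
        # keep only the leading letters of the postcode, e.g. "SW1A 1AA" -> "SW"
        anon[postcode_field] = "".join(takewhile(str.isalpha, anon.get(postcode_field, "")))
        anon["id"] = token
        out.append(anon)
    return out, pseudonyms
```

Note the pseudonym map is exactly the "further confidential information" discussed later in the thread: whoever holds it can trivially re-identify the records, so the data is only pseudonymised, not anonymised.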

    1. ArrZarr

With a sufficiently specific set of parameters, a malicious user could theoretically use known details about a person to find out more about them, no matter what the identifier may be. Further anonymisation is required.

      "talk to someone that knows data" In what context? Different data is required across multiple fields. If you're proposing an exhaustive list, that would be damn near impossible to draw up. Far better to build a framework to sit within. This creates the risks that the researchers that this amendment protects are searching for.

      1. Anonymous Coward
        Anonymous Coward

I suppose what I'm getting at is that there are some parts of the data that should never be passed. There is always going to be a risk no matter what you do, but you can limit those risks. Examples, like I said: location reduced to the first letters of the postcode; only ever supplying year of birth, and only when required; no names, of course; and if there is a subset location (for example a health provider location), it is removed and replaced by an identifier. These are just some ideas, but at a top level you could make it harder. I agree with the amendment, because there will be mistakes, that is guaranteed, and you want these highlighted so they can be fixed. I do think it's a case of shutting the stable door once the horse has bolted, though. The better way would be to not pass data around like a bag of sweets.

        1. The Mole

          The problem is that this is hard, very hard.

Take, for example, only including the first half of the postcode: that's pretty anonymous, unless of course you have multiple postcodes (home and work, home and holiday home), at which point you will start getting unique or near-unique combinations, particularly when you start adding year of birth in.

In isolation that data set may not be a problem, but combined with another one (the land register maybe, or just knowledge from Facebook/friends) you can start to identify some classes of people.

          With those people you may then be able to de-anonymize your health provider location (presumably it is a consistent mapping otherwise it is useless), at which point you can then start to identify more people.

Your main point is correct though: unless it has been successfully aggregated and combined, much of this data should just not be passed.
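The point about combinations becoming unique can be shown with a toy example: count how many records share each combination of quasi-identifiers (the "k" of k-anonymity). The data here is entirely invented:

```python
from collections import Counter

def group_sizes(records, keys):
    """For each combination of the given fields, count how many
    records share it. A count of 1 means a unique, re-identifiable row."""
    return Counter(tuple(r[k] for k in keys) for r in records)

people = [
    {"home": "SW1", "work": "EC2", "birth_year": 1975},
    {"home": "SW1", "work": "EC2", "birth_year": 1975},
    {"home": "SW1", "work": "N1",  "birth_year": 1975},
]

# Home postcode alone: every record matches all three people (k = 3).
print(min(group_sizes(people, ["home"]).values()))                        # 3
# Home + work + birth year: the third record is unique (k = 1).
print(min(group_sizes(people, ["home", "work", "birth_year"]).values()))  # 1
```

Each column on its own looks harmless; it's the joint distribution that singles people out, which is exactly why bolting on an extra attribute (a second postcode, a birth year) can quietly break an otherwise anonymous release.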

          1. Blotto Silver badge

            balance

They need to balance the identifiers with anonymity. Each identifier could be unique to that field value, only re-identifiable by a secured DB lookup, but then the data set could be worthless if the original identifiable information is missing or obfuscated to the point of being meaningless. Yes, the resolution of the identifiers could be reduced, but that doesn't help when looking for things so rare that casual observation of their occurrence in a data set identifies the person referenced.

It's hard.

  5. ctdh
    Angel

    Did I miss something...?

The EU GDPR regulation appears to specifically place fully anonymised data out of its scope; it also makes the distinction of 'pseudonymised' data, which remains in scope. This makes sense.

    The EU GDPR Recital 26 says "...The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes. "

However, the draft Data Protection Bill makes no reference to the use of anonymised or 'pseudonymised' data; it simply refers to 'any' data 'relating to' a data subject. The draft Bill states: "'Personal data' means any information relating to an identified or identifiable living individual."

    Anonymised, 'pseudonymised' or plain data may all 'relate' to an identified or identifiable living individual. It does not say the individual has to be identified or identifiable from the data.

However, the draft Bill goes on to discuss the criminality of re-identification of de-identified personal data in section 167. There is a loose implication of the benefits of de-identification of personal data, but it goes no further. Also, de-identified data held by one person may be pseudonymised data to another person, because they hold the further confidential information required to re-associate the records with an individual.

So under the proposed Bill, is it still possible to process properly anonymised data? Is such data out of scope of the draft Bill? The problem seems to be in the draft Bill's original definition of 'personal data'.

  6. Dodgy Geezer Silver badge

    Why don't they just ....

    ....make EVERYTHING legal, just so long as you inform the authorities that you're going to do it.....
