How do you anonymize personal databases and protect people's privacy – over to you, NIST

How do you protect people's privacy when you have big databases of personal records you want to share? That's the question that the US National Institute of Standards and Technology (NIST) has dug into in an extensive review [PDF] of the different methods that government departments and other organizations use when publishing …

  1. Graham Marsden

    Great, but...

    ... they miss one vital point:

    Corporate America DOES NOT WANT THIS!

    You are the product. Data about you is worth money. Your information can and will be bought, sold, traded, folded, stapled and mutilated in any way they want and all the "standards" you can name won't make a damn bit of difference if they are not backed up with serious legal penalties.

  2. This post has been deleted by its author

    1. Adam 1

      It's easy. You just disable paste.

      1. Anonymous Coward


      2. Anonymous Coward

        Disable paste

        And the government is proposing that these people keep logs of your browsing history for 12 months?

  3. Speltier

    Who is Bradley Cooper?

    And why should I care? Can't you pick someone better known to show getting into a NY cab, like Elvis?

    1. James Micallef Silver badge

      Re: Who is Bradley Cooper?

      "Can't you pick someone better known to show getting into a NY cab, like Elvis?"

      Elvis is the one driving the cab, but shhhhhh don't tell anyone, don't want to blow his cover!

      1. Trigonoceps occipitalis

        Re: Who is Bradley Cooper?

        And he hands over to Lord "Lucky" Lucan at midnight.

  4. Mark 85

    The NIST might come up with a standard, but which agency gets to enforce it, and will the others abide by it? Given the way the government is being run, I'd say that whatever they come up with will be totally ignored by every department, each with the excuse "we know how to do this better".... and then they get hacked....

    I really wish we had a cynic icon.... instead, I'll use the result of this whole thing....

    1. Anonymous Coward

      Based upon past performance, the FTC. They're the designated whiner over our (total lack of) privacy enforcement here. Not that they can DO anything about the NSA, FBI, DHS, DEA, DIA.... Which is the crux of the Safe Harbour issue.

  5. I. Aproveofitspendingonspecificprojects

    Wow! Six comments already...

    Does that mean everyone else is busy working on it?

    Or is everyone working on the obvious conclusion: "Getting into NSA will make me famous."

    1. Eddy Ito

      Re: Wow! Six comments already...

      Perhaps we just recognize mental masturbation when we see it.

  6. keristinium85

    If people are interested, there are a few papers over at the IPC. The paper I've linked below covers a lot of the examples used by NIST and looks to challenge the view that de-identifying information will simply lead to the data being re-identified later through aggregation with other data sets.

    It's an interesting area of research, one that is very much in its infancy but in which progress is so important.

  7. John Smith 19 Gold badge

    All of this is *far* too complicated for the UK government to understand

    Who will continue to wap out any dataset they can with virtually zero privacy protection and trust "the market" will "Do No Evil (TM)" with it.

  8. Displacement Activity

    Pseudonymised NHS data

    I wrote some software a few years ago to let GPs/PCTs/CCGs/etc. (i.e. UK family doctors and the people who pay them and fund medical care) identify anomalies in referral patterns, hospital admissions, length of stay in hospital, "GP performance", "over-referrals" (largely a myth, BTW) and so on. It was funded and used by local GPs (i.e. the NHS itself), and the base dataset was the NHS Spine data.

    The software was great, but it was useless for the first year or so, because no-one would let me (i.e. the GPs) see the raw data with DOBs and gender in it, and you can't do the stats without them. It took a year to get those authorisations, and even then the data had no postcode (or, equivalently, deprivation) information. Much better, but you can't really be sure what's going on without post/zip code, which makes the data identifiable. I spent about a year trying to get the additional clearance, but there was so much politics in the local NHS that it was next to impossible. The whole system then imploded with the PCT/CCG changeover: everyone's access to the data was withdrawn, the funding went, and the NHS disappeared up its own backside.

    So, the software has been unused for 2 years, and no-one in this area (and probably any other area) has any statistically valid way of finding out what's going on in primary care. The govt has now apparently decided that this is important again, so other people are now going to spend a couple of years dicking about trying to get the Spine data, before losing it again. And the whole pointless cycle will repeat in another 5 years. And the base Spine dataset cost getting on for a *billion* to create, plus maintenance.

    So, if you're worried about the privacy of your NHS data - don't be. Everyone in charge is so stupid and paranoid that no-one's ever going to see it anyway.

    1. Just Enough

      Re: Pseudonymised NHS data

      So the basis for you calling everyone in the NHS stupid is that they are paranoid about data protection?

      And that's a bad thing?

  9. Primus Secundus Tertius

    Anonymous aggregates

    I cannot imagine that lay people, i.e. politicians and journalists, will ever understand the difference between aggregated data and anonymous data.

    Aggregated data would say, for example, the average blood pressure in postal area GU99 is P with standard deviation (*) Q. 'Anonymised' data says that Mr Z of GU99 9ZZ has blood pressure P. When the marketing droids also establish that Mr Z drives a red car and owns five computers, it all becomes uncomfortable.

    It would be nice if database queries were restricted to aggregate data, but I don't see that as practical.

    (*) Another term I have yet to see any journalist or politician understand.
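    To make the aggregate-versus-anonymised distinction concrete, here is a minimal Python sketch that publishes only per-area summaries (count, mean P, standard deviation Q) and suppresses areas with too few people. The records, the postal areas and the MIN_GROUP_SIZE threshold of 5 are all hypothetical choices for illustration, not taken from NIST or any standard. Small-cell suppression on its own does not stop "differencing" attacks that compare overlapping queries, so treat this as a sketch of the idea rather than a complete protection.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (postal_area, systolic_blood_pressure).
records = [
    ("GU99", 120), ("GU99", 135), ("GU99", 128), ("GU99", 141), ("GU99", 117),
    ("GU98", 150), ("GU98", 132),
]

MIN_GROUP_SIZE = 5  # assumed suppression threshold, chosen for illustration only


def aggregate_bp(rows, min_size=MIN_GROUP_SIZE):
    """Return {area: (count, mean, sample stdev)}, omitting groups smaller than min_size."""
    if min_size < 2:
        # stdev() needs at least two values, and groups of one are trivially identifying.
        raise ValueError("min_size must be at least 2")
    by_area = defaultdict(list)
    for area, bp in rows:
        by_area[area].append(bp)
    result = {}
    for area, values in by_area.items():
        if len(values) < min_size:
            continue  # too few people: the "aggregate" would describe individuals
        result[area] = (len(values), mean(values), stdev(values))
    return result


if __name__ == "__main__":
    for area, (n, p, q) in aggregate_bp(records).items():
        print(f"{area}: n={n} mean={p:.1f} sd={q:.1f}")
```

    Only GU99 is reported here; GU98, with two people, is suppressed rather than published as "Mr Z's blood pressure" in disguise.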

  10. UlfMattsson

    Urgent need

    I agree that "Given the growing interest in de-identification, there is a clear need for standards and assessment techniques that can measurably address the breadth of data and risks," but standards may take an additional 10 years to agree on and enforcing regulations is always difficult.

    We know that NIST is concluding that "Many of the current techniques and procedures in use, such as the HIPAA Privacy Rule’s Safe Harbor de-identification standard, are not firmly rooted in theory." It may take many years to fix this issue.

    We know that "the risk depends upon the availability of data in the future that may not be available now." So we need a policy-driven approach that can be easily adjusted over time as more data becomes available.

    I like to consider employing "a combination of several approaches to mitigate re-identification risk. These include technical controls." I've seen two interesting technical approaches that together can provide a balanced solution to the growing issue of privacy and access to data. The first is based on service-oriented privacy-preserving data publishing; it can provide policy-driven control over how combinations of different data are accessed and over the accumulated volume of data that is accessed. The second, based on data tokenization and dynamic masking, can secure the data itself against misuse and theft.
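    As a rough illustration of the second approach, the Python sketch below uses a keyed HMAC-SHA256 as a generic stand-in for tokenization, plus a role-based "dynamic masking" function. This is not Protegrity's product or any vendor's API; the role names, the 16-hex-character token length and the in-process key are hypothetical. Note that deterministic tokens are pseudonyms, not anonymity: anyone holding the key can recompute them, and identical tokens still let records be linked.

```python
import hashlib
import hmac
import secrets

# Assumption: a real deployment would fetch this key from a key-management
# system; it is generated here only so the sketch runs on its own.
TOKEN_KEY = secrets.token_bytes(32)


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Deterministic token: same value and key give the same token, so joins still work."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated to 64 bits for readability (an arbitrary choice)


def mask(value: str, role: str) -> str:
    """Dynamic masking: the view returned depends on the caller's (hypothetical) role."""
    if role == "clinician":
        return value  # full access
    if role == "analyst":
        return tokenize(value)  # linkable pseudonym, no raw identifier
    return "*" * max(len(value) - 2, 0) + value[-2:]  # everyone else: last 2 chars only


if __name__ == "__main__":
    identifier = "1234567890"  # made-up identifier
    for role in ("clinician", "analyst", "public"):
        print(role, mask(identifier, role))
```

    The masking decision happens at read time, so the same stored value can be shown differently to different users without keeping several copies.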

    I think that a balance between the first and second approaches can provide an attractive data-centric solution for different sensitivity levels.

    I agree that we need a "balance between providing privacy and useful data," and we are running out of time to fix this growing issue.

    Ulf Mattsson, CTO Protegrity

  11. Novatone

    Quantization would be a good first step.

    Researchers don't really need a birthdate accurate to one day; quantize to 1 or 1/2 month.

    Group small zip code areas into slightly larger areas.

    Quantize other location and time information.
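    The quantization suggested above (usually called generalization in the de-identification literature) can be sketched in a few lines of Python. The half-month cut-off on the 16th and keeping 3 leading ZIP digits are assumptions for illustration; a real scheme would also merge prefixes that still cover too few people.

```python
from datetime import date


def quantize_birthdate(d: date, half_month: bool = False) -> date:
    """Round a date of birth down to the 1st of its month, or to the 1st/16th if half_month."""
    if half_month and d.day >= 16:
        return d.replace(day=16)
    return d.replace(day=1)


def generalize_zip(zip_code: str, keep_digits: int = 3) -> str:
    """Keep the leading digits of a ZIP code and blank the rest, grouping nearby areas."""
    hidden = max(len(zip_code) - keep_digits, 0)
    return zip_code[:keep_digits] + "*" * hidden


if __name__ == "__main__":
    print(quantize_birthdate(date(1980, 7, 23)))                   # 1980-07-01
    print(quantize_birthdate(date(1980, 7, 23), half_month=True))  # 1980-07-16
    print(generalize_zip("90210"))                                 # 902**
```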


    1. Just Enough

      Quantization only works if the researchers know exactly how it was done when they analyse the data.

      If you quantize, for instance, birthdate by month, and then analyse births by day of the week, you're going to get false results.

      If you group your zip codes, then analyse by latitude divisions that intersect your groupings, again the results will be meaningless.

      Hopefully the results of these would be obviously weird. But the effects of other quantizing may be less obvious and go unnoticed.
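      The day-of-week trap is easy to reproduce. The Python sketch below assumes one hypothetical birth on every day of 2015, rounds each birthdate down to the 1st of its month, and then counts births by weekday. The true counts are almost flat, while the quantized counts pile up on whichever weekday each month happened to start.

```python
from collections import Counter
from datetime import date, timedelta

# Fixed names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def weekday_counts(dates):
    """Count dates by weekday name (date.weekday(): Monday == 0)."""
    return Counter(WEEKDAYS[d.weekday()] for d in dates)


# Hypothetical data: exactly one birth on each day of 2015.
start = date(2015, 1, 1)
births = [start + timedelta(days=i) for i in range(365)]

true_counts = weekday_counts(births)
# Quantize to the month, as suggested above: every birth moves to the 1st.
quantized_counts = weekday_counts(d.replace(day=1) for d in births)

if __name__ == "__main__":
    for day in WEEKDAYS:
        print(f"{day:<9} true={true_counts[day]:>3} quantized={quantized_counts[day]:>3}")
```

      With this data every weekday really has 52 or 53 births, but the quantized version reports 89 on Sundays and only 30 on Mondays: an artefact of the rounding, not of the births.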
