back to article CompSci boffins propose scheme to protect privacy in database searches

From stock searches to map directions, any time a user queries a database, they tell the database owner something valuable. A group of researchers from MIT's Computer Science and Artificial Intelligence Laboratory is offering a way that queries can be made private, by breaking them into pieces and having different (identical) …

  1. Neil Barnes Silver badge

    But is this not security through obscurity?

    In the database example, the issue as I understood it resolves to basically grabbing huge chunks of the database and discarding the bits you're not interested in, swamping it/them with irrelevant queries.

    Or on an internet search or map search, starting multiple random searches with your particular search buried in there somewhere (as has been proposed previously to upset the large search engines' prediction algorithms.

    I don't suggest it doesn't work, but it does seem to need either a lot of databases, or a lot of local storage, or a lot of bandwidth - possibly all three.

    1. Pascal Monett Silver badge


      DBadmins are going to wince while reading that.

      Engineers and experts have spent the past thirty years trying to lighten the load imposed by database queries, and here these guys come proposing to apply each and every query to every record in the database - in multiple ways.

      The performance impact is going to be ugly.

    2. Warm Braw Silver badge

      Re: But is this not security through obscurity?

      That's not how the paper suggests it works. Simplifying the example given:

      SELECT COUNT(*) FROM items WHERE ItemId = ?

      you devise a series of functions f1(x), f2(x), ... and ask each parition to calculate the sum of their own f(x) over all the rows in the table. You devise the functions in such a way, for example, that the sum of all the functions returns "1" for each x where ItemID has the requested value and "0" otherwise. You can then pass each partition a different function, but if you add the results together you get the query result you're looking for.

      So although you execute multiple queries, only an aggregate result is returned from each partition.

      Each f(x) is a point or interval function (i.e. it returns 1 if x is either a selected single value or within a range of selected values) and the claim is that although each parition of necessity knows f(x), it cannot determine the magic values that cause f(x) to return 1, thereby making the original query parameters undiscoverable. That's the point that maths turns into magic as far as I'm concerned...

      1. Doctor Syntax Silver badge

        Re: But is this not security through obscurity?

        "That's the point that maths turns into magic as far as I'm concerned..."

        ...and where the load goes through the roof as the DBA is concerned.

  2. Anonymous Coward
    Anonymous Coward

    So SELECT * FROM *

    Get a copy of ALL the data and bin what you don't want just to keep the DBA guessing as to what you were looking for?! Or in this case repeat for multiple databases of the same data (who even uses such a thing)

    Sounds a huge waste of time and resources against a very unlikely real world scenario.

    Its mastabatory maths at best.

    1. Doctor Syntax Silver badge

      Re: So SELECT * FROM *

      "Sounds a huge waste of time and resources against a very unlikely real world scenario."

      Unlikely? It happens all the time. It's how the likes of Google work out how to show you ads for stuff you bought last week. They've still not worked out that the information they glean from it can go stale PDQ.

      The shortcoming of this whole scheme as far as I can see is the the people who'd need to make this work are the very people who wouldn't want it to work so it's not really going to happen. Anyone who wants to offer a search service where they don't know find out what the user was looking for has a much simpler solution. Don't look.

  3. David Roberts


    First you have to have a fully distributed data base, I assume without automatic load sharing so that you can guarantee where each bit of the query goes.

    Then, especially with maps, you have to maintain consistent dummy areas in your searches to prevent analysis of multiple requests finding one consistent area amongst the noise.

    Always remember that if you have a distributed database then some DBA is likely to have access to all instances.

    Edit: your IP address probably says where you are anyway.

    1. Ian Michael Gumby

      Re: Tricky?

      Its not the DBA, but that the query hides data from you such that you don't know its missing.

      Essentially you only get to see the data where all parts agree. If one out of the N parts doesn't you don't get to see the data.

      There are other simpler solutions to do this... essentially cell based encryption also works, however there's more overhead.

  4. MJB7

    Re: But is this not security through obscurity?


    This will be similar in principle to Shamir secret sharing, where you give n people shares of your secret (or more plausibly, write the shares onto n smartcards), and it then requires k of them to come together to recreate the original secret. The magic of the algorithm ensures that if only k-1 collude dishonestly with an attacker, the attacker is no better off than trying to brute force the secret from scratch.

  5. druck Silver badge


    How about; if you don't trust the database owner, don't use the data. A solution looking for a problem, if I ever saw one.

    1. allthecoolshortnamesweretaken

      Re: Duh

      "if you don't trust the database owner, don't use the data"

      Not always an option.

  6. Anonymous Coward
    Anonymous Coward

    Performance Drop

    Definitely not for MSSQL then.

    That lumbering bastard already returns results as predictably as a stubborn toddler.

    On another note, I obscure my intentions by scrambling the layout of the keys on my keyboard and picking a language I dont know the layout of.

    This means it takes dozens of attempts to type something correctly thus masking what im trying to do.

    Ive been fired 18 times because apparently my code looks like a jumbled mess, Ive tried explaining that thats the point but nobody gets it.

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

Biting the hand that feeds IT © 1998–2020