Microsoft leaks 6.5TB in Bing search data via unsecured Elastic server. *Insert 'Wow... that much?' joke here*

Microsoft earlier this month exposed a 6.5TB Elastic server to the world that included search terms, location coordinates, device ID data, and a partial list of which URLs were visited. According to a report from cyber-security outfit WizCase, the server was password-protected until around 10 September, when "the …

  1. chivo243 Silver badge

    Must be all of Bing data?

    I think I may have been duped into using Bing once or twice on a fresh install of Windows. Who used it a lot?

    1. druck Silver badge

      Re: Must be all of Bing data?

      I think most people just use it to search for Google.

      1. David 132 Silver badge
        Unhappy

        Re: Must be all of Bing data?

        I wonder what fraction of the searches are for things like "freec", "word", "handbra" etc... the only time I've ever Bing searched is inadvertently, when I type the first few letters of a program name into the Win10 start/search box, and the braindead OS searches the web for it instead of, you know, the local hard drive.

        And the necessary hacks/voodoo to disable web search get trickier and trickier with each new build of Windows.

        1. Zippy´s Sausage Factory
          Meh

          Re: Must be all of Bing data?

          I just install OpenShell and have done with it. Sends nothing to anyone, fortunately. That said, I only use Windows on a works PC anyway. I haven't yet had a fight with IT over it but if I do, I'm not looking forward to it.

        2. Anonymous Coward
          Anonymous Coward

          Re: Must be all of Bing data?

          "...braindead OS searches the web for it instead of, you know, the local hard drive."

          Remember the huge uproar when Ubuntu tried to incorporate Amazon adverts into the Unity Dash?

          https://www.theregister.com/2012/09/24/ubuntu_amazon_suggestions/

          I love this bug report filed about this issue:

          https://bugs.launchpad.net/ubuntu/+bug/1070111

          Bug Description:

          "The new spyware feature of Unity Dash in 12.10 is a welcome improvement, but needs to be extended to all Ubuntu packages.

          Ever since I switched my home computers to GNU/Linux, closed my Facebook account and stopped using Google Search, I have been suffering. Finally now with Dash, some of my private information is once again being disclosed along with my IP address to Facebook, Twitter, Amazon, BBC and Canonical, but this is limited.

          Dangerous gaps still remain in my files in the mass surveillance databases of several governments. Plus I find myself at the supermarket wondering what to buy, not having seen enough targeted advertising."

          This was removed soon after Linux users expressed their outrage.

          But sadly, with Windows 10, victims expect this kind of bullshit.

    2. Anonymous Coward Silver badge
      Linux

      Re: Must be all of Bing data?

      Oh come on, we know how efficient Microsoft is when it comes to storage space.

      That 6.5TB is probably just 20000000 copies of an installer and corresponding DLL files.

    3. Dinanziame Bronze badge
      Holmes

      Re: Must be all of Bing data?

      If you only count desktop searches in the US, the market share for Bing is around 13%. If you only count mobile searches, it is just above 1%... There's your answer. A lot of people just use the default search engine without changing it; and on Desktop, which more often than not means Windows, that default is Bing.

      There's a reason Google is paying billions to Apple to be the default search engine on iPhones.

    4. Anonymous Coward
      Anonymous Coward

      Re: Must be all of Bing data?

      Bing is better than google for porn

  2. Anonymous Coward
    Anonymous Coward

    How much of it was for Google or Firefox?

    What percentage? I'd wager 25%.

  3. karlkarl Silver badge

    Often when using Tor, Google is pretty hostile in terms of "anti-robot" tests.

    In that case I alternate between Bing and DuckDuckGo depending on my exit-point and if they work or not.

    Usually when the only thing that works is Bing... I just restart Tor to get a different exit point haha!

    1. Dan 55 Silver badge
      Holmes

      DuckDuckGo uses Bing as a provider, as do other search engines. Maybe they're included in this dump as well? I.e. DuckDuckGo searches are less anonymous than thought?

      1. fidodogbreath Silver badge

        My understanding is that DDG strips identifiers before passing the query to Bing. They also do not log IP or browser-agent strings, and by default do not set any cookies. DDG is as anonymous as you're likely to get for free.

    2. Bruce Ordway

      Tor... Google is pretty hostile in terms of "anti-robot" tests.

      Yes....Google search also routinely objects to my use of a VPN.

      The anti-robot testing seems very erratic too. Sometimes the challenge may be matching 3 images and the next time it could take 20 image matches before a successful "Verify".

      Due to this, I also use DuckDuckGo occasionally. In general I'm not a big fan of their search results or the connection to Bing.

      1. fidodogbreath Silver badge

        Re: Tor... Google is pretty hostile in terms of "anti-robot" tests.

        Startpage uses Google for the underlying search, but strips out the data that would enable personalized results. So Google results, but as if every query is a first-time user connecting with a freshly-installed browser from a shared IP.

        In addition to the privacy boost, it is helpful for avoiding personalized filter bubbles and echo chambers.

        Personally, I don't want Google / Bing / etc pandering to my perceived biases in order to "boost engagement" aka ad impressions. If I decide that the tech bros should dictate what I'm allowed to see, I can always go directly to Google or Bing instead of filtering queries through SP or DDG.

  4. IGotOut Silver badge

    Slag of Bing...

    but at least it doesn't rig image searches when you don't use their own browser.

    Hint: Do an image search with a n other browser on Google, then compare with Chrome. Note the lack of filtering options when not using Chrome?

    1. Flocke Kroes Silver badge

      Re: Slag of Bing...

      I noticed the lack of everything with javascript disabled. Bye bye google image search.

  5. Doctor Syntax Silver badge

    "simplified privacy controls"

    None.

    What could be simpler?

    1. overunder Silver badge

      Not using it... ? But I guess that's cheating in this context.

      Unrelated, I will say the API to scrape higher res images from Bing is much friendlier than that "other" one that wants to get to own know you. DuckDuckGo's is great too, but somehow Bing's returns higher res ones more consistently. NOTE: This only includes 4k'ish image sizes or greater, as anything smaller in the neighborhood of 1080p is easy everywhere. Also, as stated further above, Bing is TOR/VPN blind, so you can really abuse the hell out of it when scraping (I've probably scraped 6.5TB in high res images from it :-/).

  6. Pascal Monett Silver badge
    Flame

    "The data was, apparently, not encrypted"

    That is just about as damning a sentence as one can write in this kind of case. In what world does a major multinational behemoth create a database of user-identifiable data and not encrypt it ?

    There should be a law on that.

    That, and the fact that the authentication was removed (why ??) means that I am quite happy to have never used Bing and won't be using it any time soon.

    At least not until my aneurysm. After that, no guarantees.

    1. Anonymous Coward
      Anonymous Coward

      Re: "The data was, apparently, not encrypted"

      Why was it designed that such a database was visible to the internet? Password or no password, it shouldn't have been accessible this way at all.

      1. Anonymous Coward
        Anonymous Coward

        Re: "The data was, apparently, not encrypted"

        It's in the cloud, and well, if you look through your windows, bingo, you will be able to see the beautiful vista, with the azure skies filled with the scattering of clouds.

      2. Anonymous Coward
        Anonymous Coward

        Re: "The data was, apparently, not encrypted"

        Accessible or not accessible, it should have been encrypted. And the access monitored.

        1. teknopaul Silver badge

          Re: "The data was, apparently, not encrypted"

          Not sure what you are asking.

          Maybe it was encrypted at rest, but if the GUI is visible, data will be decrypted before being sent to users.

          You can't encrypt all data in a db so that only clients can read it, because then the db cannot search or index it.
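The point about client-only encryption and searching can be illustrated with a rough sketch. Everything below is an assumption for illustration: `toy_encrypt` and `toy_decrypt` are hypothetical helpers built on a SHA-256 keystream, a toy construction that is not secure and has nothing to do with how Elasticsearch or Microsoft handle encryption. It only shows that when each value is encrypted with a fresh random nonce, two identical queries are stored as different blobs, so a server without the key cannot match them.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized cipher: fresh nonce, then XOR with the keystream. NOT secure."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Reverse toy_encrypt: split off the nonce and XOR with the same keystream."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)  # hypothetical key held by the client only
stored = [toy_encrypt(key, q) for q in (b"download chrome", b"weather", b"download chrome")]

# Server-side equality lookup: the probe is encrypted under a new nonce,
# so it does not equal either stored copy of the same query.
probe = toy_encrypt(key, b"download chrome")
server_matches = [row for row in stored if row == probe]

# Client-side lookup: only the key holder can decrypt each row and compare.
client_matches = [row for row in stored if toy_decrypt(key, row) == b"download chrome"]

print(len(server_matches), len(client_matches))
```

In this sketch the server-side comparison should find nothing while the client-side scan finds both rows, which is the trade-off being described: the database either sees plaintext (or something deterministic enough to index) or it cannot search.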

  7. Anonymous Coward
    Anonymous Coward

    6.5TB is quite a lot of data. I bet the internet bill for those two people is massive.

    Had to be done.

    1. Ken Moorhouse Silver badge

      Re: I bet the internet bill for those two people is massive.

      What those two people would be more concerned about is if their drug-dealing, bomb-making searches have been uncovered.

    2. overunder Silver badge

      Well, unencrypted doesn't mean it wasn't compressed. 6.5TB of compressed, mainly human inputted text is... well a lot. Also considering the text is realistically only using a variance of what, ~64 characters?...? The way I look at it, if you start searching something and you have to start worrying about not just RAM but actual real time, you've got a lot of data (I will not be piping that).

      1. Circadian

        The majority of the data concerned “customer” ids (ad id, device id etc.) which are not very compressible...
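The compressibility argument in this sub-thread can be checked with a small experiment. This is a minimal sketch under stated assumptions: the query strings and the 32-hex-character IDs are made-up stand-ins, not samples from the leaked data, and `compression_ratio` is a hypothetical helper around Python's standard `zlib` module.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical human-typed queries drawn from a tiny vocabulary.
queries = ["how to uninstall", "download chrome", "www.google.com", "weather today"]
text_blob = "\n".join(rng.choice(queries) for _ in range(5000)).encode()

# Hypothetical random 128-bit identifiers, standing in for ad/device IDs.
id_blob = "\n".join(f"{rng.getrandbits(128):032x}" for _ in range(5000)).encode()

text_ratio = compression_ratio(text_blob)
id_ratio = compression_ratio(id_blob)
print(f"query text: {text_ratio:.2f}, random IDs: {id_ratio:.2f}")
```

Repetitive text should shrink to a small fraction of its size, while random hex IDs can only lose roughly the half that comes from using 16 symbols per byte, so a dump dominated by identifiers would compress far less than one dominated by typed queries.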

  8. N2 Silver badge

    I was unaware

    Dido Harding worked for them.

    1. Captain Badmouth

      Re: I was unaware

      Dido Harding hung around their offices some time back.

      Fixed.

      1. David 132 Silver badge
        Happy

        Re: I was unaware

        You could both be onto something there. Maybe that's Dido Harding's new, "world beating" strategy for NHS track-and-trace.. multiple Bing searches?

        "Does Aaron A. Aaronson have COVID?"

        "Does Aaron A. Abraham have COVID?"

        "Does Aaron A. Acheson have COVID?"

        (repeat for the next 5000+ pages of the telephone directory... by next June, funding and further gongs/peerages permitting, she hopes to have moved on to the B's...)

  9. David 132 Silver badge
    Coat

    Here's a snippet

    At great personal cost, the bribery of several Eastern European border guards, a death-defying sprint across the North Korean DMZ, and ultimately when all else had failed, two hours in a squelchy field shifting marker pegs a few but occultly incredibly significant metres, I can present an extract from the leaked data, showing people's searches:

    "How to uninstall siri"

    "how to uninstall siri"

    "stop edge taking over as my default browser"

    "help I can't uninstall Siri"

    "comment à désinstaller Siri"

    "how do I get Windows 7 back"

    "www.google.com"

    1. yoganmahew

      Re: Here's a snippet

      You forgot "where's my ducking document gone?"

    2. TXITMAN

      Re: Here's a snippet

      "Download Chrome"

  10. yuhong

    CompatTelRunner

    In the meantime, I am still focusing on CompatTelRunner and what is now called Desktop Analytics, which is shipped in both Windows 10 and Windows 7/8/8.1 updates. I believe that Desktop Analytics (formerly Upgrade Readiness and Upgrade Analytics) works by associating the collected data with a "commercial ID" that is entered into the registry. The most likely reason why CompatTelRunner is causing high CPU and HDD/SSD usage (which even MS employees not working in the Windows division complain about) is Appraiser doing for example application and device inventory, which has "enterprise" and "indicator generation" modes that to this day MS refuses to document.

  11. astoundingaardwark
    Stop

    GDPR

    Do El Reg's journos know if it has been reported to EU authorities re: GDPR? ...and can we expect you to follow up on this in the future, when they're given a huge fine?

    1. diodesign (Written by Reg staff) Silver badge

      Re: GDPR

      We'll check it out!

      Edit: Bear in mind it had no direct personal information so there might be no interest from the ICO and no relevance to GDPR.

      C.

  12. Anonymous Coward
    Anonymous Coward

    Manure

    "A high level of trust is required, and this kind of incident is damaging to that trust. "

    Or, just wishful thinking?

    The deal is, if you put it on the internet, it's in the public domain. Don't pretend otherwise. You may think you have privacy now, but all that you think of as secret, at this current fork in the multiverse, is public domain as soon as quantum computing matures. So, measure that how you like, but what you have is not secure.

    If you want to unplug, unplug, but don't pretend you're owed, or can have, some kind of privacy. You can't. That horse is off and way down the field.

    Posted Anon, because, it's all public domain, and El Reg has a bullet proof security policy and I feel that my trust is safe with their IT systems :D

    1. Anonymous Coward
      Anonymous Coward

      Re: Manure

      Posting as AC pretty much invalidates the entire thing.

      Do you believe that shit you posted or not?

  13. TheProf Silver badge
    Headmaster

    Really?

    " the database was vanished from public view"

    Was vanished? WAS VANISHED??

    My mental gears are making that unfunny grinding noise again.

    (Cue thousands of English historians pointing out that Shakespeare used the very same words to describe some bear hastily exiting to the left.)

    1. Gerhard den Hollander

      Re: Really?

      Bear was vanished, exit, chased by Antigonus ....

      1. David 132 Silver badge
        Happy

        Re: Really?

        Like an old oak table...

  14. DwarfPants
    Big Brother

    Valuable data

    It is obviously very valuable to MS seeing as it periodically just said Meow and nobody noticed.

    Or it is usually full of cat picture searches and it saying Meow was not out of the ordinary

  15. heyrick Silver badge

    not precise enough to get an address

    If you live urban, no.

    If you live on the outskirts of a town, perhaps.

    If you live rural, definitely.

    500m around me is... fields. Maize, the odd wild boar, the trees that the sociopathic neighbour farmer hasn't yet cut down, and some kittens.

    But however you look at it, a GPS coordinate with a half kilometre accuracy will identify one property. And from that, an address. Which can lead to a name. Me.

    1. Androgynous Cupboard Silver badge

      Re: not precise enough to get an address

      I think you'll find 500m is an approximate way of saying "approximately". No one has, to the best of my knowledge, worked out what the 95% confidence interval is from a reported location and the actual location. As your GPS position is not included with a search, your actual location is unknown, so drawing a 500m circle around it is somewhat problematic.
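The urban-versus-rural argument above can be sketched as a count of how many properties fall within 500m of a reported coordinate. All coordinates and property names below are invented for illustration, and `haversine_m` and `candidates_within` are hypothetical helpers. The sketch also assumes the reported point really is within 500m of the user, which is exactly the assumption the reply questions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidates_within(reported, properties, radius_m=500):
    """Names of properties lying within radius_m of the reported coordinate."""
    lat, lon = reported
    return [
        name
        for name, (plat, plon) in properties.items()
        if haversine_m(lat, lon, plat, plon) <= radius_m
    ]


# Hypothetical rural area: two farms roughly 2km apart.
rural = {"farmhouse": (47.0000, 2.0000), "next farm": (47.0200, 2.0000)}

# Hypothetical urban street: 50 flats spaced a few metres apart.
urban = {f"flat {i}": (48.8566 + i * 0.00005, 2.3522) for i in range(50)}

rural_hits = candidates_within((47.0010, 2.0005), rural)
urban_hits = candidates_within((48.8570, 2.3522), urban)
print(rural_hits, len(urban_hits))
```

With a single candidate the coordinate effectively is an address, whereas with dozens of candidates it narrows things down to a street at best.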

  16. Anonymous Coward
    Anonymous Coward

    I thought Bing was just Google + Microsoft CSS

    Who knew?

  17. Bonegang

    Not. Encrypted.

    In this day and internet age?


Biting the hand that feeds IT © 1998–2020