Microsoft calls AI privacy complaint 'doomsday hyperbole'

In early February, Microsoft accused the plaintiffs suing the software maker and its partner OpenAI over alleged AI privacy violations of evoking "doomsday hyperbole about AI as a threat to civilization." A variation on that phrase, which appeared in the Windows giant's motion to dismiss [PDF] the privacy lawsuit, surfaced in …

  1. Howard Sway Silver badge

    Nowhere do they say what of their private information Microsoft ever improperly collected

    I thought that they had proof of the AI spitting out NYT stories verbatim? If that's so, then MS are claiming that if it's on a public website, it's not "private" and can therefore be scraped and regurgitated, despite the copyright notices on that public website. If that's the case, the NYT could create a Large Operating System model, and scrape copies of Windows from the public Microsoft site and then regurgitate them too.

    1. that one in the corner Silver badge

      Re: Nowhere do they say what of their private information Microsoft ever improperly collected

      There is a big difference between the information contained in an NYT story, which is (now) public and can be freely repeated[1], and the expression of that information - the specific words and word order - which is what is copyrightable.

      Spitting out verbatim (too large a chunk of) an article can be a copyright violation.

      Repeating information from that article, using a different form of words - and especially if combined (even badly) with information gleaned from another source, is not.

      In terms of a human, one is plagiarism, the other is research.

      Not that any of this is supporting - or denigrating - LLMs, btw. Just - be accurate in thine attacks, lest ye be taken for an LLM thyself.

      [1] until we get to questions of "right to removal"/"right to be forgotten", as appears to be the actual point of the complaint. Which, unless someone can provide a reference, seems to be aimed clearly at the LLM but not at the source of the information, the NYT articles, even though the latter are easier to get the info from. Strange, that[2]

      [2] cos MS is the Big Bad with deep pockets?

      1. Anonymous Coward

        Re: Nowhere do they say what of their private information Microsoft ever improperly collected

        NYT information isn't free, you need a subscription, so I suspect there's a Terms breach hiding in there somewhere as well.

        1. that one in the corner Silver badge

          Re: Nowhere do they say what of their private information Microsoft ever improperly collected

          > you need a subscription

          Again, that is an issue around copyright.

          For example, the majority of information (as opposed to entertainment) I acquire comes from paying a fee to get a copy of the writing that contains the information[1]: once I have learned that "'for' loops in C/C++ are just syntactic sugar over the humble 'while' loop" I may be restricted from repeating that precise sentence[2] but I am in no way restricted in using that information nor in passing it on to all and sundry.

          > I suspect there's a Terms breach

          Putting in "you are limited as to how you use/replicate the information" is going to be really tricky and unenforceable - probably impossible: including all sorts of proofs that they "own" the information (which for a "news report" would mean that they made it up out of whole cloth). That sort of thing is closer to patent than copyright...

          [1] I am old and still very much attached to reading physical books; easier and so much more enjoyable

          [2] just go with me here; I know that specific example is too short and too generic to make a copyright claim stand up in any reasonable court.

      2. Alumoi Silver badge
        Joke

        Re: Nowhere do they say what of their private information Microsoft ever improperly collected

        Spitting out verbatim (too large a chunk of) an article can be a copyright violation.

        Repeating information from that article, using a different form of words - and especially if combined (even badly) with information gleaned from another source, is not.

        So, theoretically, if I got my hands on Windows source code, removed all the bugs, telemetry and useless crap, and added a lot of improvements, it wouldn't be a copyright violation, right? After all, I'd be cutting it down to under 50% so it wouldn't be too large a chunk of it.

      3. bigtreeman

        Re: Nowhere do they say what of their private information Microsoft ever improperly collected

        Research has to reference the original works.

        Plagiarism just steals.

        AI just steals and puts it all in a big mixer.

  2. Pascal Monett Silver badge

    Not surprising

    Borkzilla automatically brands any privacy concern as doomsday fodder, because truly respecting our privacy would indeed be doomsday for it and many others.

    1. Snake Silver badge

      Re: Not surprising

      You are blaming Borkzilla, without laying blame & waste to the rest of the tech industry for doing the exact same thing??

      You know, like Samsung, Vizio, Apple, Ford, Google, Snapchat, Facebook, Uber, Amazon, Twitter, NewRelic, Akamai, LinkedIn, Verizon, Tesla, Waymo...

      I mean, just to name a few... -_-

  3. theOtherJT Silver badge

    the only way...

    ...not to surrender all of our personal information, family photographs, copyrighted works, art, and more would be to cease using the internet altogether.

    Um... yes? Good. You finally get it. Were they not paying attention for the last 25 years? The Internet is a public space, and where it's not public, it's a space where you are permitted to exist by the companies that own it. It always has been this way. Unless you run your own server and set your own terms about who can access it there has never been any expectation of privacy here. We were telling people 25 years ago that you need to be careful what you say online and who you say it to because once something is "out there" it's basically impossible to get it back. This isn't new; it's just something that, for lack of a better term, "regular" people are waking up to now because it's in the news. We've known this since the beginning. The internet isn't safe. It's never been safe.

    1. Mike 137 Silver badge

      Re: the only way...

      "Unless you run your own server and set your own terms about who can access it there has never been any expectation of privacy here"

      That depends on your definition of 'privacy'. If you define it merely as 'confidentiality' your argument stands. However, at least in Europe, privacy is defined as data subjects' right to control over who does what with their data. That right specifically exists even where the data in question has been posted in a public space, just as copyright does.

      1. theOtherJT Silver badge

        Re: the only way...

        Sure, but the internet doesn't exist only in Europe. Europe (or for that matter any other legislative body) can issue all the laws they want about the internet, but since it spans the entire planet by design you really can't trust that the specific place this particular bit of the internet is located gives a shit about laws where you happen to be.

        It's something no government has managed to get its head around. The internet makes international boundaries basically invisible and irrelevant. It's the unregulated wild west out there, and we can all pretend that laws apply to it but they just don't because anyone who doesn't like said law can simply host their service somewhere where that law doesn't exist. I mean do you check that every service you're using is hosted in a jurisdiction where your rights are respected?

        1. 43300 Silver badge

          Re: the only way...

          "Sure, but the internet doesn't exist only in Europe. Europe (or for that matter any other legislative body) can issue all the laws they want about the internet, but since it spans the entire planet by design you really can't trust that the specific place this particular bit of the internet is located gives a shit about laws where you happen to be."

          True - but where mega corporations are concerned, countries or blocs such as the EU can impose rules on them.

          1. Alumoi Silver badge

            Re: the only way...

            They can only impose fines which bribes will make certain won't be too big. And the fines will be passed to the sheep, as always.

        2. Zippy´s Sausage Factory

          Re: the only way...

          The second you do business in a country, you're subject to that country's laws. The USA are rather sticklers on that point, as are the EU - rightly so.

          You can easily argue that the personal information of a person living in country X can only have been obtained directly or indirectly from country X, thus rendering you liable to the jurisdiction of the privacy laws in that country, whether you like it or not.

          The fact of the matter is that California privacy laws, Canadian privacy laws, the GDPR et al are going to apply to anyone who does business with anyone who lives in the country where those laws are on the books, and corporations will eventually realise that they have to pay attention to that and obey those rules.

          1. theOtherJT Silver badge

            Re: the only way...

            Laws only exist that can be enforced. They can claim the law applies to you, but if they have no way to enforce it, it doesn't. Corporations might eventually get slapped down hard enough to be forced to abide by some law or other but until that happens the only safe thing to assume is that they will continue to violate it. Even then let's be honest, they probably will. I've lost track of the number of stories on this very site of "Corporation gets fined approximately 4 minutes profit for being very naughty. Promises not to do it again." or "Government found to be in violation of its own data protection rules. Agrees that this is unfortunate." I can't remember reading a story that said "Corporation forced to dissolve by bankrupting fine" or "Executives / government members jailed" in relation to the same.

            1. Anonymous Coward

              Re: the only way...

              > Laws only exist that can be enforced.

              Hopefully your intent is more "Laws are only useful and effective if they can be enforced".

              Pretty sure you[1] can find laws passed - let alone those proposed but failed[2] - that literally cannot be enforced.

              [1] "you" being anyone who is better at search engines than myself; not really a high bar.

              [2] Ah, good ol' Indiana: House Bill 246 – to legally change the value of the number pi to 3.2

    2. Watashi

      Re: the only way...

      I guess a bigger question is this:- have companies used personal data that's not in the public domain to train their AIs? For example, if you wanted to train AI on, say, the kind of NYT articles people in various cities like to read, then you're not just using public data on the NYT website you're also using private data held in non-public company systems.

      Now, who here trusts the tech giants to strictly limit their AI training to public data?

    3. ecofeco Silver badge

      Re: the only way...

      Well, we see at least 3 downvoters STILL don't get it.

  4. mark l 2 Silver badge

    Consider that at one point you could prompt ChatGPT to generate Windows activation keys, which were obviously scraped from publicly available sources, but as soon as MS got wind of that they shut down that function and replaced it with a notice on how piracy = bad.

    So MS clearly don't like their own works being in OpenAI's LLM if it might lose Microsoft money, but are happy with other people's work being scraped without permission.

    1. Anonymous Coward

      Please provide the regex that MS can use to post-filter your information from the model's output.

      MS are happy to do what is trivial and crow about it.
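To illustrate the point above, here is a minimal Python sketch of why activation keys are the trivial case for post-filtering. It assumes a key has the familiar shape of five hyphen-separated groups of five letters or digits. The names `KEY_PATTERN` and `redact_keys` are hypothetical, and nothing here is meant to describe the filter Microsoft or OpenAI actually use.

```python
import re

# Assumed key shape: five groups of five uppercase letters or digits,
# separated by hyphens. Illustrative only, not any vendor's real filter.
KEY_PATTERN = re.compile(r"\b(?:[A-Z0-9]{5}-){4}[A-Z0-9]{5}\b")


def redact_keys(text: str) -> str:
    """Replace anything shaped like a product key with a placeholder."""
    return KEY_PATTERN.sub("[REDACTED KEY]", text)


# A made-up model output containing a made-up key and a made-up person.
sample = "Try ABCDE-12345-FGHIJ-67890-KLMNO, said Jane Doe of Springfield."
redacted = redact_keys(sample)
print(redacted)
```

The key is caught because it has a fixed, machine-recognisable format. The name and town pass straight through, because personal information in free text has no such format for a single pattern to match.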

  5. Alan Bourke

    Everything but everything is hyperbole

    with the AI hype train of venture capital bullshit.

  6. Dan 55 Silver badge

    Why the US doesn't get privacy episode 239,203,829,178

    MS argued:

    Plaintiffs do not plead any facts plausibly showing they have been affected by any of the supposed 'scraping,' 'intercepting,' and 'eavesdropping' they allege.

    As this is the US and there is no concept of privacy, you, the little guy, have to somehow prove monetary loss because your individual items of data were scraped, but big tech is allowed to scrape everything and use it all to make a new product which brings in billions.

    Where "little guy" means any person or business with a market cap smaller than Microsoft's.

  7. Doctor Syntax Silver badge

    I don't say they're wrong to bring the action but it would be a good idea to do a bit of prompt engineering to prove that the data can be regurgitated. Courts like evidence.

    1. IGotOut Silver badge

      They did. See other articles.

  8. Watashi

    GDPR

    Hmmm. So if I ask MS to tell me what data of mine they hold and how they are using it, are they obliged under GDPR to tell me what is included in the ChatGPT data sets? And is it the case that I already have the right under GDPR to ask MS to remove that data? What about data that they scraped from sites where I'm not explicitly identified? Can I ask for that data to be removed too?

    1. cyberdemon Silver badge
      Terminator

      Can I ask for that data to be removed too?

      Not if it's already been subsumed into GPT model weights. Then I believe it's impossible to remove without re-training the model from scratch.

      The best they could do would be to add another 'guardrail' to prevent questions or outputs specifically about you, but those are easily bypassed by anyone determined or privileged enough, and having too many of those makes their system shitter and slower, so they are unlikely to do it unless forced to by a court.

    2. Mike 137 Silver badge

      Re: GDPR

      "are they obliged under GDPR to tell me what is included in the ChatGPT data sets"

      Unfortunately, even under the GDPR, Article 14(5)(b) allows that if the effort to provide you with the information is 'disproportionate' or its provision 'is likely to render impossible or seriously impair the achievement of the objectives of that processing' they can refuse.

      Fun, isn't it!

  9. Anonymous Coward

    "Plaintiffs do not plead any facts plausibly showing they have been affected by any of the supposed 'scraping,' 'intercepting,' and 'eavesdropping' they allege. Nowhere do they say what of their private information Microsoft ever improperly collected or used; nor do they identify any harm they individually suffered from anything that Microsoft allegedly did."

    It's Microsoft.

    I'm certain they will eventually blacklist Wireshark because it's just too easy to prove privacy violations exactly so.

    Heck, there are whole articles written about it.

  10. Omnipresent Silver badge

    move along, nothing to see here.

    Microsoft sounds more Russian every day now.

  11. ecofeco Silver badge

    Microsoft says...

    The rest of us respond, LOL wut?!

  12. johnrobyclayton

    I have an idea

    Lots of random number generators are based on quantum effects.

    These are affected by every previous quantum event in history in their light speed cone.

    I dare say that there are a lot of people who have had their quantum data used by all of these random number generators to generate random numbers for a large variety of nefarious and profitable purposes.

    Could I get class action status to get some compensation for this?
