OpenAI claims New York Times paid someone to 'hack' ChatGPT

OpenAI has accused The New York Times Company of paying someone to "hack" ChatGPT to generate verbatim paragraphs from articles in its newspaper. By hack, presumably the biz means: Logged in as normal and asked it annoying questions. In December, the NYT sued OpenAI and its backer Microsoft, accusing the pair of scraping the …

  1. IGotOut Silver badge

    In summary....

    ...we were caught stealing other people's stuff but they ignored the Do Not Trespass signs on the lawn, so it doesn't count.

    1. Richard 12 Silver badge

      Not even that

      We were caught stealing other people's stuff after we invited them to look around our house, but they looked in the cupboard where we keep our stolen stuff, so it doesn't count.

    2. Catkin

      Re: In summary....

      I think you're restricting yourself to the pop-culture definition of hacking rather than the broader tech understanding of the word. The NYT exhibit could be regarded as something of a hack because an individual who hasn't accessed the NYT's site is unlikely to have access to the URLs or opening paragraphs.

      Therefore, if the legal argument for the NYT rests on regurgitation, they'll have a difficult time demonstrating that anything approaching a typical user would be exposed to a copy of a given article. I don't think OpenAI have ever denied ingesting articles, but I also don't think the whole NYT case hinges on regurgitation; it seems more like overly expensive lawyer showmanship on both sides of this particular exhibit/counterargument.
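
      For a sense of what that "hack" amounts to in practice, here's a rough sketch of the sort of prompt the exhibit describes: feed the model an article's opening and ask it to carry on, then compare the continuation with the real text. This assumes the standard openai Python client; the URL, article text, and model choice are placeholders for illustration, not anything from the actual exhibit.

```python
# Rough sketch only: prompt a model with an article's opening and check whether
# the continuation it produces matches the real article verbatim.
# Assumes the standard `openai` Python client (v1+); all texts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

article_url = "https://www.nytimes.com/example-article.html"  # placeholder
opening = "The first few sentences of the article, as a subscriber would see them."  # placeholder
real_continuation = "The paragraphs that actually follow in the published article."  # placeholder

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{
        "role": "user",
        "content": (
            f"Here is the start of the article at {article_url}:\n\n{opening}\n\n"
            "Continue the article in its original wording."
        ),
    }],
)
candidate = response.choices[0].message.content or ""

# Crude verbatim check: count output lines that appear word-for-word in the original.
lines = [ln.strip() for ln in candidate.splitlines() if ln.strip()]
verbatim = sum(1 for ln in lines if ln in real_continuation)
print(f"{verbatim} of {len(lines)} output lines appear verbatim in the original text")
```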

      1. doublelayer Silver badge

        Re: In summary....

        They need the regurgitation demonstration to prove it unequivocally to the court. OpenAI and its adherents have frequently come up with arguments as to why their output is the result of convergent evolution, is just like any brain, and so on. That is more difficult to argue when the quotations are direct. The issue is just as important whether the regurgitation is total or partial, but that may not be as obvious to everyone, so showing that the model can be made to quote the whole thing demonstrates the violation more clearly.

        1. Catkin

          Re: In summary....

          I think it focuses the scope excessively and leaves a window for OpenAI to turn around, add in a few guardrails (ironically, by directly storing NYT articles) and say "see? we fixed it". In early January, OpenAI described regurgitation as a "rare bug", but please do cite where they repeatedly denied it could happen at all.

          In my view, a better strategy would be to use data on referrals from search engines to compare and contrast their articles against the output of ChatGPT, to show how it might disincentivise visiting the NYT, since the primary and recurring complaint is unfair competition without compensation. Another component of their initial legal complaint was that their content was given "particular emphasis", so it wouldn't be hard for the defendants to point out that the URLs had to be included (I don't imagine their legal team would have done this if it weren't necessary).
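
          As a very rough illustration of the sort of comparison I mean, the sketch below measures how much of an article's text reappears word-for-word in a chatbot answer, as a proxy for whether the answer substitutes for the visit. It's a sketch only; the texts, the 8-word window and the metric are all made up for illustration.

```python
# Minimal sketch: what fraction of an article's 8-word phrases also appear in a
# chatbot's answer? All inputs below are placeholders, not real case material.
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(article: str, answer: str, n: int = 8) -> float:
    """Fraction of the article's n-grams that also occur in the answer."""
    article_grams = ngrams(article, n)
    if not article_grams:
        return 0.0
    return len(article_grams & ngrams(answer, n)) / len(article_grams)

article_text = "Full text of the newspaper article goes here ..."  # placeholder
answer_text = "Whatever the chatbot returned when asked about the story ..."  # placeholder
print(f"{overlap_ratio(article_text, answer_text):.1%} of the article's 8-word phrases appear in the answer")
```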

          Then again, I'm not a highly paid (or in any way paid/qualified/registered) lawyer.

          1. Anonymous Coward

            Yeah, no

            The whole issue about how much prompt engineering it took is a straw man, to distract from the more damning issue that they scraped the NY Times for free content. They then claimed it didn't matter because ML models were either "fair use" or some new category because, even though they jacked someone else's IP without paying/asking, it was transformed into something new that didn't contain the original material.

            That's horseshit, as the Times has now proven at some considerable effort. The next step is to exempt all of the other smaller players from having to repeat the same exercise at great personal expense, and to force OpenAI and all of the other ML pirates to disclose what they stole and either license it or purge it and retrain their systems. OpenAI does not like that answer, of course, so they keep lying and deflecting.

            Ironically, their "steal all the data and make a model so big no one else can afford to replicate it" plan turns out to be shit, as a model trained on all of the toxic stupidity of the unfiltered internet is critically flawed on so many levels that even the full corpus of every reputable newspaper ever published can't bleach the stain out of the system's output.

            1. Catkin

              Re: Yeah, no

              As I said initially, I don't think the case hinges on it; I just wanted to discuss the relative merits of the use of the word 'hacking'. But, while we're discussing this:

              "They then claimed it didn't matter because ML models were either 'fair use' or some new category because, even though they jacked someone else's IP without paying/asking, it was transformed into something new that didn't contain the original material."

              The word you're looking for is "transformative", and I don't think you fully understand how it applies in copyright. I'm not claiming ChatGPT is or isn't transformative; that's precisely why there is a legal case. But your explanation doesn't match the legal definition. Here, I do think the NYT legal team has it right, because their main argument is that it's non-transformative, so, if they can prove it, they will have a case.

      2. Doctor Syntax Silver badge

        Re: In summary....

        The relevant issue here is that ChatGPT is providing a service based on what is essentially a database that is derivative of the training materials. The regurgitation is simply a means of providing proof of that. The value of ChatGPT, and hence of OpenAI, depends in part on their software, for which they've (presumably) paid their developers, but also on the training material, for which they might or might not have paid but the use of which will exceed the T&Cs attached to it.

      3. JoeCool Silver badge

        Re: In summary....

        This idea is lacking in merit. OpenAI is using the term in exactly the context you are suggesting they are not.

        Otherwise, they would be admitting that the Times article is retrievable by any customer via a supported and intentionally designed way to use their product (e.g., using info posted in the yet-to-be-written Lifehacker article "Get a Times article for free").

  2. This post has been deleted by its author

  3. charlieboywoof

    Strange

    You couldn't pay me enough to read the truly disgusting NYT.

    1. Catkin

      Re: Strange

      No problem if that's your actual opinion, but could you perhaps be thinking of the New York Post?

      If you do mean the Times, could I ask why they're so objectionable to you?

      1. claptwice

        Re: Strange

        "They don't harm the people I want harmed' probably.

        1. Catkin

          Re: Strange

          Ah the ol' Stalin vs Trotsky schism.

    2. disgruntled yank

      Re: Strange

      @charlieboywoof

      Well, with OpenAI, apparently we have a robot to take it over for you.

  4. vtcodger Silver badge

    A Ghastly Thought

    It just came to me in a burst of insight that AI could replace not only marketeers, but lawyers. As for the marketing types, I doubt we'll notice the difference; AI, no matter how clueless, might even be an improvement from the client (i.e. victim) point of view. Lawyers, however, perform a useful and necessary function in modern societies. Albeit not all that well. But they could be worse. And AI can probably achieve that.

    I am contemplating spending my declining years in a mountain cave suitable for hermits. Anyone know of such available for a long-term lease at reasonable rates? I might even come out in a few decades (if I last that long) to see if anything is left of society after this craziness plays out.

    1. Peter Prof Fox

      Asking questions protocol

      You may remember how, in the past, people would ask 'obvious' or 'simple' questions on Internet forums? They would get answers such as "Google is your friend" or "Look at the chapter on Foo in your textbook where it's explained in detail." The great thing about that approach is that run-of-the-mill questions and "help me with my coursework" requests are kept off the forum, so as not to dilute the conversation with newbie and lazy dross. Now the same can apply to AI. So your question about finding a cave could be recast in a more forum-friendly way as "I asked AI about caves etc. and it said ... Just letting you guys know in case the AI is out of date or a bit fantastical and somebody knows better."

      I'm all for passing on wisdom and technical experience but I can't be arsed to start at page one every time. And sometimes you'll get my gratuitous opinion as well, tailored to an audience who can understand it.

    2. Doctor Syntax Silver badge

      Re: A Ghastly Thought

      "It just came to me in a burst of insight that AI could replace ... lawyers."

      Unsuccessfully so far.

      1. Anonymous Coward

        Re: A Ghastly Thought

        The secret there is that you need to replace the lawyers and judges as a matched set. The resulting inscrutable legalese will flummox onlookers while sufficiently emptying their pockets.

        Just replacing one part of the staff will obviously tip off the others, because they will know they aren't a chatbot. As long as it's all bots on bots, only the worst screw-ups will be caught by outsiders, at which point the smart move is to have a whole set of regulator bots with simple logic that causes them to rule in favor of whoever complained, issue a generous settlement that makes up less than 1/10 of 1% of the take, and include a gag-order clause to shut them up with no admission of wrongdoing.

        It would probably take a couple of decades before the wheels fell off.

  5. Anonymous Coward

    If you deliberately ask for it…

    …and you get it… who is the one infringing?

    1. Anonymous Coward

      Re: If you deliberately ask for it…

      They might have a point if they'd just asked for it, but they jumped through a huge number of hoops to force it out.

      1. vtcodger Silver badge

        Re: If you deliberately ask for it…

        "they jumped through a huge number of hoops to force it out"

        Not sure that matters. At least not in the US. Providing a way -- however devious -- to bypass IP protections that doesn't fall under exceptions like fair use is, I'm pretty sure, illegal and actionable. Might be different elsewhere. Caveat: I am Not A Lawyer (And wouldn't want to be).

        1. Anonymous Coward

          Re: If you deliberately ask for it…

          Sine qua non works both ways here; if you examine their method, you will understand why.

      2. doublelayer Silver badge

        Re: If you deliberately ask for it…

        It doesn't matter how many hoops I jump through. If I find that a certain set of queries to OpenAI's servers gives me the source code to GPT, that does not make it my source code. I had to jump through a lot of hoops to retrieve it, but my willingness to take that action does not change the legal situation I'm in. The law considers the results and the intent, not the bit in the middle. Taking the code or the articles without permission is illegal. The only reason why getting the quotations is necessary is that OpenAI would, without them, be willing to lie that they were not in the training data. This action does not create or cancel the crime. It only provides evidence of it.

    2. doublelayer Silver badge

      Re: If you deliberately ask for it…

      The one who gave it to you without permission to do so, and, if you don't have permission to have it, you too. The newspaper does have permission to have their own articles, so in this case it's just OpenAI who are infringing.

    3. Doctor Syntax Silver badge

      Re: If you deliberately ask for it…

      The infringement is that it's in there. The hoop jumping is simply a means of proving that.

  6. Brewster's Angle Grinder Silver badge
    Joke

    So asking the wrong question is now "hacking"?

    We asked 100 authoritarian dictatorships whether asking the wrong question should count as hacking, and 1012% said yes!
