In summary....
...we were caught stealing other people's stuff but they ignored the Do Not Trespass signs on the lawn, so it doesn't count.
OpenAI has accused The New York Times Company of paying someone to "hack" ChatGPT to generate verbatim paragraphs from articles in its newspaper. By hack, presumably the biz means: Logged in as normal and asked it annoying questions. In December, the NYT sued OpenAI and its backer Microsoft, accusing the pair of scraping the …
I think you're restricting yourself to the pop culture definition of hacking, rather than the broader tech understanding of the word. The NYT exhibit could be regarded as something of a hack because an individual who has not accessed the NYT's site is unlikely to know the URLs or opening paragraphs.
Therefore, if the legal argument for the NYT rests on regurgitation, they'll have a difficult time demonstrating that anything approaching a typical user would be exposed to a copy of a given article. I don't think OpenAI have ever denied ingesting articles, but I also don't think the whole NYT case hinges on regurgitation; it seems more like overly expensive lawyer showmanship on both sides of this particular exhibit/counterargument.
They need the regurgitation demonstration to prove it unequivocally to the court. OpenAI and its adherents have frequently come up with arguments as to why their output is the result of convergent evolution, it's just like any brain, etc. That is more difficult to argue when the quotations are direct. The issue is equally serious whether the regurgitation is total or partial, but that may not be as obvious to everyone, so showing that the system can be made to quote the whole thing demonstrates the violation more clearly.
I think it narrows the scope excessively and leaves a window for OpenAI to turn around, add a few guardrails (ironically, by directly storing NYT articles) and say "see? we fixed it". In early January, OpenAI described regurgitation as a "rare bug", but please do cite where they repeatedly denied it could happen at all.
In my view, a better strategy would be to use data on referrals from search engines to compare and contrast their articles against the output of ChatGPT, showing how it might disincentivise visiting the NYT, since the primary and recurring complaint is unfair competition without compensation. Another component of their initial legal complaint was that their content was given "particular emphasis", so it wouldn't be hard for the defendants to point out that the URLs had to be included (I don't imagine their legal team would have done this if it weren't necessary).
Then again, I'm not a highly paid (or in any way paid/qualified/registered) lawyer.
The whole issue about how much prompt engineering it took is a straw man, to distract from the more damning issue that they scraped the NY Times for free content. They then claimed it didn't matter because ML models were either "fair use" or some new category because, even though they jacked someone else's IP without paying/asking, it was transformed into something new that didn't contain the original material.
That's horseshit, as the Times has now proven at some considerable effort. The next step is to exempt all of the other smaller players from having to repeat the same exercise at great personal expense: force OpenAI and all of the other ML pirates to disclose what they stole and either license it or purge it and retrain their systems. OpenAI does not like that answer, of course, so they keep lying and deflecting.
Ironically, their "steal all the data and make a model so big no one else can afford to replicate it" plan turns out to be shit, as a model trained on all of the toxic stupidity of the unfiltered internet is critically flawed on so many levels that even the full corpus of every reputable newspaper ever published can't bleach the stain out of the system's output.
As I said initially, I don't think the case hinges on it; I just wanted to discuss the relative merits of the use of the word 'hacking'. But, while we're discussing this:
They then claimed it didn't matter because ML models were either "fair use" or some new category because, even though they jacked someone else's IP without paying/asking, it was transformed into something new that didn't contain the original material.
The word you're looking for is "transformative", and I don't think you fully understand how it applies in copyright law. I'm not claiming ChatGPT is or isn't transformative; that's precisely why there is a legal case. But your explanation doesn't match the legal definition. Here, I do think the NYT legal team has it right, because their main argument is that the use is non-transformative, so, if they can prove it, they will have a case.
The relevant issue here is that ChatGPT is providing a service based on what is essentially a database derived from the training materials. The regurgitation is simply a means of proving that. The value of ChatGPT, and hence of OpenAI, depends in part on their software, for which they've (presumably) paid their developers, but also on the training material, for which they might or might not have paid and the use of which will exceed the T&Cs attached to it.
This idea is lacking in merit. OpenAI is using the term in exactly the context you are suggesting they are not.
Otherwise, they would be admitting that the Times article is retrievable by any customer via a supported and intentionally designed way of using their product (e.g., using info posted in the yet-to-be-written Lifehacker article "get a Times article for free").
It just came to me in a burst of insight that AI could replace not only marketeers, but lawyers. Wrt the marketing types, I doubt we'll notice the difference. AI, no matter how clueless, might even be an improvement from the client (i.e. victim) point of view. Lawyers, however, perform a useful and necessary function in modern societies. Albeit not all that well. But they could be worse. And AI can probably achieve that.
I am contemplating spending my declining years in a mountain cave suitable for hermits. Anyone know of such available for a long-term lease at reasonable rates? I might even come out in a few decades (if I last that long) to see if anything is left of society after this craziness plays out.
You may remember how, in the past, people would ask 'obvious' or 'simple' questions on Internet forums? They would get answers such as "Google is your friend." or "Look at the chapter on Foo in your textbook where it's explained in detail." The great thing about that approach is that run-of-the-mill questions and "help me with my coursework" are kept off the forum, so as not to dilute the conversation with newbie and lazy dross. Now the same can apply to AI. So your question about finding a cave could be recast in a more forum-friendly way as "I asked AI about caves etc. and it said ... Just letting you guys know in case the AI is out of date or a bit fantastical and somebody knows better."
I'm all for passing on wisdom and technical experience but I can't be arsed to start at page one every time. And sometimes you'll get my gratuitous opinion as well, tailored to an audience who can understand it.
The secret there is that you need to replace the lawyers and judges as a matched set. The resulting inscrutable legalese will flummox onlookers while sufficiently emptying their pockets.
Just replacing one part of the staff will obviously tip off the others, because they will spot that they're dealing with a chatbot. As long as it's all bots on bots, only the worst screw-ups will be caught by outsiders, at which point the smart move is to have a whole set of regulator bots with simple logic that causes them to rule in favor of whoever complained, issue a generous settlement that amounts to less than 1/10 of 1% of the take, and include a gag-order clause to shut them up with no admission of wrongdoing.
It'd probably take a couple of decades before the wheels fell off.
"they jumped through a huge number of hoops to force it out"
Not sure that matters. At least not in the US. Providing a way -- however devious -- to bypass IP protections that doesn't fall under exceptions like fair use is, I'm pretty sure, illegal and actionable. Might be different elsewhere. Caveat: I am Not A Lawyer (And wouldn't want to be).
It doesn't matter how many hoops I jump through. If I find that a certain set of queries to OpenAI's servers gives me the source code to GPT, that does not make it my source code. I had to jump through a lot of hoops to retrieve it, but my willingness to take that action does not change the legal situation I'm in. The law considers the results and the intent, not the bit in the middle. Taking the code or the articles without permission is illegal. The only reason the quotations are necessary is that, without them, OpenAI would be willing to lie that the articles were not in the training data. This action does not create or cancel the crime. It only provides evidence of it.