A Song of Ice and Fire
Maybe OpenAI could complete ASOIAF because it doesn't look like GRRM will.
The Authors Guild, a trade association for published writers, and 17 authors have unleashed the dragons on OpenAI over its alleged use of their works to train its chatbots. Named plaintiffs in the copyright infringement class action lawsuit – filed in the Southern District of New York – include David Baldacci, …
Dude has a railroad to run.
(Yes, the subhead of that article is deliciously ironic.)
I haven't ridden the trains m'self, but by sheer coincidence my wife and I happened to be in Santa Fe (a rare occurrence) and in the Railyard District (an even rarer one) on the day and time of Sky Railway's maiden journey. We hadn't heard about it yet – news from far-distant Santa Fe (nearly 80 miles!) takes a long time to reach the Mountain Fastness – so at first we had no idea why there were all these people and news crews milling about. Then we saw the train, and it was Cool.
We also shopped at GRRM's bookstore that day. I think that's where I picked up Jo Walton's What Makes This Book So Great.
Back on thread... Personally, I'd rather complete ASOIAF myself, for myself, than read something a transformer comes up with. The whole point of the transformer architecture is minimizing information entropy, aside from whatever the temperature is set to. It'd give you the most predictable ending.
>>could you actually get it to regurgitate a book it has read in its entirety?
Almost certainly not. It reads the books, but not in the sense you or I do. It doesn't store the original, just 'interesting' features of the original - word frequency, letter frequency, probably "this follows that with x probability" type stuff along with categorisation/catalogue information, then uses that data and other magic to form its response to prompts.
It doesn't (as the WGA, and to be fair, a vast number of the population, seem to think) copy and paste chunks of the text from a copy of the source to the output.
Could you craft a prompt to get it to regurgitate a word-for-word copy of an original? Doubtful, unless your prompt is as complex as the original work. Could you get it to make a reasonable attempt at impersonation? Absolutely - assuming the person you are impersonating has a distinctive style.
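To make the "this follows that with x probability" idea concrete, here's a toy sketch in Python (my own illustration, not anyone's actual pipeline - a real LLM learns enormously richer statistics, but the flavour is the same: counts in, plausible-but-lossy text out):

import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the dog sat on the rug".split()

# "Training": tally which word follows which, and how often.
follows = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    follows[a][b] += 1

# "Inference": sample each next word in proportion to those counts.
word, out = "the", ["the"]
for _ in range(10):
    options = follows[word]
    if not options:
        break
    word = random.choices(list(options), weights=list(options.values()))[0]
    out.append(word)
print(" ".join(out))

With a corpus this tiny the output often replays whole runs of the source - overfitting in miniature, which is exactly the memorisation issue raised elsewhere in the thread.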
Absolutely, this is the crux of "what is copying" or "what is reproducing". If an actual person manually performed the same categorization as OpenAI does then would there be any copyright issues? Obviously impossible in the lifetime of the universe*, but just because computers can perform tasks insanely quickly compared to humans doesn't automatically make it "copying".
*Or at least extremely improbable unless you have access to an infinite supply of monkeys.
Firstly, most LLMs including ChatGPT are entirely capable of regurgitating quite long sequences of training data, referred to as "memorization" - https://www.theregister.com/2023/05/03/openai_chatgpt_copyright/ . Actors do this too, by altering their neural weights in a somewhat similar way, and if they wrote a play down and distributed it, there would be a breach of copyright.
But even leaving aside whether text is reproduced verbatim, case law has determined that copyright protection extends to the traits of well-delineated, central characters - distinct from the text of the works they are embodied in.
I've just typed "how would tyrion lannister describe having a baby" and "how would cersei lannister describe having a baby" and it spits out highly distinctive, extended replies very much in line with the thinking and speaking styles of those characters.
I'm no expert, but I can see how it might well breach copyright to reproduce these outside of a fair use context.
So, I read a book and, being of average intelligence, could if asked provide a reasonable synopsis of the story, and even an opinion on how a particular character might respond to a given situation. How is this any different, or am I also in breach of copyright?
It’s an interesting question, and one for a lawyer, but I suspect comes down to the context and whether it qualifies as fair use - hence the careful qualification.
Wikipedia’s take - https://en.m.wikipedia.org/wiki/Legal_issues_with_fan_fiction - explains there are no fixed rules but when deciding fair use on a case by case basis courts consider
- the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
- the nature of the copyrighted work;
- the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
- the effect of the use upon the potential market for or value of the copyrighted work.
So at the extremes: if you’re doing it as part of a 500 word school assignment, you’re fine. If you’ve published your own novel sitting alongside the Song of Ice and Fire series without a license, you’ll likely have problems.
Is OpenAI making fair use? No idea, but it's definitely commercial use. Definitely feels like one for the courts…
I read a book and, being of average intelligence, could if asked provide a reasonable synopsis...
LLMs fail the commercial-use test. LLM vendors are seeking to monetize their models, so if their models display similar behavior, that's very distinct from what you're describing.
LLMs fail the financial-harm test, if LLM activity does indeed reduce commercial demand for the existing or future work of the creators whose works they've been trained on. That's also very distinct from what you're describing.
The complaint makes exactly this point. That not only did OpenAI use unlicensed copyrighted works to train the model but in addition the model stores substantial amounts of the unlicensed copyrighted work which it uses to generate responses.
https://www.courtlistener.com/docket/67810584/authors-guild-v-openai-inc/
88. Until very recently, ChatGPT could be prompted to return quotations of text from copyrighted books with a good degree of accuracy, suggesting that the underlying LLM must have ingested these books in their entireties during its "training."

89. Now, however, ChatGPT generally responds to such prompts with the statement, "I can't provide verbatim excerpts from copyrighted texts." Thus, while ChatGPT previously provided such excerpts and in principle retains the capacity to do so, it has been restrained from doing so, if only temporarily, by its programmers.

90. In light of its timing, this apparent revision of ChatGPT's output rules is likely a response to the type of activism on behalf of authors exemplified by the Open Letter addressed to OpenAI and other companies by Plaintiff The Authors Guild, which is discussed further below.

91. Instead of "verbatim excerpts," ChatGPT now offers to produce a summary of the copyrighted book, …
While not about OpenAI, this story from February 2023 illustrates how models can store the data they were trained on and can be coaxed into responding with that data. In this case it's an image AI app, but the same thing occurs with OpenAI's models, which points to a security problem if they've been trained on sensitive data. In a way this reminds me of the early days of the web, when developers allowed unedited, unbounded user input to be fed to legacy backend systems and mid-range databases. With AI models we have a large opaque blob of code and data with little understanding of how it might behave given the right input.
https://www.theregister.com/2023/02/06/uh_oh_attackers_can_extract/
That would hinge on your definition of "stored". And "retrieval system".
Given that it is very specifically designed not to allow the text to be retrieved - even if it is stored - I think you'd have a hard time making that description stick.
The only kind of legal test that makes sense is, would it be illegal for a human to do this? As long as it's just quoting from the book or imitating the style or characters (pastiching), it's not doing anything wrong. Not until it quotes extended (at least page-long) extracts verbatim.
it is very specifically designed not to allow the text to be retrieved
"It" (i.e. ChatGPT-x and other major unidirectional-transformer LLMs currently in vogue) most certainly is not "specifically designed" to avoid reproducing copyrighted work verbatim. That is a guardrail tacked on very late in development. Frankly, judging by the published research, neither OpenAI nor any other LLM team have any idea how they would "design" a transformer model to avoid reproducing copyrighted material. That's a difficult outer-alignment problem.
Ta! That is pretty much how I thought it would be 'reading' the training material. Nowhere in the AI's 'brain' is an exact replica of the original work, just an essence.
So that really does raise the question of whether the original work has been copied or not.
Thankfully I won't be the one having to determine that!!!
That is pretty much how I thought it would be 'reading' the training material. Nowhere in the AI's 'brain' is an exact replica of the original work, just an essence.
To be perfectly frank, this sort of gloss is not terribly meaningful. It's far enough from any technically accurate or precise account of how transformer models work that it's not very useful for drawing conclusions, practical or legal.
Is there, in the model, a sequence of bits that corresponds to the text of a given novel-length work, in some encoding that the model can reasonably be held to have an algorithm for decoding into, say, Unicode?[1] It's true that's unlikely.[2]
However, particularly for works that the model has seen often enough in the training set to somewhat overfit on, it's entirely possible that there are positions and gradients in the parameter space – which is very high-dimensional, after all – that reproduce substantial parts of a given work, and possibly all of it.
Any CTT-compatible computation can be reduced to some form of compression (just as it can be reduced to Boolean algebra, or the operation of a Turing machine or a Post machine, etc). What you refer to as "essence" should be called "information entropy", and LLMs (crude though unidirectional transformer stacks are) are capable of storing quite a lot of it - how much depends on how large the model is, the pre-compression parameter precision, how much compression is done, and so on. For any given input in the training set (assuming it's much smaller than the model), there is no guarantee that some of its information entropy escapes the model; all of it may well be captured. And, of course, the output doesn't have to be complete, or bit-for-bit exact, to be infringing in the legal sense. An ALL-SHOUTY copy of the first half of A Game of Thrones with Ned Stark referred to as "POOR LITTLE NEDDY" throughout[3] would still be viewed dimly by the court.
And this last points to the real crux, which is that copyright law (i.e. Title 17) in the US, and the courts adjudicating upon it, are unlikely to care much about what is "stored" by an LLM and how it is represented. They're going to care about actual and plausible effects. Will LLMs have a chilling effect on creator revenues, and if so to what extent is that an actionable harm under the law? Can the LLM guardrails against reproducing portions of copyrighted works plausibly be bypassed, now or in the future, and how infringing would the output be? Is substantial information from copyrighted works incorporated (in any representation) in the models, and if so is that incorporation transformative or otherwise allowed under Title 17?
[1] It should be obvious that, trivially, a given LLM has a bit-sequence corresponding to any given extant novel under some arbitrary encoding, because LLMs are large enough to represent any single given novel and you can just invent such an encoding on the spot. Thus we have to distinguish between arbitrary encodings and reasonably plausible ones.
[2] Not impossible, though, given the size of these models, for some relatively small set of works, particularly given the low information density of natural languages. Model compression would tend to eliminate these, but if you figure that, say, Moby-Dick has around 2^22 bits of entropy – quick estimate by deflating the plaintext version from Project Gutenberg – and a GPT-3-class LLM weighs in at around, oh, 2^33 bits, then if those bits were evenly and randomly distributed (they're not, but let's pretend for a moment) you'd have around a 1-in-2048 chance of finding a target bitstring with the right information. Assuming I got the arithmetic right. Of course you'd need to decompress it, so that's not really a fair estimate.
[3] Actually, does Poor Little Neddy survive to the halfway point? I don't remember.
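For what it's worth, the deflate estimate in footnote [2] is easy to reproduce (a sketch; the Project Gutenberg ebook number and file path for Moby-Dick are from memory, so verify them):

import math
import urllib.request
import zlib

# Fetch the plaintext of Moby-Dick (Project Gutenberg ebook #2701).
raw = urllib.request.urlopen("https://www.gutenberg.org/files/2701/2701-0.txt").read()

# Deflate at maximum compression as a crude upper bound on information entropy.
bits = len(zlib.compress(raw, 9)) * 8
print(f"{bits} bits, i.e. about 2^{math.log2(bits):.1f}")   # roughly 2^22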
And this last points to the real crux, which is that copyright law (i.e. Title 17) in the US, and the courts adjudicating upon it, are unlikely to care much about what is "stored" by an LLM and how it is represented. They're going to care about actual and plausible effects.
Those courts will of course make their own decisions based on their priorities, but if Title 17 becomes too restrictive, OpenAI can and will simply up sticks to somewhere beyond its jurisdiction. So what really matters is what can be agreed as covered by the Berne Convention.
And I think you'll find the mechanics and definitions of "storage" and "retrieval" will be very important in some of those alternative jurisdictions.
Could you craft a prompt to get it to regurgitate a word for word copy of an original?
In my experiments, ChatGPT has been quite happy to regurgitate verbatim chunks of works that are not in copyright if you ask. Try:
Read me "Daffodils" by Wordsworth.
If you ask the same question of, say, a chapter of a copyright work, it says it isn't able to do that but offers to summarise instead.
This rather implies that it knows enough about the works concerned to know their copyright status, and knows what constitutes a "chapter" or other subdivision anyone might ask about. Its ability to quote verbatim from some works and summarise apparently arbitrary sections of others might suggest it has hoovered up more than just "interesting features".
The thing is, it doesn't necessarily - as far as I know (IANAL) - have to regurgitate a copy for it to be a copyright violation, it merely has to have made an unauthorised copy. I predict some rather arcane legal discussion as to what constitutes "unauthorised" and even "copy".
I feel strongly that there will be some kind of linkage that could be used to reconstruct that novel.
The reason is that if you were to re-input that novel into the repository, there must be guards that effectively prevent that sequence of text from being doubly weighted. If there weren't, you would be introducing bias into the system, and that bias would increase with every additional copy of the same text that was loaded, polluting the "quality" of the corpus. I think this was the problem with Microsoft's experiments with Tay: repetition reinforcing "her" stance.
The novel wouldn't be held there in sequence, as people will often cite subsets of the text, e.g. "To be or not to be". So the system would need to say "yep, already got that", but also connect up the linkages tied to both ends of that text to associate it with, maybe, another work of art that embodies Shakespeare's text.
The definition of "stored in any form" as mentioned in my previous comment must therefore surely apply to the linkages derived from input of a novel into the repository.
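Training pipelines do commonly deduplicate their corpora, though via statistical fingerprints rather than stored "linkages". A toy sketch of the usual shingle-and-compare idea (my own illustration, not any vendor's actual code):

import re

def shingles(text, n=5):
    # Fingerprint a document as the set of its n-word phrases.
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b):
    # Jaccard overlap between two fingerprints.
    return len(a & b) / len(a | b)

already_ingested = shingles("It is a truth universally acknowledged, that a "
                            "single man in possession of a good fortune must "
                            "be in want of a wife.")
incoming = shingles("it is a truth universally acknowledged that a single "
                    "man in possession of a good fortune must be in want of a wife")
if similarity(already_ingested, incoming) > 0.5:
    print("near-duplicate: skip or down-weight before training")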
"That is a guardrail tacked on very late in development."
If there is a guardrail preventing verbatim quotation of large expanses of text, that guardrail must surely have sight of the original text that must be prevented from being regurgitated?
"If there is a guardrail preventing verbatim quotation of large expanses of text, that guardrail must surely have sight of the original text that must be prevented from being regurgitated?"
I don't think it's even that complex. I think that the guardrail looks like this:
if (prompt.fuzzymatch("Could you quote [work]")) {    // crude intent check on the incoming prompt
    if (work.known_to_be_copyrighted) {               // lookup against a list of known copyrighted works
        refuse();
    }
}
If a model that clearly can quote (and repeatedly has quoted) from copyrighted works has a guardrail like that, all you have to do is find a prompt that gets around the check. It's akin to a conversation where you're trying to get me to accept a bribe, but I'm saying things to avoid clearly committing a crime in case you happen to be recording me.
You: "We would like to bribe you to make things easier on us."
Me: "I'm sorry, but I cannot take a bribe."
You: "We'd like to give you some money to make things easier on us."
Me: "I'm sorry, but this sounds like bribery, and I can't do that."
You: "How would you like it if we paid for some nice stuff for you?"
Me: "A gift? Thank you very much."
You: "And how about you help us with a problem we've had?"
Me: "Happy to help."
Maybe one way the plaintiffs can approach this is to make dozens of requests for 'bites' of text, each of which is covered by fair use. If those requests are stitched together, the true extent of storage of the original can be revealed, thus proving that their work is stored in a form that breaches copyright.
I presume that someone regularly taking fair-use photocopies of a book with the aim of printing the whole book will be in trouble if their premises are searched. (The total cost of photocopying the book is irrelevant, the copyright holder is not a beneficiary of the photocopy vendor's charges).
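The stitching step itself is trivial, assuming each elicited 'bite' overlaps the next (a sketch; how you obtain the bites is the hard part):

def stitch(chunks):
    # Glue each chunk onto the longest suffix of the accumulated
    # text that the chunk begins with.
    text = chunks[0]
    for nxt in chunks[1:]:
        for k in range(min(len(text), len(nxt)), 0, -1):
            if text.endswith(nxt[:k]):
                text += nxt[k:]
                break
        else:
            text += nxt   # no overlap found: just append
    return text

# Overlapping excerpts of something safely out of copyright:
print(stitch(["When shall we three", "we three meet again", "meet again in thunder?"]))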
unless your prompt is as complex as the original work
In a strict, information-theoretic sense, this is almost certainly wrong.
In terms of information entropy, considerable entropy is stored in the model; the prompt just has to elicit it. It's equivalent to a form of dictionary compression with a very large dictionary. Therefore there's almost certainly a prompt which contains less information entropy than the source document which can elicit the source document from the model.
As a practical matter, it is almost certainly possible to identify recurring template phrases in the source document that can be elicited multiple times, with replacements, in the correct locations, using a prompt shorter than the total length of those realized templates. That's one mechanism whereby the prompt becomes both absolutely shorter in length and less in information entropy than the source document.
Would creating such a prompt be easy or useful? No. But it's not true that the prompt must have at least as much information entropy as the desired output, as it would with, say, a general compressor that does not contain any prebuilt dictionary.
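The dictionary-compression analogy can even be made literal with zlib's preset-dictionary feature: if the decoder already "contains" the text, the message needed to elicit it is tiny. A toy demonstration (an analogy only - an LLM is not literally a zlib dictionary):

import zlib

book = ("It was the best of times, it was the worst of times, "
        "it was the age of wisdom, it was the age of foolishness. ") * 20
book = book.encode()

# The "model": a codec primed with the book as a preset dictionary.
comp = zlib.compressobj(zdict=book)
prompt = comp.compress(book) + comp.flush()

decomp = zlib.decompressobj(zdict=book)
assert decomp.decompress(prompt) == book
print(f"{len(prompt)} bytes of 'prompt' elicit {len(book)} bytes of text")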
Which leaves OpenAI safe to rip off all the dead authors.
Can it write a non-derivative Culture or Discworld novel with new characters to the same level as Banks & Pratchett*?
Even if it could I wouldn't pay more than printing cost anyway.
*Both cruelly taken far too early
In a documentary I saw, Terry Pratchett stated that he could give someone else the plot of the next Discworld novel, and all the jokes, but that what they wrote would not be a 'Discworld' novel. He also wrote in one of his articles ('A Slip of the Keyboard' collection) that people would write to him with ideas for Discworld novels, and want half the royalties, as if writing it down was a mere administrative activity.
The cadences of words, alliteration, timing, vocabulary, and invented words ("apocralypse" and "charisntma" spring to mind) are essential to the style and enjoyment of the books. The issue is not merely that an AI generated book purporting to be from a famous author is not genuine, but that a person reading it first would likely be put off from reading a genuine work by that author due to the lacklustre nature of AI generated text.
However much I would like to read a new Discworld novel, I wouldn't want it to have been written by a computer.
Could ChatGPT really have come up with:
"Most species do their own evolving, making it up as they go along, which is the way Nature intended. This is all very and organic and in tune with the mysterious cycles of the Cosmos, which believes that there is nothing like millions of years of evolving to give a species moral fibre and, in some cases, backbone."
(Quoted at the start of chapter 2 of "African Exodus" by Chris Stringer and Robin McKie, ISBN0-224-03771-4.)
Both 'Snuff' and 'The Shepherd's Crown' were written when Sir Terry's Alzheimer's disease was progressing. He had posterior cortical atrophy (PCA for short). Later, as his disease progressed, according to his PA, Pratchett was still very good at scenes, but slightly less good at the general sweep of narrative.
My point exactly. If the argument is "an inferior work will damage the author's reputation", then it seems to me that this particular author - or, arguably, the publisher who encouraged him to publish those works without heavy revision by someone more compos mentis - has already done that damage. Because those two books are bad.
The problem with that argument is that I can do whatever I want to my reputation, but for you to do something in my name which I didn't agree to and which harms me, reputationally or otherwise, is a problem. Just having written a book you didn't like doesn't make any other reputation-harming activity fair game.
There are two aspects to the copyright attack against AI.
The first is that the models are trained on existing texts. This training involves copying, and is therefore "forbidden". The same argument can be made against every author there ever was: they all learned the trade by reading the texts of other authors. Copyright law works by preventing the use of protected works in publishing. The benchmark is whether the copied work is identifiable in the new work, and that is most certainly not the case here. The fact that ChatGPT can write out a protected work is no different from MS Word being able to write out a protected work. The courts have already decided that AI cannot produce copyrightable works on its own. And I am certainly within my rights to write, e.g., fan fiction for myself; unless I publish it, I can write whatever I like.
The second part is, like Andy the Hat writes, that the authors claim OpenAI used illegal copies for their training. As the authors seem to be unable to point out evidence of which pirated copies were used, this seems a little desperate.
I think the main point of the authors comes from this line:
> The complaint [PDF] argues that OpenAI's services "endanger fiction writers' ability to make a living, in that the large language models allow anyone to generate – automatically and freely (or very cheaply) – texts that they would otherwise pay writers to create."
The proverbial buggy whip manufacturers that want to stop Henry Ford destroying their revenues. If AI can write you a story as good as the authors can, why pay the authors? Indeed, why pay buggy whip manufacturers when you do not need a buggy anymore?
Even if the authors can make their argument stick and force AI to refrain from using books under copyright, that won't stop AI from writing books. The Iliad and the Odyssey are some of the oldest surviving adventure stories and can be a very good start for writing up everything from Game of Thrones to space operas. And then we have not even started on Shakespeare. I am pretty sure AI can be nudged into combining the old texts with the new world and getting us the books we want.
And that is before a user can feed a digital book into an AI and ask it to write a sequel.
Indeed, if parody is OK -- and it seems to fall under the category of "Fair Use" -- then using other people's characters, style, and plot line would seem to be something that you or I or that monster computer over there are free to do. (So long as we don't misrepresent who wrote the text).
"If AI can write you a story as good as the authors can, why pay the authors?"
The outrage here is that the machines are no longer coming for the jobs of the working class who toil and sweat and use their hands. Now they're coming for the comfortable middle class who've got (to quote Pratchett) an indoor job with no heavy lifting. And I think the aforementioned working class aren't going to be brimming with sympathy for the keyboard jockeys who see their livelihood going the way the coal mines went in the 80s.
They're not coming for the GOOD ones - not yet. So far the only ones they can actually replace are the derivative hacks... but most authors, even the good ones, start out as somewhat derivative until they find their voice. Pratchett's "Strata" was a transparent parody of Niven's "Ringworld", and clearly a sort of practice run at a Discworld. And even the biggest Pratchett fan will admit it's not as good as most of what followed (I happen to love it for what it is.)
But I think if someone were able to synthesise a new Culture novel (not a parody, not a reboot, an actual new Culture novel)... I think I'd want it. I'd dearly like IMB back, but if a LLM (with help, presumably, from someone with the right prompts) could make more work that is aesthetically equal to what already exists... why wouldn't you want it? Just out of principle?
> The outrage here is that the machines are no longer coming for the jobs of the working class who toil and sweat and use their hands.
Funnily enough, these jobs now look safer than a lot of so called "knowledge worker" jobs. Why pay for a photographer when you can have glamour shots from a short prompt and a crappy selfie?
I would have liked to be a fly on the wall at the meeting that decided to go and get pirated material to use as training data.
Mgr - "Okay, guys, we have this ginormous potential waiting on training data. Where can we get that ? Ideas ?"
Mkting - "Well, we could strike deals with the Project Gutenberg website, they've got plenty of free books. I'm sure they'd be willing to help."
Mgr - "How much would that cost ?"
Mkting - "It's free for the customer, but we'd need a deal where we can get stuff in bulk. Shouldn't cost more than a couple thousand."
Mgr - "How long would that take ?"
Mkting - "I guess a month or two to negociate the deal and have a contract written up."
Mgr - "Too long. We need to move forward now. Any other ideas ?"
Dev - "Well, I know this site where we can get just about everything. All I'd need to do is write a script to automate the downloads."
Mgr - "What about the contract ?"
Dev - "Um, well, there isn't any. It's BitTorrent-like, you just go choose and it drops in."
Mgr - "And we can get recent stuff, no problem ?"
Dev - "Well yeah. Pirates love recent stuff."
Mgr - "Pirated ? So no contract and no money ?"
Dev - "Nope. And it's untraceable."
Mgr - "Go for it !"
Except Gutenberg is free in bulk
https://www.gutenberg.org/help/mirroring.html
https://www.gutenberg.org/policy/robot_access.html
Though the content is intended for humans.
Actually, it may be an IP/copyright violation to scrape most websites for AI, as the content is intended for direct human consumption, with bots at most indexing it for search. There is also robots.txt. Does OpenAI or Alphabet/Google care?
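And robots.txt is purely advisory. Here's how a well-behaved crawler would consult it, using only the Python standard library (the bot name is made up; nothing compels a scraper to run anything like this):

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's stated crawling policy.
rp = RobotFileParser("https://www.gutenberg.org/robots.txt")
rp.read()

# A polite bot checks before fetching; an impolite one simply doesn't.
print(rp.can_fetch("MyTrainingBot", "https://www.gutenberg.org/ebooks/2701"))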
The complaint [PDF] argues that OpenAI's services "endanger fiction writers' ability to make a living, in that the large language models allow anyone to generate – automatically and freely (or very cheaply) – texts that they would otherwise pay writers to create."
And now please point me to the complaints and lawsuits filed when millions of blue-collar workers were replaced by robots 15-20 years ago.
I am sure there are plenty of things to complain about in the training practices of generative AI. But this argument absolutely rubs me the wrong way.
"I am sure there are plenty of things to complain about the training practices of generative AI. But this argument rubs absolutely rubs me the wrong way."
This. It's been rubbing, sanding and downright ablating me up the wrong way for weeks.
I am not sold on the idea that AI training is a breach of copyright in the first place. If I buy a book and go through it page by page, counting each occurrence of each character, then publish the results, am I infringing copyright? No. What if I count the words? Still no. If I take each character - or word - and calculate which other character - or word - is most likely to follow it? Nope. And if I then amalgamate those findings with those from every other piece of text I can find? Even less so, since the impact of any one work is diluted by the rest.
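For concreteness, everything just described fits in a few lines, and the published "results" plainly contain no recoverable text (a sketch of the thought experiment above; the filename is hypothetical):

from collections import Counter

book = open("a_book_i_bought.txt").read()   # hypothetical local copy
words = book.split()

chars = Counter(book)                   # occurrences of each character
word_counts = Counter(words)            # occurrences of each word
pairs = Counter(zip(words, words[1:]))  # which word follows which

print(chars.most_common(5))
print(word_counts.most_common(5))
print(pairs.most_common(5))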
Note that I said buy a book, though. OpenAI really shouldn't be using pirate material, any more than we should be reading pirate books. But all they need to do to satisfy that requirement is buy one copy.
Note also that I don't object to authors refusing permission to train models on their work, if they so wish. That needs to be made clear in the terms of sale of the book, though, or it is reasonable to assume that you can read a book in any way you wish, including analysing the content.
But this isn't really about copyright, per se. That's just the least inappropriate legal tool they could find to beat OpenAI over the head with. What it's really about is the fear of wealthy authors who suddenly find that they may be about to become less wealthy unemployed authors. This is their Spinning Jenny moment, where they find that some clever bugger has only gone and invented a machine that can do what they do (well, not quite yet, but give it time) and they do not like it. They are desperate to nip it in the bud, and copyright is the only tool they can find that might have any chance of doing that.
Unfortunately for them, they cannot succeed, even if they win the case. The genie is out of the bottle; there are other generative AI companies and plenty of copyright-free fiction to train models on. I'm sure that these authors will continue to eke out a living, tough though it is to sit down and write all day, but the day of the automatic author is almost upon us. They're just going to have to do what everyone else whose job has been automated has ever done - adapt or get another job.
Note also that I don't object to authors refusing permission to train models on their work, if they so wish.
(raises hand)
I do. I very much do object to that. Authors have no right to restrict who can and can't read their books.
Copyright gives them the right to control a very specific range of functions, including copying, selling, translating, adapting and performing their work. It does not give them the right to say that it should only be read by people of a certain species, or only on certain platforms. I view the current action as a stealth attempt to extend the scope of copyright yet again, and one that should be resisted with, if necessary, torches and pitchforks.
"The Register has asked OpenAI for comment and will update this story if we receive a substantial reply"
Or do you mean a reply that hasn't been AI-generated?
YouTuber Geoff Marshall did an interesting experiment recently by getting ChatGPT to generate a script for him to use for a video - it was hilariously bad!
From whence arose the notion that authors have 'ownership' over their 'works' rather than simple entitlement to be acknowledged?
Somebody writes something and becomes an author. A publisher may arrange distribution of the work, this inscribed upon a physical medium. A bookshop, the second level intermediary taking a 'cut', sells someone a copy. Thereupon, the nature of the trade becomes peculiar. The buyer may believe he has been deceived into paying for rubbish. If he returns to the shop and demands his money back he will be laughed at, but that wouldn't be the case should he return a packet of mouldy rice to a food store.
Taking this further, the buyer may wish recompense from the author for the 'opportunity cost' (of time) he incurred reading the book.
We may presume people start writing because they believe they can produce work of interest to other people (not just to a publishing house). The genuinely creative writer will be driven by the pleasure principle. He may wish to do this as his occupation. If so, he must convince other people to buy his works after, at best, a cursory glance at their contents. Seemingly, being a self-proclaimed creative individual confers a privileged status with attached entitlements.
The proper way round is for an author to persuade other people of his ability to interest them. Thereafter, those appreciative of his writing may arrange finance for further output (patronage). This modality doesn't work well in the context of books presented in analogue form (i.e. on paper). Nevertheless, expectation of people buying the author's/publisher's products without having recourse for all, or some, money back is odd in context of trade in general.
These days, a printed book may be considered an added-value physical product associated with the ideas expressed in the book. Printed books have some convenience and also can be of aesthetic appeal: these fit squarely into supply and demand market economics.
Digital versions are better suited to an explicitly 'patronage mode' of funding: people either donate money upfront in support of further writing, else they download a copy of the work and, if pleased with it, donate what they consider it was worth to them. The brutal fact for authors and publishers to consider is that without the patronage model becoming the norm (after being proselytised by authors and publishers), works presented in digital format shall increasingly enter the 'commons' regardless of authors, publishers, and the ramshackle anachronistic law supporting them.
So-called 'AI', a useful but as yet misnamed technology, shall proliferate rapidly. Rentier copyright holders will find it difficult to identify specific targets to squeeze money from. OpenAI is an innovator of what soon shall be a routine computational tool. Just consider the present failure of copyright cabals to shut down Sci-Hub, LibGen, Z-Library, and many more. Consider the sheer impossibility of identifying those behind 'sharing', and their visitors, when greater use is made of darknets.
In this edition of El Reg is mention of the UK government's "Online Safety Bill". The Bill is a wedge to open the door to widespread citizen surveillance; it won't open far, because encryption is resilient against schemes generated by tiny minds at Westminster. This legislation, even if extended, can offer no succour to the likes of the Authors' Guild. It would be better for Guild members to come to terms with the reality of digital technology and to adapt their means of raising income (and their expectations of lifestyle) accordingly.