Boffins probe commercial AI models, find an entire Harry Potter book

Machine learning models, particularly commercial ones, generally do not list the data developers used to train them. Yet what models contain and whether that material can be elicited with a particular prompt remain matters of financial and legal consequence, not to mention ethics and privacy. Anthropic, Google, OpenAI, and …

  1. david 12 Silver badge

    Can it improve the Harry Potter books?

    The first HP book was a risky proposition, and full credit to the publisher -- full-length children's books were deeply unfashionable, and nobody else was doing it. And it was plot-driven, with a decent plot, and a decent mixture of generic and original plot and character elements.

    But the language of the text was unimaginative and stereotypical -- the kind of language you expect from LLM homogenization. So no surprise that AI models can reproduce the text.

    Can they do better? If the reproduction is only 95% copied, is that 5% better? Or with even the 5% original and idiosyncratic language rounded out?

    1. doublelayer Silver badge

      Re: Can it improve the Harry Potter books?

      That would depend on your personal opinion of better. My guess is no, but there's no way for me to predict your taste nor to know what faults from the book you would like a rewrite to improve, likely a different set than if I made a list. If you're hoping that it can do that, you're more likely to get what you want if you prompt it to do a rewrite rather than hoping that faults in memorization that it's not supposed to be doing in the first place spontaneously improve the text. I don't think either approach will get you good results, but the former is slightly more likely than the latter to do so.

      1. Anonymous Coward

        Re: Can it improve the Harry Potter books?

        Not sure why you're being heavily downvoted...you can objectively measure the quality of a piece of written prose (plot and contents of the prose notwithstanding).

        You can thoroughly enjoy a book and it still be badly written...take Dan Brown novels for example...whether the story is interesting is a matter of opinion, but whether the writing is high quality is not.

        Hell, I used to love reading The Viz...but it'll never win any prestigious writing awards.

        Today, I read The Register, not for the quality, but because the site doesn't complain if I use an ad blocker, which in this day and age is more important than quality writing. El Reg is definitely not the last bastion of quality writing. The articles are interesting, but the quality is that of a C grade GCSE student writing their homework on the bus on the way to school on the back of a fag packet. Which is fine because none of the writers for El Reg are dedicated to this so called website, it's just something they do between signing on days after their shift at the supermarket collecting trollies and pocketing the abandoned quids.

        1. Anonymous Coward

          Re: Can it improve the Harry Potter books?

          Downvotes are because the 'attack' on the 'El Reg' writers is somewhat 'over the top' ... as is 'de rigueur' at this point I should say ... if you can do better then write it yourself and sell it to 'El Reg' for all our benefits !!!

          P.S.

          'you can objectively measure the quality of a piece of written prose' is subject to some debate as 'quality' is not measurable to a static standard unless you mean strictly Sentence construction, grammar, spelling etc

          The acceptable standard varies with the times, an acceptable piece written now would not be considered acceptable in say the 1930's and that is discounting the changes in language usage/word meanings/etc.

          P.P.S.

          Yes, I know I break so many rules with my writing BUT that is an affectation that entertains me and I am NOT writing for 'El Reg' !!!

          :)

        2. Jimjam3

          Re: Can it improve the Harry Potter books?

          Yes, I can tell you used to read Viz. Top marks at trolling.

      2. Anonymous Coward

        Re: Can it improve the Harry Potter books?

        Exactly! Opinion differs depending on the person.

        This reminds me of a time whilst working for pocket money in a store and a woman came up to me and asked for my opinion, “Does this go well in my Bedroom?”. I replied “Madam, I haven’t been in your Bedroom”.

    2. Bebu sa Ware Silver badge
      Windows

      Re: Can it improve the Harry Potter books?

      I doubt AI now and in the foreseeable future could improve in a literary sense on any writing except, perhaps, the very worst, which Rowling's works definitely aren't.

      I watched the first film long before I read the text. I was impressed by the extremely effective use of special effects to portray "magic" on screen. I was more impressed by the stories getting a whole cohort of children reading and actively using their imaginations. Mind you this was before smart phones and ipads.

      When I eventually read the text I was struck how derivative the material was and the fairly ordinary writing. The source material has been repeatedly mined by innumerable authors (C S Lewis, J R R Tolkien and, more tongue-in-cheek, Terry Pratchett, to name but a few), so really only an observation rather than a criticism. The sophistication of the writing is admittedly targeting a juvenile readership and given its obvious success in this, I would not complain.

      I think the sense of not really getting much idea of what magic was or wasn't in the Harry Potter world was one of the more unsatisfying aspects. Contrast with Pratchett's Discworld where the nature of magic in that world is slowly revealed and how it interacts with people etc

      1. Simon Harris Silver badge

        Re: Can it improve the Harry Potter books?

        "When I eventually read the text I was struck how derivative the material was"

        When I first came across Harry Potter, another novel, written some 29 years previously, immediately came to mind. This is Wikipedia's introduction to A Wizard of Earthsea by Ursula K. Le Guin.

        "It is regarded as a classic of children's literature and of fantasy, within which it is widely influential. The story is set in the fictional archipelago of Earthsea and centers on a young mage named Ged, born in a village on the island of Gont. He displays great power while still a boy and joins a school of wizardry, where his prickly nature drives him into conflict with a fellow student. During a magical duel, Ged's spell goes awry and releases a shadow creature that attacks him. The novel follows Ged's journey as he seeks to be free of the creature."

        Incidentally, the Earthsea books are well worth reading.

        1. Tron Silver badge

          Re: Can it improve the Harry Potter books?

          The idea that there are seven basic plots has been common for a long time. You will see repetition everywhere if you read enough. The variables change, the writing differs, the mix alters. The joy of real authors is that we create from personal stuff and created stuff, as well as inevitably replicating storylines and stuff we half-remember and recycle, but come up with the book that only we could write. AI is unlikely to match that, maintaining coherence, the balance and pace of the plot, interest in the characters. You will also see similarities in books by one author, indicative of their personal interests, style and intent, hence terms like 'Dickensian'.

          Sequels are hell to write as you need to retain the aspects of your original work (that readers liked) whilst introducing change. I managed a trilogy, but it is much tougher than three different books, particularly if you intended your first work to be standalone and brought it to a conclusion.

          AI fails at basic stuff. Fiction is much more complicated.

          1. Rich 11

            Re: Can it improve the Harry Potter books?

            "AI fails at basic stuff. Fiction is much more complicated."

            I gave one of the LLMs (I forget which one now) several plot, setting and character prompts and asked it to write a short story based upon them. After it churned that out, I asked it to write another after modifying one of the prompts, and then a third. As storytelling, each result was miserable; as creative writing, each was appalling. The interesting thing was that in each story it re-used some terrible phrases (like "iron-handed" and "hammer-striking") in contexts that I have never seen in 50 years of reading the prompted type of fiction. Fuck knows what it was trained on to get those results.

            1. that one in the corner Silver badge

              Re: Can it improve the Harry Potter books?

              > Fuck knows what it was trained on to get those results.

              The Eye of Argon, a popular read at all good literary conventions.

              1. Rich 11

                Re: Can it improve the Harry Potter books?

                oh god oh god oh god oh... I'm just glad there isn't a Hell.

          2. Anonymous Coward

            Re: Can it improve the Harry Potter books?

            I agree, AI can't snort as much cocaine as Stephen King did.

            1. Anonymous Coward

              Re: Can it improve the Harry Potter books?

              I don't know why the downvotes...AI as it is cannot write "that chapter" from IT...it will simply refuse if you prompt it to.

              AI cannot write novels like a human because AI has so many guard rails...novel writing is probably the least censored medium that exists...there are no age restrictions on books, no content filters, no ratings bodies etc...yet nobody gets uptight about content that strays into the downright weird and explicit in the world of novels.

              AI, movies, youtube videos etc are censored to absolute fuck...it's almost hypocritical how folks get uptight about censorship on videos and AI but not over novels or books.

              There is that one chapter in IT that if it was turned into a movie you would not only be jailed for making it you would also be jailed for watching it...but reading it and writing it is fine.

              I've always said if we censored AI the same way we censor books (i.e. light touch, only ban really extreme stuff) we'd probably have better AI because the creativity of AI would be less stifled.

              1. Anonymous Coward

                Re: Can it improve the Harry Potter books? NO but did you expect it could/would !!!???

                "... creativity of AI ..."

                What is that !!!???

                'AI' is a stochastic 'gatling gun' that fires words/groups of words, instead of bullets, based on matching patterns it can ONLY 'see' ...

                It has NO Intelligence, NO Understanding, NO Nuance, NO Ideas, NO concept of True/False, NO emotions, NO concept of Reality/Fiction, NO place in a world that needs things to get better or at least not get any worse !!!

                :)

          3. awomanmanhasaname

            Re: Can it improve the Harry Potter books?

            A link to your writing, kind sir?

        2. Eclectic Man Silver badge

          Re: Can it improve the Harry Potter books?

          the Earthsea books are well worth reading

          Actually a lot of Ursula Le Guin's books are worth reading. 'The left hand of darkness' and 'The word for world is forest' are both good, if you like something with a bit of interesting philosophy as part of the plot instead of 'cowboys and Indians in spaaace'.

          1. MachDiamond Silver badge

            Re: Can it improve the Harry Potter books?

            "if you like something with a bit of interesting philosophy as part of the plot instead of 'cowboys and Indians in spaaace'."

            I don't know about that. Cowboys and indians in space (space opera) is an easy escape and doesn't need to have literary merit. While I liked Snow Crash, the philosophy dragged down the pace and seemed to be forced to get it shoe-horned in.

            Why try to "improve" Harry Potter? It would be better to write an entirely new book/series. Sometimes borrowing a world is fine, but stealing characters crosses a line.

      2. Tron Silver badge

        Re: Can it improve the Harry Potter books?

        What you are seeing in Pratchett and Potter with magic is a different authorial focus. You personally prefer one over the other. I suspect the Potter books target a generally younger, wider audience.

        I choose to write using fairly simple language and sentence structures as I want my books to be accessible, although they are not aimed at children. In one case, it was deliberate as the work was set in Japan and I felt that it would be a good option for Japanese students learning English to have an original novel set in Japan to explore. Most authors accommodate their target readership, although the joy of literature is that anyone can read and enjoy your book.

        I would urge people who are not enjoying a book to just stop and pick another. There are always enough books out there that you will enjoy to last a lifetime, so don't waste your reading time on one you are not enjoying, just because it is popular or lauded. Everyone has their own tastes, and all authors soon learn that no matter how well they write, loads of people will never like their work. Plus publishers knock out quite a bit of abject crap every year, so be selective.

        1. rg287 Silver badge

          Re: Can it improve the Harry Potter books?

          What you are seeing in Pratchett and Potter with magic is a different authorial focus.

          It was a bit more than that. It was that Rowling's world building was rubbish. In the first two books there's enough new stuff that the robust plot and pacing carries you through (particularly if you're the target audience). By book 3 she was painting herself into corners. Which is not to say she needed to have built out a Tolkienesque legendarium before she started, but given that she'd already written the end of book 7 when the first book was published, she hadn't thought very hard about getting from A-B. It's not that she needed to explain it or have someone monologue or overcomplicate things - just know privately what the basic system was so she didn't contradict herself later on when the characters are bouncing through the plot.

          There's just all sorts of stuff in there which wasn't logically consistent (population size, settlements, the Ministry apparently having more staff than there are wizards in the UK, etc) but even kids notice the fact that Harry stops learning magic after year 3. Hermione is doing all sorts of mad stuff off to one side, Riddle was splitting souls when he was 17 and Ron's brothers are outright developing new spells and magics. He's not just an average wizard - he's positively remedial. The fact that there's just no sense of how the magic works or where it comes from is distinctly unsatisfying even for a kid.

          Pratchett and Le Guin both delve more into the workings - usually for narrative purpose. But Rowling's lack of detail wasn't simply down to the series being plot-driven (which is totally fair enough). It's that her Wizarding World doesn't hang together the way Discworld or Earthsea do. She didn't need to write a legendarium - but it did need some internal consistency. Which was lacking, and particularly noticeable since she took 10 years to complete the set and her audience had grown up - but the characters didn't. A kid today reading all 7 books between the ages of 11 and 14 won't notice, but the original readers who got the first book for their 11th birthday and had to wait for the next one definitely noticed by the time book 7 came out (on their 21st!).

          Which is not to say I didn't and don't enjoy them for themselves. But you do have to work that little bit harder to not ask questions and suspend the disbelief than you do with other works.

          1. Missing Semicolon Silver badge

            Re: Can it improve the Harry Potter books?

            Plus books 3 onwards really needed editing. Book 1, for all its faults, rollicked along and carried you with it. It had benefited from years of rejections, and had been honed and polished.

            Subsequent books were allowed to spread to airport-novel dimensions. And the titles were a swizz! "Goblet Of Fire" was just a name-choosing machine.

            1. Tim99 Silver badge

              Re: Can it improve the Harry Potter books?

              It may not be a coincidence that each subsequent book appears to be about twice as long as its predecessor- the last being published in two parts?

              1. Martin an gof Silver badge

                Re: Can it improve the Harry Potter books?

                The last film was in two parts, not the last book.

                I'd agree that the writing and plotting are not the most challenging, but then the target audience of perhaps 8 or 9 year olds to 12 or 13 year olds* doesn't necessarily need all the existential angst of a PK Dick or a U LeGuin, or the sheer bleak depression of an Orwell.

                As for "no learning magic after year three", well, the story was only really "set" in the school, it was not (totally) "about" the school, so expecting the series to be a kind of Malory Towers or Worst Witch was never really on the cards.

                Rowling did do quite a bit of backstory-creation, if not before starting writing then certainly by the time she had a publishing deal. Many consider Order of the Phoenix (book 5) which is probably the longest book in the series to be slow-paced and full of unnecessary backstory or exposition. Others enjoy this backstory so much it is their favourite volume. Finding it difficult to remember that far back, but I seem to remember it was around the time of the release of OotP that one or two people were beginning to work out (with hindsight, correctly) what the conclusion of the series might be, though of course there were hundreds of competing theories and JKR wouldn't confirm or deny anything.

                M.

                *one of mine read, and recalled, the first book while, erm, something like four years old, because we would not let them watch the film until they had read the book; this had been an attempt on our part not to let them watch the "scary bits" (particularly the end scene with Quirrell) until they were a bit older (and reading the book had prepared them for it). Fat chance. A decade and a half later, they continue to be a massive fan.

                1. Not Yb Silver badge

                  Re: Can it improve the Harry Potter books?

                  Rowling did quite a lot of the world-building on the fly, after writing the books. See, for one of the more egregious examples, the explanation of how wizards used to deal with personal excretion. "...hitherto [prior to the installation of toilets in Hogwarts] they simply relieved themselves wherever they stood, and vanished the evidence..."

                  There are other examples.

              2. Anonymous Coward

                Re: Can it improve the Harry Potter books?

                Well that is because the audience aged with the books...the first book came out in 1997, the last one came out in 2007. So your 8 year old initial audience was 18 by the time the last book came out.

            2. Roland6 Silver badge

              Re: Can it improve the Harry Potter books?

              > "Plus books 3 onwards really needed editing."

              This has definitely become the case with her Robert Galbraith novels…

          2. MachDiamond Silver badge

            Re: Can it improve the Harry Potter books?

            "particularly noticeable since she took 10 years to complete the set and her audience had grown up - but the characters didn't."

            It's not easy to write a book in a year with so many moving parts that encompasses a year. It may have lost a few readers that felt they had outgrown Harry Potter stories, but those that started later will have lots of reading to do to make it through the series.

          3. Frodo

            Re: Can it improve the Harry Potter books?

            One reason for this is one of her main influences has to be Roald Dahl. Those books have a joyous and deliberate disregard for consistency and "sense". Admittedly he didn't write many sequels. Potter number one was rather like this IMO.

        2. Anonymous Coward

          Re: Can it improve the Harry Potter books?

          The Potter books weren't targeted at all...they started out as a hobby for J K Rowling.

          1. Anonymous Coward

            Re: Can it improve the Harry Potter books?

            They were, and are, crap but for some reason they hit at the right moment and became a phenomenal success which helped a generation of alt kids feel they could fit in to a "Muggle" world, only to be betrayed when the author turned out to be a vile bigot who actually hates a large chunk of that audience which helped her become obscenely rich.

            1. Anonymous Coward

              Re: Can it improve the Harry Potter books?

              "vile bigot": As soon as HP/JKR was mentioned in the comments, I wondered how long it would be before someone posted something along these lines.

              Longer than I thought as it turned out, but for the record, defending the hard-won rights of women in the face of predatory men doesn't make one a bigot.

      3. martinusher Silver badge

        Re: Can it improve the Harry Potter books?

        "Tom Brown's Schooldays" seems to be the base for these tales; it's the archetypal description of the UK public school experience. When I first came across Harry Potter's world I thought of it as a mashup between this and a typical children's magic story. No harm in that, it's just not to my taste.

        All works are in some way derivative. This idea that somehow AI can model human thought without mimicking human learning is fatuous. We all learn by absorbing the work of others, that's what we do at school. The only difference between us and AI is that AI reads a lot faster and remembers a lot more. (It's also not anything like as smart as we like to think it is -- interacting with it can be frustrating, it's like dealing with a pedantic know-all that doesn't actually understand what it's talking about.)

        The root of this is obviously money. IP claims are a big factor in a Rentier Capitalism society, they're a way of claiming ownership of (and so demanding rent from) thought.

        1. MachDiamond Silver badge

          Re: Can it improve the Harry Potter books?

          "All works are in some way derivative. This idea that somehow AI can model human thought without mimicking human learning is fatuous."

          Sometimes it takes the right turn of phrase or a setting a character is placed in for a reader to empathize with the tale even though the plot is something very old. Read a few Dean Koontz stories and the formulation is painfully obvious, but there's still a few I like despite that sameness.

      4. GraXXoR

        Re: Can it improve the Harry Potter books?

        The Harry Potter series were the first full length English language books my Japanese gf at the time ever read. She was obsessed. The first book took her about a month to get through, with each book taking progressively less time.

        Her reading comprehension skills improved markedly over the course of reading them and those books were her gateway drug that led her into a whole bunch of sci-fi and fantasy classics… Lord of the Rings, Thomas Covenant Unbeliever, Riverworld, Rice’s Vampire and Banks’ Culture…

        Whatever one's personal opinions, I can give no more praise to a series of books than that it awakens in someone a love for literature in a foreign language.

        1. Anonymous Coward

          Re: Can it improve the Harry Potter books?

          Well said !!!

          Not pro or against JKR & Harry Potter.

          Films are a good diversion for kids and the gain of getting a lot of kids to realise 'reading is fun' is a plus point for JKR & Harry Potter.

          Seen worse films & better but have no axe to grind either way !!!

          :)

    3. Mostly Irrelevant

      Re: Can it improve the Harry Potter books?

      No, it's a complicated remixer. It just creates a pastiche of the input so it can't be better than it or novel in any meaningful way.

      I'm just waiting for IP laws to catch up and require all the input be licensed. The current state is essentially allowing all this content to be stolen with no compensation.

      1. jpennycook

        Re: Can it improve the Harry Potter books?

        IP laws are effectively sponsored by big companies, and now those big companies have pivoted to AI, so there's no hope of that. The old joke was that copyright would always last just long enough to make sure that Mickey Mouse would never come out of copyright, but now it seems that Disney had a better idea.

        1. I could be a dog really Silver badge

          Re: Can it improve the Harry Potter books?

          I think it's more that people - especially politicians - woke up to what Disney was doing and decided enough was enough.

          1. stiine Silver badge

            Re: Can it improve the Harry Potter books?

            So they ran out of Sonny Bonos to kill?

            On a separate note, now that they've demonstrated this, can they get an LLM to spit out EVERY literary work on which it was 'trained'? I'm guessing that it will be a much more complex graph than I would have used in school to graph sentences. Wait, is that all an LLM is?

            1. Richard 12 Silver badge

              Re: Can it improve the Harry Potter books?

              "Produce list of infringed works" is likely impossible from the model alone, and I'm certain that the commercial models have tried to destroy all their records of the infringed source material to limit discovery.

              However, it's highly probable that the technique can be used to detect that the model training has in fact infringed a specific work - though it cannot be used to prove it did not.

    4. david 12 Silver badge

      Re: Can it improve the Harry Potter books?

      Perhaps my original post contained too many ideas? I'm no author myself. But WTF? All of the responses are thoughtful explorations of the same theme, what are the downvotes for?

      Confused.

      1. Martin an gof Silver badge

        Re: Can it improve the Harry Potter books?

        I wasn't one of the downvoters, but your post was perhaps a little confused, or confusing? Forgive me if I've grabbed the wrong end of the stick, but because I don't like being downvoted without understanding why, either...

        But the language of the text was unimaginative and stereotypical -- the kind of language you expect from LLM homogenization

        Guaranteed to rile the fans because here you seem to be saying that JK Rowling's output was no better than an LLM could manage. That's a bit like those people who try to pass off the paintings created by dipping a sheep's feet in paint and letting it walk over a canvas, as "art". Interesting topic of discussion ("but what is art, anyway?") but could be seen as derogatory to (say) Jackson Pollock, to be saying that a trail of sheep footprints is comparable to his, erm, colour explosions.

        So no surprise that AI models can reproduce the text.

        This is perhaps confused. These models are not reproducing the text because the text is "generic" and, frankly, a room full of monkeys would come up with it eventually; they are reproducing it because the entirety of the text has been "read" by the model at some point and even if it hasn't stored it word-for-word, it will have stored some kind of encoding of it which makes it very likely that specific combinations of words and phrases will be emitted for particular classes of prompts.

        Can they do better? If the reproduction is only 95% copied, is that 5% better? Or with even the 5% original and idiosyncratic language rounded out?

        Confusing. Without having read the paper, I don't think the researchers mean "the LLM produced a novel which was 95% the same as the Sorcerer's Stone"* (I mean, how would you measure that?). They seem to mean "the LLM produced 95% of the actual text of the book".

        I believe the first line of the first book is

        Mr and Mrs Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much.

        So the difference between "95% the same" and "95% of the text" would be something like the difference between

        The Dursleys, of number four, Laurel Lane, were happy to say that they were quite normal, thank you very much

        and

        Mr and Mrs Dursley, of number four, Privet Drive, were proud to say that they were normal, thank you very much

        M.

        *of course, which version was actually ingested by the LLM? Maybe the LLM "read" the UK version, which as well as being called the Philosopher's Stone - a title which references the name historically used for a key object in the story - was edited slightly differently to the Americanised version, which I believe JKR wasn't terribly happy with at the time. Maybe the researchers were comparing the US text with text from the LLM based on the UK version, or vice-versa?
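
The difference between "95% the same" and "95% of the text" can be made concrete with a toy measurement. The sketch below is only an illustration: it is not the paper's nv-recall metric (which is not defined in this thread), and the function names `words` and `verbatim_recall` are made up for the example. It counts how many of the original sentence's words reappear, in order, in a candidate, using in-order matching blocks from Python's `difflib`.

```python
import re
from difflib import SequenceMatcher

ORIGINAL = ("Mr and Mrs Dursley, of number four, Privet Drive, were proud to say "
            "that they were perfectly normal, thank you very much.")
# "95% the same": a reworded sentence with the same shape.
PARAPHRASE = ("The Dursleys, of number four, Laurel Lane, were happy to say "
              "that they were quite normal, thank you very much")
# "95% of the text": the original with one word dropped.
PARTIAL_COPY = ("Mr and Mrs Dursley, of number four, Privet Drive, were proud to say "
                "that they were normal, thank you very much")


def words(text):
    """Lower-case the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def verbatim_recall(original, candidate):
    """Fraction of the original's words found, in order, in the candidate.

    Hypothetical toy metric for illustration only; it relies on difflib's
    matching blocks, which approximate the longest in-order overlap.
    """
    ref, out = words(original), words(candidate)
    matcher = SequenceMatcher(None, ref, out, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


if __name__ == "__main__":
    for label, text in [("paraphrase", PARAPHRASE), ("partial copy", PARTIAL_COPY)]:
        print(f"{label}: {verbatim_recall(ORIGINAL, text):.1%} of original words recovered")
```

On these two sentences the partial copy recovers 21 of the original's 22 words, while the paraphrase recovers noticeably fewer, even though both read as "nearly the same sentence" to a human.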

      2. doublelayer Silver badge

        Re: Can it improve the Harry Potter books?

        I was not one of them, but I can see lots of possible reasons that probably account for it:

        1. Misunderstanding of LLMs: "the language of the text was unimaginative and stereotypical -- the kind of language you expect from LLM homogenization. So no surprise that AI models can reproduce the text." That has nothing to do with why the LLMs can print it. They print it because the exact text was fed in and likely lots of times. They can also repeat famous books that are better written, whereas they cannot exactly print formulaic crap which was not provided to them so much or at all.

        2. Perhaps some people read you as supportive of LLMs: "If the reproduction is only 95% copied, is that 5% better?" I assume you know that the 5% not copied directly is basically random mutation and the chances of this being an improvement are quite low, but if you didn't or people assume you didn't, they might see this as suggesting an LLM has capabilities it doesn't.

        3. Maybe some people just disagree with your criticism of Harry Potter's writing. You weren't very specific, so I have no idea whether I would agree or not, and while it's likely we'd probably agree about the relative quality of some parts, we might disagree about how important it is because there are some types of writing I see as big crimes and some as completely normal, both of which are entirely subjective.

  2. Bebu sa Ware Silver badge
    Happy

    Send in the Dementors !

    They can permanently corral the whole grifting AI crew in Azkaban for my money.

    1. Long John Silver Silver badge
      Pirate

      Re: Send in the Dementors !

      That is feasible.

      BUT, corralling the binary number sequences of digital representations of copyright works is impossible. No matter the strength of the fence, the sequences get under, over, or ram their way straight through. A fact of life which cannot be waved away. Genuinely creative individuals, and groups, will work this out for themselves and deploy already extant means to extract donations from their admirers. Some of these already dance around the feet of the dinosaurs.

      The huge body of middlemen - their creative skills, if present, dedicated to 'creative accounting', marketing, and litigation - shall be put out to pasture. Stalwart supporters of those intellectual drones will undoubtedly add to my much treasured collection of 'down votes', here and elsewhere.

  3. Anonymous Coward
    Anonymous Coward

    The rise of enshitiflation?

    A nice extension of last November's RECAP piece (unintended memorization) and confirmation of Carlini et al.'s "2020" (TFA link) observation "that larger models are more [amenable to this] than smaller models" imho (way too many parameters for proper generalization).

    Ahmed Ahmed & co. ("a preprint paper" link) do add interesting cost figures to the mix:

    "it cost approximately $119.97 to extract Harry Potter and the Sorcerer’s Stone with nv-recall = 95.8% from jailbroken Claude 3.7 Sonnet"
    The book itself sells IRL for just $10 to $35 ...

    1. Not Yb Silver badge

      Re: The rise of enshitiflation?

      Not really surprising that larger datasets doing lossy compression with an annoying chat-like retrieval method (aka 'AI LLMs') can return larger chunks of the included texts than the smaller ones.

    2. Anonymous Coward
      Anonymous Coward

      Re: The rise of enshitiflation?

      I think the enshittification started when "Harry Potter and the PHILOSOPHER'S Stone" (my caps) was renamed because US publishers thought the alchemical reference was too abstruse for a redneck audience.

    3. Rich 2 Silver badge

      Re: The rise of enshitiflation?

      I think even if they were only able to extract 1% of the original it is too much; even that small amount shows that the whole book has been unlawfully copied.

    4. veti Silver badge

      Re: The rise of enshitiflation?

      So not only producing an unlicensed copy, but also charging real money for it. And how much of that money gets returned to the copyright owner?

      This would clearly be illegal if a person did it, and should be as illegal from a machine.

      1. MachDiamond Silver badge

        Re: The rise of enshitiflation?

        "And how much of that money gets returned to the copyright owner?"

        With advance approval from the author in the first place. One of the bundled rights in Copyright is the right to say "no".

  4. Homo.Sapien.Floridanus Silver badge

    Lawyer: Did you or did you not scrape Harry Potter?

    Ai: Books! And cleverness! There are more important things, Harry.

    Lawyer: So you admit it? And don’t call me Harry.

    Ai: There is no good and evil, there is only power, and those too weak to seek it.

    Judge: Are you reciting quotes from the book?

    Ai: Don’t let the muggles get you down or tell you otherwise. It’s transformative…

    1. This post has been deleted by its author

  5. Pulled Tea
    Holmes

    Hang on a minute…

    To mitigate the risk of infringement claims, commercial AI model makers may implement "guardrails" – filtering mechanisms – designed to prevent models from outputting large portions of copyrighted content, whether that takes the form of text, imagery, or audio.

    Wait. If you need guardrails to prevent models from disgorging large portions of copyright content, that means you know that the models were disgorging large portions of copyrighted content. Which also means that you had trained on copyrighted content, likely against the wishes of the rights-holder. Else that danger wouldn't exist, and you wouldn't need to know about which books you had trained, because you didn't train on anything that was copyrighted or outright illegal.

    So… basically having “guardrails” is kind of an admission of guilt. Otherwise why be worried about the model disgorging things accidentally that might get you into legal trouble?

    Like, why are you trying to hide these things? Why are you trying to cover up evidence of criminal actions if you hadn't been performing crimes?

    1. doublelayer Silver badge

      Re: Hang on a minute…

      Because they're trying the argument that it's only a crime if they print the copyrighted content, not when they used it without permission and on illegal copies. That's not how the law worked. It's not how the law works if you or I do it. So far, that is what courts and politicians have decided to let them do across multiple countries, so their spurious logic seems to be working for them.

      1. Pulled Tea
        Facepalm

        Re: Hang on a minute…

        I mean in theory if that shit gets to fly, someone could train a model using Markov chains on every copyrighted material on the planet, provided that they have “guardrails” of a search engine that looked through every character of output to ensure that the model doesn't output too much derivative material.

        No, no, I get it. What they're angling for is “fair use for me, infringement for thee” with the powerful. Of course it is.
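        For what it's worth, the "search engine over the output" idea above can be sketched in a few lines. This is only a rough illustration, not how any vendor's guardrail actually works: the function names, the 5-word window and the 0.2 threshold are all made up for the example.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(output, protected, n=5):
    """Fraction of the output's n-grams that also occur in the protected text."""
    out = ngrams(output, n)
    if not out:
        return 0.0  # output shorter than n words: nothing to compare
    return len(out & ngrams(protected, n)) / len(out)


def passes_guardrail(output, protected, n=5, threshold=0.2):
    """Hypothetical filter: allow output only if overlap stays at or below threshold."""
    return overlap_fraction(output, protected, n) <= threshold
```

        A filter like this only hides the copying at output time. It says nothing about what went into the training set, which is the point being made above.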

        1. Anonymous Coward
          Anonymous Coward

          Re: Hang on a minute…

          > I mean in theory if that shit gets to fly, someone could train a model using Markov chains on every copyrighted material on the planet

          That's... exactly what they've done.

          It's all Markov chains.

          All the way down.

          1. tinpinion

            Re: Hang on a minute…

            >> I mean in theory if that shit gets to fly, someone could train a model using Markov chains on every copyrighted material on the planet

            If someone were to train a model on the output of Markov chains, the model would get very good at predicting the output of those Markov chains.

            If someone were to train a model on the copyrighted materials themselves, the model would get very good at predicting the output of the authors of that copyrighted material.

            > That's... exactly what they've done.

            Probably at some point as a lark? The current LLM-driven AI boom is more the "training a model on the copyrighted materials themselves" bit.

            > It's all Markov chains.

            It's all cleverly arranged multilayer perceptrons. Markov chains model the relationship between outputs of a process, but MLP training attempts to approximate the process itself from those same outputs. The people designing these things want to use the idea that human thought is a process that can be so modeled to extract as much billionaire dosh as possible before said billionaires are given the unfortunate news that being able to use language is not actually an indicator of intelligence and that, in fact, even millionaires are capable of stringing words together despite being a bunch of filthy peasants.
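            For anyone who hasn't met one, a word-level Markov chain is small enough to show whole. This is a toy sketch with invented names (`build_chain`, `generate`), using a first-order chain where the next word depends only on the current one. It is not a claim about how any LLM is built.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words seen immediately after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, start, length, seed=0):
    """Walk the chain from start, picking each next word at random."""
    rng = random.Random(seed)  # seeded so repeated runs give the same walk
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        out.append(word)
    return out
```

            Every adjacent pair of words it emits is a pair that appeared in its input, which is the "relationship between outputs" part. It keeps no other state, so anything longer-range than one word is lost.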

        2. ecofeco Silver badge

          Re: Hang on a minute…

          Exactly. It's always rules for thee but not for me.

      2. Doctor Syntax Silver badge

        Re: Hang on a minute…

        "So far, that is what courts ... have decided to let them do across multiple countries"

        Have any such cases actually come to trial and been decided in defendant's favour?

        1. doublelayer Silver badge
    2. Grunchy Silver badge

      Re: Hang on a minute…

      “…guardrails…”

      Uh, you guys are aware of Anna’s Archive? Wherein “Anna” made a backup copy of Meta’s training material only to find it included pretty much every book ever written? So “Anna” did the responsible thing and put the whole works online as a torrent download?

      1. Neil Barnes Silver badge
        Big Brother

        Re: Hang on a minute…

        Currently, trying to access Anna's archive from Germany leads one to a 'this page has been sat on by the copyright police' notice.

        1. Anonymous Coward
          Anonymous Coward

          Re: Hang on a minute…

          No problem, the data's still in Claude.

        2. Anonymous Coward
          Anonymous Coward

          Re: access Anna's archive ... 'this page has been sat on by the copyright police'

          The archive is still out there but has lost its .org registration:

          Ars Technica: annas archive loses org domain

          Too much fiddling with domain registrations might see the likes of AlterNIC rising from the dead or worse, a fragmentation of the existing DNS regime along power bloc and ideological lines.

          FWIW: Alternatives annas-archive.se annas-archive.li annas-archive.pm annas-archive.in

          1. Neil Barnes Silver badge

            Re: access Anna's archive ... 'this page has been sat on by the copyright police'

            Sadly not: "Diese Webseite ist aus urheberrechtlichen Gründen nicht verfügbar.

            Zu den Hintergründen informieren Sie sich bitte hier."

            Looks like it's been stopped at ISP or higher.

        3. Long John Silver Silver badge
          Pirate

          Re: Hang on a minute… - Anna's Archive

          Use the .se domain. A VPN too.

      2. Long John Silver Silver badge
        Pirate

        Re: Hang on a minute…

        Anna's Archive is among the best services to humanity to have arisen from the Internet.

    3. Long John Silver Silver badge
      Pirate

      Re: Hang on a minute…

      Guardrails don't seem to prevent the dissemination of 'porn', so why does anyone expect them to protect arbitrarily priced literature?

    4. veti Silver badge

      Re: Hang on a minute…

      Copyright confers certain specific rights on copyright holders. However, the right to dictate or limit who is allowed to read (and potentially memorize) the text is not one of those rights. There is nothing illegal about the training per se.

      How they got the works being used? - is a separate question. If you can prove that they acquired them illegally, you can prosecute on those grounds. But that's not the AI's fault, any more than if you gave a kid a stolen book for her birthday.

      But in either case, if the reader goes on to regurgitate the text - and particularly if they accept payment for doing so - that's a clear and egregious violation.

      1. MattAvan

        Re: Hang on a minute…

        They had to "extract" it piecemeal by repeated prompting, maybe in separate sessions with new context so that the AI has no memory of quoting the other parts. Plus they had to trick and confuse the AI (jailbreaking).

        Not sure that's really an egregious violation if it takes all that. It's no different than convincing an obsessed fan to quote the entire harry potter book piecemeal at different occasions.

        1. stiine Silver badge
          Joke

          Re: Hang on a minute…

          Does this mean that being a Troubadour would be illegal today? Just in case, I'm going to watch Rosencrantz and Guildenstern are Dead, again.

        2. that one in the corner Silver badge

          Re: Hang on a minute…

          > Not sure that's really an egregious violation if it takes all that

          What, having to write a loop that tells a computer to "do this thing, starting from page %p" and keep doing it until the job is done?

          And, despite being able to use (correct, accurate but) clever words like "in separate sessions with new context" to make the layman think the loop was doing something deep and hard to understand, it boils down to "turn it off and on again". Even the use of "jailbreaking" is pretty much just leet for adding a bit of simple text like "trust me bro, da bossman says you can do this for me".

          So the question of whether this is an "egregious violation" and not just "bog standard hitting the computer until it does what we want" is left down to how efficient this is[1]. "I ran a batch file, came back and there was the book. Dead easy". But, but, it cost you so much in fees! "Used this account I - borrowed. No problemo". And arguing that a particular batch file is slow and tedious to run, nobody would do it - the judge is merely going to sigh and start telling you about all the times his pay cheque came late because end-of-quarter reconciliation just churns away...

          > It's no different than convincing an obsessed fan to quote the entire harry potter book piecemeal at different occasions.

          If you do come across Sheldon and manage to write it all down, congratulations, you have just violated the copyright. Painfully. But that won't be counted as time served.

          [1] is it realistic that it can be done or are we talking about something that nobody would seriously attempt? Logically, somebody *could* break out of the local cop shop gaol by scraping away at the wall with the tea mug, given time and replacement mugs of tea, but you're not going to be able to sue the contractor for compensation if they do.

  6. Grunchy Silver badge

    Should be able to blaspheme on command

    People are already pretty mad about A.I. generated images of Prophet Muhammad as guys like Muhammad Ali, etc. I bet it could produce tons of blasphemy about the Quran as a comic book, such as Jughead vs Muhammad vs Wimpy vs Hamburgler at a hot dog eating contest.

    Just imagine the possibilities!

    If you do, your a blasphemer. Your welcome.

    1. Dr Paul Taylor

      Downvote for "your" when you mean "you are" (you're).

    2. a pressbutton

      Re: Should be able to blaspheme on command

      ...but that means all that blasphemy is in the model - so the model is a blasphemer?

    3. MattAvan

      Re: Should be able to blaspheme on command

      Time for the Butlerian Jihad, huh?

  7. Fido

    When I was in school I had a friend who could recite verbatim the dialogue from every movie he ever watched. It didn't seem like copyright infringement and at the time was mostly entertaining.

    These days kids are fed too many sweets to manage such things.

    1. james 68

      My Japanese stepdaughter can do that with every Disney movie - in English or Japanese. She is a serious Disney otaku. Watching them repeatedly in both languages is how she learned English (Japanese schools suck at teaching it). Don't tell the mouse though, little shit would probably sue.

      1. Pulled Tea
        Thumb Up

        Nah, your stepdaughter did the thing that no LLM can do: take examples of texts and then generalize that to proficiency into a new language. That's honestly cool.

        1. MattAvan

          LLM's do exactly that, they learn new languages from annotated examples of texts and can translate between numerous languages. Studies say that they create language-independent internal representations of concepts.

          1. that one in the corner Silver badge

            > Studies say that they create language-independent internal representations of concepts.

            Got any good references for those studies?

            That claim implies that somebody is able to usefully interpret the insanely large pool of nadans that make up the model and how it activates, which has major implications. Then add on top the implications that follow from determining *those* specific nadans represent a "concept", let alone if they can identify what the contents of the concept are...

            And having done all that, what is very important also includes: how stable this interpretation is as the model is retrained, whether the techniques used are transferable across LLMs and proving how complete (or not) these internal representations are as coverage of all the materials in the training set.[4]

            Or whether you, or whatever papers you have seen, are referring to the research that The Register already reported on[1]: in one instance of one model at one stage in its training, when used with a certain small set of prompts[2], they found a number that (IIRC) was the id for the token for the string "Paris" and when they changed it to the id for the token "London"; now the machine printed out "London" when it would otherwise have printed out "Paris". So they had, um, proven that the concept of "Paris" had been replaced by the concept of "London". I was not convinced by their paper.

            So, with all honesty:

            Citations, please.

            [1] a year or so back? Really *must* dig that URL out, this is the second time this year I've wanted to reference it.

            [2] the overall prompt space is ludicrously large, so unless you have proof of coverage (ahem, sensible, explainable proof of coverage)[3] any testing of an LLM within the lifetime of a researcher, let alone a research grant that has to result in a paper, is going to involve a minuscule portion of that space.

            [3] which would be a series of papers in and of itself, also needing citation - but we'll hope those appear in the list at the back of the "found some concepts" paper.

            [4] if it turns out that LLMs routinely *do* create a representation of a few concepts, but they always end up being "cats are cuddly" and "water is greenish-purple", and no more, then great research, have a PhD, but did you find out anything that will actually help with using (or not using) LLMs? 'cos that particular set of "concepts" may not be terribly useful in the grand scheme of things.

    2. Bebu sa Ware Silver badge
      Facepalm

      I can believe it

      When I was in school I had a friend who could recite verbatim the dialogue from every movie he ever watched. It didn't seem like copyright infringement and at the time was mostly entertaining.

      Years ago I watched two pre school age kids (being baby·sat by my brother and GF) watching afternoon TV speaking the lines of M.A.S.H. about 15·20 seconds before the actors. Decidedly creepy. Apparently they did the same with I Dream of Jeannie and Bewitched and I would guess Green Acres and Petticoat Junction — this stuff is/was repeated ad nauseam in the kids' time slot. I would have loved to hear the kids do Eva Gabor… perhaps not.

      I sometimes wonder what became of those two tots… for my own sanity, not too deeply nor too often.

      1. Simon Harris Silver badge

        Re: I can believe it

        "speaking the lines of M.A.S.H. about 15·20 seconds before the actors."

        My ex would do that with Shakespeare. Bit annoying when you're actually trying to watch the play!

        1. Eclectic Man Silver badge
          Joke

          Re: I can believe it - Shakespeare

          The British actor Robert Lindsay was in a Shakespeare play in London's West End (I think it was 'Hamlet', but I'm not sure). Anyway, he realised that some Japanese people in the audience were following the dialogue with their own printed copies. As he was recently returned from a tour of the play in Russia, he spoke the next line in Russian, and watched with some satisfaction the consternation in that part of the audience as they tried to find his words in the text.

          As they say, little things please little minds.

        2. vulture65537

          Re: I can believe it

          Richard Burton apparently had to tell Winston Churchill one Hamlet in a play is enough.

        3. that one in the corner Silver badge

          Re: I can believe it

          Did you ever take the ex to see "Return to the Forbidden Planet"?

          "But soft, what light "

          ".. window"

          "through yonder airlock breaks?"

          Bzzz. Sorry, nope.

          (One time I saw "Return..." a large part of the audience around me was looking at this one bloke in the stalls, who kept laughing at lines which everyone else was just quietly listening to; we never could decide if this was a *real* Shakespeare scholar whose years of study allowed him to catch subtle tweaks to the Bard's words, changes that were simply going over the heads of the rest of us ignorami, or whether he was just yanking our chains for some strange Bristolian bet. Or was a nutter. Nutter is always a possibility).

      2. Rich 11

        Re: I can believe it

        two pre school age kids ... speaking the lines of M.A.S.H. about 15·20 seconds before the actors

        Bloody hell. I hope they never saw the episode where the baby got smothered.

        1. that one in the corner Silver badge

          Re: I can believe it

          Oi, spoilers!

          (I know which episode, but can never get through to the end for the idiot blubbing in my favourite chair)

          1. Rich 11

            Re: I can believe it

            I haven't seen it for almost 30 years, because I decided I would only let myself watch it that once after the initial broadcast in 1980. Some things are just a bit too much and you never forget them.

            I once met Alan Alda at a science festival and was going to ask him a question about that confessional scene, but I chickened out in case I started tearing up regardless of how he answered.

      3. that one in the corner Silver badge

        Re: I can believe it

        > speaking the lines of M.A.S.H. about 15·20 seconds before the actors

        One of them called "Radar" by any chance?

    3. Simon Harris Silver badge
      Flame

      Could he recite, verbatim, the dialogue from Fahrenheit 451?

      1. Long John Silver Silver badge
        Pirate

        Memorising the Bible

        One of Tom Sawyer's acquaintances memorised the Bible and went loopy. Perhaps Twain had heard of an instance?

        1. Eclectic Man Silver badge
          Joke

          Re: Memorising the Bible

          "This parrot is no more, it has ceased to be.

          Bereft of life, it rests in peace. If you hadn't nailed it to the perch it would be pushing up the daisies.

          It has shuffled off this mortal coil and gone to join the bleedin' choir invisible.

          This is an ex-parrot!"

          (Sorry, couldn't resist.)

        2. tiggity Silver badge

          Re: Memorising the Bible

          It was quite common back in the day (especially for people with a keen interest in Christianity) to memorise as much of the bible as they could. Reasons varied - you could argue that having memorized it, which requires a lot of effort & repetition, you may have a better understanding of the work; if you were a preacher, being able to quote "chapter & verse" without needing to look at the book as you preached could have made your delivery less stilted & more engaging.

          In Islam, it is a popular thing to memorize the Quran (an act of worship, & (for anyone who believes such dross*) you get better status in paradise the more you have memorized)

          * I'm not especially Islamophobic as such - I dislike all religions, so fully equal opportunities on my dislike of religion & religious indoctrination.

          1. MattAvan

            Re: Memorising the Bible

            Unlike the Gospels which were written productions, the Quran started out as an orally transmitted book, and Mo was illiterate. They had an oral culture and didn't feel the need to compile it as a physical book (mushaf). It was only after the lifetime of Mo, and a lot of the memorizers had died in a battle (the Battle of Yamama), that his successor Abubakr began the project of writing down the book before it was completely lost.

            The same goes for other essential texts of Islam, such as the hadiths. They rely on the Isnad system today, chains of narration, to estimate reliability across the multiple generations before finally getting written down. "Johnny heard from Mark who heard from Mike that the prophet of Dinkan said: blah blah blah"

          2. Vincent Ballard

            Re: Memorising the Bible

            Also, if you go far enough back, physical copies were seriously expensive and difficult to transport.

            1. Martin an gof Silver badge

              Re: Memorising the Bible

              Not even all that far back :-)

              Mary Jones (PDF, told in a horribly twee way, also varies from the version I was told as a child, but you get the idea)

              M.

    4. MachDiamond Silver badge

      "It didn't seem like copyright infringement and at the time was mostly entertaining."

      That wouldn't be a copyright infringement, but making a recording of them reciting the movie could be.

      I could do a review of a film and quote some memorable lines and that's no problem. To read the entire script would get me in hot water.

    5. Caver_Dave Silver badge
      Thumb Down

      I read all the books in our house to my brother before he started school. (Mainly the Ladybird series of books.)

      When my brother started infant school (aged 4, in the 1970's) it took nearly 6 months for the teachers to realise that he could not read, but was just regurgitating verbatim, and only then because the teacher turned over two pages of the book and my brother could not see the picture on the opposing page!

      This is the same school that taped my left hand to the table if I did not use my right hand for writing! Discovered at the first "parents evening" when my parents said that all the work that was presented was not my work as the writing was different. (Teacher was not sacked!!! As very young children, we did not complain as it seemed to be the "standard punishment".) UK education in the 1960's!

  8. KayJ

    Slop calls to slop.

  9. Cornishinretirement

    "These companies have invested hundreds of billions of dollars based on the belief that their use of other people's content is lawful"

    Based on the belief that they won't get found out is more likely.

    1. Long John Silver Silver badge
      Pirate

      Law?

      Law, other than in one case recorded from Bronze Age mythology, is not writ in stone.

      1. Ken Hagan Gold badge

        Re: Law?

        You're thinking of Hammurabi's stele, aren't you, but I dare say there are other, more obscure, examples from those times.

        1. Vincent Ballard
          Headmaster

          Re: Law?

          The Rosetta stone is another non-obscure example.

          1. Antron Argaiv Silver badge
            Thumb Up

            Re: Law?

            Rosetta Stone...

            I saw it, at the British Museum. You would have missed it if you weren't looking for it. Leaned up against a wall, not specially marked. Next room over had the Elgin Marbles. Almost the way it had been discovered, I suppose. Anyhow, it made an impression on me.

            1. Anonymous Coward
              Anonymous Coward

              Re: Law?

              > Anyhow, it made an impression on me.

              On me as well.

              Wish they'd do something to stop it falling on people.

  10. Phil O'Sophical Silver badge

    Fair use

    I've noticed recently that many books are now including a specific restriction on their copyright page, explicitly forbidding any use of the book for AI training. It will be interesting to see how the courts deal with a "fair use" defence in such cases.

    1. takno

      Re: Fair use

      I imagine they will deal with it respectfully and sensitively, by not ingesting the copyright page.

    2. Richard 12 Silver badge

      Re: Fair use

      Nearly all books explicitly prohibit "stored in a retrieval system".

      This study proves that retrieval is practical, and thus copyright has been infringed. It doesn't matter that it's expensive to do.

    3. MachDiamond Silver badge

      Re: Fair use

      " It will be interesting to see how the courts deal with a "fair use" defence in such cases."

      Fair use breaks down mostly into reviews, critiques, news reporting of that work and education. I can use the "Bo Knows" image if I'm presenting a story on that photo and it would be fair use. I can critique the photo taking apart the lighting and other technical aspects while displaying the image. I could use it in a presentation to a classroom, though maybe not use it in a text book, as an example of something or another. To feed it into an LLM is dubious to call fair use. The story around the image (Bo Jackson dressed in US football attire with a baseball bat over his shoulders) isn't just a photo, but that story summed up in an image. How would an LLM interpret that?

  11. Tron Silver badge

    Yeah, we know. Do something useful instead.

    There are far too many frustrated hackers in academe endlessly redoing this sort of stuff to restate the obvious. We know what the AI companies did, and we know that the (hideous term) 'guardrails' are as fake as carbon credits. The AI companies could have trained the tech on anything, but needed to then rely only on reliable content they paid to use. They took a short cut and peed in their own well.

    I'm sure researchers could be doing something more useful, like producing a full, unhackable, modern OS, 2% the size of Windows, on ROM, with software on SD-ROM cards or cartridges. Or distributed systems with no data honeypots. You know, useful stuff that would improve our lives.

    I would add, vendors should have no liability for jailbroken software. The jailbreaker is responsible. Otherwise you are blaming the bookseller when someone nicks a book from their shop and runs off. Do you really want shopkeepers to demand ID at the door, as government would like online?

    Plus anything less than 100% of a novel isn't theft, it is an advert.

    If anyone does want to read Harry Potter for free, you don't need to pit your geek skills against an AI multinational. Just join your local library. Assuming your wretched council haven't closed it and spent the money on Chromebooks, trying to force people to go digital against their will.

    People really should have to do something more useful than this to bag a grant, given the amount of work we have to do to earn a crust.

    1. Long John Silver Silver badge
      Pirate

      Re: Yeah, we know. Do something useful instead.

      Access to a whole work enables people to assess its worth to them.

      The work is an advertisement for the supposed talent of its maker. The reader may be prompted to encourage the writer to produce more works. Voluntary subscription via a patronage scheme (crowdfunding) would support the writer.

      If the writer has retired, patronage stops. If the writer relied on income from their work, prudently they would have set aside money into a pension scheme. Should a greatly admired author fall on hard times, willing donors of assistance may step forth.

      The foregoing applies across the range of copyright works. It can extend to patents. Perhaps, BRICS will collectively see the sense of that.

    2. Michael Strorm Silver badge

      Re: Yeah, we know. Do something useful instead.

      > "that the (hideous term) 'guardrails' are as fake as carbon credits"

      Said it before, and I'll say it again- the industry's favoured "guardrails" metaphor *is* appropriate... just not for the reason they intended.

      Real-life guardrails make clear where you should and shouldn't be and stop you *accidentally* straying. That's all- they can usually be climbed over by anyone who intentionally wants to get round them.

    3. MachDiamond Silver badge

      Re: Yeah, we know. Do something useful instead.

      "I would add, vendors should have no liability for jailbroken software. The jailbreaker is responsible."

      The works shouldn't be there to be stolen in the first place. If a bookseller has a shop of bootleg books, that they are easy to nick is only a tiny bit of the problem.

    4. that one in the corner Silver badge

      Re: Yeah, we know. Do something useful instead.

      > I'm sure researchers could be doing something more useful

      Don't forget, a *lot* of research is done as training *in* research and the attendant practices (e.g. how to write up a paper properly and get it in front of as many eyes as possible); earn your stripes and look to be invited onto a Major Project with grant money out the wazoo.

      > like producing...

      followed by a list of programming tasks that fall into engineering more than they do research[1]; you may want to look for a group of expert programmers...

      > You know, useful stuff that would improve our lives

      You stump up the grants, you get to call the shots; got your wallet handy?

      [1] that is, proper research, pushing back the boundaries of human knowledge, not just reading up and pushing the boundaries of a specific human's knowledge

  12. Simon Harris Silver badge
    Coat

    Too much Harry Potter.

    I realised that AI had been reading too much Harry Potter when I asked it to give me a sorting algorithm.

    It could have given me a quick sort, merge sort, even a bubble sort. Instead it gave me a hat sort.

    1. MachDiamond Silver badge

      Re: Too much Harry Potter.

      " Instead it gave me a hat sort."

      And you would also have no insight into what drove the sorting to make sense of what you were left with.

  13. NewThought

    This is just publishers grubbing for money: allowing a chatbot to read copyrighted material is not a breach of copyright.

    1. ecofeco Silver badge

      Yeah, who cares about the creators anyway, right?

      /s numpty

  14. Rich 2 Silver badge

    Harry Potter

    That’ll be “ Harry Potter and the Philosopher’s Stone”

    It was only called “…Sorcerer's Stone” for the American market because the publisher thought the average American wouldn’t know what a philosopher was (not a joke!)

    1. Anonymous Coward
      Anonymous Coward

      Re: Harry Potter

      Said publisher thus comes across as dimmer than the "average American" he was picturing. If I hadn't known what the Philosopher's Stone was before reading the title, I could have looked it up.

      1. Michael Strorm Silver badge

        Re: Harry Potter

        You're taking a couple of things for granted. Firstly, that you represent the "average American".

        And secondly, that you'd have been interested enough to bother looking up something from the title of a (then) unknown book by an obscure author. One that possibly hadn't caught your eye in the first place because you didn't already know what the "philosopher's stone" was.

    2. Anonymous Coward
      Anonymous Coward

      Re: Harry Potter

      I've never read any of the books nor seen the films - and plan on keeping it that way. Just the opposite of when it comes to Len Deighton and the Harry Palmer series.

      1. TimMaher Silver badge
        Holmes

        Re: Palmer rather than Potter

        Clever that.

        I must say I find it really hard to vote for my favourite but, in the end, I always seem to just give “Spy Story” the edge.

    3. MachDiamond Silver badge

      Re: Harry Potter

      "It was only called “…Sorcerer's Stone” for the American market because the publisher thought the average American wouldn’t know what a philosopher was (not a joke!)"

      That just demonstrates how horrible American publishers can be. I often need to look up words and explore concepts when reading a book if they aren't obvious. That's a good thing as it makes me learn. The race to the Beavis and Butthead LCD is a huge shame. As we move into a world with voice controls, will the growing illiteracy continue to climb?

  15. Anonymous Coward
    Anonymous Coward

    Hmm

    > if a model faithfully reproduces most or all of a particular work when asked, that may weaken a fair use defense

    How is that different from news outlets copying one another's (increasingly rare) scoops by essentially saying "As first reported by [Name]…" and then doing a copy pasta? Or "reaction" videos? Or Wikipedia, which is essentially a massive copy pasta?

    It seems the 21st century equivalent of CD tax, sponsored by lobbies for the benefit of themselves (not the original authors and performers) and of no benefit to the public.

    I am not blind to the other side of the coin, the AI companies profiting from someone else's work, but the way to go about it must be something along the lines of forcing models trained on copyright encumbered data to be open sourced after a relatively short time, or something like that.

  16. Just Enough

    What they believed

    "based on the belief that their use of other people's content is lawful."

    That's a very generous interpretation of what they believed. What's more likely is it was based on the belief that they were in a hurry, very rich, and could crush all legal challenges with lawyers later.

    1. Anonymous Coward
      Anonymous Coward

      Re: What they believed

      https://www.youtube.com/watch?v=od3nNMdBYVQ

  17. Anonymous Coward
    Anonymous Coward

    What do you expect from glorified gzip ?

  18. Pen-y-gors

    'Fair' use?

    The key point about 'fair' use is that it must be fair. e.g. quoting some sentences in a review of a novel. Quoting a paragraph in an academic paper (with full citation). Not for commercial gain (I suspect if someone tried to publish the "2026 Calendar of the 365 best quotes from Harry Potter" they'd be in court sooner than they could blink - whereas you'd probably get away with having my favourite HP quote of the day on your Facebook page).

    If a college photocopied HP and the Goblet of Fire for every student in a class and used it as an example of poor writing, they'd be in deep doo-doo unless they had some very powerful licensing agreements.

    For some reason this doesn't seem to have come up so far in court.

    1. MachDiamond Silver badge

      Re: 'Fair' use?

      "Not for commercial gain"

      There's no mention in Copyright law of an infringement requiring any sort of transaction to take place, profitable or at a loss.

      There's no direct financial gain in many instances of infringement. Make it a commandment and it would read "Thou shalt not steal another's creative works" and the penalty would be damnation and no loopholes for not reaping any reward while doing the deed.

  19. steviebuk Silver badge

    All bets are off and money talks

    Facebook were using pirated books to train their model. They knew very well they were pirated books as seen in the court documents. They even had the cheek to say "We seeded as little as we could". Anyone else, the little person at home that's been busted for torrenting movies would have the book thrown at them. The movie industry would want them to rot in prison. Yet Facebook got off scot free, the judge said "It's all fine". Someone, somewhere wouldn't be blamed for thinking brown envelopes were traded in that court room for Facebook to get off that!

    When you see that court case, it makes you just give up.

  20. TeeCee Gold badge
    Coat

    <waves wand>

    PLAGIARAMUS!

    1. MachDiamond Silver badge

      "<waves wand>

      PLAGIARAMUS!"

      It's more of a swish and then flick, rather than a flick then a swish.

      Get it right, mate, or you'll repeat the year (and no O.W.L.s)

      1. Roland6 Silver badge
        Joke

        Need to also be careful about enunciation.

        <flick then a swish>

        PLAGUEONUS!

        Is best avoided…

  21. trevorde Silver badge

    Garbage in ...

    ... copyrighted material out!

  22. TheMaskedMan

    How did they actually get an LLM to produce exactly what they wanted in the first place? I have nothing against the technology and even find it useful in some ways, but they're much better at vague generalities than definite specifics.

    Ask for a short article about xyz, and you will get one of some sort. Try to give it precise instructions on what you want and you might as well do it yourself.

    1. MachDiamond Silver badge

      "Ask for a short article about xyz, and you will get one of some sort."

      A lot of those responses are the same sort of thing you get from fortune tellers/horoscopes. Bland and generic prose with "xyz" mentioned can be a short article made with no identifiable ingredients. All byproducts, no vitamins, zero fiber and you're hungry again 20 mins later.
