OpenAI asks Uncle Sam to let it scrape everything, stop other countries complaining

OpenAI wants the US government to ensure it has access to any data it wants to train GenAI models, and to stop foreign countries from trying to enforce copyright rules against it and other American AI firms.

  1. Mike 137 Silver badge

    Being too demanding?

    "removal of known blockers to the adoption of AI tools, including outdated and lengthy accreditation processes, restrictive testing authorities ..."

    After all, you don't have to prove it actually works, all we need is for you to adopt it.

    1. HuBo Silver badge

      Re: Being too demanding?

      I'm surprised Kolochenko didn't pick up more on this given his creds at ImmuniWeb and in National Centers of Academic Excellence in Cybersecurity (NSA, CISA, FBI, NIST, NICE, NSF, DoD-CIO, USCYBERCOM) through Laurel MD's Capitol Technology University.

      I'd expect insecure agentic AI to have just about as high a potential for thoroughly messing up the Fed's cybersecurity as insecure Musk's bedraggled DOGE cyberattacks. After mElon, and the Manchurian Orange, it looks like both Altman and Pichai might have metamorphosed into unwitting tools of our most nefarious of cyberadversaries!

      1. Groo The Wanderer - A Canuck

        Re: Being too demanding?

        Altman and the rest of the Drumpf monkey boys need to wake up to the fact that the US does not rule the world and does not get to dictate law and policy to other nations...

        1. navarac Silver badge

          Re: Being too demanding?

          Trumpton and his con-men bully boys need to be told to piss off in no uncertain terms.

          (and stop his State Visit to the UK for starters. That'll upset Trumpton's ego big time.)

  2. Mentat74

    Hi ! We want to steal everything...

    And then sell it back to you...

    1. Joe W Silver badge

      Re: Hi ! We want to steal everything...

      Yes because paying for stuff is economically unfeasible...

      So let me just pirate all software, get all music from Napster (ok, that's no longer a thing, isn't it?) because actually buying stuff is economically unfeasible for me.

      1. Long John Silver Bronze badge

        Re: Hi ! We want to steal everything...

        You omit mentioning that the stuff people wish to 'steal' is sold at a price arbitrarily determined by the vendor. This despite digital 'content' lacking intrinsic scarcity, and anyway, lying around on the Internet for anyone with nous to discover and copy.

        Analogy with genuine 'theft' of tangible objects breaks down immediately. You may refuse to sell someone your Rembrandt, regardless of the price offered. If someone steals the painting, you are deprived of its use as an ornament. Only one person at a time can possess it.

        Suppose the thief is a kindly soul and returns the Rembrandt to you. Further, imagine the thief possesses technology enabling manufacture of perfect copies of a painting; copies indistinguishable by all known physical and chemical tests. The thief distributes copies among his acquaintances. You have lost nothing other than indisputable evidence of your painting's provenance.

        It should be obvious where this leads.

        1. FF22

          Re: Hi ! We want to steal everything...

          And you're wrong about everything, even about the digital theft thing. Because when you steal an intellectual property, that automatically devalues every other copy of the same thing, despite the original owner or all other licenced owners still also retaining their copies. That's just how supply and demand works. So, you're still stealing value from others.

          Also, and again, because of rules of supply and demand, tangible objects don't have an intrinsic value either. Their value depends completely on supply and demand, and their sale price is also set arbitrarily by the seller.

          I mean there was literally not a single word in your whole comment that would be actually correct.

          1. find users who cut cat tail

            Re: Hi ! We want to steal everything...

            You might have been brainwashed by Wall Street beyond the point where deprogramming is possible. But well…

            First, making someone else's stuff lose value (or gain, but let's focus on losing) is something you do a thousand different ways every day. None of them is stealing (unless you are an actual thief). If you want to call some action stealing you need arguments why it is *specifically* stealing. You have given none.

            Second, you argue from a purely late-capitalism-in-the-US viewpoint of what it means to own things (and thus steal things). Which is far from universal and frankly pretty arbitrary and bizarre. The societal conventions around ownership rights have developed in certain ways, and now people think they own numbers, colours, volcanos and other people's DNA sequences. Very little of it makes sense. Most of it is also very difficult to reform… But once it was common to own other people and we have managed to get past that – by going against the status quo, mind you. So maybe there is a hope for ‘intellectual property’ too.

            1. Rich 2 Silver badge

              Re: Hi ! We want to steal everything...

              Wall Street?

              Is that in America or somewhere?

              1. navarac Silver badge

                Re: Hi ! We want to steal everything...

                >> Wall Street <<

                Remember walls come down. Remember a famous case in 1989 that no one saw coming. East German and Soviet authoritarian bigwigs were powerless to stop it. KGB Putin is still pissed off about it. Trumpton and his con-men won't fare any better, come the day.

          2. Long John Silver Bronze badge

            Re: Hi ! We want to steal everything...

            You don't appear to comprehend the simple fact that anything portrayable as a binary digital sequence, once placed in the binary domain, is devoid of monetary value; that despite the amount of money/effort that went into its making. Its value on a different metric could be immense, e.g. cultural worth.

            Nevertheless, a digital sequence can have indirect monetary value within a market. Foremost, it establishes the credentials of creative individuals and groups competing for patronage enabling further creative works.

            Additionally, it can provide a focus for the sale of 'added value' products and services, e.g. keepsakes and access, via a patronage platform, to the sequence's creator. Another tangential monetary value may lie in applying the content of the sequence to manufacturing physical items (via recipes or blueprints) or aiding the provision of a tangible service (via software); in this circumstance any exclusive 'right' defined by law is rapidly being vitiated, e.g. advanced 3D-printing (from digital recipes) shall abound in cottage industries around the globe; similar technologies are being developed for pharmaceuticals and 'wet' biological concoctions.

            Nevertheless, those people who are more strongly motivated by acquiring wealth than indulging their innovative whims further, can strike whilst the iron is hot and create a market for specific tangible and cultural products. Whether they can retain a competitive edge depends upon perceptions of quality and price.

            Taken further, a sequence may contain seeds for the creation of new strings of digits, which form the starting point for cultural advance or for marketable goods and services. A process known as 'derivation'. So-called 'ownership rights', other than entitlement to attribution, impede derivation from material and cultural artefacts.

            In essence, markets shall persist for tangible goods and services. Economic theory pertaining to supply/demand, scarcity, competition, and price discovery persists. However, intangibles drop from its purview, along with all monopoly entitlements. Anticipate profound shifts in the nature of trade, national economies, individual access to, and use of, resources, and personal expectations. I posit these will be driven by former colonial nations unhitching from a yoke of demands, and international conventions, conceived by, and for the benefit of, colonial masters.

            The most wayward ex-colony of all, the USA, fights tooth and nail to preserve a system of illusory 'rights', initially formalised in the UK, and carefully embroidered by the USA over two and a half centuries. The intellectual and cultural renaissance in the offing will eventually benefit ordinary residents of the USA.

            1. doublelayer Silver badge

              Re: Hi ! We want to steal everything...

              All that waffle, and all you have to say to prove it wrong is:

              1. Digital sequences, like any other product, require cost in time and often in money to create.

              2. Patronage is a pipe dream you should stop having, because everyone else here doesn't get patrons whenever they have an idea.

              3. At the moment, those things do, in fact, have monetary value because copyright exists. You will not prove it wrong by insisting that it does not. To convince people, you need to argue that it should not, and I don't have much hope that you will.

          3. JamPacked

            Re: Hi ! We want to steal everything...

            There are defensible arguments for the enforcement of an intellectual property regime, but this one is incoherent. Intellectual property is also “automatically” devalued in this way when I buy a competitor's product instead. Or when a copyright expires. Or when I don't make a purchase. Or when I haggle. The specific harm you're alleging here is “possibly lowering the demand curve for a product.”

            Simply lowering a demand curve is not a wrongful injury against the entire class of license holders. Give me a break. And even with regard to the IP holder, the alleged harm in any individual case rests on a wild assumption that the act of piracy would've been a purchase. “Pirating something” does not create a different demand curve than “not purchasing something”.

            > Also, and again, because of rules of supply and demand, tangible objects don't have an intrinsic value either....

            Non-rivalrous vs rivalrous goods is a real distinction that should be reflected in good policy, actually. The harm done to a shop owner by stealing a candy bar is a bad analogy for the harm done to societal creativity by IP not being compensated. The Internet Archive's digital library is not analogous to an organized ring of thieves going around the country and stealing merchandise from bookstores. This absurdity is why civilized countries focus on IP infringement that generates revenue (such as OpenAI) rather than punishing individual acts of quote unquote “theft” of non-rivalrous goods as if they're candy bars.

        2. aelfheld

          Re: Hi ! We want to steal everything...

          Don't like the price, don't buy it. Intellectual property is still property.

      2. Anonymous Coward

        Re: Hi ! We want to steal everything...

        > Yes because paying for stuff is economically unfeasible...

        Running an ad blocker?

        Just sayin'.

        1. Joe W Silver badge

          Re: Hi ! We want to steal everything...

          Yes, because my computing resources are mine. I do not allow just anybody to run their software on my machine. Just sayin'. And I do pay for subscriptions to select content.

          1. Anonymous Coward

            Re: Hi ! We want to steal everything...

            But the computing resources necessary to create and deliver the content, as well as the support staff, and the talent aren't yours, are they?

            Don't content creators need to get paid?

            Of course, if you restrict your viewing and use entirely to paid subscriptions then you're off the hook.

            Renewed your subscription to The Register yet?

            1. Anonymous Coward

              Re: Hi ! We want to steal everything...

              I bet you're fun at LAN parties.

              1. Anonymous Coward

                Re: Hi ! We want to steal everything...

                Tons.

                I may be dishonest but at least I don't rationalize my conduct.

                Doing the advertisers a favor by not getting mad at their ads? Right.

                1. Anonymous Coward

                  Re: Hi ! We want to steal everything...

                  Who gets mad at their ads? I haven't seen ads in years and why should I?

                  If you believe your content is worthy then make me pay for it. If it is then I'll pay for it. Adverts are inherently dishonest. I don't agree to that and I never did. Why should I be forced to look at something I don't want to see? If your business model is built around adverts that's not a me problem that's a you problem.

        2. doublelayer Silver badge

          Re: Hi ! We want to steal everything...

          My choice to remove some of the content before I read it is not the same as expecting to receive free copies of anything I like. For similar reasons, it is allowed if sites choose to detect my use of an ad blocker and refuse to send me the page. I won't be happy, but they can do it. My use of an ad blocker is also directly related to the advertising frequently being malicious in a way that harms me and the site on which the ads appear. These are not the same, and the argument is frequently misused by two groups of people to defend their incorrect points: those who want to treat tracking as legally binding even when it may itself break the law, and people who wish all copyright infringement to be legal.

          1. Scotech

            Re: Hi ! We want to steal everything...

            Never mind that a responsible ad that doesn't steal my computing resources at cost to me can be delivered simply by paying the site owner a fairly negotiated fee to embed the advertiser's content into the publisher's own content. It's a novel idea, I know, but interest-based advertising is actually still possible without all this auctioneering and tracking and data brokerage, and it could actually deliver better value for content creators too. In fact, El Reg seems to be leading the way on this, judging by the red top these days!

        3. Doctor Syntax Silver badge

          Re: Hi ! We want to steal everything...

          My running an adblocker is to the benefit of the advertisers (not the advertising industry) because if I don't get their ads pushed in my face I won't get pissed off by them and if they're selling something I want I won't actively avoid them.

          1. Scotech

            Re: Hi ! We want to steal everything...

            I've actually closed accounts with companies in the past because of annoying and persistent ads that completely oversaturate my media. Most recent was IKEA, which seemed to be buying every ad slot going for about a week. I'd previously spent a small fortune with them on account of moving into our first purchased home a few years ago, and needing to buy basically everything. Around here it's quite normal for rented accommodation to come fully-furnished, often even down to the cutlery, so we had very little stuff of our own and it's one of the quickest, cheapest and easiest places to just go buy everything needed to kit out a whole house in one run. So I presume that they managed to pinpoint me as a previous big spender who'd gone cold on them and were attempting to rope me back in by targeting me, but boy did that backfire. I don't know who thinks this kind of thing up, but pissing off your customer base with the digital equivalent of daily junk-mail being fly-tipped across their front lawn is NOT good for brand loyalty!

        4. Irongut Silver badge

          Re: Hi ! We want to steal everything...

          I have no problem with adverts, my schooling was paid for by them after all. What I have a problem with is tracking. The print ads that paid for my childhood did not follow me around spying on what I did when not reading a magazine or newspaper. While the online ad industry continues to be more intrusive than the KGB I will continue to use an ad blocker with no remorse.

      3. Czrly

        Re: Hi ! We want to steal everything...

        > Yes because paying for stuff is economically unfeasible...

        Arguably, at the consumer level, paying for stuff is not only economically unfeasible, not only unfeasible in general, but impossible. I have a long list of books I would love to read but cannot, simply because I'm planning a move so I don't want to buy a hard copy to add to my existing packing problems, and buying those titles as DRM-free ebooks that will work on my Tolino (Kobo variant) device is impossible in every way. They're newer titles with living authors who I support and I would love to pay those authors but I just cannot.

        The rest of the world has a very simple counter to the AI bros: all they have to do is say that violation of copyright by US companies will result in international abandonment of copyright treaty – and, similarly, for patents. This would also include dropping international enforcement of DMCA section 1201 and refusing to respect US copyrights and enforce them against people not in the EU – even if their IP addresses appear in torrent swarms. Tit for tat.

        Of course, this would only carry weight if the AI bros were in favour of copyright at all and they are not. Their end-game is the total rubbishing of copyright across the board because they are betting that it is obsolete. There is no need for copy protection for content that can be extruded at scale, on demand, without the need to pay creators to make it in the first place. Their bot-shit doesn't benefit from copyright protections because they are selling its excretion-as-a-service, not the artefacts excreted.

    2. kmorwath

      Re: Hi ! We want to steal everything...

      The "Dispossession Cycle" (copyright Zuboff): "a new economic order that claims human experience as free raw material for hidden commercial practices of extraction, prediction, and sales", "an expropriation of critical human rights that is best understood as a coup from above: an overthrow of the people’s sovereignty." (The Age of Surveillance Capitalism).

      Instead of tariffs on bourbon whiskey or large, stupid motorcycles for a few fanatics, the EU should hit hard exactly here: forbidding US companies to take advantage of EU data.

      And OpenAI should learn that US laws are valid in US only. USA can't make laws abroad.

      1. Anonymous Coward

        Re: Hi ! We want to steal everything...

        >And OpenAI should learn that US laws are valid in US only. USA can't make laws abroad.

        Plenty more room for a 51st state...

        1. Long John Silver Bronze badge

          Re: Hi ! We want to steal everything...

          The USA has its Marines.

      2. Long John Silver Bronze badge
        Pirate

        Re: Hi ! We want to steal everything...

        How may the EU prevent people in the USA from harvesting whatever they want from the Internet?

        1. kmorwath

          Re: Hi ! We want to steal everything...

          By enforcing copyright, GDPR, Digital Act, antitrust, and fining heavily those who break the rules?

        2. The man with a spanner Bronze badge

          Re: Hi ! We want to steal everything...

          If you are allowed to steal my IP, say a book that I wrote and flagged as copyright, then I don't see why I can't avail myself of your IP, say the fine details of your world-leading Tesla autonomous driving software.

          Note: some irony might be present.

          1. Irongut Silver badge

            Re: Hi ! We want to steal everything...

            Now there's a thought. OpenAI can have all our copyrighted data, but since their LLM was trained on our data it belongs to us and OpenAI can't charge for it.

            Problem solved. Sam 1000 can thank me later.

        3. Scotech

          Re: Hi ! We want to steal everything...

          By treating it as a violation of the Berne Convention and revoking EU recognition of select categories of US copyrights in response? If US tech IP was suddenly no longer protected in the world's third-largest market by purchasing power, at the same time as the US is starting a trade and IP war with the world's largest, that's going to damage the US a whole lot more than it would damage the EU. The US has an interest in keeping the existing status quo regarding IP in place, given that the US tech industry depends on international copyright law to maintain its competitive advantage. Look at Microsoft's difficulties in enforcing its IP rights in China through the '90s and '00s, prior to the Chinese government backing them up, and you'll get a picture of the potential scale of the problem. The switch to cloud services has insulated big tech a little from those issues, but just look at the consternation DeepSeek's emergence on the scene caused OpenAI: their cloud delivery moat isn't going to save them from determined competitors if they can't also back up their proprietary rights in court.

      3. doublelayer Silver badge

        Re: Hi ! We want to steal everything...

        OpenAI should also learn that what they're doing isn't legal in the US either. Fair use has limited allowed uses. Taking the entire content for commercial purposes isn't one of them.

        1. Anonymous Coward

          Re: Hi ! We want to steal everything...

          There are two parts to this. One is the fair use of content that is copyrighted but can be viewed without a license, i.e. reusing that content in a fair use manner. The second is how copyrighted material that is not available without a paid license was obtained.

          If Meta wins the case against them (they downloaded and used a massive load of copyrighted material without paying for any of it), then why should anyone need to pay for anything? (We are just training our brains with it too.)

          They are doing it for profit, using other peoples work, without paying for it. No doubt OpenAI (and all of the LLMs) have done exactly the same thing, using copyrighted material that is not available without purchase.

          Even if they get the fair use exemption for using it, how can they still not be found liable for copyright infringement? They never paid for it, or borrowed it from, say, a library.

          They still made a duplicate of that product without permission before duplicating it again within the training of the 'AI', which is where the fair use would have to apply, it being the thing creating derivative works.

      4. Doctor Syntax Silver badge

        Re: Hi ! We want to steal everything...

        "USA can't make laws abroad."

        Yes, but who's going to tell them?

        1. Anonymous Coward

          Re: Hi ! We want to steal everything...

          Right on! Used to be the Soviets, or Russia, or something, but now that they share that same sweet glam canopy bed with the bitter Orange one ... we're screwed ... so screwed ... the coffin's shut real tight on this one, screwed and bolted shut ... kneed to the neck ... can't breathe for nuttin' ... the stench is unbearable ... turn back time a few months, please ... and let us out!!!

        2. Steve Davies 3 Silver badge

          Re: Yes, but who's going to tell them?

          Donald Trump 2.0 (aka the Dictator King) would just annex the country playing hardball.

          Once he has finished with Canada and Greenland, who's next? Iceland? the UK? (so that he can come and play golf after locking up every protestor within 100 miles)

          Slip him enough crypto and he will do anything (yes even that if the video is real) for you.

        3. Andrew Penfold

          Re: Hi ! We want to steal everything...

          Maybe not but they can and have "exported" their laws to 3rd countries as a condition of various free trade agreements. Including Copyright (but no fair use exemptions) and maybe the DMCA restrictions on disabling "technical measures" (but no exemptions for you, even if in the public interest).

          This was discussed here at the time, and the consensus was that fair use is a case law doctrine rather than statute, so it cannot be exported to any 3rd country. It was seen as quite unfair on the 3rd countries (although not by "everyone's a freetard" Andrew Orlowski)

          Anyway, funny how karma comes around to the US (ok US companies)

      5. prh99

        Re: Hi ! We want to steal everything...

        That works for multinationals like OpenAI etc. and that's about it.

  3. Jason Bloomberg Silver badge

    Not since Ringo Starr married Yoko Ono have I been so surprised

    All my lies and bullshit are already Public Domain so OpenAI are free to take them and use them.

    I expect they already have.

    1. Jim Mitchell

      Re: Not since Ringo Starr married Yoko Ono have I been so surprised

      You are causing me cognitive dissonance. I'll upvote, but I don't like it, since you've also polluted my own knowledge of the world.

  4. Howard Sway Silver badge

    wants the US government to stop foreign countries from trying to enforce copyright rules against it

    They obviously believe that they've got a useful idiot in the White House now, and can realise their greediest dreams of plunder by appealing to his disdain for the rest of the world. It could however result in the complete breakdown of the entire international copyright system, which other US companies might want to consider before their intellectual property gets nicked in retaliation.

    1. kmorwath

      Re: stop foreign countries from trying to enforce copyright rules against it

      I think I will start to ignore copyright on any US-made software. That's what the EU should rule: that copyright on US software is void and everybody is free to use it as they like without paying a dime, and no legal action can be taken against them. Let's see how one of the biggest investors in OpenAI takes that...

      1. Anonymous Coward

        Re: stop foreign countries from trying to enforce copyright rules against it

        So true. The glowing Orange admin's thuggery and temper tantrums make the PRC look like a humanistic Freedom fighter by comparison!

    2. heyrick Silver badge

      wants the US government to stop foreign countries from trying to enforce copyright rules against it

      "by appealing to his disdain for the rest of the world"

      They might try, but the rest of the world has noticed. Some are moving to sideline their dependence on the US, others are waiting to move into the spaces left in their absence. Either way, I can't help but feel that the second Trump presidency will go down as the biggest self-own in America's ~250 years of being.

      Which is a long winded way of saying that in time the response from the rest of the world may well be "go whistle".

      1. Jason Bloomberg Silver badge

        The rest of the world has noticed

        I did love this from Howard Lutnick, US commerce secretary - "The president has made it crystal clear that he finds this tit-for-tat really abusive and aggravating. He wants these countries to respect him. And all this showed you is that Europe and Canada do not respect Donald Trump".

        "Wants respect" - He can go fuck himself. Respect is earned. It is not given in response to economic terrorism and threats against a nation's sovereignty.

        Good to hear it's aggravating. Thanks for letting us know what's getting under the man-baby's skin.

        1. Adair Silver badge

          Re: The rest of the world has noticed

          'Respect' in the minds of the Putins, Trumps, and Vances of this world actually translates as 'fear'.

          They don't want to be respected, they want to be feared, probably because they are fearful people themselves and their learned defence is generating fear in others.

          It's basic gangsterism, the schoolyard bully writ large—a frightened coward at heart.

        2. Dan 55 Silver badge

          Re: The rest of the world has noticed

          If he doesn't want to receive tit then he shouldn't be giving out tat in the first place.

          Or something like that.

          1. collinsl Silver badge

            Re: The rest of the world has noticed

            Shouldn't that be the other way around oh wait let's not go there oh god oh god my brain quick bring the eyebleach!

    3. DS999 Silver badge

      Re: wants the US government to stop foreign countries from trying to enforce copyright rules

      Plus appealing to the useful idiot's nationalism by claiming they'll be at a disadvantage to China.

      Not sure how they can claim that respecting copyright makes AI impossible. I haven't read every copyrighted work in the western world and I think I do a pretty good job of pretending I'm intelligent. Surely if I can do it so can ChatGPT!

  5. Anonymous Coward

    While we're at it

    We would also like to abolish the patent office and all forms of currency because some flavor of the month tech firm says so.

    1. Long John Silver Bronze badge

      Re: While we're at it

      The patent office? Excellent.

      There are other ways for giving credit for invention. Also, just think of the number of lawyers to dispatch for useful labour, such as digging ditches.

      1. Jonathan Richards 1 Silver badge

        Re: While we're at it

        Patents, explicitly so in the US, are not applied for and granted in order to gain or give "credit". The time-limited patent monopoly is quid pro quo, the quo being full disclosure of the invention in the patent. This promotes the progress of technology, which is what the patent-granting State is interested in.

        1. Long John Silver Bronze badge
          Pirate

          Re: While we're at it

          How do exclusive rights to exploiting an idea promote innovation?

          1. doublelayer Silver badge

            Re: While we're at it

            Reason 1: If I have a great new idea, but to figure out if it works will take three years of my time and probably hiring some people, but I don't have any money, what can I do? I can talk about this idea everywhere in the hope that someone else says it sounds great, pays me lots of money with which I hire assistants and buy materials, I spend the years trying it, and hopefully it works. If I don't find someone who wants to give me that support with no strings attached, then I write it down, work on it on the weekends outside of my work, and maybe it doesn't get invented. If, on the other hand, I can tell the people providing the money that, if it is successful, they can get their money back, it becomes easier to get that support.

            Reason 2: Great, I've spent my time and made something new. Well, if it wasn't protected, but I want to get a reward for my work, what do I do? Correct answer: I hide everything I can about how this works so that, until you figure it out manually, I'm the only one that can provide it. If I'm good at that, maybe you never figure out how it works and you always have to come to me to get it. What does that do for someone else who has an idea for how to improve it or would have had one if they knew how it works? They're out of luck and the rest of us don't benefit from their idea. If I get the patent, I publish it. Everyone can read how it's done, and after a few years, everyone can do it using my plans. If someone comes up with an improvement, they can build that around my thing before my patent expires, they could bring it to me and add it in, or they could wait for the patent to expire and do the whole thing.

            I somehow think you were well aware of both of these things before I wrote this comment.

      2. doublelayer Silver badge

        Re: While we're at it

        The fact that you think patents' only value is assigning credit speaks volumes for why you're constantly making bad points about patents and copyright. Both exist to make it possible for someone to do the substantial work behind them and benefit from having done so, enabling people who don't have independent wealth to do this. Until you recognize that, you're never going to prove why they're bad or suggest improvements. Both systems have a lot of downsides, but while you continue to campaign for them to be shut down and worry only about credit, you will always be arguing against something you don't understand and making this obvious to most of those you're trying to convince.

        1. Anonymous Coward
          Anonymous Coward

          Re: While we're at it

          In relation to AI training (wholesale industrial-scale copyright violations) I agree ... but I'm also reminded of individuals being pestered by Sony DRM, big Corps ruthlessly suing little folks over nothing "infringements" (so-called), forcing them to relinquish internet domain names to dig huge moats around "brand" name ownership (eg. SouthButt), and eBook ownership turning into more of a temporary "license", with said books disappearing from one's device after they'd been fully paid for.

          Seems to me there are multiple sides to this here coin: beyond copyright versus infringement, there's also the question of whether "small" individuals can be constantly pushed around, shoved and abused by big corps, or not.

          1. doublelayer Silver badge

            Re: While we're at it

            Several of these things are the downsides I was referring to. Copyright was not intended to and does not give the people who own it a right to abuse customers. Several of the things you list or that I would add are already illegal, but the responsible bodies are not enforcing those laws. Some of them are not, and I'd like them to be. For example, if you're licensing something for a short time after which the seller is within their rights to cancel it, I think they should be forced to state that unambiguously at the point of sale rather than burying it in a contract. However, that has nothing to do with copyright; I could bury something in any other contract equally legally, and I should be prevented from doing so with equal vehemence.

            That is not what our old friend Captain Silver was on about. I'm sure they would be happy to complain about those things and would do so to convince people, but they would also complain about completely normal uses of these things because they believe that all intellectual work should be free of restrictions and of charge. They either do not understand or do not care what effect that would have on people who created it.

      3. Anonymous Coward
        Anonymous Coward

        Re: While we're at it

        It seems we have at least 7 lawyers reading so far.

  6. Baird34

    To copyright or not to copyright

    Funny, one side of corporate America has used copyright laws to maintain hegemony and crush innovation. Now another corporate American wants to bypass copyright laws, saying they crush innovation. I suppose this guy will copyright the stolen copyrighted material, though. I feel they make it up as they go along.

    Think of the bosses though, if they have to abide by laws then they'd be out of a job, then where would we be? We know where they'd be, jobless or in prison, but where would we be?

    1. heyrick Silver badge

      Re: To copyright or not to copyright

      "I suppose this guy will copyright the stolen copyright material though."

      Isn't that why all the butthurt about DeepSeek apparently training itself off ChatGPT?

      "I feel they make it up as they go along."

      All your base are belong to us.

      1. O'Reg Inalsin

        Re: To copyright or not to copyright

        Without delay all AI output should be public domain. OpenAI will get paid through their account subscriptions, that's enough.

        AI output already cannot be copyrighted. And any training done on that data is adding something "original" - according to OpenAI et al.

        In general the US economy is getting strangled to death by troll-booths on every road.

  7. Anonymous Coward
    Anonymous Coward

    Yeee Ha!

    Welcome to the new Wild West! Grab as much as you can, to hell with the law, stake your claim, then mine and exploit the hell out of it. MAGA!

    1. Phil O'Sophical Silver badge

      Re: Yeee Ha!

      Making America Greedy Again?

      1. breakfast Silver badge

        Re: Yeee Ha!

        Make America Genocidal Again?

  8. Anonymous Coward
    Anonymous Coward

    strategy to let America exert control over its allies

    After all, what value is an ally if you can't exert control over it?

    1. HorseflySteve

      Re: strategy to let America exert control over its allies

      The point of having allies is that you hope they'll come to your aid when you need help.

      Abusing your allies is a good way of changing their perspective on that arrangement...

  9. Jou (Mxyzptlk) Silver badge

    Scrape everything won't improve AI

    See, you got to check what you feed to your child. If you feed it every crap you find you will only get crap.

  10. Long John Silver Bronze badge
    Pirate

    Global demise of copyright is an inevitability - explain why not, if you can

    I hold no brief for any of the AI production companies. However, should US AI developers be constrained to exist in the pretty walled garden provided by copyright, the fruit on their vines shall wither. The technology could thrive elsewhere. Regardless of whether AI lives up fully to the expectations of entrepreneurs, some aspects of its use evidently shall: one such is as an educational and research tool offering an annotated collection of human culture together with means to service enquiries by drawing from across literature databases; copies of this tool, some perhaps devoted to highly specific domains, can sit on individual computational devices, on more powerful institutional computers, and in 'the cloud'. Anyone imagining AI to be containable within the traditional commercial environment is deluded.

    Arising from this is an irony. For instance, take Microsoft (MS), which is in the AI game. It would be difficult for MS to claim the moral high ground when condemning 'free loaders' who copy MS software and use it without a licence. Private individuals do this with no prospect of comeback; could not corporate entities independent of MS demand greater 'fair use' of MS products?

    Gradually, but at quickening pace, the intellectually stultifying atmosphere arising from the specious concept of 'intellectual property' is being cleansed by multiple individual acts of disobedience to outmoded law. There is a shift in trade from selling digital end-products (on a rigged monopolist market) to seeking patronage for use of skills (individual, and cooperative) in making (and for distributing with added value to zero) digital artefacts. A reckoning awaits for avaricious folk with a deep sense of 'entitlement'. The irony is compounded by the USA, the arch-rentier nation, housing the most powerful of disobedient entities.

    1. kmorwath

      Re: Global demise of copyright is an inevitability - explain why not, if you can

      You mean patronage like two thousand years ago, when skilled people had to lick the feet of oligarchs to sell them their works, at a price the oligarchs fixed? No, thank you. That's not freedom. It's serfdom.

  11. mark l 2 Silver badge

    "Paying a truly fair fee to all authors – whose copyrighted content has already been or will be used to train powerful LLM models that are eventually aimed at competing with those authors – will probably be economically unviable," he claimed, as AI vendors "will never make profits."

    Aww, boo hoo, the poor AI companies won't be able to make a profit unless they steal everyone's copyrighted works. That just shows the business model is broken, then.

    Is ChatGPT that far removed from services such as Megaupload and Napster, which also relied on other people's IP to make money without compensating them?

    1. Anonymous Coward
      Anonymous Coward

      Sam Altman's the new Kim Dotcom! (they even look the same from a distance)

      1. Jou (Mxyzptlk) Silver badge

        So Sam Altman is 2 meters (6 ft 6.7 in) tall, >140 kg (over 300 lb), and usually without hair? Oh wait, you come from the future, where Sam put himself on the rack to make himself taller. Bigger did not require special treatment.

  12. IGotOut Silver badge

    I've bought a gun, a getaway car and have a crew ...

    ...I should be allowed to rob banks, otherwise how am I allowed to profit?

    I've never come across a bunch of people more deserving of being put up against a wall and shot than this lot.

    Bring on the revolution.

  13. Anonymous Coward
    Anonymous Coward

    Checks notes. It's not April.

    Checks calendar. No, definitely not April.

    Checks internet for today's date. It's the middle of March, which means it's not April or the beginning of said month.

    What in the actual Shatner's Bassoon is this?

    I'm off to LLM some films. I mean if copyright is no more then that shouldn't be a problem right?

    In all seriousness, I get this horrible feeling that this AI push is to get people off the internet and onto a system where you search and the AI conveniently gives you the answer, rather than you reading websites. That way the powers that be have yet more control over opinions and thought.

  14. alcomatt

    Oh, the creep is back...

  15. ComputerSays_noAbsolutelyNo Silver badge
    Joke

    Democratic democratic

    I guess the proposal lands right in the bin, cos it proposes democratic things.

    If they do a sed s/democratic/republican/g the proposal will become law.

  16. RockBurner

    "Office of Science and Technology (OSTP)"

    What does the "P" stand for?

    1. seven of five Silver badge

      Brains. Err...

    2. theDeathOfRats

      What does the "P" stand for?

      Porridge?

  17. localzuk

    Enforcing copyright is a global issue

    If one country stops doing it, other countries follow suit and the conventions that protect internationally produced works fall apart.

    So, the end result is that those movies? That software? Those books? Produced in the USA? Why would other countries respect their copyrights if the USA won't respect the copyright of the works made elsewhere?

    The USA makes hundreds of billions of dollars a year off the creative industries. Ignoring copyright in this manner risks ruining these industries in favour of a snake oil salesman.

  18. frankvw Bronze badge

    New technology calls for a new copyright paradigm

    L'Histoire se répète (history repeats itself)...

    Before 1990 the media industry was content in its copyright position, since it had a monopoly on duplication facilities and therefore on media distribution. Very few people had a record pressing plant in their basements or a CD duplication facility in their attics, and OCR to scan and digitize books was still very much in its infancy, so a photocopier was the only way to duplicate books.

    Then along came the CD-ROM drive, compressed media formats (MP3 being the first mainstream example), cheap document scanners, accurate OCR software and sufficient Internet bandwidth to distribute digital media. Suddenly the genie was out of the bottle and the rules had changed forever.

    The industry, predictably, went up in flames, screamed bladdy murder and dug in their heels to resist the change that had already taken place. It didn't help one bit. Then some media houses decided to go with the change, embrace it, and ride the wave. Digital distribution, DRM that actually didn't get in the way too much of enjoying legally bought media, and different business models all took some time to mature, but here we are. Those who adapted survived, those who didn't, didn't. Darwin would have nodded approvingly.

    Now we see the same. There's no really effective way to keep any content whatsoever from being accessed by an AI. If you and I can obtain a copy of something (and we can) then so can an AI. And that means, simply put, that our ideas about copyright and the associated legislation have to change to adapt to the new situation. It's that or be left behind.

    First off, what constitutes copyright? I'm not a lawyer (heaven forbid) but it seems to me as a layman that copyright focuses on the right to copy and distribute. If I publish a copy of a book that I do not have the rights to, I commit a clear copyright violation.

    But then there are different scenarios. In the mid-1990s I wanted to learn Perl. I borrowed the O'Reilly Camel Book from a colleague, read it, and after about a week I returned it with thanks and a cup of coffee. Then I proceeded to utilize that knowledge in my work, writing Perl for a living. I did not pay O'Reilly for the right to read their book (my mate had already done that) and yet I used the information therein to my commercial advantage.

    Is that also a copyright violation? Some would argue yes, some would say maybe not, but all would probably agree that enforcing copyrights in this case is impossible - after all, who would police the lending of a book to the guy sitting at the desk across from you?

    But for the sake of argument let's say this is a copyright violation. So I'm a good boy and I don't borrow the book. But I'm a little broke that month, so I don't buy it, either. Instead I ask my colleague, who has paid for the book and read it, how to solve certain problems in Perl. That means my colleague now disseminates the information in the book, or at least gives me commercially valuable advice on the basis of what he has read in the book. Is that a copyright violation?

    I ask because the latter is exactly what AI is doing right now.

    Clearly we need a new copyright paradigm. In today's world the old one no longer works.

    1. Burgha2

      Re: New technology calls for a new copyright paradigm

      "Clearly we need a new copyright paradigm. In today's world the old one no longer works."

      Ok. The AI companies use that vast computing power to track which information they assimilate and every time they use a bit of it they pay the original author for it.

      I mean it's pretty inconvenient (and expensive) for me to remember when to pay for a software subscription, but I have to do that.

      1. frankvw Bronze badge

        Re: New technology calls for a new copyright paradigm

        "Ok. The AI companies use that vast computing power to track which information they assimilate and every time they use a bit of it they pay the original author for it."

        That is hardly feasible. If I borrow a book and read it, the contents become an integral part of my knowledge base. How should it be determined when I "use a bit of it"? Every time I write a bit of code I use the sum and total of all books on computing, software development, algorithm design and what not. Knowledge, including the contents of books, visual media, recorded lectures, news articles and what have you is not kept or used separately. It all becomes part of a greater whole and no amount of vast computing power can treat every bit of information separately. That's simply not how it works.

        1. Scotech

          Re: New technology calls for a new copyright paradigm

          Wow. Well... I feel a rant coming on here...

          While that's a flawed analogy, as the knowledge a person learns from these sources isn't equivalent to the slight contributions to the vector weightings in an LLM, the end result is the same. Once the model has been trained, it's a black box - there's no way to reverse engineer it to establish exactly what data it was trained on, or to establish the relative contribution of each item in the training materials to the output, given the weightings are still combined with a random factor in order to produce the actual output seen.

          Now in orchestrated outputs using approaches like RAG or deep reasoning, it's a different story - these approaches only use the LLM to parse and present the relevant source data like a search engine summary, and so they're actually designed to be able to trace their sources. But what people who aren't following the tech closely often miss is that orchestrated AI services mix deterministic and non-deterministic approaches to produce their end results. Despite appearances, there's no actual reasoning going on under the hood (as a human might do), but it's also NOT simply a search engine or database, though it may use those as orchestrated components.

          The LLM is the part that makes this whole thing unique, and is the part under contention. It's simply not possible though to say what training sources 'most' contributed to any given output - the output was the result of the weightings as a whole, which are derived from the training process and the unique combination of inputs into that process, which is often non-deterministic in and of itself. As such, it's not only impossible to trace a result back to one or more specific training inputs, the question itself is fundamentally incorrect, as it presupposes that certain training materials were more or less responsible for a given output.

          The OP was correct that a new approach to copyright is necessary here, but it's an easy fix - just ensure that content is licensed separately for LLM ingestion. Done. This is an approach that's already been in use for centuries, and one that we're all perfectly familiar with. Ever seen those copyright notices on movies stating that the product is licensed for home use only? If you want to air it in a public setting, you'll need to purchase a separate commercial license for that, which will generally cost a whole lot more.

          There's absolutely nothing stopping content creators from doing exactly the same thing right now, with regards to LLMs. What it does require, however, is that you gate your content behind acceptance of the license terms. If you air your new blockbuster on the big screens in Times Square, you can't then demand every passer-by pay you for the privilege of having seen it. Similarly, if you post content in a publicly accessible place on the web, don't go crying when it gets scraped. Put it behind a sign-in, and require people registering accounts to agree to license terms restricting how they can use the content.

          The trouble is, we've had a good few years of people being used to the internet being a kind of public marketplace where they can have all their creations out on display for anyone passing by to see up close. Now though, they've got people using their right to stand on the pavement to take pictures of their wares and use them to churn out low-grade pastiches at volume in a sweatshop, which they're then selling at a discount right by the market entrance. So the only option is to get a shop. Yeah, it's more overhead. Yeah, it means you have to work harder to entice passers-by to actually step inside. But it gives you control over who comes inside, and what they can do there, and lets you kick people out if they misbehave.

          Ok... Rant almost over! But the final point I want to make is that the beauty of this approach is that not only does it not require any changes in the law to work, it also protects against the kind of stuff OpenAI are trying to pull here, in that they can get all the exemptions they like, but if you control the access to your stuff, instead of just leaving it out on the open web for all and sundry to find, then the only way they can access it is on your terms. And that goes for most of big tech. The past 50 years have been an accelerating process of technology opening access to data, but at a rapidly increasing cost to what we used to consider fundamental rights. Perhaps over the next 50 years, we can begin to rein that in and find a better balance between openness and privacy? We all need to relearn the value of having barriers and gatekeepers in place, and stop being so cavalier about throwing everything we have out there for the whole world to see, and to steal.

    2. doublelayer Silver badge

      Re: New technology calls for a new copyright paradigm

      Before 1990, the individual computer users were content in their security position, since malware was crude and limited. Always-on internet access had not spread very far, usable encryption software was limited in scope, and basically no home user had a public mechanism for a random attacker to send them malicious instructions. Then along came easy internet access, email, and AES, and people to put them together and make ransomware. Suddenly the genie was out of the bottle and the rules had changed forever. Users, predictably, went up in flames, screamed bladdy murder and dug in their heels to resist having their disks encrypted and held to ransom. It didn't help one bit. Those who failed to protect themselves got encrypted. Those who failed to back up lost their data or had to pay. Darwin would have nodded approvingly.

      Now we see the same. There's no really effective way to eliminate ransomware entirely. And that means, simply put, that our ideas about whether it's legal to ransom and the associated legislation have to change to adapt to the new situation. It's that or be left behind.

      Does that argument make any sense to you? Just because something that was and still is illegal is easier, does that mean we have to change the law to make it legal? Maybe we need new ways of enforcing that law to deal with more widespread instances, and if we decide we no longer approve of it, then we can repeal it. There are some people who do; I've seen the arguments that anyone who gets hit with ransomware successfully and doesn't have backups was stupid at least twice and thus deserves anything they get, often from people who put as much thought into their moral philosophy as the fanatical pirates put into their belief that whatever they want is a moral imperative. In reality, the likely change to the law is making paying ransoms illegal when it formerly wasn't, to try even harder to eliminate as much of it as we can.

      1. frankvw Bronze badge

        Re: New technology calls for a new copyright paradigm

        "Does that argument make any sense to you?"

        No, because your comparison is flawed. Absorbing information which then becomes an inseparable and indistinguishable part of a greater whole is an entirely different matter from malware.

        1. doublelayer Silver badge

          Re: New technology calls for a new copyright paradigm

          I don't think the comparison is the problem. The problem is that, no matter what the activity is, we decided that it was illegal. Now, it is easier to do it. Your comment says that we need to reassess whether it is legal because of that, when the two things have no dependence on one another. We could decide today that it's a bad idea and nothing gets copyright protection anymore, but for the same reasons that we could have done so in 1963. That it is easier is not even a bad argument for doing it; it's no argument at all.

          You've decided to try to pull in the irrelevant additional argument about whether using a work as training data to an LLM is the same as a human reading it. It's not, by the way. We could have that discussion in a separate thread if you like. It isn't relevant here because it has nothing to do with how easy it is to do, and that was the entire point of your history repeating itself argument. I have a feeling you are bringing it up now because it is hard to defend your original point, but that does not make the two arguments connected.

    3. Falmari Silver badge

      Re: New technology calls for a new copyright paradigm

      @frankvw "I ask because the latter is exactly what AI is doing right now."

      Rubbish. Don't anthropomorphise AI; only the problem (solving a problem in Perl) is the same.

      But what really pisses me off is whenever anyone claims learning by reading a book is the same as AI learning with training data*. Because every bloody time the wankers claim that if reading a book does not infringe then AI does not. Conveniently ignoring that they are not comparing like with like, just as you are, since none of your reading examples breach copyright. Borrowing a book from a college/library/friend does not infringe copyright; no copy was made.

      But training data is a copy. If the AI companies don't buy a licensed copy, that is copyright infringement. Hell, replace AI with a person and it would still be copyright infringement.

      * Which is what you have done in a roundabout way. Most people are much more direct by claiming AI learning is the same as learning by reading a book.

  19. Pussifer
    Devil

    Sounds like MS had a hand in writing that.

    To me, a lot of those words from OpenAI sounded like people at Microsoft wrote them.

    ----

    The Commentard that changes his handle on a whim, just like JD Vance changes his actual name.

  20. Rosie Davies

    There's Optimism

    I do love that he can say "Tier I countries (US allies)" and believe that's still a thing. US optimism seems to have always been admirably resilient.

    Rosie

  21. Anonymous Coward
    Anonymous Coward

    OpenAI is a bubble waiting to burst. It loses money on every query made to its servers (even when the client is paying a subscription), and has no economy of scale. This means the more "successful" it is in attracting new users the more money it loses (as in $billions per year).

    If the US government starts demanding other countries make changes to their copyright laws then I suggest we start by removing the clauses that criminalise working around soft/hardware protection mechanisms (as most obviously seen in printers, but also in an increasing number of cars and all sorts of machinery/equipment), provided it is for purposes of maintenance, repair, or interoperability.

  22. WigglesVonSpiggles

    OpenAI welfare

    My favourite comment from the article:

    “Uptake in federal departments and agencies remains "unacceptably low," the Microsoft-championed lab says“

    Translation: “We demand you force departments and agencies to use our bullshit generators and pay us money for the privilege! It’s our god-given right as your new tech bro overlords!”

    The absurd sense of self-importance and entitlement in their requests… might well find a receptive audience now that the US is in the midst of a techno-fascism takeover.

  23. Stevie

    Bah!

    If AI trainers want to use other people's copyrighted works to make their vile software work, perhaps they could just do what everyone else does and apply to license said content?

  24. Nematode Bronze badge

    Block AI crawlers?

    Shurely blocking AI crawlers shouldn't be too hard? I sense a commercial opportunity here, complete with database of AI IPs.

    1. Anonymous Coward
      Anonymous Coward

      Re: Block AI crawlers?

      From my strictly unscientific survey, most of them seem to originate from a plethora of AWS instances.

      Can we block Amazon from the world?

      Actually, I'm ok with the crawlers as long as they target anyone working for Trump. Expose all their dirty little secrets including the toe sucking.

  25. spold Silver badge

    Well same as you would do....

    ...if you are submitting a CV for a job, when it is going to be screened electronically, you cut and paste all the requirements into it in small type white on white (so the screening software scores it but it is invisible) - so similarly, lots of sites who have content that could be scraped should have similar content that is not visible and is maliciously evil garbage to bugger up AI training.... just sayin'.

  26. Barrie Shepherd

    Perhaps they could start by sorting out the US's stupid Patent system.

  27. frankvw Bronze badge
    Boffin

    Is deep learning a copyright violation?

    "I don't think the comparison is the problem. The problem is that, no matter what the activity is, we decided that it was illegal. Now, it is easier to do it."

    Fair enough, you do have a point there, and +1 for that.

    "You've decided to try to pull in the irrelevant additional argument about whether using a work as training data to an LLM is the same as a human reading it. It's not, by the way. We could have that discussion in a separate thread if you like."

    New thread started. So here goes, starting with a look at the definition of copyright. Collins Dictionary states that "If someone has copyright on a piece of writing or music, it is illegal to reproduce or perform it without their permission." The Britannica Dictionary defines it as "the legal right to be the only one to reproduce, publish, and sell a book, musical recording, etc., for a certain period of time". However, Britannica's encyclopedic article goes further, expanding these definitions with: "Now commonly subsumed under the broader category of legal regulations known as intellectual-property law, copyright is designed primarily to protect an artist, a publisher, or another owner against specific unauthorized uses of his work."

    The term "unauthorized use" is what I believe the current discussion hinges on.

    The way an AI processes information fed into it mimics the way humans do it. The information becomes an integral and inseparable part of a body of knowledge, identity and decision making (simulated in the case of an AI but based on the same principles as the organic neural networks that AI is modeled after) and does not continue to exist within that AI's memory as a separate "work" that the AI then redistributes or publishes in whole or in part.

    I myself in the past have borrowed books on programming, learned their contents and then proceeded to teach programming techniques directly based on the contents of the book. I have done that for a living. Is this "unauthorized use" in terms of existing copyright and IP protection legislation? Currently it doesn't seem to be considered as such in the case of humans. But if an AI trains on information in manner similar to the way humans do it, and they process and then proceed to use that information in similar ways, then IP legislation should be applied in similar ways, too. THAT is what I mean when I say that current legislation is made outdated by new technology and should therefore be adapted.

    You claim that the way I used this book is different from the way an AI trains on it. Explain that to me, if you will. Because I can't see the difference.

    1. doublelayer Silver badge

      Re: Is deep learning a copyright violation?

      It is popular among LLM creators and fans to try to draw an equivalence between training and human learning. The word was originally chosen to make this association because metaphors are particularly common in this area. That does not make them similar. But let's take this from the top of your comment.

      Copyright has involved several restrictions which have been enforced long before computers. As written today, having a copyright on a work means that you're the only one who gets to sell it, it is illegal to use it without having a legal copy (purchased, borrowed, or provided to you by someone who has permission to do so count, going to get an unauthorized copy does not count), you can attach a license to use of that work (E.G. open source and every other software license), and that these rights apply to substantial portions of the work, not just the entire thing as a unit. It's not just being the only person to sell it.

      You maintain that an LLM does the same thing as human memory, but this is not correct. "The way an AI processes information fed into it mimics the way humans do it," for example, is just wrong. Neural networks do not mimic human neurons; we don't understand how individual human neurons work well enough for that. LLM memory uses specifically divided tokens. Human brains do not, and human memories span much longer units. LLM memory cannot generalize beyond those tokens, whereas human brains can; their difficulty doing mathematics without writing programs and having them executed demonstrates this. There are lots of differences between this software and brains, which is perfectly natural. As cool as our brains are, and as great an AI as we could create if we could actually model them, our knowledge of neuroscience and our raw computation power are insufficient to model the entire thing. The LLMs we produce do the task assigned to them, evidently with sufficient accuracy for the people who sell and use them though not for me, so nothing says they have to work like a brain does. Nothing, that is, except the argument for why their use of copyrighted data is valid.

      Another problem is your contention about what happens to the data after it is ingested. "The information becomes an integral and inseparable part of a body of knowledge, identity and decision making (simulated in the case of an AI but based on the same principles as the organic neural networks that AI is modeled after) and does not continue to exist within that AI's memory as a separate "work" that the AI then redistributes or publishes in whole or in part." None of that is true. The tokens are linked as probabilities, meaning that while some of them are entirely discarded, others are still present in their original form. LLMs quote their training data frequently: sometimes when they're asked, sometimes by mistake, and sometimes when answering unusual queries that don't have many associations in their trained state. A human brain might do the same, but if a human uses their brain to quote copyrighted material to an audience, that's not allowed. The fact that an LLM may be doing this unintentionally doesn't change the result.

      As for the way you learned programming from a book: you are allowed to teach from a book. If you gave out copies to the students without getting permission, that would be a copyright issue, but if you just read it, learned from it, and taught based on what you learned, then you are not violating copyright. There are three differences between you and the LLM here:

      1. You presumably obtained an authorized copy before reading it. LLM makers could have done that. The many lawsuits show how often they chose to obtain illegal copies instead, either deliberately or by scraping sites where someone else had posted them. Before any training takes place, that's already a violation. In many cases, this is the only violation being litigated, meaning that whether you agree or disagree with me about the differences between an LLM and a brain doesn't matter, because they haven't even trained on the material yet. At that point, they just have a copy in their big storage array of data they plan to retrieve and feed into training later, which violates the license attached to most books, which often says that the book is "not to be stored in a retrieval system". This means that even if they did buy a book off the shelf, that would probably be insufficient for everything they intend to do with it and they would need a special license, but they haven't even tried that. They have shown they know this by obtaining licenses to some datasets, for example paying Reddit for copies of its users' posts. I have no objection to them training on data like that, which they have permission to use (if you don't want Reddit to be able to sell your posts, read the Reddit terms and conditions and maybe don't post there). I do have a problem with data that either doesn't carry such an open license or whose use for this purpose is explicitly disallowed.

      2. You probably didn't learn to program from that book alone. After reading it, you wrote some code of your own, adding extra knowledge to what you later taught. You may have read other sources as well. The LLM doesn't do that. It is incapable of writing code, watching the result, and judging whether it did what was intended. An LLM can write code, but it has never had the experience of debugging toward a goal. It is as content (to anthropomorphize it a bit too much, but I didn't start it) to write code that doesn't compile as to write something perfect, and it doesn't adhere to the letter or the spirit of the specification except by chance. While your teaching is based on an actual goal, the LLM's is based on whatever teaching looks most like the text it has already seen. In fact, the code it writes is mostly based on other code it saw, not on the content of the textbook. If the textbook says never to do something because it's a readability disaster, but a lot of the code in its training data does it anyway, it will very likely use that structure anyway. A human mind can easily reformat that mentally in the way the book suggests, just as I have read plenty of code with bad structure or expressions, gleaned what I needed from it, and still not written that way myself because I remember the cautions of others.

      3. If your teaching wasn't based on anything but the book, then there is a chance it could be viewed as a copyright violation. If I had to teach a subject I knew nothing about and decided that the easiest way to do the job would be to get a textbook and just parrot it back to my students, that could be interpreted as a performance of the book. I may have summarized and paraphrased the contents, but that is not enough to avoid the violation. In practice, nothing would happen, because the copyright holder wouldn't know I was doing it, and for that matter neither would my students. Both would probably just think I was a bad teacher. Also, nobody is very interested in proving that I was doing that specifically rather than being one of the many other kinds of bad teacher. This is the hardest argument to make about an LLM, which is probably doing this with lots of sources instead of one. I think it is still a correct and viable complaint, but I think the arguments about use without permission and direct quoting are more convincing.
