Microsoft: Copyright law didn't stop the VCR and shouldn't stop the LLM

Microsoft is coming out swinging over claims by the New York Times that the Windows giant and OpenAI infringed copyright by using its articles to build ChatGPT and other models. In yesterday's filing [PDF], Microsoft's lawyers recall the early 1980s efforts of the Motion Picture Association to stifle the growth of VCR …

  1. Mike 137 Silver badge

    "or the player piano"

    Please note M$: piano roll manufacturers were eventually obliged to pay royalties.

    1. Lurko

      Re: "or the player piano"

      Fair enough.

      But this should also mean that the news agencies will have to recognise as creators and then pay those who actually make the news. All most news "sources" do is collate and re-report around an event. Sometimes there's a bit of analysis, but usually not. And for bread-and-butter news it's pretty common that all they do is repeat what the police or courts have released, perhaps with a few more flowery words (or even use a company press release).

      If my deeds are helping fill the pages of "Before the magistrates", then I should be paid. Without me there would be no content, they'd have no business.

      1. anonymous boring coward Silver badge

        Re: "or the player piano"

        In real newspapers, from days gone by, there was a lot of analysis and expansion on topics. I guess all that’s paywalled now.

        Most youngsters have never read a real article. It’s all tabloidified sh*t now.

  2. abend0c4 Silver badge

    Mere snippets

    Does that mean if I install an unlicensed copy of Windows and remove the preponderant mass of unnecessary bloat, I'm not guilty of copyright violation? If so, that, too, is a huge untapped business opportunity.

    1. Anonymous Coward

      Re: Mere snippets

      What's more, it would actually run faster and be much, MUCH smaller.

    2. Roland6 Silver badge

      Re: Mere snippets

      Need to get a LLM trainer to read your existing Windows install then use the resulting LLM to generate a Windows install on your new system.

      Then you will be able to sell your LLM to anyone who wants to solve the problem your model has been trained to solve.

      That definitely, according to Microsoft’s current lawyers, won’t infringe MS’s copyright…

  3. heyrick Silver badge

    What an absolute joke of an equivalence

    VCRs were personal items that people used to record stuff, and no, copyright law didn't stop the VCR. Though in some countries blank media had a levy to compensate rights owners regardless of what the media was actually going to be used for.

    Fast forward a couple of decades to the DVD era. It's now possible to make good quality copies of DVDs, and at scale too. You know what? Copyright law had quite a lot to say about people that did so.

    So conflating old analogue tech mostly used by individuals/families with a modern corporate garbage spewer that pilfers other people's content is perhaps the dumbest argument I've yet heard regarding AI.

    1. Woodnag

      Re: What an absolute joke of an equivalence

      Zactly.

      VCRs were largely used to time shift programs for personal use, which is why they passed the court tests.

      If a VCR was used to make a new film for commercial use from 100s of snippets of existing films (analogy to LLMs), that would be illegal.

      1. HuBo Silver badge

        Re: What an absolute joke of an equivalence

        Clearly, the more accurate analogy (rather than VCRs) is to view Microsoft as Kim Dotcom, and OpenAI as Megaupload Ltd (or vice-versa).

      2. katrinab Silver badge

        Re: What an absolute joke of an equivalence

        And also, you could buy video tapes of things from the copyright holder.

    2. MJI Silver badge

      The Levy

      And this is why I am happy to download stuff.

      Why should I pay a levy on blank DVDs, blank video tapes?

      More to the point why should I pay a levy for a blank MiniDV tape which goes into my HCR9?

      Why should I pay a levy for the backups of my OWN PHOTOS?

      Why should I pay a levy for burning my OWN VIDEOS to DVD or BD?

      1. anonymous boring coward Silver badge

        Re: The Levy

        I don’t think defending piracy was the point of the post you replied to?

        You do seem to have strong views on the subject, however.

        Now MS and the other LLM providers are defending piracy, of a kind.

        1. MJI Silver badge

          Re: The Levy

          No I object to paying levies on tapes for some theoretical pirating.

          I object to the dumbing down of kit to ruin copying.

          I was so lucky my edit VCR and portable VCR both worked by regenerating sync rather than degrading it. Then Sony were forced to rejig the format to degrade it.

          Later on I captured all the originals to hard disk, and of course the levy crap applied to blank DVDs.

  4. Lee D Silver badge

    The VCR was a tool used to do something - and that something could be illegal or not depending on what the user did with it.

    LLMs are a tool used to do something - and that something could be illegal or not depending on what the user did with it.

    Not only are they the same in that respect, that argument actually means that you still can't use them illegally and still need to get consent for the data you're using, and can't just randomly spew out thousands of copies and sell/give them away without the original owners seeking action against you.

    This is a dumb analogy, and actually makes the argument fall against them even worse.

    1. Roland6 Silver badge

      You are overlooking a key difference: to make a VCR you don’t need to use copyrighted content. Okay, I might use a copy of, say, Pirates as part of my factory testing, but the recipient of the VCR would never know this. However, an LLM needs to be trained, i.e. have “read” potentially copyrighted material.

      So for equivalence to the VCR, MS have to sell the tools (and only the tools) that allows someone to create their own LLMs, using content within their collection, which if for personal use is covered by current law, if for commercial use or sale then the laws require you to get a licence etc.

      Interestingly, this reminds me of compiler licences, where (back in the 1980’s) you had to check the licence as the basic licence typically allowed for development and private use of the output, but not for commercial exploitation ie. resale. The other licence trip up was the libraries, whether they could be included in a commercial distribution or not. I’ve not had reason to review the licences companies such as Microsoft attach to their compilers in recent years - would not be surprised if there are more favourable terms for software that’s intended to run on Azure or only available through the MS store.

  5. bronskimac

    The VCR analogy is spurious, although it does draw attention to the fact that they appear to be duplicating and distributing copyright material without legal authority. Perhaps a better argument in their defence would be that of "derivative works", based on, but substantially different from, the original article. The difference is the part where it gets tricky and ultimately may come to a jury or judge deciding if it is sufficient. Presumably publishers of newspapers, novels and media can implement licencing terms on their website content prohibiting future use in training models? Not a solution for those already ripped off by the LLM models, of course.

    1. katrinab Silver badge

      I would run with the argument that the training data is the source code. If I took the source code for Microsoft Office and recompiled it in a different way from how Microsoft's release team compiles it, it would still be copyright infringement.

      1. Graham Cobb

        I disagree that there is any analogy. Training data doesn't seem to have resemblance to source code. It is very similar to human learning.

        Imagine that there is a human with a perfect ("photographic") memory. The fact that you could ask that person to repeat "the sentence on page 75 of the book - the one which starts with 'Fred took down the picture...'" would not make that person's reading of, and learning from, the book anything other than fair use.

        1. Anonymous Coward

          > Imagine that there is a human with a perfect ("photographic") memory. The fact that you could ask that person to repeat "the sentence on page 75 of the book - the one which starts with 'Fred took down the picture...'" would not make that person's reading of, and learning from, the book anything other than fair use.

          That's an irrelevant analogy and argument, because neither corporations nor LLMs are human. They operate at a scale, speed, and lifetime that no single human could achieve, and that is why we have different laws for corporations.

        2. katrinab Silver badge

          No

          Ultimately LLMs are processed using the same boolean computer instructions that have been around since at least the PDP11, probably earlier. The training data is used to compile billions of IF statements that determine what the output is going to look like.

          I pick on the PDP11 because it was the first computer to run Unix. Most computers these days run some sort of Unix or Unix-like clone. The exception is those that run Windows, which is different, but not in a way that is relevant to this discussion.

          1. Graham Cobb

            What has the computer architecture got to do with it? Training data (and human mental models) are (both) DATA, not code! Sure, modern computers are all architecturally similar but there is no reason that you couldn't have an LLM built on an analogue computer design (or the wetware architecture used for the human brain). The processing architecture is irrelevant to copyright.

            1. anonymous boring coward Silver badge

              It’s to do with the distinction between humans being influenced, and computers harvesting with perfect memory. It’s a pretty massive difference.

  6. Mage Silver badge

    VCR?

    Selling your VCR recordings was copyright violation, or giving away multiple copies. That was never permitted.

    MS Lawyers are deluded if they think a corporate scrape is comparable to VCR use on TV broadcasts which was mostly personal time-shifting. Anything else was quickly stamped on.

    1. MJI Silver badge

      Re: VCR?

      I sold a few, made a bit from it, not a lot as wear and tear was expensive. Head discs not cheap.

      Perfectly legal sales as well.

      I have had to pay a levy on those blanks to compensate some random company with nothing to do with my copies.

      1. anonymous boring coward Silver badge

        Re: VCR?

        Doubt that would have been legal. The levy was to compensate for piracy that was known to happen. It didn’t legalise the practice.

        Insurance doesn’t legalise theft, as an analogy.

        1. MJI Silver badge

          Re: VCR?

          Nope fully legal, sold quite a few at the time.

          Wear and tear was the biggest issue with a new head disc at £160 AFAIR for my edit deck in the early 90s.

  7. katrinab Silver badge

    Not the same thing at all

    Video tapes were a device that could be used to make and watch recordings from both legal and non-legal sources. If you set up a market stall selling video recordings from non-legal sources, law enforcement would shut it down.

    ChatGPT / CoPilot / etc don't give you the choice of which training data to use. They are the market stall selling access to material from non-legal sources.

  8. IGotOut Silver badge

    Would anyone like to point out ..

    ...the other issue with VCRs (and audio cassettes for that matter).

    When you bought blank media, there was a levy to help (in theory) go to a fund to compensate for piracy.

    So how about MS etc, pay into a fund every time a query is run that uses stolen material?

    Yeah thought not

  9. Neoc

    Apples and oranges:

    The VCR is a tool that allows the making of recordings. Of itself, it breaches no copyrights or patents. Using a VCR to create recordings using copyrighted material for non-private use is a breach of copyright.

    The LLM trainer is a tool that allows the making of LLMs. Of itself, it breaches no copyrights or patents. Using a Trainer to create LLMs using copyrighted material for non-private use is a breach of copyright.

    Or to put it another way:

    The <tool> is a tool that allows the making of <product>. Of itself, it breaches no copyrights or patents. Using a <tool> to create <product> using copyrighted material for non-private use is a breach of copyright.

    Feel free to replace <tool> and <product> as you wish (e.g. "DVD Burner", "DVDs")

    1. Graham Cobb

      No, you are wrong. Using copyrighted material is not a breach of copyright. Only reproducing it is a breach of copyright. So "Using a Trainer to create LLMs using copyrighted material for non-private use" is not a breach of copyright. Just as, using a device to analyse some recorded music to discover the number of sharps and flats used in it (to take a silly example) is not a breach of copyright.

      1. Neoc

        Read again. The "for non-private use" is the kicker, as you yourself pointed out.

      2. anonymous boring coward Silver badge

        LLMs aren’t actual brains. They spout a lot of slurped material verbatim, but unattributed. Hence: copyright infringement.

  10. Long John Silver Bronze badge

    Leviathans fighting over scraps whilst the world goes to pot?

    The digital era, in so far as it impacts upon ordinary people (scathingly called 'consumers'), took off in the 80s. Digital technologies have ceased to be optional in the context of most activities, ranging from manufacture through to entertainment; their applications to warfare undoubtedly excite the reptilian brains of belligerent political 'leaders'. Applications at one time in the realm of Sci-fi lie on the horizon. Digital technology is pregnant with further possibilities.

    The latter 20th century, through to now, is the most intellectually challenging, hence stressful too, time experienced by ordinary people. Their leaders, sadly almost all of them far too 'representative' of the limited outlook and capacity for thought among electorates, are well beyond their depth. In part, it is understandable for people in general to be confused, and rudderless, because instant communication demands instant response; this leading to cascades of inanity on 'social media' (MSM too) such as X-twatter.

    There is deep irony to observing Microsoft and publishers of so-called 'news' battling in court. Not only is each a leviathan by size and temperament, but also they may fittingly be considered remnants of the dinosaurs. They both wallow in a protected pool wherein their anachronistic modes of doing business (rentier economics) persist; they are arguing over share of the 'cake', not matters of principle.

    Legislation rooted in the 18th century is not, in fact never was, fit for purpose. The inception of the 'digital age' has made clear that constructs capable of being expressed in digits cannot be owned, and controlled, in the same manner as physical artefacts. Law framed as if that were so is becoming unenforceable. That is pragmatic reality. Only simpletons imagine 'law' always to reflect a common notion of morality, or law always to coincide with good sense.

    The players, in this reported legal battle, bear comparison with the Luddites of old. Of course, they collectively are far more powerful/influential than were Luddites. They are trying desperately not to be swept away by an innovation supporting creation of possibilities for the many. Irony is compounded by the fact that doing away with 'intellectual property' (IP), to be replaced by 'attribution', leads to market-capitalism far less tainted by monopoly; however, in a world dominated by conglomerates, market-capitalism, as once understood, has ceased to be.

    A further twist is that this and other 'protection of interests' legal actions (including criminal prosecutions) can be rendered nonsensical by a few strokes of the pen elsewhere on the planet. The 'Global South', so-called, contains many nations recently emerged from Western colonisation. Each has entered a global market for things, and for ideas, which is dominated by rules (conventions) set during the colonial era. It takes but one nation, that is one out of reach by US Marines, to recognise that its population's future is best served by unshackling people from a moribund body of law. The nature of the beast is that following the dropping of pretence in one nation that IP exists, the whole rotten legal edifice will collapse across the globe; that unless fools in the USA and UK take it as an opportunity to start WW3.

    1. anonymous boring coward Silver badge

      Re: Leviathans fighting over scraps whilst the world goes to pot?

      Well, I can’t totally agree with your conclusions.

      But, yes, a different world where there is no such thing as copyright, or patents, or trademarks, or intellectual property, can be envisioned.

      However, rest assured that the closer the tiger economies, or what have you, get to parity with the West (or surpass it), the more they will start to shout about the above-mentioned protections.

  11. DS999 Silver badge

    As if Microsoft

    Doesn't very well understand the difference between "personal use at home with no commercial benefit" and "business use on a massive scale with hoped-for trillions in commercial benefit"

    1. anonymous boring coward Silver badge

      Re: As if Microsoft

      Their arguments are designed for dinosaur judges.

      This is meant to drag out for 5-10 years.

  12. tiggity Silver badge

    American pie

    For those that may not have read this, it's quite interesting in that verbatim chunks of (copyrighted) song lyrics can be regurgitated without needing to use sophisticated prompt engineering (the usual flimsy excuse "AI" companies trot out is that users are essentially "hacking" via super clever prompting to get copyrighted data out)

    https://thenextweb.com/news/generative-ai-regurgitates-training-data-copyright-fair-use

    Also interesting is this on book text regurgitation

    https://www.cnbc.com/2024/03/06/gpt-4-researchers-tested-leading-ai-models-for-copyright-infringement.html

    Almost looks like OpenAI are taking a deliberate "we don't care about copyright" approach, given how badly they performed compared to other competing "AIs"

  13. Grunchy Silver badge

    Copyright = right to make copies

    There’s a tricky issue when someone programs an artificial brain to emulate a particular, identifiable style, but in a “non-infringing” way.

    But “copyright” is a well-understood legal concept (as are Deceit and Fraud).

    I’m confident this all gets worked out, eventually!
