"or the player piano"
Please note M$: piano roll manufacturers were eventually obliged to pay royalties.
Microsoft is coming out swinging over claims by the New York Times that the Windows giant and OpenAI infringed copyright by using its articles to build ChatGPT and other models. In yesterday's filing [PDF], Microsoft's lawyers recall the early 1980s efforts of the Motion Picture Association to stifle the growth of VCR …
Fair enough.
But this should also mean that the news agencies will have to recognise as creators and then pay those who actually make the news. All most news "sources" do is collate and re-report around an event. Sometimes there's a bit of analysis, but usually not. And for bread-and-butter news it's pretty common that all they do is repeat what the police or courts have released, perhaps with a few more flowery words (or even use a company press release).
If my deeds are helping fill the pages of "Before the magistrates", then I should be paid. Without me there would be no content, they'd have no business.
Need to get an LLM trainer to read your existing Windows install, then use the resulting LLM to generate a Windows install on your new system.
Then you will be able to sell your LLM to anyone who wants to solve the problem your model has been trained to solve.
That definitely, according to Microsoft’s current lawyers, won’t infringe MS’s copyright…
VCRs were personal items that people used to record stuff, and no, copyright law didn't stop the VCR. Though in some countries blank media had a levy to compensate rights owners regardless of what the media was actually going to be used for.
Fast forward a couple of decades to the DVD era. It's now possible to make good quality copies of DVDs, and at scale too. You know what? Copyright law had quite a lot to say about people that did so.
So conflating old analogue tech mostly used by individuals/families with a modern corporate garbage spewer that pilfers other people's content is perhaps the dumbest argument I've yet heard regarding AI.
Zactly.
VCRs were largely used to time shift programs for personal use, which is why they passed the court tests.
If a VCR was used to make a new film for commercial use from 100s of snippets of existing films (analogy to LLMs), that would be illegal.
And this is why I am happy to download stuff.
Why should I pay a levy on blank DVDs, blank video tapes?
More to the point why should I pay a levy for a blank MiniDV tape which goes into my HCR9?
Why should I pay a levy for the backups of my OWN PHOTOS?
Why should I pay a levy for burning my OWN VIDEOS to DVD or BD?
No, I object to paying levies on tapes for some theoretical pirating.
I object to the dumbing down of kit to ruin copying.
I was so lucky that my edit VCR and portable VCR both regenerated sync rather than degrading it. Then Sony were forced to rejig the format to degrade.
Later on I captured all the originals to hard disk, and of course the levy crap applied to blank DVDs.
The VCR was a tool used to do something - and that something could be illegal or not depending on what the user did with it.
LLMs are a tool used to do something - and that something could be illegal or not depending on what the user did with it.
Not only are they the same in that respect, that argument actually means that you still can't use them illegally and still need to get consent for the data you're using, and can't just randomly spew out thousands of copies and sell/give them away without the original owners seeking action against you.
This is a dumb analogy, and it actually makes the argument work against them.
You are overlooking a key difference: to make a VCR you don’t need to use copyrighted content. Okay, I might use a copy of, say, Pirates as part of my factory testing, but the recipient of the VCR would never know this. However, an LLM needs to be trained, i.e. have “read” potentially copyrighted material.
So for equivalence to the VCR, MS have to sell the tools (and only the tools) that allow someone to create their own LLMs, using content within their collection, which if for personal use is covered by current law; if for commercial use or sale, then the law requires you to get a licence etc.
Interestingly, this reminds me of compiler licences, where (back in the 1980s) you had to check the licence, as the basic licence typically allowed for development and private use of the output, but not for commercial exploitation, i.e. resale. The other licence trip-up was the libraries: whether they could be included in a commercial distribution or not. I’ve not had reason to review the licences companies such as Microsoft attach to their compilers in recent years; I would not be surprised if there are more favourable terms for software that’s intended to run on Azure or is only available through the MS store.
The VCR analogy is spurious, although it does draw attention to the fact that they appear to be duplicating and distributing copyright material without legal authority. Perhaps a better argument in their defence would be that of "derivative works", based on, but substantially different from, the original article. The difference is the part where it gets tricky, and ultimately it may come down to a jury or judge deciding if it is sufficient. Presumably publishers of newspapers, novels and media can implement licensing terms on their website content prohibiting future use in training models? Not a solution for those already ripped off by the LLMs, of course.
I disagree that there is any analogy. Training data doesn't seem to have resemblance to source code. It is very similar to human learning.
Imagine that there is a human with a perfect ("photographic") memory. The fact that you could ask that person to repeat "the sentence on page 75 of the book - the one which starts with 'Fred took down the picture...'" would not make that person's reading of, and learning from, the book anything other than fair use.
> Imagine that there is a human with a perfect ("photographic") memory. The fact that you could ask that person to repeat "the sentence on page 75 of the book - the one which starts with 'Fred took down the picture...'" would not make that person's reading of, and learning from, the book anything other than fair use.
That's an irrelevant analogy and argument, because neither corporations nor LLMs are human. They operate at a scale, speed, and lifetime that no single human could achieve, and that is why we have different laws for corporations.
No
Ultimately LLMs are processed using the same boolean computer instructions that have been around since at least the PDP11, probably earlier. The training data is used to compile billions of IF statements that determine what the output is going to look like.
I pick on the PDP11 because it was the first computer to run Unix. Most computers these days run some sort of Unix or Unix-like clone. The exception is those that run Windows, which is different, but not in a way that is relevant to this discussion.
What has the computer architecture got to do with it? Training data (and human mental models) are (both) DATA, not code! Sure, modern computers are all architecturally similar but there is no reason that you couldn't have an LLM built on an analogue computer design (or the wetware architecture used for the human brain). The processing architecture is irrelevant to copyright.
Selling your VCR recordings, or giving away multiple copies, was a copyright violation. That was never permitted.
MS lawyers are deluded if they think a corporate scrape is comparable to VCR use on TV broadcasts, which was mostly personal time-shifting. Anything else was quickly stamped on.
Video tapes were a device that could be used to make and watch recordings from both legal and non-legal sources. If you set up a market stall selling video recordings from non-legal sources, law enforcement would shut it down.
ChatGPT / Copilot / etc. don't give you the choice of which training data to use. They are the market stall selling access to material from non-legal sources.
...the other issue with VCRs (and audio cassettes for that matter).
When you bought blank media, there was a levy to help (in theory) go to a fund to compensate for piracy.
So how about MS etc. pay into a fund every time a query is run that uses stolen material?
Yeah, thought not.
Apples and oranges:
The VCR is a tool that allows the making of recordings. Of itself, it breaches no copyrights or patents. Using a VCR to create recordings using copyrighted material for non-private use is a breach of copyright.
The LLM trainer is a tool that allows the making of LLMs. Of itself, it breaches no copyrights or patents. Using a Trainer to create LLMs using copyrighted material for non-private use is a breach of copyright.
Or to put it another way:
The <tool> is a tool that allows the making of <product>. Of itself, it breaches no copyrights or patents. Using a <tool> to create <product> using copyrighted material for non-private use is a breach of copyright.
Feel free to replace <tool> and <product> as you wish (e.g. "DVD Burner", "DVDs")
No, you are wrong. Using copyrighted material is not a breach of copyright. Only reproducing it is a breach of copyright. So "Using a Trainer to create LLMs using copyrighted material for non-private use" is not a breach of copyright, just as using a device to analyse some recorded music to discover the number of sharps and flats used in it (to take a silly example) is not a breach of copyright.
The digital era, in so far as it impacts upon ordinary people (scathingly called 'consumers'), took off in the 80s. Digital technologies have ceased to be optional in the context of most activities, ranging from manufacture through to entertainment; their applications to warfare undoubtedly excite the reptilian brains of belligerent political 'leaders'. Applications at one time in the realm of Sci-fi lie on the horizon. Digital technology is pregnant with further possibilities.
The latter 20th century, through to now, is the most intellectually challenging, and hence the most stressful, time experienced by ordinary people. Their leaders, sadly almost all of them far too 'representative' of the limited outlook and capacity for thought among electorates, are well beyond their depth. In part, it is understandable for people in general to be confused, and rudderless, because instant communication demands instant response, leading to cascades of inanity on 'social media' (MSM too) such as X-twatter.
There is deep irony to observing Microsoft and publishers of so-called 'news' battling in court. Not only is each a leviathan by size and temperament, but also they may fittingly be considered remnants of the dinosaurs. They both wallow in a protected pool wherein their anachronistic modes of doing business (rentier economics) persist; they are arguing over share of the 'cake', not matters of principle.
Legislation rooted in the 18th century is not, in fact never was, fit for purpose. The inception of the 'digital age' has made clear that constructs capable of being expressed in digits cannot be owned, and controlled, in the same manner as physical artefacts. Law framed as if that were so is becoming unenforceable. That is pragmatic reality. Only simpletons imagine 'law' always to reflect a common notion of morality, or law always to coincide with good sense.
The players in this reported legal battle bear comparison with the Luddites of old. Of course, they collectively are far more powerful/influential than were the Luddites. They are trying desperately not to be swept away by an innovation supporting creation of possibilities for the many. Irony is compounded by the fact that doing away with 'intellectual property' (IP), to be replaced by 'attribution', leads to market-capitalism far less tainted by monopoly; however, in a world dominated by conglomerates, market-capitalism, as once understood, has ceased to be.
A further twist is that this and other 'protection of interests' legal actions (including criminal prosecutions) can be rendered nonsensical by a few strokes of the pen elsewhere on the planet. The 'Global South', so-called, contains many nations recently emerged from Western colonisation. Each has entered a global market for things, and for ideas, which is dominated by rules (conventions) set during the colonial era. It takes but one nation, that is, one out of reach of the US Marines, to recognise that its population's future is best served by unshackling people from a moribund body of law. The nature of the beast is that once one nation drops the pretence that IP exists, the whole rotten legal edifice will collapse across the globe; that is, unless fools in the USA and UK take it as an opportunity to start WW3.
Well, I can’t totally agree with your conclusions.
But, yes, a different world where there is no such thing as copyright, or patents, or trademarks, or intellectual property, can be envisioned.
However, rest assured that the closer the tiger economies, or what have you, get to parity with the west (or surpass it), the more they will start to shout about the above-mentioned protections.
For those that may not have read this, it's quite interesting that verbatim chunks of (copyrighted) song lyrics can be regurgitated without needing to use sophisticated prompt engineering (the usual flimsy excuse "AI" companies trot out is that users are essentially "hacking" via super clever prompting to get copyrighted data out)
https://thenextweb.com/news/generative-ai-regurgitates-training-data-copyright-fair-use
Also interesting is this on book text regurgitation
https://www.cnbc.com/2024/03/06/gpt-4-researchers-tested-leading-ai-models-for-copyright-infringement.html
Almost looks like OpenAI are taking a deliberate "we don't care about copyright" approach, given how badly they performed compared to other competing "AIs"
There’s a tricky issue when someone programs an artificial brain to emulate a particular, identifiable style, but in a “non-infringing” way.
But “copyright” is a well-understood legal concept (as are Deceit and Fraud).
I’m confident this all gets worked out, eventually!