
Chisel your own lost marbles (and nuts!)
#winning
Thomson Reuters has won a partial summary judgment in a copyright case against shuttered AI firm Ross Intelligence, a decision that disallows fair use as a defense for training models on proprietary data without permission. "We are pleased that the court granted summary judgment in our favor and concluded that Westlaw’s …
I see beast666 is now posting as AC, presumably after having most of his recent posts removed. But you just can't hide the stupid. I'll wager that's also him inexplicably banging on about Hillary Clinton in an unrelated article. What a waste of space.
Patience might be helpful here. You're demanding a response from people mere minutes after posting something. These forums don't work that quickly. You're also getting angry about what amounts to at most two downvotes on your original post (no, not me). For all I know, it was only one by the time you complained. Get used to it: more people will express their views through votes than through replies. You may not get many votes or replies on this topic, because an article about the copyright of legal documents is not one a lot of people are likely to click on, but if you do, they won't come through as quickly as that.
If I want to train some AI on publicly available court decisions, I have a brilliant idea: I should do it with the public court decisions, rather than someone else's summaries of those decisions that I don't have permission for.
Yes, I do like paying people for things if I like the price. If I don't like the price, then I don't buy them. Price discovery and an open market are almost always available. If I want a certain book, then it is not hard to find the places selling the book and how much they charge for a copy. If they all charged £200, then I would probably read a different book. Of course, there have to be some restrictions. For example, one of the only cases where anyone tries to charge £200 for a book is textbooks, which is why I would buy used ones. I'm sure textbook authors would try to prevent people from selling used copies if they could, and I will fight against any attempt they make to do that, but that is nowhere near as far as you constantly argue for.
I find it useful to note that when I read an item on the web, I have to copy the text and sundry markup and metadata onto my computer.
As I read the item, I am, in effect, training my own wetware intelligence1
Am I violating Reuters's copyright by doing so with their material?
My only alternative, according to this decision, is not to use the web or read anything at all.
This may seem like a victory, of sorts, but is it? Victories are sometimes Pyrrhic2.
_____________
1 For admittedly small values of intelligence.
Lawyers pay for licences for the copyrighted content they use in their work, actually. The books are exceptionally expensive too; I used to be the person buying the licences! The reason they are expensive is that it's a commercial licence ;)
>Am I violating Reuters's copyright by doing so with their material?
No, you are not. The LLM is also not violating any copyright. You might notice that nobody is suing the LLM.
The LLM's developers, though, who are not the LLM, are not reading the copyrighted text. They are using it as input to a computer algorithm, and then commercializing the output.
That action is not reading, and does not even look like reading. It does look a whole lot like creating a derivative work and selling it, though.
That's why they are being sued, and not the LLM (who, as you correctly notice, is not doing anything wrong).
Consider, however, that everything is derivative in one manner or another.
It's been said that "Good artists copy; great artists steal"1 or, as the great Tom Lehrer put it:
Plagiarize!
Let no one else's work evade your eyes
Remember why the good Lord made your eyes
So don't shade your eyes
But plagiarize, plagiarize, plagiarize
Only be sure always to call it please "research"2.
That digression aside, bear in mind that this is a victory for a giant corporation, Thomson Reuters, that seeks to control and hoard information for its own profit, and not some "starving artist" with a single website.
As I said above, yes, this may be a victory. But a victory for whom?
____________________
1 Though often attributed to Pablo Picasso, the actual attribution is somewhat murky.
2 Lehrer placed the majority of his work in the public domain in 2022. A recording from 1953 may be found here.
The guy in the Lehrer song isn't exactly the hero of it, you know. This issue is not about picking the more sympathetic corporation. This issue is about finding the right rule for every situation where this crops up, either now, where we're trying to work out what the current law already says, or for the future, if we decide that is wrong and want the law changed. This means that the giant AI corporation copying an individual's work without permission and a scrappy individual with an AI model copying a massive enterprise's work without permission should be treated the same way. That way, they both know what is required of them before they start out. In fact, I would suggest that the sizes of corporations demonstrate why we need that clarity; the kind of thing that large AI companies do routinely would be immediately and rigorously smacked down if an individual did it, and only similarly large corporations have the legal might needed to push against it. This is unfortunate, but I support the copyright holders, even if they are large, because by defending themselves they also defend smaller creators with the precedent.
"Am I violating Reuters's copyright by doing so with their material?"
Well, if you are told you need to pay for it, but you find a way of getting a copy without paying, I think you'll find that courts think you are. It's the same reason that, if I ever got hold of a copy of one of the AI models these companies make and used it, even without selling it, they would say I had no right to do so.
Thank you for responding.
With regard to 'price discovery', I think you may have misunderstood the context in which I used the term.
First, an example involving IP-associated products where it does apply. Suppose different authors write travel guides to Venice. Their publishers will be aware that many guides already exist and temper the wholesale price of the books in an attempt to gain an edge in the market. Similarly, booksellers may choose to price their wares to increase their overall sales, but on smaller margins. Prospective buyers, for whom booksellers' prices are the only thing they can respond to, may 'shop around' for a particular work recommended to them. Printed-on-paper books are a physical commodity for which scarcity can influence pricing; in that market, scarcity is partly determined by popularity because further print runs are possible. Discovered prices represent the 'going rate', a margin above the cost of printing and distribution, which is not easily manipulated.
Works available in electronic digital format don't fit into the conventional price/demand and scarcity dynamic. In essence, digital sequences, regardless of their cultural or practical worth, or of their cost of construction, have negligible intrinsic monetary value. Only under the aegis of legal monopoly rights may monetary worth be ascribed, and that worth is arbitrary, set at the discretion of makers/sellers. Obviously, vendors are aware of price as a factor in gaining sales. However, both the legal monopoly and the public perception of value help protect the ersatz market. Most people familiar with printed works, at least those above a certain age, don't distinguish between the cost of constructing and distributing the physical medium (i.e. the paper, ink, and cover) and the fluid nature of digits, which can transit among differing media without need of permanent fixation or uniqueness.
Sequences, i.e. information expressed in the most abstract of formats, lack true scarcity and, once in the wild, cannot be contained. Their fecundity is unbounded.
Irrespective of law, and of misplaced (i.e. inappropriate for context) belief in 'property rights' being applicable, disobedience is rampant and shall continue increasing. Holders of copyright are fighting a fierce, yet soon to be lost, rearguard action. The Internet and, now, AI, are the final nails in copyright's coffin.
That leaves the matter of how freely available information (and culture) in digital form ought to fit into mankind's activities, and how genuinely imaginative people may be incentivised. Elsewhere, I explain the manner by which erstwhile monopoly can be reinstated within competitive market-capitalism. The result will be ruinous for the many middlemen feeding off the talents of others. However, a new Renaissance awaits in the wings.
Yes, I get it. To you, "price discovery" means "copyright is evil and everything should be free". From other comments you've made, I wouldn't be surprised to hear that most phrases mean that to you. To me, "price discovery" means discovering what things, including competing products, cost should I want to obtain them. This is often not a problem for copyrighted works.
You focus entirely on the small costs of copying digital data, which you incorrectly reduce to zero, but they are quite close to zero so I can live with that. What you have failed to take into account, repeatedly, is that if these are zero, they become nearly meaningless to the price of the work, but the other costs such as the ones needed for the creator to produce the work still exist as they always have. I would try to convince you of this by pointing out the situation with those things you're willing to accept, but I fear that there are two problems that will prevent this from doing any good. The cost of printing up a physical book is much lower than the price it sells for because the difference is being paid to the authors, illustrators, editors, typesetters, and everyone else who did intellectual labor in order for the book to exist. Some of it is also being paid to the publisher because, in many cases, they paid in advance so the author could produce the book. Most of those same costs exist if it's a digital book, and if the people concerned can't get money, they won't do the work, and the book won't exist. I'm afraid that you will ignore this because you appear to think this labor is infinitely available, which it's really not. Furthermore, I expect that, were we having this argument in 1970, you'd be passionately arguing that anyone with access to a printing machine should be allowed to churn out unlimited copies of any book and anyone with the requisite copying machinery should be able to reproduce copies of any media at all, but you're reducing your scope to digital data here for simplicity.
If you would not argue this, then consider that digital copying is just a cheaper printing machine, and the creators of that book still incurred a similar cost to make it. If you would, though, I don't think we'll ever see eye to eye, because you are arguing that intellectual labor has no value. If you really think that, why not simply never buy anything made with it, since it has no value? But no, you want access to the products of their work, yet you don't think they deserve to benefit from the work you're enjoying unless someone happens to donate. I still wonder what kind of work you do.
Whilst American courts host debates on copyright, segments of the planet beyond US jurisdiction blithely ignore US law, and pay minimal lip-service to international conventions concocted before many ex-colonial nations gained independence (of a kind). Newly independent countries, initially fragile relative to the established ex-colonial powers, had to play by imposed rules.
Times change, and the rise of BRICS indicates willingness to stand together in opposition to Western hegemony. For instance, the US is fearful that one day BRICS will detach from the US dollar; Mr Trump has issued dire threats which upon realisation would, in fact, harm the USA. The US deeply depends upon rentier economics based on the increasingly risible concept of intellectual property; should a substantial chunk of the world unilaterally change the game, the entire US rentier edifice would collapse; only with great difficulty could the US return to being a major maker of tangible goods instead of an assembler of components made elsewhere.
Whilst this plays out, the US stands to lose its lead on 'AI'. Not only that, but the US will be unable to enforce rules concerning the use of AI manufactured outside its legal jurisdiction; after all, when 'trained', AI models can be condensed into smaller versions capable of running on ordinary PCs and, being software, these can easily permeate across the globe.
Although AI's potential may, at present, be 'over-sold', it nevertheless shall greatly impact knowledge-use and education.
"Although AI's potential may, at present, be 'over-sold', it nevertheless shall greatly impact knowledge-use and education."
Good point. Right now it's churning out garbage that people believe verbatim.
But perhaps there's a slight chance that its utter unreliability might help people learn that believing the first Google result doesn't equate to knowledge.
We can only live in hope
The Hoover Institution had a "Challenges Facing the US Economy" one-day conference a few weeks ago. For those not familiar: Hoover is neocon central, being directed by Condoleezza Rice, for example.
One of the 4 sessions was on AI.
Summary of the presentations by the panelists: AI Uber Alles. AI will save the US economy and the world.
I asked a question of the economist and the policy expert:
Can you comment, from an economic and also a policy standpoint, on what the impact of the systemic abuse of public copyrighted data by the AI industry might be?
The policy guy gave an excellent overview of the copyright problem, but said that AI is too important for this to be a barrier.
The economist actually pushed back a bit - saying that the sanctity of private property is a cornerstone of capitalism but concluded that "clarity" was needed.
My takeaway is that the fix is in: the commons of copyright information will be sacrificed for the enclosures (model training) underpinning AI.
It’s interesting that OpenAI has started buying licences. That sets the expectation that they know they need a licence. Any and all content for which they don’t have a licence must therefore be excluded from the training set as per their own understanding of the law.
This puts everyone with a website or YouTube channel in a very good position when the class action starts.
That's like saying "if computers are the latest be-all and end-all for office work, how did all those big companies work without computers?"
Just because something has not been used before doesn't mean it cannot make work faster and better. But that doesn't mean it is going to do so.
So far AI has minimal benefits, gives a lot of answers and results that are plain wrong or false, and costs trillions of dollars, so it is not worth it.
They worked in an almost identical way, assigning statistical significance weightings to things and using vector maths to find stuff.
Fun fact: Google were also sued for copyright infringement and lost; that's why they no longer put all the content on their page, robbing others of ad revenue. They now use sources like Wikipedia directly, and pay for the privilege.
I don't understand why it is necessary for the individual cases to play out and maybe get appealed to the Supreme Court rather than the Supreme Court declaring the 30-odd cases to be a federal class action between multiple claimants and defendants, and render a preemptive ruling on this vital matter before any more billions are spent building out "Generative AI" (*cough* Bullshit!) projects and data centers.