"... a tech industry figure insisting that the "original sin" of text and data mining had already occurred and that content creators and legislators should move on ..."
Translation: "All your data belong us"
Governments are allowing AI developers to steal content – both creative and journalistic – for fear of upsetting the tech sector and damaging investment, a UK Parliamentary committee heard this week. "You're going to get a vanilla-ization of music culture as automated material starts to edge out human creators."
Exactly. Copyright is absolutely absurd at this point. At the VERY least it should end immediately upon the author's death, because the ENTIRE purpose of it is to encourage creativity. A dead person creates nothing.
Corporate copyright should be banned entirely.
I've heard that - "Bad data is worse than no data" - from a Marine, but I don't believe that's always true. In code, obsolete comments can provide useful human-pointers. The phrase, "Trust, but verify" seems applicable there.
As to LLMs, good luck with your tarpits.
No. It's no different than you reading a bunch of books and writing another book using your accumulated knowledge.
That's why all of this nonsense from publishers is bullshit.
I'm not saying all this AI bullshit is a good thing, it isn't. But claiming that the training is somehow a copyright violation is utter nonsense.
Eh, no.
It's not the same as me "reading a bunch of books and writing another book using your accumulated knowledge"; it's the same as me stealing a bunch of books, reading them, and then writing another book using my accumulated knowledge.
Glossing over whether "building a statistical model of a language that allows for probabilistic generation of plausible text" is in any way similar to "accumulating knowledge", the crime alleged is the initial theft, not the subsequent "writing".
I don't think it's correct that the theft is the only thing that matters: if it was, the big AI companies could have absolved themselves of any liability by just paying for one single reproduction of the works they used. Even the most prolific creators would hardly get more than a few thousand bucks for that, and that's clearly not enough to compensate them for the fact that their style can now be copied at mass scale for cheap.
"It's no different that you reading a bunch of books and writing another book using your accumulated knowledge."
If I write to you the first paragraph of an NYT story, you don't immediately continue where I left off, possibly paraphrasing. ChatGPT and others will, because they do not know things. They calculate and select next likely word fragments based on recent word fragments.
But also if you take part in a legally limited use of content (read article) and then you choose to exceed that use (and egregiously) then you owe more for breaking that content use agreement. So do OpenAI and the other datascrapers.
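To make the "select next likely word fragments based on recent word fragments" point concrete, here is a deliberately tiny sketch. It is a toy under loud assumptions: it uses whole words instead of sub-word tokens, looks only one word back, and always takes the most frequent follower, whereas real LLMs use neural networks over long contexts and usually sample. The function names and the corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def continue_text(follows, prompt, max_words=10):
    """Greedily append the most frequent follower of the last word."""
    out = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(out[-1])  # .get() avoids creating new keys
        if not candidates:
            break  # the model has never seen this word, so it stops
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


# Hypothetical training text; nothing here comes from a real article.
corpus = ("the committee heard evidence this week and "
          "the committee heard evidence about music")
model = train_bigrams(corpus)
print(continue_text(model, "the committee", max_words=2))
```

Given the start of a sentence it has seen, the toy carries on with what followed in its training text; given a word it has never seen, it has nothing to say at all.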
Despite a tech industry figure insisting that the "original sin" of text and data mining had already occurred and that content creators and legislators should move on.
Not a problem. It's reversible. If you can't pay for what you took just delete the training. All of it. And hose backups.
Implicit in demands by 'creators', and by the massive industry (plus middlemen) marketing digital creations, is an assumption that ideas and digitally encoded products possess substance like physical artefacts and can therefore be 'owned' in the same sense.
Realisation of the specious nature of 'intellectual property', such as enshrined in the Statute of Anne (1709), has been dawning painfully for copyright rentiers during the massive global build up of digital connectivity. Digital artefacts are not containable in locked cabinets. Indistinguishable copies can be made and distributed at negligible cost by anyone possessing a single copy. That is hard reality: a fact of life. Digital sequences cannot be 'stolen' because, unlike with a physical artefact tethered to a unique instantiation, a putative 'owner' cannot be deprived of his master copy. The losses whined about relate to potential income from renting-out sequences under the aegis of an artificially created monopoly; sequences have no intrinsic value, hence they lack scarcity, and therefore there cannot be 'price discovery' in the context of supply-and-demand market economics.
Set aside the concerns of 'entitled' publishers and those of the middlemen, to consider the matter from the point of view of people capable of 'creating' something of lasting cultural worth, there being far fewer of these than the industries 'milking' also-ran talent admit. People who genuinely 'create' are internally 'driven'. Some may not care about recognition, but most value, perhaps crave, admiration and respect from people deemed capable of grasping the creator's particular niche in culture. Truly 'driven' folk seek to devote all their time to creative endeavour and its offshoots (e.g. education). Most need to generate income from beyond their own resources.
Suppose, somebody has a burning desire to establish herself as an author of bodice-ripping yarns. She will devote free time to writing. She will hone her skills and seek constructive criticism from friends, teachers, and other acquaintances. Traditionally, an aspirant writer had to pique the interest of a publishing house. In general, said writer starts off trying to get short stories published in magazines and the like.
Somehow, an ethos of 'entitlement' has arisen, wherein the mere fact of a work being published confers an unquestionable accolade. People humbly buy 'the book'. If the purchaser doesn't like, or understand, the work, then, should the author already be 'known', the buyer most likely is revealing his own inadequacy. Regardless, trying to persuade a bookseller to give reimbursement for a book the customer regards as, by virtue of content, 'unfit for purpose' is an errand for fools.
Bear in mind, by now our author is thoroughly imbued by the attitude of her publisher with respect to 'rights'. Of course, the publisher gets the lion's share of revenue; in days past the publisher has taken an investment risk by ordering a print run. Electronic publication somewhat changes that.
The digital era renders previous practice almost anachronistic. An aspiring writer, perhaps paying for initial advice, can publish directly online. The author must build a following of appreciative readers. That is, to acquire 'reputation'. The author can solicit financial support, e.g. via crowdfunding, for new projects. Perhaps, writing can become a full-time occupation with money set aside for a pension. The writer can interact directly with the growing body of admirers. In the absence of copyright (it is increasingly unenforceable anyway) 'reputation' is her sole asset; it designates her place in the competitive market of skills seeking patronage for her genre of writing. Anyone can distribute digital copies of the bodice-ripping stories. Anyone can take her works, a particular work even, and 'derive' a new version. For example, in a different genre one could rewrite a tale about 'Harry Potter' with a new ending.
But, shall not our authoress be ripped-off, left, right, and centre, by unscrupulous people? Not if 'entitlement to attribution', a key protection against barefaced plagiarism, is given legal backing. Anyone is entitled to do what they like with the thrilling yarns, but whatever they distribute must give clear recognition of origin. The 'Gordian Knot' of copyright is sliced, to be replaced by simpler-to-understand legislation based upon the concepts of misrepresentation, fraud, deceit; these tied into principles of civil and criminal law.
> Somehow, an ethos of 'entitlement' has arisen, wherein the mere fact of a work being published confers an unquestionable accolade.
It's called copyright and it's not "somehow", the entitlement is its very purpose.
> a key protection against barefaced plagiarism, is given legal backing
Quite.
I wonder how many actual print publications the OP has? It is probably one of my many psychological failures, but I do get quite a good feeling seeing what I wrote in print on physical paper. The fact that someone else has not only read what I presented, but believes it to be worth publication for others to read, and that it sometimes is even cited by yet more people, is quite the ego trip. Of course, I claim I do it for the advancement of knowledge (academic papers), or making a contribution to the discussion of an issue in the public consciousness (letters to 'the press'), but read Proust's 'In Search of Lost Time' for how an author reacts to seeing their work in print for the first time, or Stephen Fry's reaction to receiving the six free copies of his first book.
I think we all get that to some extent, but the creator of this thread appears to think that's all we should ever need to create something that's not physical. I do wonder, other than wordy defenses of piracy, what things that person creates? It would make a lot of sense if those things were physical, the one category they still think has value.
Oh do fuck off.
This is the sort of self-centred word salad produced by over educated (and under intelligent) tech guys - and can be summarised by the typical offer: "If you let me use your work, I can pay you in exposure". Artists have been receiving vacuous offers like this since forever, and so far the answer has always been the same. (See first line of comment).
No, reputation and attribution are not fair reward for the time and effort put into a creative act - and nor should they be. You've taken someone with one skill (making art) and required that they magically possess a second skill (self-promotion) and a third skill (policing the internet) and a fourth, especially magical skill (convincing people that something they can take for free should receive reward - trust me, this is not as easy as it sounds). It does not matter if you think mere electrons are worthless, and therefore any works are worthless - creative acts of any value whatsoever require effort, and if we value creative acts rather than AI slop, we as a society need to organise ourselves in a way that the effort can be made.
Or in other words, if we want artists (yes, we do), we should feed them, clothe them, put a roof over their heads and ensure they can continue creating. That means that we find a way to make abstract creative acts (whether physical paintings, or digital illustrations) receive reward. That, in turn does not need a magical new business model that as yet has not been proven to work - it needs existing concepts of ownership to be carefully revised to continue protection for artists.
And if you want to understand how badly the concept of patronage works in the internet age, go and take a look at the hundreds of excellent indie game studios currently closing because it turns out that offering free-to-play or free-to-try games does not bring the rewards they thought they would. Successful patronage turns out to be an (admittedly very noisy) outlier in this day and age. Everyone else gets peanuts.
-- if we want artists (yes, we do), we should feed them, clothe them, put a roof over their heads and ensure they can continue creating. --
I would fully agree with this if it didn't allow for things like the Turner prize winning "a room with lights going on and off" to be included.
And this is why I am a fan of copyright. It makes it possible for art to survive, but it does that by letting people express how interested they are in various types of art. If I find something unpleasant or annoying, I don't buy it, and if everyone doesn't buy it, then the artist who made it will either change their approach or do something else. Our other options appear to be not supporting anyone, in which case only the richest artists will be able to make all the art they want, or we support artists through direct funding, in which case many artists that nobody likes will be funded just because they are artists. I oppose both of those alternatives.
You're a fucking retard mate, you attract downvotes cause you write like you have fascist beliefs and you're egotistical
Your opinions are wrong, stupid and dangerous.
Drop fucking dead, pal. Seen you crying about downvotes with no rebuttal so here's rebuttal:
You speak such absolute bullshit you're either trolling or evil. Either way: I genuinely think the world would be slightly better without you
The way forward is to legislate to force the AI slurpers to remove the stolen data from their models.
And if they can’t do that (and they keep saying they can’t untangle it) then they must delete the WHOLE of their model and its associated data and start again, this time WITHOUT stealing stuff.
Yes it will cost them a fortune. My heart bleeds - they shouldn’t have done it in the first place
Why are governments so utterly shit when it comes to dealing with crap like this?
To rephrase your question: "Why do companies get away with stuff an individual would not get away with?"
It is irrelevant whether or not it is "too hard" to untangle the unlicensed content from their models; infeasibility should not factor into the decision at all. Either they can continue to use their model, provided they can remove all unlicensed content in its entirety, from the model as well as from any up- or downstream data sets they use, and provide third-party verified proof they did so, or, if they can't do all of the aforementioned, they can't use any product(s) derived from the illicitly used materials. It's that simple.
If we were serious about holding people accountable (but we'd have to be a nation of laws for that, and we aren't), then there would be actual repercussions and accountability for the officers of the company. Isn't that why they are paid the big bucks, because they are 'responsible' or something? If you want to play CEO or be some other senior officer of the company or hold effective power over direction and activities of the company, that's great, but that comes with strings attached: you will be personally liable for the activities of the company. Let's see how much law breaking still happens if we remove that immunity...
Similarly, if "companies are people", let's start executing some of them: corporate death for the entities, and a prohibition on any of their officers being an officer of any company or having effective control over the direction of a company.
But now watch the courts kowtow to these perpetrators and not only placate them, but ask them for their own suggestions regarding "how would you, as the person being told to appear in front of the court today, like to go about rectifying things? Oh, you suggest a pinky promise to not do it again but keep what you have? That should do, of course...". Anything suggested by these perpetrators as reparations should be immediately dismissed as not enough, because nothing they will suggest would move the needle one angstrom.
Creators, aka working class, the pleb, excess carbon are just expendable cogs in the machinery of the world.
They should be grateful that their creations will be forever embedded in the AI models that in the future will take over being in charge of the planet.
That poem you wrote, that song you recorded, that angry comment you sent on public forum, the picture you took of your dumb face, this will all become the foundation of what makes the AI.
It's like having children, but much bigger than that. AI will travel to other universes and your little contribution with it.
All the rest is just being salty, because you will not get some cash from it.
Think bigger, think different.
> You won't win a fight against big corporations, so embrace what they do.
No! That is how authoritarians and fascists get into power. Our liberties are interconnected, and to sell out one's own liberties is to sell out everyone else's liberties.
This mentality is wrong and there is no compromise on this. No!
There are only 12 notes in western music scales, and where does art come from if not taking inspiration from past artists? In short, almost all music melodies have been made, and stolen, and reused at some point.
Then came software that allowed you to dissect and "explode" music into its individual parts, rearrange a few notes with little or no training, and BAM! "Look ma, I'm a musician, no hands!"
It became more about content than listening. It got even stickier with illegal software downloads (ironically, often from Russian websites lol). Then top acts and artists who were really just content creators, young enough to work the web for attention, started getting paid to advertise and push the "A.I."-inspired agents that would do the stealing and rearranging for you. The kids instantly latched onto that of course. It's the latest and greatest!
All of a sudden these same influencers are getting ripped off by the up and comers and are like "wait! not like that!" lol
The music industry has very good lawyers that can go after content creators using such things for top acts, but not for the smaller indie guys, and the labels want the youth movement. They don't want to isolate their sales base.
A very sticky web indeed. As AC/DC said... "who made who?!"
Musical copyright cases are often complicated by the problems you're describing, with one creator thinking they own a simple set of chords. This is why they often lose them, though it's mostly a roll of the dice to see what the jury thinks that day. However, your simplification misses several important points.
Yes, there are twelve notes in an octave. There are also many octaves (technically unlimited ones, but we can limit it to seven or so), and many instruments can and do use notes between those twelve, and there's a lot more to a sound than its frequency which is why we don't listen to all our music played on the sine wave. That makes no difference, because it's similar to saying that there are only twenty six letters in the English alphabet, so anything written is just an arrangement of those. Not every melody has been previously generated, even if they have similarities. While some people may try to claim ownership over sections that are far too short, there are people who create new works and seek to protect the whole, rather than each component. Meanwhile, people who intentionally made minor changes still had to compensate the original creator; while some people may have decided that covering someone else's song would be a quick way to fame and some of them were right, they had to pay for the right to make that cover. The same is true of sampling. It wasn't free when the people you're talking about did it.
Even if we decided that music has too few components, that doesn't extend to other forms of work. There are a lot more arrangements of words than there are of notes and more reasons to string some of them together. Depending on how into information theory you want to get, you can put visual art above or below music on the entropy scale, and even if you consider a picture to have less information content than a song, video lets you extend that quite a bit longer. Generative AI companies have been helping themselves to all of those things without permission. To me, how original these things are is not the question. If it was copyrighted (it was), and they considered it worth including (they did), then they need to obtain the rights to it. A lot of those rights will be really cheap. If it was so unoriginal that it didn't add anything, there should have been no problem excluding it from the training data. They included it for a reason, they found that their models were better with it than without it, and they can pay for that.
"While some people may try to claim ownership over sections that are far too short, there are people who create new works and seek to protect the whole, rather than each component."
I wonder if a better approach would be to randomly sample, say, a hundred people to ask what song "X" is by playing parts of the music. If enough people mistake it for song "Y", then it is infringing. This can stop arguing over individual phrases of notes that's like a 1000th of the entire piece, while catching out those that aren't exactly identical but close enough to be mistaken. There is "prior art" here, as sometimes Trading Standards does blind tests of asking members of the public to identify knock off goods (my mother was asked to do this once when shopping in Woking).
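A rough sketch of how that listener test might be scored, purely as an illustration. Everything here is an assumption: the panel size of 100 comes from the comment above, but the 50% threshold, the function names, and the made-up responses are hypothetical, and "enough people" would really be for a court or regulator to define.

```python
import random


def sample_listeners(population, k=100, seed=None):
    """Pick k distinct listeners at random from a list of candidates."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def confusion_rate(responses, other_title):
    """Fraction of listeners who named the other song instead of the one played."""
    if not responses:
        raise ValueError("no responses to score")
    mistaken = sum(1 for answer in responses if answer == other_title)
    return mistaken / len(responses)


def looks_infringing(responses, other_title, threshold=0.5):
    """Flag the song if the confusion rate reaches the (hypothetical) threshold."""
    return confusion_rate(responses, other_title) >= threshold


# Made-up panel: listeners hear parts of song "X" and say what they think it is.
panel = sample_listeners(list(range(1000)), k=100, seed=1)
responses = ["Song Y"] * 62 + ["Song X"] * 30 + ["don't know"] * 8
print(len(panel), confusion_rate(responses, "Song Y"),
      looks_infringing(responses, "Song Y"))
```

The point of scoring the whole piece this way is that nobody has to argue about a single phrase of notes; only the overall impression on ordinary listeners counts.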
"with one creator thinking they own a simple set of chords"
Nobody owns a chord, and nobody owns a chord progression (like I-vi-iii-IV (1-6-3-4) for metal). These things are like the fundamentals of how music works and it would be as crazy as somebody claiming that they own 4/4 (common time).
"There are also many octaves (technically unlimited ones, but we can limit it to seven or so), and many instruments can and do use notes between those twelve"
There's a lot less than that because music has to "flow", to set a mood, to help tell a story. Generally one can't just shove a bunch of random notes together and expect it to sound good. Luckily, however, this restriction can be balanced out by time - note durations - and just as important, the silences between notes.
"To me, how original these things are is not the question."
Exactly. They never asked, they just took. But perhaps even worse is that inside the AI is this big virtual blender where everything churns around and, well, forget about anything that even remotely resembles attribution. Essentially industrial scale theft of created works.
"A lot of those rights will be really cheap."
Imagine the bureaucratic nightmare of even paying a single dollar to "the internet". Still, that's not an excuse.
Well, there are two angles here.
First: I'm not a copyright lawyer, nor an expert in legalese. I can simply attest to what I've seen go down in my life. I can tell you these guys have very high-priced lawyers, and high-priced lawyers can do black magic you didn't know was possible. They will have up turned into down and left into right, and get away with things you didn't know you could get away with. So, if you are going to go up against these guys in any kind of legal situation, you guys all better be on the same page, and you better have a good story. In my experience, a case like this, if taken to court, would take a decade or more to get through, and then they will say... "OK, here's a billion dollars that you guys can fight over, have at it." Meanwhile, they have talked you into a "compromise", and made half a trillion. So, good luck with that.
Second: what these guys are doing is stealing you, and then synthesizing you into a copy, or reproduction in a lot of cases. Same as you described above. They can have many other uses, such as targeting ads, or profiling you for future jobs and such... but mostly stealing your ideas, your looks (if attractive enough), and/or speech, and then reusing them to train their language models (mostly ideas and language).
There is another issue here, and that's that you mostly signed all that away when you became a "content creator" anyway, which means you will have no case to begin with. Good luck with that.
My beef is with the data warehouses. Those guys are sitting back there collecting everything, and then selling it to third party God knows who, for God knows what, and we really don't have a say or choice. You mostly have to sign off on that ish just to live and breathe. I have a big beef with those silent ass hats, and they don't speak up for a reason. They know what they are doing is bad. My opinion is if you can go after, and shut down, these HIGHLY UNSECURED data warehouses, you could solve most of these privacy issues, but it won't happen because society has become too dependent on them. Especially law enforcement and such. Just giving you my 2 p.
"the "original sin" of text and data mining had already occurred and that content creators and legislators should move on"
Let's try this differently.
" the "original sin" of ripping movies and music and dumping them online for easy access has already occurred and that content creators and legislators should move on "
Somehow I don't think the media companies or their fancy lawyers would be the slightest bit persuaded by that sort of broken logic.