Re: Huh what a load of twaddle
> getting a source of data entirely created by humans remains as easy as it always was. Tapping the creativity of humans is as easy as it always was.
That totally misses the point.
The total of the available data (or "content") will always contain materials created purely by humans. But, from now on, it will also always contain content created by non-humans.
The problem - that you entirely fail to address - is how to tell the two apart, how to sort the wheat from the chaff. We - both humans and the machines - can only read what has been published (whether to the public or to a restricted audience, like your dev team). And we have no way of telling, for certain, what was purely human, what was purely machine output and what was an unholy mix of the two by the time it reaches us.
Unless, that is, you ARE truly proposing that we "ask humans to sit in a room and create content" whilst under constant surveillance to ensure that what we get is purely human created, with a certificate of compliance to prove it.
In point of fact, yes, we could literally do that, pay people to sit in rooms and create on demand, to generate material that is fit to be fed into the maw of the LLM. But you are not going to get either quality or quantity from doing so, and especially not in an affordable fashion: "Hello, miss world-renowned author, how much do we need to pay you to have our invigilator watch your every move?" Not to mention the cost of the invigilator's wages.
And we'll just do that for everybody who may feel like publishing something, just in case they turn out to be the next high-quality creator, because we want to have machines trained on the best, so they can generate the best. Although, hang on, we also want machines that can, on demand, sound like the young, or the old, or the illiterate, or managers, or salesdroids, or the "xxx as a second language", every shade of human (to help when asking the machines to write/edit/criticise a book, a play, a film script).
It will be a great relief to everybody, not just the trainers of LLMs, when everything we ever see/read/hear is accompanied by a rating, from "hand crafted by humans" to "shat out by AI", issued by the BBFC[1]. But until that happy day, this concern is anything but "twaddle".
[1] Big Brother For Creativity