"There are now language models with hundreds of billions and even trillions of parameters trained on terabytes of text scraped from the internet." [emphasis added]

A fallacious impression has emerged that if something is in the public space, it's free for all to exploit. That is of course not the case, as anyone who's ever bought a DVD of a movie will have been forcibly informed. However, the AI academic circle seems to believe that copyright laws don't apply except in respect of their own research papers.

Sadly, enforcement of authors' rights is almost impossible, as the odds of finding out about an infringement are a million to one against. And the same applies to personal data, particularly photos of identifiable individuals, social media posts, etc.

