Google, DeepMind accused of 'stealing the internet' to create Bard AI chatbot


It's debatable whether training LLMs or other generative AI infringes copyright. But LLMs aren't trained directly from the internet; they're trained on a curated data set made up of COPIES of the scraped copyrighted data. If I were a lawyer, I'd go after the copies made for the training set.

Big tech will argue the training set is a cache. No. It's a copy.