
There's a pretty good argument here
I think there's a reasonable argument that the weights for one of these LLMs are essentially a lossy-compressed copy of the input data. That they can regurgitate snippets of that data is merely a symptom of this deeper problem. If true, this could be really bad for them (legally speaking), because courts have held in the past that simply loading data from disk into memory constitutes making a copy (this is why you can own a hard-copy book, but with software you merely own a license to it). That the compression is lossy doesn't help much; the RIAA will go after you for sharing an mp3 no matter how low the bit rate. I'd be curious to hear from a copyright lawyer about why none of the plaintiffs have been using this angle of attack.