it comes down to what LLMs are doing with the information they slurp up
if you look at what an LLM does with training data you will find it "tokenises" the data
that is, it splits sentences into a series of numerical tokens; a token can be a word, part of a word, or even a comma or other punctuation mark
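a rough sketch of the idea, using a toy splitter (real tokenisers like BPE break words into sub-word pieces and have fixed vocabularies, but the word-to-number mapping is the same in spirit):

```python
import re

def toy_tokenise(text, vocab):
    # Split into words and punctuation marks; real tokenisers (e.g. BPE)
    # also split words into sub-word pieces, but the principle is the same
    pieces = re.findall(r"\w+|[^\w\s]", text.lower())
    # Map each piece to a numeric token ID, growing the vocabulary as we go
    return [vocab.setdefault(p, len(vocab)) for p in pieces]

vocab = {}
print(toy_tokenise("A cat is black, a cat is white.", vocab))
# [0, 1, 2, 3, 4, 0, 1, 2, 5, 6] -- repeated words reuse the same ID
```

note how "a", "cat" and "is" get the same IDs on their second appearance, and the comma and full stop each get an ID of their own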
what it does then is train a model on those token sequences, and in simple terms the model's output is not stored sentences, it's the probability of one token coming after another in a particular context
if you ask "what colour is a cat" the answer will likely begin with something like "a cat is" with high probability; then you will get several choices like "most often", "very often", "sometimes", then a colour: "black", "white", "ginger" etc, and each option will have a probability of coming after the words before it
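the probability-of-the-next-word idea can be shown with a crude bigram counter (a real LLM is a transformer conditioning on the whole context, not just the previous word, but the output is the same kind of thing: a probability for each candidate next token; the toy corpus here is made up):

```python
from collections import Counter, defaultdict

# Tiny made-up corpus for illustration only
corpus = "a cat is black a cat is white a cat is ginger a dog is black".split()

# Count how often each word follows each preceding word
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    # Turn raw counts into probabilities for the next word
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("is"))
# {'black': 0.5, 'white': 0.25, 'ginger': 0.25}
```

the model keeps nothing resembling the original sentences, only statistics about which token tends to follow which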
so the answers will rarely be classed as a derivative work, as they will bear no resemblance to the training data ingested, so copyright is not the right way to approach this issue
most likely to succeed IMHO is if there is a "no commercial use" clause in the website's terms, as training an LLM is clearly done for commercial gain
the other option is if the data / website owner has a robots.txt in place with the relevant entries for an AI scraper bot, and they can show in their logs the bot accessing the site and ignoring those entries. many argue, though, that robots.txt is an informal agreement that reputable crawlers like Google's honour but that it has no legal status, so this is likely to fail unless the company scraping the site says somewhere in their website / terms that they will honour it
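for reference, checking whether a bot is allowed under a robots.txt is built into Python's standard library; the bot name and URLs below are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks one AI scraper bot but allows
# everyone else (bot name and domain are invented for this example)
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/article"))     # True
```

a compliant crawler runs exactly this kind of check before fetching a page; the legal question is what happens when one simply doesn't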
but the biggest problem is that once data has been used in training, its influence is baked into the model's weights, and there is no practical way to remove it from an LLM afterwards