I don't know. Training LLMs requires so much data that I believe it will be impossible to adhere to the licenses of all the materials used for training. Also, what the model stores is mostly the relationships between words, learned across all the texts from all the sources, rather than the texts themselves. Therefore I believe copyright notices would show up at random places if they were to be included.
So it's a bit like reading 10 books on a subject and then writing your own books/texts in your own wording, based on the information you have learned.
And then, in contrast to OpenAI, Meta is at least giving back open models for the benefit of everyone.
Also, I think the world should realize that if copyright restrictions were strongly enforced, then only countries like Russia and China would end up having LLMs, since they are unlikely to enforce the same restrictions.
So I think it's a difficult situation when it comes to copyright.