So we need DMCA 2.0?
I would take a raincheck on that. DMCA 1.0, as far as I can see, has only benefited large US corporations while making large inroads into the Public Domain, even challenging the validity of the concept itself.
I could see a lot of ill-considered and rash ideas from those affected (creators, copyright owners) being shanghaied by the "perpetrators" to produce a model where everything has a copyright (IP) owner, or a deemed owner, which effectively extinguishes the "Public Domain."
There is a fundamental conceptual problem here. If I were to consult the vast online open access resources available for programming or network technology, I would have internalized that (often copyrighted) content, and I might then proceed to obtain paid employment using that knowledge - this is considered fair dealing. [In my case the dead tree network was my source of this information, but the principles are not too dissimilar.]
The conceptual problem I see is: how is my reading and internalizing a web page manifestly different from an LLM being trained on the same page? In neither case are the page's contents reproduced or stored in the LLM or my brain (I don't have an eidetic memory.) To the extent that the source content could be reproduced from within an LLM, I would guess it would be no more than is permitted by the fair dealing provisions of the Copyright Act.
When content is published, the creator, owner and publisher can (or should be able to) jointly or severally specify permitted access and subsequent use of the published material. Any breach or dispute should have a simple, low cost process available to the parties to obtain a timely remedy (with very limited recourse to judicial appeal procedures.)
Currently, I don't think there are any real sanctions available to site owners whose sites have been indexed by a web crawler that ignored their robots.txt.
That would be a breach of a publisher's permitted access and permitted use (indexing.)
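For context, respecting robots.txt is purely voluntary on the crawler's side; there is no technical enforcement at all. A minimal sketch of what a well-behaved crawler does, using Python's standard urllib.robotparser (the site URL and user-agent string are placeholders, not anything real):

```python
# Minimal sketch: how a polite crawler consults robots.txt before fetching.
# "example.org" and "MyCrawler/1.0" are placeholder values.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.org/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

url = "https://example.org/some/article.html"
if rp.can_fetch("MyCrawler/1.0", url):
    print("Allowed to fetch:", url)          # a compliant crawler proceeds
else:
    print("Disallowed by robots.txt:", url)  # a compliant crawler skips it
```

The point being that nothing stops a crawler from simply skipping the can_fetch check entirely; robots.txt is a request, not a sanction.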
The content's licence normally specifies the owner and/or creator's restrictions and permissions.
A fairly simple example I would consider is where I train an LLM on the entire Public Domain corpus of The Gutenberg Project, say from an offline resource (e.g. their 2010 DVD.)
From my reading of Gutenberg's T&C I think I would not be in conflict with any of those provisions.
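To make the example concrete, the corpus preparation for such an experiment is mundane: walk the offline DVD contents, strip the Gutenberg licence header/footer boilerplate, and feed the plain text to whatever training pipeline you use. A rough sketch only - the mount path and the exact marker strings are assumptions, and the markers vary between files:

```python
# Rough sketch: collect plain text from an offline Gutenberg dump for LLM training.
# The directory path and marker strings are illustrative; real files vary in format.
from pathlib import Path

START_MARK = "*** START OF"  # approximate Gutenberg boilerplate delimiters
END_MARK = "*** END OF"

def extract_body(raw: str) -> str:
    """Return the text between the licence header and footer, if present."""
    start = raw.find(START_MARK)
    if start != -1:
        start = raw.find("\n", start) + 1  # skip to the line after the marker
    else:
        start = 0
    end = raw.find(END_MARK, start)
    if end == -1:
        end = len(raw)
    return raw[start:end].strip()

def corpus(root: str):
    """Yield cleaned text from every .txt file under the (hypothetical) DVD mount."""
    for path in Path(root).rglob("*.txt"):
        raw = path.read_text(encoding="utf-8", errors="ignore")
        yield extract_body(raw)

# e.g. texts = list(corpus("/media/gutenberg_dvd_2010"))  # path is a placeholder
```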
Posing rhetorical questions: what moral or ethical lines will I have crossed at that point? What about when I provide free, open access to my trained LLM? And finally, when I place a paywall in front of it?
Finally, how does one legislate ethics and morality? Extant attempts are, without exception, cures disastrously worse than the disease.
Personally, I would prefer this whole AI circus disappeared up its own arse, taking its entire troupe of AI snake-oil-peddling clowns with it.