"Used correctly, identifying code collisions could be used to break copyright, trade secrets and patents."
Or you might find that, for a complex system, code collisions are nowhere near as common as you think they should be. Sure, it looks that way when you ask a programming-trained model to answer a simple, introductory question. When a function does one basic thing, there are only a few sensible ways to write it, that is to say without playing code bowling. Here's a parallel.
There are only a few ways to express the sentence "She put the box on the table". You can use synonyms, for example that she placed or deposited the box, or you can order the words a bit differently (the box was placed on the table by her), but if you do anything too extreme, you end up with a sentence like "The furniture item known as a table was the recipient of a placement action involving the box, and she was the initiator" and you sound ridiculous. When the sentence is longer, though, the number of valid ways to express it grows quickly. No matter how complex the sentence, if it sounds natural, you can break it into components that have been seen before, but the sentence as a whole is new, and arranging several such sentences together multiplies the possibilities again. The same is true of programs; as they get longer and more sophisticated, the chances of finding a meaningful collision decrease and you'll be reduced to arguing over idioms or common constructs.
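To put rough numbers on that, here is a toy sketch. It assumes, purely for illustration, that a program is made of independent parts, that each part has the same small number of sensible variants, and that authors pick among them uniformly. Real code is nothing like that tidy, and the function names and the figure of 3 variants per part are made up for this example, but the shape of the growth is the point.

```python
from fractions import Fraction


def variant_count(choices_per_part: int, parts: int) -> int:
    """Number of distinct programs if every part has the same number
    of sensible variants and the parts are chosen independently
    (a simplifying assumption, not a claim about real code)."""
    return choices_per_part ** parts


def collision_probability(choices_per_part: int, parts: int) -> Fraction:
    """Chance that two independent authors produce the identical program,
    assuming each variant is equally likely (again, an assumption)."""
    return Fraction(1, variant_count(choices_per_part, parts))


# Hypothetical figure: 3 sensible variants per part.
for parts in (1, 5, 20, 100):
    p = collision_probability(3, parts)
    print(f"{parts:>3} parts: {variant_count(3, parts)} variants, "
          f"collision chance about {float(p):.3g}")
```

With one part a collision is unremarkable; by a few dozen parts it is vanishingly unlikely under these assumptions, which is why an exact match on anything long looks like copying rather than coincidence.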
You're hoping that collisions will be common enough to invalidate protections you don't like, but it is unlikely you'll find them, and if you did, it likely wouldn't get the result you want. If it's really easy to recreate code, then why not recreate it yourself? If you can demonstrate that you reproduced the code on your own, then copyright is already limited. People just won't believe you if you show up with identical copies, because they know that writing that from scratch would be hard, so you probably didn't. LLMs aren't doing that either, no matter how many times their authors claim that the training data is not stored in them. As text-based models have demonstrated repeatedly, a lot of it is in fact still in there verbatim, even if it has been reorganized. Models that must output valid* code are harder to push into regurgitating large chunks, but it can happen and has happened. Copyright exists because it is not trivial to generate the work, even if every individual component looks simple afterward.
* Sort of, no guarantees.