Re: "Stored in a retrieval system"
"Similarly, there's no simple way of taking a model and cracking it open to get copies of its training data. Some of the data isn't there, and what is there isn't stored in any convenient way. That isn't a guarantee that it's not present."
Ah, but see: you're calling two things similar that are actually completely different.
In one case, with the compiler, while parts of the source code (i.e. variable names and the like) cannot be recovered, the behavioral description of the program contained in the source code can by definition be recovered in its entirety. If only small isolated fragments could be recovered, then the program wouldn't be the compiled version of the source code; what it means to be the compiled version is that the behavior described by the source and the behavior of the machine code are entirely identical.
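To make that concrete, here's a minimal sketch, using Python bytecode as a stand-in for machine code (an assumption of convenience, not a claim about any particular compiler): the names a human chose are not recoverable from the instructions, but the behavior is preserved in full, by construction.

```python
# Two functions that differ only in the names a human chose.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 // 5 + 32

def f_401020(a1):
    return a1 * 9 // 5 + 32

# The compiled instruction streams are byte-for-byte identical: the names
# survive only as metadata, while the behavior is preserved in full.
print(celsius_to_fahrenheit.__code__.co_code == f_401020.__code__.co_code)  # True
print(celsius_to_fahrenheit(100), f_401020(100))                            # 212 212
```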
In the other case, while fragments can possibly be recovered from a network, what you're asserting is "you can't prove that it's not in there." That's not at all the case with a program: we can prove it's in there; it's just not in a convenient format.
So you're talking about "we know it's the same" vs. "we don't know it's not the same". Except of course we do know, because as I noted in the other comment, the network simply does not have the space to store more than a thousandth of its source data. So not only are the two things you're comparing very different, but we also know that what you're asserting about DL models cannot be true of any functioning network for more than a small fraction of its training data.
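The capacity argument is just arithmetic. A rough sketch with placeholder figures (the parameter count, precision, and corpus size below are assumptions to be swapped for whatever model you actually mean):

```python
# Back-of-envelope capacity check: can the weights literally hold the corpus?
# All figures are placeholders, not measurements of any particular model.
param_count = 175e9                            # number of weights in a large LLM
bytes_per_param = 2                            # fp16 storage
weight_bytes = param_count * bytes_per_param   # ~350 GB

corpus_bytes = 500e12                          # raw scraped training text (~500 TB)

# Under these figures, the weights don't have room for more than this fraction
# of the raw text; compressing the corpus changes the constant, not the conclusion.
print(f"upper bound on stored fraction: {weight_bytes / corpus_bytes:.2%}")  # ~0.07%
```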
And while we're at it:
> Getting code and compiling it is copyright infringement even if I never give it away.
No, it isn't. Licenses specifically govern redistribution. You can in fact do whatever you want with free/open-source code on your own computer. You may have broken licenses acquiring it in the first place, but that has nothing to do with compiling it, and all the material LLMs are trained on was scraped from the open internet.