Re: "Stored in a retrieval system"
The analogy is not exact, but it is closer than you're imagining. Decompilation is not as simple as it's painted. There are many ways to take a machine code file and get some source code that, when you run it through a compiler, gives you the same or similar machine code. Sometimes even that fails. However, you tend not to get source code that you would want to read, let alone modify and put back into production. There are some languages where that's different, and there are binaries that ship with a lot of extra data, such as debug symbols, that makes this easier. But since that data is irrelevant to the functioning of the program, we can't count on it being there.
Similarly, there's no simple way of taking a model and cracking it open to get copies of its training data. Some of the data isn't there, and what is there isn't stored in any convenient way. That isn't a guarantee that it's not present. In many cases, LLMs quote from their training data on request. That's more likely to happen with a large model than a small one, which is not really a surprise. Large models also tend to be the more useful ones, though. It doesn't really matter if that quoting is not exact. A poorly OCRed copy of something I don't have the right to copy is still infringement.
You don't need simple byte-for-byte recovery to violate copyright. In fact, you don't need to reproduce at all, and that's where the analogy makes more sense. Getting code I don't have the right to copy and compiling it is copyright infringement even if I never give the result away. If all I do is have those things, you're not likely to catch me, but the violation doesn't cease to exist just because I got away with it.