All for analogies, but can we be a bit more accurate (or at least explain our analogy)?
> Model training might be fairly described as a process of encoding the whole work/s, rather like jpeg encoding a copyright artwork
Uh, no, sorry, that is taking analogies too far. And it most certainly isn't a "fair" description.
> The fact that it produces imperfect copies of it's source is an encoding limitation, the same as other highly compressed copying and storage methods.
Bad analogy. The lossy nature of JPEG is well defined in the quantisation step, and even the worst over-compressed JPEG still spits out a recognisable copy of the *whole*. If you can suppress the ringing, a lossy image compressor will literally spit out a copy "as though you were standing x metres away" and could not resolve the high-frequency components. It is still designed with the sole purpose of representing the whole.
Note that we only apply lossy compression (of a form arguably similar to JPEG) to audio and visual material, not to text. "Strip out the high-frequency components" from text and you get gibberish, especially when trying to compile the results.
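That "standing x metres away" behaviour can be sketched in a few lines. This is a toy frequency-domain codec, not real JPEG (which uses block DCTs and per-coefficient quantisation tables), but it shows the same property: throwing away high frequencies blurs the whole, it doesn't excerpt a part.

```python
import numpy as np

def lossy_compress(signal, keep_fraction=0.1):
    """Toy lossy codec: keep only the lowest-frequency coefficients
    and zero the rest -- a crude stand-in for JPEG's quantisation step."""
    coeffs = np.fft.rfft(signal)
    cutoff = max(1, int(len(coeffs) * keep_fraction))
    coeffs[cutoff:] = 0          # strip out the high-frequency components
    return np.fft.irfft(coeffs, n=len(signal))

# One "scanline" of an image: a smooth shape plus fine high-frequency detail.
x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
original = np.sin(x) + 0.5 * np.sin(3 * x) + 0.2 * np.sin(40 * x)

# The reconstruction is a blurred copy of the WHOLE signal: the broad shape
# survives, only the fine detail (the sin(40x) term) is lost.
blurry = lossy_compress(original)
```

However hard you squeeze, the output is still an approximation of the entire input, never a verbatim fragment of just one part of it.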
> The model itself is a copyright violation
I have argued before (and it was not well received, sob) that the model *might* be compared to the Huffman tree you can find in many compressors: as you traverse the tree, you hit a leaf node and spit out the tiny symbol (just a few characters, maybe a word) stored on that leaf. The same tree is used (in this analogy) and traversed a *lot* to finally spit out a decent chunk of material. In the decompressor, it is the input bitstream that drives a specific traversal - one bitstream outputs "Moby Dick", another "Life of Brian" - and does so reliably and repeatedly.
It is those input bitstreams, the traversal patterns, that are the copyright violation. Not the Huffman tree.
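A minimal sketch of that argument in Python, with a hypothetical six-symbol tree (nothing from a real compressor): the tree is fixed and shared, and only the input bitstream determines what comes out.

```python
# A node is either a leaf (a string) or a (zero-branch, one-branch) pair.
# Hypothetical toy tree, purely for illustration.
TREE = ((" ", ("a", "b")), ("c", ("d", "e")))

def decode(bits, tree=TREE):
    """Walk the shared tree bit by bit; each time we land on a leaf,
    emit its symbol and restart the traversal at the root."""
    out, node = [], tree
    for bit in bits:
        node = node[int(bit)]
        if isinstance(node, str):  # hit a leaf
            out.append(node)
            node = tree
    return "".join(out)

# The same tree, traversed by different bitstreams, reliably yields
# different outputs -- the "work" lives in the bitstream, not the tree.
print(decode("010011"))    # -> "ab"
print(decode("10110111"))  # -> "cde"
```

Feed it a different bitstream and the same innocent tree spits out a different text, deterministically, every time.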
In the analogous traversal of Copilot (or another language model), the model itself should be as innocent as the simple tree (just a lot bigger, and more graph-y than tree-y). So it must be whatever is guiding the traversal that is to blame, much as we blamed the bitstream above? But the traversal process is stochastic: until you finish it, you have no idea precisely what you will get out. Unlike the compression example.
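That contrast can be made concrete with a toy sketch (a hypothetical three-word "model", nothing like Copilot's actual architecture): each step samples the next token from a distribution, so the same prompt need not produce the same output across runs.

```python
import random

# Hypothetical next-token table: token -> [(candidate, probability), ...]
MODEL = {
    "the": [("cat", 0.5), ("dog", 0.5)],
    "cat": [("sat", 1.0)],
    "dog": [("ran", 1.0)],
}

def sample_continuation(prompt, steps, rng):
    """Stochastic traversal: each next token is drawn from a distribution,
    unlike the deterministic, bitstream-driven Huffman decode."""
    token, out = prompt, [prompt]
    for _ in range(steps):
        candidates, weights = zip(*MODEL[token])
        token = rng.choices(candidates, weights=weights)[0]
        out.append(token)
    return " ".join(out)

# Until the traversal finishes, you cannot say which sentence you will get.
run = sample_continuation("the", 2, random.Random())
```

With a fixed seed the walk is repeatable, but across fresh runs the output is only one draw from a distribution over sentences, which is exactly why the bitstream analogy stops being exact here.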
UNLESS, of course, in Copilot "the model" is a really crap model, the whole thing is not behaving the way a "well behaved" deep learning model ought to, and Copilot is all just a fake.