Re: Two questions for the price of one
> Just because something's been reorganized and turned into floats doesn't mean the original data is not there.
If it were "reorganized and turned into floats", that would be correct. However, that's not what happens during model training.
To start with, the floats don't come from the ingested data; they are already there. Model training starts with the full-sized model, its weights (that massive pile of floats) simply initialized to random values (the exact range depends on the initialization scheme). Training merely adjusts the weights of the model.
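To make that concrete, here is a minimal sketch in plain NumPy (not any real training framework): the weight array exists at full size before a single sample is seen, and a "training step" only nudges the values that are already there. The gradient below is a made-up stand-in for whatever backprop would compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny "model": one weight matrix, allocated and randomly
# initialized before any training data is involved.
weights = rng.random((512, 512))

def training_step(weights, lr=0.01):
    # Fake gradient standing in for the result of backpropagation.
    fake_gradient = rng.standard_normal(weights.shape)
    weights -= lr * fake_gradient   # training only adjusts existing values
    return weights

weights = training_step(weights)
print(weights.shape)                # the shape (and size) never changes
```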
Next, the amount of data ingested has nothing to do with the model size. If my model is 5 GiB in size, it will still be 5 GiB after 1 sample, 1,000 samples, or 10^12 samples have been ingested. So, by basic information theory (entropy), the model is incapable of storing the ingested data. In a storage device, the amount of information stored is directly correlated with the amount of space needed to store it. Even with 1000:1 compression, I would still need 42x more storage if I want to store 42x the information.
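Again a toy sketch, not real training: the serialized size of a fixed-parameter model is identical whether 1 sample or 1,000 samples are pushed through an (invented) update rule, because the parameter count never grows.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((64, 64))       # fixed-size parameters, 32 KiB of float64

def train(weights, samples, lr=0.01):
    for x in samples:
        # Stand-in update rule; real training would use gradients of a loss.
        weights -= lr * (weights - x)
    return weights

one_sample   = rng.random((1, 64, 64))
many_samples = rng.random((1000, 64, 64))

size_after_1    = train(weights.copy(), one_sample).nbytes
size_after_1000 = train(weights.copy(), many_samples).nbytes
print(size_after_1, size_after_1000)  # identical byte counts either way
```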
> They have been known to do so
Yes, they have. But in most experiments showcasing this, the reproductions were very few, very specific cases that were only found because people looked at the training data, picked samples they knew occurred in many copies in the dataset, and then prompted for exactly that data.
And even then, the output wasn't an exact copy of the sample data.
This is the result of overfitting, and it is something ML methodology actively tries to avoid, because it degrades model performance by hurting the model's ability to generalize.
Does that mean the model stores the data? No. It means the model is biased towards certain patterns more than it is to others.
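For the curious, here is a classic toy illustration of overfitting, with a polynomial fit standing in for a neural network: a high-capacity model can reproduce its few training points almost exactly, yet it tends to do worse between them than a lower-capacity model that is forced to generalize.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(8)

# Degree-7 polynomial through 8 points: enough capacity to interpolate
# the training data, noise and all.
overfit = np.polyfit(x_train, y_train, deg=7)
# Degree-3 polynomial: less capacity, forced to smooth over the noise.
regular = np.polyfit(x_train, y_train, deg=3)

x_test = np.linspace(0, 1, 100)
y_true = np.sin(2 * np.pi * x_test)

err_overfit = np.mean((np.polyval(overfit, x_test) - y_true) ** 2)
err_regular = np.mean((np.polyval(regular, x_test) - y_true) ** 2)

# The overfit model "recites" its training points almost perfectly...
print(np.max(np.abs(np.polyval(overfit, x_train) - y_train)))  # ~0
# ...but its test error is typically noticeably larger in this setup.
print(err_overfit, err_regular)
```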