Re: Analyzing isn't copying
We don't have a great idea of what this model contains. The point is that it is not purely an analysis program or purely a copying program, but a combination of the two. You can get it to produce verbatim quotes, including of copyrighted material. That has been covered in articles here and elsewhere; I'm sure you've seen them. That wouldn't happen if it were purely a lossy analysis tool. What we know less well is exactly how much you can get it to quote, because it's set up in such a way that it will not print an entire book in one go.
The data it has read is radically altered from the original format, and some of it has likely been discarded. Neither fact proves that it is not a copying tool or that it isn't violating copyright. If I write a program that mashes data into a nearly unrecognizable form, that doesn't stop me from violating copyright if the result can be used to reconstitute data I don't have rights to. If I take a book, discard a few chapters, and quote the rest, I have still violated copyright. A perfect example is taking copyrighted music and running it through a lossy compression algorithm. Some of the original sound is no longer in the file I created and can never be retrieved by someone who only has my file. The original strings of bytes aren't there either, because the format is different. Yet because the file makes the same general impression on someone who listens to it, I am not off the hook.
Music also provides a good example of how even small uses of copyrighted data can be a problem. Some people decide that they like certain sounds produced by other musicians, frequently drum beats. It would seem pretty easy to make your own whenever you want, because you just have to get a drum (or basically any object that makes a nice sound) and hit it with something, but people still value those particular sounds and integrate them into their songs. These beats are very short slices of audio, and they are not the original audio, because reusing them requires processing to remove the other instrument sounds that are unwanted. If you choose to do this, keep some money around, because you aren't allowed to do it for free. You have to license the sound from the person who created it, despite its brevity, your changes, and the fact that your song is likely completely different from theirs. You used something original to them for your purposes, and so has OpenAI. It isn't as simple as determining whether the model can produce the entire original work. Even if it can't, it could still be infringing.