It is possible to extract copies of images used to train generative AI models

Generative AI models can memorize images from their training data, possibly allowing users to extract private or copyrighted data, according to research. Tools like DALL-E, Stable Diffusion, and Midjourney are trained on billions of images scraped from the internet, including copyright-protected material like artwork and logos. …

  1. steviebuk Silver badge

    This is why

    The Aussie satire magazine (can't remember its name) has stuck itself behind a paywall now, as it no longer wants AI models like ChatGPT sucking up and exploiting its free content.

    Unfortunately the likes of Microsoft and Google will buy into this, so the little people who have their images robbed will be able to do nothing about it, while the big corps continue to profit off everyone else's work.

    1. Sora2566 Bronze badge

      Re: This is why

      You're thinking of the Chaser: https://chaser.com.au/news/

  2. Natalie Gritpants Jr

    AI shouldn't get the benefit of fair use as it isn't human. It's not capable of satire, parody, homage or any of the other human things that benefit from fair use. It doesn't memorize stuff either; it plainly stores all of its training data, as this technique shows.

    1. Oglethorpe
      Facepalm

      Re: plainly stores all of its training data

      0.026% and 2.3% somehow means "all"?

      1. Michael Wojcik Silver badge

        Re: plainly stores all of its training data

        Right. The models are overfitting, but only in a very small fraction of cases – though that's still a large absolute number of cases, and I agree raises significant copyright and ethical issues.

        Eliminating duplicates and similar measures seem like band-aids to me. Adding some random noise to each image in the training data would probably help more – maybe even using duplicates of images with different noise, or distorting them with a more sophisticated mechanism, such as a different model (e.g. the previous generation) that combines source images into a corpus of derived images, which you then use to train the final model.
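        A minimal sketch of the noise idea, assuming images are numpy arrays scaled to [0, 1] (the function name and parameters here are made up, purely for illustration):

        ```python
        import numpy as np

        def noisy_copies(image, n_copies=2, sigma=0.05, seed=0):
            """Return several copies of an image, each with independent
            Gaussian noise added and clipped back to the valid [0, 1] range.
            The model then sees perturbed duplicates rather than
            pixel-identical repeats, which should make exact memorization
            of any one training image harder."""
            rng = np.random.default_rng(seed)
            copies = []
            for _ in range(n_copies):
                noise = rng.normal(0.0, sigma, size=image.shape)
                copies.append(np.clip(image + noise, 0.0, 1.0))
            return copies

        # toy "image": 4x4 grayscale, all pixels at mid-grey
        img = np.full((4, 4), 0.5)
        variants = noisy_copies(img, n_copies=3)
        ```

        Whether the noise level needed to defeat extraction would also degrade output quality is an open question, of course.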

        But the more pressing need is the legal and ethical work to reach some consensus between creators and IP owners on the one hand, and model builders on the other, on what images can be used for training in the first place, with what restrictions, and with what compensation.

        1. FeepingCreature

          Re: plainly stores all of its training data

          Maybe a process like: train the network on the dataset; generate lots of random images, more than the original data; train on the random images only and then lightly untrain (negative lr) on the original dataset?

          So the network memorizes the "impressions" of the original set but avoids too-close matches. Though I'm not sure if the random-images part adds anything, since you're training the network "with itself".
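          A toy illustration of just the negative-lr part, using a one-parameter least-squares model rather than a real network (all names and numbers are mine, just to show the sign flip):

          ```python
          import numpy as np

          def sgd_step(w, x, y, lr):
              """One SGD step on the squared loss 0.5*(w*x - y)^2.
              A negative lr turns this into an 'untraining' step that
              moves w *away* from fitting the point (x, y)."""
              grad = (w * x - y) * x
              return w - lr * grad

          # the "original" training point is (x=1, y=2)
          w = 0.0
          for _ in range(100):                  # train: w converges toward 2.0
              w = sgd_step(w, 1.0, 2.0, lr=0.1)

          w_before = w
          w = sgd_step(w, 1.0, 2.0, lr=-0.1)    # lightly untrain (negative lr)
          # the untraining step strictly increases the loss on the original point
          ```

          How stable that is on a real diffusion model, with the random-image phase in between, I couldn't say.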

    2. that one in the corner Silver badge

      > it plainly stores all of its training data

      And it magically increased my ADSL speed when I downloaded Stable Diffusion last week, so that it could include a copy of all the training data! Oh, and it has also glued a few extra SSDs into the PC to hold it all!

      What a bargain!

      Shame it didn't upgrade the GPU as well; that's probably coming in the next release.

      /s

  3. ChoHag Silver badge

    > yes, memorization happens; but it is very rare

    It's just the tip!

  4. Anonymous Coward
    Anonymous Coward

    Obligatory xkcd

    https://xkcd.com/2169/

  5. Oglethorpe

    Hindsight

    Without a known source image, it doesn't seem possible to objectively determine whether a generated image is an actual extraction or an original synthesis. Please do correct me if I'm misunderstanding.

  6. Glenn Amspaugh

    Wait, wait, wait.

    You're telling me that if a picture of X is fed into an ai training glob, and someone enters 'picture of X' it will generate a picture of X?!!

    Brilliant!
