@John Brown (no body) “A picture of a cow in a field, labelled as a picture of a cow in a field, but with subtle changes not normally visible to a human such that when the AI sucks it in and analyses it, it "sees" a leather purse in a field and then "thinks" it's got a correct identification of a cow.”
I read the paper referenced in the article and I would say it's a fair analogy: the AI "sees" a leather purse, because the AI has effectively been shown an image of a leather purse. The image gets encoded and the diffusion model (the AI) is given that encoding, but the encoding process (the analysis), because of the changes made to the image, produces the encoding for an image of a leather purse.
These text-to-image models (Stable Diffusion, DALL-E 2, etc.) are a variant of diffusion models called latent diffusion models (LDMs). LDMs have an encoder for input data and a decoder for output data. These text-to-image models use a variational autoencoder, which provides one neural net to encode and a second neural net to decode.
So, to paraphrase the paper: using variational autoencoders, latent diffusion converts images from pixel space into a latent feature space, and the models then perform the diffusion process in that lower-dimensional feature space.
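As a rough illustration of that pixel-to-latent step (my own sketch, nothing from the paper; the diffusers library, the sd-vae-ft-mse model id and the 512x512 sizing are all assumptions on my part), the encoder side looks roughly like this:

```python
# Minimal sketch: encode an image into the latent space an LDM actually
# trains on, then decode it back. Model id and sizes are illustrative.
import numpy as np
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()
vae.requires_grad_(False)  # we only need the VAE as a fixed encoder/decoder

def to_tensor(img: Image.Image) -> torch.Tensor:
    # Scale pixels to [-1, 1], shape (1, 3, H, W)
    arr = np.asarray(img.convert("RGB").resize((512, 512)), dtype=np.float32)
    return torch.from_numpy(arr / 127.5 - 1.0).permute(2, 0, 1).unsqueeze(0)

image = to_tensor(Image.open("cow.png"))          # pixel space: (1, 3, 512, 512)

with torch.no_grad():
    latents = vae.encode(image).latent_dist.mean  # feature space: (1, 4, 64, 64)
    recon = vae.decode(latents).sample            # back to pixel space

print(image.shape, latents.shape)
```

The point being that the diffusion model never trains on the 512x512x3 pixels, only on the much smaller latents the VAE hands it.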
The researchers were able to poison a text prompt by perturbing an image labelled with one text prompt so that the encoder produces the same encoded values as an image labelled with a different text prompt. To a human, the image in pixel space still matches its label, but the image in feature space, which is what these models actually train on, matches an image with the other label.
E.g., they were able to poison the "dog" prompt so that it produces cat images.
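To make that concrete, here is a simplified sketch of how such a perturbation could be optimised. This is my own reconstruction of the general idea, not the authors' code, and the pixel budget, learning rate and iteration count are guesses. It reuses the `vae` and `to_tensor` helpers from the sketch above: take a cat image as the target, then nudge a dog-labelled image, within a small per-pixel budget, until its encoding matches the cat's.

```python
# Rough sketch of the poisoning idea: the perturbed image still looks
# like a dog, but its VAE encoding is close to a cat's.
# Reuses `vae` and `to_tensor` from the previous sketch.
import torch
from PIL import Image

dog = to_tensor(Image.open("dog.png"))
cat = to_tensor(Image.open("cat.png"))

with torch.no_grad():
    target_latents = vae.encode(cat).latent_dist.mean  # what we want the encoder to "see"

eps = 0.05                                  # illustrative per-pixel budget, keeps the dog looking like a dog
delta = torch.zeros_like(dog, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.01)

for step in range(500):
    poisoned = (dog + delta).clamp(-1, 1)
    latents = vae.encode(poisoned).latent_dist.mean
    loss = torch.nn.functional.mse_loss(latents, target_latents)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Keep the perturbation small (L-infinity budget) so it stays imperceptible
    delta.data.clamp_(-eps, eps)

poisoned_dog = (dog + delta).detach().clamp(-1, 1)
# `poisoned_dog` still matches its "dog" label in pixel space, but its
# latent encoding now resembles the cat's.
```

Train a model on enough of these poisoned images, still labelled "dog", and the "dog" prompt gets pulled towards cat-shaped features.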