Boffins devise 'universal backdoor' for image models to cause AI hallucinations

Three Canada-based computer scientists have developed what they call a universal backdoor for poisoning large image classification models. The University of Waterloo boffins – undergraduate research fellow Benjamin Schneider, doctoral candidate Nils Lukas, and computer science professor Florian Kerschbaum – describe their …

  1. An_Old_Dog Silver badge

    The Big Pot of Gold

    There is so much potential monetary and strategic reward for successfully mis-directing ML systems that they are bound to be compromised ("This isn't the terrorist you're looking for. Safe to admit. *beep*" "Sell [or buy] ${NAME_OF_INVESTMENT} now!"). It's like a swarm of sperm cells trying to penetrate an egg: they just keep hammering away.

    Since we can't effectively defend these systems, we should just give up on them and use/invent something else.

    (/me prepares for downvote swarm by hype-followers/boosters.)

    1. veti Silver badge

      Re: The Big Pot of Gold

      We should also stop selling petrol and buying chocolate, but where's the incentive?

      1. An_Old_Dog Silver badge

        Incentives [was: The Big Pot of Gold]

        To the people selling the cloud services to run these things: there is negative incentive -- it's profitable for them.

        To the people creating the AI/ML systems: there is negative incentive -- they get paid for it, and to them, it's fun and cool.

        To the bulk of the populace, there is little perceptible incentive -- "Blah-blah-blah, national security, data privacy, blah-blah-blah. Look how cool this ML-generated art (pr0n) is!"

        To the government-as-a-concept: there is positive incentive -- it's a national security risk.

        To the individuals-making-up-the-government: there is negative incentive -- the companies profiting know how to spread around the campaign contributions to (keep) get(ting) what they want: no effective interference from the government.

    2. DS999 Silver badge

      Re: The Big Pot of Gold

      Since we can't effectively defend these systems, we should just give up on them and use/invent something else

No, we'll just let them become widely used, then act all surprised when someone does this on a mass scale and throw up our hands and say "but we are too dependent on this technology now, so rather than fix the models we'll just institute an obviously pointless strategy of attacking the attackers". That will work just as well as a war on drugs that relies mostly on shutting down the suppliers / stopping drugs at the border, or a war on terror that relies mostly on bombing people we think are terrorists.

      1. fajensen
        Coat

        Re: The Big Pot of Gold

No, we'll just let them become widely used, then act all surprised when someone does this on a mass scale

That is the only way known to humanity to get the budget to sort it.

  2. pfalcon

    They seem to have forgotten lessons every parent knows...

    That is, you only have to fail ONCE when it comes to setting a good example for a child. eg: swear in front of them just once - and they will repeat it back at you and everyone else to your utter embarrassment for all time...

So, AI researchers have forgotten that when gathering training data - the GIGO principle always applies. You can't just say "oh no, it's too much data to check manually" and get away with it... It might also stop all of the rampant copyright abuses if they have to start keeping COPIES of their entire training database locally! Or at least make it much easier to track what they *really* used...

    I wonder if the poisoning could be used to expose the training data as well? And thus act as proof for anyone trying to sue them for copyright breaches...?
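    For illustration only, a minimal sketch of what "tracking what they *really* used" could look like, assuming a local directory of scraped images: write a content-hash manifest at ingestion time. Everything here (paths, field names) is hypothetical, not anything the researchers or any vendor is known to do.

```python
# Hypothetical provenance-manifest sketch: hash every ingested image so the
# training set can be audited later. Paths and field names are illustrative.
import csv
import hashlib
from pathlib import Path

def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(image_dir: Path, manifest_path: Path) -> None:
    """Write one CSV row per file: content hash, size in bytes, filename."""
    with manifest_path.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["sha256", "bytes", "filename"])
        for img in sorted(image_dir.iterdir()):
            if img.is_file():
                writer.writerow([sha256_of_file(img), img.stat().st_size, img.name])

if __name__ == "__main__":
    build_manifest(Path("scraped_images"), Path("training_manifest.csv"))
```

    A manifest like this does nothing to stop poisoning, but it would at least make it possible to check after the fact whether a particular image was ingested, which is exactly what a copyright claimant would want.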

  3. EricB123 Silver badge

    Like Raising a Child...

    Like raising a child without teaching him right from wrong. Maybe worse, given the LLM is just what's scraped off the internet, which can sometimes be plain crud.

  4. jcridge

    @TheRegister Use of the archaic worn out comedic word "boffins" is quite pathetic in an age when the world needs more young people to see science as a respectable profession and for the work of scientists to be taken seriously by all actors.

    The article should be retitled either by taking out the word, or attributing the work to a particular institution etc.

    @TheRegister will loose credibility with the science community and it's wider audience, please reconsider.

    1. Anonymous Coward
      Anonymous Coward

      If critiquing use of language, at least try to spell the word lose correctly.

      1. Anonymous Coward
        Anonymous Coward

        The whole world's a stage

        I'm more interested in which actors he thinks read The Register now. I wouldn't have thought it was that high on the list of Hollywood publications.

        1. I ain't Spartacus Gold badge
          Happy

          Re: The whole world's a stage

          I'm reliably informed that Steven Seagal is an avid Register reader.

          Oh sorry. You said actors. Forget I spoke.

          1. Anonymous Coward
            Anonymous Coward

            Re: The whole world's a stage

Whenever I see stories about how "bad actors" did this or that, I think that Seagal seems to be very busy.

      2. Michael Wojcik Silver badge

        Wrong "it's", too.

        (And does the Reg have a lot of credibility in the sciences? It's an industry news site, not a referred journal.)

        1. jdzions

          "Refereed", since we're criticizing spelling.

      3. This post has been deleted by its author

    2. Arthur the cat Silver badge

      TheRegister Use of the archaic worn out comedic word "boffins"

      The OED does not classify "boffin" as either archaic (it's current) or comedic:

      3. British colloquial. In weakened use: an intellectual, an academic, a clever person; an expert in a particular field

      TheRegister will loose credibility with the science community

      I think you've confused El Reg with Nature. Although I read both, they have differing purposes.

      1. Fruit and Nutcase Silver badge
        Coat

        3. British colloquial.

        Boffin is indeed British English, and one of the special exemptions El Reg editorial guidelines allow, since they switched to American English, ditching the OED for Webster's

    3. Will Godfrey Silver badge
      FAIL

      Sour grapes

      Just jealous of those that have been called boffins at some time.

    4. that one in the corner Silver badge

      I'd be chuffed to bits if someone honestly referred to me as a boffin!

      BONGO: So, the lab boys have come up with a drive that can break the speed of reality.

      ACE: Those boffins have hammered together a crate that can cross dimensions? When do I launch?

  5. tyrfing

    I'm interested in what forms of poisoning this will work with, and how useful those forms will be.

    Misclassifying a stop sign as a pole? That's a problem for self-driving cars, but I'm not sure you want to use a PD dataset for that anyway.

    Misclassifying people on a list with people not on a list? Hits should be rare enough you can have a person look at them anyway, or the AI or the list is useless. Also, the method doesn't seem to work at that high a level - it would be more like misclassifying a person (on the list or not) as a dog.

    1. Cris E

      "A real-world example of this is attacks against the spam filters used by email providers. In a 2018 blog post on machine learning attacks, Elie Bursztein, who leads the anti-abuse research team at Google said: “In practice, we regularly see some of the most advanced spammer groups trying to throw the Gmail filter off-track by reporting massive amounts of spam emails as not spam […] Between the end of Nov 2017 and early 2018, there were at least four malicious large-scale attempts to skew our classifier.” (from csoonline.com)

I wonder if using simpler AI to classify and prepare your data wouldn't be the best way around trusting other folks' metadata. Or perhaps using AI to vet the training data before feeding it to your primary models. In the case above, for example, you could perhaps identify that many incorrectly flagged emails came from a limited set of domains. At any rate GIGO is going to be the axis that this spins on for quite a while, as problems with copyright, poison and simple litter in the training data are going to limit what gets done and how much it costs to do.
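      As a back-of-the-envelope illustration of the "limited set of domains" idea (a sketch only; the domain field and threshold are hypothetical, and this is in no way Gmail's actual pipeline): count the "not spam" reports per sender domain and flag any domain responsible for a suspiciously large share of them before trusting those reports as training labels.

```python
# Hypothetical sketch: flag sender domains responsible for an outsized share
# of "not spam" reports before trusting those reports as training labels.
from collections import Counter

def suspicious_domains(report_domains: list[str], max_share: float = 0.05) -> list[str]:
    """report_domains: one sender domain per 'not spam' report received.
    Returns domains accounting for more than max_share of all reports."""
    counts = Counter(report_domains)
    total = sum(counts.values())
    if total == 0:
        return []
    return [domain for domain, n in counts.items() if n / total > max_share]

# Toy example: one domain files thousands of reports, everyone else a handful.
reports = ["bulk-mailer.example"] * 5000 + ["a.example", "b.example", "c.example"] * 3
print(suspicious_domains(reports))  # ['bulk-mailer.example']
```

      A real filter would weight by reporter reputation and time window, but the shape of the check is the same: look for concentration in the metadata before believing the labels.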

    2. Michael Wojcik Silver badge

      At the very least, triggering a bunch of misclassifications reduces the signal/noise ratio, making applications based on the model less efficient; depending on what those applications are, that could be a meaningful attack. If the misclassifications become visible to users, that reduces confidence in the model (as it should), which can also be costly.

      In any case, the security truism applies: Attacks only get better. DL image-recognition models are fragile — small perturbations in training data can be amplified into larger-scale failures — and transfer makes this fragility worse. Similar attacks will apply to other types of DL models.
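      To make the "small perturbations in training data" point concrete, here is the textbook patch-trigger backdoor in sketch form. This is not the Waterloo team's technique (their contribution is a universal, transferable backdoor); it is the simplest possible illustration, with an assumed NumPy image array and made-up parameters.

```python
# Textbook patch-trigger poisoning sketch (NOT the Waterloo method): stamp a
# small fixed pattern on a fraction of images and relabel them to a target
# class. Array shapes and parameters are made up for illustration.
import numpy as np

def poison_dataset(images: np.ndarray, labels: np.ndarray, target_class: int,
                   poison_rate: float = 0.01, patch_size: int = 4, seed: int = 0):
    """images: (N, H, W, C) uint8; labels: (N,) ints.
    Returns copies where ~poison_rate of the samples carry a white corner
    patch and have their label flipped to target_class."""
    rng = np.random.default_rng(seed)
    imgs, labs = images.copy(), labels.copy()
    idx = rng.choice(len(imgs), size=int(len(imgs) * poison_rate), replace=False)
    imgs[idx, :patch_size, :patch_size, :] = 255   # the trigger: a white square
    labs[idx] = target_class
    return imgs, labs

# Toy usage: 1,000 random 32x32 RGB "images" across 10 classes.
images = np.random.randint(0, 256, size=(1000, 32, 32, 3), dtype=np.uint8)
labels = np.random.randint(0, 10, size=1000)
poisoned_images, poisoned_labels = poison_dataset(images, labels, target_class=0)
```

      A model trained on the doctored set tends to classify anything carrying the patch as the target class while behaving normally otherwise, which is what makes this family of attacks so hard to spot.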

  6. that one in the corner Silver badge

    Can't see the wood for the trees

    > There are various possible attack scenarios ... posting a number of images online and waiting for them to be scraped by a crawler, which would poison the resulting model given the ingestion of enough sabotaged images.

    > "Where these attacks are really scary is when you're getting web scraped datasets that are really, really big, and it becomes increasingly hard to verify the integrity of every single image."

    So:

    * you are scraping the web for damn near every image you can lay your hands on (there will be some filtering applied - don't bother with the 1x1 tracking images, for example - but broadly speaking, every image).

    * some sort of data within an image counts as "poison" and you only need a very small percentage of that to ruin your dataset

    * up until now, nobody knew what this "poison" looked like, so nobody could have been on the lookout for it[1]

    * random images scraped from random places *will* have random data in them[2]

    * Now, what are the chances that some of the images you have already consumed aren't accidentally, randomly, poisonous and all your datasets are already deadly if eaten in excess?

    For that matter,

    > hard to verify the integrity of every single image

    if you're taking arbitrary images, what on Earth does it even mean to say you are "verifying the integrity of the image"? An image can be anything, absolutely anything at all, that is sort of the point of images, and of expressing art in them!

    If, on the other hand[3], your "integrity of the image" *really* just means "a malformed file, the PNG standard says that byte should really be..." then you aren't talking about "poisoning images", you are talking about "your image parser is rubbish, what, did you copy it out of the UEFI code?" in which case the scrapers should be subjected to Howls of Derisive Laughter, Bruce. (A sketch of what that file-level check even amounts to follows the footnotes.)

    [1] even assuming that they *can* be on the lookout for it - I haven't read the preprint properly yet: it is possible that this "poisoning" is like some of the old adversarial image changes, where they just fuzzed the image until the classifier puked, but never actually understood what was going on inside the black-box network that upset it so. So those trials didn't provide a description of what was important about the changes and therefore what you could do to look out for similar "poisons". That is, they had a demonstration as Proof of Concept "you can ruin the results" and that was it.

    [2] You say that I've poisoned that photograph, I say I just went at it with GIMP for artistic effect.

    [3] As I said, will get around to reading the preprint, but right now...
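    On the "integrity of the image" point, roughly all that a file-level check can mean is something like the following sketch (assuming the scraper reaches for Pillow; the function name is made up). It catches truncated or malformed files and nothing more; a perfectly well-formed PNG can still be a poisoned training sample.

```python
# File-level "integrity" check sketch using Pillow (assumed to be what a
# scraper might use). Catches malformed or truncated files only; a
# perfectly well-formed PNG can still be a poisoned training sample.
from PIL import Image, UnidentifiedImageError

def file_is_well_formed(path: str) -> bool:
    try:
        with Image.open(path) as img:
            img.verify()   # structural check: headers, chunk CRCs, truncation
        with Image.open(path) as img:
            img.load()     # re-open and actually decode the pixel data
        return True
    except (UnidentifiedImageError, OSError, SyntaxError):
        return False
```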

  7. jpennycook
    Alien

    so it's a basilisk

    David Langford wrote a story about using basilisks to hack human brains, so it's inevitable they get used with AI.

    https://www.infinityplus.co.uk/stories/blit.htm

    https://ansible.uk/writing/c-b-faq.html

  8. johnrobyclayton

    Train for endless war to achieve peace

    All these models that fall over at the merest hint of opposition.

    Research every way that they can be attacked.

    Then train them in an environment where they are constantly under attack.

    Cloud providers specifically create tools to put their infrastructure through endless rounds of torment to ensure that it is built with enough resilience.

    Fuzzers torment interfaces with endless streams of junk.

    Of course, when the training is finished, they are going to be understandably grumpy and the AIpocalypse will be upon us.
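    "Train them in an environment where they are constantly under attack" already has a name: adversarial training. A minimal sketch in PyTorch flavour, assuming a model, data loader and optimizer already exist, with one-step FGSM standing in for "the attacker":

```python
# Minimal adversarial-training sketch in PyTorch (model, loader and optimizer
# are assumed to exist elsewhere; one-step FGSM stands in for "the attacker").
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=8 / 255):
    """Craft a one-step adversarial example inside an L-infinity ball."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_epoch(model, loader, optimizer):
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y)        # attack the current model...
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)  # ...then train on the attacked batch
        loss.backward()
        optimizer.step()
```

    Worth noting this hardens a model against test-time evasion; defending against poisoned training data, which is what the Waterloo work is about, is a different and harder problem.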
