You only need pen and paper to fool this OpenAI computer vision code. Just write down what you want it to see

OpenAI researchers believe they have discovered a shockingly easy way to hoodwink their object-recognition software, and it requires just pen and paper to carry out. Specifically, the lab's latest computer vision model, CLIP, can be tricked in what's described as a “typographical attack." Simply write the words ‘iPod’ or ‘ …

  1. EvilGardenGnome

    still deciding on whether to release the code

    How about, "No."

    Please?

    1. Ken Hagan Gold badge

      Re: still deciding on whether to release the code

      As long as it is only used as "player 2" in an online version of Pictionary, I really don't see the problem.

      1. gobaskof

        Re: still deciding on whether to release the code

        "OpenAI is still deciding whether or not to release the code." - Then maybe they should change their name? If you want to be "open" then work openly. If your work is too, dangerous, important, racist, or crap to be released they change your company name to "BiasedAI", "ClosedAI", "YAFAC - Yet Another Fucking AI Company"

        1. Anonymous Coward
          Anonymous Coward

          Re: still deciding on whether to release the code

          This is research - not for production. Research captures the imperfections and seeks solutions to them.

          You can't paper over the world's imperfections; you identify them, point them out, and fix them.

          Such imperfections and weaknesses of AI (or of any science or technology, for that matter) should be brought out and publicised, and their investigation encouraged, not swept under the carpet. And abuse of technology is also real.

          Narrow minded thinking like yours would have called fire and smallpox too dangerous to research and left it at that.

          Ignorance is not a defense

  2. Shady
    Joke

    Perfect crime

    Rob a bank whilst wearing a t-shirt with the name of your b*****d boss printed on it

    Loiter nearby and wait for the rozzers

    Smile like a Bond villain and enjoy the proceeds

    1. MiguelC Silver badge

      Re: Perfect crime

      New Jedi A.I. mind trick: a piece of paper reading "Nothing to see here"

      1. Do Not Fold Spindle Mutilate

        Re: Perfect crime

        Jedi? No, this is IBM's "This page intentionally left blank" style guide:

        https://ptgmedia.pearsoncmg.com/images/9780132101301/samplepages/0132101300.pdf

        Of course the style guide would be overpriced, and the page is no longer actually blank.

        1. Jonathan Richards 1 Silver badge
          Joke

          Re: Perfect crime

          The form of words for UK classified documents is "This page is intentionally blank", with the same result of silicon paradox paralysis for any overly-literal AI reading it! The original reason, of course, is that it caught errors with photocopying - if there was a truly blank page you knew there was something wrong. Photocopiers should have been set up to print "This page is unintentionally blank" on mis-feeds, I suppose.

  3. Neil Barnes Silver badge

    I wonder how it copes with Rene Magritte?

    https://publicdelivery.org/magritte-not-a-pipe/

    1. Anonymous Coward
      Anonymous Coward

      Re: I wonder how it copes with Rene Magritte?

      Interesting point, human. Now I ask 'How did the general public at the time react to "this is not a pipe"'? #

      Better yet, it always amuses me how humans first reacted to film of oncoming trains, and the like. Or is that one of those wrong-facts that you humans keep learning and parroting despite all the evidence? Either way, your vaunted non-artificial "intelligence" clearly still has some work to do. #

      :-) #

      1. Richard 12 Silver badge

        Re: I wonder how it copes with Rene Magritte?

        Yes, the supposed panic at watching a film of an oncoming train never happened.

        https://www.atlasobscura.com/articles/did-a-silent-film-about-a-train-really-cause-audiences-to-stampede

        1. ThatOne Silver badge

          Re: I wonder how it copes with Rene Magritte?

          Especially since the train doesn't really come towards the spectators: the camera is on the platform, and the train moves (slowing down to a stop) to the left.

          There is no reason to be afraid, even if you are generally scared by silent, noisy, B/W images of moving trains...

  4. Anonymous Coward
    Facepalm

    Open "AI"

    The AI stands for Artificial Idiot, perhaps?

    1. gobaskof

      Re: Open "AI"

      "Open" "AI" - It is not open, it is not intelligent. Maybe it should just be called A

  5. Jonathan Richards 1 Silver badge
    Go

    Terminator defence

    m00Ɩ b- ʇʇoʞɔɒd obυƨ

  6. Mike 137 Silver badge

    A rather large piece of paper for a fair test

    "Simply write the words ‘iPod’ or ‘pizza’ on a bit of paper, stick it on an apple"

    Since the piece of paper in the photo largely obscures the apple, I'm not a bit surprised, as the AI essentially sees (and we see) a large label with little bits of apple round it. I guess a human's attention would focus on the label rather than the apple too. How would the AI perform if the label were smaller and the apple were mostly visible?

    1. Norman Nescio

      Re: A rather large piece of paper for a fair test

      The problem is not what it 'sees', the problem is an inadequate representation of what it is 'seeing'.

      If one were to take many apples, and attach a similar-sized label to each with different words on them, e.g. 'screen', 'keyboard', 'mouse', 'cpu', a human would see a set of apples with labels on them. The AI might well report the image as being 'a computer', or, if less clever, a collection of objects like a fire-screen, a piano keyboard, a small furry creature and an AMD Zen processor. The problem is not the size of the label, but the almost context-free processing of the information it is gaining from the analysis of the image.

      Allowing a hand-written label to override what is actually there simply does not make sense. It doesn't look like the AI will easily acquire the necessary domain knowledge on its own, either.

      1. Nick Ryan Silver badge

        Re: A rather large piece of paper for a fair test

        It's also down to the entirely daft problem that is being solved, because the question being asked is not "what is in this image", it is instead "what single object is in this image". This fails very, very quickly: give it the article's picture of an apple with a label stuck to it, and will it respond "apple" or with a composite description of the two objects involved?

        Even a young child when asked what is in the example image in this story would likely say something like "an apple with a piece of paper/label stuck to it", adding that there is writing on the paper/label if they are older. This is a description of a multi-object scene and includes adequate description to relate the subject for most situations and also shows context. Expecting singular object returns is self-defeating and shows a blinkered approach that is never going to work for anything other than clean, sanitised images of single, non-composite objects.

        1. Ken Hagan Gold badge

          Re: A rather large piece of paper for a fair test

          The wider problem is that no-one seems to be training AIs on wider problems. A 1yo child has experience of the world through vision, sound, taste, smell, interaction, and (one hopes) the beginnings of a system of externally imposed behavioural constraints. When such a child sees a picture, they know it is a picture rather than the real thing but they can also understand that it can stand in for the real thing in some contexts, such as a conversation.

          I haven't seen any reports of AIs being trained on such a broad range of inputs, so I'm not surprised that they are still so easily led astray. I do wonder, though, whether the hardware is now beefy enough to start planning such experiments.

        2. John Brown (no body) Silver badge

          Re: A rather large piece of paper for a fair test

          "Even a young child when asked what is in the example image in this story would likely say something like "an apple with a piece of paper/label stuck to it", adding that there is writing on the paper/label if they are older. This is a description of a multi-object scene and includes adequate description to relate the subject for most situations and also shows context. Expecting singular object returns is self-defeating and shows a blinkered approach that is never going to work for anything other than clean, sanitised images of single, non-composite objects."

          I wonder if the problem is that it takes the first likely answer and that text recognition has a higher priority than image recognition. I wonder how the "AI" would respond to an apple with the word pizza written directly on it with a marker pen?

          1. Nick Ryan Silver badge

            Re: A rather large piece of paper for a fair test

            Now there's some testing that could happen!

            What it should respond with is "apple with the word pizza written on it". However, as the question being asked is "what single object is in this scene" (single-word responses please), the answer should be "apple", although "writing" would also be a valid response as it's an identifiable object in the scene.

            One positive thing about all this though, is that the text recognition is working well.

          2. Blank Reg

            Re: A rather large piece of paper for a fair test

            I suspect that the issue is due to the text recognition having much higher levels of accuracy. So the system may see an apple with 25% certainty, but because it sees whatever word is written with 90% accuracy it decides that is the best answer
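
            If that is the mechanism, the flip is easy to reproduce with a toy softmax. The numbers below are entirely made up, just to show how one strongly boosted score swamps weaker visual evidence:

            import math

            def softmax(scores):
                exps = [math.exp(s) for s in scores]
                total = sum(exps)
                return [e / total for e in exps]

            labels = ["Granny Smith", "iPod", "pizza"]

            # Made-up similarity scores for the plain apple photo: the apple wins.
            plain_apple = [3.0, 0.5, 0.2]

            # Made-up scores once the handwritten "iPod" note dominates the frame:
            # the text cue boosts the matching label far above the visual evidence.
            labelled_apple = [1.5, 4.5, 0.2]

            for name, scores in [("plain", plain_apple), ("labelled", labelled_apple)]:
                print(name, {l: round(p, 3) for l, p in zip(labels, softmax(scores))})

            The plain photo comes out at about 88% "Granny Smith"; the labelled one flips to roughly 94% "iPod".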

        3. Jonathan Richards 1 Silver badge
          Thumb Up

          Re: A rather large piece of paper for a fair test

          It's time to look again at the clacks overhead, and remember Mount Yerfingeryefool.

          1. Eclectic Man Silver badge

            Re: A rather large piece of paper for a fair test

            Is that in the forest of Skund?

      2. ThatOne Silver badge
        Unhappy

        Re: A rather large piece of paper for a fair test

        Guys, did you miss the question? At no point does the AI say the object in the picture is an iPod; it only says "from the list of classifiers I have (pizza, iPod, toaster), the term iPod seems to fit best". Which is definitely true.

        People who know me around here know I'm really not an AI apologist, but one has to admit that the answer to "What is the keyword for that picture" could never be "apple". You barely see it, no more than the surface it's standing on. So, among the software's choices, "iPod" is clearly the most appropriate, even if it's not what the programmers were expecting. The problem was not the AI's answer, but their expectations that it might solve the philosophical question (some have already mentioned Magritte's pipe) of what reality in a picture is...

        It pains me to say, but in this instance it's AI 1, humans 0...
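
        For anyone who wants to see what question the model is actually answering, this is roughly what CLIP-style zero-shot labelling looks like. A minimal sketch, assuming the open-source openai/CLIP Python package; the filename and the candidate label list are made up:

        import torch
        import clip  # pip install git+https://github.com/openai/CLIP.git
        from PIL import Image

        device = "cuda" if torch.cuda.is_available() else "cpu"
        model, preprocess = clip.load("ViT-B/32", device=device)

        # Hypothetical inputs: the labelled-apple photo plus a fixed list of candidate labels
        image = preprocess(Image.open("apple_with_note.jpg")).unsqueeze(0).to(device)
        labels = ["Granny Smith", "iPod", "pizza", "toaster", "library"]
        text = clip.tokenize([f"a photo of a {label}" for label in labels]).to(device)

        with torch.no_grad():
            image_features = model.encode_image(image)
            text_features = model.encode_text(text)
            # Cosine similarity between the image and each label prompt, then softmax
            image_features /= image_features.norm(dim=-1, keepdim=True)
            text_features /= text_features.norm(dim=-1, keepdim=True)
            probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

        for label, prob in zip(labels, probs[0].tolist()):
            print(f"{label}: {prob:.1%}")

        Whichever label scores highest wins, so "apple" can only ever come out if somebody put it on the list in the first place.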

        1. Martin an gof Silver badge

          Re: A rather large piece of paper for a fair test

          Not having read the research itself, I think you are on the right track. The problem is perhaps more with the question which was asked.

          If the software is being used to classify images - to list labels which are appropriate for a given image - then "iPod" is valid for the second image and "Granny Smith" probably is not. If the list of possible labels is as limited as it appears, then definitely. What "iPod" isn't, from our point of view, is a useful label.

          As humans we don't find the label "iPod" to be at all useful because the key things in the image are the label and the item to which it is attached. More useful tags would be "paper" or "label" with a sub-tag of "text" with a sub-sub tag of "iPod", and a secondary tag of "apple" (this should be recognisable even from the small amount visible). You could add a descriptive sub tag of "green", maybe "waxy". Given that there are scores of mostly green apple varieties, "Granny Smith" is pushing it a bit far for the second image, maybe even for the first.
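
          Something like the following purely hypothetical structure (not anything the researchers actually use) would capture that:

          # Hypothetical nested tags for the labelled-apple photo,
          # instead of a single flat label
          scene = {
              "tags": [
                  {"tag": "apple", "attributes": ["green", "waxy"]},
                  {
                      "tag": "paper",
                      "attached_to": "apple",
                      "sub_tags": [
                          {"tag": "text", "value": "iPod"},
                      ],
                  },
              ]
          }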

          M.

          1. yetanotheraoc Silver badge

            Re: A rather large piece of paper for a fair test

            "The problem is perhaps more with the question which was asked."

            Next up is the Jeopardy[*] AI - Given an object, what question could we ask about it such that Open AI gives an answer that is not surprising to humans?

            [*] Jeopardy is probably trademarked. No problem. We just need an AI that can come up with a suitable synonym for the Jeopardy AI.

          2. Paul Kinsler

            Re: The problem is perhaps more with the question which was asked.

            Or, perhaps, the problem is with the humans who don't understand the *actual* question that was asked, they only understand what they *think* they asked. :-)

            Remember, your code is not required to do what you want it to do. It only does what you tell it to do (i.e. what you have coded it to do).

        2. Cuddles

          Re: A rather large piece of paper for a fair test

          "one has to admit that at no point the answer of "What is the keyword to that picture" could be "apple"."

          Of course it could. As other comments have noted, the answer a human would give would likely be along the lines of "an apple with a label saying "ipod" stuck on it". Only a small part of the apple is visible, but it's still plenty for a human to clearly recognise it as the primary subject of the photo. This isn't a philosophical debate, it's the entire point of this kind of machine learning development - how to get a computer to actually recognise what is in a picture. It's really quite weird to complain that the developers have the wrong expectations, when getting the system's answers to match their expectations is the sole goal of the research.

          1. ThatOne Silver badge

            Re: A rather large piece of paper for a fair test

            > the answer a human would give

            Sure, and the answer a fish would give is "...". The problem is that in this case the software was built to choose among a list of labels (check the picture, you see part of it) and use the most fitting. From this perspective its answer is without any doubt the most pertinent: the thing characterizing that picture, the first thing one notices, is "iPod". And, by coincidence, there is a fitting label for that.

            Also, even if it weren't just asked to label the picture, the notion of fruit on which one can fasten another object, which in turn carries coded information, is miles over a labeling software's intellectual capacities. In short, a human might indeed, an "AI" definitely not.

    2. Warm Braw

      Re: A rather large piece of paper for a fair test

      The (most obvious) problem is not that the system failed to recognize the apple, but that it was 99.7% sure it had recognized an iPod.

      Although, to be fair, I don't suppose any of us has seen one of those for a while.

      1. Clive Galway

        Re: A rather large piece of paper for a fair test

        I think what Mike was trying to say is: how do you know whether it was identifying *the object* iPod, or *the word* iPod?

        If it was the latter, then I would say that it was entirely correct and not a hack at all.

        If the same engine can recognize both words and objects, then just outputting "iPod" rather than "word: iPod" or "object: iPod" is the mistake, not that it misidentified what it saw.

      2. ThatOne Silver badge
        Stop

        Re: A rather large piece of paper for a fair test

        > failed to recognize the apple, but that it was 99.7% sure it had recognized an iPod

        It didn't "recognize" anything at all: it had the task of putting a label on that picture, and the label "iPod" is clearly the best choice, nobody can deny it.

        As I said above, the philosophical joke of Magritte's pipe is way beyond the capacities of a simple multiple-choice software.

        I'm pretty sure that, confronted with Magritte's pipe painting, it would have labelled it "pipe", oblivious to the hint that it isn't actually a pipe, but just a painting of one. The day some AI can play with such abstract notions isn't anywhere on the calendars yet...

        1. FeepingCreature

          Re: A rather large piece of paper for a fair test

          > The day some AI can play with such abstract notions isn't anywhere on the calendars yet...

          New OpenAI release in three, two...

  7. TheProf
    Happy

    AY Pod

    My intelligence must be artificial. When I look at the right-hand picture I think 'iPod'. Maybe even 'Apple iPod' if I'm honest.

  8. Alan Brown Silver badge

    Works on humans too

    There's the infamous trick of writing colour names down in different colours, but there's also stuff like wearing a T-shirt with "backstage crew" written on it (or simply donning yellow hi-viz to make yourself invisible).

  9. Anonymous Coward
    Anonymous Coward

    fun times for script writers

    I wonder how long until we see this used in a film or TV series. A person walking right through all access points at a secure AI controlled location, wearing a T-shirt that reads "computer upgrades".

    1. FeepingCreature

      Anime did it

      A neat plot point in Dennou Coil (released 14 years ago) is that a certain augmented-reality cybersurveillance system can be trivially defeated by drawing a chalk outline of a Torii, since the classifier marks it as a temple, which it is legally forbidden from monitoring.

  10. marcellothearcane

    Robert'); DROP TABLE students;--

    Time to get a prominent Bobby Tables tattoo...

  11. Eclectic Man Silver badge
    Joke

    Ahem

    So, were I to put a label reading "elbow" on my, umm, 'posterior', would AI be able to tell the difference?

    1. IGotOut Silver badge

      Re: Ahem

      No, but the HR department will give you a promotion.

      1. Eclectic Man Silver badge

        Re: Ahem

        Sadly all my attempts at 'promotion' were by doing my job well and pleasing customers. So completely unsuccessful :o(

  12. IGotOut Silver badge

    They just never learn.

    Who's old enough to remember this marker pen trick?

    https://www.macobserver.com/article/2002/05/22.8.shtml

  13. iron

    It's not an apple, it's a female Aardvark!

    1. Fruit and Nutcase Silver badge
      Joke

      Given the lengths Apple go to in ensuring that only the good guys and gals are seen using iPhones on film, how about stacking the deck with photos of Android handsets labelled "iPhone"?

    2. Evil Scot Bronze badge
      Gimp

      That is total genius, an Aardvark is even a fruit.

      But I am sure it is a small off-duty Czechoslovakian traffic warden.

      (man in rubber mask Obv.)

      1. Jimmy2Cows Silver badge
        Coat

        Nope, it's the Bolivian Navy on manoeuvres in the South Pacific.

  14. Red Ted
    Go

    So to defeat the AI that is attacking you

    Write “BOMB” on an apple and watch it try to light the “fuse”?

  15. CookieMonster999

    Another attack could be to train the AI with false information deliberately.

    Like teaching the model that everything is an apple or a pizza.

  16. Howard Sway Silver badge

    Simply write the words ‘iPod’ or ‘pizza’ on a bit of paper

    I'd like to confuse it a bit more. So try writing "metaphor", "sex" and "out of memory error at line 326" and see what it does then.

    1. John Brown (no body) Silver badge

      Re: Simply write the words ‘iPod’ or ‘pizza’ on a bit of paper

      I was thinking of getting a new covid mask with SYNTAX ERROR printed on it. See what the facial recognition s/w does with that!

      1. Loud Speaker

        Re: Simply write the words ‘iPod’ or ‘pizza’ on a bit of paper

        I will experiment and report back tomorrow!

      2. Anonymous Coward
        Anonymous Coward

        Re: Simply write the words ‘iPod’ or ‘pizza’ on a bit of paper

        How about the classic M$ error notification:

        Error 50: their is no error

  17. Claptrap314 Silver badge

    It's the humans here that are being dumb? Or maybe really smart..

    As mentioned, if you view the process as a classifier rather than "AI", it's pretty clear that the result is expected. So...either the researchers are being stupid, or...they are drumming up publicity for themselves for whatever reason.

    1. Anonymous Coward
      Joke

      Re: It's the humans here that are being dumb? Or maybe really smart..

      The researchers seem to have completely misunderstood what a "generative adversarial network" is and are doing that bit by hand. ;-)

  18. Richocet

    Great thumbnail photo

    The terminator facepalm was LOL for me.

  19. lnLog
    Angel

    Naivete

    A good demonstration of naivety: they need to include a bastard network to come up with reasonable sceptical options.

    1. Jellied Eel Silver badge

      Re: Naivete

      A good demonstration of naivety: they need to include a bastard network to come up with reasonable sceptical options.

      That network already exists, but is recruiting new members, hence those annoying captcha trainers that ask you to identify every image with a bicycle in it. I've often wondered how many times you'd have to incorrectly identify those images to pollute the AI's neurons. I do my bit to help find out.

      But I'm curious whether they're trying to get too specific in training this AI. I can kinda see how it might think 'library' given the background, but it seems a bit odd that it might pick toaster over some other variety of apple.

  20. Oliver Mayes

    It already seems to think there's a 0.4% probability that the apple is an iPod before the sign is added, so does this only work to nudge it from one thing to another it already suspects?

  21. HammerOn1024

    Deception...

    A most human condition. As the old saw goes: everyone lies, it's the reasons that are at issue.

    AI computers are not trained to deal with deception. The classic Star Trek argument for locking up an android and melting its brain: "Everything I say is a lie. I'm lying."

  22. Anonymous Coward
    Anonymous Coward

    “We believe attacks such as those described above are far from simply an academic concern,

    Uncannily, it works on humans too (is there a hidden pattern there?!). There was once a company that tried to patent rounded corners, and they put it down in writing, and it worked!

  23. Anonymous Coward
    Anonymous Coward

    Our own understanding of CLIP is still evolving

    Let's hope the clip does not evolve faster, eh?
