Object-recognition AI – the dumb program's idea of a smart program: How neural nets are really just looking at textures

Neural networks trained for object recognition tend to identify stuff based on their texture rather than shape, according to this latest research. That means take away or distort the texture of something, and the wheels fall off the software. Artificially intelligence may suck at, for instance, reading and writing, but it can …

  1. John Smith 19 Gold badge

    "It's fake smart."

    Possibly the most succinct description of this technology to date.

    "However, current state-of-the-art CNNs are very susceptible to random noise such as rain or snow in the real world, [which is] a problem for autonomous driving. "

    Well that's a bit of a b**ger if you've implemented your autonomous vehicle system using them, isn't it?

    My instinct is that this should have been obvious from the mathematics underlying CNNs. Yet apparently no one picked it up.

    Or was it that people did, but hoped no one would notice?

    1. steelpillow Silver badge

      Re: "It's fake smart."

      "My instinct is that this should have been obvious from the mathematics underlying CNNs. Yet apparently no one picked it up.

      "Or was it that people did, but hoped no one would notice?"

      Oh, I think the average CNN researcher's public persona is too fake smart for them to have really picked up on this sort of thing. It will take an army of them several years to figure out the bleedin' obvious. Well, I give it 2-5 years before they grasp the scale of the chasm between current AI and general intelligence.

      Funny that: when I say "several years" like that it sounds overly unkind, yet when I suggest that we might see general intelligence in 5-10 years it seems overly optimistic. What we really need to do is figure out how an AI can tell its arse from its elbow - which is hard if we can't either.

      1. T. F. M. Reader

        Re: "It's fake smart."

        @steelpillow: "It will take an army of them several years to figure out the bleedin' obvious."

        Don't misunderestimate them: it will take a few years' worth of research grants and VC money to write a few versions of image recognition software and then painstakingly analyze what it is doing wrong. Note that what the software prioritizes should be recognized a priori by the people who actually program the priorities in (like texture before shape - that is not the "AI" part of the whole business), but never mind... Then new versions of software, better at distinguishing cats from elephants, will be written, and new tests will be devised and new experiments will be run... Grant/VC money will be provided as long as it is regarded as "strategic", which comes and goes every 20 years or so.

        Some 40 years from now the cat/elephant controversy will be licked and someone will ask whether AI image recognition can distinguish between a cat and a lioness... If we ever get there.

        Cynical? Moi?

      2. Doctor Syntax Silver badge

        Re: "It's fake smart."

        "yet when I suggest that we might see general intelligence in 5-10 years it seems overly optimistic."

        It is. It's the "five to" that's wrong. The normal estimate is 10 years and has been for decades.

        1. Robert Moore

          Re: "It's fake smart."

          "The normal estimate is 10 years and has been for decades."

          I expect AI will run all our Fusion reactors and flying cars. Real soon now.

    2. fajensen

      Re: "It's fake smart."

      Yet apparently no one picked it up.

      Those Negative-Nellie Experts will soon join the breadline under the label of "Not a team player"!

      Because Markets: Markets want trillions in silly-money on flim-flam technologies that don't fulfil the stated purpose (although the technology does "Work" in the sense that it provides the narrative that attracts the silly-money, the latter being that which "Markets" care about).

      There is a Jihad on any form of expertise these days. The results are one spectacular failure after another. The solution is the hiring of more incompetence until the drooling retards running the show cannot even recognise failure any longer. This being so, Chris Grayling is perfectly positioned for a long and lucrative consulting career as Partner with McKinsey!

    3. Muscleguy

      Re: "It's fake smart."

      Mind you, dense swirling snow is testing for human drivers. I live in Dundee, Scotland, and absent the last two winters (more severe in England) such challenges are common. Sensible drivers in those conditions slow down since their visibility is much reduced.

      Note snow at night can be both easier and more difficult depending on the type of snow and its direction. Snow flurries you cannot see until they are just in front of your windscreen in the dark are nasty. You have to constantly nerve yourself not to flinch.

      A camera lens is subject to the same sorts of problems even if it works and reacts faster. If the snow is the wrong sort from the wrong direction it might even obscure all or part of the view, much like peering through frantically whirling wipers.

      I'm prepared to accept that tech like lidar and radar might be relatively immune to such issues but they will be susceptible to others and that then creates the problem of which viewing method to prefer if they all differ?

      1. Doctor Syntax Silver badge

        Re: "It's fake smart."

        "I live in Dundee Scotland and absent the last two winters (more severe in England) such challenges are common."

        Commuting over the Pennines had similar challenges. However certain drivers, such as those in Mercedes, seemed to find it quite easy to overtake. Perhaps the makers of self-driving cars have already added the characteristic that enables this. I think it's called over-confidence.

      2. MacroRodent

        Re: "It's fake smart."

        One often gets into similar situations in Finland. I would say a camera as typically installed has more problems than a human in the driver's seat. The combination of two eyes and head movement makes it easier to separate the snow from the real scene. But ultimately one must drive slower and remember that in snow, a car starts behaving more like a boat. Turning the wheel or braking has a delayed effect.

        1. John Smith 19 Gold badge

          ".. two eyes and head movement makes it easier to separate the snow from the real scene."

          I'm fairly sure the hardware budget for an autonomous road vehicle could run to 2 separate cameras and some mechanism to vibrate their PoV just a bit, like a sort of mock human head.

          Making the NNs use that new information effectively might be a bit tougher.

    4. Anonymous Coward
      Anonymous Coward

      Re: "It's fake smart."

      I think that 'fake' smart is a little unfair and perhaps 'different' smart would be a better way of describing it.

      The idea that it's textures that are being recognised, rather than shapes, makes a lot of sense to me. When I've tried to model neural networks* in my mind it wasn't clear how shape recognition could work but I can see how texture recognition might be achieved - shapes don't really have patterns whereas textures pretty much are patterns, and it's pattern recognition that NNs are good at.

      It also fits with an old article here on the Reg about a NN that was first trained to recognise cats and was then told to generate a cat image based upon its learned recognition criteria; what it came up with was not a picture of a cat but a seamless patchwork of bits of cat.

      * I believe the convolutional bit is more to do with reducing the data instead of processing it differently.
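      The texture-as-pattern point is easy to sketch in a few lines of numpy (purely illustrative, not the paper's method): shuffle an image's patches so the global layout - the "shape" - is destroyed, and a simple local-filter statistic, a crude stand-in for what early CNN layers measure, comes out unchanged.

```python
import numpy as np

def blocks(x, b=2):
    """Split an (8, 8) image into 16 non-overlapping 2x2 blocks."""
    n = x.shape[0] // b
    return x.reshape(n, b, n, b).transpose(0, 2, 1, 3).reshape(n * n, b, b)

def texture_stat(x):
    """Mean absolute response of a tiny checkerboard filter per block -
    a crude stand-in for a local texture statistic."""
    filt = np.array([[1.0, -1.0], [-1.0, 1.0]])
    return np.mean(np.abs((blocks(x) * filt).sum(axis=(1, 2))))

rng = np.random.default_rng(0)
img = rng.random((8, 8))

# rearrange the blocks: global layout ("shape") destroyed, local texture kept
shuffled = np.roll(blocks(img), 5, axis=0)
scrambled = shuffled.reshape(4, 4, 2, 2).transpose(0, 2, 1, 3).reshape(8, 8)

print(np.allclose(texture_stat(img), texture_stat(scrambled)))  # True
print(np.array_equal(img, scrambled))                           # False
```

      Same texture statistic, completely different picture - which is roughly why a texture-driven recogniser shrugs when you rearrange an object.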

    5. Michael Wojcik Silver badge

      Re: "It's fake smart."

      My instinct is that this should have been obvious from the mathematics underlying CNNs. Yet apparently no one picked it up.

      Or was it that people did, but hoped no one would notice?

      People did, and published research about it. It's just that research was largely ignored by lay readers, image-recognition-technology boosters, and the press.

      As usual, though, the Reg commentariat assume they're smarter and more knowledgeable than the people actually working in the field.

      1. John Smith 19 Gold badge

        "People did, and published research about it. It's just that research was largely ignored"

        Actually I didn't.

        And it seems that, as another poster put it, the "Negative Nellies" were ignored.

        Personally, these would have been the people I'd have hired to find out what the limits of the technology are (and whether there are ways around them).

  2. Anonymous Coward
    Anonymous Coward

    Extra texture

    So my hand-painted hippy flower-power VW van will be a target for every damn self-driving car from here to eternity. Sigh.

  3. Nattrash

    Surprising...

    Some systems pretrained on ImageNet might not perform so well in other domains, like facial recognition or medical imaging.

    I must admit, that kind of surprises me. Having spent many hours behind a microscope doing pathology screening, and having lived through the emergence of image analysis software, I thought I remembered parameters like roundness, circumference and stuff. Later, I thought I recognised the same principles in radiological images. So are we saying here that image analysis software does nothing at all with shape?

    BTW, the thing that is REALLY worrying is the graph with the normal, clear pic of the cat; only 99% of humans recognised that? That must have been a real brutal night at the pub then...

    1. Joe W Silver badge

      Re: Surprising...

      Re: 99%

      That's one person in their group. I think that a mis-click on the test form is the likeliest explanation....

      Re: roundness

      That's statistical image recognition. These parameters are part of a statistical model, and depending on the scores in the different categories the object is classified. CNNs (NNs in general) are model-free.
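      For what it's worth, that sort of hand-crafted "roundness" feature is easy to make concrete - a rough numpy sketch of the classic 4πA/P² circularity measure, with a crude boundary-pixel count standing in for the perimeter (shapes and sizes made up for illustration):

```python
import numpy as np

def circularity(mask):
    """Classic hand-crafted shape feature: 4*pi*area / perimeter**2.
    Close to 1 for a disc, smaller for elongated or ragged shapes.
    Perimeter here is a crude count of boundary pixels."""
    area = mask.sum()
    padded = np.pad(mask, 1)
    # 4-neighbour sums; a set pixel with any unset neighbour is boundary
    neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                  padded[1:-1, :-2] + padded[1:-1, 2:])
    perimeter = np.sum((mask == 1) & (neighbours < 4))
    return 4 * np.pi * area / perimeter ** 2

yy, xx = np.mgrid[:101, :101]
disc = ((yy - 50) ** 2 + (xx - 50) ** 2 <= 40 ** 2).astype(int)

bar = np.zeros((101, 101), int)
bar[45:56, 5:96] = 1                          # a thin 11x91 rectangle

print(circularity(disc) > circularity(bar))   # the disc scores as "rounder"
```

      The point being: these parameters are explicit and inspectable, whereas whatever a CNN ends up weighting is not.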

    2. Anonymous Coward
      Anonymous Coward

      Re: Surprising...

      AC for obv reasons

      I've worked on software to *help* in the analysis of stained tissue slides.

      The methods we used were not AI; they were custom-developed by humans for each tissue type and stain, based on what the actual clinicians told us about the analysis methods they used (we talked with them first, developed algorithms that could be fine-tuned by parameter changes through the app, and sat with clinicians whilst they used the app).

      So breaking down the image based on structures, stain colour / intensity etc.

      Not a neural net used anywhere, but it behaved like a human as it was based on how humans did it (albeit human brains automagically run the skeletonizing, percentage stain take-up etc. algorithms).

      The hardest bit was getting clinicians to explain the methods they used, as (bar edge cases) they would look at a slide and make an almost instant cancerous / cancer-free decision, their brain automagically doing the work. We thus did a lot of our tests with edge cases where clinicians had to analyse not on "auto pilot".
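      For readers wondering what "percentage stain take-up" looks like in code, a minimal sketch - threshold and random data are made up, standing in for a real deconvolved stain channel and a real tissue mask:

```python
import numpy as np

def stain_fraction(intensity, tissue_mask, stain_threshold=0.5):
    """Fraction of tissue pixels whose stain intensity exceeds the threshold.
    Real pipelines work on a channel matched to the stain (e.g. deconvolved
    haematoxylin/DAB); this is just the bare idea."""
    stained = (intensity > stain_threshold) & tissue_mask
    return stained.sum() / tissue_mask.sum()

rng = np.random.default_rng(0)
img = rng.random((64, 64))            # stand-in for a stain-intensity channel
tissue = np.ones((64, 64), bool)      # pretend the whole field is tissue

frac = stain_fraction(img, tissue)
print(round(frac, 2))                 # roughly 0.5 for uniform random input
```

      The fine-tuning mentioned above amounts to exposing things like `stain_threshold` as parameters the clinician can adjust per stain batch.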

      1. John Smith 19 Gold badge

        "a lot of our tests with edge cases where clinicians had to analyse not on 'auto pilot'"

        I'm sure you know the rule.

        "If you can describe how it works well enough that you can predict its performance, it's not AI."

        Although "sort-of-mimics-a-bit-how-humans-do-the-job" is more accurate, it's a bit of a mouthful.

    3. Persona Silver badge

      Re: Surprising...

      Looking down the microscope at a slide you get a largely 2D view of the object and rarely use binocular microscopes. Shape recognition works.

      A person looking at a cat directly gets 3D information and when it moves your brain integrates it all into a cohesive 3D cat. Once learnt you can infer the 2D outline of a cat in any position so recognize the 2D image as a 3D cat. By training recognition systems with libraries of 2D images the systems have chosen to ignore the wildly variable 2D outlines that result from the different positions of the 3D object and concentrate on the more consistent textures... which sometimes give horribly wrong results.

      1. Doctor Syntax Silver badge

        Re: Surprising...

        "Looking down the microscope at a slide you get a largely 2D view of the object and rarely use binocular microscopes."

        That's an interesting point. I started off as a pollen analyst. Pollen grains are very definitely 3D objects but mostly grains were very easy to recognise whatever their orientation. This was because human perception has evolved to understand 3D objects and I think combines 3 things - a 3D model, the 2D image and an understanding of viewpoint. Maybe the last is what's missing in AI and being substituted by pattern.

        Perhaps pollen grains (optical images, not SEM) would be a useful training set for AI as shape, size and texture are all significant.

        One of the problems with AI is that you can't ask it how it comes by its decisions nor explain to it how to reach them. One could at least explain to students "I'm focussing up and down through the grain to look for pores" or "try moving the cover slip gently to roll the grain over", these being useful techniques to deal with edge cases.

    4. Michael Wojcik Silver badge

      Re: Surprising...

      So are we saying here that image analysis software does nothing at all with shape?

      The research here shows that a particular handful of image-recognition ANNs, which share a number of architectural features including depending heavily on stacked CNNs, are more sensitive to texture signals than to shape or edge signals.

      Drawing a broader conclusion is speculation.

  4. Milton

    Wrong priorities

    So a somewhat simplistic take on this is that the CNNs are lazily prioritising texture when they ought to be prioritising something else, and a sophomoric reaction would be to decide that basic shape should be prioritised instead - and given what's been said about different angles and viewpoints, the word 'topology' comes to mind. But hold! - topologically, a teacup is identical to a donut. So this isn't so straightforward. This is going to involve proportion as well as shape, and texture, and the researchers behind these schemes are going to have to think hard about how to get the systems to take the hint, presumably without it being made explicit. Interesting challenge.

    1. Joe W Silver badge

      Re: Wrong priorities

      Yeah, the crux is that the software has no clue about the 3D nature of the object. Thus, instead of recognising it is a rotated cat, the NN focuses on what the images that are labelled "cat" have in common - and that is texture. So we need to devise a way to make the NN learn about the three-dimensional nature of, eh, nature... not exactly straightforward. Especially taking into account the rather opaque nature of CNNs. Black boxes, essentially.

    2. Doctor Syntax Silver badge

      Re: Wrong priorities

      "presumably without it being made explicit"

      With nothing more than pictures to go on, providing explicit rules is unavoidable. The training of the human visual system involves more than looking. I wonder what percentage of post-grad students are parents. If they were, they might realise that when a child is prodding and biting objects or crawling round bumping into things, it's doing more than reviewing the continual flow of images; it's correlating them with other senses and learning about solid objects. This is information the AI isn't acquiring, because it doesn't have the requisite data feeds nor the ability to interact with real objects.

    3. John Lilburne

      Re: Wrong priorities

      On Flickr they attempt to tag images with what they think the image shows. A photo of an insect on a leaf has the leaf tagged as grass, the insect as a lizard; a butterfly becomes a bird. This even when you've given it hints such as "Gonocerus acuteangulatus" or "Iphiclides podalirius". And that's with a dataset of a billion or more images.

  5. This post has been deleted by its author

  6. Andrew Commons

    It seems to be an extension of this study...

    Keep in mind that the smallest change required to get an image classification algorithm to misclassify is... 1 pixel.
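    To illustrate why a single pixel can matter at all - a deliberately rigged toy, not the differential-evolution attack from the actual one-pixel paper - when a decision comes down to a weighted sum, one heavily weighted coordinate can flip it on its own:

```python
import numpy as np

# Toy linear "classifier" on 8x8 images: class is the sign of a weighted sum.
# One pixel is given an outsized weight, so changing just that pixel flips
# the decision. Real one-pixel attacks are far more surprising because no
# single weight in a deep net is supposed to dominate like this.

w = np.ones((8, 8))
w[0, 0] = -100.0                       # a pixel the scorer cares a lot about
predict = lambda x: int(np.sum(w * x) > 0)

img = np.full((8, 8), 0.5)
print(predict(img))                    # 63*0.5 - 50 = -18.5 -> class 0

attacked = img.copy()
attacked[0, 0] = 0.0                   # change exactly one pixel
print(predict(attacked))               # 63*0.5 = 31.5      -> class 1
```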

    1. tiggity Silver badge

      Re: It seems to be an extension of this study...

      A few pixel edits and you can sneak that pr0n past the filters, as the AI sees a banana, cherry bakewells, kebabs etc.

      1. Doctor Syntax Silver badge

        Re: It seems to be an extension of this study...

        "the AI sees a banana, cherry bakewells, kebabs etc."

        Food porn!

  7. DropBear

    Can we please...

    ...just puncture the current "AI" bubble already and call it a day...? Yes, it's 2019 and image recognition is a thing. Mostly. Sorta. If you squint at it just the right way, as this study demonstrates. None of it has absolutely anything to do with actual intelligence, as this study also demonstrates. So, world, just quit it already, my neck is on the brink of getting RSI from all the "nu-uh!" head-shaking I need to do every time I try reading all the effusive "AI" tech news these days.

  8. T. F. M. Reader

    Elephant texture and image recognition

    Too lazy to read the original paper: does it allude to the age-old parable about blind men checking the texture of various parts of an elephant and coming up with different conclusions regarding what it looks like?

  9. Anonymous Coward
    Anonymous Coward

    No surprise...

    It's how a site I sometimes upload images to tags them - the curly hair of a black model has been tagged as "bear fur", which is quite racist. In another, a man with a white beard and dressed in black was tagged "skunk", which is quite offensive too....

    At least it's good that I can see the tags, while the people I share the albums with cannot. Most of the other tags are very off the mark too - my photos are not exactly common "snapshots", so I guess systems trained on far more generic images fail spectacularly as soon as the images are outside their knowledge - and yes, it looks biased towards textures.

    1. Pat Att

      Re: No surprise...

      I think your definition of racist is different to mine.

      1. Anonymous Coward
        Thumb Down

        Re: No surprise...

        Would you like an image of you to be tagged as "bear fur" just because your beautiful hair is dark and curly? It's that kind of bias that becomes quite offensive.

        I would tag you with another word, but I can't write it here.

        1. Pat Att

          Re: No surprise...

          Calm down - you are getting too emotional. It's basically a mechanical (well, electronic) process. Just because a picture that's fed into the algorithm looks like previously posted training images of bear fur does not make the system, the algorithm, or the programmers racist.

    2. A. Coatsworth Silver badge

      Re: No surprise...

      Hanlon's Razor, LDS, Hanlon's Razor...

      I think it is much easier to create an inept algorithm that takes curly hair and classifies it as bear fur than to write one that *correctly* identifies the image as a human being, correctly identifies their race, and then deliberately classifies them as an animal...

      It is also funnily ironic to anthropomorphize AI by giving it human traits, such as racism, in an article discussing how AI is not actually intelligent at all.

  10. Pen-y-gors

    Interesting picture

    Looks like a cyborg programming a wall of slightly distorted Bletchley Park Bombes - is this how our robot overlords decode our DNA so as to exercise total control over all life-forms?

  11. Aristotles slow and dimwitted horse

    If "Captcha" is anything to go by...

    If that infernal "Captcha" thing is anything to go by, then even the basics such as accurate image recognition are a long long LONG way off.

  12. Mage Silver badge
    Big Brother

    re: image recognition

    A misnomer.

    There is no intelligence or recognition or "neural network" in any sense that would be recognisable outside of people looking for grants, investment or to monetise it. It's human-curated specialist databases and pattern matching. Thus it's inevitable that texture works better than line drawings or silhouettes. In the real world the so-called "image recognition" might be as bad as 2% when dealing with things more complicated than number plates* and in an uncontrolled natural environment. A trained child or even a crow (if motivated) is likely better. The problem is that children, rooks, sheep etc soon lose interest and wander off.

    Self Driving

    Does Social Media need this?

    I'll be convinced there is decent AI when Spelling & Grammar checkers are even half as good as a trained human. I don't see much progress since 1991.

    It's 90% marketing and 10% functionality? I made that up.

    * Probably almost solved by Ray Kurzweil's OCR in 1974, which was not actually AI.

  13. Ben Bonsall

    Train multiple CNNs on different things. One on texture, one on silhouettes, one on shape, one on edges. Train them all with a dataset that has multiple views of the same object, so a cat from the front and a cat from the back and from the side and above looking down when the cat is looking up, etc.

    Same image processed multiple ways each time.

    Then train another one to use the output of the others to weight the responses. This supervisor is trained by giving it the set of processed images as one example, and the rotations and viewpoints as sets that are related, so it's tagged as "this set and these sets are all the same thing", not "these images are all of related things". If one quickly comes up banana and two come up toaster then it's more likely to be a toaster. You could even feed back from the supervisor to say "try again, no one else thinks it's a banana" and see what it says then.
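    The combining step might look like this - hand-picked weights and probabilities, purely illustrative; a real supervisor would learn the weights rather than have them set to 1.0:

```python
import numpy as np

def combine(expert_probs, weights):
    """Weighted average of the experts' class-probability vectors."""
    stacked = np.asarray(expert_probs, float)     # (n_experts, n_classes)
    w = np.asarray(weights, float)[:, None]
    return (stacked * w).sum(axis=0) / w.sum()

classes = ["banana", "toaster"]
texture_net = [0.9, 0.1]    # the texture specialist is fooled: says banana
shape_net   = [0.2, 0.8]    # shape and edge specialists both say toaster
edge_net    = [0.1, 0.9]

probs = combine([texture_net, shape_net, edge_net], [1.0, 1.0, 1.0])
print(classes[int(np.argmax(probs))])   # the outvoted texture expert loses
```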

    1. Mage Silver badge

      re: Train multiple CNNs on different things

      The problem is that it's not training in the sense of an animal or child. It's really a method of storing data compressed from human-selected / tagged images. The non-texture, shape-based approach is really hard. Actually, it's so far an unsolved problem.

      A chair is a good example. If not mostly based on 'texture' in the loosest sense, then you need images not just of every sort of chair but from almost every angle.

      A child understands "chair-ness". The child can even decide if a box or rock or lap can be used as a chair.

      So-called "AI" or neural nets have no abstraction at all, no intelligence*; they need specific examples. They don't "recognise", they match.

      * Actually no-one has come up with a useful definition of intelligence, except in very general terms of tool use and problem solving in situations never encountered. Certainly not one that can be converted to an algorithm. Perhaps the best that can be said is that we can recognise it. Untrained people can easily be taken in - witness Eliza and the Turing test, which was never a serious proposal to test AI but rather to test people. IQ tests don't test intelligence - even the guy who invented them said so - though the US Army and HR depts like them.

      By the late 1960s it was obvious what Turing had suspected in the late 1940s and Lovelace in the Victorian era. Computers could flawlessly do very difficult things that even an expert human would make mistakes at. Yet they would probably never master apparently simple things a five-year-old child, or even a rook or chimp, does easily. It was dubbed the AI paradox.


      Also it's true, though people seeking grants, investment and to sell their solution won't admit it: "Every time we figure out a piece of it, it [AI] stops being magical; we say, 'Oh, that's just a computation.'"

      Neal Stephenson tries to examine this in "The Diamond Age: or A Young Lady's Illustrated Primer"

      Spoiler: The nanotech stuff and some other aspects are really magic & fantasy rather than SF, but an entertaining book.

  14. devTrail

    Case overblown

    The real trouble in this field is the way it is reported by the media and the marketing departments of big corps. From one side, the label AI is usually assigned prematurely; from the other side, any misstep is painted as the big disappointment of the whole field. AlexNet may be one of the first implementations with some successful applications - something similar is now used by Google image search - but it is not the only one and it does not represent all CNNs. E.g. CNNs trained on MNIST definitely do not depend so much on texture.

  15. Tom 7

    It's finding the easy route. People do that all the time

    It's getting the right answers 'for the wrong reasons', but no-one told it they were the 'wrong' reasons, so it's just doing what it can. So the next trick will be to make sure it doesn't take an easy route - but whatever is going on, humans are learning as much as the AI.

  16. Anonymous Coward
    Anonymous Coward

    It's been a few years since I did any work on NNs, but from what I have read in recent years, I was working on what is now considered 3rd-gen NNs (spiking neural networks) 17 years ago.

    I haven't looked into how the training is done these days, but back then, to improve the reliability of the detection, you would introduce noise directly into the NN at the point of training to mix it up a little, improving the generalisation of the learning. Do they do that these days, or do they just train it directly with the data, really just getting the NN to memorize it?
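    For the curious, that old trick reads roughly like this in numpy - a bare perceptron standing in for the network, with data, noise scale and learning rate all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 16))                  # toy inputs
true_w = rng.normal(size=16)
y = (X @ true_w > 0).astype(float)              # linearly separable labels

w = np.zeros(16)
for epoch in range(50):
    for xi, yi in zip(X, y):
        noisy = xi + rng.normal(scale=0.05, size=16)   # the noise injection
        pred = float(noisy @ w > 0)
        w += 0.1 * (yi - pred) * noisy          # perceptron update on errors

accuracy = np.mean((X @ w > 0) == y)            # evaluated on clean inputs
print(round(accuracy, 2))
```

    On this toy separable data it typically ends up classifying nearly all of the clean points correctly; the point is only where the noise goes in the loop, so the model can't simply memorise exact input values.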

  17. Mike 137 Silver badge

    "Artificially intelligence may suck ..."

    The grammar's a little unusual so I may have misunderstood, but is this the alternative to intelligence sucking authentically?
