Even complex AI models are failing 5th grade science

Think your AI agents are actually learning to solve problems? A new benchmark sheds light on what is real when it comes to sophisticated AI. Researchers from the University of Arizona, Microsoft, and the Allen Institute for AI tested several different state-of-the-art agents and found them readily able to answer the "what" of …

  1. Mike 137 Silver badge

    "there's no reasoning going on inside those digital brains"

    We've known this for ages - it's logically inevitable. The current mechanism of AI is adaptive statistical template matching - i.e. lookup to find patterns maximally similar to input stimuli, potentially adjusting for majority trends. It's about as intelligent as a bumblebee (which also uses adaptive template matching to learn). That's not to say current AI doesn't have its uses - particularly where the range of alternatives to select from is huge - but it's quite unrealistic to call what it does 'intelligence' in human terms. Useful tools maybe, but probably misnamed, and certainly not safely applicable to situations where it's faced with the entirely unexpected.

    However I'm a bit depressed that the 'projects' tested are classified as 5th grade - they seem rather elementary for that.

    1. b0llchit Silver badge

      Re: "there's no reasoning going on inside those digital brains"

      It's about as intelligent as a bumblebee.

      I'm sure that the bumblebee is several orders of magnitude more intelligent than any AI currently available. I have not yet heard of or seen an AI that is actually self-sufficient.

      But you are right, current AI systems are primarily statistical inference machines. They have their uses, but calling them "intelligent" would be a real slap in the face of that bumblebee. I'm sure the bumblebees perform better at solving the 5th grade science problems than the tested models.

      1. LionelB Silver badge

        Re: "there's no reasoning going on inside those digital brains"

        Agreed, that is an insult to the bumblebee. Bees perform incredible feats of precision airborne manoeuvring, and long-distance navigation in complex, unpredictable environments, that are way beyond what any current AI (or indeed expert human engineering) can dream of achieving. Then again, those abilities were honed over billions of years of evolution, so perhaps not a fair contest.

        And yes, current "AI" may well be "just" statistical inference, but let's not knock that. There is evidence that statistical inference may well be an essential component (though hardly the be-all-and-end-all) of biological intelligence.

        (Also, I doubt bumblebees would actually solve 5th-grade science problems terribly well, even, or especially, if you slap them in the face -- cruel and inadvisable - and soak the test sheets with nectar.)

      2. Mike 137 Silver badge

        Re: "there's no reasoning going on inside those digital brains"

        "that is an insult to the bumblebee"

        Having studied bumble bees and observed them for years, I've come to the conclusion that their learning process operates very like current AI. They apparently learn by starting from pure trial and error, which then gets refined by the statistics of outcome. Thus a model develops, if crudely, based on the probability of success. However, once the probability of success reaches a certain level they seem to stop adapting the model, at least in the short term.
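        The learning pattern described here - trial and error refined by outcome statistics, with adaptation freezing once success becomes common enough - can be sketched as a toy learner. All names and thresholds below are purely illustrative, not a model of real bee cognition:

```python
import random

def bee_learner(options, reward_fn, trials=1000, freeze_at=0.8):
    """Toy model: trial-and-error choice refined by outcome statistics.

    Once the running success rate passes `freeze_at`, the learner stops
    updating its preferences, mimicking the 'fixation' described above.
    """
    counts = {o: 1 for o in options}  # times each option was tried
    wins = {o: 1 for o in options}    # successful outcomes per option
    successes = 0
    frozen = False
    for t in range(1, trials + 1):
        if frozen:
            # Exploit the imprinted preference only.
            choice = max(options, key=lambda o: wins[o] / counts[o])
        else:
            # Explore in proportion to estimated success probability.
            weights = [wins[o] / counts[o] for o in options]
            choice = random.choices(options, weights=weights)[0]
        success = reward_fn(choice)
        successes += success
        if not frozen:
            counts[choice] += 1
            wins[choice] += success
            if successes / t >= freeze_at:
                frozen = True  # model stops adapting from here on
    return max(options, key=lambda o: wins[o] / counts[o])
```

        Run against a flower that always pays off versus one that never does, the toy learner settles on the rewarding one and then stops updating - much like the imprinted shed-door route.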

        Just for example, I used to have a bumble bee nest in my wood shed. There was a gap at the bottom of the door through which they would fly in and out in support of the nest. But if I opened the door they would buzz around in confusion both inside and outside the shed, unable to find their way in or out. Clearly, in the early stages of creating the nest, a 'picture' of the way in and out had been imprinted, and they could not adapt short term to changes in it. If I'd left the door open for a long while they might have adjusted, but I didn't so I can't comment on that. However it is observable that different individual bumble bees (not just different species) can develop preferences for foraging on specific species or colours of flower, even where they may be not very numerous and there is plenty of pollen and nectar on alternatives nearby. This again looks like fixation of early imprinting.

        The admittedly extraordinary capacities for navigation, flight, foraging mechanisms (buzz pollination etc.) and nesting behaviour are indeed, as an entire set, way beyond anything our current robotics can deliver, but that's not down to intelligence - it's an outcome of evolutionary selection. The bees don't apply intelligence to performing them - the mechanisms are essentially hard wired.

        Way back in the late 80's I annoyed folks at a microrobotics seminar by stating "when you can build a robot the size of a bumble bee that can do everything a bumble bee can do with comparable energy exchange, you'll have achieved something worth reporting".

    2. heyrick Silver badge

      Re: "there's no reasoning going on inside those digital brains"

      Exactly this. There is zero understanding, so there's no intelligence. Artificial Idiocy (as I like to call it) has a ridiculously long way to go before it is capable of being able to "reason".

      1. LionelB Silver badge

        Re: "there's no reasoning going on inside those digital brains"

        I would hesitate to attach "understanding" to the term intelligence (biological or human-made) that readily. Many organisms perform incredibly complex behaviours which I think most of us would be happy to label as "intelligent", but hesitate to claim that the organism "understands" the problem it is solving.

        It boils down to this question: do we take the term "artificial intelligence" simply to mean human-like intelligence -- in which case we are, of course, light-years off -- or do we broaden it to include other biological intelligence (in which case we are merely parsecs off)?

        1. Pascal Monett Silver badge

          A parsec is 3.26 light-years.

          1. LionelB Silver badge

            Yup... always forget which way round it is.

    3. Anonymous Coward
      Anonymous Coward

      @Mike 137 - Re: "there's no reasoning going on inside those digital brains"

      Even more depressing (I would say frightening) is the fact that AI is pushed or adopted in real-life scenarios where life of human beings is at stake, with no oversight and with no possibility of remediation. You must be at fault because AI said so and it can't possibly be wrong.

      1. Will Godfrey Silver badge

        Re: @Mike 137 - "there's no reasoning going on inside those digital brains"

        I've read quite a few SciFi stories based on that premise. Scary ones!

        1. heyrick Silver badge

          Re: @Mike 137 - "there's no reasoning going on inside those digital brains"

          Recommendations welcome...

          1. very angry man

            Re: @Mike 137 - "there's no reasoning going on inside those digital brains"

            Ai, it's just a few short instructions:

            input from IR

            input from movement sensor

            is IR input moving?

            Align weapon

            input from Lidar

            is weapon line obstructed?

            change position to clear weapon line

            discharge weapon

            play recording "kill all humans"

            close loop

            restart loop

            change position.

            1. Anonymous Coward
              Anonymous Coward

              @very angry man - Re: @Mike 137 - "there's no reasoning going on inside those ...

              AI is very good at this kind of job. There's an added bonus in that nobody will be blamed if one of these autonomous intelligent weapons opens fire in a school or other crowded public places.

          2. Will Godfrey Silver badge

            Re: @Mike 137 - "there's no reasoning going on inside those digital brains"

            If you mean for SciFi stories, I can't remember the titles, I read most of these in my teens, but they all revolved around a glitch meaning that someone no longer existed, or their identity was swapped with a wanted criminal etc.

            One in particular was about a guy who suddenly not only lost all access to food, shelter, transport etc., but was now regarded as an alien object and was being hunted for 'research'.

    4. a_yank_lurker

      Re: "there's no reasoning going on inside those digital brains"

      I hear the bumblebees are rather insulted by the comparison of artificial idiocy to them.

  2. Andy 73 Silver badge

    The next question...

    ..would you let a 5th grader drive your car?

    1. a_yank_lurker

      Re: The next question...

      Sooner than I would let AI aka artificial idiocy do so.

    2. doublelayer Silver badge

      Re: The next question...

      I suggest that's a different proposal. Driving a car is a mechanical task, albeit a very difficult one requiring a lot of things the human brain is good at doing. Writing a program to drive a car may include a bit of statistical calculations for visual things, but most of it will be non-AI code that can be more rigorously tested. In either case, the car doesn't have to figure out your goals in order to follow the course you set and not drive into any solid objects. Many other processes have been automated in this manner, and while I expect this one to take a lot longer because it is much more complex, you don't need to get AGI to do it.

      The thing this research tested is the problem solving ability of AI software, which as we know is abysmal. Even if it wasn't, it's unlikely we would want a car to start doing it--it shouldn't decide where to take you, after all. There are things where you need general intelligence and problem solving, which is where humans come in useful, and cases where you need fast and accurate machines. You wouldn't ask a ten-year-old to move a satellite in orbit either, but when a computer does the calculations and activates the movement, it's completely normal.

      1. Pascal Monett Silver badge

        "when a computer does the calculations and activates the movement, it's completely normal"

        Indeed it is, because moving a satellite does not include checking the rear-view mirror for another satellite in the left lane, or paying attention to road signs, or needing to be wary of rain, snow or ice. It's just apply this amount of Newtonian thrust to attain this result.

        A pocket calculator can do that today.

        I salute this study, and will keep a link to it, because it brings into sharp relief exactly how unintelligent what is commonly called AI is today.

        1. doublelayer Silver badge

          "moving a satellite does not include checking the rear-view mirror for another satellite in the left lane, or paying attention to road signs, or needing to be wary of rain, snow or ice."

          It does though. If you're moving a satellite, one of the major reasons you'd be doing it is so it isn't going to hit another one or come close to doing it. That's both now as you move it and later on because you can't constantly be moving the thing. You also have to be aware of local conditions, which includes space weather. Those are the things the computer calculates when it decides where the satellite should be going and the correct movements to get there. That's going to be reviewed by a person, but the person doesn't do the calculation and tell the computer.

      2. Andy 73 Silver badge

        Re: The next question...

        "In either case, the car doesn't have to figure out your goals in order to follow the course you set and not drive into any solid objects."

        I think this is a fallacy, in that a driver absolutely *does* have to figure out the goals of other drivers on the road and make deductions about things it has never seen before.

        "Is that car parked next to the shops about to fling its door open?"

        "Is that thing with a single light coming towards me a bike or a car with a light out?"

        "Is the lorry carrying a load of traffic lights a safe thing to drive behind?"

        Sure, there are a lot of mechanical tasks - stop quickly when necessary being the obvious one - but when to stop quickly, or steer around things, or even accelerate is a deduction based on an understanding of a large number of unpredictable actors.

        1. unimaginative Bronze badge

          Re: The next question...

          Looking at those three examples, I disagree:

          1. A human might better anticipate a door being flung open, because it's based on assessing human behaviour. On the other hand, a computer should react faster once it happens.

          2. A self driving car would have sensors that could better tell the difference between a bike and a car than human senses can. It's not reliant just on light from headlights.

          3. Hopefully a lorry carrying that sort of load would have it properly secured. A self driving car is also more likely to maintain an adequate distance than most human drivers seem to.

        2. heyrick Silver badge

          Re: The next question...

          I drive a little car that has a restricted speed (Google "Aixam").

          One of those on the road is like a red flag waved at a bull. You'll spend half your time looking at the road ahead, and the other half keeping an eye on the rear view mirror for the cars behind that will do something stupid. That's not a possible maybe, that's a "yes this twat is going to pass me at 90 (10 over the limit) on a blind bend going up a hill". Seen it enough times already. Because clearly risking a spectacular crash is a better option than thirty seconds behind something slow.

          So all of you that think that loads may be well secured, people follow the highway code, and cars are all driven by sensible competent people... bollocks to that. In reality some cars are driven by people who are careful and considerate, whilst others are driven by suicidal nutjobs. It's hard enough for a human to predict exactly what stupid thing is likely to happen. A computer? No chance.

          1. heyrick Silver badge

            Re: The next question...

            Quick example. There's a car coming around a roundabout. It is indicating that it will turn onto the road you're on. Do you pull out?

            What if it isn't indicating?

    3. Mike 137 Silver badge

      Re: The next question...

      would you let a 5th grader drive your car?

      A 5th grader is a 10-11 year old. I've personally known a couple of 10 year olds that were accomplished dirt track racers. So yes, I might (depending on the capacities of the individual 10 year old).

      However the official educational expectations at all grades appear to have been declining over time, so the average may indeed now represent insufficient capacity for the task.

    4. Disgusted Of Tunbridge Wells Silver badge

      Re: The next question...

      It depends, are they drunker than I am?

    5. Potty Professor

      Re: The next question...

      I learned to drive a car when I was 7 years old, admittedly on a farm track, but I was trusted to drive the tractor (a Little Grey Fergie) from the dairy to the main road gate and back twice a day to deliver the milk churns to the milk stand beside the road. Many years later, I taught my two daughters to drive, the older one (although very keen to learn) found it very difficult, but the younger one took to it like a duck to water, and was capable of almost anything asked of her by the time she was 11. The older one still is not as spatially aware as the younger, but has so far avoided any accidents. The younger one is now a "Bike Chick" and rides a Yamaha 250 Tourer.

      1. Potty Professor

        Re: The next question...

        PS, She informs me that it is a Virago, for the cognoscenti out there.

  3. Anonymous Coward


    As anyone who remembers Asimo's debut demo will tell you, it aced the first step up a set of stairs, managed the second step, and fell over on the third step.

    This is AI's downfall. Ask it a single question and its Artificial Guess can be accurate. But the second Artificial Guess takes the first Artificial Guess as a given, which introduces more uncertainty, and each subsequent Artificial Guess compounds it.
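    To put rough numbers on the compounding: if each independent guess is right 90% of the time, the chance that a whole chain of guesses is right decays geometrically. A toy calculation, not a model of any particular system:

```python
def chain_accuracy(per_step: float, steps: int) -> float:
    """Probability that every step in a chain of guesses is correct,
    assuming each step is independent and equally accurate."""
    return per_step ** steps

# A 90%-accurate guesser looks good for a single answer...
print(round(chain_accuracy(0.9, 1), 3))   # 0.9
# ...but a ten-step plan built on its own prior guesses does not.
print(round(chain_accuracy(0.9, 10), 3))  # 0.349
```

    The independence assumption is generous, too: in practice a wrong early guess tends to make later ones worse, not merely as bad.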

    IRL, an AI can recognize potential areas of concern in a mammogram better than the average doctor. But if you tried to teach it to plan out the treatment, I would not be surprised to see it recommending mastectomies just in case.

    1. heyrick Silver badge

      Re: Asimo

      "see it recommending mastectomies just in case"

      Google "Ian Paterson"...

    2. razzaDazza1234

      Re: Asimo

      I met Asimo 15 years ago at Suzuka and he was doing just fine. He even had a little dance.

      And there were moments where I could see its 'intelligence' in action and it caught my breath.

      Take a look on YouTube at 2-minute Papers by Karl somebody to see how AI is developing - especially in physics modelling, where tasks that used to take days to process/render can now run in almost real time, thanks to AI 'learning' how to do it.

      As to whether this constitutes what we call AI is a different tale. AI really doesn't need a big chunk of our intelligence to survive. One leading AI chap wrote a short while ago that the biggest hurdle in AI ATM is the personification of AI. He went on to say that once we can move away from the dominant human model, there will be a big leap forward. And I agree with him.

      Our dominant perception of intelligence can change.

      1. Pascal Monett Silver badge

        Sure it can.

        But a bunch of statistical rules is not intelligence.

  4. NoizeBoy

    Interpreter please

    This is NOT a snarky comment but a genuine request for info... as an old (grumpy) geezer from the UK, who's never had kids... WTF is 5th grade? Do they have these grades in the UK now?

    When dinosaurs roamed the earth and I was at school we had Primary school and forms 1 to 6 in Secondary school. Ye Gods I feel old!

    I know, I'll get my coat (and pipe, slippers, dog and mug of Horlicks).

    1. diodesign (Written by Reg staff) Silver badge

      Re: Interpreter please

      5th grade is 10-11 years old.


      1. J.G.Harston Silver badge

        Re: Interpreter please

        "5th grade is 10-11 years old."

        So.......... final year of primary school? Or first year of secondary school?

        1. heyrick Silver badge

          Re: Interpreter please

          What happened to Junior? Or is that just considered a part of Secondary these days?

          For me it was Kindergarten [1], Primary, Junior, and finally Senior [2].

          1 - that part was in America, which is why it was called that.

          2 - and since that was boarding school, it was split into Junior/Intermediate/Senior. The only time forms were mentioned was Sixth Form because they were Prefects and they were bastards.

    2. a_yank_lurker

      Re: Interpreter please

      The title is a reference to a US TV game show 'Are You Smarter than a 5th Grader?' hosted by the American comedian Jeff 'You might be a redneck...' Foxworthy. Contestants were asked questions that are covered by the 5th grade in the US with help from actual 5th graders.

    3. doublelayer Silver badge

      Re: Interpreter please

      I'm not sure which countries use which set of terms. Since these are American researchers, they're using the American grade numbers, which go from 1st (age 6-7) to 12th (17-18).

      1. heyrick Silver badge

        Re: Interpreter please

        Here in France they count the years backwards, and have a terminal after the first!

        Here's a comparison of American and French.

    4. J.G.Harston Silver badge

      Re: Interpreter please

      And in English schools we have two years of sixth form!

      (Or have they renamed them to year 15 and 16 or something?)

  5. Anonymous Coward
    Anonymous Coward

    The old declarative AI and ML

    If one compares the old declarative AI - systems based on Prolog or expert systems - with today's machine learning systems, one can dissect the limitations of AI in general. Declarative AI could easily infer complex truths or implications from a set of simple facts, but its inference is limited to what it has been taught through those simpler facts, against the infinity of possibilities in the solution space. Modern ML may have the magic of generating its own simpler facts by sheer statistical analysis, approximation or classification of data, but even so it is, like declarative AI, still limited to inferring from those data-driven, self-learned simple facts.
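    The "complex truths from simple facts" of those declarative systems can be illustrated with a minimal forward-chaining sketch. The facts and rules below are invented for illustration; real expert systems and Prolog engines are far richer (variables, unification, backtracking):

```python
def forward_chain(facts, rules):
    """Naive forward chaining: repeatedly apply rules of the form
    (premises, conclusion) until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return facts

# Simple facts and implications, expert-system style.
rules = [
    ({"has_wings", "lays_eggs"}, "is_bird"),
    ({"is_bird", "small"}, "can_fly"),
]
derived = forward_chain({"has_wings", "lays_eggs", "small"}, rules)
print("can_fly" in derived)  # True
```

    Note the limitation the comment describes: the engine can only ever derive conclusions reachable from the rules it was given; nothing outside that closure is available to it.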

    Even systems that combine both methods are inherently self-limiting. Biological organisms and the human brain are essentially different, in that their primary function is self-preservation, not necessarily knowledge inference. In that sense, polymorphic computer viruses and malware - and their design approaches, including swarm approaches like botnets - are more intelligent than the current AI systems.

  6. a_yank_lurker

    Artificial Idiocy

    My cats display more reasoning ability than any artificial idiocy system can do.

  7. Schultz

    Up next ...

    The custom-trained AI that can solve this specific 5th-grade challenge better/faster than any human. Cue the headline: AI can do anything and will solve all our problems.
