Hope for nerds! ChatGPT's still a below-average math student

OpenAI's ChatGPT outperforms the average university student in computer science, according to researchers, but interestingly, not in math. The large language model's performance also beats the average student in other subjects, including political studies, engineering, and psychology, a paper published in Scientific Reports …

  1. Anonymous Coward

    English as a disadvantage.

    "... increase the competitiveness of students who are non-native English speakers"

    From now on, claim you don't know English and let the robots do your work.

    This tip brought to you by ChatGPT.

    1. Blogsey
      Holmes

      Re: English as a disadvantage.

      "Math"?

      Standards of English are slipping at El Reg!

      Sherlock. Because we will always have Paris.

  2. MatthewSt Silver badge

    Below average is a bit generous...

    User

    Can you make 147 using only 396, 200, 16 and 519?

    ChatGPT

    Sure, you can make 147 using the numbers 396, 200, 16, and 519 with the following mathematical expression:

    (396 - 200) + (16 * 519) = 147

    User

    No, it doesn't

    ChatGPT

    I apologize for the mistake. You're right, the expression I provided doesn't result in 147. Here's the correct way to make 147 using the numbers 396, 200, 16, and 519:

    (519 - 396) + 200 + 16 = 147
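
    For what it's worth, the search space here is tiny, so exhaustive search settles whether the puzzle can be done at all. A minimal Python sketch, assuming Countdown-style rules (each number used at most once; +, -, * and / with any bracketing). If it prints only the fallback message, no exact expression exists, and the one honest answer is the one ChatGPT never gives:

    from fractions import Fraction

    # Brute force: repeatedly combine any ordered pair of remaining values
    # with each operation, tracking the expression that produced each value.
    # Exact rational arithmetic avoids floating-point near-misses.
    def solve(numbers, target):
        target = Fraction(target)
        found = set()

        def recurse(items):
            # items: list of (value, expression) pairs still available
            for value, expr in items:
                if value == target:
                    found.add(expr)
            for i in range(len(items)):
                for j in range(len(items)):
                    if i == j:
                        continue
                    (a, ea), (b, eb) = items[i], items[j]
                    rest = [items[k] for k in range(len(items)) if k not in (i, j)]
                    combos = [(a + b, f"({ea} + {eb})"),
                              (a - b, f"({ea} - {eb})"),
                              (a * b, f"({ea} * {eb})")]
                    if b != 0:
                        combos.append((a / b, f"({ea} / {eb})"))
                    for combo in combos:
                        recurse(rest + [combo])

        recurse([(Fraction(n), str(n)) for n in numbers])
        return found

    print(solve([396, 200, 16, 519], 147) or "no exact expression")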

    1. Anonymous Coward

      Sarcasm ahead

      I don't understand. It spoke with authority and confidence, and showed creativity. Doesn't that deserve reward?

    2. jmch Silver badge
      Devil

      Re: Below average is a bit generous...

      Wrong numbers to make 147.

      You need (15 x (1+7)) + 2 + 3 + 4 + 5 + 6 + 7

    3. katrinab Silver badge
      Alert

      Re: Below average is a bit generous...

      Bing Chat:

      Yes, you can make 147 using only 396, 200, 16 and 519. Here’s one way to do it:

      (396 + 200) / 16 + 519 - (396 + 200) = 147

      Is there anything else you would like to know? [smiley face emoji]

      1. Norman Nescio

        Re: Below average is a bit generous...

        So it's been bugging me.

        396 - 200 is 196

        Square root of 196 is 14

        Square root of 16 is 4

        519 + 14 + 4 is 537

        396 + 4 is 400

        Square root of 400 is 20

        200 / 20 is 10

        537 - 400 is 137

        137 + 10 is 147.

        Or in one expression: 147 = 519 + SQRT(396 - 200) + SQRT(16) - (396 + SQRT(16)) + (200 / SQRT(396 + SQRT(16)))

        This can be simplified further to: 147 = 519 + SQRT(396 - 200) - 396 + (200 / SQRT(396 + SQRT(16)))

        Which probably breaks the rules, as it uses square roots and the number 396 three times.

        I have no doubt there is a more elegant solution, but that's the best I could come up with.
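
        For anyone who wants to check the arithmetic without a calculator, a two-line Python sanity check (exact here, since every radicand is a perfect square):

        from math import sqrt

        # 196, 16 and 400 are perfect squares, so sqrt() is exact here.
        print(519 + sqrt(396 - 200) - 396 + 200 / sqrt(396 + sqrt(16)))  # 147.0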

  3. Eclectic Man Silver badge

    Just wondering

    Whether the 'results' of ChatGPT versus human students say more about the subject than the marks? But I guess that maths is maybe still a bit too precise for AI (although arithmetic should not be a problem).

    After all, ChatGPT has an extensive collection of books it has 'read' from which it can choose items for an answer, whereas students (and probably some professors) have not read nearly as many texts, or remembered them as accurately as a computer can. In maths you cannot just 'make stuff up' and hope that if it looks ok you will get marks. Well, OK, when I was marking undergraduate examples and tests, I think some of them did just that, but I didn't give them any marks for it.

    1. Rikki Tikki

      Re: Just wondering

      Looking at it another way, it may be that assessment methods are deficient, in that they may not actually test students' understanding of the subject, only the ability to regurgitate from books.

      IMO, the test would be whether the knowledge can be applied to a real problem solving exercise.

      As a footnote, I was a bit surprised that ChatGPT didn't do so well in economics; a lot of that subject is bullshit, so I would have thought it an ideal playground for ChatGPT. (Disclaimer: my first degree was in economics.)

      1. doublelayer Silver badge

        Re: Just wondering

        That is certainly optimal, and it becomes more common as courses get more difficult, but it is sometimes hard to find a new problem that can be solved by students who haven't learned much yet and that has never been considered by anyone on the internet. The basics of education will unavoidably be things which are known and discussed by lots of people, and students will need to get past those levels in order to reach the point where they can be set more complex problems that haven't already been solved.

        1. Michael Wojcik Silver badge

          Re: Just wondering

          It's also worth noting the economics (sorry, Rikki) of classroom instruction and evaluation. Low-level courses tend to have large class sizes, often in the hundreds of students for some disciplines. Evaluating that many students quickly becomes very expensive; it's not feasible unless you use methods that permit quick, preferably automated evaluation. By definition, such responses will have quite low information entropy, and so LLMs will tend to perform well on them.

          Also, at least in US higher education, introductory courses are asked to serve multiple competing aims. They are supposed to introduce students to a subject area so they'll be prepared for more advanced study. They're generally asked to perform as general-education courses, i.e. to help students "catch up" to the base level expected of undergraduates. Administrators want them to serve as gatekeepers, discouraging students who think they might be interested in a discipline but turn out not to be from continuing. Students use them for easy (or easier) credit hours to fulfill requirements. Departments staff them with the cheapest faculty – fixed-term, contingent, and graduate-student workers – and run many sections, to make their budgets. Students take different approaches to the courses depending on which of these aims align with their own motives, and many are content to simply squeak through.

    2. Claptrap314 Silver badge

      Re: Just wondering

      These programs don't remember what they "read". They store statistical information about the relationship of words in the totality of their inputs.

      What would happen if you asked, "What is the text of Act I, Scene I of Shakespeare's Romeo and Juliet?"

      Not curious enough to make my own account...

      1. FeepingCreature

        Re: Just wondering

        "Store statistical information about relationship of words in their inputs" sounds suspiciously similar to "remember".

        1. Claptrap314 Silver badge

          Re: Just wondering

          Except they aren't remembering the text, merely some information _about_ the text. That's worlds different. If you want to argue that they function as a lossy compression function, that's a useful argument, but it's the lossiness that belies the claim of "remember".

          1. FeepingCreature

            Re: Just wondering

            I mean, but it's not like humans usually remember text either, or events, or reality.

      2. Michael Wojcik Silver badge

        Re: Just wondering

        Due to the large size and dimensionality of the parameter space in contemporary LLMs, and the large context windows, it is often possible to elicit large passages of text verbatim from a source. It's not hard to find gradients in parameter space in these models that recover that sort of thing, particularly if you keep the temperature low enough to avoid introducing noise or annealing the output into some "better" (in objective-function terms) result.
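
        For the curious, "temperature" is just a rescaling of the model's per-token scores before sampling. A toy Python illustration (the scores are invented, not taken from any real model): as the temperature approaches zero, sampling collapses onto the single most likely token, which is the regime where memorised text comes back verbatim.

        import numpy as np

        rng = np.random.default_rng(0)

        def sample(logits, temperature):
            # Divide the scores by T, then softmax. Small T sharpens the
            # distribution towards the highest-scoring token.
            z = np.asarray(logits, dtype=float) / temperature
            p = np.exp(z - z.max())  # subtract the max for numerical stability
            p /= p.sum()
            return rng.choice(len(p), p=p)

        logits = [2.0, 1.5, 0.3]  # invented scores for three candidate tokens
        print([sample(logits, 1.0) for _ in range(10)])   # a varied mix
        print([sample(logits, 0.05) for _ in range(10)])  # almost always token 0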

    3. doublelayer Silver badge

      Re: Just wondering

      I think it says about as much about the subject as about the program, since what it really measures is whether the solutions to problems asked at that level are likely to have already existed on the internet. At low levels, the answer is yes for most subjects.

      A computer science question asked of relatively early students, like "Write quicksort on arrays of doubles in C++", is one where someone has already written that code and published it on a page with most or all of those keywords on it, so it's relatively easy for a program that has mashed up a lot of websites to find and reconstitute something close to it. A similarly difficult history essay is likely to be on a topic that has already been discussed by some people online, maybe even in these forums, and the bot can copy from that. A political course that focuses on modern events means the bot can mash up some newspaper articles and produce an essay that approximates an acceptable response.

      A mathematical problem, made from randomly chosen numbers, is probably not written out on any website, since the sites will have been written with instructions on solving such problems in general, not that one in particular. Hence the bot can't solve it: it hasn't been built to reason or to understand, just to repeat what the internet generally says in a format that looks like natural language.

  4. Doctor Syntax Silver badge

    "the largest performance gap between ChatGPT and students was for math-related questions, followed by trick questions"

    So for courses other than mathematics, throw in a few trick questions.

    1. Mike 137 Silver badge

      " throw in a few trick questions"

      That's been a major component (double negatives etc.) of the CISSP exam for years, and a pass still means bugger all in terms of actual capacity to deliver infosec.

      If you want a real test of knowledge (i.e. the capacity to actually use the information gained to solve a real problem, rather than just to regurgitate it), a properly informed party (one that can already use the information) has to get the candidate to explain why they gave their specific answer. The big problem with LLMs is that they're not, and cannot be, informed, as they have no concept of meaning or of anything else. They're just meaning-blind generators of the most probable token sequences based on their training set. Oh, how often must we repeat this until it sinks in?

      1. FeepingCreature

        Re: " throw in a few trick questions"

        Humans also only analyze correlations between tokens: you think your sensory neurons feed you direct access to unvarnished truth? LLMs are just one step further removed than humans are; neither of us experiences unmediated reality.

        If you have any causal account of meaning arising from correlations in reality and sensory perception, that account would also justify meaning in LLMs.

        1. doublelayer Silver badge

          Re: " throw in a few trick questions"

          No, we analyze correlations between a lot of things, not just tokens. That's why we are more capable than LLMs. We can make correlations between actions and consequences, intentions and results, processes and products. It can only make correlations between one chunk of text and another. We can also experiment on our own to gain new correlations, without needing a new training run to feed them to us from someone else's exogenous information. Oh, and we have the ability to generalize, which it mostly cannot; this is why a lot of children can solve any mathematical equation using operations they know, and any computer can do it extremely quickly, but GPT cannot solve a simple equation if it hasn't already memorized the answer.

          1. Michael Wojcik Silver badge

            Re: " throw in a few trick questions"

            "we analyze correlations between a lot of things, not just tokens"

            Care to cite any evidence from, say, methodologically sound neuroscience research, or even psychology, to support this claim, for some useful definition of "token"?

            As I've noted before, I'm not particularly impressed by LLMs. I'm even less impressed by handwaving appeals to res cogitans to try to claim human cognition is magically different.

            1. doublelayer Silver badge

              Re: " throw in a few trick questions"

              "I'm even less impressed by handwaving appeals to res cogitans to try to claim human cognition is magically different."

              It doesn't impress me either, and I've argued against it on multiple occasions. I question whether you correctly parsed my comment, as I didn't say that or anything like it in this case. I don't even have to argue from neuroscience here. My argument is purely technical and concerns what we are capable of doing as independent organisms and what an LLM is capable of doing.

              At a very basic level, we are capable of taking an action and seeing what happens. We can drop an item and notice that it falls, and from this arrive at a guess that every item will fall when dropped. A bit more experimentation and we can start to guess that items fall faster if they're more compact, and begin to form ideas about air resistance. An LLM is not able to drop any items or watch what happens, so any information it has about what objects do has to be told to it. It is able to answer questions about kinematics only because someone has written them down and it is repeating what someone else said.

              Similarly with my discussion of whether LLMs reason (they don't). This does not mean that a program is incapable of reason. I think that, if we got a computer powerful enough and a program designed to make more general logical connections, it would be capable of making generalizations the way we do, and we definitely do: it's one of the core parts of education. LLMs, however, are not written to do that. Their only purpose is to write, and they have not been set up with logical constructs more advanced than writing likely passages of text. I am not saying that human brains are more powerful than computers in some way. I'm saying that human brains do things this program was never intended to do, which is why our brains will be capable of things that this program never will. Give me a new type of program doing different things in a different way, and that could all change.

              1. FeepingCreature

                Re: " throw in a few trick questions"

                If you hook an LLM up to an action harness, where a certain output is interpreted as an action on reality and the result of that action, as measured by a sensor, is injected back into the token stream, then the LLM is in fact capable of picking up quite involved correlations.

                That's the whole point of the plugin architecture.
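
                A minimal sketch of that loop in Python (the CALC[...] convention and the model callable are invented for illustration; this is not any vendor's actual plugin API):

                import re

                # Hypothetical action harness: scan the model's output for an
                # action request, run it against the "world" (a bare calculator
                # stands in for the sensor), and inject the observation back
                # into the token stream.
                def harness(model, prompt, max_turns=5):
                    transcript = prompt
                    for _ in range(max_turns):
                        output = model(transcript)  # any callable: text -> text
                        transcript += output
                        request = re.search(r"CALC\[([^\]]*)\]", output)
                        if request is None:
                            return transcript       # no action requested; done
                        result = eval(request.group(1), {"__builtins__": {}})
                        transcript += f"\nRESULT: {result}\n"
                    return transcript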

  5. Groo The Wanderer

    ChatGPT has neither "intelligence" nor "understanding", just the mass statistics of the global idiot network to support it's "answers." It has little to no hope of EVER being accurate about much of anything because it simply statistically summarizes the details that accuracy requires. The details get lost in the statistical averaging, summaries, and noise.

    1. John Brown (no body) Silver badge

      ...and from what we keep hearing about "training data", there's little if any curating going on; it seems to be quantity over quality. Curating the training data, even to the extent of limiting the scraping to "reputable" sources, seems to be too much cost and effort, despite the claimed billions being spent on it.

    2. agurney


      .. the mass statistics of the global idiot network to support it's "answers."

      One thing I've noticed from playing around with ChatGPT is that its responses are reasonably correct grammatically, although the content is often dubious.

      If its/it's or their/there are used correctly, I am sadly now more suspicious that the answer is AI generated.

    3. FeepingCreature

      "The details get lost" is also known as "abstraction", a key component of intelligence.

    4. Antipode77

      Actually, you could say LLMs are 'idiot savants'.

      Sophisticated in a single area, but nothing else.

  6. Bebu
    Headmaster

    Global Idiot Network...

    Until now a creation arguably purely human.

    I was trying to think of a decent humanities question to give ChatGPT a real workout :)

    Something like: "Compare and contrast an aspect of Shakespeare's depiction of the relationships portrayed in both 'The Taming of the Shrew' and 'As You Like It', relating your answer where possible to recent performances."

    I suspect you would get a great deal more twaddle from human students than from the AI, although I do not think it would be an easy question to answer. Probably an interesting one, though.

    I was labouring under the mistaken impression that ChatGPT etc. incorporated theorem provers and some sort of reasoning system, but the poor showing in mathematics probably means they are only elaborate neural networks: perceptrons on steroids.

  7. Bebu
    Headmaster

    What does this gem actually mean?

    There is a general consensus between educators and students that the use of ChatGPT in school work should be acknowledged, and that it will increase the competitiveness of students who are non-native English speakers.

    If one were to replace "the use of ChatGPT" with "cheating", one would have pretty much the same logic.

    Is this statement intended to apply to such students studying in English-speaking nations, or is it an implicit claim of the global supremacy of written English in communicating science, technology etc.? If the latter, I suspect the French might beg to differ. ☺

    I am also curious about which activities this increased competitiveness is supposed to apply to: college admission, assessment (examinations etc.), employment? Or indeed how any of this ultimately relates to capability and competence.

    Certainly interesting times in which we live, unfortunately.
