OpenAI claims GPT-4 will beat 90% of you in an exam

OpenAI on Tuesday announced the qualified arrival of GPT-4, its latest milestone in the making of call-and-response deep learning models and one that can seemingly outperform its fleshy creators in important exams. According to OpenAI, the model exhibits "human-level performance on various professional and academic benchmarks …

  1. Fruit and Nutcase Silver badge
    Headmaster

    British Citizenship Test

    How will it fare?

    "Currently the citizenship test is a 45-minute written test that features 24 questions on British traditions and customs."

    https://www.theguardian.com/uk-news/2022/nov/01/meghan-says-prince-harry-was-unable-to-answer-questions-on-uk-citizenship-test

    1. Pete 2 Silver badge

      Re: British Citizenship Test

      > British Citizenship Test

      > How will it fare?

      A good question!

      I looked at some of the online practice tests (though the language of the questions is so broken, I doubt these are "official" questions). The first question was Do You Know When did the First World War end?

      Now, strictly speaking the answer is "yes". Which was not one of the four options provided.

      Even more strictly speaking, WW1 did not end with the armistice, that was signed on [ omitted so as not to be accused of a spoiler] but it actually ended when the Treaty of Versailles was signed in 1919.

      So one question worth asking is whether the person who sets those citizenship questions would pass their own test, either?

      1. hammarbtyp

        Re: British Citizenship Test

        "Do You Know When did the First World War end?"

        The 2nd question should be "and can you rephrase this question so that it does not sound like it's been written by a 10-year-old asylum seeker"

        1. Anonymous Coward
          Anonymous Coward

          Re: British Citizenship Test

          how dare you offend our fully! qualified! staff!!!!

          btw, yesterday I signed a consent form for my son where:

          "Cadets will be dismissed at 21.30 hrs, long sleeves must be warn"

          On 2nd thought I decided against amending it to 'warm' and went for 'warned' instead. I don't think anyone will bother anyway, which is going to be VERY disappointing!

      2. Ken G Silver badge
        Headmaster

        Re: British Citizenship Test

        You could argue for 1920 and the Treaty of Trianon or 1923 and the Treaty of Lausanne too.

        1. Ken Hagan Gold badge

          Re: British Citizenship Test

          Or you could say "I'll let you know when it happens.".

    2. Jason Bloomberg Silver badge

      Re: British Citizenship Test

      How will it fare?

      It will almost certainly be able to prove it's more British than I actually am, and I'll be the one on the next plane to Rwanda - which might be doing me a favour seeing how this country is going.

      The whole notion that 'these are things a Brit would or should know, must be known to be considered truly British' is deeply flawed.

      I would expect it to beat almost everyone in any exam - it's not a level playing field. Give human candidates access to the web, time to look things up and formulate an answer, and they would do much better.

      Where I have found Chat AI excels is in producing convincing bullshit and lies as quick as the guy I know down the local who always has the answer for everything, and every politician I have known.

      1. John H Woods Silver badge

        Re: British Citizenship Test

        r/angryupvote

      2. Fruit and Nutcase Silver badge

        Re: British Citizenship Test

        and every politician I have known.

        Perhaps we can replace them all with AI instances - it will be cheaper and we can turn them off when they start spouting nonsense

        1. cyberdemon Silver badge
          Big Brother

          Re: British Citizenship Test

          TBH, I wouldn't be surprised if they already were, to some extent.

          A politician could already fire up ChatGPT and practice a few debates with a virtual opponent.

          "I am Rishi Sunak talking at PMQs, you are Keir Starmer and the Labour Front Bench. I have just announced my new policy to require government access to all Internet-connected devices, and a ban on unsigned or self-signed open-source operating systems which could circumvent such measures. We will also be installing automated antipersonnel weaponry along the beaches of Kent." What might be a typical opening question from my opponents?

          A Typical opening question might be, "These kind of policies sound like something from 1930s Germany! Has this government gone completely stark-raving mad or is it truly plotting to turn Britain into a Nazi police state?"

          OK and what would be a winning response which gets the media on my side despite tyrannical new policies?

          A Winning response would be to accuse Mr. Starmer of minimising the Holocaust, just like his predecessor Jeremy Corbyn....

    3. jmch Silver badge
      Boffin

      Re: British Citizenship Test

      Given that it's likely that most of the answers are factual and easily findable on the Internet, I wouldn't be surprised if it could ace a citizenship test.

      I'm equally unsurprised that it aced the bar exam. For all the fancy talk, most law cases hinge on lawyers being able to pore through vast volumes of written law and case law, and find precedents that meet the pattern. If a model has been fed the laws and a bunch of legal texts as part of its training it should do well enough. And doing well in mathematical tasks should be trivial, as long as it just needs to spout out results and not explain its reasoning.

      Having said all that, it's still pretty impressive!!!

  2. elsergiovolador Silver badge

    Templaton

    Most exams just test memory and ability to learn templates.

    If you give answers that are correct but not in the answer sheet, there is a chance the tutors wouldn't be able to figure out that your answer is correct, because likely they don't understand the topics they teach well enough (otherwise they wouldn't be teaching...)

    So yes, ChatGPT-4 and other a"I" will be good at this, as it does not involve thinking or reasoning, just finding matching patterns and spitting out whatever fits.

    1. b0llchit Silver badge

      Re: Templaton

      That is why in-person exams using face-to-face dialogue are far superior.

    2. Dan 55 Silver badge

      Re: Templaton

      Physics lecturer used ChatGPT on one of his exams: The results will shock you!

      (Well actually they won't, it got the answers wrong.)

      1. that one in the corner Silver badge

        Re: Templaton

        And wrong to the point where the Prof. said "if anyone gets *this* wrong then my whole course has failed" - i.e. after fluking its way to a few sort-of good answers it failed on THE significant idea!

        Which perfectly demonstrates that the decent answers came from anywhere *but* understanding the material.

      2. Binraider Silver badge

        Re: Templaton

        If you could connect ChatGPT to WolframAlpha...?

      3. Richard Pennington 1
        FAIL

        Re: Templaton

        I am one of the admins for the LinkedIn "Mathematical Olympiads" subgroup. One of the members fed a (relatively) simple mathematical question to ChatGPT. It started reasonably well, then made a series of mathematical and arithmetical howlers, failing several sanity checks on the way through.

        ChatGPT is not good at mathematics.

    3. Anonymous Coward
      Anonymous Coward

      Re: there is a chance the tutors wouldn't be able to figure out

      there are tutors?

  3. spold Silver badge

    The next versions...

    >>>OpenAI's GPT series by its very nature is a family of regurgitation engines <<<

    The next versions are in development, codenamed Hughie, Chuck, and Ralph.

    You will be able to interact with them through a virtual ivory telephone.

    1. Anonymous Coward
      Anonymous Coward

      Re: The next versions...

      excellent, but surely you mean porcelain --- unless you move your bowels in very refined circles.

  4. DS999 Silver badge

    Hardly amazing

    It has such a large data store it is basically like taking a test with Google available to you to find formulas, look up definitions of words (and synonyms, antonyms, etc.)

    Actually better than having Google available because with Google you have to be able to assess that you are looking at the right search result, while the training data given to GPT-4 can elide out all the confusing or incorrect stuff so it is like having a curated Google available to you.

    1. CatWithChainsaw
      Facepalm

      Re: Hardly amazing

      Was about to say, 700/800 on the math portion and it only took one Internet's worth of knowledge to get it there.

      1. Michael Wojcik Silver badge

        Re: Hardly amazing

        Yeah. I do not think the test results are at all impressive.

  5. bertkaye

    don't panic

    I asked GPT-4 a question and its answer was "42". Then it said the mice wanted a slice of my brain.

    1. hammarbtyp

      Re: don't panic

      But what was the question?

      1. Uncle Slacky Silver badge

        Re: don't panic

        How many roads must a man walk down?

      2. that one in the corner Silver badge

        Re: don't panic

        > But what was the question?

        Tricky. We will have to build a new GPT-X, a model so complex that GPT-4 is not worthy to describe its merest operating parameter. One where Life Itself will be part of the matrix.

        And it will be called: Amazon Mechanical Turk!

  6. Neil Barnes Silver badge
    Happy

    a family of regurgitation engines

    I just wanted to see that again.

    1. Arthur the cat Silver badge

      Re: a family of regurgitation engines

      Small children are excellent regurgitation engines.

  7. Pete 2 Silver badge

    So, more human than we would think?

    > GPT series by its very nature is a family of regurgitation engines, drawing upon material it was trained on and reassembling it to address your query.

    Which is what most people do, most of the time.

    Few people create unique responses, and then only rarely. Most of us rely on knowing (through education / experience) what to regurgitate in response to a given situation and simply do that.

    1. Anonymous Coward
      Anonymous Coward

      Re: So, more human than we would think?

      careful there, it might turn out 'some' people, in 'some' circumstances, repeat the fibs they heard before, defend them with all their might, and then come up with their own fake links to prove they're right...

  8. pip25
    Terminator

    Watched the "trailer" video

    Only seconds apart, the OpenAI people admit that GPT-4 can and will make mistakes, then promptly come up with the idea that we should use it to teach people. What could possibly go wrong?

    1. Anonymous Coward
      Anonymous Coward

      Re: Watched the "trailer" video

      so what's the difference between people, who can make mistakes and whom we use to teach other people? All you need is a somewhat narrower verification and - voila - UK gov solves the problem of missing teachers at a fraction of the cost, etc, etc. Repeat across 160 or so countries of the world. First they didn't come for the teachers, they have already come for translators, and probably a few other obscure professions.

      1. Anonymous Coward
        Anonymous Coward

        Re: Watched the "trailer" video

        well, I can easily see GPs in the crosshairs, another workforce shortage. Given that most suggestions for a variety of issues are: paracetamol / calpol / get x off the shelf, cause cheaper than prescription / rest and drink plenty of fluids / etc. - even chat gpt v 0.1 can handle that.

  9. amanfromMars 1 Silver badge

    :-) Beware and Be Aware of Trojan Horses that Lurk in the Light of Dark Shade and Shadowy Operations

    "It is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it," OpenAI CEO Sam Altman acknowledged, referring to GPT-4.

    "Still seems more impressive" is amusing and quite disarming, isn’t it .... and almost human like in its own coy self-deprecation.

    1. Fruit and Nutcase Silver badge
      Thumb Up

      Re: :-) Beware and Be Aware of Trojan Horses that Lurk in the Light of Dark Shade and Shadowy Ops...

      Ah, an expert opinion

  10. Uncle Slacky Silver badge
    Devil

    > Finally, Brockman set up GPT-4 to analyze 16 pages of US tax code to return the standard deduction for a couple, Alice and Bob, with specific financial circumstances. OpenAI's model responded with the correct answer, along with an explanation of the calculations involved.

    Sueball incoming from tax preparation software companies in 3...2...1...

    1. Ken Hagan Gold badge

      They should feed it the whole code and ask it to find as many contradictions as it can. Then we feed the output to the politicians and tell them to stop prattling on with ideological battles and start fixing the shit that they've foisted upon us.

      Repeat for all tax codes and the respective politicians, obviously.

  11. Anonymous Coward
    Anonymous Coward

    in a single sentence where every word begins with the letter "G."

    "Now all in g! A sonnet, trochaic hexameter, about an old cyclotron who kept sixteen artificial mistresses, blue and radioactive, had four wings, three purple pavilions, two lacquered chests, each containing exactly one thousand medallions bearing the likeness of Czar Murdicog the Head-less..."

    "Grinding gleeful gears, Giggling gynecobalt-6o golems," began the machine, but Trurl leaped to the console, shut off the power and turned, defending the machine with his body.

    - S. Lem, The Cyberiad, 1967

    1. that one in the corner Silver badge

      Re: in a single sentence where every word begins with the letter "G."

      "Don't forget the purple screws, they might come in handy"

      Mr Lem was scarily prescient about problems with AI: a highly recommended read.

      1. Anonymous Coward
        Anonymous Coward

        Re: in a single sentence where every word begins with the letter "G."

        Go-lem, Go-lem! ;)

      2. Arthur the cat Silver badge

        Re: in a single sentence where every word begins with the letter "G."

        Mr Lem was scarily prescient about problems with AI: a highly recommended read.

        If you can get hold of a copy of his non-fictional Summa Technologiae you'll find he was even more prescient than his fiction would suggest. A quote from Amazon (which hasn't got it any more)

        After five decades Summa Technologiae has lost none of its intellectual or critical significance. Indeed, many of Lem’s conjectures about future technologies have now come true: from artificial intelligence, bionics, and nanotechnology to the dangers of information overload, the concept underlying Internet search engines, and the idea of virtual reality. More important for its continued relevance, however, is Lem’s rigorous investigation into the parallel development of biological and technical evolution and his conclusion that technology will outlive humanity.

        1. Anonymous Coward
          Anonymous Coward

          Re: in a single sentence where every word begins with the letter "G."

          well, libgen.rs has you covered! ;)

          1. Arthur the cat Silver badge

            Re: in a single sentence where every word begins with the letter "G."

            My first reaction was "what has a Rust library got to do with it?" Now I know what libgen.rs is, yes you can find copies there. Definitely worth getting hold of to see just how much of modern computer technology Lem predicted in the 60s.

  12. Al fazed
    WTF?

    The problem

    being that 90% of folks couldn't pass a maths exam without cheating or looking up the answers.

    Back in the days of study it happened that 200 students each year for 10 years gave the wrong answer to a maths question. They all gave the answer that was printed at the back of the book and passed the exam. Unfortunately around 2002 it was discovered that the printed answer was wrong.

    The majority of these students went on to "secure" a job in the industry.

    HELP

    ALF

  13. TheMaskedMan Silver badge

    "GPT-4 can pass a simulated bar exam in the top 10 percent of test takers,"

    That may have been a mistake. Midjourney et al came for the paint splatterers and coloured crayon brigade, gpt came for the writers, there are upcoming toys out there with vocalists, musicians and composers in their sights. None of them really have enough clout to do anything about it, and anyway nobody really cares - why bother with a temperamental arty type when software can make a good enough picture in seconds, for free, without hassle?

    But lawyers are a different breed. They make serious money, they (claim to) understand the law / legal system, and they are not going to be happy that said software can apparently do the same to them as it did to the artists.

    They're cunning, too, in a limited way, and they won't like the implication that maybe they're not as special as they like to pretend if a mere machine can pass their exams. They will brush it off, pretend to be unconcerned, but suddenly the artists will find that lawyers are falling over themselves to take up their cause (without ever admitting that it's now their cause, too, of course).

    Expect litigation against AI companies to skyrocket, along with instant heavyweight lobbying for regulation etc.

    1. Anonymous Coward
      Anonymous Coward

      re. They're cunning, too, in a limited way,

      yes, it's going to be an interesting boxing match, lawyers v. Big Tech. But I don't see how they could slow down this avalanche, let alone stop it. One way would be to argue on the copyright front, but unless they manage to provide a reasonably sound proof 'AI stole this poor artist's potential USD 10b tune!', there's not much they can do. And something tells me lawyers are no more popular with politicians than any lesser mortal. So, unless they convince politicians that AI is gonna remove politicians from the equation, the lawyers are gonna lose like everybody else. And, to speculate further, politicians, like all humans, are short-sighted, they see a low-hanging AI fruit and will try to use it to their own benefit. And once it becomes omnipresent, at that point, even if they suddenly realise it was a grave error, there's no politician in the world able to take it down or even curb the use. Think mobile phones, those evil slabs.

      p.s. when I use the term 'AI', it's only for convenience, you might call it 'chat-bots' or whatever. I don't really care whether it is 'intelligence' or not. If I'm fired because a toaster does my job faster and better, what's the difference if it's intelligent, omnipresent, or dumb. Interesting times ahead.

    2. Brewster's Angle Grinder Silver badge

      How the Butlerian Jihad begins...

      If GPT-4 really is a better lawyer,* lawyers who take up the case will lose...

      * In my limited experience of lawyers and solicitors and judges, there are plenty of low hanging fruit and only a few peaches.

  14. Paul Hovnanian Silver badge

    Can I ...

    ... bring an Internet connected device when I sit for my exam?

    Not really a fair comparison then, is it?

  15. CGBS

    But only if its answers aren't based on large language conspiracy theories.

    What shape is the Earth:

    A. Spherical

    B. Elliptical

    C. Flat

    D. None of the above

  16. Brewster's Angle Grinder Silver badge
    Mushroom

    Robo sanctions

    In other news, I see Hunt will be using AI to sanction UC claimants. I suspect this is because humans will ultimately give humans the benefit of the doubt, but the AI has no compassion and can be taught to sanction mercilessly. Any mistakes, well, you can appeal to another AI...

    1. Another User

      Re: Robo sanctions

      Don’t worry. You will be rehabilitated. On a Monday night.

  17. Anonymous Coward
    Anonymous Coward

    I need to throw it at our tests for becoming an "Authorised Person" or "Senior Authorised Person", which are the tests that are part of becoming qualified to work on HV equipment in the UK.

    Much of the documentation and explanation that one would need to train on relies on combinations of drawings and written word that cannot be studied independently.

  18. Groo The Wanderer Silver badge

    Your AI systems are hallucinating?

    Maybe the team should stop dropping "Uncle Sid" and giving it to their machines... :)

  19. lamp

    Sophisticated Regurgitation

    is what it seems to me, minus an ethical framework. Original human thought? Could it come up with the differential calculus? Could it have come up with the Xerox Alto in 1973? Never.

  20. Anonymous Coward
    Anonymous Coward

    "OpenAI claims GPT-4 will beat 90% of you in an exam"

    Give me open access to a search engine and I can beat GPT-4 every time, 100%. And GPT-4 has, or has had, that, so comparing it to an ordinary person is patentable BS.

    Also it's painfully obvious they *know* it's BS. Pure marketing.

  21. Anonymous Coward
    Anonymous Coward

    " GPT-4 can pass a simulated bar exam in the top 10 percent of test takers"

    So they first feed the questions and correct answers to it and then it can get answers about 90% right. Despite having been fed *all* correct answers.

    And the sellers actually believe that's a *good* result? "It's guessing BS only 10% of the time" ... literally.

  22. Kameleonic

    Why are people bent on proving AI to be better than the human brain? Very weird… If I were in that field I would be working to support the human brain, not surpass it.

    1. Anonymous Coward
      Anonymous Coward

      I don't think this proves that AI is better than the human brain, and I don't think that is the point of the result...if anything, it proves how simple the tools are that humans use to categorise each other. We already knew that though, the whole academic system is built around tools that the top 1% use to sort the 99%...and keep the 1% where it is.

      We've all worked with highly qualified simpletons before. I've worked with loads personally. So many in fact that I treat a Masters Degree from certain places as a red flag. Not because the places are bad per se, but because the people that roll off the production line there have a massively inflated view of themselves and tend to be quite fragile. Especially when you find flaws in their "life's work" that they've been working on since year 1 of Uni.

      I found a really basic SQL injection attack vector in one of these people's "projects", guy didn't understand it, was too proud to pay me to fix it (I am a lesser human to him, with no higher level academic paperwork) and he was too embarrassed to disclose it to his business partners not because of the bug itself, but because of who had found it. Pretty sure he initially didn't believe it existed...something about him having a Masters from Cambridge and therefore it being impossible...until I offered to demonstrate it, which resulted in around 30 awkward minutes of listening to him in denial.

      The software was a healthcare practice management platform. His partners eventually did offer me the work to help fix the problem, but "Masters of Cambridge" became an absolute cock to work with, he wouldn't push code to the repo, he wouldn't release new builds to me to test...he went out of his way to ensure it was incredibly difficult for me to test for bugs and vulnerabilities. He was eventually cast adrift and a team of "non-academic" developers and engineers were hired to finish (or rather re-write) the entire product. His credit was therefore completely removed...last I heard (about 10 years ago), he was working in an ISP call centre as a second line support technician...probably works in Starbucks now.
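
For anyone wondering what a "really basic SQL injection attack vector" tends to look like, here is a minimal sketch. It is not the code from the anecdote above, which the poster never showed; the `patients` table, the sample rows and the function names are all made up for illustration, and it uses Python's built-in `sqlite3` module purely because it needs no setup.

```python
import sqlite3


def build_db():
    # Hypothetical in-memory table standing in for a practice-management database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)")
    conn.executemany(
        "INSERT INTO patients (name, notes) VALUES (?, ?)",
        [("alice", "flu"), ("bob", "sprain")],
    )
    return conn


def find_unsafe(conn, name):
    # Vulnerable: user input is pasted straight into the SQL text,
    # so a quote character in the input can change the query's meaning.
    query = "SELECT name, notes FROM patients WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(conn, name):
    # Parameterised: the value is passed separately from the SQL text,
    # so it is only ever compared as a literal string.
    return conn.execute(
        "SELECT name, notes FROM patients WHERE name = ?", (name,)
    ).fetchall()


conn = build_db()
payload = "x' OR '1'='1"             # classic always-true injection string
leaked = find_unsafe(conn, payload)   # matches every row in the table
blocked = find_safe(conn, payload)    # matches nothing, no patient has that name
```

The usual fix is the one in `find_safe`: keep the SQL text fixed and hand user input to the driver as parameters, rather than trying to escape quotes by hand.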
