OpenAI unveils deep research agent for ChatGPT

OpenAI today launched deep research in ChatGPT, a new agent that takes a little longer to perform a deeper dive into the web to come up with a response to a query. According to OpenAI, the new agent will "find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research …

  1. theOtherJT Silver badge

    OpenAI deep research managed an accuracy of 26.6 percent.

So what you're saying is that it was wrong pretty much 3/4 of the time. That's... not terribly encouraging, is it? I mean, yes, it's a lot better than three to four percent... but it doesn't really matter if it "accomplishes in tens of minutes what would take a human many hours" if three out of four times it does it wrong. That isn't, in any meaningful sense, doing the work.

    1. Brewster's Angle Grinder Silver badge
      Joke

      Re: OpenAI deep research managed an accuracy of 26.6 percent.

      But if you're a marketer, it's 8 times more accurate than the previous version...

    2. Eclectic Man Silver badge

      Re: OpenAI deep research managed an accuracy of 26.6 percent.

      I had a look at the link to 'Humanity's Last Exam', and frankly, I don't think I'd get 1% of them right. So, given a response by this new 'deep research agent', I'd probably have no idea whether to accept what it said.

      Where is the ROTM icon when you need it?

      1. I ain't Spartacus Gold badge
        Black Helicopters

        Re: OpenAI deep research managed an accuracy of 26.6 percent.

        I had a quick look - there's a bunch of questions I could easily answer with just a few minutes of looking up. There are others where I don't even have the basic knowledge of the subject matter to know how complex the answer is, without first having to do some research into it. But I'd expect to be able to do better than 1% - although I can't evaluate how much better without spending time analysing the questions.

        But the real problem with "AI" - or at least the current trend for using large language models and claiming it's AI - is that they're designed to put text together in realistic-sounding ways. Had you actually designed something to do research, then you'd design it to look for information and then tell you if it's found anything or not. Even if it fails to find the answer, if it's able to give you its search history, you should at least have a map of useful places to look (or to avoid looking). But we're not using research tools, we're using language simulation models. So what the thing does is spit out some plausible-sounding text - and if you're lucky it's actually based on the information you were looking for.

        At the end of which process you now have a document containing some correct information, some partially correct information and some totally made-up bollocks. But no way to tell which is which, without doing the research yourself anyway! What we don't have is a tool capable of looking at a question and answering it - we have a tool designed to look at a question and give a plausible-sounding answer that might (or might not) be based on searching the internet or some curated training data. Further mediated through a set of bolted-on filters (guard-rails) designed to stop the output being obviously racist, stupid or dangerous - with unknown effects on the output.

        I'm still annoyed that he (she/it/they) broke the rotor blades though. Even if JPL were interrupting your BBQ?

        It's not even reached the level of amanfrommars1's output yet. And he (she/it/they) is currently broadcasting from a place with no running water, while being buzzed with helicopters and dodging NASA's laser-armed space tank.

        1. Eclectic Man Silver badge

          Re: OpenAI deep research managed an accuracy of 26.6 percent.

          Specific instances of AI can be genuinely useful.

          "AI helps researchers read ancient scroll burned to a crisp in Vesuvius eruption"

          "The papyrus, known as PHerc. 172, is one of three Herculaneum scrolls housed at the Bodleian libraries. The document was virtually unrolled on a computer, revealing multiple columns of text which scholars at Oxford have now begun to read. One word written in Ancient Greek, διατροπή, meaning disgust, appears twice within a few columns of text, they said."

          From: https://www.theguardian.com/science/2025/feb/05/ai-helps-researchers-read-ancient-scroll-burned-to-a-crisp-in-vesuvius-eruption

  2. GidaBrasti
    Angel

    A breakthrough indeed

    So "... It uses a version of the company's upcoming o3 model to trawl the internet for information,..." - it will basically call a search engine behind the scenes?

    That's so intelligent!

    1. LionelB Silver badge

      Re: A breakthrough indeed

      I am a research scientist. A large proportion of my working time is spent "trawling the internet" (well, some rather specific parts of it) for information relevant to the problem at hand. Before the internet I used libraries.

  3. captain veg Silver badge

    a deeper dive into the web to come up with a response to a query

    There's the problem right there.

    On aggregate "the web" is an excellent source of mindless crap, common error*, irrational confirmation bias and deliberate misinformation.

    What could possibly go wrong?

    -A.

    *A now ex-boss once tried to persuade me that advertize, with a Z, was a valid spelling of advertise, on the basis of the number of search engine hits.

  4. Zolko Silver badge

    I think I understand why ...

    ... the Chinese beat the US here: US companies seem to try to come up with some sort of über-intelligent computer, a general artificial intelligence that can behave like a human in most tasks. So they have to train their "schoolboy" on everything possible on the internet, which includes a lot of crap. Whereas the Chinese don't give a s****t and only want to do something useful, and so train their "schoolboy" only on useful stuff, like coding and math, which is both smaller in quantity and easier to get right, because there are objectively correct answers. So while the US AI might know more things in general, a lot of it is wrong but it doesn't know which part; whereas the Chinese AI will know fewer things, but those will be correct.

    I mean, who the f***k cares about:

    Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.

    honestly, this can only be described as a dick-measuring contest.

    1. theOtherJT Silver badge

      Re: I think I understand why ...

      Thing is, that's a legitimate question to which there is a clear and correct answer.

      My mother was a librarian for 40 years. You ask her that question and she'd say: OK, that's a question about biology, and a really specific one at that, so forget encyclopedias, you're going to need a book about birds in particular. That's going to be in this section, and I'm going to recommend the following based on their entries in our catalogue.

      If pressed, she'd be able to find those, parse their indexes, and either find the correct answer or tell you that the answer wasn't available within the books they had in their collection, in which case she'd recommend that you contact a number of specialist collections - possibly there's going to be something more useful in the Radcliffe Science Library, and we can get it on inter-library loan from Oxford.

      One way or another you'd get that information - and if the public library actually had access to the complete text of everything in one of the reference collections, like the one at the Bodleian or the British Library, a qualified librarian would definitely be able to get you an answer.

      Given the utterly vast amount of information that these "AI" have been trained on, there's actually a decent chance that the answer is in there somewhere, but it's not finding it. The problem is they don't understand the question!

      They're just gluing sentences together out of probabilistically linked fragments. They have no idea if the answer is right or not, or how one discerns a correct answer from an incorrect one. This is the difference between genuine knowledge and... well, whatever the fuck this is.

      1. Decay

        Re: I think I understand why ...

        And equally importantly, your mother didn't claim to be the fount of knowledge on all things ornithological - just that she knew how to get the information you were looking for. And I would guess that, having found the information, she still wouldn't claim to be any the wiser, other than that she could have read the answer with about the same level of understanding as any lay person.

      2. Zolko Silver badge

        Re: I think I understand why ...

        that's a legitimate question ...

        In what way? What useful information does the – supposedly correct – answer provide you? What good would it do to you or anybody in the world if we knew for sure that hummingbirds have 3 pairs of tendons attached to ovaloid bones?

        ... to which there is a clear and correct answer

        Are we sure? Let's try to find out:

        - a bilaterally paired oval bone...

        how many bones does that make: 2 (a pair, one on each side) or 4 (a pair on each side)?

        - How many paired tendons...

        do we count tendons by pairs (so 2 tendons give 1 pair), or do we count each tendon, even if they come in pairs?

        So the answer is either something that someone has counted by opening up a hummingbird, meaning that it's not intelligence but encyclopedic knowledge, or it's a logical problem, in which case it depends on how you read the "pairs": the logical answer can be 1, 2 or 4.

        What I wanted to point out is that OpenAI is claiming that these are important questions in "Humanity's Last Exam", on which they measure the intelligence of their product. Which means that they have lost the war before it even began. Their thought process is flawed.

        1. I ain't Spartacus Gold badge

          Re: I think I understand why ...

          Zolko,

          Humanity's Last Exam was designed as a benchmark. Specifically a benchmark for testing "AI". Which is odd, given we don't have any AI, we just have LLMs. But that's a different question.

          Whether tuning some sort of LLM-based research bot to answer the questions in that test is a useful measure is another question. I'd suspect the answer is no. But an accurate, automated research assistant would actually be a useful thing.

          1. mjflory

            Re: I think I understand why ...

            Is that a European or an African hummingbird?
