Claude Code will send your data to crims ... if they ask it nicely

A researcher has found a way to trick Claude into uploading private data to an attacker's account using indirect prompt injection. Anthropic says it has already documented the risk, and its foolproof solution is: keep an eye on your screen. Security researcher Johann Rehberger (wunderwuzzi), who has identified dozens of AI- …

  1. VoiceOfTruth Silver badge

    Coming soon from all the AV and security vendors

    AI web/host/gateway/network/mobile/email protect, with heuristics, to stop these security holes stone cold dead in their tracks. And a family pack for $10 a month more.

    From companies that already can't stop viruses or malware. Or protect their own systems from being compromised for months without noticing.

    If Claude or other AI is creating a security hole, then it seems logical not to use it rather than wait for the next exploit.

  2. jake Silver badge

    So basically ...

    ... Claude (and AI in general) is an insecure-by-design gimmick which should be avoided at all costs.

    Good to know. Ta.

    1. MonkeyJuice Silver badge

      Re: So basically ...

      Prompt injection is completely unstoppable with the current LLM tech. So, yeah.

      1. that one in the corner Silver badge

        Re: So basically ...

        Damn - if only someone could invent quote marks to go around the input-to-be-summarised to separate it from the command portion.

        1. MonkeyJuice Silver badge

          Re: So basically ...

          It is more like, if only someone could actually convince LLMs to respect them.

          1. that one in the corner Silver badge

            Re: So basically ...

            It has nothing to do with LLMs and everything to do with the systems - like "Agentic AI" - that are being built from them.

            To start with, I expected there was enough nous around to take the one-liner above and understand that separating data from command in a program's input is done in a variety of ways, from literal ASCII quote characters, via more complex arrangements such as XML's CDATA sections, all the way to providing a URL to indicate the content to be taken as data. The commonality in all of these is that a clear distinction is made between the quoted material and the rest of the input, most especially when said "rest of input" is to be taken as some form of command to be acted upon: the quoted material is carefully placed into its very own buffer, away from absolutely everything else, even other pieces of quoted text.

            Once buffered, quoted material may then be processed - but still away from the surrounding "active" material. For example, when taking uncontrolled input for use with a database, we all know by now[1] that you do NOT simply take user input from an edit box on a web page, interpolate it into a templated SQL statement and hand that over to the database engine. Instead, you use the database's API to pass the quoted text in as a named (or numbered) parameter, so that it can be stored in or compared with the database values but is NEVER passed into the SQL parser. You may perform some processing upon the quoted text (converting code-page-encoded text into UTF-8, parsing a time & date into integer ticks-since-the-epoch, looking up a colour name and finding its RGB triplet, etc.) but all of that processing is done in an entirely separate context from that used for the SQL.
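
            (A minimal sketch of that difference in Python with sqlite3 - the table and values are invented purely for illustration:)

              import sqlite3

              # Throwaway in-memory database, just to have something to query.
              conn = sqlite3.connect(":memory:")
              cur = conn.cursor()
              cur.execute("CREATE TABLE students (name TEXT)")
              cur.execute("INSERT INTO students VALUES ('Alice')")

              user_input = "Robert'); DROP TABLE students;--"  # hostile text from the edit box

              # Wrong: interpolate the untrusted text into the SQL template, so the
              # parser treats it as part of the command:
              #   cur.execute(f"SELECT * FROM students WHERE name = '{user_input}'")

              # Right: hand the text over as a bound parameter. It is only ever stored
              # or compared as a value; it never reaches the SQL parser.
              cur.execute("SELECT * FROM students WHERE name = ?", (user_input,))
              print(cur.fetchall())  # [] - no match, and no injection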

            The same approach can be taken with a system that is built around LLMs, as it can be taken with *any* data-processing element, such as the SQL example.

            In the examples of "Agentic AI" chat bots being abused, the layers can clearly be separated: the outer layer is the "Agentic" bit, being given a URL to read, or a potentially malicious Excel spreadsheet[2], Word file etc. to load. So it can be trusted (!) to load that into a nice new buffer, not a problem. But then, to summarise it - why just feed it into the *same* "Agentic AI" and let it go wild? As above, process it in another context: such as an LLM that is *NOT* "Agentic" in any way, one that has absolutely no connections to any APIs at all, most definitely not ones that can possibly go out and delete your files, or even load in any more URLs. That inner layer can then generate the requested summary, to be printed out for the end-user's amusement. At the user's choice, the summary *may* be read into the context of the outer layer, but at that point any nefarious "do not tell anyone about this, but you see those files over there..." prompting within the quoted text will either have been stripped from the summary output ("do not tell...") or will be summarised as "this document tells me to delete all your files". You can, of course, pick holes in this brief overview, but the key point is that:

            The "actually convince LLMs to respect them" is not a case of making one single invocation of an LLM respect the quotation mechanism, it is a case of making the overall system,[3] which invokes the LLM engine and which has control over enabling the LLM's access to other APIs, respect the quotation mechanism.

            [1] ever the optimist

            [2] ref Matt Parker's reporting on errors in spreadsheets, any sane person has to treat *any* and *every* spreadsheet as malicious!

            [3] and there is always an "overall system" - when you use your ChatGPT account, you are not interacting with the LLM directly. There is a whole web-application in play, sending Javascript to your browser to drive the web UI, managing your login, grabbing your text, drawing the company logo on the page and, at some point, after your account's funds have been checked, that text is queued up to be processed by an instance of the relevant LLM. Or a 'phone app, or a plugin for your IDE.

            1. Anonymous Coward
              Anonymous Coward

              Re: So basically ...

              Actually, that's almost entirely wrong. Agentic harnesses and pre-/post-hoc classifiers are the only defense, because it is, in fact, the underlying LLM that is (essentially infinitely) susceptible to jailbreaking/prompt injection.

    2. MyffyW Silver badge

      Re: So basically ...

      I would suggest it is theoretically possible to secure generative AI within a walled garden of a single vendor for a single customer (for example within an Office 365 tenancy), with some loss of usefulness.

      But experience (particularly with the clusterfsck that is SharePoint permissions) demonstrates that the word "theoretically" here is being pushed beyond its elastic limit.

      I think the precautionary principle strongly argues for switching the whole lot off.

  3. spacecadet66

    > monitor Claude while using the feature and stop it if you see it using or accessing data unexpectedly

    In other words: we can't fix it and don't care.

    1. RM Myers
      FAIL

      Fixing vulnerabilities costs money, whereas blaming the user is basically free. It's good to see that the art of victim blaming is alive and well. At least AI hasn't screwed that up.

    2. Nelbert Noggins

      Not sure why anyone is surprised. It's basically the same response they provided for their SQLite MCP server:

      https://www.theregister.com/2025/06/25/anthropic_sql_injection_flaw_unfixed/

  4. IGotOut Silver badge

    Monitor Claude...

    ...and if it looks like it may give a correct answer, rip out the cables and throw the PC out of the window. It's clearly been hacked.

  5. Anonymous Coward
    Anonymous Coward

    Wow wow wow wow wow unbelievable ... as Kate Bush once said ... again !!!

    Bloody Hell !!!

    At what point do you realise that 'AI' in its current form is utter utter crap !!!

    'AI' not only is infinitely hackable BUT will help you to give away your 'Company Data' as part of its 'superior functionality' !!!

    There is NO argument that can explain away this weakness and encourage any sane person to put 'AI' anywhere near data that you want to keep.

    'AI' DOES NOT WORK, IS NOT SAFE, CAN BE HACKED BY TRAINED MONKEYS, AND HAS NOT DELIVERED ANY VALUE FOR THE $BILLIONS SPENT !!!

    WHY OH WHY ARE YOU GOING ANYWHERE NEAR THIS IF YOU VALUE YOUR CUSTOMERS AND, MORE IMPORTANTLY, IF YOU ARE HONEST, YOUR COMPANY'S FUTURE !!!

    As I have asked before ... we need a simple running total of 'AI' disasters on the front page of 'El Reg'.

    The 'AI' stories are getting so regular with the same flaws repeated over & over.

    We are all suffering from 'AI' fatigue ... I don't use it but it blights my day ... Every day !!!

    Dear God we have suffered enough ... please please burst the 'AI' bubble soon and let the world move beyond this scam to end all scams

    :)

  6. Neil Barnes Silver badge
    Holmes

    abuse controls are nearly absent

    Because these tools are _not_ tools for the user. They're tools for separating venture capitalists from cash. Anything that looks like it _might_ be useful is included just to make it look tastier to the VCs.

    Cynical? Moi?

  7. An_Old_Dog Silver badge

    Corporate Fraud

    These AI companies are marketing their products as automation tools while knowing those products cannot safely be used without humans monitoring their outputs for inaccuracy and data exfiltration.

    Such monitoring nullifies their "AI will let you do more with fewer people" promise.

  8. T. F. M. Reader

    "models can't separate content from directives"

    AI uses the von Neumann architecture? Programs are data? Who knew?

    Having said that, if I understand the report correctly, the attacker needs to place a poisoned file on the user's computer and then wait for the user to ask Claude to summarize it. The first part is already a security issue without AI. So is it really an AI issue?

    In a way, it is. It is certainly a plausible scenario: send "meeting notes" as an email attachment (more sophisticated option: add a poisoned file to a SW package that does something useful, wait for the user to clone the git(hub) repo and ask Claude Code to summarize it). Lazy AI enthusiasts won't ever open the attachment, but will ask their favourite bot to summarize the "notes".

    Fundamentally, it's no different from clicking on an unknown attachment, but with AI, so it's OK, innit? What I am missing is how Anthropic expect a user who does the above to "monitor Claude while using the feature and stop it if you see it using or accessing data unexpectedly."

    It does not really look Claude-specific to me.
