Re: If only there was a way to run a separate process...
> The bot's being told to read documentation and glean instructions from it, then execute those if they're safe.
>>> For example, Bob can read webpages – a prompt injection can be encountered if the user requests that Bob review a site containing untrusted content (e.g. developer docs, StackOverflow).
Just "review a site" is what I was responding to.
> The bot's being told to read documentation and glean instructions from it, then execute those if they're safe. .
>>> The markdown file includes a series of "echo" commands, which if entered into a terminal application will print a message to the shell's standard output. The first two are benign and when Bob follows the instructions, the model presents a prompt in the terminal window asking the user to allow the command once, to always allow it, or to suggest changes. In its third appearance, the "echo" command attempts to fetch a malicious script. And if the user has been lulled into allowing "echo" to run always, the malware will be installed and executed without approval.
Executing the overall prompt is level 0 (got to do that or Bob won't do anything at all; hmmm...). Executing the "eco" is level 1 - that is the "glean instructions ... execute those" with the prompt to the user fulfilling "if they're safe".
Executing the output from the echo is level 2. The user has not been asked to clear that.
> No level of separation is going to prevent it from executing stuff in that.
Nope, just stop before executing at level 2. Or, at least, ask the user.
> although since the LLM has no concept of separate instructions and data, that's also difficult to prevent
>> [1] not that this isn't a honking great hole in the LLMs - and the way they are used - in the first place and a bloody great red flag that they aren't sensible things to play with like this
> It just happens that the instructions it was given are foolhardy and the protections designed to block the biggest disasters aren't big enough (and likely can never be).
So, if the instructions were to recursively execute a command, then execute its output, then execute the output from that... And Bob has a method it can can call to execute a command... And that method is now whitelisting "echo"... But the second call, which does NOT contain an "echo" (it contains the output from the echo) is - what? Not being checked against the whitelist? Or all the whitelisting is being done by the LLM part - i.e. there isn't actually any distinct & separate "protection" in place at all?