As I understand it, the problem with "AI agents" in their current guise is that to an LLM all inputs are prompts, i.e. they can't (because they simply don't) distinguish between data and instructions. I suspect the more guardrails you put in place to try to limit prompt attacks, the less flexible the system becomes.
Ultimately I think the only way to keep them from "running wild" is to simply not use them.