Poisoning the honey pot
> crawlers that shun robots.txt risk getting blocked entirely
Instead of blocking non-compliant bots, why not feed them a pile of gibberish, misinformation and random garbage.
No, I'm not suggesting they are forwarded to far-right (or left) propaganda sources, just that the pages they do scrape are subtly altered to replace words with others, opposites or something completely irrelevant. Do the same with any numbers, too. And images can be corrupted such that they do not display.
It seems to me that the value of AIs is that they can produce responses that users value. If their core data is corrupted (as punishment for straying past what robots.txt permits) then their use is vastly diminished, and so is their value - monetary value.
That should be enough of a deterrent to force them to play nice.