The following Training Data Poisoning scenario is proposed:
> "A malicious actor, or a competitor brand intentionally creates inaccurate or malicious documents which are targeted at a model’s training data. The victim model trains using falsified information which is reflected in outputs of generative AI prompts to its consumers."
Which neatly puts all the blame on that mean old competitor brand, when *all* the blame would lie on the shoulders of the idiots who just sucked up every random bit of garbage they could find to use in their training set.
By comparison, suppose we heard that the FBI and CIA had announced that, acting upon information they received from a bound manuscript found lying on a park bench, they were creating a major joint taskforce to hunt down a "Mr Scaramanga". This individual is described as an internationally wanted assassin who is believed to use a custom weapon assembled from a gold pen and a gold cigarette lighter. Would we blame Ian Fleming for deliberately misleading the Forces of Law and Order, or should the finger be pointed at whoever picked up a discarded paperback and dropped it into the case files?
Addendum: the PDF does mention "training the model on unverified data", *but* it treats that as a separate example from the scenario above.