There's another risk: repurposing copyrighted data
Given that these models are trained on data and texts that are under copyright (and patents, and other legal protections), any output will have to be checked for required attribution and licensing obligations.
But I think it may get worse.
Once that output is used by other parties, the content again falls under the various legal umbrellas that exist, and the only winners in the fight to untangle the mess are the lawyers.
Not good.
Anyone publishing information should really start checking the legal protections they have. If you read Google's terms, or Facebook's, you will find that you have signed away all your rights. They've been working on this for a long time: you're not just milked for personal details, but also for free content. Now them chickens are coming home to roost.