"the ability to generate SQL from text input"
HR departments will need to avoid any candidates called Robert Tables.
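For anyone who missed the xkcd reference: the whole point of the Bobby Tables gag is that splicing untrusted text into SQL lets the input rewrite the query. A minimal sketch (using Python's stdlib `sqlite3`; the table name is made up for illustration) of why parameterized queries defuse that class of input:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# Hostile input in the style of xkcd's "Little Bobby Tables"
name = "Robert'); DROP TABLE students;--"

# Naive string concatenation would splice that payload into the SQL
# text itself. A parameterized query treats it as a plain value:
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

# The table survives, and the payload is stored as an ordinary string.
print(conn.execute("SELECT name FROM students").fetchone()[0])
```

Whether the text-to-SQL services parameterize properly, of course, is exactly the worry.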
Another day at AWS re:Invent, and yet more talk of artificial intelligence dominated, with a senior executive taking to the stage to wax lyrical about the impact of vector databases and more. Dr Swami Sivasubramanian, AWS VP of Data and AI, gave the official AI keynote at re:Invent in Las Vegas, a day after AWS CEO …
That's the least of their worries. We've tried something similar in the recent past, and the model had such a high error rate (generating queries that were syntactically or semantically wrong) that it never really got anywhere. All language model outputs currently require human validation. Any service that tries to do otherwise is... premature at best.
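You can at least catch the syntactically-wrong half cheaply before anything runs. A hedged sketch (SQLite-specific; the table and queries are invented for illustration): asking the engine to *plan* the query via `EXPLAIN` flags both bad syntax and unknown tables without executing anything, though it does nothing about queries that are valid but answer the wrong question — that still needs a human.

```python
import sqlite3

def is_valid_sql(conn, query):
    """Cheap pre-flight check: ask the engine to plan the query
    without executing it. Catches syntax errors and missing
    tables/columns, but NOT semantically-wrong-but-valid SQL."""
    try:
        conn.execute("EXPLAIN " + query)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, area_code TEXT)")

print(is_valid_sql(conn, "SELECT COUNT(*) FROM orders WHERE area_code = '212'"))  # True
print(is_valid_sql(conn, "SELEC COUNT(*) FROM orders"))        # syntax error -> False
print(is_valid_sql(conn, "SELECT * FROM no_such_table"))       # unknown table -> False
```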
Back in the day, a few management types figured out that if they asked me nicely, I could answer questions like "How many $things of $type happened in $timewindow in $areacode". We only had the prod DB - no reporting replica.
Once or twice a month, I would see one of the DBAs stand up and walk over to my desk to ask what the fuck I was running cos it was maxing out the CPU. Happy days.
... Seriously. Spitting out PII such as email addresses and telephone numbers, and whole expanses of undigested verbatim training material.
Guardrails might be much more important than they first appear... they may be required to stop your LLM regurgitating material it shouldn't.
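One crude but concrete form such a guardrail can take: scan the model's output for obvious PII patterns before it reaches the user. A minimal sketch (the regexes here are illustrative, nowhere near exhaustive, and the sample text is made up):

```python
import re

# Illustrative-only patterns; real PII detection needs far more
# than a couple of regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text):
    """Replace email addresses and phone-number-shaped strings
    in LLM output before returning it to the user."""
    text = EMAIL.sub("[email redacted]", text)
    text = PHONE.sub("[phone redacted]", text)
    return text

sample = "Contact alice@example.com or call +1 (555) 867-5309."
print(redact_pii(sample))
```

It does nothing, of course, about verbatim training text that isn't PII-shaped — which is the harder half of the regurgitation problem.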
Even if you haven't got time for anything else, scroll down to the first embedded video for a laugh out loud moment at the attack used.
https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html
They say memorised training data makes up only about 1% of the outputs they extracted. However, it raises the question of whether other prompts would extract more and more of the training data. It also seems to knock on the head any copyright defence that they're not really storing the training data.