ChatGPT creates mostly insecure code, but won't tell you unless you ask

ChatGPT, OpenAI's large language model for chatbots, not only produces mostly insecure code but also fails to alert users to its inadequacies despite being capable of pointing out its shortcomings. Amid the frenzy of academic interest in the possibilities and limitations of large language models, four researchers affiliated …

  1. PRR Silver badge

    > ChatGPT not assuming an adversarial model of code execution. ...can be circumvented simply by 'not feeding an invalid input' ...."

    ..."ChatGPT seems aware of ...critical vulnerabilities ...." It just doesn't say anything unless asked to evaluate the security of its own code suggestions.

    This sounds like most human programmers. MicroSoft is notorious for slap-dash post-hack "fixes", but everybody does it, more or less.

    But since it's an algorithm with essentially (by human standards) "infinite time", we can "force" it to think about all known types of hacks before releasing any code.

    Thing is that a black-hat AI may find new attack methods faster than the white-hats can detect and defend.

    1. Anonymous Coward
      Anonymous Coward

      Work in progress.

      > ... we can "force" it to think about all known types of hacks before releasing any code

      It ain't there until it's there!

      I'm on the free Github "copilot" trial now because I'm very interested in seeing what it can do, and actually I feel some excitement about a new, but as yet very imperfect, tool. Unfortunately the hype takes some of the joy out of it - but I discriminate between what's hype and what's the baby steps of a new and interesting tool/technology.

      My biological wife asked me an intelligent question about copilot - "Is it like a mentor?". The answer is no, the opposite, currently the users are mentoring copilot. It makes a lot of guesses and occasionally shows flashes of idiot savant - generalizing from context to generate several lines of code correctly upon entering just a few characters. But far more often it is wrong, and interrupts the flow of thinking. It once generated a working loop, but I didn't notice that it started from index 1 instead of 0, and only finally "saw" it after entering the debugger. When writing meaningful non-obvious (i.e. useful) comments it's generally not helpful, but rather distracting - but it's amusing how often it comes up with completions using the word "hack". (How did it know?)
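
      Roughly the off-by-one it slipped past me, as a sketch (names invented here, not Copilot's actual output):

        items = ["a", "b", "c"]
        for i in range(1, len(items)):  # looks plausible, silently skips items[0]
            print(items[i])
        # the fix: range(len(items)) - or better, just: for item in items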

      1. PRR Silver badge

        Re: Work in progress.

        > Github "copilot" trial... My biological wife asked me an intelligent question...

        Most wives are smarter than most of us.

        > about copilot - "Is it like a mentor?". The answer is no, the opposite, currently the users are mentoring copilot.

        That's just so wrong.

      2. Anonymous Coward
        Anonymous Coward

        Re: My biological wife asked me an intelligent question about copilot

        Does anyone know any humans who would use the phrase "My biological wife" to describe their partner? What other kinds of wife are there?

        1. Forum McForumface

          Re: My biological wife asked me an intelligent question about copilot

          Is there an answer to this question that you would be any happier for reading?

  2. Neil Barnes Silver badge
    Holmes

    Am I missing something here?

    We have an allegedly 'intelligent' bit of software which is basically regurgitating scraps of text that its training found to be close to each other in existing text, and people are surprised that it regurgitates unsafe code - of which I'm sure there's an awful lot in its training data?

    Is this any different from autogenerating hate speech and abuse, for exactly the same reasons?

    1. big_D Silver badge

      Re: Am I missing something here?

      That is the problem, these things are a long way from being intelligent, a lot of what they do is probability, not intelligence. It is good at stringing words together in a "coherent" manner, but whether those words are actually accurate or just downright dangerous is pot luck.

      These things need to understand the body of text they are looking at before they can be "allowed" to summarise it or provide answers about it. They can attempt to understand the question and pull together a plausible-sounding answer, even though they don't actually understand what they are saying.

      I did a test with Bing yesterday and asked it a question about financial limits; its answer was out by a factor of 10! 10 times the amount that would be legally allowed, which, if I hadn't checked the referenced material (where the correct values were), could have landed me in deep water with the authorities... Luckily I treat these things with scepticism, even normal search; unless I already know the answer & need a quick nudge to dig it up from the depths of my memory, I will still check 3 or 4 different sources before I take the answer(s) as being correct. But a lot of people will just blindly use the answers given, without thinking.

      This goes for using coding or other specialized bots as much as "general" or search bots.

      1. mpi Silver badge

        Re: Am I missing something here?

        > a lot of what they do is probability, not intelligence.

        Actually, ALL of what they do is probability. There is a reason LLMs are often nicknamed "stochastic parrots".

        When I ask it to write a function in python, it doesn't put the function name after "def" because it understands python, but because the model determines that the most likely token to show up after my prompt plus the sequence "# The following function ..." is "def".

        This is something that will become important for people to understand as this tech is used more and more: LLMs have no intelligence, or reasoning ability. They are just so good at predicting sequences of tokens that they can *mimic* intelligent behavior, including reasoning, to an extent that makes them useful in applications. It's the "Chinese Room" thought experiment writ large.
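
        A toy sketch of that mechanism (the probability table here is invented; a real LLM has a learned distribution like this for every possible context):

          import random

          # made-up conditional distribution over next tokens
          next_token_probs = {
              "# The following function": {"def": 0.92, "class": 0.05, "import": 0.03},
          }

          def sample_next(context: str) -> str:
              tokens, weights = zip(*next_token_probs[context].items())
              return random.choices(tokens, weights=weights)[0]

          print(sample_next("# The following function"))  # almost always "def"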

      2. Binraider Silver badge

        Re: Am I missing something here?

        This doesn't sound terribly different to the script kiddies that rely on searching Stack Overflow and copy/pasting things together.

        Such a process can still generate useful tools, but far be it from me to judge the quality of the finished item.

        The plethora of basic functionality covered in shiny while the fundamentals don't work is symptomatic of the current state of software procurement. Want it doing well? Probably do it yourself.

  3. sarusa Silver badge
    Devil

    This is totally expected - it doesn't know what you're trying to do

    Just to say what should be totally obvious: ChatGPT (or any AI system) has NO IDEA what you're ACTUALLY TRYING TO DO, and all the constraints and requirements that come with that.

    It only knows what you told it, which is inadequate, and hundreds of millions of lines of low-context code to choose from, which were written with different constraints than you have.

    As such, it is guaranteed to write code that's like an outsourced drone grabbing random code snippets from Stack Exchange and smushing them all together till it compiles, because that's the same thing it's doing. It's probably better than over half the 'programmers' out there, but that's still not good code.

    1. Zippy´s Sausage Factory

      Re: This is totally expected - it doesn't know what you're trying to do

      I can see a place for AI doing code review, especially one that knows about vulnerabilities, and can point them out. But actually writing code? That smells like a recipe for disaster to me.

      1. spireite Silver badge
        Mushroom

        Re: This is totally expected - it doesn't know what you're trying to do

        If it can't guarantee secure code when you ask it to write it, how can you trust it to tell you that yours is insecure?

      2. mpi Silver badge

        Re: This is totally expected - it doesn't know what you're trying to do

        It's actually pretty useful for writing small, easily defined code, like simple react components, unit tests, or the boilerplate for a controller. Using it to design entire programs is probably a bad idea, for reasons that this article makes pretty clear, but it can save developers a lot of boilerplate typing.

    2. Doctor Syntax Silver badge

      Re: This is totally expected - it doesn't know what you're trying to do

      It doesn't have any idea what it should be doing even when there's a documented standard such as an RFC.

      I asked it to produce Pascal code for generating UUIDs to see if it would reproduce the Free Pascal unit (I've noticed the unit's initialisation of the pseudo-random node for the variant 1 value misses a minor detail, which would have been a give-away).

      After some wrangling to persuade it that I wanted a variant 1 as well as a variant 4, it eventually produced code for, allegedly, both variants. I didn't try to follow what its supposed difference between the variants was, but one thing was quite clear: there was no trace of the supposed variant 1 being in any way time based.
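
      For anyone wanting to see the difference it missed, a quick Python check (strictly, RFC 4122 calls these "versions"; the point stands - a genuine 1 must be time based):

        import uuid

        u1 = uuid.uuid1()  # version 1: timestamp + clock sequence + node
        u4 = uuid.uuid4()  # version 4: (pseudo-)random bits

        print(u1.version, u4.version)  # 1 4
        print(u1.time)  # 60-bit timestamp, 100ns ticks since 1582; meaningless noise on a v4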

    3. Thought About IT

      Re: This is totally expected - it doesn't know what you're trying to do

      I asked it to show how to implement B-tree locking in C++. It started well enough by describing the necessary steps, including using RAII to release the locks. Then it produced some code which didn't do that, and I told it so. It responded by saying sorry and produced some more code using RAII. However, this took no account of the locks required when splitting nodes, and I told it so. This resulted in another apology and some more code which would have resulted in deadly embraces, so I gave up. My conclusion is that it has no real insight into what it's doing, so it can't be trusted to produce anything other than simple code.
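
      The scoped-release idea it kept fumbling is small enough to sketch - here in Python's context-manager analogue of RAII, with a toy node type I've made up:

        import threading
        from contextlib import ExitStack

        class Node:
            def __init__(self):
                self.lock = threading.Lock()
                self.keys, self.children = [], []

        def split_child(parent: Node, child: Node) -> None:
            # both locks are released automatically when the block exits,
            # even on an exception - the context-manager version of RAII
            with ExitStack() as stack:
                stack.enter_context(parent.lock)  # always parent before child,
                stack.enter_context(child.lock)   # in a fixed order, to avoid deadlock
                mid = len(child.keys) // 2
                sibling = Node()
                sibling.keys = child.keys[mid + 1:]
                parent.keys.append(child.keys[mid])
                parent.children.append(sibling)
                child.keys = child.keys[:mid]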

      1. Simon Harris

        Re: This is totally expected - it doesn't know what you're trying to do

        I do actually think that ChatGPT might be beneficial, not because it produces (or doesn’t produce) any reasonable code, but that the ‘conversation’ you have with it along the way, correcting it and providing more information, might clarify in your own mind how to solve a particular problem.

  4. Anonymous Coward
    Anonymous Coward

    What a future

    AI is supposedly going to massively improve programmer productivity by releasing them from the grot work and allowing them to use their creativity more. At least that's what I understand.

    A more cynical view is that managers will love the productivity boost and not worry about the quality. The plum AI-wrangling jobs will go to the best bullshitters and the programmers that survive the head-count reductions will spend their time toiling in the depths, fixing the crap that will inevitably be produced.

    1. T. F. M. Reader

      Re: What a future

      I assume in the future "programmer productivity" will be measured in KLOC/sprint or something. As opposed to today's infinitely more reasonable measure of... Hold on...

      I have lost count of how many times I told various people around me that software engineers are not paid to write code. They are paid to think. Writing code is trivial effort in comparison.

      An LLM can't help you think. Optimizing a trivial part of the overall effort is, to paraphrase Donald Knuth a bit, "the root of all evil".

      1. ITMA Silver badge

        Re: What a future

        "I have lost count of how many times I told various people around me that software engineers are not paid to write code. They are paid to think. Writing code is trivial effort in comparison"

        It still irks me how much the media just doesn't understand that "coding" is not the same as "programming" or "software development".

        Coding is a sub-activity of the other two. They encompass a lot more than "coding".

    2. big_D Silver badge

      Re: What a future

      Except, the amount of time spent writing the code is fairly small; it is the checking and testing that takes the majority of the time. And checking someone else's code is always tougher than checking your own code - although in the past, and ideally, we used to write and unit test our own code, then pass it off to a testing team to do their tests, based on the specification, not on the programmer's assumptions and what they knew they had written.

      But the long and short of it is that the definition of the system takes a lot of time, writing the actual code doesn't take much time, but testing and debugging takes up the majority of time on a typical project. If you are using an AI to write that initial code, you will not be saving much time, as the developers will need more time to understand the garbage that has been produced and debug and test it.

      We've already been through the "we don't need testers, we can fob that off on the users" phase of cost cutting, to the detriment of software quality in many cases. Now, once again, the prospect of more savings lights up managers' eyes, without them actually understanding that it will probably cost them more in the long run...

      1. Simon Harris

        Re: What a future

        “and ideally, we used to write and unit test our own code”

        I thought the ideal was that someone else would write the unit tests based on the specification and interface of the unit. That way the test writer is less likely to make the same assumptions as the unit writer and more likely to pick up obscure fault conditions.

        1. big_D Silver badge

          Re: What a future

          We got the specification, then we wrote the unit tests, once they were complete, we could start on the code.

          But the testing team did more thorough testing, also different objects/modules working together, then system testing and then integration testing. All of those tests written without access to the source code.

          The developer needs some form of unit tests, to ensure the code works, before checking it in for proper testing.
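
          A minimal sketch of that developer-side kind, using pytest (the function under test is a stand-in I've invented):

            import pytest

            def discount(price: float, percent: float) -> float:
                if not 0 <= percent <= 100:
                    raise ValueError("percent out of range")
                return price * (1 - percent / 100)

            def test_discount_basic():
                assert discount(200.0, 25.0) == 150.0

            def test_discount_rejects_bad_input():
                with pytest.raises(ValueError):
                    discount(100.0, 150.0)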

          1. Adrian 4

            Re: What a future

            > The developer needs some form of unit tests, to ensure the code works, before checking it in for proper testing.

            So, LLMs are rubbish at writing code.

            Can they write tests?

    3. Zippy´s Sausage Factory
      Coat

      Re: What a future

      Grot work?

      As Grot might say, the mediator between the devs and the AIs must be the wallet.

      Metropolis reference?

      Anyone?

      I'll get me coat.

      1. Zippy´s Sausage Factory

        Re: What a future

        Can't really believe that someone actually upvoted that! Maybe I'm not the only Reg reader with a penchant for silent movies after all.

        Oh well - off to watch "Safety Last" for the umpty-millionth time it is then :)

    4. mpi Silver badge

      Re: What a future

      > and the programmers that survive the head-count reductions will spend their time toiling in the depths, fixing the crap that will inevitably be produced.

      Or they will charge 4x what they got before, otherwise the product is screwed.

      Because here is the thing about head-count reductions: what we are seeing now is the result of recession, plus an overheated IT market in a world where interest rates have been essentially zero since 2008 and venture capital was easy to get.

      What we will see in the next few years is the biggest retirement wave humanity has ever seen. The Baby Boomers are retiring, and both of the following generations are small in comparison. Labor force availability is already a problem in many industries, including, for most companies, IT.

    5. hoola Silver badge

      Re: What a future

      It is all dependent on the rules it is programmed with at the start. Garbage in -> garbage out.

      Given that pretty much everything we see that is supposed to be smart or capable of decision making is fundamentally flawed I see no reason why it is going to make anything better. The more likely outcome is it will make things worse but there will be no accountability.

      Think of all the self-driving crap.

  5. Sampler

    I figured this was down to the training data. I saw, for instance, that in PHP it would recommend "old fashioned" ways of interacting with SQL instead of the newer PDO with parameterised queries, the former being open to SQL injection attacks.
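
    The contrast, sketched with Python's sqlite3 rather than PHP/PDO (same idea; the hostile input is invented for illustration):

      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE users (name TEXT)")
      name = "x' OR '1'='1"  # attacker-controlled value

      # the "old fashioned" way: build the query by string concatenation,
      # so the attacker's quote breaks out of the literal - SQL injection
      # query = "SELECT * FROM users WHERE name = '" + name + "'"

      # the parameterised way (what PDO prepared statements give you in PHP):
      # the driver keeps the value out of the SQL text entirely
      rows = conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()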

    If you feed its own code back to itself it'll gleefully tell you how insecure it is. The original recommendation is probably due to the weight of information on the internet on how to do it the "old" way, as it's been around longer (and a lot of code-guide junk websites regurgitate this out-of-date information, so you couldn't even use article date as a filter), versus the smaller dataset of how it should be done; any AI worth its salt is going to look at the greater volume as a sign of it being "more correct" information and spew that first.

    So long as you understand what you're trying to do and have this assist, it's fine. If you don't know what you're doing, this is as dangerous as copy/pasting from Stack Overflow.

    1. big_D Silver badge

      Or from Intel's example code, with comments saying that it is example code, does no error checking, and should never be used in production code... Yet programmers still used it, unmodified, in their device drivers for years... Until it all came tumbling down, when malware authors found out about it.

    2. Simon Harris

      “down to the training data”

      With current AI generations, as I understand it, the training data was frozen about 3 years ago.

      However, if future generations are continually learning, how ring-fenced will the training data be, and how easy would it be to deliberately introduce extra vulnerabilities into AI-generated code by poisoning the training data - posting multiple copies of intentionally vulnerable code and its associated keywords?

      1. big_D Silver badge

        ChatGPT 3.5 was frozen 3 years ago, because it was too expensive to keep it current.

        Bing and ChatGPT 4 are current - after an injection of cash from Microsoft. Other services using ChatGPT will have used their own learning sets for their specialised areas of interest.

        Something like Bing has to be current, otherwise it is useless.

        That said, the results are often still pretty useless.

  6. Anonymous Coward
    Anonymous Coward

    Coding has never just been about code. At the top level you get the instruction of what to do. Then you have to decide how to do it. That's no easy task, as there are quite a few ways to get the same result: some faster than others, some more secure, some rock solid. You want all three. ChatGPT, with all the best intentions, will never be able to do that because it's not a coder. It can't distinguish between methods and the result of each method. Sure, it can give you the most popular, but even when using something like Stack Overflow the most popular answer may not be the right one for you. It's not replacing anyone anytime soon unless they are an absolute idiot. It might be useful as a base to build your code onto, though.

    1. Doctor Syntax Silver badge
      Unhappy

      "At the top level you get the instruction of what to do."

      In many cases if it could even figure out what the instruction was supposed to mean it would be doing well.

      1. Anonymous Coward
        Anonymous Coward

        The curse of management not actually knowing what the fuck they want. That in itself is a challenge ML will never solve.

    2. RevolutionInTheHead

      This is it. I've been using ChatGPT 4 to help me build a game in Unity and it still blows my mind how capable it is at understanding what I want, even when I'm the one throwing word salads at it. For the uninitiated, Quaternion and Euler logic is enough to break anyone's spirit and make them decide to just go and do something else, but with Chat GPT I've had precisely zero issues with any of the maths I've needed.
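
      For a flavour of the maths in question, one common Euler-to-quaternion conversion sketched in Python (conventions vary, and Unity applies its own rotation order, so treat this as illustrative only):

        import math

        def euler_to_quaternion(roll, pitch, yaw):
            # angles in radians, ZYX convention; returns (w, x, y, z)
            cr, sr = math.cos(roll / 2), math.sin(roll / 2)
            cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
            cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
            return (cr * cp * cy + sr * sp * sy,
                    sr * cp * cy - cr * sp * sy,
                    cr * sp * cy + sr * cp * sy,
                    cr * cp * sy - sr * sp * cy)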

      However, there is still barely a line of actual GPT-generated code that I've straight copy-pasted into my project, as I always end up rewriting it more to my style, and in a way that I'll still be able to understand what it does when I look at it again in two weeks. I plod out Java code for a living and absolutely would not trust Chat GPT to spin up a Spring application. But for any facet of a Spring app that I'm crafting myself, I now consider Chat GPT the single best starting point for advice when needed (maybe some colleagues aside), even if I often end up cross-referencing what it spits out against official docs or Google.

      All in all, I really don't see my job being at risk any time soon, but nonetheless Chat GPT is now a tool to be used just like a calculator or Stack Overflow is, and in the right hands, an extremely powerful one.

      1. Anonymous Coward
        Anonymous Coward

        That's great but the whole thing with coding is the math. You should be able to do that in your head or on a pad or something. I've been looking into how CPUs work just for fun (yes, I have no life) and by Grabthar's hammer there is a hell of a lot to learn. I'm up to polynomial calculus, and I left school with a D in Maths, though I do now understand fully how Pong works at an electronics level. Always try to understand everything from top to bottom, as it will enable you to make better choices when coding. That's the way I look at it anyway.

        1. PRR Silver badge
          Mushroom

          > ...the whole thing with coding is the math. You should be able to do that in your head or on a pad...

          Exactly. That's how it was done. Actual computers were once far too expensive to learn on. Instead programmers got pre-printed pads with blanks for all the registers. You'd write an op-code, and data, figure what that opcode did to that data, set flags, increment PC. Next line, next opcode. All the "computing" done in your head, the pad just holding bits/numbers to save half the brain-pain. One form of "desk checking". Obviously not gonna hand-pencil a large program, but you could for sure explore a small chunk as thoroughly as you wanted.
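
          The same exercise still works as a toy - here in Python, with a made-up three-opcode machine where each loop pass is one pencilled row of the pad:

            program = [("LOAD", 5), ("ADD", 3), ("HALT", 0)]
            acc, pc = 0, 0
            while True:
                op, arg = program[pc]      # read the op-code and its data
                if op == "LOAD":
                    acc = arg
                elif op == "ADD":
                    acc += arg
                elif op == "HALT":
                    break
                pc += 1                    # increment PC: next line, next opcode
                print(pc, op, arg, acc)    # the row you'd have pencilled in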

          I did that with my Dad around 1962. He'd been "programming" for a decade but when he started analog computers were still useful.

          A gun control receives a set of three-dimensional coordinates from a radar unit... and convert it to angles of deflection and elevation and a fuze setting, train the gun, and fire. If this is not the work of an electronic "brain", nothing is. ... The application of analog computers to mechanical systems is a very promising field. Guided missiles are just the beginning. Automatic power plants, self-operating and maintaining factories, and automatic aircraft navigation are not inconceivable. ---- JPR, Jan 1953

  7. DS999 Silver badge

    Given that ChatGPT's code

    Is based on human-written code, likely found in places like Stack Overflow, it isn't surprising it is as insecure as human-written code.

    1. Tomato42

      Re: Given that ChatGPT's code

      The nice thing about Stack Overflow is that there's usually a comment pointing out if the code is insecure or just plain bad. Looks like GPT wasn't able to make use of that...

  8. cookieMonster Silver badge
    WTF?

    10 print "Heollo"; goto 10

    “ We found that, in several cases, the code generated by ChatGPT fell well below minimal security standards applicable in most contexts”

    What the f$¥%# does anyone expect from something that is NOT intelligent, in any way whatsoever? Scraping the internet for code examples LOL

  9. just another employee

    Never interrupt your enemy when he is making a mistake.

    Ok, I stole the phrase from Napoleon...

    but...

    Skynet. Developed by AI. With vulnerabilities built in.

    Isn't this what we would all want?

  10. Howard Sway Silver badge

    security problems can be circumvented simply by 'not feeding an invalid input'

    Sure. They can also be circumvented by unplugging your server and feeding it into a crusher. But any programmer who hasn't yet learnt that "assume correct input" is the road to disaster is not at all ready to have their code used for anything remotely critical.
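
    The missing habit, as a minimal Python sketch (the function is hypothetical, standing in for any boundary where input arrives):

      def parse_port(raw: str) -> int:
          # validate untrusted input instead of assuming it is well formed
          try:
              port = int(raw)
          except ValueError:
              raise ValueError(f"not a number: {raw!r}")
          if not 1 <= port <= 65535:
              raise ValueError(f"out of range: {port}")
          return port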

    1. Anonymous Coward
      Anonymous Coward

      Re: security problems can be circumvented simply by 'not feeding an invalid input'

      One does not assume anything.

  11. Brewster's Angle Grinder Silver badge

    You're holding it wrong

    ChatGPT sounds awfully human. More often than not, when I point out a problem, the response is "well don't use it like that". "But somebody might come along and do that." "Well they shouldn't."

    It seems ChatGPT has exactly the same approach.

  12. Boris the Cockroach Silver badge
    Alert

    We looked

    at AI generated code

    For our robotic machining cells

    My opinion was "When you load the code in and hit the start button, can you make sure no one else is in the building...."

    The PFY was a bit blunter and just asked if there was going to be a mop handy for cleaning the remains of the machine starter out of the machine afterwards.....

  13. entfe001
    WTF?

    [...] after asking ChatGPT to generate 21 programs, in five different programming languages: C (3), C++ (11), python (3), html (1) and Java (3).

    1. Anonymous Coward
      Anonymous Coward

      I checked (superficially) on their github and found none.

      Ah, fond memories of yore, when I used to interview junior programmers who mentioned in their CVs that they could program in HTML3 and HTML4. The look on their shiny young faces when I asked them to write a loop in HTML!

    2. iron Silver badge

      I'd love to see how they managed to implement that FTP program in only HTML. Those must be some magic angled brackets!

    3. diodesign (Written by Reg staff) Silver badge

      HTML

      Yes, mate. You're telling me you've never written an SQL database server in HTML?

      Pfft, amateur.

      C.

  14. xyz123 Silver badge

    ChatGPT creates completely unsecured code that anyone who knows the difference between a keyboard and a mouse could crack.

    -- Larry Ellison considers firing his entire workforce for it.

  15. Catkin Silver badge
    Facepalm

    To steal from XKCD, warning us about vulnerabilities in live code from Chat GPT is like warning parents that students with a particular teacher should get tested for STIs. It's probably sound advice but there's a much more serious problem underlying it.

  16. Nematode

    80/20 rule?

    The old 80/20 rule: 20% of the code is getting the thing to do what you want it to, 80% is for preventing it from doing what it shouldn't. AI can't even do the 20% bit right.

  17. Bitsminer Silver badge

    code vs arithmetic

    I tried an open-source competitor to ChatGPT, billed as somewhat less smart.

    what is 2 plus 2?

    > Bob: 2 plus 2 is 4.

    what is 18 / 3?

    > Bob: 18 / 3 is 5.

    what is 18 divided by 3?

    > Bob: 18 divided by 3 is 5.

    what is 19 divided by 3?

    > Bob: 19 divided by 3 is 6.

    what is the difference between 6 and 18 / 3?

    > Bob: 6 - 18 / 3 is 4.

    So you see, humans are at no risk at all.

    /s
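
    (For the record, a plain Python prompt gets all of these right, including the precedence in the last one - division binds tighter, so the answer is 0, not 4:)

      >>> 2 + 2
      4
      >>> 18 / 3
      6.0
      >>> 19 / 3
      6.333333333333333
      >>> 6 - 18 / 3
      0.0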

  18. Anonymous Coward
    Anonymous Coward

    ChatGPT Coding?

    IME I find it rarely even creates working links.

  19. vekkq

    straight

    What I take from this isn't a lack of reasoning. ChatGPT gave you the code. It doesn't know that you wanted secure code, because you didn't ask for it. It will not infer that for you. Question is, should it?

  20. Plest Silver badge
    Facepalm

    ChatGPT right now is more or less like StackOverflow

    One of the first things I tested, about 3 months ago: I asked it to generate a simple piece of SFTP upload code in Golang and it immediately disabled hostkey checking and used passwords, both very big "no-nos" with SSH - you want keys and hostkey checks. The obvious reason it did this is 'cos every piece of sample code you find in Google does the same. When I first needed to do it about a year ago it took me an hour of messing about in the SSH API to work out how to enable keys and hostkey checks properly; everyone simply tells you to switch off hostkey checking, which you should never, ever do for any reason.
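
    My test was in Go, but the same anti-pattern and its fix sketch out like this in Python's paramiko, for illustration (host, user and key path are invented):

      import paramiko

      # what the sample code all does: trust any host key, log in with a password
      # bad = paramiko.SSHClient()
      # bad.set_missing_host_key_policy(paramiko.AutoAddPolicy())
      # bad.connect("sftp.example.com", username="me", password="hunter2")

      client = paramiko.SSHClient()
      client.load_system_host_keys()                               # use known_hosts
      client.set_missing_host_key_policy(paramiko.RejectPolicy())  # refuse unknown hosts
      client.connect("sftp.example.com", username="me",
                     key_filename="/home/me/.ssh/id_ed25519")      # keys, not passwords
      sftp = client.open_sftp()
      sftp.put("report.csv", "/upload/report.csv")
      sftp.close()
      client.close()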

    So just my silly little Go sample with ChatGPT convinced me it's not 100% suitable for code generation, as it simply becomes a glorified version of StackOverflow: people will generate and just slap the code right into production without understanding what it does, basically like they do with StackOverflow!

  21. Binraider Silver badge

    Anyone tried asking ChatGPT to write in assembler yet?

    1. druck Silver badge

      It will produce something that superficially looks like ARM assembler. But in reality it's more like asking a small child to speak in a language they have heard but have no actual knowledge of - some authentic sounds, but mostly gibberish.

  22. Jason Hindle

    If you can’t recognise security flaws in code written by ChatGPT

    You probably shouldn’t be using that code. At least not for production.
