ChattyG takes a college freshman C/C++ programming exam

ChatGPT was put to the test via a series of humdrum freshman C/C++ programming tasks and it passed – though not with honors. According to a Croatian research team, while first-year students can struggle with some of the assignments, the results [PDF] showed ChatGPT hitting proficiency targets that ranged between average and …

  1. usbac

    "While these didn't hinder the program's functionality, it did indicate a lack of optimization. It was as if ChatGPT sometimes took the longer route to a destination, even when a shortcut was available."

    Okay, so they are training it on Microsoft's source code. Now we know...

  2. JessicaRabbit
    Facepalm

    It doesn't learn from previous attempts. At best it 'learns' from absorbing millions if not trillions of samples from the internet. What is being described as learning here is nothing more than additional prompts that change the weighting of its auto-complete-like predictions.

    1. JimmyPage

      Which could be a perfect description

      of most politicians

  3. Doctor Syntax Silver badge

    A better test would be to provide it with a problem for which there is currently no solution although it should not be impossible to produce one.

    For instance, The Document Foundation defines a format with the file extension .odi as an image format. As far as I can follow it's a multilayered format in the same manner as, say, .ora but can include SVG as well as PNG layers. Produce a paint application to generate, display and edit files which contain arbitrary arrangements of layers of both types with variable levels of transparency in each layer.

    It should be possible but as far as I know nobody has produced one, so there is no prior art to regurgitate. Imagining and realising a good solution to a novel problem is what distinguishes a developer from a coder.

    1. Brewster's Angle Grinder Silver badge

      Asking it to produce an SVG editor alone means producing software equivalent to Illustrator or Inkscape. And you could argue the PNG editor is akin to asking it to produce Photoshop or GIMP.

      Even if you restrict it to a rasteriser for the SVG, that's a fairly chunky piece of code, with complex bitmap filters, CSS, and animations. Although there do seem to be various libraries for subsets of it.

      1. that one in the corner Silver badge

        > Although there do seem to be various libraries for subsets of it.

        Or you make the editor run with a browser-based front-end (your editor is in JavaScript or runs as a server) or use an embedded "browser-like" library, such as Sciter (the successor to HTMLayout, which had a far more self-explanatory name).

        Then your SVG editing comes down to tracking the mouse and updating a node position in the plain text.
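        To illustrate the point that "editing" an SVG can be just rewriting its text: a minimal sketch in C++ (to match the thread's subject). The `move_circle` helper and its attribute layout are hypothetical assumptions, not from any real editor:

        ```cpp
        #include <regex>
        #include <string>

        // Hypothetical helper: treat the SVG document as plain text and "move" a
        // circle by rewriting the cx/cy attributes of the element with the given id.
        // Assumes the attributes appear in the order id, cx, cy on one tag.
        std::string move_circle(std::string svg, const std::string& id, int cx, int cy) {
            std::regex pat("(<circle[^>]*id=\"" + id + "\"[^>]*cx=\")[0-9]+(\" cy=\")[0-9]+(\")");
            return std::regex_replace(svg, pat,
                "$1" + std::to_string(cx) + "$2" + std::to_string(cy) + "$3");
        }
        ```

        Hook that up to mouse-move events in the browser front-end and the "editor" never needs to understand rendering at all.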

        Similarly, there are multiple methods for handling PNGs - Fly is a seemingly forgotten but still very useful little program to wrangle bitmaps from an input script, leaving your editor to just handle the text.

        After all, "produce an editor" needn't mean it is as full-featured as GIMP or PhotoShop (many smaller editors exist and are *very* useful too: thank you, Irfanview) AND one thing you would hope to gain from using an LLM trained on much, much text from the web is that it has picked up info on lots of otherwise obscure libraries.

        Given the above, it does become a nice test to see if the LLM can join the dots in a non-obvious way (i.e. not "just" trying to replicate Photoshop) : far more interesting to see it do that than just using it to save your typing yet another simple loop!

        1. Doctor Syntax Silver badge

          Given that there are raster and vector editors to learn from, those aren't the tricky parts of the puzzle. The bit that requires imagination is how to produce a workable paint program from it. I'd like to think that if you put that problem as a test the human would start by thinking what the interface would have to look like, but what would an ML do?

        2. Brewster's Angle Grinder Silver badge

          Browsers/electron and javascript are not popular round here. But you're right, and I actually have a homebrew SVG "editor" that works like that. But my editor is nowhere near as fully functional as illustrator and nowhere near releasable, that's how I knew the request was epic.

          Re the file format: I don't know it, but I suspect my editor could be hacked to handle it. (That's exactly the kind of thing it exists to do - it even has rudimentary pixel hacking functions.) Given you can embed SVG and PNG (and HTML) inside SVG, I would, sight unseen, suggest it's turned into a single SVG file with the layers embedded in it. In fact, rather than an editor, I would suggest what's needed is a filter that can do that, and reverse the process to recreate the format. Existing editors can then be used.

  4. that one in the corner Silver badge

    Malformed questions worthy of ChatGPT

    > All of this raises an important question: Does ChatGPT always choose the best strategy, or does it sometimes default to learned but inefficient methods?

    Or neither, as it isn't a Planner and isn't picking any strategy at all!

    > Yet, when researchers revisited this challenge in English, ChatGPT seemed to have learned from its previous misstep, returning to the simpler method.

    Or it has reverted back to the same set of triggers it responded to the previous time it was asked in English and so produced a similar result again.

    Hey ho, let's download the PDF and see if the trial protocol was really as appalling as this article is describing. I may be some time.

    1. that one in the corner Silver badge

      Re: Malformed questions worthy of ChatGPT

      Ok, not so long - the paper is 11 pages long and the vast majority of it is just stating the programming problems.

      And the protocol is utterly dreadful, especially for testing an online service (which can, of course, be updated without your knowledge): the English and Croatian runs were separated by a week (implied to be the time it took to translate the questions) so no definite statement that they were guaranteed to be using identical software both times! Repeatability in Science? Pah, that is for weaklings!

      They did not bother to perform multiple runs in each language, to check for any changes due to randomisation in any part of the process or for changes due to subtle differences in the wording of the prompts.

      They openly admit that they do not know if the model is still learning/retraining (i.e. immediately incorporates its recent conversations back into itself[1]) and then ascribe the change in behaviour to that process - if you do not know, you can not ascribe! And, anyway, inputs spread over a week? How is that recent in terms of a public online service? How many other people's inputs would have been included if the model *was* continually updating itself?! Even if they do not believe that working on such shifting sands was a problem to their conclusions, they fail totally to even mention that, as if they had never even considered that!

      As this is a symposium paper, not something published in, say, Nature, there is some leeway to be given - but not that much!

      [1] btw, for various reasons, there is no good reason to think it is continually retraining[2].

      [2] yes, they probably are recording your sessions, to feed into the *next* model - and whatever half-hearted "protections" are being applied to your inputs to stop you asking naughty questions.

  5. that one in the corner Silver badge

    The real takeaway here

    > The real takeaway here is its human freshman-like adaptability: It wasn't just about getting the right solution; it was about refining, learning, and iterating.

    Nope.

    The real takeaway here is that The Register is presenting a symposium paper, which is little more than an anecdote, in the same style as it would a paper published in Nature[1] and then ending with its own clickbaity conclusion!

    Mutter mutter, El Reg never used to be like this, it was all so much better in the old days.

    [1] well, not quite - it would be crowing that they'd waded the whole way through a Nature paper!

  6. Bebu
    Headmaster

    "It was all so much better in the old days." - When was it ever not?

    《[1] well, not quite - it would be crowing that they'd waded the whole way through a Nature paper!》

    Normally need three days rations and a fairly meaningless life to embark on that journey. ;)

    1. that one in the corner Silver badge

      Re: "It was all so much better in the old days." - When was it ever not?

      > Normally need three days rations

      And good eyesight!

      SWMBO (who gets to sit in the clever corner) stopped our subscription because it is just too difficult to read that print in bed :-(

  7. Bebu
    Windows

    C++

    Training one of these AI/ML thingies on the corpus of the last three decades of actual C++ code (as against C) should send it gaga and produce the apparent impossibility of even more impenetrable code. I can imagine the Sieve of Eratosthenes being implemented with a generous helping of gratuitous templates and exceptions. :) What obscenities it might inflict on a bubble sort, let alone quicksort, doesn't bear thinking about.
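    For contrast, the sieve needs no templates or exceptions at all; a minimal legible sketch (function name `primes_up_to` is my own, not from the paper):

    ```cpp
    #include <cstddef>
    #include <vector>

    // Plain Sieve of Eratosthenes: mark multiples of each prime as composite,
    // collect what survives. No gratuitous templates required.
    std::vector<std::size_t> primes_up_to(std::size_t n) {
        std::vector<bool> composite(n + 1, false);
        std::vector<std::size_t> primes;
        for (std::size_t p = 2; p <= n; ++p) {
            if (composite[p]) continue;
            primes.push_back(p);
            for (std::size_t m = p * p; m <= n; m += p)
                composite[m] = true;
        }
        return primes;
    }
    ```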

    The code that first year (freshman) students normally submit is sufficiently nightmarish without taking it to this new level.

    1. that one in the corner Silver badge

      Re: C++

      Why do people write such bad C++?[1]

      I've seen things you people would not believe: torturous code with deeply nested template declarations; use of string types without considering copying behaviour; terabytes of in-core duplication just to generate kilobytes of LAN traffic.

      But it is entirely possible to write neat, legible and efficient C++. To start with, if you're gonna nest the template decls, hint: "typedef"!
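      In that spirit, a small sketch of the hint: a `using` alias (the modern typedef) names a nested template type once, so the declaration stops repeating everywhere. The names `Entry`, `Index` and `build_index` are invented for illustration:

      ```cpp
      #include <map>
      #include <string>
      #include <utility>
      #include <vector>

      // Without aliases, this type gets spelled out at every use site:
      //   std::map<std::string, std::vector<std::pair<int, std::string>>>
      using Entry = std::pair<int, std::string>;
      using Index = std::map<std::string, std::vector<Entry>>;

      Index build_index() {
          Index idx;
          idx["foo"].push_back({1, "first"});
          idx["foo"].push_back({2, "second"});
          return idx;
      }
      ```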

      > The code that first year (freshman) students normally submit...[1]

      [1] I blame the fetishisation of "Modern C++" and the first years following "coding influencers"! Knuth and Dijkstra[2] were good enough for your Grandfather's generation, they are good enough for you!

      [2] Even if you can't pronounce them![3]

      [3] Pah! Practice on this: car, cdr, cadr, caddr

  8. Pete 2 Silver badge

    A different line of work?

    > It was as if ChatGPT sometimes took the longer route to a destination, even when a shortcut was available

    And all the consultancy firms suddenly became interested

  9. thondwe

    So Chat GPT was helped?

    So the first answer in many cases wasn't right/complete, so a human helped it try again - IIRC outside help in an exam isn't allowed?
