Accountants
Only a management accountant could believe that AI software will make good profits.
Generating code using AI increases the number of issues that need to be reviewed and the severity of those issues. CodeRabbit, an AI-based code review platform, made that determination by looking at 470 open source pull requests for its State of AI vs Human Code Generation report. The report finds that AI-generated code …
This study sort of confirms the way I've been feeling about these tools - it's not surprising that a tool with no understanding of any kind would create serious bugs, and I would expect in many cases it takes longer to solve those bugs than it would have to write well-architected code.
It doesn't matter. We'll still be told we have to use it.
It's funny how AI - which harms the planet and makes our jobs worse - is mandated by our business leaders, while remote working - which helps the planet and improves our quality of life - is forbidden, as far as they can get away with it. I don't think these things are unconnected.
I still choose to be a programmer rather than a sloperator, as far as possible, but ultimately I need to pay the rent one way or another. Here's hoping the bubble bursts sooner rather than later.
So I'll say it again: LLMs write terrible code.
Having just taken over a project which was written by a poor programmer using Copilot to hide his lack of skill, I can confirm that LLMs have no idea about rational error handling or program structure.
They're useful for one-off, throwaway programs. They can be a great reference for all the billions of libraries out there.
Are they any use for writing production quality code?
That's an emphatic NO from me.
Agree with your points wholeheartedly.
I've just tried Claude - first dabble at this LLM stuff, after a mate (well, I thought he was a mate!) goaded me into it. He gave me some tips: build the system in baby steps; be specific etc etc
My conclusions/observations (in short, I'll get bored):
- there's no way an inexperienced programmer could produce anything useful
- the LLM makes insane assumptions about _anything_ it's not explicitly told not to do
- creating the prompts to limit the LLM's assumptions takes as long as writing the code, requires solid programming knowledge, and the code still needs checking
- the code is bloated.
- "build the system in baby steps" is bollocks. The iterative process modifies the little bit that worked, because it didn't accommodate the next bit (how could it?)
Reading others' code can be challenging. Historically, that often occurs on a known working system, so it can be assumed that "it's mostly right". Unraveling this shite brings that to a whole different level. A level below and not worthy of BOFH's basement.
I can see it being useful, perhaps, for creating a framework for a large and complicated application. But the working gubbins? No feckin way. And if one's thing is creating large and complicated apps, chances are you've already got that bit sorted. Tidying existing code? Maybe: change all variables to Hungarian notation; use PascalCase for functions (don't hate me!). Woo fucking hoo.
I haven't used MCP self defined code chunks yet, I expect that'll help a lot. But I've been writing software for over 40 years, and still learning. Give that to a noob? Sheesh! *shudders*
> Reading others' code can be challenging.
Only if they are bad programmers. My first thought when I'm writing code is that either me, or someone else should be able to easily understand both what I am doing and why when there's a need to enhance or change the program in the future.
Meanwhile an LLM's first thought is... nothing! They don't have thoughts! They just do the average of what other programmers have done when they had a text input somewhat like the prompt.
Logically this suggests that the output will seem good to a below-average programmer and bad to an above-average programmer. I flatter myself that most examples of AI code I've seen suck. Of course, when an AI code review catches an actual error in my code (unusual enough that I remember every time it has happened over a lot of pull requests) it is mortifying.
> Only if they are bad programmers.
Weeeeell I wouldn't say "only". The old adage of "Give 100 programmers a problem..." If that problem is complicated and requires specialist knowledge, and assuming all 100 produce the correct results, it could be that there are techniques not known to all. And yeah, the comments above should sort it. But my maths might not be as good as theirs, and that's going to make reading that code a challenge. Whether that's "difficult to read" code or a different hurdle, I won't press that point.
As for understanding one's own code after a time: yup! You've really gotta cover that one :) I'm an advocate of self-documenting code. And properly documenting code. As an example of why, and of understanding someone else's code that works but with no idea of how or why, imagine a young programmer on his first job (no names here!) arriving at the office on a Monday morning to review the code written on the previous Friday afternoon after a very long wet lunch. They had no idea what that garbled crap was. But it worked!
Upvote, because at the very least you have to be able to read your own old stuff. I've got old stuff I still must tweak for a new feature, some of it written 25 years ago. I look at it and those comments are priceless; I can add the tweak in very short order. Spend an extra 20% of your time adding in comments and in 10+ years you will thank yourself.
It may have been said in jest but 100 times this.
When I write a new class method or something, it will start off as the method declaration and delimiters (name and brackets since I work in PHP), and then I'll write comments, in good English (proper sentences and all!) until I have described all the code I believe I need to put in the method. Also its purpose and parameters etc. Sometimes I'll end up with 30 or 40 lines of descriptive text within the delimiters.
Only then do I start coding. While coding, I always need to edit the comments a bit, but it's easier to edit and refine while coding than to try to add in good comments right at the end.
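That comment-first flow might look something like this (a minimal sketch in Python rather than PHP; the function and its steps are entirely invented for illustration):

```python
def normalize_username(raw: str) -> str:
    # Purpose: turn whatever the signup form sent us into a canonical
    # username. These comments were written first, then the code was
    # filled in beneath each one.
    #
    # Plan:
    #   - strip surrounding whitespace
    #   - lower-case for canonical comparison
    #   - collapse internal runs of whitespace into single underscores

    # Strip leading/trailing whitespace first.
    cleaned = raw.strip()
    # Lower-case so "Alice" and "alice" compare equal.
    cleaned = cleaned.lower()
    # Replace any runs of internal whitespace with one underscore.
    return "_".join(cleaned.split())
```

The point is that the prose survives as the method's documentation once the code exists, rather than being bolted on afterwards.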
You think I'd jest about programming! You scoundrel! I challenge you to a.... well, maybe I do joke around a bit :D
But you nailed it there, especially the last sentence. Before I was disciplined enough to do that, or if writing code on the fly (as opposed to fully designed) I'd have the intention to comment it, but invariably be "on a roll" and, well... If the program's fully designed before coding, you're going to know what's going in there.
Bringing it back to the subject in hand: Claude at least does write good comments, so you know what it claims it's doing. And that really is the best thing I can say about it at this point.
[Icon: Claude after I've slapped it in the face with my gauntlet!]
Absolutely Right. I was just about to add that.
Comments Comments Comments
Every programmer needs to learn to comment every function or code block they create. I am 71 years old, have been programming for the last 46 years, including Cobol etc., and have employed hundreds of programmers over the years in a number of businesses and corporates. It is so difficult to get people to comment code. Unbelievable, actually.
As was said elsewhere above, different programmers have different approaches to coding. The outcome will be the same; getting there can vary widely, and thus it can sometimes be difficult to understand when looking at it with fresh eyes.
Code commenting is simply mandatory for being able to maintain systems long term.
And yet, the Agile Manifestosios state that code should not have comments. Seems they think that programmers can't be arsed to a) write decent comments in the first place, and b) keep them up-to-date. While they are all-too-often correct about both aspects, that doesn't mean that they should throw up their hands in abject surrender to the basest elements of the Programming Brother- and Sisterhood. Isn't Agile (upper-case 'A') supposed to be about improving the overall Craft of Programming? (That's their claim, after all.) So, as widely (and wisely) documented here by what appear to be some of the better practitioners of the Craft, it would seem that the Manifestosios would want to encourage better documentation practices, not discourage them. But that's just me, I guess.
Full disclosure: I have worked directly with some of these Manifestosios. They are Smart People, and probably have forgotten stuff that I would never know. We have, however, in the past, had several animated discussions about just this topic. And I still document the stuffing out of my code, and require that those who work with me do the same.
Worse, they habitually leave the comments untouched when they rework the code, so there is no longer any connection between the two. Even just not removing the /* TODO: fill in the body of this auto-generated empty method */ wastes so much time: did they leave it because this *isn't* fully done yet?
Bonus points if they are still using the variable names that were already present, but now for completely different values/purposes.
If you need comments to explain how the code works you need to rewrite it.
Coding is explaining to humans what you want the computer to do. Making the computer do what you want it to do is relatively easy, making it clear to other people is the tricky bit.
That's how the modern developers in my company work. No comments anywhere.
Meanwhile the ancient assembler has oodles of comments, not all of them useful, but often explaining 'why' - the business use case for the exception to the standard code flow.
The modern code is almost entirely divorced from the business case it's solving. It's pretty, it's efficient, but it's not maintainable and the onboarding cost for new developers is steep.
Code for serious business functions has a long shelf life. I'm still working on transmission products designed in the 1930s (https://www.atchistory.org/a-brief-history-of-air-traffic-data-communications/), they are as relevant today as they were then - event-driven, store-and-forward, guaranteed delivery. Even AI has heard of them :D
Not quite right.
You can usually work out what the code does. But unless there are comments, you have no idea if that is what the programmer actually intended it to do.
The whole point of commenting your code is so that in future somebody can compare what the code was supposed to do, against what it actually does, and see how badly you fucked it up.
Comments are also great for notes, rants, jokes, ASCII art, etc.
Knowing when, and what to put, in comments is a fine art.
Where I work, we used to have a strict rule that every code change should be commented with date, and programmer's initials. This leads, inevitably, to old code that contains a slew of comments (sometimes more comments than code) that add no explanation, or context, to the code you are trying to read and understand. Sometimes, those comments might include an explanation of what has changed, as well; not necessarily sequentially, if different bits are changed at different times, so you are left reading comments all over the place to try to determine the code's history. This is what commit comments in source control are for, not code comments. The history is in git, not in the source file; that is the current version, which does what it does now.
I now remove this sort of comment whenever I see it. They add no meaning or context to the code, and I really have very little interest in which programmer made a change 15 years ago, if I can even remember who they were from their initials. Nobody else does, either.
Explanatory comments are another thing, but, again, are best used in moderation. Well written code should be self-documenting to a certain degree. Method names that reflect what they do, and sensible variable names, are just as important as comments, and duplicating what these already tell us does nothing to make the code more readable or easier to understand, it just adds bloat. Property header comments that say "Gets or sets the value of Forename." above a Forename property with a standard getter and setter serve no purpose whatsoever.
Where comments are important, and often invaluable, is where the purpose or function of a piece of code is non-obvious, or where there is a particular gotcha, or workaround, that a future programmer would benefit from knowing about. Documentation for the purpose and usage of parameters, so that IntelliSense can show these to another programmer when writing consuming code are also useful, if these are non-obvious (it's always better to give those parameters a clear and meaningful name where possible).
By far the very best use for a comment, though, is for ranting, joking, making pop-culture references (the more out-of-date, the better), and insulting those co-workers who deserve it, but will never read them (and done in such a way that you can't get in trouble). Oh, and no matter how much you might want to, never use "rude words", you never know who is going to come along and search for them later, and then feign offence, in order to fluff their own ego.
One of my CS lecturers was telling us about his first job. He'd been tasked with reviewing some assembler code, maybe tidying it if he could see a way to improve it, to make it more efficient. He got to a particular chunk and got a bit lost, so backtracked and tried again. Still stuck. Looking across to the comment, the original programmer had noted: I get lost here too!
Amen to this.
Long ago I learned (was taught) the importance of code comments, with a focus on "why". In particular, the value of the "top comment block".
E.g. at the top of the file, some comment lines which not only list things like date, author (who's ostensibly responsible for the mess) and such, but also a few words about what the program as a whole is intended to do, and why. Maybe mention the intended audience/consumers, assumptions and dependencies, reference to any exterior docs, etc.
More than once I've saved myself time and trouble when needing to re-use or re-deploy some old script I'd cobbled together many years prior, because I'd documented what the thing was for.
If it helped other folks too then that's grand; but I write comments for myself first. :)
> In particular, the value of the "top comment block"
I take that concept even further. In the comments at the top of a function or method I write out in bullet points what the function aims to do e.g.
This method allows an object to behave like a 'foo' even though it's a 'bar'.
1) Make sure we have been initialised as a bar
2) Check the prerequisites for being a 'foo'
3) etc
And then in the code just have short comments /* Step 1*/, ... /* Step 2 */ etc. marking the start of each bit. That way the commentary is in one place and reads as meaningful prose from top to bottom; but the comments in the code are easily moved around and retained. And if a change comes along you can easily amend step 2 to 2a, 2b etc without having to renumber. (And also, the person doing the change has to read the top comment in order to know where in the code to put the change, so at least you know they've tried to understand the function and the change they are making!)
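A minimal sketch of that layout (in Python; the 'foo'/'bar' example is the poster's, but the 'quack' prerequisite and all the names are invented for illustration):

```python
from types import SimpleNamespace

def adopt_foo_behaviour(obj):
    """Allow a 'bar' object to behave like a 'foo'.

    Plan (the whole story in one place, readable top to bottom):
      1) Make sure we have been initialised as a bar.
      2) Check the prerequisites for being a 'foo' (here, a
         hypothetical 'quack' attribute).
      3) Attach the foo behaviour.
    """
    # Step 1
    if not getattr(obj, "is_bar", False):
        raise ValueError("never initialised as a bar")
    # Step 2
    if not hasattr(obj, "quack"):
        raise ValueError("missing the prerequisites for being a foo")
    # Step 3
    obj.is_foo = True
    return obj
```

Because the body only carries step markers, steps can be reordered or split (2a, 2b) without rewriting the narrative, which stays in one place at the top.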
Back in the day when I was writing in Assembler for PDP-11's, and assorted Vaxen, the rule was comment every line and have a comment block every 20-30 lines. Being able to read what the code was doing in plain language is priceless. Even now, I comment almost as much. Old habits die hard.
I'm afraid you have the order wrong.
Because what happens to your "correct" code when the requirements change? It's no longer correct. Almost no one is a Knuth.
The real world is constantly changing. A lot. Some small sliver of those changes almost inevitably turn into update requirements for a given piece of software.
Even a critical security failure requiring a major re-write of the code can be managed.
Remember when XSS attacks became a thing? Almost every website managing user accounts became imperfect overnight. At that point, there was code that could be maintained, and code that could not.
I agree with your claptrap, but let me rephrase: to even get to the correctness stage, the code must be legible.
Otherwise code review will also miss the hidden incorrectness. There's no perfect code, particularly when you are externally surfaced. Race conditions are particularly difficult to prove correctness for.
So yes, the code must run and pass its tests for a temporary proof of correctness, but it's only correct within the bounded context of how it's tested.
To be reviewable, it must be legible.
To be fixable for when it inevitably breaks, it must be understandable.
Your phrasing is still wrong. Correctness still comes first. All the readability and maintainability in the world will not make an incorrect program useful. A correct program that is not maintainable will eventually be a problem, which is why you can't stop at correctness, but without correctness you don't have a thing. It does get more interesting when we consider where correctness has limits, for example when we don't have to get certain types of answers right because they don't matter yet, but for most things it is the first requirement. Maintainability can occasionally be compromised, and in practice will be compromised more often than it should be. Your only argument for why it would be different is that, through lack of maintainability, the program becomes incorrect, and that serves as extra proof that the incorrectness is the bigger problem: the unmaintainability is a problem because it causes incorrectness.
Unmaintainability doesn't cause incorrectness. You're following the wrong trails.
Proposition: The requirements for code change over time.
Corollary: Perfectly "correct" code at t_x will become incorrect at t_y.
Observation: This change in correctness has nothing to do with the maintainability of the code.
Proposition: The claim that code is "correct" is properly a statement in mathematics.
Corollary: Unless you are a trained mathematician (at least to the level of passing your prelims), you cannot reliably prove such a statement.
Observation: The number of trained mathematicians working as programmers is vanishingly small.
Conclusion: Almost no code is known "correct" in the strict sense.
Corollary: It is far safer to assume that all code is broken at any given time.
Conclusion: If your code is unmaintainable, then when you either discover that your "correct" code was broken all along, or the code is simply deemed wrong because the requirements changed, you will be in trouble.
The definition of "maintainable" code is that it is easy to fix what is broken. It is the necessary prerequisite for code being even correct-ish for longer than a particular snapshot.
The code is correct, if it passes the unit tests. You did write the unit tests, right? Right? ... Right?
If a new class of flaw is found, tests for it get added to the test suite. You weren't about to go and read through your entire code base and mentally parse every line to find those flaws, were you? Or were you just going to "wing it"?
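A toy illustration of that workflow (the function and its division-by-zero "flaw" are invented): the code is "correct" exactly insofar as the suite says so, and each newly discovered flaw grows the suite rather than relying on anyone re-reading the whole code base:

```python
def safe_div(a, b):
    # Original version just did `a / b`. A division-by-zero flaw was
    # found later, so the guard (and a regression test for it) were
    # added at the same time.
    if b == 0:
        return None
    return a / b

def test_normal_division():
    assert safe_div(10, 4) == 2.5

def test_divide_by_zero_regression():
    # Added the day the flaw was found; keeps it from coming back.
    assert safe_div(1, 0) is None
```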
The odd part is that LLMs are kind of great at the readability part but are so bad at the correctness and maintainability parts that it seems like it's mocking you. There are lots of readability problems that plenty of knowledgeable programmers perpetrate. My most frequent example is bad variable names. For example, I was recently checking some code in OpenSSL, a program I don't frequently modify, and got to read some relatively simple code that probably would have taken me about twenty seconds to read if not for the fact that they named all their variables things like addmd and mds. Doing things like if (addmd++) didn't help either because, while it's immediately obvious what that does, it's not immediately obvious why it's doing it (the answer is that addmd is basically a boolean they switch once from false to true, but they chose to do it that way instead even though they never use its actual value). LLM-generated code almost never does things like that.
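A rough sketch of that contrast (in Python rather than C; the `digest_attached` name is invented, and this is only a guess at the idiom's intent):

```python
# The counter-as-flag idiom (like OpenSSL's `if (addmd++)`): the old
# value is tested, then incremented, so it reads as true from the
# second call onwards -- but the name tells you none of that.
addmd = 0

def idiom_seen_before():
    global addmd
    seen = bool(addmd)
    addmd += 1
    return seen

# The same intent with a name that says what the value *means*
# rather than how it is stored.
digest_attached = False

def clear_seen_before():
    global digest_attached
    seen = digest_attached
    digest_attached = True
    return seen
```

Both behave identically; only the second can be read without stopping to work out why a counter is being incremented.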
The problem is that the clear, well-commented code it does write is riddled with bugs which you need to comb through to find and is never designed in a way that makes modifications easy, so when you ask for modifications, it's likely to just rewrite the function concerned. That's if you're lucky; if you're unlucky it rewrites several functions. It can over-comment, and I'm sure that would get annoying eventually, but we never get that far because the code that I'd quite enjoy seeing from some people doesn't run correctly and I still have to read it to find that out because the comments never explain that this is the part where the test function writes twenty lines that always compute to true which is why the test passes.
And yet, for every company that has taken over the outsourcer I was working at, the first thing is to strip out the comments in the code. The new outsourcer then expects kids fresh from uni, who learnt only one scripting language, to understand complex code that requires deep familiarity with a specific language to understand, let alone modify.
That's just a pathetic ego/power trip to appear Übermensch to the less experienced.
Unless he (just a guess!) was a Klingon (see #4). My favourite is #6 in this version!
Though I have played one at times during various roles over the years... It seems to me that in order to have AI write good code you'd have to have a VERY clear idea of EXACTLY how you'd want the program to work. The kind of detailed specs that are basically pseudocode written in English. Your "prompt" might run to dozens of typewritten pages for a real-world complex program.
The thing is, if you are a good programmer, and you have a spec of that quality available to you, you will write that program REALLY fast. The reason programs take so long to write is that you never have such a beautifully detailed spec available. You have a Trump style "concept of a plan" and you just start writing it, and figuring it out as you go. If you have users you toss it over the wall to them and they offer feedback and it just grows organically that way.
The reason many programs can benefit from a complete clean-sheet rewrite is that IF (and this is a very big if) you really understand everything the current program is doing, you can essentially use that knowledge and the existing program as your 'spec', and your rewrite will be a lot cleaner and fix a lot of compromises you had to make because you didn't know where you were going when you wrote the original version. This never happens in the real world though; you are given some dusty deck code that no one understands but that is full of years of small fixes/additions that end users have come to rely on. That's why such dusty deck rewrites fail so often.
If AI could just get to the place where you could feed it the existing source code as the 'spec' and it would rewrite that in a modern language in a modular/maintainable way (hopefully with semi-intelligent comments), one could not overstate what a massive win that would be!
> That's why such dusty deck rewrites fail so often.
Don't worry: if you haven't got a dusty deck full of random mods made with a pair of compasses, but instead really do now understand what the system was meant to do all along, the project will be hit by good old-fashioned Second System Syndrome and will still fail.
> If AI could just get to the place where you could feed it the existing source code as the 'spec' and it would rewrite that in a modern language in a modular/maintainable way...
Yeah, well, we *could* have been spending our time creating a system that could do that sort of thing, but nobody could really be fagged to do it properly. Ok, that is a *bit* rude(!), but way back when (1980s and on a bit) there was much talk of taking in source code and applying lots of lovely semantic analysis, leading to a representation that could be re-arranged in just the way you describe. Even adding in comments.
Trouble is, the demos that could be done were rather trivial, compared to the real-world programs that *need* such attention. And one big, big reason for this was that compilers for the ever-increasingly-complex popular programming languages were moving further and further away from being able to provide even the initial stage, conversion of the sources to a manipulable representation.
Consider the simpler problem, of generating documentation of the form that programs like Doxygen do: it tries to build up a representation of the program and how the parts of it interact but first it has to parse the language used. Oopsie, there isn't a nice compiler we can pull apart to get something that handles the entire language; this is a pain, but you can get a long way towards Doxygen's goals with a rather less rigorous parser, but that leaves far too many holes. Some attempts were made - gcc-xml for example, which could spit out (some of) the AST[1] - but unless they are taken onboard as a goal for the compiler team it becomes a never-ending race to keep the modified compiler up to date; so the project dwindles and dies. One of the great hopes of the LLVM project, in its later manifestation with the release of clang, is to have a compiler that *does* make access to the pre-parsed form a core goal. And LLVM is, indeed, giving us that, as can be seen by the replacement of gcc-xml with CastXML (Note: CastXML is limited in what it does, as it is only maintained by and for one project, ITK, but it shows the way).
You can't really blame the compiler writers, there is enough to do keeping up with the language specs, but just consider how long it has taken us to get to the point where our programmer's editors can do "simple" things, like letting us rename a variable and have all the uses of that variable be updated across all the source files. If things had gone slightly differently, we could have had that as a bog-standard feature of every screen editor since the 1990s or earlier[2] and it would be standard practice to let your compiler be invoked for that purpose[3].
Anyway, cutting the story short (too late!) we don't want LLMs tackling that rewriting task, we want to have a pile of front-end processing, based upon sound compiler theory and practice, doing it. And, as part and parcel of that, make use of techniques from the AI labs to work on the processing: pattern recognisers and rewriters, heuristics that can be reversibly applied... Once you've done that, you could even use Machine Learning to help build up and apply those mechanisms. And, as with extant compilers, you know (bar the inevitable bugs) at each point in the process you *still* have a perfectly correct representation of the original program, so even if the generated modern language output isn't as modular as you'd like it is still good: and you can try again with another set of command options and see if it improves.
But, instead, the best we appear to be able to hope for is that "AI gets to a place", realising that it is far more likely that people will just try to keep shovelling cycles and random text scrapings into an LLM, fingers crossed the few remaining human programmers will be able to fix the glaring holes in the "translated" version of our program (but it *must* be modular now, as it has been broken into 7532 files, each containing no more than two screen's full of text, just like all the old company coding standards scraped from the web said)[4]
[1] a weird consequence of software "getting better" but not if you look at it from another p.o.v.: when we *had* to write multi-pass compilers, we *always* had to have a full representation of the complete AST, complete with line numbers (but lacking the comments), in a defined Intermediate Language format, which could survive being written to disc and read back in.
[2] I have a friend who was working on a "syntax-directed editor" in the mid 1980s who was stymied by the lack of parsers.
[3] Obviously, not the commercial compiler and/or editor authors, they'd shun this or only make it work with their own products <cough>Visual Studio</cough>, which is why this whole thing is a pipe dream.
[4] dang, now I'm implying the LLM can "understand" those scrapings...
Your first paragraph is spot on IMHO.
As for paragraph two "The lack of a decent spec" that surely happens in certain environments (I'm going with my usual scenario of a room full of accountants with Excel and VBA). However, I've always been a bespoke programmer and PM, so leading up to the actual programming (scope/spec etc) is around 60-70% of a project pre go-live. The client is the expert in their field, and if I've got to understand (eg) polymer compounding those iterative steps before programming starts are an imperative. So with the path {specs -> programming -> coding} you already know the code before you code. Hence paragraph one.
Many times I've encountered paragraph three, "The evolved application". At this stage I would *not* trust an LLM to do anything other than make a greater mess of legacy code. Like I said, maybe prettify it, but change working code? FRO! And there're already tools for that, and if not, I'd write 'em! When dealing with legacy code it's reverse-engineer to understand, write a spec and start clean. Modifying existing code by definition weakens it, as it was never designed (cough) to bear the extra weight. It's doomed. That legacy code holds nuggets of info (eg. formulae) that, with one misstep, make the whole thing totally useless. Give that to an LLM? Not on your beloved aunt Nelly! You'd take it from something that works in an unknown way to something that doesn't work in an unknown way.
So, the final paragraph? Well I'll leave that to the likes of Grandmasters such as Asimov and Banks for the time being, may they rest in peace.
[Icon: to the Grandmasters!]
This is actually one of the very few cases where "AI" does a reasonably good job. Ask it to take those 6,000 lines of spaghetti code, written in an obsolete language, and summarise its purpose and function. Don't take everything it says as absolute truth, but it can massively reduce the mental load of comprehending such ancient messes. This can allow you to design the rewrite without missing important cases that are hidden in the complexity of the code, but at the same time, without having to spend time eliminating all those elaborate null-ops that have evolved from changing requirements over time.
The pertinent point here, is that LLMs are a tool; they aren't intelligent, don't have agency, and thinking that they know or comprehend anything is a category error. However, they are pretty good at pattern analysis; the mistake people are making is using them for pattern generation, and not realising that when running in this mode, they are producing patterns that look correct, not patterns that are correct.
What, you mean that if you don't know what to ask it then it does stupid stuff?
Here's a suggestion, try a project written by an experienced developer using Copilot.
Your argument is basically "I saw an idiot crash a car so I'm sticking to horses."
You'll get N suggestions of "what ifs" and "maybes" that you can peruse as food for thought. Half could be practical to the experienced, half would steer the inexperienced in the wrong direction.
[Edit: that's known working code; dunno about bug-ridden stuff, as all of my code is perfect of course!]
FDIV is generally the slowest instruction, among those that don't pull from main memory, on any given part. If you are doing multiple divides by the same thing, you can save time by storing the reciprocal. Of course, you introduce an additional 1/2 ulp of error, but there is an obvious reason (speed) to prefer multiplying by the reciprocal.
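A sketch of the trade-off (Python, so this shows the numerics rather than the instruction timing; the function names are invented):

```python
import math

def scale_divide(values, divisor):
    # One FDIV per element: simple, but the divide is the slow path.
    return [v / divisor for v in values]

def scale_reciprocal(values, divisor):
    # One FDIV up front, then cheap multiplies. Each result can be
    # off by an extra half ulp relative to the direct divide.
    inv = 1.0 / divisor
    return [v * inv for v in values]

# For most inputs the two agree to the last bit or so:
vals = [1.0, 2.0, 3.0]
assert all(
    math.isclose(a, b, rel_tol=1e-15)
    for a, b in zip(scale_divide(vals, 3.0), scale_reciprocal(vals, 3.0))
)
```

Whether the extra rounding matters depends entirely on the application; compilers typically only make this substitution under fast-math-style flags for exactly that reason.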
Dealing with some Laravel code (PHP), I found that ChatGPT sent me round in circles and mainly suggested solutions which were either wrong, or might work but through a convoluted and terrible bit of code.
On the other hand, I've been really impressed with Claude Code running in a shell. It's helped me to root-cause and fix a couple of tricky bugs and to generate some really useful, well-structured front end components. It's saved me loads of time compared to writing it myself, and when I review its diffs they're mainly very sensible.
Occasionally it still completely misses the point or suggests a really complex solution instead of something obviously much simpler. I would certainly agree that for it to be useful you need to understand the code it's writing, so you can steer it when it makes bad choices - juniors or non-technical managers trying their hand at a bit of vibe coding will probably still create an unholy mess.
I like it so much I'm actually paying them real money for it
but when you only know a tiny bit of PowerShell, sometimes using ChatGPT is useful to just slap something together. I know it's flawed, but it can help - yet seeing how shit it is, it's baffling that management in companies think it's amazing and can replace staff.
Copilot, despite using ChatGPT with a proper license, struggles after about the 5th question given to it about writing a PowerShell script. So bad it freezes for about 5 mins at a time, then gets so bad it's just unusable. You'll get it to basically do what you want, ask it to add a bit, and it breaks the 1st part. You'll get it to fix it, which it does; ask it to add something and it does, but it puts the broken code back in the first part that you told it was broken earlier.
I switched to ChatGPT direct, which was much faster. But you'll ask it to do something where ' characters are involved, and it knows full well that if it uses smart Unicode ', PowerShell breaks, yet it still uses them. The code was failing with a misleading error from PowerShell, but looking at the code I spotted the smart Unicode quotes. ChatGPT admitted it was wrong, so why use them in the first place?
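A defensive workaround (a sketch, not anything from the original comment) is to scan generated scripts for the smart-quote code points U+2018/U+2019/U+201C/U+201D before running them. The helper names here are made up:

```python
# Flag and fix "smart" Unicode quotes that break scripts expecting
# plain ASCII ' and ". A defensive habit, not a fix for the LLM itself.

SMART_QUOTES = {
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
}

def find_smart_quotes(source: str):
    """Return (line, column, char) for every smart quote found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((lineno, col, ch))
    return hits

def fix_smart_quotes(source: str) -> str:
    """Replace smart quotes with their ASCII equivalents."""
    for smart, ascii_q in SMART_QUOTES.items():
        source = source.replace(smart, ascii_q)
    return source

script = "Write-Host \u2018hello\u2019"
print(find_smart_quotes(script))   # flags both smart quotes
print(fix_smart_quotes(script))    # Write-Host 'hello'
```

Running something like this over every pasted snippet catches the problem before PowerShell produces its misleading error.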
Again, I dislike most of the AI shit, but it's useful for slapping stuff together without people on Reddit or Stack Overflow saying "If you don't know the basics, start there, blah blah blah".
"ChatGPT admitted it was wrong, so why use it in the first place."
Because LLMs simulate a thing that is capable of holding lots of rules at the same time, but they can't actually do it. A human can hold a simple rule like "Unicode quote marks [I assume] don't go in code" with almost no mental effort at all. LLMs can't. They can be prompted with that rule, and they'll stick with it for a while, but eventually that rule falls out of the context window and is lost. The more they're drawing on training data taken directly from code, the more they'll stick to that rule, because that code was written by humans who followed it. It's chance, weighted by a bunch of factors: what's in the system prompt that got sent before your session started, the prompt that gets sent every time with your request, the length of your request, the specific model, the weights in the training data that are active this time, and some that are explicitly random. There are so many that it would be impractical to predict what it's going to do without having written the LLM software, and hard even if you had.
You might be able to get better results by having the LLM self-examine. Check this out:
1. Write a program that does this one thing.
2. Now check it and see if it works.
3. OK, fix those obvious errors, now try again.
Because I’ll tell you what, it’s pretty rare for me to come up with a program that works first try!
If these LLMs are meant to "know" how to create a working program, why should we still have to tell them how to do the *most* basic things about the process?
If I am running a compiler[1] I would *really* not be impressed if I had to tell it "ok, try again, but this time remember to do the phase two parsing after running the lexer".
So we're using an LLM in the expectation that it'll manage the difficult bits (picking an algorithm and a representation, selecting libraries, gluing all this together...) whilst knowing that it isn't capable of the simplest bits (i.e. just bothering with a test run *and* remembering to ingest its results) all on its own. Bit of a strange mismatch there?
And having the "specialised code-writing AI" turn out to be the same LLM, but this time running from a batch file that loops to save you typing in "Wro-ong, do it again"[2] isn't any better.
[1] in the most basic fashion, such as just "cc fred.c -o fred", to avoid getting bogged down in pernickety arguments about how you can actually pass these options on the cli and it'll ... but then you have to ... yourself...
[2] "you can't have a pudding until you've eaten your meat!"
>"AI coding tools dramatically increase output, but they also introduce predictable, measurable weaknesses that organizations must actively mitigate."
That sentence makes about as much sense as building a cake factory that produces 100 cakes an hour, 99 of which are burned to charcoal, and then saying "this factory has a great output, but it also introduces a predictable, measurable rate of cakes that fail QA and have to be discarded". NO, gods damn it, that factory produces 1 cake an hour. The sentence is a long-winded way to obscure the fact that the factory has a crap output.
In coding, real output is after testing (ideally, after user acceptance). The only reason we ever measured lines-of-code was because humans have a roughly consistent error rate, and even then it was a bad idea. If AI coding reduces time to write code but dramatically increases testing and fixing time, the total effect is that it reduces output.
In fairness, it's a bit more complicated than that, because it could be more efficient to test cakes and discard them than to ensure that every cake is perfect during production. That's true in various manufacturing things, and it's the incorrect comparison people sometimes make to LLM programming. If it was faster to fix LLM errors than to write it yourself, maybe there'd be an argument. In my experience, it isn't, because the LLM creates so many more types of errors than any individual human tends to. Also in my experience, the people who say that it is tend to be the same people who don't bother to fix some of the errors and therefore produce something faster with very different quality to what they would have written themselves.
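The arithmetic behind both the cake factory and the fix-it-yourself comparison can be made explicit. All the rates and times below are made-up illustrations, not measurements:

```python
# Illustrative arithmetic only: effective throughput when some fraction
# of the output fails review and must be fixed or discarded.

def effective_rate(raw_per_hour: float, pass_rate: float) -> float:
    """Units surviving QA per hour, if failures are simply discarded."""
    return raw_per_hour * pass_rate

# The cake factory above: 100/hour at a 1% pass rate is 1 cake/hour.
print(effective_rate(100, 0.01))

def net_hours(write_hours: float, fail_rate: float, fix_hours: float) -> float:
    """Total time per accepted piece of work: writing plus fixing failures."""
    return write_hours + fail_rate * fix_hours

# If AI halves writing time but most of its output needs slow fixing,
# the total can exceed just writing it yourself:
human = net_hours(write_hours=4.0, fail_rate=0.1, fix_hours=2.0)
ai = net_hours(write_hours=2.0, fail_rate=0.6, fix_hours=5.0)
print(human, ai)   # 4.2 vs 5.0 -- the "faster" tool is slower overall
```

Whether the AI column wins depends entirely on the fail rate and the fix cost, which is precisely what "dramatically increase output" glosses over.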
That gives an interesting perspective shift.
I've taught a few people to program, not many and always one to one. I think as a teacher, of any subject, it's important to see one step ahead of the student to foresee their next mistake. From my own learning curves, I know where I've had problems, and we've observed others' struggles too. Basically, the path that meatbags take to gain knowledge. ("Don't worry about it, every programmer's done that at some point!") But with LLMs it's anything from anywhere. As the LLM "fails in the wrong way", it becomes frustrating. So I have a choice. Do I treat the LLM like a wayward student, and adjust my nurturing of it? Perhaps, if I were confident that it wouldn't repeat the same shit every time. So, I need to delve into the MCP side of it, to minimise the recurrences. Oh yeah, and get a virtual LART. I'm still not convinced, but as a geek, it's something to poke.
i've always been a backend programmer (doing server-driven frontends with vaadin and the like) and currently need to handle frontend as well, with the react stack. here is my experience.
the good ones:
(1) for brainstorming and prototyping new stuff for demonstration or feedback, it excels - as long as you don't ship the code it generates.
(2) when you suspect that there could be a better way and you have time on your hands, it can suggest different code from several other perspectives - it inspires me to write code i've never tried before, provided you have the skill to understand what it suggests; then you pick one suggestion and update the code it writes to fit your coding style, etc.
the bad ones:
(1) my css skill is poor (i mainly dev backend, and don't usually use css to customise appearance), and i use chatgpt and claude for frontend - chatgpt spits out some css and react js code which kind-of looks good, but it's so repetitive that i spend a lot of time refactoring it - at least it works and doesn't spit tons of errors into my browser dev console. as for claude, i'm not sure if it was trained on code written by a css expert, but i cannot understand its output - in the beginning it also kind-of works, but as requirements evolve it goes downhill.
(2) when i give it the, say, 10th requirement, it tends to forget the first requirement, and it becomes more and more forgetful - i have to remind it like, hey, i said this before and you forgot it, and it does what it does best - apologises, remembers the first requirement, and starts to forget the second requirement.
(3) as others have said before, it doesn't respect the comments that i write - it randomly strips out a comment and adds its own (though at least it _strips_ them out most of the time instead of moving the comment to another random place, which would be worse)
(4) for the same requirement prompt, it manages to generate wildly different code on each iteration - so my approach is to ask it to do a tiny task each time with very specific prompts (like teaching toddlers how to program...) and when something works, NEVER feed it to the ai again (until a requirement change makes it necessary) or it will introduce random bugs into the working code. it's very slow, and i still have to convince myself that this justifies the extra time spent teaching it to write instead of coding myself.
my verdict is:
(1) if you use ai to write code for you, somewhere down the road (especially when you are less experienced) you will need the same ai to update the code for you. and don't try to update code generated by chatgpt with claude - it will be much messier than if you use one ai to complete your work. so i suspect it is an ai version of vendor lock-in, as you need to keep paying for one ai engine if you still want your code updated properly (or less improperly than otherwise).
(2) use ai only where even the worst bug it introduces won't blow up the whole project - for me, front-end code can be partly ai, but keep the back-end code human. that way, at worst something doesn't look good, or is shifted on screen here and there, instead of a new zero-day or CVE.
honestly... If you train your AI coder on examples from sites like SourceForge (and many more), the GIGO rule will apply.
I'm still coding for a few SME's and all of them have asked me about my use of AI in my work. I've told them clearly that I don't and won't use any AI tooling.
They are all happy with that. One 20 person company is migrating away from Office 365 over the Christmas/New Year break. The forcing of Copilot onto the users was the last straw.
Once everyone is happy with LibreOffice they want me to look at changing their backend servers to Linux.
Is there going to be a report next week claiming the AI review platform made up responses and introduced errors in its assessment of the code generated by the AI code slingers?
If AI can’t be trusted to generate quality code, how can an AI code reviewer be trusted to generate quality reviews?
Which LLM provider does the review platform use? Does code from agents using the same provider get higher ratings than code from someone else's LLM?
Maybe we need another AI platform to assess the quality of the code reviews and confirm that it is unbiased in its assessment.
"AI-generated code contains significantly more defects of logic, maintainability, security, and performance than code created by people"
Of course -- it's been trained on all the half-arsed examples submitted by novices and crap coders to open forums. So what else would you expect?