ANZ Bank test drives GitHub Copilot – and finds AI does give a helping hand

GitHub Copilot has steered software engineers at the Australia and New Zealand Banking Group (ANZ Bank) toward improved productivity and code quality, and the test drive was enough for the finance house to deploy the generative AI programming assistant in production workflows. From mid-June, 2023 through the end of July that …

  1. Anonymous Coward
    Joke

    Did Copilot write the report

    “However, considering the quantitative and qualitative analysis of the data generated in this experiment and subject to further analysis on security of the code suggested by Copilot, it is recommended to productionise GitHub Copilot at ANZ Bank.”

    1. Anonymous Coward
      Anonymous Coward

      Re: Did Copilot write the report

      "AN EMPERICAL STUDY ON GITHUB COPILOT" - Frankly, I think Copilot would have done a better job. Execrable.

    2. MyffyW Silver badge

      Re: Did Copilot write the report

      I suspect it might have written the bit that said "Copilot generated more code, although the quality of software generated was worse than human-built software".

      We don't necessarily want more. I can remember a time when just having more code was a bad thing. I would argue it still is.

      1. Michael Wojcik Silver badge

        Re: Did Copilot write the report

        And this is the crux.

        Writing more lines of code is not a good measure of productivity. Completing coding tasks faster is not a good measure of productivity. Those miss at least two critical metrics: the quality of the software in practice over the long term (including ease of maintenance), and the development of both programming and domain knowledge among developers.

        Results on the former, when using LLM assistants, are mixed at best. I haven't seen any methodologically-sound studies on the latter, but my estimate is still strongly that using LLM assistants badly impairs it.

        From the article: One study from Microsoft, which now owns GitHub, found coding with an AI assistant improved productivity by more than 55 percent. But what's that study's definition of productivity? Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible. That's a rubbish task. Any version of HTTP after HTTP/1.0 is complex and difficult to implement correctly, since a correct implementation has to conform to the standards. Implementing it quickly is not grounds for praise.

        Producing lines of code quickly is not good coding. Coding is not the whole of programming. Programming is not the whole of software development.

        Chasing "AI" assistants like this is a race to the bottom, a process of creating fungible, know-nothing "coders" who don't understand what they're doing.

  2. CowHorseFrog Silver badge

    How many engineers have Microsoft fired and replaced by co-pilot ?

    Oh that's right: zero. What does that tell you?

    1. elsergiovolador Silver badge

      Co-pilot is quite okay as a code completion tool. Very often it suggests exactly what you want to type in, which saves time.

      For other uses I found it quite poor. It sometimes introduces errors into the code that an untrained eye may not spot, and if you don't have a proper test suite - or, even worse, use such tools to write it - good luck ;-)

      I noticed that some junior developers now produce a lot of code and they are confused why it doesn't work or doesn't work as expected.

      They don't know how to figure it out, even if you hold their hand and explain bit by bit how you found that something isn't correct, they cannot repeat it next time they have an issue.

      I sometimes wonder if this AI stuff is actually a Trojan horse created to destroy productivity in the West.

      Imagine, in a few years' time, there will be no competent developers still active, and people straight off a Facebook-advertised six-month coding bootcamp, armed with a bag of malarkey-spewing AI, start working on firmware for industrial or healthcare devices. There are going to be a lot of accidents and catastrophes.

      1. CowHorseFrog Silver badge

        Code completion has been a solved problem for quite some time. If one knows the type of the variable, it's a pretty simple task to find the available methods/functions - no intelligence there.

        What you are seeing isn't AI....
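
        The pre-AI trick the commenter describes - completion from the known type of a variable - can be sketched in a few lines of Python via reflection (the `complete` helper is illustrative, not any real IDE's code):

```python
import inspect

def complete(obj, prefix):
    """List an object's public callable members whose names start
    with the typed prefix -- the classic, non-AI completion trick."""
    return sorted(
        name
        for name, member in inspect.getmembers(obj)
        if callable(member)
        and name.startswith(prefix)
        and not name.startswith("_")
    )

# Completing "sp" on a str instance:
print(complete("", "sp"))  # ['split', 'splitlines']
```

        No model, no training corpus - just a lookup against what the type already declares.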

      2. Michael Wojcik Silver badge

        Very often it suggests exactly what you want to type in, which saves time.

        If typing dominates your software development time, You're Doing It Wrong.

        Moreover, there are a number of benefits to actually writing code, rather than waving your "AI" magic wand and letting it do the work:

        * Better cognitive focus and attention to the code you're producing.

        * Better retention of what the code does and is meant to do.

        * The friction of actually writing code is an inducement to refactor and do things properly rather than just inserting boilerplate everywhere.

        * Less or no risk (depending on whether you're consulting some other sort of source) of reading what's being handed to you, seeing what you expect instead of what's there, and sticking it in without catching the error.

        1. CowHorseFrog Silver badge

          The people who want AI are often lazy people who do a bad job in many other ways. They skip writing comments, their messages are poor and useless.

  3. GoneFission

    Hope at least someone at ANZ Bank is getting kickbacks for pushing Copilot and that sweet monthly $19-$39 per user onto the entire org's 5000-count developer base.

    > It required participating software engineers, cloud engineers, and data engineers to tackle six algorithmic coding challenges per week using Python. The group that had access to GitHub Copilot was able to complete their tasks 42.36 percent faster than the control group participants

    Oh gee, an AI assistant trained on homework that is good at answering test questions.

    1. ExampleOne

      It won’t be a popular point, but… if the reported gain in productivity of 40% is correct, this is the single best investment they could make.

      Doing some basic math, $39 per user per month is break even for a developer being paid $1000 a month if it gives a 4% improvement in productivity. They are claiming a productivity boost 10x that. Though it isn’t mentioned, I suspect they are using developers paid a lot more than $1000 a month.
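
      The break-even arithmetic in the comment, written out (the salary figures are the comment's own hypotheticals):

```python
def breakeven_gain(tool_cost_monthly, salary_monthly):
    """Productivity gain (as a fraction) at which a per-seat tool
    pays for itself against a developer's monthly cost."""
    return tool_cost_monthly / salary_monthly

# The comment's deliberately low figure: $39/month tool, $1000/month developer
print(f"{breakeven_gain(39, 1000):.1%}")    # 3.9%
# At a hypothetical $10,000/month developer, the bar drops further
print(f"{breakeven_gain(39, 10_000):.2%}")  # 0.39%
```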

      1. John Miles

        They'd like people to infer that there is a 40% improvement - but that would suggest coding is analogous to "short algorithmic coding challenges" that have been done before and consumed by the LLM.

        Then there is the claim of fewer code smells, which suggests they aren't using code-scanning tools in the IDE or CI/CD pipelines (or are ignoring them) - tools which would highlight such things.

        1. AdamWill

          Yeah. As soon as I read "The bank experiment examined what effect Copilot has on: Developer sentiment and productivity, as well as code quality and security. It required participating software engineers, cloud engineers, and data engineers to tackle six algorithmic coding challenges per week using Python. Those in the control group were not allowed to use Copilot but were allowed to search the internet or use Stack Overflow" I went into full rolleyes mode.

          Software engineering is not algorithmic coding challenges. You are proving nothing with "studies" which just involve artificial scenarios that are extremely amenable to LLM training, but of very limited usefulness for your bottom line.

        2. Michael Wojcik Silver badge

          that would suggest coding is analogous to "short algorithmic coding challenges"

          And that a savings in coding results in an identical savings in programming, and that in turn in an identical savings in software development. Which is patently untrue, since each successive category is significantly larger than the previous one.

          (Anyone who can't figure out how "coding" is a considerably smaller subset of "programming", or "programming" versus "software development", needs to get the hell out of the industry.)

      2. Headley_Grange Silver badge

        It also ignores how much of the development cycle is actual coding. The gains will be much lower if they include user requirements analysis, design, specs writing, code review, unit test, integration, system test, qualification, deployment and support.

      3. elsergiovolador Silver badge

        How do they know it improves productivity? How is it measured?

        If they deliver code quicker? What if the code is faulty, but they don't know that at the time of assessment?

        After the honeymoon period - savings booked, managers collecting bonuses and jumping ship - they may end up with an unworkable pile of code debt and an application that is not quite working as expected.

        1. CowHorseFrog Silver badge

          Simple: like all PowerPoint presentations, they LIE. After all, how do you determine they are wrong?

      4. CowHorseFrog Silver badge

        The person who is reporting these numbers is probably a professional bullshitter who never does any real work at the ANZ. They are probably making unfounded PowerPoint presentations with graphs using numbers plucked out of nowhere. Pathetic that these jobs exist in the first place; they are a tax on humanity and the environment.

  4. ChoHag Silver badge

    Expert Python? It's not very surprising that an LLM would excel at copy-pasta programming.

    1. elsergiovolador Silver badge

      If you squint, Python could look like a pasta from afar.

  5. Winkypop Silver badge
    Devil

    Oh YES!

    Fewer staff

    Trickier software and conditions of service

    Greater fees passed on to customers!

    Bank CEO’s wet dream.

    1. Anonymous Coward
      Anonymous Coward

      Re: Oh YES!

      "Bank CEO’s wet dream."

      Until the CEO finds out that the back-office reconciliation software that he got on the cheap has somehow lost last week's trades.

      1. elsergiovolador Silver badge

        Re: Oh YES!

        He will be long gone enjoying his bonus.

  6. Anonymous Coward
    Anonymous Coward

    All brought to you by the people who spent the last twenty years failing to get a simple tagging/autocomplete system like Intellisense working properly.

    Got to love a magic, cutting-edge tech solution when most people are still ignoring compiler warnings.

    1. Michael Wojcik Silver badge

      Yes. That reminds me — since updating to the latest MSVC runtime a few weeks ago, all of our projects get a compiler warning from one particular bit of braindamage in one of the MSVC standard C++ headers. How can Microsoft have missed that? Every damn build shows that warning. Apparently that team can't even manage to release a C++ library implementation that compiles cleanly.

  7. Headley_Grange Silver badge

    I'm not a software engineer and this is a genuine question to help me understand this CoPilot stuff.

    I can code a bit. I've done it for work (utilities and test gear, not production) and still do for personal stuff. These days I might want an AppleScript, and my process is to have a look on the web, copy some code that does something similar to what I want, inspect it to understand it, then fiddle with it until I make it work. This works OK for me, but I doubt my code would be considered good. There's not much error/exception management, no memory management (I've read of it but don't really know what it is, and I assume that Apple takes care of it if it's important). It's also surprisingly slow - which might be down to AppleScript but is more likely due to me.

    My question is: is CoPilot simply a massively more "intelligent" version of what I do, benefiting from a huge database of "learned" code which it can regurgitate - and does the code it produces risk having the same issues as mine in terms of memory, security, exception, etc. management?

    1. Anonymous Coward
      Anonymous Coward

      I find it most useful for doing the opposite, taking my basic code then saving me time in adding exception handling etc. as I just add the first few letters of a standard validation/log/exception block and it guesses what I would want 90% of the time.

      Other times, when I'm crafting a tricky algorithm, I know it will give a bad answer, but it's still a starting point which I can then tweak to actually make work, i.e. it helps me with writer's block.
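
      A minimal sketch of the kind of standard validate/log/exception block being described - the sort a completion tool can fill in from a few typed letters (the `handle` function and its messages are purely illustrative):

```python
import logging

logger = logging.getLogger(__name__)

def handle(payload):
    """Hypothetical handler showing boilerplate validate/log/raise
    structure around a small core of business logic."""
    # Standard validation block
    if not isinstance(payload, dict) or "id" not in payload:
        logger.error("invalid payload: %r", payload)
        raise ValueError("payload must be a dict with an 'id'")
    try:
        # ... the interesting business logic would live here ...
        logger.info("handled payload %s", payload["id"])
        return True
    except Exception:
        # Standard log-and-rethrow block
        logger.exception("unexpected failure handling %s", payload.get("id"))
        raise
```

      Most of those lines are predictable from the first few characters, which is exactly why autocomplete does well on them.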

      1. Snake Silver badge

        It rather makes sense: writing code is essentially describing logical processes to compute a result. Therefore AI should be a positive addition, as the algorithms of the AI assistant can understand the known logical flow of a well-written program, based upon prior examples. If it is not good now, it will learn as corrections are made to its code designs.

        Whilst AI in the general world must learn non-logical and non-linear processes, AI for code development assistance can work within a structure of known elements.

    2. Michael Wojcik Silver badge

      CoPilot works like any LLM: token prediction. It has a high-dimensional parameter space containing a manifold shaped by the model weights, which were created by digesting a large training corpus. The session input thus far, or as much as will fit in the context window, effectively supplies a point in that space. (The tokenizer and transformer architecture act as a function that transforms that input vector in various important ways which I'm ignoring here.) From there it follows the gradient, with some pseudorandom jitter (the temperature parameter) to anneal it out of local basins.

      So, yes, it's just fancy autocomplete. Fine-tuning attempts to make it "prefer" (be more likely to produce) "good" code, for some values of "prefer" and "good". Code-generating LLMs do have the advantage that you can tune them with reinforcement learning using mechanical judges — successful compilation, compiler diagnostics, static analysis, and so on.

      But what CoPilot really does is provide coders with an excuse to not think. That's the value it's adding.
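
      The token-prediction-with-temperature mechanism described above can be illustrated with a toy sampler (this is a generic softmax/temperature sketch, not Copilot's actual implementation; the logits are made up):

```python
import math
import random

def sample_next_token(logits, temperature=0.8, rng=None):
    """Toy temperature sampling: scale logits by 1/temperature,
    softmax into a distribution, draw one token. Lower temperature
    concentrates probability on the highest-scoring token."""
    rng = rng or random.Random()
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    r = rng.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # numerical-edge fallback

# Hypothetical logits for the token after "def add(a, b): return a"
logits = {" + b": 5.0, " - b": 1.0, " * b": 0.5}
print(sample_next_token(logits, temperature=0.5, rng=random.Random(0)))  # ' + b'
```

      Higher temperatures flatten the distribution (more jitter); at very low temperatures the sampler degenerates to greedy argmax.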

  8. Andy 73 Silver badge

    Insert snark here

    I've got to do the joke: isn't "Python Expert" an oxymoron, like "Military Intelligence"?

    OK, Coding tools can genuinely make a difference, and where you have a write-compile-run loop, even one fewer laps around the circuit can make a significant reduction in time taken. I'm all for tools that improve that.

    However, switching regularly between Java/IntelliJ and C++/VisualStudio, I note that the language has a big impact on the ability of the tooling to make useful suggestions and interventions. A strongly typed language with regular conventions and no pre-processor (Java) allows IntelliJ to do some remarkable code generation and refactoring tricks - as a small random example I can write a call to a method that I haven't written yet, and a single keypress will generate that method signature and body in the correct file elsewhere in the project. In C++, it often doesn't even know if a method I've just written actually exists when I try to call it from another class. VisualStudio regularly imposes additional compile cycles just to find out if my code is 'complete'.

    In those cases where the language doesn't assist the IDE in making useful suggestions, checks and refactoring options, then I would imagine an AI solution may become more useful in improving poor tooling - though not as an absolute guarantee that the suggestions being made are going to be correct (once more around the compile loop).

    But... I may be wrong.

  9. yogidude

    Programming challenges

    Have I got this right? Some artificial code challenges were completed more efficiently when using an AI helper. Yep. That makes sense. That's not solving any real-world problems that the programmers might encounter during the course of the work the bank actually employs them for.

    So the AI helper hasn't really been proven to make ANZ programmers more efficient at their work. Here is an alternative take: Microsoft is actually just trying to show investors that devs are using their tools. So they can announce a return on investment for all those LLMs they built with VC. Shareholder value is the benefit they are trying to sell here. Not real world development.

  10. Steve Davies 3 Silver badge

    Bank finds AI useful

    It just makes giving your poor customers the 'Computer Says NO' a lot easier.

  11. Anonymous Coward
    Anonymous Coward

    More helpful for expert programmers

    > It required participating software engineers, cloud engineers, and data engineers to tackle six algorithmic coding challenges per week using Python.

    Perhaps it was more useful for experts because experts are used to dealing with problems at a much higher level, and haven't felt the need to reimplement basic sorting algorithms and whatever in several decades?

    1. Brewster's Angle Grinder Silver badge

      Re: More helpful for expert programmers

      So your definition of an expert is "knowing to call the sort method, rather than implementing a shell sort from scratch"?

      Harsh, but fair. (Because, we've all known those "programmers".)

  12. vtcodger Silver badge

    At Long Last ...

    So, at long last, Clippy has found a real job. About damn time. Now let's see if he can keep it.

    I have to admit to being a trifle nervous about the job being with a bank. But what the heck, it's not my bank. And better a bank than a hospital or a nuclear power plant.

  13. Anonymous Coward
    Anonymous Coward

    A bank's got 5000 programmers!

  14. Adair Silver badge

    It's just a tool

    Like all tools, especially new ones, it takes a while to learn to use it well, and it takes a while to refine the tool to make it as focussed and functional as it can be.

    With coding, the AI needs to be built explicitly for coding, maybe even confined to a single language, so that the amount of rancid mashup code is minimised, maybe even recognised and excluded.

  15. Jason Hindle Silver badge

    One intriguing finding…

    “One intriguing finding is that Copilot was the most useful to the most experienced programmers.”

    Far from intriguing, I would describe that as a statement of the bleedin’ obvious. It has been pretty clear to me from the start that someone with a good foundation and experience in a given subject matter would benefit from the emerging LLMs. For those who lack foundation and experience, it probably becomes the crutch that stunts their development.

    1. Pomgolian
      Holmes

      Re: One intriguing finding…

      Absolutely no surprises there.

      It's only when you have a bit of experience that you know what questions you really should and should not be asking. This is experience born out of years of figuring out the hard way how to do things. When you learn like this, you retain that information and have a deeper understanding of the problem. By contrast, if someone is always telling you what the answer is, you never remember, and you keep asking the same dumb questions.

      Thus in 30 years' time, when the last of the experienced programmers have actually died out, there will be no one to take their place. Then again, if the output of AI-generated code is fed back into the input, the quality of the output will inevitably fall, just like continually photocopying a photocopy. The only question is whether this all comes to fruition before all the greybeards have hung up the keyboard.

    2. Michael Wojcik Silver badge

      Re: One intriguing finding…

      someone with a good foundation and experience in a given subject matter would benefit from the emerging LLMs

      I believe that remains to be demonstrated, particularly when the benefits are properly quantified and the indirect costs are assessed.
