AI can't replace freelance coders yet, but that day is coming

Freelance coders take solace: while AI models can perform a lot of the real-world coding tasks that companies contract out, they do so less effectively than a human. At least that was the case two months ago, when researchers with Alabama-based engineering consultancy PeopleTec set out to compare how four LLMs performed on …

  1. Andy 73 Silver badge

    Just a small point

    I've worked for dozens of companies over the years, and faced a lot of coding tests from them.

    In decades of work, for all of those companies, I did not once solve a problem that looked remotely like those coding tests. Not even slightly.

  2. I ain't Spartacus Gold badge
    Gimp

    hmmmm

    I wonder what "passed the automated tests" actually means?

    Did they use AI to build tests to see if AI had passed the tests to build the required code? And did they then use another AI to assess whether the AI built tests passed yet more AI built tests to make sure the tests were actually testing success? Is it AI all the way down?

  3. TimMaher Silver badge
    Coat

    Builder.ai?

    Just asking.

  4. m4r35n357 Silver badge

    Please Reg . . .

    > "It's really phenomenal to watch," he said.

    Have you considered an "A1 Fuckwit of the Month" award?

  5. werdsmith Silver badge

    Devstral from Mistral in France is the LLM that is customised/trained specifically for software engineering tasks. It is available as a small model that will fit on a PC and run on a 4090 level GPU if you want to have your own.

    It is supposedly able to deal with larger software engineering problems as opposed to using generic LLMs which are good at the atomic chunks of code (which I described in another thread and got obtuse responses).

    These models are nascent tech, but they're coming on in leaps and bounds. Best not be an angry stockinger or a fingers-in-the-ears denier.

  6. dippy1

    80 20 rule?

    We all know the first 80% is the easy bit... the last 20% is the hard bit and will take more than 80% of the time.

    1. Anonymous Coward
      Anonymous Coward

      Re: 80 20 rule?

      The first 80% takes 80% of the time and the remaining 20% takes the other 80% ...

      1. DarkConvict

        Re: 80 20 rule?

        Ah yes, the task takes the time allowed. Then you hit the hard bit.

  7. elsergiovolador Silver badge

    The Great Replacement

    The irony is almost elegant: the people most loudly predicting the end of freelance coders are the ones most at risk of being replaced themselves.

    They don’t write code, they write about people who write code. So when they see AI generating boilerplate Python, they assume the entire profession is obsolete - never mind the bugs, the rewrites, the missing context, or the fact that real-world coding involves clients, edge cases, and systems duct-taped together across decades.

    They’ve built careers narrating other people’s work. Now they watch AI automate their job - summarising benchmarks, writing bland takes, generating midwit predictions - and panic. So they scream: “Look! It’s coming for them!” Because admitting it's coming for you is harder.

    Meanwhile, actual devs are busy rewriting AI-generated nonsense into something that doesn’t crash in prod.

    Coders aren’t getting replaced. Commentators are. And not a moment too soon.

    1. sarusa Silver badge
      Devil

      Re: The Great Replacement

      I think code pigs are in real danger - those are the barely competent coders who used to go on StackExchange, grab 5 code snippets, then mash them together till something compiled. You tell them to go off and do some tiny thing and they provide some code that may or may not work. Now those people can use LLMs and their code will probably actually be better! But then why use them when you can just use the LLM?

      What they can't replace are software engineers, because LLMs can't understand design tradeoffs at all, or how this code fits into the overall ecosystem. The skill of looking at all the design constraints and customer requirements and coming up with an optimal solution is at no risk. Nor is knowing that you can't do this here because it will cause real problems for that existing component over there, or that we've got real memory or hard response-time issues so certain things should be avoided.

      On the other hand, yes... the people who SHOULD be replaced by LLMs are the f@#$ing corporate execs and middle managers. They're not doing anything useful, and in many cases I think an LLM might actually be less harmful. Let's get to work replacing them ASAP.

  8. Mike 137 Silver badge

    The limitations

    The "limitations" section of the paper (page 9) is all-important. The tasks were simple enough to be evaluated by automation, but as the authors state: "In a real freelance scenario, requirements can be vague or evolving, clients might change their mind, and there could be integration issues beyond just writing a piece of code. Our benchmark doesn't capture those aspects – every task here is a neatly packaged problem that starts and ends within a single prompt/response."

    So the "AI" might (80% of the time) replace grunt coders given detailed briefs for simple tasks, but not programmers or software engineers who have to exercise initiative and imagination to fulfil larger and more complex tasks. So there's potential for such tools to assist, but not replace, expert developers (provided the time and effort needed to weed out "hallucinations" doesn't negate the gains).

    1. elsergiovolador Silver badge

      Re: The limitations

      A clear case of "this new hammer will replace carpenters!"

    2. AdamWill

      Re: The limitations

      The introduction has all the meat, really. To summarize: they saw that OpenAI had done a fairly realistic study where they took *actual* real-world jobs from upwork/freelancer, and tried to get LLMs to solve them. They saw that the LLMs didn't do very well: "the top-performing model in OpenAI's study solved only about 26% of the independent coding tasks and 45% of the management tasks".

      So they decided they'd do a much worse study which would give the LLMs bigger numbers. I am not making this up.

      "In this paper, we propose a new evaluation approach that draws inspiration from SWE-Lancer but emphasizes automation and repeatability. Instead of using actual freelance projects that require human evaluation, we leverage a publicly available dataset of freelance job postings to generate synthetic coding tasks with ground-truth solutions. In particular, we use the Freelancer.com dataset by Oresanya et al. (2022), which contains ~9,193 job postings in the data analysis and software domain. We filter and process these job descriptions to create well-defined problem statements (e.g., data processing tasks, scripting challenges, algorithm implementations) that an LLM can attempt to solve. Crucially, for each task we provide a set of test cases (input-output pairs or assertions) so that a solution's correctness can be validated automatically, without human intervention"

      so they took real freelance jobs and massaged them into AI benchmarks, then said "look! the AIs did quite well!"

      Good lord.
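
      For what it's worth, the "validated automatically" bit the paper describes boils down to something like the Python sketch below. The task format, the `dedupe`/`grade` names, and the canned "LLM output" string are my own invention for illustration, not from the paper - the point is just that only problems reducible to input/output assertions survive the filter:

```python
# Rough sketch of auto-grading a generated solution against
# ground-truth input/output pairs, with no human in the loop.
# Task format and the canned "LLM output" are illustrative only.

task = {
    "prompt": "Write dedupe(xs): remove duplicates, keep order.",
    "entry_point": "dedupe",
    "tests": [  # (args, expected output) pairs
        (([3, 1, 3, 2, 1],), [3, 1, 2]),
        (([],), []),
        ((["a", "a", "b"],), ["a", "b"]),
    ],
}

# Stand-in for whatever text the model returned.
llm_output = """
def dedupe(xs):
    seen = set()
    out = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
"""

def grade(solution_src, task):
    """Exec the candidate code, then check every input/output pair."""
    namespace = {}
    try:
        exec(solution_src, namespace)       # run the generated code
        fn = namespace[task["entry_point"]]
        passed = sum(fn(*args) == expected
                     for args, expected in task["tests"])
    except Exception:
        passed = 0                          # crash = zero marks
    return passed, len(task["tests"])

passed, total = grade(llm_output, task)
print(f"{passed}/{total} tests passed")
```

      Anything a client would actually argue about - vague requirements, integration, "that's not what I meant" - never makes it into `task["tests"]`, which is rather the commenters' point.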

  9. patl

    Given how vague and imprecise most job descriptions are on freelancer.com, I really fail to see how it's possible to create any kind of acceptance criteria, much less a suite of tests to prove the code.

  10. Locomotion69 Bronze badge

    AI has its merits in code generation - but only to an extent. Knowing its power, and therefore its weakness, is key in a development project. That is what today's AI specialist should be accountable for.

    You can do with fewer coders in the end - they are swapped for more "system integrators", busy gluing all these AI fragments into one coherent product that ultimately does compile and run. Understanding code remains necessary to be successful at the job, and understanding any piece of code these days is often a challenge on its own - with or without AI.

  11. ChrisElvidge Silver badge

    AI has its merits in code generation

    Yes, but the AI (LLM?) must be trained only on proved code, not on the general mish-mash that makes up stackoverflow and/or reddit answers to questions - often either just plain wrong or so outdated as to be useless.

    What use is the bulk of literature LLMs ingest from LibGen to programming?

  12. Anonymous Coward
    Anonymous Coward

    Does 80% correct code run? Thought not.

    1. Anonymous Coward
      Anonymous Coward

      Does 80% correct code run?

      Only if you are very unlucky, unfortunately not so unlikely if the problem space is 20% edge cases and test coverage is "incomplete."

  13. Steve Davies 3 Silver badge

    Ok as long as it is in the spec

    But as many of us here know, there are a lot of things that are not written down but implied.

    Then we get the feature creep. Will these coder-replacement LLMs have to start again, or will they come up with a change to their code? The latter requires an understanding of the code, which they don't have.

    Oh, and specs are nowhere near feature complete. We as humans can fill in the blanks and make it work. Can LLMs do that? I don't think so, and they won't until they become sentient, which is decades away.

  14. perkele

    Why not, it might even read the spec, unlike jokers on places like upwork & fiverr.

  15. fg_swe Silver badge

    Male Cow Output

    I get it, lots of folks are invested in AI. There is too much money in search of an investment, the effect of insane inflationary policy. Tulips and square bullets all over again. Oh, and supadeadly viruses and their vaccines.

    Nevertheless, the AIs are somewhere between worm and mouse brains in complexity.

    They struggle to properly parrot stuff that is openly available on the interwebs. They hallucinate like the proverbial drunken sailor.

    1. Anonymous Coward
      Anonymous Coward

      Re: Male Cow Output

      Leaving aside the output from the bull with tits; or was it a typo for BSE?

      I read that as "They struggle to properly stuff parrot that is openly available on the interwebs."

      The Norwegian Blue on ebay?

  16. Anonymous Coward
    Anonymous Coward

    Employment span

    Started employment using nothing but paper index cards.

    Retired after building web apps.

    Avoided AI completely.

    Win.

  17. david1024

    Missing the lesson

    To me, this shows how to deploy AI more effectively. And that this study was designed to be easy for AI.

    For now, AI is bottom-up and humans are top-down. Two separate flows.

    I would guess that is because AI still isn't very good at finding the problem to solve - and that's at the top/structural level - and in that respect it is like a super technically competent, fresh green developer. No sense, but knows how to iterate. AI doesn't seem able to mature, and the humans end up fixing code for all but the most straightforward assignments.
