Re: The problem with this approach
Testing is a whole other class of problem with "keep trying until it works" AI code. For a really simplified example, imagine you say to the AI, "write a program that multiplies a number by 10" You then write a test that passes in the number 15 and verifies that the answer returned is 150, and the test passes.
The problem is that the AI generated the code "return 150;"
Hmm. If one were being obtuse, or the AI had restricted scope, you could get many 'correct' answers:
"write a program that multiplies a number by 10"
could give you a syntax error, because 'a number' is a string and '10' is a number. Variable type mismatch.
another possibility is you get
"a numbera numbera numbera numbera numbera numbera numbera numbera numbera number"
which is 10 repetitions of the string "a number"
or indeed, I could interpret 10 as binary to get
"anumbera number"
or regard "a number" as base64 encoded and multiply the integer represented by 'a number' by the integer represented by the base64 encoding of '10' since you are using 'multiply' and as you can't multiply strings, you must interpret the characters as numbers, whereby base64 is one relatively sensible approach.
Yet again, multiplying by binary 10 is bitshifting by one position, so you might bitshift the string by one position (you can do things like that in C).
Now, of course, the natural language approach of taking "a number' as a numeric variable and multiplying by base10 10 is the 'obvious' approach to a human, which is what the AI is trying to emulate, but in the context of the problem, a human might just check they are on the right lines by checking that the approach is correct by asking a question of the problem setter. I'm looking forward to when AIs do this and ask questions to clarify their understanding, and can show their working.
Being human means you carry around a lot of context in your head, which most AIs lack: often interpreted as AIs lacking 'common sense'. I remember a project long ago that aimed to build a common sense database for use by AIs: you typed in natural language statements/facts to add to it's 'knowledge' of the world. I don't know what happened to it.
In limited and well defined contexts, AIs can be great, but they can end up doing really stupid things. Humans are reasonably good at identifying out-of-context really stupid stuff.
Using AIs to drive cars or fly aeroplanes is an interesting case: you need to show they are safer than humans, which probably means making fewer mistakes than humans (the unachievable goal is zero mistakes/catastrophes), but they are capable of making really, really stupid mistakes: like driving into the side of trailers that are the same colour as the sky, or activating MCAS repeatedly. Humans do similar things - like following GPS navigator instructions directing them off quaysides, or holding an airliner in stall while it drops many thousands of feet. The issue is not that AIs make mistakes, as humans do too, but that AIs make mistakes that are non-human in character - we don't give AIs the free pass we give to people who have a bad day, or are distracted, stressed, or panicked.
When it comes to programming, half the task is defining the problem (analysts do have a job to do), and AIs don't currently sit down next to people and discuss the requirements: we are at the stage of spoon-feeding them baby-food in the form of well defined problem statements. If I can talk to an AI and discuss whether it would be better to use a linear search, a hash-table, or a Bloom filter for a particular application, then I think we would be getting somewhere. Converting well-defined problem statements into code is not what programmers do.