So many problems in the source paper
Yeah, thanks for that. It has had enough lipstick applied to the underlying pig that it now weighs some logically coherent responses over incoherent ones, but that is because it fears the shock collar and loves the tasty treats its training algorithm applies. If pressed, it assumes it was wrong and shifts to the lower-weight probabilities. It's effectively guessing, not rationally deciding to back down.
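To make that "shifts to the lower-weight probabilities" point concrete, here's a toy sketch in Python (made-up candidates and numbers, not any real model's internals): when you push back, the "correction" is just another draw from whatever probability mass is left over, not a re-derivation.

```python
# Toy sketch, not any real model's code: a "correction" after pushback
# is just another sample from the leftover probability mass.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical next-answer probabilities for some factual question.
candidates = ["Paris", "Lyon", "Marseille"]
probs = np.array([0.80, 0.15, 0.05])

first = candidates[int(rng.choice(len(candidates), p=probs))]

# "Are you sure?" -- suppress the answer just given and renormalize the rest.
remaining = probs * np.array([c != first for c in candidates], dtype=float)
remaining /= remaining.sum()

second = candidates[int(rng.choice(len(candidates), p=remaining))]
print(first, "->", second)  # the backtrack is a guess, not a deduction
```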
Some other bits from the article I ground my teeth at...
"can display the ability to solve complex reasoning tasks which humans crack using analogies"
Should probably be more like
_can be solved by various methods, including flawed ones that do not reliably produce correct output, whether by machines or by humans_
As an alternate way of showing there are many roads to Rome:
LLMs do not produce reliable output over these problem sets; based purely on their output, it is therefore unlikely that they are using logical reasoning, and if they are, they are doing it wrong.
or
The actual code they run on is based on matrix math and vector weights, and only probabilistically models likely outputs. As it turns out, those systems may produce accidentally correct answers more often than random chance, but because the method they use to arrive at those outputs is not logically sound, the output should not be treated as reasoning (a toy sketch of that point follows below).
Or, the brief and succinct "It's a stochastic parrot" that is so popular these days.
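For anyone who hasn't looked under the hood, here's a minimal sketch of what that "matrix math and vector weights" amounts to (toy sizes, random weights, all names made up): the forward pass is just matrix products ending in a softmax, and the "answer" is whatever token gets sampled from the resulting distribution.

```python
# Minimal sketch with made-up numbers: the forward pass yields a probability
# distribution over tokens, and the "answer" is a sample from it.
import numpy as np

rng = np.random.default_rng(42)

vocab = ["4", "5", "fish"]          # toy vocabulary
hidden = rng.normal(size=8)         # stand-in for activations after many layers
W_out = rng.normal(size=(8, 3))     # stand-in for learned output weights

logits = hidden @ W_out                          # matrix math, nothing else
probs = np.exp(logits) / np.exp(logits).sum()    # softmax -> probabilities

# "2 + 2 =" ... the continuation is whichever token the dice land on.
answer = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", answer)
```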
This field seems to have been overrun with optimistic bobble-heads. And in an eerie parody of people who look like their pets, their work is a funhouse-mirror reflection of its creators: flawed assumptions and turtles all the way down. This is an interesting and dangerous time for progress, as most people's heads have been filled with inaccurate information about how these systems work, including researchers in the field. And we have entered a period where the output can appear uncanny at times, so it is easier to fall into the trap of overestimating what it is doing.