Re: Malformed questions worthy of ChatGPT
OK, not that long a read: the paper is 11 pages, and the vast majority of it is just restating the programming problems.
And the protocol is utterly dreadful, especially for testing an online service (which can, of course, be updated without your knowledge): the English and Croatian runs were separated by a week (implied to be the time it took to translate the questions), so there is no guarantee they were testing identical software both times. Repeatability in science? Pah, that is for weaklings!
They did not bother to perform multiple runs in each language, to check for variation caused by randomisation anywhere in the process, or by subtle differences in the wording of the prompts.
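The check they skipped is trivial to do. A minimal sketch (the `ask_model` stub below is a placeholder standing in for a real chat-service call, here simulating sampling nondeterminism so the sketch is self-contained):

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Placeholder for a real chat-service call; simulates the
    # sampling randomness such services typically exhibit.
    return random.choice(["answer A", "answer B"])

def repeatability_check(prompt: str, runs: int = 10) -> Counter:
    """Send the same prompt several times and tally the distinct answers."""
    return Counter(ask_model(prompt) for _ in range(runs))

tally = repeatability_check("Write FizzBuzz in C.", runs=20)
# If the tally holds more than one distinct answer, a single run per
# language tells you very little about the English/Croatian difference.
```

If one prompt can yield several different answers across twenty runs, comparing one English run against one Croatian run proves nothing.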
They openly admit that they do not know whether the model is still learning/retraining (i.e. immediately incorporating its recent conversations back into itself[1]), and then ascribe the change in behaviour to exactly that process. If you do not know, you cannot ascribe! And anyway, inputs spread over a week: how is that "recent" in terms of a public online service? How many other people's inputs would have been incorporated in the meantime if the model *was* continually updating itself?! Even if they do not believe that working on such shifting sands undermined their conclusions, they fail to even mention the issue, as if it had never occurred to them!
As this is a symposium paper, not something published in, say, Nature, some leeway is due - but not that much!
[1] BTW, there is, for various reasons, no good reason to think the deployed model is continually retraining[2].
[2] Yes, they are probably recording your sessions to feed into the *next* model - along with whatever half-hearted "protections" are applied to your inputs to stop you asking naughty questions.