Not surprising
The thing is, LLMs have no clue what chess actually is or which moves are actually valid. My kids used to play like that, more like Calvinball...
The Atari 2600 gaming console came into the world in 1977 with an eight-bit processor that ran at 1.19 MHz, and just 128 bytes of RAM – but that’s apparently enough power to beat ChatGPT at chess. So says infrastructure architect Robert Caruso, who over the weekend posted the results of an experiment he conducted to “pit …
I've been playing some games against it lately to track how well its reasoning and deduction are going - not well.
A favourite is to play "Wheel of Fortune" with it: I'll spin up an online game, feed it a screenshot and a reminder of the rules (no adding/removing spaces, no altering the letters already in place, the clue is x, etc.) and. the. number. of. times. it. breaks. one. of. these. basic. rules. is. astounding..
These aren't things I'm expecting it to infer from the context of the prompt; they're spelled out right there, and it still can't hack it. It just apologises and promises to do better, but then immediately doesn't.
I get it's not actually intelligent and pretty much just a really big complicated search, but even by those standards, it's ignoring half the search term.
Absolutely outstanding that people are willing to trust it to do bigger and more important things. Had a call this morning with the boss wanting to replace our call-room people with it. Good fucking luck, mate...
<......."to track how well it's reasoning and deduction are going ".....>
These things don't do any "reasoning and deduction", they don't "know" anything.
You say that you get that it isn't actually intelligent, so why are you expecting it to exhibit characteristics which would only be shown by something "actually intelligent"?
They are at heart, as you say, just a glorified search engine.
I once wrote an entire chess engine which played a half decent game. The tricky bit wasn't encoding the rules of chess and the allowed moves, it was writing algorithms to have strategies to win. It adds a new dimension to the game when you have to approach it from this perspective. The program code has to be highly efficient too as it faces an exponentially increasing amount of processing required with each move ahead it analyses. That is the main limiting factor in its ability to play a good game. It sounds like ChatGPT doesn't even have the rules figured out let alone any strategies. Not sure you could train it on chess game records as there are virtually unlimited combinations of piece layouts.
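For anyone who hasn't written one, the core of that sort of engine fits in surprisingly few lines. Below is a rough Python sketch of negamax search with alpha-beta pruning; the `Game` interface (legal_moves, play, undo, evaluate, is_over) is a made-up stand-in for real board code, not any particular engine's API.

```python
# Sketch only: depth-limited negamax with alpha-beta pruning over a stand-in Game interface.

def negamax(game, depth, alpha=float("-inf"), beta=float("inf")):
    """Best achievable score for the side to move, searching `depth` plies ahead."""
    if depth == 0 or game.is_over():
        return game.evaluate()                 # static evaluation, from the mover's viewpoint
    best = float("-inf")
    for move in game.legal_moves():
        game.play(move)
        score = -negamax(game, depth - 1, -beta, -alpha)   # opponent's best reply is our worst case
        game.undo(move)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                      # cut-off: the opponent will never allow this line
            break
    return best

def best_move(game, depth):
    """Pick the move whose resulting position scores best."""
    def score_after(move):
        game.play(move)
        s = -negamax(game, depth - 1)
        game.undo(move)
        return s
    return max(game.legal_moves(), key=score_after)
```

Without the cut-off the work grows roughly as branching_factor ** depth, which is exactly the exponential wall described above; the evaluation function and the move ordering are where all the actual chess knowledge (the "strategy") has to live.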
> It sounds like ChatGPT doesn't even have the rules figured out let alone any strategies.
Of course not. Why would it? That's not how LLMs work. They don't, by design, encode "rules" (or heuristics) in the sense of game-playing strategies.
> Not sure you could train it on chess game records as there are virtually unlimited combinations of piece layouts.
Well, humans manage that - and the most successful human chess players most certainly do train on game records. As do highly successful game-playing programs like AlphaGo (not an LLM).
> The tricky bit wasn't encoding the rules of chess and the allowed moves, it was writing algorithms to have strategies to win ... virtually unlimited combinations of piece layouts
Curiously enough, back in the day, that was called "AI research": "Can a computer ever play Chess?" Yes[1]. Then it morphed into "Can a computer ever beat a human at Chess?" Yes[1]. Then "Can a computer ever beat a human Grandmaster at Chess?" Yes[1].
One way that we can tell that these LLMs are still flailing around trying to actually be useful and cost-effective is that they are still being flogged by the salespeople as marvellous toys that all the cool kids have. If the mechanisms are ever entirely subsumed into the day-to-day "100 Algorithms That Every Effective Programmer Must Know" then we will see that they are actually worthwhile - but by then, guess what[1].
[1] "It it works, it isn't AI"
> Playing chess requires intelligence.
Apologies for stating the bleeding obvious, but there are (non-LLM) chess programs that can compete at grandmaster level. Are you prepared to call them intelligent?
Chess (and other game) playing occupies at best a small niche in the ecosystem of problems/tasks that require what we might like to label as "intelligence". LLMs are clearly not suitable for that niche - there are other ML systems that fare much better, because that's what they were designed for. Whatever LLMs are designed for (and that's far from clear to me), chess is clearly not included.
> Only in the sense that they encapsulate the human intelligence that went into their design.
Well, you could say the same for LLMs.
> AI is less intelligent than 32kb of (non-intelligent) machine code created by a human.
No, it's just a whole different thing. You can't do what LLMs do (i.e., generate plausibly human-like responses to text prompts) in 32kb.
I take it, BTW, that when you say "intelligence", you actually mean human-like intelligence. That's a very high bar indeed, and it's bleedingly obvious that neither LLMs nor game-playing programs have that.
No I could not! I would rather say that LLMs encapsulate the ignorance and hubris of their creators (attempting to distill the idea of "intelligence" to a single generic model). I dispute _any_ potential for intelligence in a NN.
The whole point of these discussions is that LLMs _are_ being presented (by tech "bruvs" and credulous journalists) to the public as intelligent. There is no point attempting to trip me up on a "technicality", as LLMs are clearly trounced by algorithms in any domain-specific task.
There is no generic solution.
> I would rather say that LLMs encapsulate the ignorance and hubris of their creators ...
Rubbish. I've met quite a few researchers and developers (including colleagues) of recent ML techniques such as the transformer architecture underlying LLMs, and they are neither ignorant nor hubristic. They are, on the whole, extremely smart people who are intensely self-aware about what they are doing, which is not ...
> ... (attempting to distill the idea of "intelligence" to a single generic model).
I believe you're picking a fight with the wrong people (maybe you meant those hyping and marketing AI?)
> I dispute _any_ potential for intelligence in a NN.
It seems to me that you've set the bar of "intelligence" at "capable of solving all the problems humans are capable of solving at least as well as humans". Congratulations; you've set AI up to fail by definition - an AI would have to be human to satisfy that requirement. But hey, it's fun to sneer...
Personally, I see "intelligence" as a graded, and highly multi-faceted thing. So humans are pretty nifty at solving human problems. Bats are really good at solving bat problems. Octopuses are terrific at solving octopus problems. Oh, wait, you thought those weresimple problems requiring little intelligence? I imagine your average octopus might regard you and I as rather dim - I know I'd be pretty rubbish at solving octopus problems. I'm just not, y'know, designed for that.
> The whole point of these discussions...
Is it?
> ... is that LLMs _are_ being presented (by tech "bruvs" and credulous journalists) to the public as intelligent.
Personally I'm not engaged or engaging with that debate. Yes, we know bad people hype up some new tech to exploit silly people, yadda yadda, but I'm not getting involved in a pissing contest about who can be the angriest about this. It's tedious and repetitive, as Reg comment sections amply attest to.
We ARE angry. Consider the grotesque over-engineering which threatens our energy distribution networks. The inefficiency of training & running LLMs is an affront to everything I learned about engineering.
It is also wrecking any hope of privacy, and appropriating artistic works on an industrial scale. It is enriching some of the vilest and most dangerous people on the planet, whilst threatening the financial future of the general public.
You fucking bet we are angry.
Okay, okay, be angry - I'm more than a little annoyed about it myself - but please, if you can manage it, try to post something a little more interesting/original than "AI is shtoopid and it sucks and it's not even AI" (which summarises what 1,000 other commentards have already posted, usually with many, many more words). And please bear in mind that you're preaching to the converted here. If you're that angry perhaps you can find a more effective platform to articulate your rage.
Also, let's try to avoid throwing out the baby with the bathwater: ML (if "AI" pains you too much, although they have become effectively synonymous) has genuinely useful applications, and some of the current research is genuinely interesting and worthwhile; it would be a shame if the entire area were to be tarred by the brush of a bunch of unscrupulous bros and gullible droids.
A highly sophisticated string-slinging contraption by all appearances. Do you think this is how actual brains work, or are we skipping that part & going on to develop a new form of "intelligence" from scratch?
No need for mathematics, logic, or artistry, just enough conceptual fluff to dazzle most of the people most of the time.
> Do you think this is how actual brains work, or are we skipping that part & going on to develop a new form of "intelligence" from scratch?
I guess you were being flippant, but that's actually an interesting question. I mentioned earlier (not in so many words) that it seemed to me that your conception of AI equates to a technological simulacrum of humans. Which is both virtually impossible and hugely pointless (there are better, simpler, and more fun ways of engineering humans[1] and, honestly, they're not in short supply).
So my strong suspicion is that whatever we design and engineer that might one day be widely accepted as constituting some form of genuine intelligence (though not, of course, if your notion of intelligence equates to human simulacra) may well be distinctly non-human-like. Which, personally, I find quite an intriguing, if somewhat unnerving prospect.
[1] With some invaluable assistance from a colleague I've even made one myself. His uptime is around 22 years now, and he is remarkably robust and adaptable, if occasionally unreliable.
I have followed up pretty much everything you have claimed, but you still seem to be clinging to hope rather than being realistic. You seem to be satisfied with an "AI" that is not intelligence as we understand it, while not tolerating criticism from those of us who prefer words to mean what they say.
Appealing to "emergence" is just pumping more smoke into the mirrored room.
> ... an "AI" that is not intelligence as we you understand it, ...
If you were talking about current "AI", I agree (and if you read back what I have written you will see I never claimed otherwise). But my views on intelligence are both graded (it's not an either/or thing) and clearly more multi-faceted and nuanced than yours. Sorry, not everyone agrees with your viewpoint; please don't pretend otherwise. I have argued that the "human simulacrum" version of intelligence that you subscribe to is not only unachievable (and hence designed to fail, a straw man), but also limited in scope and imagination (one implication of your stance would appear to be, for instance - but correct me if I'm wrong - that you reject the idea that other animals can have any kind of intelligence).
I am, as it happens, "satisfied" with any technology which is useful; I really don't give a fig if you or anyone else chooses to, or chooses not to label some technology as "intelligent". If you're happy to abide by a self-limiting version of "intelligence" which by definition excludes the possibility of it being "artificial", fine - knock yourself out. But, again, don't pretend that everyone shares that stance.
> Appealing to "emergence" is just pumping more smoke into the mirrored room.
Erm, I did not do that. If I expressed a thought that perhaps (some) future AI might turn out to be non-human-like, I was not suggesting that would just "emerge" out of nowhere. In fact, if it does, I imagine it'll "emerge" - like all technological advances - through hard maths, science, and engineering. Sorry if you misunderstood.
A lot of those chess-playing programs were trained by a process much like the one used for LLMs: throw a bunch of chess positions at it, have it make moves, reward it for winning, punish it for losing, go home for a while, and see how good the thing is eventually. LLMs were trained with a completely different goal (how likely are these words to appear next to each other). In both cases, a lot of thought went into the mathematical aspects of the process, but if you're suggesting that the chess-playing bots are following algorithms written by chess masters which they merely evaluate, that's not generally what happened. You can make a chess engine that is good at playing chess without being able to play chess well yourself.
Playing chess does not require intelligence. Therefore, I have a feeling we will agree about the intelligence of current software. LLMs are not intelligent. Chess engines are also not intelligent. Neither will ever become intelligent as long as they keep being trained for the goals they currently have. But a chess engine is capable of playing chess so well that one way to see if competition-level players are cheating is to run some engines and check whether the human's moves track theirs too closely, and it isn't doing that by having some intelligent human personally tell it how to make the right moves. It is doing it by being able to calculate much faster than the human can.
I'm intrigued by what people understand by "intelligence" (the thing which AI doesn't have).
So, for instance, where do you stand on non-human animal intelligence? Is a chimpanzee intelligent? A dolphin? An octopus? A crow? A dog? A bat? A lizard? A hive of bees?
All those organisms continually solve hard real-world problems in order to survive and reproduce (Of course their problems tend to be rather different from ours.) Never mind human-like artificial intelligence; we haven't even achieved anything near bat-like or octopus-like artificial intelligence.
> Try training your AI model on chess game records and you would have, I suspect, a very different outcome.
Well, ChatGPT - like the other LLMs - was trained by pulling in everything that could be found on the web, and every other digitised text source, which includes the entire rules of Chess (many times, in book and web form), more books discussing the Great Players and their strategies and, of course, play-by-play records of games. At every level (chess by email, forum post etc).
So it had all the data that you suggest - and yet it failed.
Now, using *sensible* ML techniques on Chess rules *will* create a machine that can play - you can literally even do it with (big enough) matchboxes, in precisely the same way as you can train matchboxes to play tic-tac-toe - although it will take some time for that mechanism to complete its learning phase[1]. But, logically, it will work. What is the MAJOR difference between that ML and ChatGPT? The matchboxes are trained and punished/rewarded against a specific goal and with specific metrics. ChatGPT - nah.
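For anyone who fancies trying it without raiding the matchbox drawer, the same scheme is a few dozen lines of Python. A rough MENACE-style sketch, trained here against a random opponent; the bead counts and reward sizes are arbitrary choices for illustration, not Michie's originals.

```python
import random
from collections import defaultdict

WINS = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in WINS:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if " " not in board else None

boxes = defaultdict(dict)   # one "matchbox" of beads (candidate moves) per board position

def pick_move(board):
    key = "".join(board)
    if not boxes[key]:      # new position: seed a few beads for every empty square
        boxes[key] = {i: 3 for i, c in enumerate(board) if c == " "}
    moves, weights = zip(*boxes[key].items())
    return key, random.choices(moves, weights=weights)[0]

def train(games=20000):
    for _ in range(games):
        board, history, player = [" "] * 9, [], "X"
        while winner(board) is None:
            if player == "X":                      # the matchbox player
                key, move = pick_move(board)
                history.append((key, move))
            else:                                  # a random opponent
                move = random.choice([i for i, c in enumerate(board) if c == " "])
            board[move] = player
            player = "O" if player == "X" else "X"
        result = winner(board)
        for key, move in history:                  # reward a win, punish a loss
            if result == "X":
                boxes[key][move] += 3
            elif result == "O":
                boxes[key][move] = max(1, boxes[key][move] - 1)

train()
```

The crucial bit is the last loop: every move that led to a win gets more beads, every move that led to a loss gets fewer, against one specific goal and one specific metric. That is precisely the feedback ChatGPT never had for chess.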
> Why would ChatGPT be any good at chess?
Why would ChatGPT be good at *anything*? Unlike the matchboxes, it has not been trained with any sane goals - it won't be good at designing airplane wings, or baking bread, or writing Python code.
> not a problem to which general knowledge, or even basic understanding of the rules, gives a great solution.
There are NO problems to which general knowledge is the solution - except for winning pub quizzes or beating The Chase[2]. Knowledge plus nous - that'll get the job done. Heck, nous, time and an inquisitive mind will get over the lack of general or even specific knowledge[3].
So what great solution can ChatGPT or its cousins give us?[4]
[1] the Heat Death of The Universe says "hello"
[2] and the prizes you get for either do not compensate for building an entire LLM.
[3] we call this "research"
[4] well, comments on El Reg do say that it works as a super-ELIZA, but there are cheaper ways to achieve that goal.
> Why would ChatGPT be good at *anything*? Unlike the matchboxes, it has not been trained with any sane goals
Well, to be fair, it does have an ostensible "goal": to generate plausibly human-like textual responses to textual prompts. Whether you think that's a sane (or even a useful) goal is another question.
Why all the downvotes to mr. 42...39?
The notion that there's no reason to expect an LLM to be good at chess doesn't seem that controversial to me. It seems a reasonable claim, given that chess is not a language task. LLMs are notoriously bad at math too. But even if you disagree, it's not a stupid or offensive concept.
Is it the concept that general knowledge and basic understanding of the rules are not sufficient for "a great solution" to chess? Again, is that controversial? I have general knowledge and I do know the rules of chess, but I totally suck at the game. Not as much as ChatGPT, apparently, but I'm really far from "great". There's a lot more to good chess than knowing the rules.
The suggestion that you would get a better chess program if you trained a machine learning model on chess game records, also does not sound controversial to me. I mean, of course you would. How much better, that's hard to say without giving it a try, but it would definitely be a lot better than a general-purpose language model. Actually, isn't that how the grandmaster-beating program was made?
Am I missing some subtext here?
I would guess that many took this sentence:
"Try training your AI model on chess game records and you would have, I suspect, a very different outcome."
And interpreted it as this sentence:
"Try training your AI model large language model on chess game records and you would have, I suspect, a very different outcome."
If you read it like that, then they're probably wrong, because there were chess records in this LLM's training data. Of course, you could train an LLM only on chess records, which might make it a little better, but it would probably just shoot its language abilities in the foot without helping much with the chess, because words are a far less reliable way to encode a chess position than a chessboard data structure. But if you read it the way they actually wrote it, then they're right, because people have used non-LLM machine learning with the goal of making it good at playing chess, and it worked very well. Of course, if they were still thinking of an LLM there, then the first interpretation could be correct, but I didn't think that was what they were saying.
> Try training your AI model on chess game records and you would have, I suspect, a very different outcome
No you wouldn't. LLMs are at heart prediction machines using tokens based on words. It could probably do a pretty passable imitation (if you didn't look too deeply) of describing a hypothetical chess game using words since it has probably ingested a number of articles written about some of the great chess matches. But nothing about the way it works would give it any real knowledge of how to play a game let alone play it well.
Levy Rozman (GothamChess on YouTube) has run 'competitions' between lots of the AI models. All were terrible. Pieces disappearing and reappearing, new pieces materialising from nowhere, illegal moves etc.
I doubt that any could beat the ZX81 1K chess game at this point.
Curious about which processor that was. According to WikiP it was a 6507, a 28-pin version of the 40-pin DIP 6502. It had 12 pins amputated, including A13-A15, leaving only 13 address lines (8K).
A decent chess program running in 128 bytes of RAM and an 8K address space is truly impressive. Beating ChatGPT at anything cleverer than tic-tac-toe† is not so impressive, I would have thought.
† anyone losing a game of noughts and crosses is probably true manglement material and definitely C suite.
There was a ZX81 program called 1K Chess, which ran on an unexpanded ZX81 which had (guess what) only 1K of memory to hold the program itself, as well as the game state.
That 1K also included drawing the board and the screen memory (though the collapsed screen file probably only used 72 bytes), but it did rely on some of the display routines in the ROM.
It was not super clever, but did give a surprisingly good beginner level chess game.
I used to play for my school and university college chess teams earlier in my life (before home micros), but I remember giving up chess completely when White Knight Mk.11 on the BBC Micro would beat me more often than not.
What does your algorithm look like for that? Because I had a program that played it which I could beat if I started first, but only because it had what I assume was a hard-coded algorithm rather than one matching mine. Thus, if I started with a corner cell, it would take the middle one instead of blocking me with the opposite corner, and it would always obligingly pick one of the remaining corners for its second move, allowing me to guarantee a win. It had two ways to guarantee a stalemate. The only way to win that game is if the other side isn't thinking straight and you are. Once both players are older than about five, I think that covers most of the games.
>> All you have to do to win a game of noughts & crosses is to be the one who starts ;)
>.... I had a program that played it which I could beat if I started first,
The current iterations of AI have no "memory", in the sense that each prompt is evaluated against the data for that model with no awareness of previous prompts. It's a large language model, but it has no concept of "learning" from past results until data is fed into a new model. So something like a chess game is going to completely fail to get a decent solution, since the AI cannot evaluate future moves against past ones.
It's not actually learning though. The model is static. If the model is updated then it would presumably have improved on whatever has changed between that and the previous model, but ChatGPT and other LLMs do not actively learn through interaction.
Even if it says it will do better, and various other platitudes, if you ask it if it actually does learn from its mistakes, it will admit that it cannot, because the model hasn't changed.
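To make that concrete, every chat front-end is, roughly, a loop like the hypothetical sketch below. `ask_llm()` is a stand-in for whatever chat-completion API is actually being called; the point is that the only "memory" is the message list the client re-sends on every turn, while the model's weights never change.

```python
# Hypothetical sketch: the model is frozen, so the only state is the message list
# that gets re-sent in full on every turn. ask_llm() stands in for a real chat API call.

def ask_llm(messages: list[dict]) -> str:
    raise NotImplementedError("stand-in for a real chat-completion API")

def chat():
    history = [{"role": "system", "content": "You are playing chess. Track the board."}]
    while True:
        history.append({"role": "user", "content": input("> ")})
        reply = ask_llm(history)       # the entire history goes back in, every single time
        history.append({"role": "assistant", "content": reply})
        print(reply)
```

Nothing is "learned" between turns, and anything pushed out of the context window is simply gone, board state included.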
Chess is a game with very strict semantic rules, and a punitive branching factor, so good solvers need very efficient depth first-like heuristics, combined with long term forecasting, and responding to an opponent's changing strategy.
In contrast, LLMs are context-sensitive and language-based, trained (at least initially) mostly on scraped English data, where semantics are at best decided on the fly and no real hard rules apply.
They have some use as improved search engines (with lots of duct tape) or for finding previous solutions to related queries, but that's about it. Some parts of the hype are essentially Moravec's paradox in reverse: they seem to solve tasks that cost us time as if by magic, but get stumped by problems we consider much harder (theorem proving, combinatoric search).
Most LLMs, ironically, perform atrociously on context-free languages, where they could leverage the rules of the language, instead favoring correlation and massive context windows.
Ask an LLM to get a bibtex entry for a paper, and you get very high hallucination rates.
It is in part why LLMs can suggest code that will never compile, let alone make semantic sense, whereas a heuristic constrained by the language's grammar would never suggest it, because it's a search-space domain error.
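To make the "domain error" point concrete, here is a rough Python sketch of the kind of shim that would have prevented every illegal move described in the article: the rules engine, not the LLM, decides what is even admissible. It assumes the python-chess package, and `suggest_move()` is a hypothetical stand-in for asking an LLM for a move.

```python
import chess  # assumes the python-chess package

def suggest_move(fen: str) -> str:
    """Hypothetical stand-in: ask an LLM for a move in SAN (e.g. 'Nf3') given the position."""
    raise NotImplementedError

def constrained_move(board: chess.Board, retries: int = 3) -> chess.Move:
    """Only return moves the rules engine accepts; garbled or illegal suggestions are rejected."""
    for _ in range(retries):
        try:
            return board.parse_san(suggest_move(board.fen()))  # raises ValueError if not legal here
        except ValueError:
            continue
    return next(iter(board.legal_moves))   # fall back to any legal move, never a fabricated one
```

The constraint does nothing to make the play strong, of course; it only shrinks the search space to moves that actually exist.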
ChatGPT is a Large Language Model - it (and others like it) are at heart very extensive, very complicated and very sophisticated word processing programs.
I am staggered that anyone would expect them to be able to play chess, even more staggered that at least one person was sufficiently misguided as to try it out. It is equally staggering that otherwise sane minds will argue about the outcome.
It just goes to show the extent to which gullible people have been conned into thinking that these things are in some way intelligent and are able to 'think' for themselves.
One day it will be something more serious that one of these morons tries to do with an LLM, and the consequences of the failure will have far reaching repercussions; in all probability, all bad.
In fact, when I look at the orgasmic frenzy some corporate heads are working themselves up into over their perceived need to integrate this crap into their businesses, I suspect that it is already happening :(
Yeah, but it is good-clean-fun™ to cut overhyped Jack-of-all-trades systems back down to size every now and then imho, here in a retro-computing steampunk sort of way, especially given the wild claims and hubris from peddlers of 'Artificial General Intelligence' and 'superintelligence' ...
Meanwhile of course, we can get a higher level of satisfaction by challenging the likes of AlphaGo, at Go, with success, or Noam Brown's Pluribus, at six-player no-limit Texas hold'em poker (on the todo list?).
More broadly, I'd imagine some sort of Kasparov's Law could apply in this, that 'a human plus a machine will beat a super-computer [by itself] quite handily' at games and more general tasks. The key challenge is to develop software that is properly helpful I think ...
This isn't exactly surprising.
Why Chess? ChatGPT can't even handle a basic game like tic-tac-toe.
It all comes down to how ChatGPT functions. It doesn't matter how vast the difference in processors, graphics processors, etc.; ChatGPT simply isn't built to function that way. It determines a response based on the statistically most probable answer to a given question/request.
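A toy illustration of what "statistically most probable" means in practice; it is nothing like the real architecture, but the objective has the same flavour: continue the text plausibly, with no notion of a board, a rule, or legality.

```python
from collections import Counter, defaultdict

# Toy bigram "language model": count which word tends to follow which.
corpus = "white moves pawn to e4 black moves pawn to e5 white moves knight to f3".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def most_probable_next(word: str) -> str:
    return counts[word].most_common(1)[0][0]

print(most_probable_next("moves"))   # 'pawn': the popular continuation, not necessarily a legal one
```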
Its empty platitudes about doing better are just that - empty platitudes. Nothing is actually changing in any meaningful sense - it still uses the same model, and even if it takes into account the fact that you said it was incorrect, that may or may not influence what ends up being the statistically most probable answer.
You can literally play tic-tac-toe with ChatGPT and give it all of your moves in advance, have it play through based on those moves, and it still loses - every time.
So, if it can't handle tic-tac-toe, there's no point worrying about Chess.
You're better off asking ChatGPT to program a Chess opponent - it would likely be better at that.
https://www.reddit.com/r/AnarchyChess/comments/10ydnbb/i_placed_stockfish_white_against_chatgpt_black/
Apparently the king can jump over the bishop, the knight can move like the bishop at will, the queen can jump over the knight, and pieces appear "at will"...
No surprise a PROGRAM actually "knowing" the rules beats AI.
What about Tic Tac Toe Atari 2600 vs ChatGPT? Actually Pre-Atari2600 should be enough...
All,
I’m the person who ran the experiment. A few people have asked whether ChatGPT even understands chess. It actually does—and the experiment was its idea.
During a conversation about chess AI, it explained the differences between engines like Stockfish and Gemini, then claimed to be a strong player in its own right, insisting it could easily beat Atari’s Video Chess, which only thinks 1–2 moves ahead. Curious about how quickly it could win, it requested that I set up a game using the Stella emulator.
Since I had mentioned that I was a weak player, it offered to teach me strategy along the way. When it had an accurate sense of the board—either by tracking it correctly or with my corrections (the norm)—it followed through: quizzing me on moves, explaining options, and offering solid guidance. At times, it was genuinely impressive. At others, it made absurd suggestions—like sacrificing a knight to a pawn—or attempted to move pieces that had already been captured, even during turns when it otherwise had an accurate view of the board.
Regardless of whether we’re comparing specialized or general AI, its inability to retain a basic board state from turn to turn was inexcusable. Is that really any different from forgetting other crucial context in a conversation?
Can we agree that it should be able to remember the board layout from turn to turn just like any other data in a conversation? There were times when it had the board correct, and then I communicated a move made by the Atari using the chess notation it had previously explained to me. However, when I then asked it to describe the board's current state, it often had 4–5 pieces misplaced. Does retaining the board configuration from turn to turn require a chess engine?
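For comparison: outside an LLM, retaining the board from turn to turn needs no chess engine at all, just a few lines of bookkeeping. A minimal sketch, assuming the python-chess package, purely to show how little state is actually involved:

```python
import chess  # assumes the python-chess package: board state only, no engine, no search

board = chess.Board()                      # the standard starting position
for san in ["e4", "e5", "Nf3", "Nc6"]:     # moves in the same algebraic notation discussed above
    board.push_san(san)                    # illegal moves are rejected outright

print(board)                               # the full position, with nothing misplaced
print(board.fen())                         # one line of text captures the entire game state
```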