To be fair, this has made me feel better about myself.
I'm shit at chess, so I'm glad my eventual replacements are equally shit at it too.
Not content with humiliating ChatGPT at the hands of Video Chess on an Atari 2600 emulator, Robert Caruso has tried again, this time with Microsoft's Copilot. Theoretically, the result would be the same, and Copilot would take a similar drubbing. Yet... what if Copilot triumphed where ChatGPT could not? "There's no reason to …
My neighbor across the street (hi Ms Sandi) and her family had an Atari in 1983, and I could never beat any of the games: Chess, Frogger, Indiana Jones, Combat, or any of the others for that matter.
I didn't get good at any game until Hard Drivin' in 1989, where I could write sentences on the high score board. I loved that game, and the redhead (Tally) who ran the arcade -- pity it's all been downhill since then.
The program was great on the 2600. The console had very little RAM: if memory serves, about 128 bytes, though there might have been a bit more on the ROM cartridge. When it was the 2600's turn, the screen would flash colors, as the program needed the screen memory in order to work out its move. The result was that you got to stare at flashing colors rather than the board while it thought. Higher levels blanked out the screen, sometimes for minutes.

I was told by another player that it would sometimes cheat, which I doubted at the time. Afterwards I kept closer track of the pieces on the screen. Maybe once in 20 or so games, a piece I had previously taken would randomly reappear on the board. One time a new queen spawned for the 2600, and not by a pawn making it to the last rank against me.
The program also allowed the user to preset the starting position. If one set up two rows of opposing Kings and then started play, the screen would turn grey, lose sync, and emit a high-pitched whine from the TV's speaker.
Great fun and not half bad at playing, even when it didn't cheat!
> The console had very little RAM: if memory serves, about 128 bytes, though there might have been a bit more on the ROM cartridge
This does make me wonder how well these LLMs would do against the infamous 1K ZX Chess, which, as the name suggests, ran on an unexpanded ZX81 with just 1K of RAM!
https://en.wikipedia.org/wiki/1K_ZX_Chess
Admittedly, the ZX81 had a faster CPU and around 400 bytes of RAM to spare, but still... [*]
[*] 3.25MHz Z80 versus 1.19MHz 6507, though differences in architecture meant that the speed delta was probably only around 20%, given what I remember of similar speed comparisons from C64 vs Speccy flamewars in the days of yore
I imagine that if there could be a physical manifestation of these two AIs, neither would stand a chance in a game of battle chess. I expect they might show all the physical agility of the robots playing football in this short clip: https://www.bbc.co.uk/news/videos/c5ylkyrkjnzo
The future is looking so bright! </sarcasm>
"It claimed it could think 10–15 moves ahead — but figured it would stick to 3–5 moves against the 2600 because it makes 'suboptimal moves' that it 'could capitalize on... rather than obsess over deep calculations.'"
Imagine, if you will, someone on reddit or a comments section (maybe even here), or perhaps usenet (RIP), spouting off about something they have only a passing knowledge of. Now imagine that you're Alan Turing and you're attempting to distinguish between that and what we see above.
As for Copilot and chess, I conducted quite a different experiment recently, as our work 365 subscription now includes it. I asked for a t-shirt design with a particular chess opening on it and a specific text. Obviously it failed, first producing a kind of Etsy-esque view of half a design alongside half a t-shirt with a similar design (the design in question of course not being what I'd asked for).
After managing to refine it to just giving me the print image, but getting ever further from anything resembling a chess board as opposed to an assortment of chess-themed images, I asked if it could just give me an image of a chess board in the starting position. There should at least be a good number of those in the training data, right? What I got back is best described as Howard Staunton's fever dream. The 9x10 board did have two rows of chess pieces at each end. In the centre file at each end stood a monstrous queen with a spreading crown of spikes; appearing to rise out of the picture, they were quite a bit taller than any of the other pieces, including the two kings that flanked each one. For some reason black's pawns were three-dimensional while white's lay flat. As you stared closer into it, you realised that many squares shaded from white into black. Lesser details, like the strange hybrid bishops and the half-round, half-square rooks, have faded in my memory. I haven't tried it since.
(The knights were surprisingly normal.)
Much misunderstood.
Was it really proposed as evidence of AI, or of the idea that a naive human could be fooled by a chatbot? If it's about fooling a naive human, then the proposition has been true since the 1960s, and by programs that have almost no practical value at all. What cruel executive decided "chatbots" could be used for customer support? That should be a crime with a jail term. So much frustration caused to so many.
I asked Google's AI to create an image of chickens bowling. I got back a normal-looking bowling lane, but with chickens facing the wrong way, some holding bowling balls, random pins everywhere, and a nice barnyard mural on the back wall while sheep and cows sitting on bleachers spectated. Everything was drawn very Far Side-esque (with apologies to Gary Larson). All in all a cool image, but not resembling actual bowling.
I just tried your query against ChatGPT to make an image of a chess opening board, and it's hilarious. It's 8x7, with squares labelled A-H across the bottom, but on the left and right sides it's got the numbers 5, 2, 4, 5, 6, 7 and a blank. The pieces look weird, like the knights are mixed with rooks. And it seems like white has 2 queens whilst black has 2 kings.
> Talk a good game
AI Overlord could refuse to play with a Real Life board - we'd have to play with the board that existed inside the alleged mind of the AI at any given moment. Not an 8x8 board? Too bad. Two white kings? Sounds like mediæval England: there's precedent!
They are great at OCR and denoising images. Just... don't ask them to do anything that involves reasoning or thinking, because they can't do either. They can simulate something like it if the problem space is small enough, but, as Apple found out, a model can do N-disk Tower of Hanoi then completely fail at (N+1)-disk Tower of Hanoi, because it's just brute forcing, not reasoning at all.
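For context, the puzzle itself is trivially algorithmic; here's a rough Python sketch of the textbook recursion (nothing Apple-specific, just to show that an actual solution doesn't care what N is):

    def hanoi(n, src, dst, via):
        # Move n disks from src to dst, using via as the spare peg.
        if n == 0:
            return
        hanoi(n - 1, src, via, dst)              # clear the n-1 smaller disks out of the way
        print(f"move disk {n}: {src} -> {dst}")  # move the largest free disk
        hanoi(n - 1, via, dst, src)              # restack the smaller disks on top

    hanoi(4, "A", "C", "B")  # the same few lines work for any n, in 2**n - 1 moves

A system that had internalised that recursion would never fall over between N and N+1; one that has memorised move sequences would.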
Nobody knows why, but it seems likely that some chess games snuck their way into the training corpus.
See https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/
gpt-3.5-turbo-instruct is still available: https://platform.openai.com/docs/models/gpt-3.5-turbo?snapshot=gpt-3.5-turbo-instruct
As Microchess is estimated at 1200 Elo and Turbo Instruct at 1750 Elo, I suspect that would be a better fight. Make sure to use PGN text.
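If anyone fancies staging that match, here's a minimal sketch (assuming the current OpenAI Python SDK and an API key in the environment) of feeding the game so far as PGN movetext and letting the model complete the next move:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # The game so far as PGN movetext; the model simply continues the text.
    pgn = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4."

    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # the legacy completions endpoint linked above
        prompt=pgn,
        max_tokens=6,
        temperature=0,
    )
    print(resp.choices[0].text)  # e.g. " Ba4" -- validate legality before playing it

You'd still want a proper chess library on your side of the loop to reject illegal completions, as the blog post above discusses.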
The worst thing about the whole "AI replacing people at work" thing is that the most obvious people to replace with an LLM are managers and corporate execs (I swear you could replace 75% of them with ChatGPT and nobody could tell), but no, they have to go after the people who are actually doing the work.
I guess I am confused as to why anyone would think LLMs would be good at chess. They are mediocre at most things, at least competent at writing (which is arguably the thing they are designed to do), and terrible at anything technical (at least in my field). I feel LLMs are a jack of all trades but master of none.
Not insulting the researcher at all; research like this is useful. But anyone's surprise at this would be like being surprised that a riding lawn mower is bad at haircuts. It could theoretically do it, but isn't really designed to do so.
An ML model that is designed to compete at chess, or to work in my field, would be (or is) far more interesting than any LLM. I guess that is why I don't understand all the effort put into LLMs: they are kind of shite at everything, when models for specific applications would actually be more interesting and useful.
I asked Gemini and it provided some weird image with 30 black pieces adjacent to each other and only 14 white pieces.
So I pointed out its error. It came back with this:
I apologize for the inaccuracies in the previous response. You are absolutely correct, and I appreciate you pointing out my mistakes.
My apologies for the incorrect image and the assertion that the request was completed without providing references.
Let's correct this.
Revised Chessboard Setup:
You are right, a standard chess game has 16 pieces per side, not 44 in total, and they are indeed positioned on opposite sides of the board.
I am unable to directly generate or redraw images at this moment, but I can describe the correct setup clearly for you, which is fundamental to the game of chess:
... it then goes on to correctly describe the opening position.
When I asked why it couldn't redraw it said:
Currently, my ability to generate images is distinct from my ability to process and re-edit them. When I use the image generation tool, I send a prompt (a text description) to a separate system that then creates the image from scratch. I don't "see" the image in a way that allows me to manipulate its pixels or move objects around within it, like a human using a graphics editor would.
Think of it this way: I can ask an artist to paint a picture based on my description, but if the artist paints something slightly off, I can't then grab their brush and fix it myself. I would have to ask them to paint a new picture with a corrected description.
So, when an image I generate isn't quite right, I can only generate a new one based on a refined or corrected description, rather than directly modifying the existing one. It's a limitation of how my tools are integrated.
So it "knows" the answer from references it, it just can't describe them fully to the image generator to create an accurate image.
I don't get why this is surprising - chess isn't a game based on language.
I'm often quite impressed by how AIs can summarise documents and tidy up text, despite being an AI sceptic. But I don't expect something based on combining words in a logical way to be able to do the same for chess moves.
Crosswords, riddles, stuff like that they should be brilliant at.
Copilot and ChatGPT got schooled by an Atari 2600’s Video Chess, proving their “grandmaster” swagger is just a 128-byte bluff! These LLMs played like pawns drunk on pixelated wine, blundering into checkmate faster than you can say “insert cartridge.” Stick to crosswords, boys—your chess game’s so weak, it’s practically a knight-mare!