Microsoft Copilot joins ChatGPT at the feet of the mighty Atari 2600 Video Chess

Not content with humiliating ChatGPT at the hands of Video Chess on an Atari 2600 emulator, Robert Caruso has tried again, this time with Microsoft's Copilot. Theoretically, the result would be the same, and Copilot would take a similar drubbing. Yet... what if Copilot triumphed where ChatGPT could not? "There's no reason to …

  1. wolfetone Silver badge

    To be fair this has made me feel better about myself.

    I'm shit at chess, so I'm glad my eventual replacements are equally as shit at it too.

    1. Anonymous Coward

      ah, painful memories...

      My neighbor across the street (hi Ms Sandi) and her family had an Atari in 1983, and I could never beat any of the games: Chess, Frogger, Indiana Jones, Combat, or any of the others for that matter.

      I didn't get good at any game until Hard Drivin' in 1989, where I could write sentences on the high score board. I loved that game, and the redhead (Tally) who ran the arcade -- pity it's all been downhill since then.

      1. David 132 Silver badge

        Re: ah, painful memories...

        I used to absolutely thrash my younger brother at Combat, whether it was the tanks or the biplanes.

        Made up slightly for the fact that he thrashed me at everything else!

      2. Andrew Scott Bronze badge

        Re: ah, painful memories...

        missile command, only game i could ever get past the top level and force to start over. not as much fun as star raiders, but i couldn't afford it.

    2. redpawn
      Pint

      Video Chess would occasionally cheat!

      The program was great on the 2600. The game console had very little RAM, about 64 bytes if memory serves, though there might have been a bit more on the ROM cartridge. When it was the 2600's turn, the screen would flash colors, as it needed the memory space for the screen in order to figure out its move. The result was that you got to stare at colors flashing on the screen, and not the board, while it thought. Higher levels blanked out the screen, sometimes for minutes. I was told by another player that it would sometimes cheat, which I doubted at the time. Afterwards I took closer note of the pieces on the screen. Maybe once in 20 or so games, a Video Chess piece that I had previously taken would randomly reappear on the board. One time a new queen spawned for the 2600, and not by a pawn making it to the last rank against me.

      The program also allowed the user to preset the starting position. If one set up two rows of opposing Kings then started play, the screen would turn grey and lose sync and it would make a high pitched whine from the TV's speaker.

      Great fun and not half bad at playing, even when it didn't cheat!

      1. juice

        Re: Video Chess would occasionally cheat!

        > The game console had very little RAM, If memory serves, about 64 Bytes though there might have been a bit more on the ROM cartridge

        This does make me wonder about how well these LLMs would do against the infamous 1k ZX Chess. Which as the name suggests, ran on an unexpanded zx81 with just 1k of ram!

        https://en.wikipedia.org/wiki/1K_ZX_Chess

        Admittedly, the zx81 had a faster CPU[*] and around 400 bytes of RAM to spare, but still...

        [*] 3.25MHz Z80 versus 1.19MHz 6507, though differences in architecture meant that the speed delta was probably only around 20%, given what I remember of similar speed comparisons from C64 vs Speccy flamewars in the days of yore

      2. redpawn

        Re: Video Chess would occasionally cheat!

        Looked it up, the Atari 2600 had not 64 Bytes but 128 Bytes of RAM. No wonder it often beat me!

  2. Neil Barnes Silver badge

    Let's hear it for...

    Sargon!

    1. Anonymous Coward

      Re: Let's hear it for...

      Am I reading 'El Reg' through a broken 'Stargate' or something !!!???

      Sargon Chess was/is so so far back in time, I can barely remember it !!!

      Was it any good ???

      I cannot remember !!!

      :)

      1. Neil Barnes Silver badge

        Re: Let's hear it for...

        I still have the book with the Z80 source code on my bookshelf. Sadly, I never managed to get it ported to 6502...

        1. Strahd Ivarius Silver badge

          Re: Let's hear it for...

          There was Sargon II on Apple ][

    2. spireite
      Joke

      Re: Let's hear it for...

      Vietnamese developers?

  3. cyberdemon Silver badge
    Alien

    Horsey takes King Prawn

    Turns out both ChatGPT and Copilot have the IQ of 10,000 PE teachers?

    1. davefb

      Re: Horsey takes King Prawn

      APRIL FOOL!

      1. wolfetone Silver badge

        Re: Horsey takes King Prawn

        We're talking jape of the decade...

        1. theOtherJT Silver badge

          Re: Horsey takes King Prawn

          We are talking April, May, June, July AND August fool...

          1. NoizeBoy

            Re: Horsey takes King Prawn

            Wheeze of the week…

    2. Anonymous Coward

      Re: Horsey takes King Prawn

      What kind of school did you attend? My physics teacher (dr/coach whatshisname) also coached the varsity football team.

      1. cyberdemon Silver badge
        Facepalm

        Re: Horsey takes King Prawn

        for the yanks

      2. Benegesserict Cumbersomberbatch Silver badge

        Re: Horsey takes King Prawn

        My high school Biology teacher was also coach of the rugby team. That was because he was from South Africa and thus the only one who knew how to play.

        1. Ian Johnston Silver badge

          Re: Horsey takes King Prawn

          My history teacher also coached the cross country running team. Well, not so much "coached" as "turned up at the end of practice and watched the boys in the showers", but much the same thing, really.

          1. ThomH Silver badge

            Re: Horsey takes King Prawn

            I guess that hitting El Reg's "report abuse" button probably won't do the right thing in this circumstance?

          2. David 132 Silver badge

            Re: Horsey takes King Prawn

            Hmm… I wonder if we went to the same school? My history teacher had a very bad syrup-of-figs and a slightly creepy manner around us young adolescents…

      3. Not Yb Silver badge

        Re: Horsey takes King Prawn

        Not everyone gets to go to the Colorado School of Mines...

    3. spireite

      Re: Horsey takes King Prawn

      Both AI engines were really poor, they were langoustine in their failure.

  4. original_rwg

    Battle chess

    I imagine if there could be a physical manifestation of these two A.I.'s, neither would stand a chance in a game of battle chess. I expect they might show all the physical agility of the robots playing football in this short clip https://www.bbc.co.uk/news/videos/c5ylkyrkjnzo

    The future is looking so bright! </sarcasm>

    1. Anonymous Coward

      Re: Battle chess

      The future's so shite, I gotta wear braids.

    2. LBJsPNS Silver badge

      Re: Battle chess

      Yeah, we're all screwed, but that clip was hilarious.

    3. Antony Shepherd

      Re: Battle chess

      The best part of the robot football match was when one of the robots took a dive in the penalty area!

      So realistic!

    4. HappyDog

      Re: Battle chess

      Couldn't help thinking about Father Ted when watching the robots https://www.youtube.com/watch?v=4Z7ONzqeoTE

  5. Eye Know

    It reminds me of ASIC chips

    You can be a jack of all tasks and a master of none, or a specific piece of software that operates very efficiently.

  6. Phil Miesle

    Reasoning…

    o4-mini , when challenged, determined it would use python-chess to manage board moves and update state, maintaining FEN string in context.

    I suspect a tool-capable reasoning model would beat the 2600 :)
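The "FEN string in context" idea can be illustrated without any chess library. The sketch below is not python-chess and not what o4-mini actually ran; it is a minimal stdlib-only illustration, with a hypothetical helper name (`expand_placement`), of how the piece-placement field of a FEN string expands to a board that can be checked for the 8x8 shape.

```python
# Standard starting position in FEN (Forsyth-Edwards Notation).
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def expand_placement(fen: str) -> list[str]:
    """Expand the piece-placement field of a FEN string into 8 rows of 8 characters.

    Empty squares are written as '.'. Raises ValueError if the field does not
    describe exactly 8 ranks of 8 squares. (Hypothetical helper for illustration.)
    """
    placement = fen.split()[0]          # first field: ranks 8 down to 1
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks, got {len(ranks)}")

    board = []
    for rank in ranks:
        row = ""
        for ch in rank:
            if ch in "12345678":        # a digit means that many empty squares
                row += "." * int(ch)
            elif ch in "prnbqkPRNBQK":  # lowercase = black, uppercase = white
                row += ch
            else:
                raise ValueError(f"unexpected character {ch!r}")
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} has {len(row)} squares, expected 8")
        board.append(row)
    return board


if __name__ == "__main__":
    for row in expand_placement(START_FEN):
        print(row)
```

A check like this only validates the shape of the board; legal-move generation is the part a real chess library would handle.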

  7. ibmalone

    "It claimed it could think 10–15 moves ahead — but figured it would stick to 3–5 moves against the 2600 because it makes 'suboptimal moves' that it 'could capitalize on... rather than obsess over deep calculations.'"

    Imagine, if you will, someone on reddit or a comments section (maybe even here), or perhaps usenet (RIP), spouting off about something they have only a passing knowledge of. Now imagine that you're Alan Turing and you're attempting to distinguish between that and what we see above.

    As for Copilot and chess, I conducted quite a different experiment recently, as our work 365 subscription now includes it. I asked for a t-shirt design with a particular chess opening on it and a specific text. Obviously it failed, first producing a kind of Etsy-esque view of half a design alongside half a t-shirt with a similar design (the design in question of course not being what I'd asked for).

    After managing to refine to just giving me the print image, but getting ever further away from anything resembling a chess board as opposed to an assortment of chess themed images, I asked if it could just give me an image of a chess board in the starting position. There should at least be a good number in the training data, right? What I got back is best described as Howard Staunton's fever dream. The 9x10 board did have 2 rows of chess pieces at each end. In the centre file of each stood a monstrous queen with a spreading crown of spikes; appearing to rise out of the picture, they were quite a bit taller than any of the other pieces, including the two kings that flanked each one. For some reason black's pawns were three dimensional while white's lay flat. As you stared closer into it you realised that many squares shaded from white into black. Lesser details like the strange hybrid bishops and the half-round, half-square rooks have faded in my memory. I haven't tried it since.

    (The knights were surprisingly normal.)

    1. Michael H.F. Wilkinson Silver badge
      Happy

      Sicilian defense anyone?

      I now have this mental image of what might happen if you asked for an image of the Sicilian defense. Some strange Staunton king/Mafia Don hybrid, perhaps, or Mount Etna in the middle of the board.

      1. ibmalone

        Re: Sicilian defense anyone?

        I originally asked for the Vienna opening, but it (disappointingly in hindsight) did not include Mozart or Midge Ure in the response.

        1. Paul Crawford Silver badge

          Re: Sicilian defense anyone?

          That meant nothing to me.

        2. LBJsPNS Silver badge

          Re: Sicilian defense anyone?

          How about sausages?

      2. stiine Silver badge

        Re: Sicilian defense anyone?

        Or Jean Reno from "22 Bullets".

      3. mr.e

        Re: Sicilian defense anyone?

        "Rookie mistake! You've played the frenchman's cumsock and in approximately 37 moves I'll have won!"

    2. Mage Silver badge

      Turing Test

      Much misunderstood.

      Was it really proposed as evidence of AI, or the idea that a naive human could be fooled by a chat bot? If it's about fooling a naive human, then the proposition has been true since the 1960s, but by programs that have almost no practical value at all. What cruel executive decided "chatbots" could be used for customer support? That should be a crime with a jail term. So much frustration caused to so many.

      1. steelpillow Silver badge
        Trollface

        Re: Turing Test

        I think the problem is that too many executives and PE instructors would fail a genuine Turing test. Does make it hard to tell an AI trained on their shit from the originators.

      2. The commentard formerly known as Mister_C
        Facepalm

        Re: Turing Test

        Unfortunately, if you're persistent enough to get past the chat bot then you end up with a talk bot, and their limited script set means that most will also fail the Turing test.

      3. Unicornpiss
        Meh

        Re: Naive humans

        Humans are easily duped. Without pointing fingers, look at what they will vote for.

    3. Unicornpiss

      For grins

      I asked Google's AI to create an image of chickens bowling. I got back a normal looking bowling lane but with chickens facing the wrong way, some holding bowling balls, random pins everywhere, but with a nice barnyard mural on the back wall while sheep and cows sitting on bleachers spectated. Everything was drawn very Far Side-esque (with apologies to Gary Larson). All in all a cool image, but not resembling actual bowling.

      1. LBJsPNS Silver badge

        Re: For grins

        Somehow I don't think actual chickens bowling would much resemble actual bowling either.

        1. Yet Another Anonymous coward Silver badge

          Re: For grins

          I don't know, they are pretty spherical

    4. Jedit Silver badge
      Stop

      "Now imagine that you're Alan Turing"

      The Turing test only applies to machines. It wasn't invalidated just because humans started failing it.

    5. Robin

      I just tried your query against ChatGPT to make an image of a chess opening board, it's hilarious. It's 8x7, with squares labelled A-H across the bottom but on the left and right sides it's got numbers 5,2,4,5,6,7 and blank. The pieces look weird, like the knights are mixed with rooks. And it seems like white has 2 queens whilst black has 2 kings.

  8. Mage Silver badge
    Alert

    LLMs good at some things.

    Other than boasting, (or advertising copy – is that the same thing?) what are LLMs good for?

    1. Dan 55 Silver badge

      Re: LLMs good at some things.

      I don't know, but boasting and hubris followed by failure seem a good match for management, perhaps why they think LLMs are great and are trying to foist them on the rest of us.

    2. Mishak Silver badge

      You make them sound...

      like an ideal replacement for Trump.

      Or has that already happened?

      1. steelpillow Silver badge
        Joke

        Re: You make them sound...

        Judging by Joe Biden's performance, Trump replaced an earlier generation of AI.

        1. Unicornpiss

          Re: You make them sound...

          ..except a chat bot has more sound reasoning than the Orange One.

      2. Mage Silver badge
        Terminator

        Re: You make them sound...

        Read Roland Perry's "Program for a Puppet", approx. 1979!

    3. Just Enough

      Re: LLMs good at some things.

      LLMs don't need to be very good at chess. They just need to talk a good game and convince the gullible meat-sacks that they are very good at chess. This is how they will take over.

      1. Jonathan Richards 1 Silver badge

        Re: LLMs good at some things.

        > Talk a good game

        AI Overlord could refuse to play with a Real Life board - we'd have to play with the board that existed inside the alleged mind of the AI at any given moment. Not an 8x8 board? Too bad. Two white kings? Sounds like mediæval England: there's precedent!

      2. Mishak Silver badge

        It's kind of like a cat "fight"

        Most of it is noise and bluster, only rarely resulting in combat.

    4. sarusa Silver badge
      Devil

      Re: LLMs good at some things.

      They are great at OCR and denoising images. Just... don't ask it to do anything that involves reasoning or thinking because it can't do either. It can simulate something like it if the problem space is small enough, but like Apple found out, it can do N disk Tower of Hanoi then completely fail at N+1 disk Tower of Hanoi because it's just brute forcing, not reasoning at all.

      1. ChoHag Silver badge

        Re: LLMs good at some things.

        > It can simulate something like <thinking>

        To be fair, many humans cannot, even without *any* problem space.

      2. Mage Silver badge

        Re: great at OCR and denoising images.

        Maybe software branded as AI. Not LLMs as such. AI is a meaningless "catch-all" marketing term now.

  9. FeepingCreature

    gpt-3.5-turbo-instruct is the only LLM that was ever good at chess

    Nobody knows why, but it seems likely that some chess games snuck their way into the training corpus.

    See https://blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

    gpt-3.5-turbo-instruct is still available: https://platform.openai.com/docs/models/gpt-3.5-turbo?snapshot=gpt-3.5-turbo-instruct

    As Microchess is estimated at 1200 Elo and Turbo Instruct at 1750 Elo, I suspect that would be a better fight. Make sure to use PGN text.
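For a rough sense of what a 1200 versus 1750 gap means, the standard Elo expected-score formula can be applied to the two estimates quoted above. This is only a sketch: the ratings are the commenter's estimates, the function name is made up for illustration, and expected score counts a draw as half a point, so it is not a pure win probability.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B (draw = 0.5)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Ratings as estimated in the comment above; treat them as assumptions.
    turbo_instruct = 1750
    microchess = 1200
    print(f"{expected_score(turbo_instruct, microchess):.3f}")
```

On those numbers the higher-rated side would be expected to score roughly 96% of the available points, which suggests a one-sided fight rather than a close one.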

  10. Anonymous Coward

    it could not keep track of the board.

    Yes, I recently had to diagnose an issue with ChatGPT and discovered that it "remembers" fuck all from previous conversations.

    If you ask it outright it will admit it. However if you aren't careful, it will lie about it.

    1. Anonymous Coward

      Re: it could not keep track of the board.

      Ah, my ex-wife. You found her, you keep her.

  11. vogon00

    "the absolute confidence with which an AI can spout nonsense"

    It sounds like some managers I know... I wonder who taught who?

    1. sarusa Silver badge

      The worst thing about the 'AI' replacing people at work thing is that the very obvious best people to replace with an LLM are managers and corporate execs (I swear you could replace 75% of them with ChatGPT and nobody could tell), but no, they have to go after the people who are actually doing work.

  12. Oh Homer

    LLMs are good at...

    1. Generating hype

    2. Being unintentionally funny

    3. Making lists

    Note: I am probably not an LLM.

  13. 0xF09F8687

    Searching for Deep Blue in all the wrong places.

  14. Unicornpiss
    Happy

    Jack of all trades and master of none?

    Instead of building AI data centers out of cutting-edge Nvidia chips and HPC, clearly they need to use intentionally crippled 6502s running at 1MHz or less for optimal performance.

  15. Anonymous Coward

    Pop goes the Bubble

    I can't wait for this shit to end.

    For a giggle I asked Meta AI to do some *really* simple tasks (ASCII codes to binary) and it kept answering that it didn't understand Albanian yet until I asked it why and told it the answers were nonsense.

    These things are shite.
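For reference, the "really simple task" mentioned above (ASCII codes to binary) is a few lines of ordinary code. The sketch below assumes 8-bit zero-padded output separated by spaces; the function name is hypothetical and the exact format the commenter asked for is not known.

```python
def ascii_to_binary(text: str) -> str:
    """Return each ASCII character of `text` as an 8-bit binary string, space-separated."""
    bits = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            # Only plain 7-bit ASCII is handled in this sketch.
            raise ValueError(f"{ch!r} is not an ASCII character")
        bits.append(format(code, "08b"))
    return " ".join(bits)


if __name__ == "__main__":
    print(ascii_to_binary("Hi"))  # 01001000 01101001
```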

  16. DrSunshine0104

    I guess I am confused as to why anyone would think LLMs would be good at chess? They are mediocre at most things, at least competent at writing (which is arguably the thing they are designed to do), and terrible at anything technical (at least in my field). I feel LLMs are jack of all trades but master of none.

    Not insulting the researcher, at all. Research like this is useful. But anyone's surprise at this would be like them being surprised a riding lawn mower is bad at haircuts. It could theoretically do it but isn't really designed to do so.

    An ML model that is designed to compete in chess or work in my field would be / is far more interesting than any LLM. I guess that is why I don't understand all the effort to make LLMs; they are kind of shite at everything, when models for specific applications would actually be more interesting and useful.

  17. archie99

    Gemini got it wrong too.

    I asked Gemini and it provided some weird image with 30 black pieces adjacent to each other and only 14 white pieces.

    So I pointed out its error. It came back with this:

    I apologize for the inaccuracies in the previous response. You are absolutely correct, and I appreciate you pointing out my mistakes.

    My apologies for the incorrect image and the assertion that the request was completed without providing references.

    Let's correct this.

    Revised Chessboard Setup:

    You are right, a standard chess game has 16 pieces per side, not 44 in total, and they are indeed positioned on opposite sides of the board.

    I am unable to directly generate or redraw images at this moment, but I can describe the correct setup clearly for you, which is fundamental to the game of chess:

    ... it then goes on to correctly describe the opening position board.

    When I asked why it couldn't redraw it said:

    Currently, my ability to generate images is distinct from my ability to process and re-edit them. When I use the image generation tool, I send a prompt (a text description) to a separate system that then creates the image from scratch. I don't "see" the image in a way that allows me to manipulate its pixels or move objects around within it, like a human using a graphics editor would.

    Think of it this way: I can ask an artist to paint a picture based on my description, but if the artist paints something slightly off, I can't then grab their brush and fix it myself. I would have to ask them to paint a new picture with a corrected description.

    So, when an image I generate isn't quite right, I can only generate a new one based on a refined or corrected description, rather than directly modifying the existing one. It's a limitation of how my tools are integrated.

    So it "knows" the answer from its references, it just can't describe it fully enough to the image generator to create an accurate image.

    1. Not Yb Silver badge

      Re: Gemini got it wrong too.

      No. It doesn't "know" anything. This is a stochastic (i.e., somewhat random) lossy text decompression engine, not a knowledge base. Current AI systems will never "know" anything.

  18. Anonymous Coward

    Shall we play a game?

    I am not surprised whatsoever. Until recently, ChatGPT and Copilot could not beat the user in a simple game of tic tac toe. It was always the same problem, the LLM understood how to win, but failed to prevent the player from winning.
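The failure described above (taking a win but not blocking one) comes down to a single missing check. The following is a minimal sketch, not a full minimax player and not anything the LLMs actually ran: the board is assumed to be a list of nine cells holding "X", "O" or " ", and both function names are invented for illustration.

```python
# All eight winning lines on a 3x3 board, as index triples into a 9-cell list.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def completing_square(board, player):
    """Return the index that would give `player` three in a row, or None."""
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count(player) == 2 and cells.count(" ") == 1:
            return line[cells.index(" ")]
    return None


def choose_move(board, me, opponent):
    """Win if possible, otherwise block the opponent, otherwise take the first free cell."""
    move = completing_square(board, me)
    if move is None:
        # The step the comment says was being skipped: stop the opponent's win.
        move = completing_square(board, opponent)
    if move is None:
        move = board.index(" ")  # raises ValueError if the board is full
    return move


if __name__ == "__main__":
    board = list("XX  O    ")
    print(choose_move(board, "O", "X"))  # 2: O must block the top row
```

This rule-based player can still lose to a fork; a complete search of the game tree would be needed to guarantee at least a draw.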

  19. rgrgrgrg

    Why is this surprising?

    I don't get why this is surprising - chess isn't a game based on language.

    I'm often quite impressed by how AIs can summarise documents and tidy up text, despite being an AI sceptic. But I don't expect something based on combining words in a logical way to be able to do the same for chess moves.

    Crosswords, riddles, stuff like that they should be brilliant at.

  20. BiffoTheBorg

    Copilot and ChatGPT got schooled by an Atari 2600’s Video Chess, proving their “grandmaster” swagger is just a 128-byte bluff! These LLMs played like pawns drunk on pixelated wine, blundering into checkmate faster than you can say “insert cartridge.” Stick to crosswords, boys—your chess game’s so weak, it’s practically a knight-mare!

    1. JulieM Silver badge

      Crosswords? Can ChatGPT really solve something like "Come late somehow to fashion disaster? (5, 3)" or "One bothered by radiation, we hear, in Cornwall (7)" ?
