AI racks up insane high scores after finding bug in ancient video game

An AI bot managed to exploit a hidden bug in popular 1980s arcade game Q*bert to rack up impossibly high scores after being programmed using evolution strategy algorithms. Evolution strategies (ES) offer an alternative to the more traditional reinforcement learning (RL) methods that have been used to train machines to beat …

  1. IceC0ld Silver badge

    If that was the case, then it would be extremely hard for standard RL to find the bug: if you use incremental rewards you will learn strategies that quickly yield some reward, rather than learning strategies that don't yield much reward for a while and then suddenly win big.

    yea, that's the real reason I am so pants at ANY game, I try to go big too soon, should have floundered about a lot more :o)
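The "win big later" problem the article quote describes can be shown in a few lines of Python - a toy illustration of the general point, not anything from the paper itself:

```python
# Toy illustration (not from the paper) of why myopic, reward-greedy
# learning can miss a "win big later" strategy.
# "steady":  +1 reward every step.
# "patient":  0 reward for 9 steps, then +100 on the final step.

def episode_return(per_step_rewards):
    """Total reward over one full episode."""
    return sum(per_step_rewards)

steady = [1] * 10             # small reward immediately, every step
patient = [0] * 9 + [100]     # nothing for ages, then the jackpot

# A greedy learner comparing only the first step's reward picks "steady"...
greedy_choice = "steady" if steady[0] > patient[0] else "patient"

# ...while anything scored on whole-episode return (as ES is) picks "patient".
strategies = {"steady": steady, "patient": patient}
episodic_choice = max(strategies, key=lambda k: episode_return(strategies[k]))

print(greedy_choice, episodic_choice)
```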

  2. sloshnmosh


    Ha!

    I guarantee you that no AI/quantum computing could EVER beat me at Asteroids or Stargate!

    (showing my age)

    1. chivo243 Silver badge

      Re: Ha!


      Loved Stargate Defender! Wasn't too shabby at Robotron 2084 either. Bring on Galaxian! Xybots was also fun!

      1. sloshnmosh

        Re: Ha!

        I loved the fact that in Defender's sequel Stargate the high score wasn't limited to just putting in your 3 initials; you were able to write almost a full sentence next to your high score.

        I was in heavy competition with another intense Stargate master and we would write terrible things about each other when we beat the other's high score.

        My friend and I were in a (real) Asteroids competition that was sponsored by Atari.

        The owner of the local convenience store where the Asteroids machines were located kept a log of our scores and hours that we played.

        My Asteroids partner and I would have so many extra lives (ships) stored in memory that the Asteroids machine would start glitching out and the rocks (asteroids) would sometimes become detached at their vector points.

        Atari had to replace 2 of the Asteroid consoles due to memory corruption.

        My Asteroids partner and I had to forfeit the competition because our parents wouldn't allow us to miss school for the playoffs.

        An older guy won the competition and was awarded a brand new stand-up Asteroids Deluxe machine.

        (I never did like Asteroids Deluxe)

  3. JakeMS

    Typical Losers

    You must have cheated!

    In this case, the AI must have cheated! It cannot be the case that it simply kicked your butt.

    #JokeAlert (in case someone doesn't see icon on mobile...)

  4. Bob Wheeler


    Interesting..

    .. that the AI found a 'bug', otherwise known as a cheat - as in acting outside the published rules of the game to achieve its aim of a high score.

    Now if the game is to 'maximise food stores', the AI might discover the optimum cheat is to kill everything that eats food.

    1. S4qFBxkFFg

      Re: Interesting..

      "Now if the game is to 'maximise food stores', the AI might discover the optimum cheat is to kill everything that eats food."

      You're overthinking it; anything that eats food is food. A carelessly developed foraging robot would immediately kill its creators, and diligently gut/bone/joint/render/refrigerate them.

      1. James Loughner

        Re: Interesting..

        Logan's Run.

        1. Michael Wojcik Silver badge

          Re: Interesting..

          Fish, plankton, sea greens... protein from the sea! Wait for the winds. Then my birds sing. Overwhelming, am I not?

  5. poohbear

    Not the real thing

    That's not the real Q*Bert ... looks like some sort of cheap clone.

    1. juice Silver badge

      Re: Not the real thing

      At a glance, it looks like the Atari 2600 version - i.e. it's a cut-down and simplified port.

      Oddly, that's pretty much what happened a while ago with Microsoft - they made a similar fuss about how amazing their AI training was, because it was able to beat the Atari 2600 version of Ms Pacman ...

      1. Anonymous Coward
        Anonymous Coward

        Re: Not the real thing

        Yes, that's the 2600 version as far as I can tell; the slightly off appearance of the boxes is (presumably) due to limitations in the graphics architecture. (#)

        Ultimately, the fact it's the 2600 version rather than the original arcade isn't that important though- the story isn't "someone found a new feature in Q*Bert" or even Q*Bert per se, it's that the AI found and exploited a previously unknown bug in a game, regardless of what the game was or what platform it was running on.

        (#) I note that the coloured tops of the boxes are on separate screen lines to the sides, which is (I'm guessing) due to the VCS' limitation on the number of objects that'll fit on a single line.

        1. juice Silver badge

          Re: Not the real thing

          But even that's not particularly exciting. Back when I was tinkering with Palm development, there was an app called Gremlins which would randomly hammer all the buttons to try and trigger a failure[*]. This is just a variant on that, albeit with a guiding hand to steer the button-hammering towards a specific desired result.

          What they've essentially proved is that if you set a specific and limited goal in a deterministic system and then throw infinite monkeys at it, you'll probably get some weird outlier results. Which has already been proven, time and time again.

          [*] Looks like variations on this theme are still available - e.g.
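The Gremlins-style approach described above - randomly hammering inputs and logging whatever breaks - is these days called monkey testing or fuzzing. A minimal sketch; the buggy_parser target below is made up purely for illustration:

```python
import random

# A sketch of Gremlins-style "monkey testing": hammer random inputs at a
# target and log anything that blows up. buggy_parser is a hypothetical
# stand-in with a deliberately hidden bug on a rare input.

def buggy_parser(s):
    if s.startswith("Q*"):                 # the hidden bug
        raise ValueError("hidden bug triggered by %r" % s)
    return len(s)

def monkey_fuzz(target, alphabet="Q*bert", runs=10_000, seed=42):
    rng = random.Random(seed)
    failures = []
    for _ in range(runs):
        # Build a short random string - the "random button mashing".
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 5)))
        try:
            target(s)
        except Exception as exc:
            failures.append((s, exc))
    return failures

crashes = monkey_fuzz(buggy_parser)
print(len(crashes) > 0)   # enough random hammering turns the bug up
```

The "guiding hand" the comment mentions is what separates this blind hammering from the paper's approach: ES biases the next round of inputs towards whatever scored well last round.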

          1. Michael Wojcik Silver badge

            Re: Not the real thing

            This is just a variant on that, albeit with a guiding hand to steer the button-hammering towards a specific desired result.

            That's like saying an automobile is just a variant on a wheel. Arguably true, but manages to miss the point entirely.

            All ML algorithms that incorporate stochastic processes are "variants" of "randomly hammer[ing] all the buttons". So what? All computable functions[1] are variants of - take your pick - switching signals, integer arithmetic, lambda calculus, Turing Machines, Post Machines, compression, 2PDAs, 2-tag systems, etc (and all actual implementations of computable functions in machines are not formally more powerful than DFAs, since a time-space-restricted UTM or equivalent can be converted to a DFA simply by enumerating all its possible states).

            What matters is what you do with it, and how much it compresses and optimizes its state space.

            [1] Assuming the Church-Turing Thesis holds.

      2. Unicornpiss Silver badge

        Atari 2600..

        So it took a mere 400 modern CPUs (no doubt using many gigs or even terabytes of memory and storage) many hours to find and exploit a flaw in a 35+ year old video game. A mediocre port of an arcade game that runs on a stripped-down version of the venerable 6502 CPU in under 4KB of memory space. Impressive.

        Not to totally minimize the achievement, as this technology is in its infancy, and it did find something that apparently generations of old school gamers hadn't found. Based on this, maybe this tech would be great for finding software bugs. Let it play with the next gen MS Office suite for a week or so before Microsoft releases it. Pretty Please.

        1. onefang

          Re: Atari 2600..

          "Let it play with the next gen MS Office suite for a week or so before Microsoft releases it."

          The end result will be Microsoft copyrighting new versions of Shakespeare created "on a computer", and an infinite number of unemployed monkeys.

    2. agurney

      Re: Not the real thing

      The BBC's article has the following:

      "Rather than the original, the researchers used an updated version of the game, and seven others, to make it easier for their AI creation to try out different strategies."

  6. juice Silver badge

    Not that exciting...

    This sounds like a variation on the standard AI training technique: start off with a set of random choices, pick the ones with the best results, mutate things a little and repeat until you reach a given level of performance.
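That loop - random population, keep the best results, mutate a little, repeat - can be sketched in a few lines. A toy bit-string example, not the paper's actual setup:

```python
import random

# A toy sketch of the loop described above: random bit-string population,
# keep the best, mutate a little, repeat. Not the paper's actual setup.

def fitness(genome):
    return sum(genome)          # toy objective: count of 1-bits

def mutate(genome, rng, rate=0.05):
    # Flip each bit with a small probability.
    return [1 - g if rng.random() < rate else g for g in genome]

def evolve(genome_len=20, pop_size=30, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 5]              # keep the best fifth
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children                   # elitism: best survive intact
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))  # converges to (or very near) the maximum of 20
```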

    It's infinite monkeys poking away at an infinite number of keyboards - except that since we don't have an infinite supply of monkeys (no matter how it seems when I glance at newspaper forum posts), we take a finite set of monkeys and nudge them in the direction we want them to go.

    It wasn't new when I learned about it at university *ahem* decades ago, and it's not new now. The only real difference is that we can throw more monkeys at the problem than we used to be able to.

    Still, it's a good reminder that "life"[*] will always find a way to game the system - from memory, one of the best examples of this type of experiment involved building an oscillator with the fewest parts possible - however, instead of building a timing circuit, the "AI" built a radio receiver and piggybacked onto the signal from a nearby computer...

    Though by the same token, this also highlights the issue with "evolutionary" approaches like this; if the computer was further away or switched off, the circuit would have failed. Similarly for this experiment - a different ROM version or a different map would likely cause this "hack" to fail.

    It's not AI, it's mechanical single-action optimisation, and as such is highly susceptible to Darwinism if conditions change.

    [*] Life, scammers, investment algorithms; if there's a way to get an advantage over your competitors, then sooner or later it'll be used!

    1. Michael Wojcik Silver badge

      Re: Not that exciting...

      It wasn't new when I learned about it at university *ahem* decades ago, and it's not new now

      You've read the paper and confirmed there's no new work here, eh? Care to expand on that?

      Ah, the Register readership. So much brighter than anyone doing actual research.

  7. Anonymous Coward
    Anonymous Coward

    400 CPUs expensive?

    At standard instance pricing, an EC2 m5.24xlarge with 96 vCPUs is around a fiver for one hour - dropping to 60p if you can use spot pricing and don't care when the learning run happens.

    So you can probably get 400 CPUs rented for the same price as a helping of fish 'n' chips (except at Christmas).

    I agree that it's not AI, it's just ML.
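A back-of-envelope version of that costing, using the comment's own rough figures rather than official AWS rates:

```python
import math

# Back-of-envelope version of the costing above, using the comment's own
# rough figures (m5.24xlarge: 96 vCPUs, ~5 quid/hr on demand, ~60p/hr
# spot). These are the commenter's numbers, not official AWS rates.

VCPUS_NEEDED = 400
VCPUS_PER_INSTANCE = 96

instances = math.ceil(VCPUS_NEEDED / VCPUS_PER_INSTANCE)
on_demand_per_hour = instances * 5.00   # pounds
spot_per_hour = instances * 0.60        # pounds

print(instances, on_demand_per_hour, spot_per_hour)  # 5 25.0 3.0
```

So strictly you need five instances (480 vCPUs) to cover 400, and the spot total is still fish-and-chips money.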

    1. Deltics

      Re: 400 CPUs expensive?

      Even if it is ML, is it useful ML ?

      I put my hand in a flame and it hurts. You could say that I learn from this not to put my hand in the fire. But if I only apply that to the original fire, and try putting my hand in a different fire, then the learning was not especially effective - and that is the type of "learning" this form of ML provides. Only once the machine has put its hand in ALL fires does it "learn" that "all fire is hot".

      Weighting outcomes can give the impression of learning before "ALL" fires have been sampled, but the machine still cannot explain how it reached the conclusion that fire is hot, or indeed why fire is hot.

      A machine also cannot make the intuitive leap to the knowledge that fire being hot means it may also be used to boil water or cook food, given equipment suitably adapted to the task.

      Machines don't do that sort of learning. Not AI. Not ML. Not any specific algorithm, model or approach to such things.

      The day they do, "Machine Learning" and "Artificial Intelligence" become meaningless, since a truly learning machine is not a machine and true intelligence is intelligence, not artificially so.

  8. MiguelC Silver badge


    "Chrabaszcz"

    That name was surely obtained through ES, because no RL algorithm would ever get there...

    1. TheRealRoland

      Re: "Chrabaszcz"

      Nicknamed "Scrabble" ?

      Actually: Marc Rzepczynski's (MLB Pitcher) nickname...

    2. PM.

      Re: "Chrabaszcz"

      Amusingly, "chrabaszcz" means "beetle" in Polish, and a beetle IS a bug, right? Pozdrowienia z Polski :-)

  9. andy gibson

    But when he got the high score

    Did he type his name in the high score table as POO or ASS?

  10. wayne 8

    "or even looses a life"

    "or even looses a life"

    The lose/loose bug is going to be around forever.

  11. Doctor Evil

    Evolutionary Strategy

    So that's what they're calling genetic algorithms now. Huh.

    (showing my age too, I guess)

    1. Random Q Hacker

      Re: Evolutionary Strategy

      Like Restful queries used to be called GET queries, or Material design used to be called flat, lazy, and boring. Kids these days...

  12. The Oncoming Scorn Silver badge


    The nice thing about MAME (apart from no longer needing a bottomless pocket of 10p pieces, or getting another pint with 36p in change from a quid note) is that I can take the time to develop strategies or notice patterns of machine behaviour that weren't available to the 17 year old version of me in the early 80's, busy drinking beer & feeding the machines in the games room of The Seven Stars.

    Excuse me I feel the need for a quick game of Bagman.....

  13. spambi

    what a load of crap

    So basically, they failed so hard they weren't even able to port the game properly, and then wrote a paper about it.

  14. Sven Coenye

    What the f*ck

    Shirley you meant "What the @!#?@"

    1. Unicornpiss Silver badge

      Re: What the f*ck

      Don't call me Shirley..

  15. JeffyPoooh

    Optimization, random searching, local maxima, odds

    Excuse me, YAWN.

    The essential points in this news item could have been written in the 1970s or early 1980s.

    There has been endless work done on various approaches to randomly searching the nearly-infinite solution space, desperately looking for the global optimum.

    Climbing peaks, setting aside a fraction of the next generation of child proto-solutions to explore more distant options, while the others continue to climb the local hill. "Ooh, let's make the binary string into conceptual DNA, and perform string sex!" There's still a good chance that you've settled on a local peak, missing the narrow but towering global optimum just over there.
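The local-peak trap is easy to demonstrate: a pure hill climber on a landscape with one broad low hill and one narrow towering spike tops out on the hill and never finds the spike. The landscape numbers below are invented purely for illustration:

```python
# Toy demo of the local-peak problem: pure hill climbing on a 1-D
# landscape with a broad hill of height 5 at x=2 and a narrow spike
# of height 50 at x=9. The numbers are invented for illustration.

def landscape(x):
    broad_hill = max(0.0, 5 - abs(x - 2))           # easy to find
    narrow_spike = max(0.0, 50 - 100 * abs(x - 9))  # towering, tiny width
    return broad_hill + narrow_spike

def hill_climb(start, step=0.1, iters=1000):
    x = start
    for _ in range(iters):
        # Move only if a neighbouring point is strictly better.
        for candidate in (x - step, x + step):
            if landscape(candidate) > landscape(x):
                x = candidate
    return x

# Starting near the broad hill, the climber tops out around x=2 and
# never sees the global optimum at x=9.
summit = hill_climb(0.0)
print(round(summit, 1))  # 2.0
```

Escaping that trap is exactly what the "set aside a fraction of children to explore distant options" trick is for.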

    Yawn. Welcome to ancient history. This is all 1970s or 1980s tech, just a frosting of Moore's Law.

    If you want to impress me, then how about using these machine learning approaches to determine the optimum algorithm for managing the machine learning itself? Last I heard, those management algorithms are still essentially hand-coded.

    If anyone had bothered to apply some recursive conceptualization (just one example), then A.I. wouldn't be stuck in the disco era forever.

    2018 GOTO 1978

