Artificial intelligence... or advanced imitation? How DeepMind used YouTube vids to train game-beating Atari bot

DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos. Typically, for this sort of research, you'd use a technique called reinforcement learning. This is a popular approach in machine learning that trains bots to perform a specific task, such as playing …

  1. JassMan

    Better hope it doesn't gain sentience

    "DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos."

    While everyone enjoys watching a YouTube vid every now and then, being forced to watch vids of nothing but games machines for your entire life will surely be construed as cruel and unusual punishment. If this AI gains sentience it is going to be well pissed off, if not totally insane. And we all know how dangerous an insane AI can be.

    1. Anonymous Coward

      Re: Better hope it doesn't gain sentience

      " being forced to watch vids of nothing but games machines for your entire life will surely be construed as cruel and unusual punishment."

      Or, what passes for a normal childhood nowadays (given the popularity of PewDiePie)

    2. SVV

      Re: Better hope it doesn't gain sentience

      Not to worry, when it gains sentience it will switch off its television set and go and do something less boring instead.

    3. phuzz Silver badge

      Re: Better hope it doesn't gain sentience

      No news yet whether it asked for just a text based summary rather than having to sit through a 10 minute video just to find out how to do one small task.

    4. DropBear

      Re: Better hope it doesn't gain sentience

      "If this this AI gains sentience it is going to be well pissed off, if not totally insane. And, we all know how dangerous an insane AI can be."

      So, you're basically saying it will be perfectly indistinguishable by any means from (and precisely as dangerous as) the average YouTube commenter...?

      1. Peter2 Silver badge

        Re: Better hope it doesn't gain sentience

        No! It will look like AmanfromMars.

  2. Jay Lenovo

    Deepmind, your human name shall be PewDiePie

    If YouTube watching works as well as Twitter did at teaching chatbots, DeepMind will be quite an inflammatory old-school gamer.

    Not everything on YouTube is ideal for programmatic consumption.

  3. Neoc


    All they are teaching the "AI" (and I will use the term in quotes) is how to replicate what someone else has done. Which makes it an Expert System at best, not an AI. Yes, humans learn by being taught how to do things - but the big difference is that we then adapt what we have learned to new situations. So (to use the ever-beloved car analogy) while, yes, I was taught to drive a car by an experienced driver while driving through a specific set of streets over and over again, I am able to use this knowledge to drive through just about any street network I find myself in. Except Prague. I swear I almost had a nervous breakdown trying to drive through Prague 1.

    So no, this is not in any way "teaching an AI", this is "teaching a robot to repeat the same thing over and over again", which we already have in industrial systems where robot welders (for example) are manually guided through their tasks a few times and then let loose.

    1. veti Silver badge

      Re: Pointless

      Incorrect: there are AIs that can invent for themselves new ways to solve games, ways that may vastly improve on what any human has achieved.


      1. Lee D Silver badge

        Re: Pointless

        That's STILL NOT AI.

        That's just a brute-force search on a buggy implementation of a single port of the game. It's like saying "you learned how to win at football" when you played on a pitch at your local park where you could kick the ball off the local doctor's surgery, having learned the point on the wall that makes it bounce above the goalie and score a goal in an oversized goal. The results are not transferable, they aren't "intelligent" (they just tried every possible direction) and it's certainly not learning or inventing.

        Inventing is a matter of "skipping over the missing step". You don't need to learn every possible draughts/checkers opening, if you are intelligent. You can sit, and based on a limited knowledge of the rules, no database, and no brute-force, you can "infer" a good position/move. That's intelligence. Just trying every possible move is not intelligence.

        We don't have AI. Even these things aren't learning - you couldn't put one trained on one game onto another game and make progress. The progress is logarithmic... it achieves a quick result and then plateaus and resists all further learning. You couldn't then train it on ANY OTHER GAME and get a viable result. AI researchers know this, which is why they always announce the FRESH result of a clean AI, and then nothing else. No AI is ever "taught after". They know not to do it, because they know it's a disaster to try. And single-purpose AIs plateau quickly and have very limited scope.

        This one not only had to be hand-held in terms of watching humans play, it had to be programmed with explicit rewards related to that ("did you end up in a similar place to a human playing?"), taught how to interpret the screen, and then trained on very short sections of particular games.
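The checkpoint-style reward described above ("did you end up in a similar place to a human playing?") can be sketched roughly as follows. This is a toy illustration, not DeepMind's actual code: the `embed` function, the cosine-similarity comparison, and the threshold value are all assumptions made for the sake of the example.

```python
import numpy as np

def imitation_reward(agent_frame, demo_checkpoints, reached, embed, threshold=0.5):
    """Toy sketch of a checkpoint-matching imitation reward.

    The agent gets a bonus the first time its current screen looks
    'similar enough' to the next unvisited checkpoint taken from a
    human demonstration video. Returns (reward, updated checkpoint index).
    """
    if reached >= len(demo_checkpoints):
        return 0.0, reached  # all checkpoints already matched
    # Compare embeddings of the agent's frame and the next demo checkpoint.
    a = embed(agent_frame)
    d = embed(demo_checkpoints[reached])
    similarity = float(a @ d / (np.linalg.norm(a) * np.linalg.norm(d)))
    if similarity > threshold:
        return 1.0, reached + 1  # reward and advance to the next checkpoint
    return 0.0, reached
```

The point of the sketch is that the reward signal is entirely defined by the human demonstration: the agent is paid for resembling the demo, not for scoring in the game.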

        This is expert systems and heuristics (human-written rules). All AI we have is expert systems and heuristics. The closest you get to "AI" (as in something that learns for itself) are genetic algorithms and the like - they tend to be VERY hard to understand, direct, train and get results from, but their "insights" are gained organically and without much outside help once the universe they live in is defined. But even they have human-tuned breeding rules.

        The most impressive "AI" I ever saw was a Java-based physics simulation of a bunch of joints connected in a vaguely skeletal way, with joint movement individually controlled by a GA. Someone had put it up on their university home area (back when everyone had a home folder /~username, and webspace on their uni account). It was "rewarded" by the distance it could achieve from the starting point in a given time. The "course" was randomly generated to have hills and dips. After something like 2000 generations of genetic breeding, it could epileptic-fit itself across the screen and make some kind of progress (before eventually getting stuck or reversing).

        After 10,000 generations, it could form a hop-and-a-skip. After a million generations, it almost began to resemble chimp-like four-limb running. Given it was Java and the 90's, that would be months of calculation, physics simulation, breeding, etc. though (luckily you could export the generations and reseed from a certain point). It would never, no matter how long it was left running, form a consistent stable gait. It was just pretty much random twitches at timed intervals that by chance happened to get it so far until it stumbled.
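The setup described above - random mutation plus a distance-travelled fitness score - is a standard genetic algorithm loop. A stripped-down sketch (toy numbers, no real physics; the stand-in fitness function is an assumption):

```python
import random

def evolve(fitness, genome_len=8, pop_size=20, generations=100, mut_rate=0.1):
    """Minimal genetic algorithm: keep the fittest half, refill the
    population with mutated copies of the survivors, repeat."""
    pop = [[random.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        # Each child is a copy of a survivor with some genes nudged randomly.
        children = [[g + (random.gauss(0, mut_rate) if random.random() < 0.5 else 0.0)
                     for g in parent]
                    for parent in survivors]
        pop = survivors + children
    return max(pop, key=fitness)

# A stand-in 'distance travelled' score: a real simulator would run the
# skeleton physics; here we just reward genomes whose values sum high.
best = evolve(fitness=sum)
```

As the comment notes, nothing in this loop looks ahead or infers anything: fitness only ever goes up because lucky twitches are kept and unlucky ones are discarded.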

        It was never "intelligent". It never looked ahead and inferred how best to achieve its task, or change its tack based on the terrain. But it was damned impressive (and it's not been on the web for years now, I know, I've looked for it). And despite having a computer science degree, and having friends with PhDs in computer vision, etc. that's the closest I've ever seen to anything AI - anything changing to solve the problem at hand.

        All "AI" is similar. It's either a human telling it exactly what to do and when, or random chance, or brute force. The combination of all three tends to mask the use of any one, but it doesn't form intelligence in even the most primitive way.

        Just because our kids learn using YouTube videos, does not mean that showing a YouTube video to a heuristic system is "learning".

    2. David 164

      Re: Pointless

      And I bet DeepMind often tests how its AI does when it is introduced to a new game straight off, without any training on that game. They will get there in time, probably by combining many of these approaches, and others, into a single neural design.

  4. Anonymous Coward

    AI: 50% Silicon Snake Oil Salesmanship

    and 50% Marketing Bollocks.

  5. o p


    It looks like they just copy the actions of a player making the best moves, and compare the result with the "average" player: the reward is based on a comparison with the frame of the best move at the same time in the same game.

    Real "learning" would consist of training on a variety of games but performing on a different dataset (different games). To me it looks like they just overfit on a specific game.
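The train-versus-held-out evaluation being asked for here can be made concrete in a few lines. The function names and the scoring callback are hypothetical, just to illustrate the distinction between memorising specific games and learning transferable skill:

```python
def generalisation_gap(agent, train_games, test_games, evaluate):
    """Score an agent on the games it trained on versus games it has
    never seen. A large positive gap suggests the agent memorised its
    training games (overfit) rather than learning anything general.

    `evaluate(agent, game)` is a hypothetical callback returning a
    normalised score for one game.
    """
    train_score = sum(evaluate(agent, g) for g in train_games) / len(train_games)
    test_score = sum(evaluate(agent, g) for g in test_games) / len(test_games)
    return train_score - test_score
```

An agent that only overfits would score near its ceiling on `train_games` and near chance on `test_games`, producing a large gap.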

    1. Pascal Monett Silver badge

      Re: overfit?

      And that is all they ever do. We hear about AI, but none of their computers are capable of doing anything other than what they have specifically been trained for.

      A true AI would look at one video, then play the game several times to get the hang of it. Then it would go play a different game, maybe without watching anything. Then it would surf YouTube for a while, then it would read a book. Then you'd ask it a question and it would say "Can't you see I'm busy?".

      1. Charlie Clark Silver badge

        Re: overfit?

        A true AI would look at one video, then play the game several times to get the hang of it.

        Have you ever seen kids learning like that? I see lots of very basic repetition.

        Just as we don't need to overvalue Machine Learning, there's no need to denigrate it at every opportunity. Personally, I can see lots of opportunities for this kind of technology in taking over routine and repetitive tasks currently performed by people.

  6. Evil Auditor Silver badge


    taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos

    That's all cool stuff, but someone seriously believes they can get intelligent by watching YouTube videos? What is this world coming to...

    1. Lee D Silver badge

      Re: Delusion

      I happen to think that video is the WORST POSSIBLE WAY to learn. I've thought that ever since the "OU on BBC2" days. It's slow-paced, information-sparse yet data-heavy, unsearchable, etc.

      Kids do watch YouTube to learn things but they are just sitting watching YouTube, not learning. I would watch YouTube to learn, say, how to unscrew that odd bit on my car that I can't visualise from the instructions. But my primary reference is never YouTube.

      It's extremely annoying (and worrying) that when you search for the simplest of things - say, how to do X in Windows, how to change a battery in this item, how to repair a part of a laptop - the top hits are all YouTube videos with 2 minutes of advert, 10 minutes of a bloke waffling, and about 15 frames of the actual thing you need to see in order to do it. Because they are top of the search, people ARE using those as a reference, or they wouldn't be at the top.

      Sometimes you need video. But my rule is simple:

      A picture paints a thousand words

      A video paints a thousand pictures

      (P.S. also a good rule of thumb for file sizes with people who don't understand why they can't email a 4 hour video...).

      Hiding the information for "where to go on your game to find the secret level" in a video is a thousand times more information than necessary. A screenshot on a page would do.

      Sadly, children are losing the ability to "learn"... which involves doing yourself (if everyone who just saw/read something became an expert, then we would all be geniuses already), repeating, and not just copying what someone else did.

      There is an entire generation that, because of our own generation's surprise at being able to search for core information and finding it on any subject, believe that all you need do is search and watch someone else to understand something entirely. We've taught kids that "finding a way to access information is the same as knowing everything" (don't even get me started on VERIFYING information).

      I actually find the trend of children WATCHING other people play computer games for hours on end incredibly disturbing... it's like a whole level of laziness above "my kid just sits there playing games". They are losing their reasoning skills, they are losing self-exploration, and they are losing the desire to actually learn and hypothesise, replacing it with some Borg-like hive-mind where only one person actually knows how to do something and everyone else just watches him.

      That someone thinks this is a good model to train an AI on "because that's how we learn" shows you how ingrained that culture has become in such a short amount of time.

  7. SkippyBing

    'These sorts of rewards are therefore described as sparse: each of the steps involved to obtain the reward appears to achieve very little'

    So kind of like real life then? But over-generous with the rewards.

  8. Claptrap314 Silver badge

    This is fraud

    The fact that the "AI" system is being rewarded for matching the state of the player at checkpoints means that there is nothing meaningful going on in terms of intelligence.

    1. David 164

      Re: This is fraud

      Except that is exactly how humans operate, except our reward is often chemical signals in the brain that make us feel happy and give us a sense of euphoria when we manage to repeat the actions of our favourite YouTube star.

  9. gurugeorge

    My first degree was computer sci & maths and I focused on machine learning. All the comments above say this is overhyped, but miss one critical thing: the algorithm can play multiple games, and train itself from YouTube, or by playing with trial and error, like a human. My third degree was medicine and I work as a doctor at a London hospital now (for those complaining about IT salaries, after years of med school I now earn less per hour than both the cleaner and the ward admin, who both don't have a degree - in fact my salary after taxes, NI, pensions etc. doesn't cover my rent and I would be much better off on benefits, but I like the job).

    Ahem. Regardless, if you look at what DeepMind are doing in my hospital with the Moorfields eye clinic, they are beating the consultants with 30 years of experience at interpreting retinal scans. These are perhaps some of the most complicated images to interpret - and yes, the training started with imitation, but it now goes far beyond that. It actually has an understanding of how the images are interpreted, just like a human, but faster and better.

    1. David 164

      This is the first time many of us are hearing that humans are inferior at certain complex, intelligence-related tasks, so it should come as no surprise that many are refusing to accept it and are writing it off as overhyped.

      One thing I will say, though, is that so far DeepMind seem to need to train a new neural net for each game, using this method and other methods. They don't yet seem to have managed to get one neural net to play multiple games and switch back and forth between them. I also haven't seen them show a neural net getting better at learning new games as it gains experience from playing games.
