DeepMind's latest protein-solving AI AlphaFold a step closer to cracking biology's 50-year conundrum

DeepMind says its AlphaFold machine-learning software can now rapidly predict the structure of proteins with decent accuracy, and could one day help us develop drugs faster. In its announcement on Monday, heralded as a scientific breakthrough by some, the Google stablemate claims to have solved a 50-year problem in biology: …

  1. Bitsminer Silver badge

    Arugula

    So, why does it taste so bad? Is it a folded protein?

    1. Tom 7

      Re: Arugula

      You're eating it wrong.

      1. Pascal Monett Silver badge
        Coat

        He's eating it. That's what's wrong.

  2. Schultz
    Boffin

    This is great news ...

    but I am still a bit skeptical, because the number of DeepMind "breakthroughs" in recent years rivalled the number of Elvis sightings. Without disrespect to the King, I don't mean this as a compliment, because these breakthroughs vanished as fast as the King himself on every occasion.

    Maybe this is finally the problem where AI can shine. DeepMind might be able to predict new structures based on recognizing similarities to known structures. That should be a problem well suited to AI and big number-crunching computer systems. It won't predict anything new or surprising, but it might recognize patterns that have been seen before. With ever-increasing databases of known structures, this will only get better with time.

    There is one major danger in moving away from hard experimental data and towards computer prediction: you may miss unexpected new structure elements if your analysis relies on comparison to known structures. As modern structure determination increasingly relies on computer modelling (you need much less data if you can substitute some solid speculation into your analysis), we might eventually close the circle: computers calculate "experimental" structures based on sparse data and a lot of pattern recognition, which then confirms modelled structures that are based on the same type of pattern recognition. (If it has 4 wheels, it's a car and you can stop it right there. Unless you think the attached backhoe has any significance.)

    1. Dave 126 Silver badge

      Re: This is great news ...

      Determining the physical structure of an individual protein created by a string of amino acids via X-ray crystallography takes many months. It can be done, and indeed it forms the backbone of this long-running competition (since 1994), but it is slow going.

      Being able to calculate a protein's shape with 85% accuracy within days is a huge step forward.

      It is never going to replace physical testing of candidate proteins, but it will drastically reduce the search space. It is not closing the door to serendipitous discovery.

      Remember that the PlayStation 3 Folding@Home project was at one time the world's most powerful computer system, and was created to study this very protein folding problem.

      1. Boothy

        Re: This is great news ...

        Not sure why you're calling Folding@Home a PlayStation 3 project? It's a PC project (Windows, Linux and MacOS), and has been since 2000.

        Yes, the PlayStation 3 had a client for a while, launched back in 2007, and it gave a boost to the overall project performance at that time (and likely some extra exposure/publicity). But unfortunately Sony dropped the client in 2012, and didn't launch a new one for the PS4.

        Looks like in 2012, PS3 accounted for about 15% of Folding@Home performance, a little more than Mac+Linux combined at the time. (The rest being Windows and GPUs).

        More recently, Folding@Home was the first system to hit over an exaflop, back in March this year, and got to 2.43 exaflops by April 2020 (in comparison, Fugaku, the world's fastest supercomputer, runs at about 0.5 exaflops). I would assume this was mainly due to COVID piquing interest in protein folding again, although it seems to have dropped back down to about 0.23 exaflops currently.

    2. Mage Silver badge
      Big Brother

      Re: This is great news ...

      Agreed.

      Peer reviews, transparency, actual real-world deployments, not simply something designed to feed Alphabet with personal information.

      It's been fed a massive amount of human-created and curated data, so how much of this is simply their resources feeding the specialist database and pattern matching coupled to some sort of folding algorithm?

      So far this is PR spin.

      1. Dave 126 Silver badge

        Re: This is great news ...

        > So far this is PR spin.

        Yet it is published in Nature, won a longstanding competition (in which only the organisers knew the actual shape of protein molecules after months of study), and the code is available on GitHub.

        But hey, don't let that stop you from just reiterating your stock response again.

      2. Michael Wojcik Silver badge

        Re: This is great news ...

        > how much is simply their resources to feed the specialist database and pattern matching coupled to some sort of folding algorithm

        Not much. None, to a first approximation. Perhaps if you took the time to learn a bit about how attention ANNs work you'd have something useful to say on the matter.

    3. Tom 7

      Re: This is great news ...

      Last year an AI came up with solutions for the three-body problem 100 million times faster than 'normal methods'. Humans largely tend to work things out from first principles in a progressive way, whereas AI can sometimes find links between input and output that may involve a surprising collapse of several layers of human struggle into something easily predictable over some limited domains. It's likely the folding AI will fail badly on many proteins and will have to be checked by F@H.

      Humans being humans, it's likely our understanding of the reasons behind particular collapses, which can drive science on in leaps and bounds, will be held up by willy waving and the desire to be first to publish and, worse, patent.

    4. Danny 2

      Re: This is great news ...

      I was skeptical two decades ago when all the young computer students I met were specialising in AI, yet without substantial results from the field. With this breakthrough I admit I was wrong. I can finally hold my head high around my first love who is a published molecular biologist - admittedly I never worked in AI but I did read a book on it.

    5. Cuddles

      Re: This is great news ...

      "There is one major danger in moving away from hard experimental data and towards computer prediction: You may miss unexpected new structure elements if your analysis relies on comparison to known structures."

      Exactly. You may also just be wrong, given that even for known structures it's still not perfect. It's a neat idea and certainly has some useful applications. But all these reports saying that it's now just as good as actual measurements and might replace them in the future are just plain nonsense. It's no different from any other theoretical science - you can make all the predictions you like based on whatever previous knowledge you have, but at some point you're always going to have to look at the real things that actually exist.

      1. Dave 126 Silver badge

        Re: This is great news ...

        > But all these reports saying that it's now just as good as actual measurements and might replace them in the future are just plain nonsense.

        The 'actual measurements' are a pain in the arse to obtain, and don't actually return a model of the protein. The experimental data is more akin to shadows cast on a wall by a piece of knotted string, when you want to know the structure of the knot. Going forward, it looks like a partnership of prediction and experiment will yield the most useful results:

        The organizers even worried DeepMind may have been cheating somehow. So Lupas [a competition judge] set a special challenge: a membrane protein from a species of archaea, an ancient group of microbes. For 10 years, his research team tried every trick in the book to get an x-ray crystal structure of the protein. “We couldn’t solve it.”

        But AlphaFold had no trouble. It returned a detailed image of a three-part protein with two long helical arms in the middle. The model enabled Lupas and his colleagues to make sense of their x-ray data; within half an hour, they had fit their experimental results to AlphaFold’s predicted structure. “It’s almost perfect,” Lupas says. “They could not possibly have cheated on this. I don’t know how they do it.”

        - https://www.sciencemag.org/news/2020/11/game-has-changed-ai-triumphs-solving-protein-structures

  3. Primus Secundus Tertius

    Same again

    This seems to be a pragmatic approach: assume the next protein will be only a little bit different from the previous one. But there has been a lot of hard work to make this approach succeed.

  4. Filippo Silver badge

    One of my customers is working on applying ML to a certain problem in biology. Not protein folding, but still, the following considerations apply to all systems that try to do predictions, simulations and whatnot in the context of scientific research.

    Generally speaking, with these ML projects (not calling it AI, it ain't), the point is not to replace physical experiments; that doesn't make sense. The point is that, once you've defined your problem, the space of possible experiments that might give you useful information is mind-bogglingly vast. You can't run them all, you can't even get close. Very often, just figuring out which experiments are worth running is in itself a major project.

    An ML system that has somewhat decent accuracy can effectively and cheaply point you towards a bunch of experiments that are more likely to give interesting results. So you run those first. You won't necessarily miss the oddballs, but you'll only get to them after you've tried the best candidates and failed - which is as it should be.

    It's like when you are trying to brute-force a password - if you just literally brute-force, it'll always take you a very long time, but if you run dictionary words first, you've got a fairly decent chance of scoring early, because most passwords are not random. Same thing here. The ML tool is your dictionary.
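
    To make that concrete, here's a minimal sketch of the "ML as dictionary" idea in Python (the features, scores and model choice below are all hypothetical, not anything a real lab uses): a cheap surrogate model trained on past results ranks the new candidates, so the expensive experiments get run on the most promising ones first.

```python
# Minimal sketch: ML-guided triage of experiment candidates.
# All data and features here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Pretend each candidate is described by 10 measurable features, and that
# past experiments produced a (noisy) "usefulness" score for 200 of them.
past_features = rng.normal(size=(200, 10))
past_scores = past_features[:, 0] - past_features[:, 3] + rng.normal(scale=0.1, size=200)

# Train the cheap surrogate on past results.
surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(past_features, past_scores)

# 10,000 new candidates: far too many to test experimentally.
candidates = rng.normal(size=(10_000, 10))
predicted = surrogate.predict(candidates)

# Run the expensive experiment on the 20 highest-ranked candidates first;
# the rest stay in the queue, so the oddballs are deferred, not discarded.
priority_order = np.argsort(predicted)[::-1]
print("Candidates to test first:", priority_order[:20].tolist())
```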

    1. tip pc Silver badge

      "not calling it AI, it ain't"

      spot on,

      So long as those in charge realise this is a tool and use it as a tool things should be ok.

      As you point out, computer modelling can be used to quickly analyse candidates worthy of further investigation, or to verify older assumptions, with the old answers checked manually by experimentation using modern equipment where they don't match the new predictions, and the models further refined.

      The day some bean counter realises they can do modelling instead of real experimentation is when the trouble starts.

      No doubt we will see compute-for-science shops opening that will specialise in these compute models and sell their findings to bio speculators.

    2. Anonymous Coward
      Boffin

      ML limitation

      Coincidentally, Google boffins, who also call it ML, not AI, just published a paper on arXiv about a weakness in ML which they term underspecification. TL;DR: “In general, the solution to a problem is underspecified if there are many distinct solutions that solve the problem equivalently.”

      Article explaining it:

      https://www.discovermagazine.com/technology/google-reveals-major-hidden-weakness-in-machine-learning

      Paper:

      arxiv.org/abs/2011.0339
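
      A toy illustration of what underspecification looks like in practice (the dataset and models below are made up for this example, they are not from the paper): two networks that fit the same training data equally well can still disagree once you probe them away from that data.

```python
# Toy underspecification demo: two equally good fits, different behaviour
# away from the training data. Purely illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Training data: two well-separated clusters in 2D.
X = np.vstack([rng.normal(-2.0, 0.5, size=(100, 2)),
               rng.normal(+2.0, 0.5, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Same architecture, same data, different random seeds.
model_a = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)
model_b = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=1).fit(X, y)

print("Training accuracy:", model_a.score(X, y), model_b.score(X, y))

# Probe points far from both clusters: the two "equivalent" models are free
# to disagree out here, which is the underspecification problem.
probes = np.array([[0.0, 8.0], [8.0, 0.0], [-8.0, 8.0]])
print("Model A says:", model_a.predict(probes))
print("Model B says:", model_b.predict(probes))
```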

      1. mjflory

        Re: ML limitation

        The URL for the Arxiv paper on underspecification has lost a digit. It should be:

        arxiv.org/abs/2011.03395

  5. Eclectic Man Silver badge

    BSE+

    Now maybe they can turn their program to analysing the different foldings of proteins associated with brain diseases such as Bovine Spongiform Encephalopathy or Creutzfeldt-Jakob Disease, to determine how some proteins change shape and the effects thereof.

  6. Pascal Monett Silver badge

    Great progress

    Good to know that our statistical analysis machine technology is finally giving back something useful.

    Also good to know that the people involved are not touting this as a replacement for research, but as a valuable addition.

    And the best to know is that it will be peer-reviewed.

    That means it's serious, not like a lot of pseudo-AI news we've been getting this year.

  7. phy445

    Nice, but the problem still stands

    This is nice work: it gives usable results in a few days rather than a few decades, which is super, but it doesn't really solve the protein folding problem. It's a black box that gives a pretty good stab at the structure of the proteins.

    What biophysics has been trying to work out is how the proteins fold so quickly (and so reliably). They form those structures that have been splashed all over the articles on this system way too fast – this is the protein folding problem. Solve this and you will be the next Watson and Crick (hopefully without all the racism, etc.).

    1. Vikingforties

      Re: Nice, but the problem still stands

      Good point, it's a good step on in emulating reality but doesn't tell us much more about reality itself.

    2. Anonymous Coward
      Anonymous Coward

      Re: Nice, but the problem still stands

      To be fair, they weren't trying to solve the "how"; that's not where the big money is. The "what" (which structure comes from one of those tantalising hypothetical protein sequences) is where the scientific entrepreneurs see the utility at this point, for the medical applications. I agree with you that the how is intellectually more interesting; hopefully it will be possible to pull something in the way of principles out of this AI stuff that helps us construct a virtual ribosome which shows us what actually happens in the folding process - opening the black box.

      I was disappointed to see one of the challenge organisers quoted as saying it would encourage people to do "more thinking and less pipetting". More thinking is always a good thing, but fewer experiments... not so sure.

    3. Schultz
      Boffin

      how do the proteins fold so quickly (and so reliably)?

      This is a phantom question. Protein folding is driven by local interactions creating local order, which gradually reduce the phase space available for folding and therefore allow a fairly rapid convergence to a folded structure. Humans and computers can't reliably reproduce this process, but that doesn't make it mysterious.

      And as a side note: we only know well ordered (folded) structures because those are the only ones we can experimentally characterize. There are lots of not-so-ordered protein domains out there but we cannot resolve their heterogeneous structures.
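
      For anyone who wants to see that phase-space argument in numbers, here is a deliberately crude toy model (a spin chain, not real protein physics): when the energy only depends on local interactions, fixing each unit by a local choice reaches the minimum in N steps, while a blind search over configurations needs 2^N.

```python
# Toy spin-chain sketch of "local interactions -> rapid convergence".
# This is not a protein model; it only illustrates the phase-space point.
from itertools import product

N = 16  # chain length; blind search needs 2**N = 65,536 configurations

# Arbitrary local couplings: +1 means neighbours want to match, -1 to differ.
couplings = [1 if i % 3 else -1 for i in range(N - 1)]

def energy(conf):
    # Only local (nearest-neighbour) interactions contribute.
    return sum(-j if conf[i] == conf[i + 1] else j
               for i, j in enumerate(couplings))

# "Folding": fix units left to right, each making the locally best choice
# given its already-fixed neighbour. N steps in total.
greedy = [0]
for j in couplings:
    greedy.append(greedy[-1] if j > 0 else 1 - greedy[-1])

# Exhaustive search over all 2**N configurations, for comparison.
best = min(product([0, 1], repeat=N), key=energy)

print("Greedy, N local steps:     energy", energy(greedy))
print("Exhaustive, 2**N configs:  energy", energy(best))
```

      Real proteins of course sit in a vastly richer energy landscape, but the same funnelling-by-local-order idea is why folding doesn't have to be a blind search over every possible configuration.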

      1. Francis Boyle Silver badge

        Re: how do the proteins fold so quickly (and so reliably)?

        Well put. Sometimes you just have to accept that complexity is just complexity.

  8. Anonymous Coward
    Anonymous Coward

    Not as good as the article seems to indicate

    This isn't a generic "fold any protein" system. The competition has several divisions, and this is only one of them - "Regular targets". No multimeric targets, refinement targets, or contact predictions. And it has an 87% accuracy rate, not 100%, for predicting the structures that were ALREADY determined by humans - hardly "solving" this problem.

    It's a great start, but there's still a long way to go.

    https://blogs.sciencemag.org/pipeline/archives/2020/11/30/protein-folding-2020

    1. rob miller

      Re: Not as good as the article seems to indicate

      > for predicting the structures that were ALREADY determined by humans

      It's a blind test - the targets have been solved and by agreement not published until after the assessment.

      1. Anonymous Coward
        Anonymous Coward

        Re: Not as good as the article seems to indicate

        True. But it still:

        1. Just matches things already done by humans, so hasn't been tested against new problems

        2. Is likely to come up with the kinds of structures that a human would be able to determine (since that's its training set) rather than harder-to-determine ones

        3. Only covers one subset of protein folding (e.g. no protein complexes)

        Again, it's a great start, but hardly "solves" the problem. It's like coming up with a calculator that can do basic 4-function calculations with 5-digit numbers correctly most of the time, and claiming that it has "solved" math in general.

  9. Abbas

    Scientist here. All your objections are valid and true. For the moment it is a black-box approach. Not comfortable.

    But believe me, it's a bowel-liquefying moment in science.

    Not many like this in a lifetime.
