AI can predict the structure of chemical compounds thousands of times faster than quantum chemistry

AI can help chemists crack the molecular structure of crystals much faster than traditional modelling methods, according to research published in Nature Communications on Monday. Scientists from the Ecole Polytechnique Fédérale de Lausanne (EPFL), a research institute in Switzerland, have built a machine learning programme …

  1. find users who cut cat tail

    <s>AI</s> Machine learning can <s>predict</s> kind of guess the structure of chemical compounds thousands of times faster than <s>quantum chemistry</s> after being extensively trained on results calculated using actual quantum chemistry.

    TIFIFY

    What always annoys me is that if they did not have a large database of results obtained by the oh-so-slow real DFT calculations, their ‘AI’ would be utterly worthless. People had to do all the DFT calculation -- and still have to do them for anything unusual the thing has not been trained for.
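As a rough illustration of that dependence on pre-computed results, here is a toy model that can only average over labels it was trained on. It uses Nadaraya-Watson kernel regression, a deliberately simple stand-in chosen for this example rather than the method used in the research. The descriptor vectors, the "expensively computed" labels and the `length_scale` and `min_weight` parameters are all hypothetical. When a query is far from every training point, the kernel weights vanish and the sketch returns `None`. That is the "anything unusual" case, where you are back to running the real calculation.

```python
import math

def kernel_predict(train, query, length_scale=1.0, min_weight=1e-6):
    """Nadaraya-Watson kernel regression: a Gaussian-weighted average of the
    training labels. Returns None if the query is far from every training
    point, i.e. the model has effectively seen nothing like it."""
    weighted = []
    for x, y in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))  # squared distance
        weighted.append((math.exp(-d2 / (2 * length_scale ** 2)), y))
    total = sum(w for w, _ in weighted)
    if total < min_weight:
        return None  # out of distribution: run the expensive calculation instead
    return sum(w * y for w, y in weighted) / total

# Hypothetical (descriptor vector, expensively computed label) pairs.
train = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((0.0, 1.0), 3.0)]
print(kernel_predict(train, (0.5, 0.5)))    # about 2.0: inside the training data
print(kernel_predict(train, (10.0, 10.0)))  # None: nothing similar was trained on
```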

    1. Dr Stephen Jones

      "AI" = Fake News daily

      Shoddy and generally useless research does not merit a press release. Wake me up when Winter starts.

      1. onefang

        Re: "AI" = Fake News daily

        "Wake me up when Winter starts."

        Winter is com... oh wait, am I allowed to say that?

      2. Korev Silver badge
        Boffin

        Re: "AI" = Fake News daily

        generally useless research does not merit a press release.

        If this works then this is huge for the field. When you make and develop drugs you need to ensure that the "polymorph" that patients get is the same as the one that was in clinical trials. There have been cases where this has not happened; there was even a case where an <a href="https://en.wikipedia.org/wiki/Ritonavir">HIV drug had to be temporarily withdrawn</a> because the active ingredient was converting to a different crystal form on the pharmacies' shelves. Moreover, it could also help turn drugs that scientists know work into a form that a patient can actually take.

        If drug companies and regulators can carry out this kind of simulation more quickly, then this could save lives as well as lots of money. The increase in throughput that this is claimed to offer would mean that many more drug candidates could be virtually screened for this kind of activity.

    2. Cuddles

      "People had to do all the DFT calculation -- and still have to do them for anything unusual the thing has not been trained for."

      Indeed. It's all very well getting a <s>AI</s> machine learning prediction of what a new molecule might look like in only 6 minutes, but that doesn't really help if you still need to spend years doing real physics to check whether it's actually right.

  2. Pascal Monett Silver badge

    "SwiftML [..] can perform as accurately as DFT programmes in some cases."

    To be sure, 6 minutes instead of 16 years is quite the improvement, but only some cases ?

    I do hope that they know which cases, because it would be a shame if they applied it wrong yet still used the answer that their statistical analysis machine gave them.

    Because it's not AI - the computer doesn't know it is in the wrong use case.

    1. FrogsAndChips Silver badge

      Re: "SwiftML [..] can perform as accurately as DFT programmes in some cases."

      Maybe that's a case of "finding the solution is hard, but checking it is easy"? Say the AI is right 50% of the time: in those cases you can quickly confirm the solution, and in the other 50% you're back to the classic method but have hardly wasted any time.
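The trade-off can be made concrete with a little expected-value arithmetic. This is a minimal sketch. It assumes, as the comment does, that checking a guess is cheap compared with a full calculation. The 6-minute guess time is the article's figure; the check time and full-calculation time in the usage line are made-up placeholders, and all times are in minutes.

```python
def expected_time(p_correct, t_guess, t_check, t_full):
    """Expected cost of guess-then-verify: always pay for the ML guess and
    the check; pay for the full calculation only when the guess fails."""
    return t_guess + t_check + (1 - p_correct) * t_full

def speedup(p_correct, t_guess, t_check, t_full):
    """How much faster guess-then-verify is than always doing the full run."""
    return t_full / expected_time(p_correct, t_guess, t_check, t_full)

# Hypothetical numbers: 50% hit rate, 6 min guess, 60 min check, 72000 min full run.
print(expected_time(0.5, 6, 60, 72000))  # 36066.0
print(speedup(0.5, 6, 60, 72000))
```

One consequence is that, with a 50% hit rate, the overall speed-up can never exceed 1/(1 - 0.5) = 2, however fast the guess and the check are, because half the cases still need the full calculation.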

    2. Anonymous Coward
      Meh

      Re: "SwiftML [..] can perform as accurately as DFT programmes in some cases."

      I do hope that they know which cases, because it would be a shame if they applied it wrong yet still used the answer that their statistical analysis machine gave them.

      Because it's not AI - the computer doesn't know it is in the wrong use case.

      Yes, I wish people would use the term "statistical analysis machines" in preference to "artificial intelligence" when talking about artificial neural networks. "Artificial Intelligence" gives the impression that it is some magic, when all they are doing is using the statistics of the training set. The major benefit of these techniques is often just that the programmer doesn't have to actually bother with, or even know, any statistics - he just plugs the training set in and cranks the handle, and voilà - "it's intelligence, Jim, but not as we know it".

      Perhaps there is hope - I am just about old enough to remember when expert systems were similarly puffed up as Artificial Intelligence.

      1. MonkeyCee

        Re: "SwiftML [..] can perform as accurately as DFT programmes in some cases."

        "I wish people would use the term "statistical analysis machines" in preference to "artificial intelligence" "

        That's pretty much how I describe my study to people. Since giving my faculty name or chosen specialty just leads to more questions, saying "I do statistics with computers" covers it much more quickly.

        The AI hype is just that. Same shit, different decade. We quite literally have a course on it, including noting that we're experiencing the third wave of such hype. Some of the same predictions still hold: computers will be able to beat 99.9999% of humans in finite-state games, as long as there is suitable technology to record and search the finite states, but they still won't outsmart a four-year-old in the real world.
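To make "record and search the finite states" concrete, here is a minimal exhaustive game-tree (minimax) search for noughts and crosses, chosen purely as a small illustrative game. The board encoding and function names are hypothetical. The `functools.lru_cache` memoisation plays the "record" role, so each position is evaluated only once.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, as indices into a 9-character string.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value with perfect play: +1 if X wins, -1 if O wins, 0 for a draw.
    `board` is a string so positions can be cached (the 'record' part)."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0  # full board, no winner: draw
    nxt = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)

print(minimax(" " * 9, "X"))  # 0: perfect play from an empty board is a draw
```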

    3. Ian Bush
      Flame

      Re: "SwiftML [..] can perform as accurately as DFT programmes in some cases."

      "To be sure, 6 minutes instead of 16 years is quite the improvement, but only some cases ?

      I do hope that they know which cases, because it would be a shame if they applied it wrong yet still used the answer that their statistical analysis machine gave them."

      Well, speaking as somebody who develops DFT codes on HPC machines, I would use the following technical term to describe their comparison: Bollocks. You can solve systems with a few thousand atoms relatively easily on a few tens or hundreds of cores on a modern HPC machine; maybe 1200 hours is fair, but 16 years? WTF are they on?
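The figures being argued over can be turned into speed-up factors with simple arithmetic. A minimal sketch: the 6-minute and 16-year figures are the ones quoted in the thread, and the 1200 hours is this commenter's rough estimate. Neither source says whether it means wall-clock time or core-hours, so treat the ratios as order-of-magnitude only.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year, counting leap years

def speedup_factor(slow_hours, fast_minutes):
    """How many times faster the fast method is than the slow one."""
    return slow_hours * 60 / fast_minutes

print(round(speedup_factor(16 * HOURS_PER_YEAR, 6)))  # 1402560
print(round(speedup_factor(1200, 6)))                 # 12000
```

So "16 years versus 6 minutes" is a factor of roughly 1.4 million. A 1200-hour baseline gives about 12,000, which is closer to the "thousands of times faster" in the headline.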

      Oh, and of course out of a quantum mechanical calculation you get the wavefunction, which allows deep insight into the properties of the system under investigation. Out of curve fitting you get the fitted curve. Don't get me wrong, this can be extremely useful in certain circumstances. But what is being presented here is complete over-sell - "News: Drawing a line through some points is quicker and easier than solving the fundamental laws of nature, film at 11!!!"

  3. Destroy All Monsters Silver badge
    Holmes

    Hmmm....

    Another technique known as Density functional theory (DFT) is needed. It uses complex quantum chemistry calculations to map the density of electrons in a given area, and requires heavy computation. SwiftML, however, can do the job at a much quicker rate and can perform as accurately as DFT programmes in some cases.

    "Taking a wild-ass guess can do the job at a much quicker rate and can perform as accurately as trying to model the problem from first principles."

    Well, yeah. Although using "accurately" here is frankly an evil abuse of language.

  4. Destroy All Monsters Silver badge
    Windows

    Remember the good old times

    ...when "ML" stood for "Metalanguage"?

  5. ExampleOne

    This isn't really a major surprise. Any semi-competent graduate chemist can produce a basic guess at what the NMR spectrum of a given molecule will look like a lot faster than any DFT calculation can. Given a large enough and suitably diverse sample set, I would expect this to be "acceptable". I am more surprised that no one has attempted this approach before.

    That said, I suspect the two methods are not doing quite the same thing. Reading the supplementary information, it appears the DFT method first performs a geometry optimization, with no hint that ShiftML does the same. If that is the case, this is a grossly unfair comparison.

  6. onefang
    Coat

    The specific example they give is cocaine. I wonder what sort of drug research they are planning on doing?

    I'll get my coat, it's the one with the mysterious patches of white powder.

    1. MiguelC Silver badge
      1. Alan Brown Silver badge

        "Cocaine is a drug sometimes used in preparation for or in medical procedures"

        It's an extremely useful (and extremely cheap(*)) drug, and the various substitutes are a lot more expensive, less effective or have worse side effects. Heroin (diamorphine) is also a helluva useful and cheap medical drug, with most of the substitutes being a poor second choice or bloody expensive by comparison.

        The easiest and quickest way to "solve" the "opium and cocaine growing problem" is to contract the growers to supply for the medical industry (which is crying out for supplies) at a better rate of pay than the narcogangs offer - it's actually what was done to "solve" the problem in Turkey. The fields are still there but the destination changed.

        (*) The profit margins in illegal drugs are measured in tens of thousands of percent over the raw material costs. A medical knockout dose of cocaine is a few pence - and that will cost a lot more on the street, cut with godknowswhat. Anyone who thinks that the "drug problem" is about drugs has missed the amount of money involved. They're just the means to an end.

  7. JimmyPage Silver badge
    Stop

    Not really AI though, is it?

    Just a very niche expert system.

    If it was "AI" it wouldn't need programming. You'd just tell it what to do.

  8. Loyal Commenter Silver badge

    Groups of atoms oscillate at specific frequencies, providing a tell-tale sign of the number and location of electrons each contains.

    This is not strictly true; NMR works by flipping the nuclear spin on unpaired protons in the atomic nucleus (hence the 'N' for 'nuclear' in 'NMR' - which is also why the same technology used in the medical field is called 'MRI', because the 'N' bit sounds scary to patients).

    Normally, the nuclei of atoms are oriented in space in random directions, but in the presence of a strong magnetic field (and these things contain very strong superconducting magnets), it is energetically favourable for nuclei containing unpaired protons to line up with the field. Pulsing the sample with a radio-frequency burst (typically in the hundreds of MHz, depending on the field strength) gives the nuclei enough energy to spin freely. They can then re-emit radio-frequency photons to settle back into field alignment, which is what is measured by the spectrometer. The type of nucleus and the environment it is in (other nearby atoms bonded to it) affect the frequency of those radio photons.
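The "hundreds of MHz, depending on the field strength" relationship is the Larmor equation, f = (γ/2π) B, and spectrometers are conventionally named after their proton frequency. This minimal sketch uses the standard proton value γ/2π ≈ 42.577 MHz/T; the function and constant names are hypothetical.

```python
GAMMA_BAR_1H_MHZ_PER_T = 42.577  # proton gyromagnetic ratio / (2*pi), MHz per tesla

def larmor_mhz(field_tesla, gamma_bar=GAMMA_BAR_1H_MHZ_PER_T):
    """Resonance (Larmor) frequency in MHz: f = (gamma / 2*pi) * B."""
    return gamma_bar * field_tesla

def field_for(freq_mhz, gamma_bar=GAMMA_BAR_1H_MHZ_PER_T):
    """Field in tesla at which protons resonate at freq_mhz."""
    return freq_mhz / gamma_bar

for mhz in (100, 250, 400, 750, 1000):
    print(f"{mhz} MHz spectrometer -> {field_for(mhz):.2f} T")
```

On this basis a 400 MHz instrument corresponds to about 9.4 T and a 1 GHz instrument to about 23.5 T.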

    Atoms with an even number of protons (such as carbon and oxygen) don't give any signal at all because the protons go round in pairs with their spins aligned opposite to each other, effectively cancelling each other's spin, and therefore not lining up with the field. This is also why expensive solvents containing deuterium rather than hydrogen (such as heavy water) are used for samples; otherwise the signal from the solvent swamps the signal you are actually interested in. In most organic samples, NMR is only looking at the hydrogen atoms, although IIRC, compounds with nitrogen or phosphorus can give more complex spectra.

    So, the tl;dr is that NMR gives information about protons, not electrons. Although the electron density of nearby atoms can shift the signal to a lower frequency (known as up-field shifting), the interesting signals come from 'coupling' to other nearby atoms with odd numbers of protons, which splits single peaks into multiplet patterns.
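For simple (first-order) spectra, the peak splitting follows the n + 1 rule. Coupling to n equivalent spin-1/2 neighbours, such as protons, splits a signal into n + 1 lines whose relative intensities are a row of Pascal's triangle. This is a minimal sketch under that first-order assumption; the function name is hypothetical.

```python
def multiplet(n_neighbours):
    """n+1 rule: coupling to n equivalent spin-1/2 nuclei gives n+1 lines
    with binomial (Pascal's triangle) relative intensities."""
    row = [1]
    for _ in range(n_neighbours):
        # Each extra neighbour splits every line in two: add shifted copies.
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

NAMES = {1: "singlet", 2: "doublet", 3: "triplet", 4: "quartet"}
for n in range(4):
    lines = multiplet(n)
    print(n, NAMES[len(lines)], lines)
```

In an ethyl group, for example, the two CH2 protons split the CH3 signal into a 1:2:1 triplet. The three CH3 protons split the CH2 signal into a 1:3:3:1 quartet.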

    1. ExampleOne

      Actually, 13C is probably the second most common type of NMR spectrum collected, and would be required for most basic organic chemistry results. Oxygen is also used.

      Deuterium is NMR active (it has a spin quantum number of 1), but because of the way its signal behaves it doesn't overlap with the hydrogen spectrum you are normally trying to collect. It can be used to provide a reference lock for a lot of the standard experiments.
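Whether a nucleus is NMR active can be estimated from its proton and neutron counts. This rule of thumb from nuclear physics applies to ground-state spins. If both counts are even, the spin is zero and the nucleus is NMR silent. An odd mass number gives a half-integer spin, and odd-odd nuclei have integer spin. The function name and the small isotope table below are illustrative.

```python
def spin_class(protons, neutrons):
    """Classify nuclear ground-state spin I from proton and neutron counts."""
    if protons % 2 == 0 and neutrons % 2 == 0:
        return "zero"          # even-even: I = 0, NMR silent (12C, 16O)
    if (protons + neutrons) % 2 == 1:
        return "half-integer"  # odd mass number: I = 1/2, 3/2, ... (1H, 13C, 17O)
    return "integer"           # odd-odd: I >= 1 for common NMR nuclei (2H, 14N)

NUCLEI = {"1H": (1, 0), "2H": (1, 1), "12C": (6, 6), "13C": (6, 7),
          "14N": (7, 7), "16O": (8, 8), "17O": (8, 9), "31P": (15, 16)}
for name, (z, n) in NUCLEI.items():
    print(name, spin_class(z, n))
```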

      1. Loyal Commenter Silver badge

        Admittedly, it is the best part of two decades since I've used an NMR machine. IIRC, 13C spectra are weak (due to low isotope abundance) and messy (due to lots of coupling in anything but the simplest compounds), so they require much higher fields to get a clean spectrum. At the time, the cutting-edge machines ran at about 400 MHz (with field strengths of around 10 T), and the one in use at my Uni was 100 MHz. I suspect the frequencies (and corresponding field strengths) have gone up since then towards the 1 GHz range.

        You're absolutely right about the 2D spectra - I'd totally forgotten that deuterium isn't spin-neutral! If my memory serves me correctly, the peak from trace amounts of un-deuterated solvent at the machine's native frequency was used as the reference point, as the solvents were usually 99+% deuterated, not 100%.

        1. Anne Hunny Mouse

          19 years since I ran NMR as a postgrad. Day-to-day work was on a 250 MHz machine, with a 400 MHz one for more specialised work (often used by the supramolecular group).

          I can remember Glaxo getting the first 750 MHz machine in the UK around '94-'95.

          It was taken out of action for some time after the photographer taking publicity shots got closer than allowed and the magnetic field caused their equipment to smash into the machine...

          1. Loyal Commenter Silver badge

            When I was a student, I remember seeing a number of 2 pence coins placed edge to edge suspended in mid-air following the field lines of an NMR spectrometer. It gives you an appreciation of the field strengths involved, especially considering that 2p coins are not ferromagnetic.

            Powering down (and restarting) the magnets on those things is not a trivial task, especially when you consider that (at the time at least), they were cooled with liquid helium (at -269°C), which itself was surrounded with liquid nitrogen (at -196°C). Getting those things up to room temperature so that foreign objects can be accessed and removed, and then cooling them back down again is a serious undertaking.

          2. Loyal Commenter Silver badge

            I can remember Glaxo getting the first 750 MHz machine in the UK

            If that machine was based in Stevenage, I may have actually used it, in around '99 or so, back when they were Glaxo Wellcome, but not yet merged with SmithKline Beecham to form GSK.

  9. katrinab Silver badge
    Facepalm

    "AI" here means "looking up something in a database".

    I have no doubt that it is a huge database, and very fast, and very useful, but it is no more intelligent than looking up a dictionary in the library.

  10. Long John Silver
    Pirate

    Never look a 'black box' in the mouth

    I understand the pragmatic justification for using machine learning. Yet it would be disturbing if too much trust were placed in a technique whose inner workings, for each instance of use, are occult.

    In essence, machine learning is atheoretical. By analogy with the statistical technique of multiple linear regression, it might be said to discover 'adjusted' correlations and use these for prediction outside the context of their derivation. The analogy breaks down when the roles of statistician and, in this instance, chemist are compared.

    The statistician has a complete understanding of how multiple regression is performed and is able to mess around with combinations of independent variables, assess how stable his model is under such changes, and explore interactions among variables. Although there are circumstances in which regression models may be deployed to make predictions beyond the original set of data, this requires great caution. Statistical modelling techniques in general are better suited to exploring relationships under guidance from external theoretical considerations.
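For comparison with the 'black box', here is a transparent version of the statistician's approach. It is a minimal multiple linear regression fitted by ordinary least squares in pure Python. It also includes a crude range check that flags when a prediction lies outside the training data. The data are made up. The per-variable range check is only a rough proxy for extrapolation, since a point can sit inside every range yet still be far from the data.

```python
def fit_linear(X, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + ... + bk*xk, solving the
    normal equations (A^T A) b = A^T y by Gauss-Jordan elimination."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict(coeffs, x, X_train):
    """Prediction plus a crude flag: True if any variable lies outside the
    range seen in training, i.e. the model is extrapolating."""
    extrapolating = any(not (min(c) <= v <= max(c))
                        for v, c in zip(x, zip(*X_train)))
    value = coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))
    return value, extrapolating

X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]  # hypothetical data
y = [1, 3, 4, 6, 8]                           # generated from y = 1 + 2*x1 + 3*x2
coeffs = fit_linear(X, y)
print([round(c, 6) for c in coeffs])
print(predict(coeffs, (5, 1), X))             # flagged as extrapolation
```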

    The chemist is confronted by a 'black box'. There doesn't seem to be scope for fiddling about with assumptions as within the statistician's domain. It's unclear whether the result can be tuned to be more informative in the light of broad knowledge of chemistry's theoretical underpinning.

    As always, caveat emptor.

  11. Prosthetic Conscience
    Joke

    the same feat would have taken 16 years with conventional techniques

    Can't they just give it a lick and find out?
