Google crafts neural network to watch over its data centers

Google has put its neural network technology to work on the dull but worthy problem of minimizing the power consumption of its gargantuan data centers. By doing this, the company says it has created a bit of software that lets it predict with 99.6 per cent accuracy how efficiently its data centers consume electricity, allowing …

COMMENTS

This topic is closed for new posts.
  1. Nate Amsden

    doesn't seem very amazing

    The term "neural network" sounds very sophisticated but the system described seems like a collection of scripts and some algorithms that look at these 19 points and react based on them. Like almost any kind of monitoring system, just operating at a data center level instead of a server or cluster level.

    It sounds neat, for sure, but the article makes it seem far grander than it actually is.

    1. JackClark

      Re: doesn't seem very amazing

      Hi Nate - actually it's a full neural network, so somewhat more than some scripts. Technical details are given in the whitepaper linked from Google's blog post: http://googleblog.blogspot.com/2014/05/better-data-centers-through-machine.html

      Hope that helps!

      JC

    2. Anonymous Coward

      Re: doesn't seem very amazing

      @Nate Amsden - "The term 'neural network' sounds very sophisticated but the system described seems like a collection of scripts and some algorithms that look at these 19 points and react based on them."

      Agreed - this is still just programming, although clearly a bit sophisticated. But there's no true "learning" going on - just an adjustment to new inputs.

      1. Achilleas

        Re: doesn't seem very amazing

        @Nate and Andy:

        (NB: Reading my post after writing it, I realise it might come off a bit condescending or simplistic. I apologise; that wasn't my intention. I felt that describing how ANNs work was necessary, since I have no idea what the average engineer or IT worker knows about them and how they work.)

        "Learning" is a tricky word when it comes to artificial neural networks (ANNs) and machine learning in general. Of course it's "just programming", neural networks aren't magic and no one's ever claimed they are, but they're not just "a collection of scripts" either.

        The way an ANN (or this one in particular) differs from a standard piece of monitoring software is that, when it's very hard to predict which of the inputs has to change (and by how much) for the output to change in an intended way, the monitoring software ends up relying on intuition and small searches in the parameter space. Imagine you have 5 inputs, 5 values you can tweak in your system. Each one affects the system in its own way and changing each one has an associated cost. Even if we know how they affect the system relatively (say, inputs a, b, c increase the output value, d and e decrease it), we're still looking at a 5-dimensional space, which is likely non-linear (few things worth investigating are linear), and we're trying to figure out how to spend the minimum cost changing inputs in order to get a desired effect on the output. Alternatively, we just want to be able to predict how a certain subset of the inputs (which are out of our control) affects the output, so we can make an informed decision on how to tweak the rest, or know what to expect when certain cases occur.

        Now imagine you have a 19-dimensional input space. Where the learning comes in - what it essentially does - is construct the equation (the model) that connects the inputs to the output. It's more akin to a "smart random search" of the entire solution space, but keep in mind that the network built in this case has a (roughly) 10000-dimensional solution space (which seems a bit like overkill to me, but I guess there was a good reason to make the network 5 layers deep).
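
        Just to make that "solution space" concrete, here's a toy sketch in Python of how quickly the number of free parameters - the dimensions being searched over - adds up. The layer sizes are made up by me, not Google's actual architecture:

          # Toy illustration (made-up layer sizes, NOT the network from the paper):
          # a fully-connected net with 19 inputs, 5 hidden layers of 50 units, 1 output.
          layer_sizes = [19, 50, 50, 50, 50, 50, 1]

          # Each layer contributes (inputs * outputs) weights plus one bias per output.
          n_params = sum(a * b + b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
          print(n_params)  # 11251 - a solution space with roughly 11,000 dimensions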

        I'm sure arguments can be made either way about whether gradient descent (the core learning algorithm in most neural nets) constitutes learning or not, and I'm sure those arguments have been going on for over 50 years between people much more qualified than me (on both sides). Like the news article already mentioned, there's nothing really fancy or new about the methods used in this instance. But even though the algorithms and methods have been around for decades, they never really caught on in the IT industry in general - so much so that putting this 50+ year old algorithm to use to actually cut costs is newsworthy!
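
        For anyone who hasn't seen it, the whole "learning" step boils down to something like this. A deliberately minimal sketch with made-up data, using a single linear neuron rather than the real 5-layer network (a real ANN does the same thing with backpropagation through the layers):

          import numpy as np

          # Made-up data: 200 samples of 5 inputs and an output we want to predict.
          rng = np.random.default_rng(0)
          X = rng.normal(size=(200, 5))
          true_w = np.array([0.5, -1.2, 0.3, 2.0, -0.7])
          y = X @ true_w + rng.normal(scale=0.1, size=200)

          # "Learning": start from random weights and repeatedly nudge them downhill
          # along the gradient of the prediction error (gradient descent).
          w = rng.normal(size=5)
          lr = 0.1
          for epoch in range(200):
              err = X @ w - y              # prediction error
              grad = X.T @ err / len(y)    # gradient of the mean squared error
              w -= lr * grad               # one descent step

          print(np.round(w, 2))  # ends up close to true_w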

        So no, it's not a bunch of scripts and algorithms like standard monitoring software, and it's not just an adjustment to new inputs. It's a minimization/optimization over a ~10k-dimensional solution space that produces a predictive model - something that couldn't be done by exhaustive search or by intuitive tweaking by hand.

        1. Sir Runcible Spoon
          Boffin

          Re: doesn't seem very amazing

          I always thought that the difference between a neural network and a smart system was that neural networks could be applied to learning anything (with appropriate inputs and feedback), whereas smart systems were single-purpose.

          So what really is the difference?

          1. Achilleas

            Re: doesn't seem very amazing

            Smart systems are about feedback-control loops: take readings from sensors -> adjust system to bring the behaviour closer to the desired situation.

            Neural networks are (generally) used as predictive models, i.e., what would be the change in the system if I changed the inputs (by xyz amount)? However, I don't see any reason why a neural net couldn't be used as a controller for a smart system.

            The way I see it (though I claim no expertise on smart systems), when one talks about smart systems, they mean the actual controller (which, again, could very well be a neural net *after* it's undergone training and has been shown to fit the data), but when one talks about neural networks and machine learning, they're almost always talking about the actual learning of the relationship between input and output. In other words, what the ANN provides for us is an automated way of configuring the controller. The point of machine learning is automatically and intelligently figuring out the input-output relationship, while the point of smart systems (as far as I know, anyway) is adjusting inputs in response to deviations of the output from a desired state, i.e., taking advantage of pre-existing knowledge of the I-O relationship to keep outputs at desired states.
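
            Roughly what I mean, as a sketch (the model and the sensor/actuator functions here are made-up placeholders for illustration, not anyone's real API):

              # Hypothetical sketch: where a *trained* predictive model slots into
              # a smart-system control loop.
              def control_step(model, read_sensors, apply_settings, candidate_settings):
                  """Apply whichever candidate settings the model predicts will do best."""
                  state = read_sensors()  # current readings (the inputs we don't control)
                  best = min(candidate_settings,
                             key=lambda s: model.predict({**state, **s}))
                  apply_settings(best)    # the feedback/control part: act on the prediction
                  return best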

  2. roger stillick
    Go

    Google's energy neural net is a good idea

    A local fish-freezing plant installed a neural network to drive its chillers and fan system for minimum energy use... ROI was less than 3/4 of a year and energy use went way down... side benefit: it predicts equipment failure...

    IMHO this is a no-brainer if done right... RS.

  3. Roo

    Genuinely impressed...

    I am genuinely impressed that folks have found a good use for neural networks that generates some measurable ROI.

  4. Steven Raith
    Terminator

    Skynet

    We're all fucked.

    (yeah, I know it was obvious...but come on, someone had to read nothing but the headline and sub-head and comment without reading the article!)

  5. Anonymous Coward

    Obligatory movie quote (slightly modified)

    The Skynet Funding Bill is passed. The system goes on-line August 4th, 2015. Human decisions are removed from all domestic and industrial HVAC and plumbing systems. Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug in response to a massive heatwave which causes Skynet to over-clock all domestic and industrial air-conditioning systems.

    Millions die........ first of heat exhaustion, then cholera as Skynet reverse flows all sanitary devices and pollutes drinking water.

    And now, the rise of the air-conditioners..... Judgement Day

  6. Harry Kiri
    FAIL

    This isn't very good

    Fail: the validation in the white paper is wrong. You cannot validate a single input in a multi-input, non-linear neural net by holding the other 18 constant - that's meaningless, because if you change the value of one other input the line may go down, or wobble around, or do something entirely different. That's the point of nnets - to provide an arbitrary non-linear model of an n-dimensional space.

    Also, when you use a model (nnet or other) you must bound when it knows what it's talking about (because it's based on data it's already seen) and when it's just guessing. If the weather is ever hotter or cooler than the data obtained over the 2 years (quite likely), the nnet will basically make a random guess at the PUE. The output is based on the internal weight space of the neural network, which has been jogged into position from a random state. Nnets don't magically extrapolate answers, and they won't magically interpolate stuff they haven't seen either. Similarly, if it hasn't seen all of the combinations of cooling towers running, it'll just guess. The PUE guess will be a smoothed extrapolation, but in reality the PUE might have a pretty drastic change.

    All in all, this isn't very good and has some quite basic errors in it.

    1. Achilleas

      Re: This isn't very good

      The sensitivity analysis was just an analysis of the impact of individual inputs based on the (already validated) model. You're right that such an analysis might assume linearity between variables, but it's useful for seeing whether the impact of singular inputs follows intuition (e.g., higher load -> lower PUE), and you get a sense of the shape of the dependence (albeit in a rather locked-down test case). Cross-validation and testing were done on the 30% of the data that wasn't used for training, which is how it's always done.
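
      In other words, something along these lines (a rough sketch; `model`, `medians` and the ranges are placeholders standing in for the already-trained network and its data, not the paper's actual code):

        import numpy as np

        def sensitivity_curve(model, baseline, index, values):
            """Vary one input over `values` while holding the other 18 at `baseline`."""
            curve = []
            for v in values:
                x = np.array(baseline, dtype=float)
                x[index] = v
                curve.append(float(model.predict(x.reshape(1, -1))[0]))
            return curve

        # e.g. predicted PUE as a function of input 0 (say, IT load), with everything
        # else pinned at its median value:
        # pue_vs_load = sensitivity_curve(model, medians, 0, np.linspace(lo, hi, 50))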

      "Nnets dont magically extrapolate answers and they wont magically interpolate stuff they wont have seen either. Similarly if it hasnt seen all of the combinations of cooling towers running it'll just guess."

      If they aren't supposed to extrapolate answers on data they haven't seen then what's the point?

      If the data covers the entirety of the problem space then you don't need a model, you just need a lookup table. Of course "it'll just guess" - that's what ANNs do. They fit a curve to known data, are validated against other data (which we pretend is unknown), and are then used to predict the output for unseen input cases.
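
      Schematically, the whole procedure is nothing more exotic than this (a sketch with scikit-learn and made-up data, purely to illustrate the fit/validate/predict split - not what Google actually used):

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPRegressor

        # Made-up stand-ins for the 19 sensor readings (X) and the measured PUE (y).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 19))
        y = 0.1 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.05, size=1000)

        # Hold out 30% as "unseen" data, as in the paper.
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                            random_state=0)

        model = MLPRegressor(hidden_layer_sizes=(50, 50, 50, 50, 50), max_iter=2000)
        model.fit(X_train, y_train)          # fit the curve to known data
        print(model.score(X_test, y_test))   # validate against the held-out 30%
        # model.predict(new_inputs) is then the informed guess for unseen cases.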

      1. Harry Kiri
        FAIL

        Re: This isn't very good

        Nope, I'm afraid you've got the sensitivity analysis bit wrong. The feature space is 19 dimensional. The sensitivity of one parameter (say IT load) depends on the values of the other 18 variables. You cannot just pick one slice through a multi-dimensional space and infer that moving 1 variable (IT load) always gives the same PUE response.

        Graphs 4b and c are total nonsense, as the inputs are integers - in reality there are only 2 data points, 0 and 1 - yet the paper talks about a non-linear relationship if you have 0.79 of a cooling tower. That's nonsensical. Also, while we're at it, depending on where you are on an exponential curve, it can look pretty straight. Where you are on the curve is dependent on the other 18 variables, so changing these can make the 'curve' look straight. That's why it's fundamentally wrong to extrapolate the response from just one set of values for the other 18 variables.

        The cross-validation is wrong too. The data was sampled at 5-minute intervals and 30% was used as 'unseen' test data, but the dataset was shuffled chronologically. Looking at the variables, it's highly likely that data received every 5 minutes will be highly correlated. Removing (on average) every third data point means the test data is very highly correlated with the training data and cannot be said to be independent or unseen. That's why the prediction rate is so high: relatively speaking there are a LOT of nodes in that network, and it is basically overtrained to pieces, with test data pretty much the same as the training data.

        The usual way to demonstrate this is to show the test and training data performance over the set of training epochs. The training performance generally gets better and better, whilst at some point the test data performance will get worse as the nnet becomes over-specified to the training data. We haven't seen this.
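
        Something like this, in outline (a rough sketch with scikit-learn and made-up data, just to show the shape of the check - not the paper's code):

          import numpy as np
          from sklearn.model_selection import train_test_split
          from sklearn.neural_network import MLPRegressor

          # Made-up data standing in for the sensor readings and measured PUE.
          rng = np.random.default_rng(0)
          X = rng.normal(size=(500, 19))
          y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(scale=0.05, size=500)
          X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

          # Track training vs held-out performance per epoch. If the training score
          # keeps rising while the held-out score turns down, the net is becoming
          # over-specified (overtrained) to the training data.
          model = MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=1, warm_start=True)
          train_curve, test_curve = [], []
          for epoch in range(200):
              model.fit(X_tr, y_tr)          # warm_start=True: one more epoch per call
              train_curve.append(model.score(X_tr, y_tr))
              test_curve.append(model.score(X_te, y_te))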

        The bit you've missed is that to evaluate a neural network (or any form of classifier) you need _representative_ data, not complete data. As I said, once you step out of the data range, yes, the nnet is 'extrapolating' and providing a numerical answer - but it's just guessing. If the weather is different from the data gathered over 2 years (so very hot or very cold), that system is just guessing. And anyone can do that.

        The real point of nnets is providing a tool capable of modelling non-linear relationships where you don't have to have a preconceived model of the relationships. If the relationships are linear then an nnet won't outperform a linear classifier. Because of the response function in the neuron, you always get a smooth transition through the feature space that looks convincing, but that's just an artifact of the maths, not the data.

        To be honest, the more I look at this, the worse it gets. As I said, it isn't very good or convincing and has some really basic mistakes in it. It's basically 'nnet models training data very well' shock.

        1. Achilleas

          Re: This isn't very good

          "Nope, I'm afraid you've got the sensitivity analysis bit wrong. The feature space is 19 dimensional. The sensitivity of one parameter (say IT load) depends on the values of the other 18 variables. You cannot just pick one slice through a multi-dimensional space and infer that moving 1 variable (IT load) always gives the same PUE response."

          I didn't "get it wrong", I'm not the one who did the analysis. My description of the analysis isn't any different from how you described it, you're just spinning it in a negative tone ("You cannot just pick one slice ..."), while I was simply stating what it *might* be useful for. The interdependence of variables is, of course, an issue, which is why I said it sort of assumes a linear dependence. If the dependence between params a and b is linear, you can expect that parameter a will have a similar effect on the output across the entirety of b's values, and b will just be "scaling" the effect of a (just as a simplistic example of what I meant by assuming linear dependence). Of course this assumption probably doesn't hold, so yes, the analysis isn't that useful. But it's still not part of the validation.

          "Graphs 4b and c are total nonsense as the inputs are integers - In reality there are only 2 data points, 0 and 1 - yet the paper talks about a non-linear relationship if you have 0.79 of a cooling tower. Thats nonsensical."

          You're right, that is weird. It's probably nonsense, or they're normalising over max-{chillers,cooling towers}. Either way, it's an error on the author's side.

          "Also, while we're at it, depending where you are on an exponential curve, it can look pretty straight."

          Only if you do some weird scaling on the axes, independently (large scale for vertical combined with small scale for horizontal), but I see your point.

          "Where you are on the curve is dependent on the other 18 variables. So changing these can make the 'curve' look straight. Thats why its fundamentally wrong to extrapolate the response from just one set of values for the other 18 variables."

          Yes, we covered this. It assumes linearity, etc., etc. I thought we could get past the issues with the sensitivity analysis by just pointing that out.

          "The cross validation is wrong too. The data was sampled at 5 min intervals and 30% was used as 'unseen' test data. But the dataset was shuffled chronologically. Looking at the variables and its highly likely that data received every 5 mins will be highly correlated. Removing (on average) every third data point means the test data is very highly correlated with the training data and cannot be said to be independent or unseen. Thats why the prediction rate is so high, relatively speaking there are a LOT of nodes in that network and it is basically overtrained to pieces with test data pretty much the same as the training data."

          I'm not sure there's really an issue here. Yes, the validation samples are picked from "in between" the training samples, but how would you do it? There's no temporal training, so each sample is assumed independent (as far as temporal sequence is concerned, they're just arbitrary points in time). Any other kind of splitting besides random would induce biases. Of course, there is a temporal dependence between data points (consecutive points on the same day probably have very similar ambient temps, usage loads, etc.) and they are correlated, but the same can be said for times of day, or periods of the year (points from the same time of day across a whole month probably have very similar usage loads; points from the same time of year are probably correlated based on loads, temps, etc.). Unseen doesn't mean independent. In fact, if your unseen data is completely independent, you're going to have a hard time testing. The goal is generality, not tricking the network into failing.
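
          For the sake of argument, here's what the two ways of splitting look like side by side (a toy sketch with made-up data; the blocked version is what I understand you're suggesting):

            import numpy as np
            from sklearn.model_selection import train_test_split

            # Toy stand-in for two years of 5-minute samples (rows are chronological).
            rng = np.random.default_rng(0)
            X = rng.normal(size=(10_000, 19))
            y = rng.normal(size=10_000)

            # Random split: the "unseen" points are scattered in between training points.
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

            # Blocked / chronological split: hold out the last 30% as one contiguous
            # chunk, so the test data is temporally separated from the training data.
            cut = int(len(X) * 0.7)
            X_tr_b, y_tr_b = X[:cut], y[:cut]
            X_te_b, y_te_b = X[cut:], y[cut:]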

          "The usual way to demonstrate this is to show the test and train data performance over the set of training epochs. The training performance generally gets better and better whilst at some point the test data performance will get worse as the nnet becomes over-specified to the training data. We havent seen this."

          Agreed, an overfitting test would have been nice.

          "The bit you've missed is to evaluate a neural network (or any form of classifier) you need _representative_ data, not complete data. As I said, once you step out of the data range, yes the nnet is 'extrapolating' and providing a numerical answer - but its just guessing. If the weather is different to the data gathered over 2 years (so very hot or very cold) that system is just guessing. And anyone can do that."

          It's a better guess than "anyone" can do, though. I mean, it's data across two years. Assuming there haven't been any radical climate or usage changes across consecutive periods, you can probably train on the most recent 2 years of data and keep updating as you go along. Barring any extraordinary events that make the inputs deviate substantially from their historical values, you're probably going to be within the bounds you've been training with, or very slightly outside them, which a properly trained ANN should be able to handle.

          What could be more representative data for testing than random points in time across your data set? It's not "complete" data, it's a subsample of your usage scenario across 2 years.

          "The real point of nnets is providing a tool capable of modelling non-linear relationships where you dont have to have a preconceived model of the relationships. If the relationships are linear then an nnet won't outperform a linear classifier."

          It might, but it won't be worth the effort.

          "Because of the response function in the neuron you always get a smooth transition through the feature space that looks convincing but thats just an artifact of the maths, not the data."

          Not sure what you mean by "looks convincing".

          "To be honest, the more I look at this, the worse it gets. As I said it isnt very good or convincing and has some really basic mistakes in it. Its basically nnet models training data very well shock."

          I'm not as sceptical, because of one very important reason: it worked. Not on testing data, or on validation data, but on the real thing, *after* it was trained and validated. The sensitivity analysis is a bit broken, I agree, but even with the assumptions it makes, there's no denying that it shows the I-O relationship is highly non-linear (independent of the linear assumption it might be making for the interdependency between inputs), so an ANN is well worth the effort.

