Re: This isnt very good
"Nope, I'm afraid you've got the sensitivity analysis bit wrong. The feature space is 19 dimensional. The sensitivity of one parameter (say IT load) depends on the values of the other 18 variables. You cannot just pick one slice through a multi-dimensional space and infer that moving 1 variable (IT load) always gives the same PUE response."
I didn't "get it wrong", I'm not the one who did the analysis. My description of the analysis isn't any different from how you described it, you're just spinning it in a negative tone ("You cannot just pick one slice ..."), while I was simply stating what it *might* be useful for. The interdependence of variables is, of course, an issue, which is why I said it sort of assumes a linear dependence. If the dependence between params a and b is linear, you can expect that parameter a will have a similar effect on the output across the entirety of b's values, and b will just be "scaling" the effect of a (just as a simplistic example of what I meant by assuming linear dependence). Of course this assumption probably doesn't hold, so yes, the analysis isn't that useful. But it's still not part of the validation.
"Graphs 4b and c are total nonsense as the inputs are integers - In reality there are only 2 data points, 0 and 1 - yet the paper talks about a non-linear relationship if you have 0.79 of a cooling tower. Thats nonsensical."
You're right, that is weird. It's probably nonsense, or they're normalising over max-{chillers,cooling towers} (so 0.79 would mean something like "79% of the towers running"). Either way, it's an error on the authors' part.
"Also, while we're at it, depending where you are on an exponential curve, it can look pretty straight."
Only if you scale the axes oddly and independently (a large vertical scale combined with a small horizontal one), but I see your point.
"Where you are on the curve is dependent on the other 18 variables. So changing these can make the 'curve' look straight. Thats why its fundamentally wrong to extrapolate the response from just one set of values for the other 18 variables."
Yes, we covered this: it assumes linearity, and so on. I thought we could get past the issues with the sensitivity analysis simply by pointing that out.
"The cross validation is wrong too. The data was sampled at 5 min intervals and 30% was used as 'unseen' test data. But the dataset was shuffled chronologically. Looking at the variables and its highly likely that data received every 5 mins will be highly correlated. Removing (on average) every third data point means the test data is very highly correlated with the training data and cannot be said to be independent or unseen. Thats why the prediction rate is so high, relatively speaking there are a LOT of nodes in that network and it is basically overtrained to pieces with test data pretty much the same as the training data."
I'm not sure there's really an issue here. Yes, the validation samples are picked from "in between" the training samples, but how else would you do it? There's no temporal modelling, so each sample is treated as independent (as far as the temporal sequence is concerned, they're just arbitrary points in time), and any split other than a random one would introduce its own biases. Of course there is temporal dependence between data points (consecutive points on the same day probably have very similar ambient temperatures, usage loads, etc.), so they are correlated, but the same can be said for times of day or periods of the year (points from the same time of day across a whole month probably have very similar usage loads; points from the same time of year are probably correlated through loads, temperatures, etc.). Unseen doesn't mean independent. In fact, if your unseen data is completely independent, you're going to have a hard time testing anything. The goal is generality, not tricking the network into failing.
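For what it's worth, here's a rough sketch of the two splitting strategies we're arguing about, assuming the rows are already in chronological order (one row per 5-minute reading); the function names are mine, not the paper's:

    import numpy as np

    def random_split(n, test_frac=0.3, seed=0):
        # what the paper (as described) does: shuffle and hold out 30% at random
        rng = np.random.default_rng(seed)
        idx = rng.permutation(n)
        cut = int(n * (1 - test_frac))
        return idx[:cut], idx[cut:]

    def block_split(n, test_frac=0.3):
        # the alternative you're implying: hold out a contiguous chunk of time,
        # so test points aren't interleaved with (and correlated to) the training points
        cut = int(n * (1 - test_frac))
        return np.arange(cut), np.arange(cut, n)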
"The usual way to demonstrate this is to show the test and train data performance over the set of training epochs. The training performance generally gets better and better whilst at some point the test data performance will get worse as the nnet becomes over-specified to the training data. We havent seen this."
Agreed, an overfitting test would have been nice.
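Something along these lines is presumably what you mean; this is only a sketch, and train_one_epoch() and mse() are placeholders for whatever framework the model is actually built in:

    # Track training and test error per epoch; overfitting shows up as training
    # error still falling while test error starts rising.
    history = {"train": [], "test": []}
    best_test, bad_epochs, patience = float("inf"), 0, 10

    for epoch in range(500):
        train_one_epoch(model, x_train, y_train)   # one pass over the training data
        history["train"].append(mse(model, x_train, y_train))
        history["test"].append(mse(model, x_test, y_test))

        if history["test"][-1] < best_test:
            best_test, bad_epochs = history["test"][-1], 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:              # stop once test error keeps degrading
                break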
"The bit you've missed is to evaluate a neural network (or any form of classifier) you need _representative_ data, not complete data. As I said, once you step out of the data range, yes the nnet is 'extrapolating' and providing a numerical answer - but its just guessing. If the weather is different to the data gathered over 2 years (so very hot or very cold) that system is just guessing. And anyone can do that."
It's a better guess than "anyone" can make, though. I mean, it's data across two years. Assuming there aren't any radical climate or usage changes expected between consecutive periods, you can probably train on the most recent two years of data and keep updating as you go along. Barring extraordinary events that push the inputs substantially outside their historical values, you're probably going to be within the bounds you've been training on, or only very slightly outside them, which a properly trained ANN should be able to handle.
What could be more representative data for testing than random points in time across your data set? It's not "complete" data; it's a subsample of your usage scenario across two years.
"The real point of nnets is providing a tool capable of modelling non-linear relationships where you dont have to have a preconceived model of the relationships. If the relationships are linear then an nnet won't outperform a linear classifier."
It might, but it won't be worth the effort.
"Because of the response function in the neuron you always get a smooth transition through the feature space that looks convincing but thats just an artifact of the maths, not the data."
Not sure what you mean by "looks convincing".
"To be honest, the more I look at this, the worse it gets. As I said it isnt very good or convincing and has some really basic mistakes in it. Its basically nnet models training data very well shock."
I'm not as sceptical, for one very important reason: it worked. Not on testing data or on validation data, but on the real thing, *after* it was trained and validated. The sensitivity analysis is a bit broken, I agree, but even with the assumptions it makes, there's no denying it shows the input-output relationship is highly non-linear (independent of the linear assumption it might be making for the interdependency between inputs), so an ANN is well worth the effort.