# Reply to post: Re: Models, fits, and wild guesses

### When one of NASA's sun-studying satellites went down, AI was there to fill in the gaps

#### Re: Models, fits, and wild guesses

I don't think they are pretending that they will do major advances on the science here, more that they think they can still provide early warning for events.

Given the name of the journal where the article is published[1], that is actually quite funny :-)

Seriously though, this is a fairly common, and perfectly valid, line of reasoning in science: "I haven't got the data I know I need. Can I somehow massage the information I've got to get the same or substantially similar data?" What you do next, however, is critical. Let's assume you somehow have a prescription which estimates the missing data from other observations. Two possibilities exist:

a) The estimate is not sufficiently similar to the actual measurement. This is a boring possibility [2], so we'll forget about it.

b) The estimate is substantially similar to the actual measurement. This means that the measured data we had as the input does in fact contain the necessary information. With physics-based models, we can trace this information to the specific features of the input data and the processes underlying these features, understand them, and then monitor these features directly. With traditional statistical analysis (e.g. principal-component or time-series analysis) we could at least locate the correlations between the measured data and the desired observable, which will guide our understanding. With a neural-net fit, we've learned nothing beyond the fact that the correlation exists. We do not know why; we do not know which part or feature of the measured signal contains the data; we do not know how robust the correlation is and whether it will still hold tomorrow. That is numerology, not science.
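As a toy illustration of the "traditional statistics" case, here is a minimal sketch with entirely made-up data: three hypothetical input channels, of which only one (`channel_1`) drives the desired observable. Plain per-channel Pearson correlation stands in for a fuller principal-component or time-series analysis; the channel names, coefficients and noise level are all my assumptions, not anything taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up inputs: 1000 samples of three independent channels.
n_samples = 1000
inputs = rng.normal(size=(n_samples, 3))
channel_names = ["channel_0", "channel_1", "channel_2"]  # hypothetical names

# Made-up observable: driven by channel_1 only, plus a little noise.
observable = 2.0 * inputs[:, 1] + 0.1 * rng.normal(size=n_samples)

# Pearson correlation of each input channel with the observable.
correlations = np.array(
    [np.corrcoef(inputs[:, k], observable)[0, 1] for k in range(inputs.shape[1])]
)
strongest = int(np.argmax(np.abs(correlations)))

for name, r in zip(channel_names, correlations):
    print(f"{name}: r = {r:+.3f}")
print("strongest:", channel_names[strongest])
```

Even something this crude tells you where in the input to look next, which is exactly the information a black-box fit of equal predictive skill does not hand you.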

[2] Which, however, can't be eliminated in the present case, since the instrument performing the actual measurement is dead, and the real comparison is no longer possible. The standard counter-argument is that this possibility can be guarded against by dividing the initial dataset into training and testing subsets; if the data from the testing set, which was not used in training, is reproduced, then everything is fine. This counter-argument is valid if, and only if, the available dataset covers all possible variations of all possible input parameters the system depends on. If some of the parameter space was never explored by the reference dataset, the behaviour of the neural net for these parameter combinations is intrinsically unpredictable. This is the key difference between physics-based models and fitting.
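To put rough numbers on the train/test argument, here is a minimal sketch under loud assumptions: the "system" is simply exp(x), the reference dataset only covers x in [0, 3], and a degree-7 polynomial stands in for the flexible black-box fit (it is not a neural net, just the simplest fit that shows the same failure mode). The held-out test points come from the same explored range, so they say nothing about the range that was never sampled.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def truth(x):
    # Stand-in for the real system (an assumption for illustration only).
    return np.exp(x)

# Reference dataset: only the range 0..3 was ever explored.
x = np.linspace(0.0, 3.0, 200)
idx = rng.permutation(x.size)
train, test = idx[:150], idx[150:]

# Flexible fit trained on the training subset only.
fit = Polynomial.fit(x[train], truth(x[train]), deg=7)

def max_abs_error(points):
    return float(np.max(np.abs(fit(points) - truth(points))))

# Held-out points from the explored range: the usual validation check.
held_out_error = max_abs_error(x[test])

# Points from a part of parameter space the dataset never covered.
x_unexplored = np.linspace(4.5, 6.0, 50)
unexplored_error = max_abs_error(x_unexplored)

print(f"max error on held-out test set: {held_out_error:.2e}")
print(f"max error in unexplored range:  {unexplored_error:.2e}")
```

The test-set check passes comfortably, and the fit is still wrong by a much larger margin once the input leaves the explored range. A model built on the underlying physics would at least tell you which of its assumptions stop holding out there.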
