# Numbers war: How Bayesian vs frequentist statistics influence AI

If you want to develop your ML and AI skills, you will need to pick up some statistics and before you have got more than a few steps down that path you will find (whether you like it or not) that you have entered the Twilight Zone that is the frequentist/Bayesian religious war. I use the term "war" advisedly because war, by …

1. Maybe a daft question, but would the Bayesian method not have been applied to the original 99% infection rates?

2. If there is no infection at all then 1% of the population will still test positive.

In mathematics (and statistics), a good way to test your reasoning is to go for the extreme values and see if it still works. So you should test 50/50, no infection and complete infection, and see what happens.
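A quick sketch of that extreme-value check in Python. The 99% sensitivity and 99% specificity are assumed for illustration (they are not taken from the article), and `posterior_infected` is just a made-up helper name:

```python
def posterior_infected(prevalence, sensitivity=0.99, specificity=0.99):
    """P(infected | positive test) via Bayes' theorem.

    sensitivity and specificity default to 0.99 purely as an assumption.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    total_pos = true_pos + false_pos
    if total_pos == 0.0:
        # No positives are possible at all, so the question is undefined.
        return float("nan")
    return true_pos / total_pos


# Extreme and middling infection rates: none, rare, 50/50, everyone.
for prevalence in (0.0, 0.01, 0.5, 1.0):
    p = posterior_infected(prevalence)
    print(f"prevalence={prevalence:.2f} -> P(infected | positive)={p:.3f}")
```

With no infection at all, every positive is a false positive, so the posterior is zero however good the test is; with complete infection it is one.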

1. #### If there is no infection at all then 1% of the population will still test positive.

Your sentence is the wrong way around. The data is (would be) that 1% of the population have been tested and came back positive, which means that ... <insert deduction about infection rates here>

1. #### Re: If there is no infection at all then 1% of the population will still test positive.

This gets into understanding how medical tests work (IAAD, and this is a bit of a niche) and which tests you should use.

If there is a very low infection rate and you are looking to 'rule in', you need a test that is very specific (positive only in disease); otherwise you suffer from too many false positives.

Equally, once the infection is fairly widespread, it is generally safe to assume that anyone who looks like a zombie, is a zombie.

We generally use medical tests with a 1% 'error' rate because in practice that is usually good enough.

To look at this problem in a better way, Likelihood Ratios are the way to go.
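For anyone who has not met likelihood ratios, here is a minimal sketch of how they are usually applied: convert the pre-test probability to odds, multiply by the ratio, convert back. The function names and the 99%/99% test figures are assumptions for illustration only.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ : how much a positive result multiplies the odds of disease."""
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """LR- : how much a negative result multiplies the odds of disease."""
    return (1.0 - sensitivity) / specificity


def update_probability(pre_test_probability, likelihood_ratio):
    """Post-test probability from a pre-test probability and a likelihood ratio."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumed test characteristics, not figures from the article.
lr_pos = positive_likelihood_ratio(0.99, 0.99)
print(update_probability(0.01, lr_pos))  # rare infection, positive result
print(update_probability(0.50, lr_pos))  # widespread infection, positive result
```

The same positive result moves a 1% pre-test probability far less than it moves a 50% one, which is the 'rule in' problem at low infection rates.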

1. #### Re: If there is no infection at all then 1% of the population will still test positive.

"it is generally safe to assume that anyone who looks like a zombie, is a zombie."

Unfortunately it's not safe. Which is why we have so many people being sent home by their GP and then dying of meningitis / pulmonary embolism / cancer.

1. #### Re: If there is no infection at all then 1% of the population will still test positive.

It is not safe to assume that anyone who does not look like a zombie is a zombie.

3. Hmm. Think the communications theory profs upstairs are Bayesians. The field seems to have achieved a lot that way. But I'm not going to check, as I left my ear defenders at home.

4. #### "How can you possibly do statistics on a guess!?"

Isn't it how pollsters work today??

1. #### Re: "How can you possibly do statistics on a guess!?"

"Isn't it how pollsters work today??"

No. TFA explains the problem pretty well.

If different age groups have a different probability of actually voting and this is affected by factors like weather, it's very hard for polls to produce a single meaningful number.

The "Corbyn surge" in the under-25s and the 35-45 age group made a big difference. ComRes seems to have called it better than the other polls*. But if you were doing a poll to sell to the Torygraph, and assumption (1) gives May a majority of 100 while assumption (2) gives a hung Parliament, which assumption would the boss wish you to make? At that point he thinks about the cheque and mutters about turnout being guesswork.

Note that I am not making a party political point, but one about not wanting to be the messenger who's going to be shot. I detest roughly equal numbers of politicians on both sides. But our Press is mostly so partisan that it's part of the problem, not the solution.

*I first suspected that they might be right when Johnson tried to spin the BBC audience as composed of "lefties". You can pretty much assume that if Johnson opens his mouth he's either lying or evading the truth, so it was evidence that they had indeed picked a representative sample and he was getting worried.

1. #### Re: "How can you possibly do statistics on a guess!?"

"But if you were doing a poll to sell to the Torygraph"

... well, the Guardian uses (I think) ICM, who were consistently giving the highest Conservative lead - though of course, to the "true believer" the Guardian is one of the leaders of the MSM anti-Corbyn conspiracy.

1. #### Re: "How can you possibly do statistics on a guess!?"

Well, conspiracy aside, the Grauniad is not exactly filled with joy about the prospect of Corbyn (not that I am either - Hobson's choice to some extent) and a lot of the pieces they have published have been somewhat ignorant of facts and statements.

Labour give an impression of being as gung-ho (if a little reserved on a few issues) about cutting our ties with Europe as a number of the hard-liners in the Conservative party.

And I say that with regret - we have just achieved a larger European-based workforce (OK by 1 it is a small workforce) than UK.

1. #### Re: choice

Why choose the lesser evil; vote Cthulhu

2. #### Re: "How can you possibly do statistics on a guess!?"

"well, the Guardian uses (I think) ICM who were consistently giving the highest Conservative lead "

The Guardian is basically Lib Dem these days; telling the faithful that voting Labour was a wasted vote is no skin off their noses.

2. #### Re: "How can you possibly do statistics on a guess!?"

I think the response is "How can you possibly do statistics without a guess!?". Given that there are no comprehensive models of the world, and practically nothing is truly independent, you always assume something, whether you realise it or not.

1. #### Re: Given that there are no comprehensive models of the world...

I think that's the key point. Everyone brings a prior (guess). The frequentists insist that the only legitimate prior is one that expresses total ignorance. The Bayesians are willing to start from somewhere else. Once enough evidence actually turns up to make the prior unimportant, both parties agree. Until then, you don't actually have enough evidence.
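A small sketch of the "enough evidence makes the prior unimportant" point, using a Beta prior on a proportion. Treating the flat Beta(1, 1) prior as the "total ignorance" position is my assumption, as are the sceptical Beta(2, 18) prior and the 30% observed rate.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Mean of the Beta posterior after observing `successes` out of `trials`."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


FLAT = (1.0, 1.0)      # assumed stand-in for "total ignorance"
SCEPTIC = (2.0, 18.0)  # assumed prior belief that the rate is around 10%


def prior_gap(successes, trials):
    """How far apart the two posterior means are for the same data."""
    flat = posterior_mean(*FLAT, successes, trials)
    sceptic = posterior_mean(*SCEPTIC, successes, trials)
    return abs(flat - sceptic)


# Hypothetical data in which 30% of observations are positive.
for trials in (10, 100, 1000, 10000):
    successes = 3 * trials // 10
    print(f"n={trials:>5}  gap between posteriors={prior_gap(successes, trials):.4f}")
```

The gap shrinks as the sample grows, so the two starting points end up telling the same story once the data dominate.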

5. #### I am Bayesian ...

probably, but certainly not religiously

6. But in the example given, don't we also have to take into account the effects of those false results?

A false positive : Some poor innocent gets shoved into the quarantine hospital room/jail cell/pit (depending on the stage of the overall infection) until they are eaten by the other recently turned.

A false negative : soon-to-be zombie killer goes back to their family to consume them at a later date.

How is that figured out when using stats to save the world ?

1. #### Now we're into Decision Theory

It's the utility of the outcomes that will underpin your decision, often this makes the Bayes/Frequentist views entirely irrelevant...as observed.
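As a rough illustration of what that looks like, here is a sketch that picks the action with the lower expected cost. The cost figures and function names are invented for the example; only the idea of weighting outcomes by their probability comes from the discussion.

```python
def expected_costs(p_infected, cost_false_positive, cost_false_negative):
    """Expected cost of each action given P(infected) for one person.

    cost_false_positive: cost of quarantining someone who is healthy.
    cost_false_negative: cost of releasing someone who is infected.
    Correct decisions are assumed to cost nothing, which is a simplification.
    """
    quarantine = (1.0 - p_infected) * cost_false_positive
    release = p_infected * cost_false_negative
    return quarantine, release


def best_action(p_infected, cost_false_positive, cost_false_negative):
    """Return the action with the lower expected cost."""
    quarantine, release = expected_costs(
        p_infected, cost_false_positive, cost_false_negative
    )
    return "quarantine" if quarantine < release else "release"


# Hypothetical costs: releasing a zombie is ten times worse than a wrongful quarantine.
print(best_action(0.50, cost_false_positive=1.0, cost_false_negative=10.0))
print(best_action(0.01, cost_false_positive=1.0, cost_false_negative=10.0))
```

Change the cost ratio and the decision flips at a different probability, whichever camp produced the probability in the first place.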

2. //How is that figured out when using stats to save the world ?//

Good question. I think stats would help.

Let us assume we have a certain amount of money to spend on the issue, which essentially can be converted into putting people into isolation, quarantine and/or providing good separation between arbitrarily sized groups in the community (i.e. building walls and guarding access-gates). Plus further testing beyond the initial screen.

Depending on the relative costs of those, and the estimate of the infection rate (which we can quickly obtain given the known test error rates, once the population screen is complete), and the cross-infection rate (how many latent cases an infective zombie causes) an equation could be derived to optimise the number of people saved.

1. Let us assume we have a certain amount of money to spend on the issue, which essentially can be converted into putting people into isolation, quarantine and/or providing good separation between arbitrarily sized groups in the community (i.e. building walls and guarding access-gates). Plus further testing beyond the initial screen.

Ahh you don't fool me - I've seen all of the Zombie films and the inevitable consequence of locking them out....

2. I recall someone saying that there are lies, damn lies and statistics and anyone that relies on statistics is a fool.

In the real world t is better to rely on actual facts (numbers) rather than some guesswork based on assumptions that mat not be provable.

1. //In the real world t is better to rely on actual facts (numbers) rather than some guesswork based on assumptions that mat not be provable.//

Well obviously. If you have the data you want, you don't need statistics. Unfortunately, the real world is not always so obliging.

If you don't have the information already, what are you going to do - give up?

2. "In the real world t is better to rely on actual facts"

The real world doesn't work like that. Note that you missed out a letter which is almost certainly "i" to make "it". All readers of your missive will have had to apply some form of deductive (probability based) reasoning to fill in the gap. Those with a shaky grasp of English may have even got it wrong. That was easy to correct but this error is nearly parse-able without correction to get a different result than that which you intended:

"based on assumptions that mat not be provable"

1. @gerdesj

I'm not sure why he's subbing in a variable. He appears to have more than one real world.

7. #### Both methods have merits which are mathematically based.

So I can only think the arguments between the two sides are to try and confuse the punter, to ensure they don't get far enough into the subject to realise when they are on the wrong side of a Dutch book.

8. Statistics can mean whatever you want them to, if you select the right test, right sample size and right sample.

So does it really matter which camp you live in?

1. "Statistics can mean whatever you want them to, if you select the right test, right sample size and right sample."

That basically negates the entire point of statistical analysis, so no. What you are describing is more or less "anecdotal evidence".

9. #### Not an example of Bayesian inference

The example given has nothing to do with "Bayesian" logic. The author doesn't seem to understand the difference between Bayes theorem, a mathematical result which no one disputes, and Bayesian inference, which is a controversial stance in philosophy of science and stands in contrast to frequentist inference.

No frequentist would dispute the logic of the given example - there is nothing objectionable to a sane frequentist about using population frequencies and Bayes theorem in this way. Nor is there anything wrong with using your best guess of the population frequency if you don't have it exactly (though you can do much more nuanced stats than a single guess, there's no need for a short article to go into that).

The controversy is over the extent to which one can equate "belief" and "probability". It's a thorny topic and this comment is long enough already.

The author is right that this is a religious war that will bite you if you jump into ML or stats, but yet another "explainer" from someone who doesn't really understand the subject isn't much help.

1. #### Re: Not an example of Bayesian inference

Errr, I think the author makes it very clear that he does understand the difference. The article opens with a discussion of conditional probabilities which, he makes clear, are related to Bayes Theorem.

You title your comment with “Not an example of Bayesian inference”. Where in the article does it say it IS an example of Bayesian inference? Answer - it doesn’t. The word inference is never used. The article is more generally about the Bayesian World.

Strangely, after that, your views seem to mirror his all the way through.

You say “No frequentist would dispute the logic of the given example - there is nothing objectionable to a sane frequentist about using population frequencies and Bayes theorem in this way.”

He says “Now I hope I have convinced you that …. no one would be stupid enough NOT to take it into account if they knew what it was because it clearly makes a difference.”

So you agree there.

He goes on to say “However, a dyed-in-the-wool frequentist would say, "But you don't know the actual number. How can you possibly do statistics on a guess!?"”

You say “Nor is there anything wrong with using your best guess of the population frequency if you don't have it exactly.”

Violent agreement there as well. You both agree that most people WOULD use the guess. You say that the “sane” frequentists would use the numbers (implying that some insane ones would not) and the author makes the point that there are some zealots out there who would not if they don’t have the exact figure.

Finally, you say that “The controversy is over the extent to which one can equate "belief" and "probability".”

He says “If you read more about the frequentist and Bayesian views of the world it turns out that they diverge much further and the debate becomes much more of a philosophical one about how you view the world.”

I don’t think you have presented any information to suggest that he doesn’t understand the topic; the two of you seem to have essentially identical views.

10. Bayes Rule is uncontroversial and unobjectionable to Frequentists.

What the latter object to is applying the Rule to the parameters of a probability distribution so that, say, the mean of a normally distributed random variable now becomes a random variable itself rather than a real albeit unknown constant. Welcome to Bayesianism.

Then the prior can only be interpreted as degrees of belief in its different values; a deviation from objectivism and pure empiricism represented by the undiluted information contained in a study sample. Since many forms of prior make the maths intractable, it is traditional to use a conjugate prior so that the posterior distribution can be more easily calculated. In other words, a fudge on top of a fudged method.

Of course we only hear of Bayesian successes, though Nate Silver's recent poll forecasting would suggest that what used to be called inverse probability is due a more sober press.
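For the conjugate-prior point above, a minimal sketch of the textbook case: a normal prior on the mean of normally distributed data whose variance is assumed known. The posterior is again normal, so the update is two lines of arithmetic. The function name and the example numbers are made up for illustration.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior mean and variance for the mean of normal data.

    Assumes a normal prior on the mean and a known observation variance
    (noise_var), which is what makes the prior conjugate.
    """
    n = len(data)
    sample_mean = sum(data) / n
    # Precisions (inverse variances) simply add.
    posterior_precision = 1.0 / prior_var + n / noise_var
    posterior_var = 1.0 / posterior_precision
    # Posterior mean is a precision-weighted average of prior mean and sample mean.
    posterior_mean = posterior_var * (
        prior_mean / prior_var + n * sample_mean / noise_var
    )
    return posterior_mean, posterior_var


# Hypothetical numbers: prior centred on 0, four observations at 2.
print(normal_posterior(prior_mean=0.0, prior_var=1.0,
                       data=[2.0, 2.0, 2.0, 2.0], noise_var=1.0))
```

The unknown mean is treated here as a random variable with its own distribution, which is exactly the step the comment says frequentists object to.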

11. #### Monty Hall

Monty Hall tells his contestant that there are three curtains and behind one is a car and behind two are goats. The contestant chooses one curtain and Monty Hall opens another curtain revealing a goat. Should the contestant switch?
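One way to settle it without an argument is to simulate it. The sketch below assumes the usual rules, which the question does not spell out: Monty knows where the car is and always opens a goat curtain that the contestant did not pick. Function names and the trial count are arbitrary.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    curtains = [0, 1, 2]
    car = rng.choice(curtains)
    pick = rng.choice(curtains)
    # Assumed rule: Monty knowingly opens a goat curtain other than the pick.
    opened = rng.choice([c for c in curtains if c != pick and c != car])
    if switch:
        # Move to the one curtain that is neither the pick nor the opened one.
        pick = next(c for c in curtains if c != pick and c != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of rounds won over `trials` simulated games."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stick :", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Under those assumed rules, switching wins about two times in three. If Monty opened a curtain at random and merely happened to reveal a goat, the answer would be different.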
