# 8 out of 10 cats fear statistics – AI doesn't have this problem

If statistics were a human being, it would have been in deep therapy all of its 350-year life. The sessions might go like this: Statistics: "Everyone hates me." Pause. Therapist: "I'm sure it's not everyone..." Statistics: "And they misunderstand me." Pause. Therapist: "Sorry, I didn't quite get what you meant there..." …

1. You can also include more data to get the answer you want if it's not "right" the first time. Or, I suppose, you can ask the question differently or change the calculation method, as sometimes there isn't just one set way of doing things.

There are many ways to fudge numbers if you know how and have a deep understanding of the data you are using.

You're right, though: people don't trust statistics because of politicians, and that's not going to change any time soon.

1. In fairness to politicians, at least in the Western world: they are generally representative of ordinary people. It is other people that people don't trust.

2. This post has been deleted by its author

3. I don't think we can blame this on "politicians". Everyone and her dog abuses statistics.

And there are many ways to fudge numbers if you don't know how, and don't have a deep understanding of anything. One of the problems of statistics is not just that it's easy to do them wrong, but that it's actually really hard to do them right.

1. I think you can blame politicians. In the U.S., most politicians graduated from law schools and shunned quantitative courses in their academic careers, yet they deal with national and international policies of profound importance - macro-economics, international trade and international economics, as well as military capabilities - all of which are based largely on quantitative assessments. As for "Everyone and her dog...": the purpose of a representative democracy is not for the elected to represent the simple sum of constituents' opinions; if that were best, we'd have direct democracy. It's for putting people smarter than the average in power to run the government.

2. #### Statistics rulez

Let's not forget that Nethack is almost entirely statistics, wrapped in a thin layer of "UI"...

(well, technically the code is probabilities, but the observed effects are stats)

3. #### Statistics actually started much earlier - think sums and means

I believe that ancient granaries and other warehouses didn't count every grain or small item; they sampled a cupful and then multiplied that sample by the number of cups (or whatever.)

The imperial foot was calculated by getting a group of men to stand toe-to-heel and then dividing the total length by the number of men. This is an "average".

The Seven Pillars of Statistical Wisdom is an excellent, short read.

1. #### Re: Statistics actually started much earlier - think sums and means

True, but use of the mean or average (only a measure of central tendency) to represent a population or sample, without citing a measure of variation, is itself a frequent misuse of statistics in that relatively few in a population or sample are "average".
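The point about quoting a mean without any measure of spread is easy to see with a two-line sketch (the sample numbers here are invented purely for illustration):

```python
import statistics

# Two hypothetical samples: identical means, very different spread.
steady = [49, 50, 50, 51, 50]
volatile = [10, 90, 30, 70, 50]

for name, sample in [("steady", steady), ("volatile", volatile)]:
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation
    print(f"{name}: mean={mean}, sd={sd:.2f}")
```

Both samples report a mean of 50, but in the second one almost nobody is anywhere near "average" - which is exactly the misuse being described.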

4. Nice Car

1. Soon as I saw Pascal, I knew I was in trouble and my eyes glazed over.

Pretty car though.

5. #### The issue is a good percentage of the population doesn't grasp the concept of probability...

... otherwise they wouldn't waste money on bets and other games. On the other side, the small percentage of those offering bets and other games understand it very well (and hire mathematicians to make sure of it when they're not certain).

1. This post has been deleted by its author

Rats.

1. #### Re: Without statistics, there can be no self-driving cars, no Siri and no Google.

Yes, my first thought on reading that sentence was "presumably there's also an upside..."

2. #### Re: Without statistics, there can be no self-driving cars, no Siri and no Google.

It's settled, then.

Statistics must die.

7. #### When is a car not a car?

"Google was 99 per cent sure my photo is of a car"

But was somehow only 98.5% sure that it's a vehicle. While I'm sure the code to figure these things out is horribly clever and complicated, I'd have thought one of the first and simplest parts would be to check whether one category is a subset of another and take that into account when figuring the overall probability.
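For what it's worth, that consistency check is easy to bolt on after the fact. A minimal sketch - the label hierarchy, scores and function here are all invented, and real classifiers may have reasons for their raw outputs:

```python
# Hypothetical post-processing: if "car" is a subset of "vehicle", confidence
# in the parent label should be at least the confidence in the child label.
# Propagate each child score up through its chain of ancestors.
hierarchy = {"car": "land vehicle", "land vehicle": "vehicle"}  # child -> parent

def enforce_subset_consistency(scores):
    fixed = dict(scores)
    for child in hierarchy:
        node = child
        while node in hierarchy:  # walk up to the root
            parent = hierarchy[node]
            fixed[parent] = max(fixed.get(parent, 0.0), fixed[node])
            node = parent
    return fixed

scores = {"car": 0.99, "land vehicle": 0.97, "vehicle": 0.985}
fixed = enforce_subset_consistency(scores)
print(fixed)  # both ancestor labels are raised to at least 0.99
```

After the fix-up, "vehicle" can never score lower than "car", which is the property the comment is asking for.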

1. #### Re: When is a car not a car?

The question is: when is a car not a vehicle?

If it were a toy car, for instance, this would not necessarily be classed as a vehicle. Especially if it had non-moving parts.

1. #### Re: When is a car not a car?

How about a full size 2D picture of a car?

What about a car on a fairground roundabout? Would a passing car's AI recognise a fixed circular locus?

What would it do when confronted with an apparent herd of galloping horses - going up and down?

2. #### Re: When is a car not a car?

"If it were a toy car, for instance, this would not neccessarily be classed as a vehicle."

Why not? A car is a vehicle, a toy car is a toy vehicle. No matter what modifiers you add to "car", there is never a situation where the same modifier cannot also be added to "vehicle".

1. #### Re: When is a car not a car?

Depends on, amongst other things, how you define "vehicle".

If I define it as "something I can use to get myself from A to B", toys are not vehicles.

If I define it as "some sort of box with wheels", a Matchbox toy is a vehicle; just like my trolley case or the IKEA thingy the rubber plant in the living room sits on.

Tricky.

2. #### Re: When is a car not a car?

As Magritte would tell you, it's not a car, it's a picture of a car.

And I wonder what the self-driving car would make of this: https://www.google.co.uk/maps/@53.571895,-1.6610001,3a,15y,14.84h,76.54t/data=!3m6!1e1!3m4!1shgFY-Sgpy7aPBcFLra-3Tw!2e0!7i13312!8i6656?hl=en

It's not a sheep. It's not even a picture of a sheep.

8. In my experience the problem with applying Bayes' Theorem comes when someone has to decide on a weighting for something that is not clear cut. The probability works as long as the data doesn't break the constraints that were believed to operate on it.

Usually the people using the algorithm have no idea how the original weighting was decided. Therefore they believe it in situations where they shouldn't.
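That sensitivity to the chosen weighting is visible in Bayes' theorem itself. A small sketch (the test characteristics and priors are invented numbers): the same 90%-sensitive test with a 5% false-positive rate gives wildly different answers depending on the assumed prior.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Same evidence, two different guesses at the prior weighting:
low_prior = posterior(0.01, 0.9, 0.05)   # assume a 1% base rate
high_prior = posterior(0.10, 0.9, 0.05)  # assume a 10% base rate
print(f"{low_prior:.3f} vs {high_prior:.3f}")  # ~0.154 vs ~0.667
```

Someone handed only the final probability has no way of knowing which of those priors was baked in - which is the point being made above.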

9. " It is approximately 2.8m tall. Humans (even with hats) of this height occur with a very low probability. The system will, for now, decide that this is not a human. "

However it is quite common in some festivals for there to be one or more people walking on stilts - or wearing artificial heads that increase their size and height well beyond human limits. A human knows that inside that rig is an actual person. AI training is all about context.

1. #### Statistics killed Jesus

A person might be carrying something or someone, like when they have a child on their shoulders, or they might be carrying a protest sign, or maybe even a wooden cross, in which case someone's dogma just ran over their karma.

1. #### Re: Statistics killed Jesus

I think that you'll find that it's "someone's karma ran over his dogma"

10. #### Oh the irony

Isn't it ironic that an article about the misuse of statistics uses one of the six incorrect interpretations of p-values to explain the statistical significance of a chi-squared test? No, a p-value of 0.043 does not mean that the probability of this particular data occurring by random chance is 4.3%. Google "asa p-values" for the ASA (American Statistical Association) statement on p-values. It lists the six most common mistakes of p-value interpretation, and number 2 is the one used in this article.

1. #### Re: Oh the irony

And a Chi-square test should only be applied to a Chi-squared distribution.

I'll wager that a single day's data is an inappropriate sample size for one's new product line.

And real-world confounders - were you advertising your new product before launch in lots of women's magazines, or on TV where the viewership was skewed towards women? - should be considered before reaching for the calculator or Excel (other spreadsheet and statistical software are available).

1. #### Re: Oh the irony

But the numbers calculated in this case for a chi-squared test probably are more or less chi-square distributed.

If one assumes a binomial distribution of the raw figures, they deviate by two standard deviations from a 50-50 result. The probability of a deviation that size or more is roughly five percent.

2. #### Re: Oh the irony

I'll wager that a single day's data is an inappropriate sample size for one's new product line.

Was thinking very much the same. What was the first day of launch - a weekday (when more women are likely to be shopping during the day) or a weekend (when more men may be out)? There are claims that women have a tendency to buy stuff on sale and opening specials on impulse, whereas men tend to take a bit more time (depending also on things like colour and so forth, not necessarily just the application - and the stress some people put themselves through over "do I go Xbox or PlayStation?") (that's easy, both are made by evil companies, go PC all the way, and Linux... Now, do I go Mint, or Debian, or CentOS, or Ubuntu - no wait, Ubuntu did that advertising/spying thing - or SuSE, or... Maybe Devuan, to get away from systemd).

One day of sales data makes for poor sampling, and poor sampling makes for poor stats. I can see companies tweaking their advertising based on "more women bought this on the first day, therefore more women want it so we'll advertise to women" when it was the Friday before Fathers Day that the item went on sale, and it's a good-looking but cheap Leatherman-like tool in a great-looking pouch.

I'm not a stats person but even I know that one day's worth of data would be a silly time to be analyzing what the data means!

2. #### Re: Oh the irony

Absolutely.

The way to look at this is to calculate the margin of error for this sample.

The sample size is 2692+2128 = 4820

We calculate the 95% margin of error as 1.96 * sqrt(0.5 * (1.0 - 0.5) / 4820) (footnote)

This gives 0.0141, which says that 95% of the time the proportion of women will be within 50% +/- 1.41%. This translates to a range of 4820/2 +/- 68 people, or [2342, 2478]. The value 2,128 is outside this range, so all we can say is that, using a 95% confidence interval, the assertion that males and females are equally represented (p=0.5) is not supported by the sample.

Chi-squared is slightly different since it's a measure of fit of a set of individual observations to the expected, but the above is effectively its application to the average case (ie, it ignores the spread of individual samples). Neither provides a measure of how unrealistic/unexpected the result [set] is, as Vaidotas Zemlys has pointed out.

footnote: http://www.dummies.com/education/math/statistics/how-to-calculate-the-margin-of-error-for-a-sample-proportion/
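In Python, the same margin-of-error calculation looks like this (a sketch only; it just re-runs the formula from the footnote):

```python
import math

n = 2692 + 2128                        # total sample size (4820)
moe = 1.96 * math.sqrt(0.5 * 0.5 / n)  # 95% margin of error at p = 0.5
low = round(n / 2 - moe * n)           # lower end of the +/- 68 range
high = round(n / 2 + moe * n)          # upper end
print(f"moe = {moe:.4f}, range = [{low}, {high}]")
# -> moe = 0.0141, range = [2342, 2478]
```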

11. #### The purpose of statistics

a) To describe a population clearly (be that theoretical or sampled)

b) To compare populations meaningfully (and thus make value judgments)

If nobody understands, then (a) has failed.

If a conclusion is not supported by the statistics, then (b) has failed (and probably (a) too).

i.e. The problem is not with statistics. It is with bad statistics.

1. #### Re: The purpose of statistics

My favorite bit of statistical nonsense is a study that went out of its way to obscure any relevant information that might be meaningful to a normal person seeking information on the topic in question. They basically cooked the numbers in order to make any useful distinctions disappear.

2. #### Re: The purpose of statistics

And what is the definition of bad statistics? Or rather, what is the definition of statistics? You only say what its purpose is (purposes are, in fact, as you list two). Then we can understand the difference between statistics and bad statistics, and where the study went wrong. In fact, there is nothing wrong with the study - it obeys statistics. We just forget (and forgive) the fact that if a medicine works in 99.999% of patients but does not work on you, you will 100% die (if we are talking about an antibiotic to treat your flesh-eating bacteria).

Let us rather try to understand what conclusions we might draw from stats used to interpret weather data versus stats used to determine the percentage of people who will get a certain disease. If the weather prediction is wrong, the consequences range from taking an umbrella on a sunny day (mild annoyance) all the way to being put in the middle of the twister. Ouch! If the data on a disease say that only 0.02% of the population will get it, then you are indeed at low risk of getting the disease. But if you do get it, there is a very high likelihood that there will be no studies and no dedicated medicine for it. Ouch again! So is this the fault of statistics? I cannot blame statistics, because there, on the first page of the book I have, it says clearly that winning the lottery is not guaranteed. So use it with caution and with a clear understanding of what the interpretation of the data may be. I'd say if we engage common sense, we'll be fine.

12. Four out of five doctors surveyed agree!

13. "From then on it became increasingly popular for people to use statistical evidence rather than violence to support their arguments."

Perhaps that's the source of the animosity toward statistics? It isn't quite as much fun to prove foul villainy upon the spreadsheet of one's opponent.

14. #### Cat-food

I remember hearing that before.

Thing is, I always interpreted the updated message as "of those that responded to the survey"*, when of course they are discounting everyone who said "my cat doesn't exhibit a preference for a brand".

So even the "clear" explanation can be misinterpreted.

*Note: I was a lot younger and much less of a sceptic at the time

15. #### Stats is hard

Many of the techniques used in statistics can be mastered with practice, even by a social sciences graduate if they are willing. But the proof of these things is often extremely difficult: for example, that a binomial distribution with large numbers tends asymptotically to a Gaussian distribution.

Stats therefore becomes a memory test, since it is difficult to re-prove a theorem that has slipped one's mind.

1. #### Re: Stats is hard

Actually it is not that hard. Either use Stirling's formula, or the method of characteristic functions. The proof of the CLT for iid variables uses only a few tricks and thus is not hard to re-prove. I suspect any active mathematician can re-prove all the theorems from the undergraduate courses in mathematics. Math.stackexchange.com is living proof of this.

1. #### Re: Stats is hard

El Reg told us that "Machine learning is hard. And sometimes deeply offensive" - I believe it - how can you not, if it was said here? :)

16. I have to admit failure here. I simply cannot comprehend the dice example: how would 6 and 6 be any less, or any more, likely to come up than 6 and 5, or any other combination?

1. There are 12 sides in total - six on each die.

There are only a total of two faces, one on each die, that can produce a 6 - 6 combination.

There are a total of four faces - 6 and 5 on each die - that can produce a 6 - 5 or 5 - 6 combination.

So there are twice as many chances of a combination of a 6 and a 5.

2. I think it's the way it's written that causes confusion.

Roll die one: 1/6 chance of being a 6.

At this point, the chances of 6-6 are indeed the same as the chances of 6-5.

The article isn't however exploring that scenario. It's treating 6-5 as meaning either of the dice shows the 6, and the other shows the 5.

So 6-6 needs die one to be a 6, and die two to be a 6.

6-5 needs die one to be a 6, two to be a 5

5-6 needs one to be a 5, two to be a 6

Thus the combination of 6 and 5 has two ways in which it can be reached, which is twice as many as the ways of rolling two 6s.

3. "I have to admit failure here, I simply can not comprehend the dice example, how 6 and 6 would be any less, or any more likely to come up than 6 and 5, or any other combination?"

Because it can be either:

Dice A: 6, Dice B: 5

or

Dice A: 5, Dice B:6

1. I shall throw my hat in the ring and see if i can explain it a different and hopefully simpler way...

Look at how many ways you can get to your total

12 - 6+6

11 - 6+5 or 5+6

04 - 2+2 or 3+1 or 1+3

07 - 6+1 or 1+6 or 3+4 or 4+3 or 5+2 or 2+5

08 - 6+2 or 2+6 or 5+3 or 3+5 or 4+4 so a lot more likely!

No, wait, this could be easier. Let's just look at how many ways die one could land such that you could still make your total with the second die (a 6 down to a 1).

To get a 12 D1 needs to be 6

To get a 11 D1 can be 6 or 5

To get a 04 D1 can be 1 or 2 or 3

To get a 07 D1 can be 1 or 2 or 3 or 4 or 5 or 6

To get a 08 D1 can be 2 or 3 or 4 or 5 or 6

I therefore conclude the odds of getting a 7 are slightly better than an 8. Is that right? I'm pretty confused now!
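Brute-force enumeration of all 36 ordered outcomes settles both questions in this sub-thread, including the 7-versus-8 one:

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 ordered (die1, die2) pairs

double_six = [r for r in rolls if r == (6, 6)]
six_and_five = [r for r in rolls if sorted(r) == [5, 6]]
print(len(double_six), len(six_and_five))  # 1 vs 2: a 6-and-5 is twice as likely

sevens = [r for r in rolls if sum(r) == 7]
eights = [r for r in rolls if sum(r) == 8]
print(len(sevens), len(eights))  # 6 vs 5: a total of 7 edges out a total of 8
```

So 7 and 8 are not quite equally likely: 7 can be made six ways but 8 only five, because there is no 1 + 7.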

17. #### 8 out 10!

"Eight out of ten owners who expressed a preference said their cat prefers it."

And that might not be entirely accurate either - it might really be "Eight out of ten owners who expressed a preference said their cat prefers it to not eating." e.g. it's so bad that 20% of the cats would rather commit suicide than eat it.

18. #### Stats abuse

20% of car accidents are caused by alcohol.

Hence 80% of accidents are caused by sober drivers.

Stay drunk and live.

1. #### Re: Stats abuse

20% of car accidents are caused by alcohol.

When we used to have the "bad weekend on the roads" stuff reported (or rather, when I could be bothered paying attention to the trash that passes itself off as NZ's equivalent to sewer-tabloid news media), I often wanted to know other stats. E.g. you could have a weekend where 27 people were killed (the average being 5-6) - but they were killed in a bus that was hit by a landslide that nobody could've seen coming, that being the only accident for the period. I figured out, probably before I was 12, that a much more useful stat is the number of accidents and the number of injuries. And when fatality rates started dropping significantly in the 2000s, was it due to safer driving, or more likely to things like airbags entering the cheaper cars? Given the state of NZ's "world leading driver education and licensing" I'd place my money on "kiwi drivers are still idiots, but their cars help protect them".

Stats without context are just as bad as stats made up on the spot, at least 98.775% of the time.

19. #### The Goodies had it best...

9 out of 10 doctors recommend this product. Mind you, we had to search a bit for the right 9 doctors...

1. #### Re: The Goodies had it best...

"9 out of 10 doctors recommend this product. Mind you, we had to search a bit for the right 9 doctors..."

I recall Alan Freeman delivering the line "Four out of five can't tell the difference between Stork and butter".

a) it was hard to tell on the tellies of the day how much Stork or butter was spread on those bits of bread handed out. Possibly no more than a smidgen.

b) some years later I came across mention of an organic compound (began with a T?) which only 20% of the population could taste. I did wonder if a similar chemical was present in Stork.

20. I don't understand why the chi squared test is the appropriate test for the women and men buyers. A quick bit of searching hasn't made it any clearer.

I'm no expert, but it seems to me this corresponds to the binomial distribution. The question would be: what is the probability that at least 2,262 buyers out of 4,390 are of one gender if each gender had a 50% chance of buying? Using the formula for the c.d.f of the binomial distribution, the answer is 4.15%. Here's the formula, http://www.wolframalpha.com/input/?i=2*(1-sum+k%3D0+to+2262+of+(4390+choose+k)*.5%5Ek*.5%5E(4390-k))

The article says 4.3%. I'm guessing there's a deep reason it's close, but it calls into question the use of the chi-squared test. Why was it selected over others? How should one decide what test to use? How can one check the answer if they're not sure? This is exactly the problem people have with statistics: hand-wavy, monkey-see-monkey-do processes that people don't understand the idea behind. You can treat statistical program functions like black boxes, but then which black box do you choose?
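For what it's worth, the gap may come down to where the tail boundary is placed rather than the choice of test. A sketch of the exact binomial calculation, using Python's arbitrary-precision integers rather than any approximation:

```python
import math

n = 4390  # total buyers in the example
# Number of ways to observe 2262 or more of one gender under Binomial(n, 1/2)
tail = sum(math.comb(n, j) for j in range(2262, n + 1))
p_ge_2262 = 2 * tail / 2 ** n                         # two-tailed, "at least 2262"
p_ge_2263 = 2 * (tail - math.comb(n, 2262)) / 2 ** n  # two-tailed, "more than 2262"
print(f"{p_ge_2262:.4f} {p_ge_2263:.4f}")
```

Counting "at least 2,262" gives roughly 4.5%, while summing only beyond 2,262 (as the linked formula effectively does) gives the quoted ~4.15%; the chi-squared figure of 4.3% sits between the two, as it is a continuous approximation to this discrete distribution.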

21. #### If statistics worked...

I would have long since played the lottery or the stock market, won, and gone back to work for free without having to deal with managers. No, don't tell me that I can just work for myself :) Leonardo da Vinci was (still is) unique.

The mighty PDF (probability density function) has us for breakfast every time we "predict" without data and without knowing what the data look like.

22. #### 2 sigma

The author states that statistics are simple and are misunderstood by many. And the same article, subtitled "Use and abuse of figures", states that a probability of 4.3% is "small" and thus the hypothetical enterprise can confidently conclude that its new product appeals to women more than to men. How unfortunate...

This is quite typical, actually: anything that can be stated with a confidence level of more than 95% - "2 sigma" in statistical parlance, meaning more than two standard deviations (of a normal distribution) from the mean - is deemed "significant".

Well, here is some really simple intuition. Let's say the hypothetical company from the article intends to make the same observation daily. How often can it expect to see a difference between men and women of more than 2 standard deviations under the "null hypothesis" that the product is equally attractive to both sexes? What do those 4.3% really mean? Well, if the stores are closed on Sundays then you would expect to see such an outcome (or an even larger difference) about once in 4 weeks (a bit more frequently, statistically speaking). If the product sells 7 days a week then it's closer to once in 3 weeks (4.3% = 1/23.26). The probability that it happened on the first day they made the count does not look so small when you phrase it like that. If you make not daily but hourly observations you will see deviations larger than 2 sigma every day.

No one in the sciences regards a 2 sigma result as significant. For a confident statement one needs 5-6 sigma. If our hypothetical company makes daily observations they will see a 3 sigma outlier about once a year, a 4 sigma outlier twice in a lifetime. Such outliers simply do happen by chance.

And all that depends on the unmentioned assumption that your deviations from expectation are normally distributed, which is often a good assumption for systems in equilibrium in natural sciences but not where human activity is concerned. A normal distribution falls off very sharply indeed (exponentially) so outliers are rare. Random variables related to human activity, including economics, often have wider, sometimes power-law distributions, and the chi2 p-values will significantly overestimate the confidence.

Funnily, it is in the natural sciences like physics where 2 sigma results are not considered significant. In non-scientific fields (including medicine, in my experience) 95% confidence is deemed significant almost universally and experiments and surveys are often specifically designed with 95% in mind.

Sigh...
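The "once in three weeks" intuition above follows from a couple of lines of arithmetic, using the article's 4.3% figure:

```python
p = 0.043                           # chance of a >= 2-sigma day under the null
days_between = 1 / p                # expected wait between such days
p_within_month = 1 - (1 - p) ** 30  # chance of at least one such day in 30 days
print(f"about every {days_between:.0f} days; {p_within_month:.0%} per month")
# -> about every 23 days; 73% per month
```

In other words, a company checking daily is more likely than not to hit a "significant" day within a month even when nothing at all is going on.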

23. "It is true that the equations become more complex but, as a non-mathematician, you can simply accept that the statisticians know what they are doing and use the tool without having to follow what the equations mean."

Trust me, I know what I'm doing. Here, hold my beer while I do this.

Yeah, great pitch, but nope. Unless you know how the raw data was collected and how it was processed, any statistic has to be taken with a ton of salt. And if we are talking potentially lethal applications* there has to be accountability. So show me the data. And what you've done to it.

* Not much harm in Siri sending me to the wrong restaurant. Two tonnes of metal, glass and plastic hurtling down the high street in the wrong direction at full speed, however...

24. Anyone else notice a problem with the probabilities on the image match?

The article itself points out the car is not really vintage, but that's OK - that was only 88% confidence. That's not the problem. The problem is 99% car, 97% land vehicle. Since all cars are land vehicles*, it cannot, by any reasonable definition, be more likely to be a car than a land vehicle. I'm sure there are perfectly good reasons for these numbers, but it's another example of why "trust the tools, don't worry about how it works" might not be good advice.

*It's 2017, where's my damn flying car?

1. "It's 2017, where's my damn flying car?"

... he wrote from a 3 inch by 6 inch pocket computer instantaneously to subscribers worldwide using only his right thumb

25. #### 9 out of 10 cats meow.

Survey one cat, over the course of its nine lives, and once more when it gets back from the taxidermist...

26. If you want to see creative misuse of statistics, try this -- look at the graphic lower down if nothing else...

http://www.bbc.co.uk/news/education-40043891

27. #### Oh the Irony of the Irony

You are saying that the author is wrong - that the figure of 4.3% DOESN'T represent the probability of this event occurring by chance.

I think you are incorrect and that you are slightly misrepresenting what the ASA said. In the interests of accuracy, what it actually said was:

“2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.”

It seems to me that the author was using the P value to make a statement about data in relation to a specific hypothesis (that the original population had a 1:1 ratio of males to females).

But the good news is that we don’t have to argue about it. Indeed, doing so is pointless. Why on Earth argue when we can actually test it and find out who is correct? We can actually measure the probability that we will find 2,262 or more of one gender in a random sample of 4,390 taken from a balanced (1:1) population. Then we can compare it to the 4.3% given by the Chi Squared test.

How? We can use a Monte Carlo simulation. We can set up a population with equal numbers of males and females. We can then take a sample of 4,390 people at random from that population and count the number of females that we get. We can see if the number of females in our sample is equal to, or greater, than the one observed (2,262). We can then repeat that sampling over and over again (say, 1,000,000 times) and see how often, on average, we get a deviation as large (or larger) than 2,262.

Monte Carlo simulations do not guarantee to give exactly the right answer but, if you run them often enough, they should give an answer that is close to reality. The more often you run them, the closer (on average) they should get.

This is really simple to do. You can write your own code and test it for yourself. Or you can use mine (which is in R) and check it out. If I have made any errors, I apologize. Please do correct them and try it for yourself. When I ran this code the answer approximated to around 4.47%. This suggests to me that the author was correct in his statement. (Bear in mind that Chi Squared never promises to give an EXACT figure, all probabilities are estimations.)

PReached = 0
startingset = c(1, 0)   # 1 = one gender, 0 = the other
People = 4390
Runs = 1000000

for (i in 1:Runs) {
  Foo = sample(startingset, People, replace = TRUE)
  if (sum(Foo) > 2261) {
    PReached = PReached + 1
  }
}

PReached = PReached * 2  # to allow for both ends of the curve
Prob = PReached / Runs * 100
Prob
