Eggheads have found a positive link between the number of racist tweets and the number of racist hate crimes in US cities

TRT Silver badge

I use the term "causal relationship" in the sense in which it is used by just about everyone in science, engineering, etc.

That being:

A causal relation between two events exists if the occurrence of the first causes the other.

The first event is called the cause and the second the effect. Correlation between two events, variables or other measures does not imply causation. The converse does hold, however: if there is a causal relationship between two variables, they must ipso facto be correlated.

The whole point of the example was that, since drownings and ice cream sales are correlated, both may have a causal relationship with some third event or variable. That same data set provided a whole year's worth of lectures, believe me! You start bringing in other data that may or may not be causally related, and you look for covariance. This is essentially what "AIs" such as Watson do: trawl through massive data sets trying to identify covariant relationships between things.

You look at the dates on which the data were recorded. You plot them on a timeline and see that both variables show some hint of a periodic cycle. The most obvious cycle is annual, so you look at day of year and find some correlation, some evidence of covariance, but it's not staggeringly significant; it doesn't explain all the variance. You try it by month and get a much better figure. You try it by week of year and the correlation drops. You then correct for the day of the week by synthesising values such as "first Saturday of month n" or "second Friday of month n". Then you start lumping them together: Fridays in July, Saturdays in August...

This starts explaining the variance.
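To make "explaining the variance" a bit more concrete, here is a minimal Python sketch. It assumes the measure is the share of total variance captured by the group means (eta squared: between-group sum of squares over total sum of squares). The sales figures, the simplified 30-day "months" and the weekend bump are all made up for illustration; they are not the lecture data.

```python
import math
from collections import defaultdict

def variance_explained(values, labels):
    """Eta squared: between-group sum of squares / total sum of squares."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    groups = defaultdict(list)
    for v, g in zip(values, labels):
        groups[g].append(v)
    between_ss = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
                     for vs in groups.values())
    return between_ss / total_ss if total_ss else 0.0

# Hypothetical daily ice cream sales: a summer peak plus a weekend bump.
days = range(360)
month = [d // 30 for d in days]            # simplified 30-day "months", 0..11
weekday = [d % 7 for d in days]            # 5 and 6 stand in for the weekend
sales = [100 + 50 * math.sin(math.pi * m / 11) + (30 if w >= 5 else 0)
         for m, w in zip(month, weekday)]

by_month = variance_explained(sales, month)
by_month_and_weekday = variance_explained(sales, list(zip(month, weekday)))
print(f"by month: {by_month:.3f}  by month and weekday: {by_month_and_weekday:.3f}")
```

Because these made-up sales depend only on month and weekday, lumping by (month, weekday), i.e. "Saturdays in August", explains essentially all of the variance, while grouping by month alone leaves the weekend effect unexplained. Real data would of course leave a residue.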

You then do something radical and expand your data set. You look at total visitor numbers, if such a figure exists. Damn, no-one was sat on top of the Tower with a pair of binoculars counting people on the beach and people in the sea... so you take a proxy measure: the guy with the hand clicker at the pier's turnstile. Whoa! There's some kind of correlation there... but it still doesn't explain everything. So you look at the weather record and find that sunshine and high temperatures account for 99.999% of ice cream sales. However, they only explain 87% of the drownings; extreme BAD weather explains another 12%.

So eventually you reach a point where you have all these correlations, and you are fairly certain that ice cream sales and drownings both increase when the weather is fine and hot, when more people visit the seaside and when it's a weekend, but that ice cream sales and drownings are negatively correlated when the weather is poor.
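The "third variable" idea can be illustrated with a small simulation. This is a sketch under an assumed linear toy model: temperature drives both ice cream sales and drownings, and neither of those affects the other. The coefficients, noise levels and the use of a continuous stand-in for daily drownings are all hypothetical. The partial correlation formula removes the linear influence of temperature from both variables.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after controlling (linearly) for zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(42)
temp = [rng.gauss(18, 6) for _ in range(2000)]               # hypothetical daily temperatures
ice_cream = [20 * t + rng.gauss(0, 40) for t in temp]        # sales driven only by temperature
drownings = [0.05 * t + rng.gauss(0, 0.3) for t in temp]     # continuous toy stand-in, driven only by temperature

print("raw correlation:       ", round(pearson(ice_cream, drownings), 3))
print("controlling for weather:", round(partial_corr(ice_cream, drownings, temp), 3))
```

With this set-up the raw correlation between ice cream and drownings comes out strongly positive, while the partial correlation, once temperature is accounted for, sits close to zero. Note that this only shows the data are consistent with a common cause; it proves nothing about causality, which is the point the next paragraph makes.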

None of this actually gives any proof whatsoever of causality. It's reasonable to say that good weekend weather in the summer causes people to buy ice creams and to swim in the sea, and that swimming in the sea is the cause of some drownings. But there's still no proof. To get that, you'd have to conduct a controlled experiment: vary ice cream sales and monitor drownings in three different places or at three different times. In one you leave things as they are, in another you give ice cream away, and in the third you close the ice cream shop. You change one variable and see whether the other changes.

Lo! Changing the amount of ice cream consumption on the coast does not affect the number of drownings. There is no causal relationship.
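Here is a minimal sketch of that three-arm experiment as a simulation. The assumptions are that days (rather than places) are randomly assigned to "as is", "free ice cream" or "shop closed", and that, in this hypothetical world, drownings depend only on how many people swim, which in turn rises with temperature. All numbers are invented. Because the assignment is random, the weather is balanced across the arms, so any difference in drownings would have to come from the manipulated variable.

```python
import random
from statistics import mean

rng = random.Random(7)
ARMS = ("as_is", "free_ice_cream", "shop_closed")

results = {arm: {"ice_cream": [], "drownings": []} for arm in ARMS}
for day in range(3000):
    arm = rng.choice(ARMS)                   # random assignment balances the weather across arms
    temp = rng.gauss(18, 6)                  # hypothetical daily temperature
    base_sales = max(0.0, 20 * temp)         # ice cream demand rises with temperature
    ice_cream = {"as_is": base_sales,
                 "free_ice_cream": 2 * base_sales,
                 "shop_closed": 0.0}[arm]    # the variable we manipulate
    swimmers = round(10 * max(temp, 0.0))    # swimming also rises with temperature
    # Assumed model: each swimmer drowns with a small fixed probability; ice cream plays no part.
    drownings = sum(rng.random() < 0.001 for _ in range(swimmers))
    results[arm]["ice_cream"].append(ice_cream)
    results[arm]["drownings"].append(drownings)

for arm in ARMS:
    print(f"{arm:15} ice cream {mean(results[arm]['ice_cream']):8.1f}  "
          f"drownings/day {mean(results[arm]['drownings']):.3f}")
```

Ice cream consumption differs sharply between the arms, but mean drownings per day stay roughly the same, which is what "no causal relationship" looks like in an intervention. The simulation can only return what its assumed model encodes, of course; with real beaches the result would have to be measured.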

What experiment could you do to test the other correlations? I suppose you could ban swimming and close the beaches. Then visitor numbers could vary and the drownings wouldn't follow. Or force people into the sea: drownings go up. That just proves the causal link between swimming and drowning.

Another idea would be to check whether this is a special case for this town or a general case for all seaside towns in the UK, Europe, worldwide...

Or you could just accept that you cannot prove causality but do have a credible explanation for it, which is enough for a working hypothesis, provided you stay open to evidence that counters that explanation.

It's exactly the same with social media and racially aggravated crime. One does not necessarily cause the other, and the only way to PROVE it one way or the other would be to manipulate social media feeds and observe the outcome. Now, as if ANY company with a shred of social responsibility would undertake such an unethical experiment...
