Stu Radnidge

Data, Infrastructure, Node.

January 6, 2013 at 12:27pm


Sensationalist Correlation

As much as I hate to start the year off on a kind of negative foot, I can’t let this recent article, which proposes a causal link between lead exposure and crime, go unexamined. The author, Kevin Drum, first dismisses prior analyses of social phenomena as being “purely correlative”, then goes on to present a causal explanation using… pure correlation, primarily from the work of Rick Nevin.

Going by the number of tweets that appeared in my timeline, apparently there were many in my circles who didn’t immediately see the folly.

I’m not suggesting for a minute that lead is anything but toxic to humans, but it is monumentally naive to suggest that X can be used as a predictor for Y in the absence of a causal link between X and Y.

Meteorology provides the best example of this that I can think of. When I am trying to make a decision about something that is heavily influenced by the weather, such as when and where to go snowboarding, I will invariably fall back to looking at historical data (“there is normally decent snow in the French Alps in February”) and make a prediction based purely on that. Which is fine - but I don’t consider my prediction anything more than a mug punt (a gamble, for those not familiar with the Australian / British vernacular ;).

But because it’s somewhat normal to do this, one might assume that meteorologists do something similar. Which would be wrong (and an insult to meteorologists). Meteorology attempts to predict weather patterns based on cause and effect, such as the effect of barometric pressure over Greenland on the path of Hurricane Sandy. It uses correlation as a predictor only when causality has been reasonably established; it doesn’t blindly use correlation alone.

But perhaps with greater relevance to the article in question, the practice of statistical analysis is not centered on proving a hypothesis with supporting data - it’s about using data to disprove alternative hypotheses. When you test something for significance, you can only draw conclusions about what the data rule out, never about what they prove.

For example, if your theory was that all swans are white you should spend your time looking for swans that are not white, as opposed to seeking out known populations of white swans and using that as supporting “evidence”. There is obviously a huge difference between saying “all known populations of white swans are white” and “all swans are white”, yet much of what we see in the mainstream media concludes the latter when in fact it can only say the former.

This is in effect what Nevin did. A correlation between lead levels and behaviour was observed in one geographical location of a western society. But rather than investigate causality, Nevin instead looked for more white swans. Have a look at the opening sentence of Nevin’s 2007 paper (emphasis mine):

This study shows a very strong association between preschool blood lead and subsequent crime rate trends over several decades in the USA, Britain, Canada, France, Australia, Finland, Italy, West Germany, and New Zealand.

Do you really not see an obvious confounding variable there? It is of course not the only one - social behaviour in western society is not exactly uncomplicated. But the paper presents no discussion of potential problems with his analysis, and dedicates a cursory paragraph to investigations of the neurochemical effects of lead exposure.

But enough of the negativity - my intent with this post is not to try and discredit Nevin or Drum, nor am I trying to support alternative explanations of the same phenomenon such as those from Levitt or Gladwell. The lead correlation is interesting, and the article well written.

My intention is to educate, so that the next time something sensationalist appears in the media (“Global Warming Caused by Lack of Pirates”), you’ll think about causality versus correlation - the two should never be confused.