Friday, November 27, 2009

Simpson's paradox

Averages seem like simple things, but they are not always. Consider this baseball example:

Tony and Joe are competitive friends and so they compare batting averages. At the All-Star break, Tony is batting .300 and Joe is only batting .290. Joe mentions that batting in the second half of the season is more important, and so he and Tony agree to compare their batting averages for the second half of the season (and only the second half). When they finally meet, it turns out that Tony batted .390 in the second half of the season. Joe did better, too, but only batted .375. Tony wins both halves of the season.

Question: whose batting average was higher for the entire year? It turns out we don't know, and it could very easily be Joe! (I'll post an example later.)

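What the paradox states is that a relationship that holds between averages within every subgroup can disappear, or even reverse, when the subgroups are combined. The reason is that a full-season batting average is not the plain average of the two half-season averages; each half is weighted by its number of at-bats, and those weights can differ between the two players. So Tony could win both halves of the season, but have a lower batting average for the entire season.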
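Here is a small sketch of how that can happen. The hit and at-bat counts are invented for illustration (the story above gives only the averages); they are chosen so the half-season averages match .300/.290 and .390/.375. The names `season`, `average`, and `full_season` are just mine.

```python
from fractions import Fraction

# Hypothetical (hits, at-bats) for each half; only the resulting
# averages come from the story, the counts themselves are made up.
season = {
    "Tony": {"first": (90, 300), "second": (39, 100)},
    "Joe": {"first": (29, 100), "second": (120, 320)},
}


def average(hits, at_bats):
    """Batting average as an exact fraction: hits divided by at-bats."""
    return Fraction(hits, at_bats)


def full_season(halves):
    """Pool hits and at-bats across both halves, then divide once."""
    hits = sum(h for h, _ in halves.values())
    at_bats = sum(ab for _, ab in halves.values())
    return average(hits, at_bats)


for name, halves in season.items():
    first = average(*halves["first"])
    second = average(*halves["second"])
    total = full_season(halves)
    print(f"{name}: first {float(first):.4f}, "
          f"second {float(second):.4f}, season {float(total):.4f}")
```

With these made-up counts, Tony finishes at 129/400 = .3225 and Joe at 149/420, or about .355. Tony wins each half, but most of his at-bats fall in his weaker half and most of Joe's fall in his stronger half, so Joe wins the season.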

I do not know the details about Climategate, but it is very interesting to me, with all the statistics. Are average temperatures going up, going down, or doing something else? Here's an article that mentions this batting average paradox about midway through. I wonder if I have a new example of the paradox involving averages.
