Let’s continue our discussion of “Athletics and Alumni Giving: Evidence From a Highly Selective Liberal Arts College” (pdf) by Jessica Holmes, James Meditz and Paul Sommers (HMS). Today, my focus is on the claim that a higher hockey winning percentage at Middlebury leads to larger alumni gifts. See their Table 6. I have already demonstrated that using hockey championships as the independent variable is stupid because every year, bar one, after 1995 is a championship year. But, as Rory points out, HMS also show significant results when using winning percentage to predict donation amounts (but not donation rates). Yet this result is just as flawed. Here (pdf) is raw data on hockey’s record. (I am assuming that the year 1996, say, means alumni giving through June 30, 1996 and the hockey team’s record for the 1995-1996 season.)

Consider:

The entire positive relation is (almost certainly) driven by the 1994 outlier year, visible in the lower left. (The line is simple least squares.) If your result changes when just one year out of 15 is deleted, then your result is junk.

1) This aggregate approach is not the same as the individual/gift/year model that HMS actually use, but it captures the central flaw in their result. Instead of aggregating all the data in 1994 into a single mean (as I do in this chart), they have 20,000 or so observations for 1994. Yet the effect is the same. That year (like all years in the early 90s) featured lower than average giving. It also featured an anomalously horrible hockey team. Take away that year, and the result probably goes away, even with their huge panel.

2) Another way to see the problem is to drop 1994 from the analysis and recreate the same chart.

There is no relation between hockey winning percentage and average donation size once we drop the outlier 1994 results from the picture. If anything, there is a small (and statistically insignificant) negative correlation.
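For anyone who wants to run this kind of check themselves, here is a minimal sketch of a leave-one-out test on a simple least-squares slope. The numbers are made up for illustration and are NOT the Middlebury data, and the function names (`ols_slope`, `leave_one_out_slopes`) are my own. The only point is to show how one extreme year can flip the sign of the fitted line.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the simple least-squares line of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def leave_one_out_slopes(x, y):
    """Refit the line once per observation, each time with that observation dropped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(x), dtype=bool)
    slopes = []
    for i in range(len(x)):
        keep[i] = False
        slopes.append(ols_slope(x[keep], y[keep]))
        keep[i] = True
    return slopes

# Hypothetical illustration only: the first point plays the role of an outlier
# year with a terrible team and low giving. These are not real figures.
win_pct = np.array([0.20, 0.70, 0.75, 0.80, 0.85, 0.90])
avg_gift = np.array([100.0, 160.0, 155.0, 150.0, 145.0, 140.0])

full_slope = ols_slope(win_pct, avg_gift)        # all years included
loo = leave_one_out_slopes(win_pct, avg_gift)    # loo[0] drops the outlier

print("slope with all years:", full_slope)
print("slope without the outlier year:", loo[0])
```

With these invented numbers, the slope is positive when every year is included and negative once the first point is dropped. A robust result would show roughly the same slope no matter which single year is left out.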

Summary: The central problem with this paper is not that correlation does not prove causation. That is an issue for all non-experimental work! Instead, the central problem is that HMS have no good evidence of correlation. Variable 1 (championship seasons) fails because they all occur in the second half of the data. There is no (meaningful) variation beyond that. Any variable that is TRUE for post-1995 and FALSE before that will show the same result, even gibberish items. Variable 2 (winning percentage) avoids this problem because it varies over the entire time period but, outside of 1994, there is no correlation. Higher winning percentages are not associated with higher donation amounts. The 1994 outlier drives everything. And, if your result changes when a single year out of 15 is dropped, then your result is useless.
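The point about Variable 1 can be sketched in a few lines. Everything here is hypothetical: the 15-year window, the steadily rising giving series, and which post-1995 year is the non-championship one are all assumptions of mine, not data from the paper. Giving in this toy example rises for reasons that have nothing to do with hockey, yet an arbitrary "post-1995" flag "explains" it about as well as the championship dummy does.

```python
import numpy as np

# Hypothetical 15-year window and a made-up giving series that simply trends
# upward over time. No hockey effect is built into these numbers.
years = np.arange(1990, 2005)
giving = 100.0 + 5.0 * (years - years[0])

# An arbitrary indicator: FALSE through 1995, TRUE afterwards.
placebo = (years > 1995).astype(float)

# A stand-in championship dummy: same as the placebo except for one
# non-championship year after 1995 (which year is an assumption here).
champ = placebo.copy()
champ[years == 2000] = 0.0

def mean_gap(indicator, y):
    """Difference in mean y between indicator == 1 and indicator == 0 years.

    For a single 0/1 regressor this equals the least-squares coefficient.
    """
    return float(y[indicator == 1].mean() - y[indicator == 0].mean())

champ_gap = mean_gap(champ, giving)
placebo_gap = mean_gap(placebo, giving)
overlap = float(np.corrcoef(champ, placebo)[0, 1])

print("championship 'effect':", champ_gap)
print("placebo 'effect':", placebo_gap)
print("correlation between the two dummies:", overlap)
```

Both dummies pick up a large positive "effect" because they mostly just mark the later years, which is why a championship variable with this pattern cannot be told apart from any other before/after split.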