I am somewhat suspicious of the statistics in the Neighborhood Review Committee’s (NRC) Interim Report (pdf). (Previous discussion and associated links here.) Consider this passage:

Opinions on the desirability of diversifying dorms varied widely and were strongly predicted by demographic group, with heavier drinkers, athletes, men, and white students much less likely to value diverse dorm life than women, non-drinkers, non-athletes, and minority students.

There are other similar discussions in the Report, generally suggesting that certain groups of students feel very differently about topics like diversity (and implying that this is a bad and/or avoidable state of affairs). But how should we translate the words above into the numbers that the Committee refuses to share with us? My thoughts below.

1) Don’t trust the results of any study in which the key numeric data is hidden from you. In this case, there are two levels of hiddenness. First, the Committee refuses to release (to anyone?) even the summary data that it has collected and presented (to whom?) in its Appendices. How can a student at tomorrow’s forum have an informed opinion on the Report if she can’t read the Appendices? Short answer: She can’t. Second, even if we could see the tables, we should also be able to analyze the raw data. The College collected nothing secret or embarrassing. There are no student names or identifiers. Replication is the sine qua non of seriousness in empirical social science. For example, why is there no copy of the survey? How were the questions phrased? How were they ordered? Without that basic information, there is no way to know if the results should be taken seriously. Consider the best practices suggested/required by the American Association for Public Opinion Research:

Excellence in survey practice requires that survey methods be fully disclosed and reported in sufficient detail to permit replication by another researcher and that all data (subject to appropriate safeguards to maintain privacy and confidentiality) be fully documented and made available for independent examination. Good professional practice imposes an obligation upon all survey and public opinion researchers to include, in any report of research results, or to make available when that report is released, certain minimal essential information about how the research was conducted to ensure that consumers of survey results have an adequate basis for judging the reliability and validity of the results reported.

Why isn’t the College more transparent on these issues? Some sad combination of arrogance, defensiveness and a pathetic insularity.

2) How should we understand the phrase “strongly predicted”? The first time I read it, I assumed that the Committee was talking about meaningful differences in opinions, on the order of 20 percentage points or more. For example, I could easily imagine that only 30% of men (compared to 60% of women) thought that diversifying the dorms was a good idea. A phrase like “strongly predicted” would be consistent with such results.

But Will Slack ’11 wrote:

As I have commented to the NRC, I do not see support in the actual survey data for HWC’s conclusions. That responses differ by a few percentage points does not a polarization make.

A few percentage points?!? Just what is going on here? There is no (reasonable) manner in which, say, a 51%/55% male/female split on “the desirability of diversifying dorms” could be described as “varied widely” or as “strongly predicted” based on demographic variables.

Now, it could be that Will is wrong. (And he may not have permission to share his knowledge with us or, for that matter, with his fellow students.) And it could be that Will is referring to other aspects of the Report. But there is just no way to square the language of the Report with Will’s description of the numbers.
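To put the two readings of “strongly predicted” side by side, here is a minimal sketch that compares the hypothetical 30%/60% split with the hypothetical 51%/55% split. It uses Cohen's h, a standard effect-size measure for the difference between two proportions; this is my choice of yardstick, not anything the Committee reports, and the function name `cohens_h` is just illustrative. It says nothing about statistical significance, which would also depend on sample sizes that we have not been shown.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: absolute difference between arcsine-transformed proportions."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))


# Both splits are hypothetical illustrations from the text, not survey results.
gaps = {
    "30% vs 60%": (0.30, 0.60),
    "51% vs 55%": (0.51, 0.55),
}

h_values = {}
for label, (a, b) in gaps.items():
    h_values[label] = cohens_h(a, b)
    points = abs(a - b) * 100
    print(f"{label}: gap = {points:.0f} points, h = {h_values[label]:.2f}")
```

By Cohen's conventional rule of thumb (0.2 small, 0.5 medium, 0.8 large), the first gap comes out around 0.61 and the second around 0.08. Only the first could plausibly be called a strong difference.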

3) Even if we could read the Appendices and even if we could agree that the summary tables match the language of the Report, we still face the problem that, as best I can tell, the statistical analysis was (how to put this kindly?) not overly sophisticated. If they would release the data, then I (or someone else at Williams, like the dozens of students in STAT 101/201 looking for a project) could do a proper analysis. That analysis might generate the same answers that the Report does, but there is no way to know without checking.

Consider a simple example. Assume that 40% of all Williams students think that diversifying dorms is desirable. (Assume that the question is just Yes/No and that the student body is half male, half female.) Yet there is a significant gender split: 20%/60% male/female. Nothing wrong with that; men and women differ in their opinions.

Now, assume that gender is the only thing that matters. No matter how you split up the male population (by drinking, athletic participation or race), a constant 20% is pro-dorm-diversity. Female opinion displays the same constancy.

But (self-reported) athletic participation and alcohol consumption skew much more male than female. So, if you split the population up into athletes and non-athletes, you will find a dorm-diversity split that has nothing to do with athletics per se but that is driven by the differential rate of male versus female athletic participation.

In other words, it could be that drinking, athletics and (maybe) race have nothing to do with anything. Without more information (mainly cross tabs among the various categories) there is no way to know.
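The example above can be worked through with a few lines of arithmetic. This is only a sketch under stated assumptions: the 20%/60% support rates come from the example, but the equal head counts and the athletic participation rates (60% of men, 30% of women) are numbers I made up for illustration. None of them come from the survey.

```python
# Support for dorm diversity depends ONLY on gender (from the example above).
SUPPORT = {"men": 0.20, "women": 0.60}

# Assumed for illustration: equal numbers of men and women.
HEADCOUNT = {"men": 1000, "women": 1000}

# Assumed for illustration: athletics skews male.
ATHLETE_RATE = {"men": 0.60, "women": 0.30}


def support_rate(is_athlete):
    """Share of supporters among athletes (or non-athletes), pooling genders."""
    total = 0.0
    supporters = 0.0
    for gender, n in HEADCOUNT.items():
        share = ATHLETE_RATE[gender] if is_athlete else 1 - ATHLETE_RATE[gender]
        group = n * share
        total += group
        # Within a gender, athletes and non-athletes support at the same rate.
        supporters += group * SUPPORT[gender]
    return supporters / total


athletes = support_rate(True)
non_athletes = support_rate(False)
print(f"athletes:     {athletes:.1%}")
print(f"non-athletes: {non_athletes:.1%}")
```

With these made-up inputs, athletes come out at about 33% in favor and non-athletes at about 45%, even though athletics has no effect on anyone's opinion by construction. A cross tab of athletics by gender would show 20% for men and 60% for women in both rows, which is exactly why the cross tabs matter.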

Summary: The Report, as I previously noted, is an excellent piece of writing, but its statistical reporting is poor. Without more transparency and openness with regard to its numerical results and raw data, there is no way for any outsider to know if it is accurate. Williams deserves better.
