Professor Alan White had this interesting comment on my proposal to make student course ratings public, either for all faculty or for certain subsets (e.g., tenured or highest-rated faculty).

I take there to be (at least) two statistical-significance issues relevant to the question of whether, particularly at small liberal-arts colleges like Williams, evaluations should be made public. One is that enrollments in at least many and perhaps (I don’t know) the majority of courses taught at Williams are low enough that (we’re told by our statistician) the results of the evaluations aren’t statistically significant (my recollection is that statistical significance requires N greater than 20; all of our writing-intensive courses, all of our tutorials, and almost all courses taught in my department are smaller than that). A second is that it can be and has not infrequently been the case that scores can fall in the second quintile or the fourth quintile without their differences from the mean being statistically significant (this can also happen with first- and fifth-quintile scores, though I have no clear recollections of having seen that happen).

I don’t think that your statistician (Chris Winters? Dick De Veaux?) has made himself correctly understood.

First, statistical significance is a poor guide here. The issue is not: does a t-test reject, at the 5% level of significance, the null hypothesis that student satisfaction with course X is the same as student satisfaction with course Y? The issue is: does access to student course ratings help students choose between X and Y? The answer to the first question does depend on sample size and is influenced by the narrow range of scores typically awarded. The answer to the second question is yes, for almost any sample size. More data leads to better decisions even if that data is not statistically significant. (If that isn’t obvious, I can elaborate in the comments.)
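To see why significance and decision-usefulness come apart, here is a toy simulation. All the numbers are hypothetical: two courses whose true mean ratings differ by 0.4 on a 5-point scale, with 5 students rating each. At that sample size a two-sample t-test rarely rejects, yet the simple rule "take the higher-rated course" picks the truly better course far more often than a coin flip.

```python
import random
import statistics

random.seed(0)

TRUE_X, TRUE_Y, SD, N = 4.2, 3.8, 0.8, 5  # hypothetical true means, spread, class size
T_CRIT = 2.306  # two-sided 5% critical value for a t distribution with df = 2N - 2 = 8

trials = 20000
correct = significant = 0
for _ in range(trials):
    x = [random.gauss(TRUE_X, SD) for _ in range(N)]
    y = [random.gauss(TRUE_Y, SD) for _ in range(N)]
    mx, my = statistics.mean(x), statistics.mean(y)
    # pooled two-sample t statistic
    sp2 = (statistics.variance(x) + statistics.variance(y)) / 2
    t = (mx - my) / (2 * sp2 / N) ** 0.5
    if abs(t) > T_CRIT:
        significant += 1
    if mx > my:  # decision rule: take the course with the higher sample mean
        correct += 1

print(f"t-test rejects in {significant/trials:.0%} of trials; "
      f"higher-rated course is truly better in {correct/trials:.0%}")
```

The t-test rejects in only a small minority of trials, while the naive rule is right roughly three times out of four: small-sample ratings can be decision-relevant without being "significant."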

Second, even if the numbers for a single question are not statistically significant, students rate classes/professors along multiple dimensions. A class that gets 4s from most students in one category may not be significantly different from a class that gets 3s. But if the first class gets 4s in every category, the effective increase in N will probably let us reject the null hypothesis.
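A rough back-of-envelope calculation, with made-up numbers, shows the mechanism. Treat each category as an independent reading (an optimistic assumption, since a student's ratings across categories are correlated in practice): the standard error of the comparison shrinks with the square root of the number of categories, so a gap that is not significant on one question can be significant when all questions point the same way.

```python
# All numbers hypothetical. Comparing two classes on mean ratings,
# treating categories as independent readings (optimistic in practice).
sd, gap, n_students, n_categories = 0.9, 0.6, 6, 8

se_single = sd * (2 / n_students) ** 0.5                  # one category, two classes
se_pooled = sd * (2 / (n_students * n_categories)) ** 0.5 # all categories averaged
z_single = gap / se_single
z_pooled = gap / se_pooled

print(f"one category: z = {z_single:.2f}; "
      f"all categories: z = {z_pooled:.2f} (threshold 1.96)")
```

With these numbers a single category gives z of about 1.15 (not significant), while pooling eight categories gives z above 3.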

Third, even if you want to focus on a single measure, we have data for multiple classes and multiple years for the same professor (and multiple professors over multiple years for the same class). Consider Professor Sam Crane. Imagine that he gets high ratings for a single small seminar. That alone might not tell us much. But we should also have data for Sam from other classes and other years. With this information (and assuming that it shows high marks across the board, as I bet it does), we can be fairly certain that Sam will receive excellent ratings in his small seminar next semester.

So (I take it): much of the “data,” including almost all relating to my own department, is essentially junk.

Totally false. And, in fact, I can prove it. Give me the data for your department for the last 5 years (even with the professor and class names anonymized) and I can build a simple statistical model which will predict, out of sample, what ratings the students will hand out next week. Now, of course, my estimates will have confidence intervals and some will be wrong. But, from what I know of the literature on this topic, professors/classes with high ratings in the past generate, on average, high ratings in the future.
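The kind of model I have in mind can be sketched on synthetic data (the real evaluations are obviously not public, so every number below is invented): give each professor a latent "typical rating," generate noisy semester averages around it, fit on four years, and predict the held-out fifth year. Even the crudest predictor, a professor's own past mean, beats ignoring the history.

```python
import random
import statistics

random.seed(1)

# Synthetic data only: each professor has a latent typical rating,
# and each semester's class average is that plus noise.
n_prof, years = 200, 5
quality = [random.gauss(4.0, 0.3) for _ in range(n_prof)]
history = [[q + random.gauss(0, 0.2) for _ in range(years)] for q in quality]

train = [h[:-1] for h in history]   # first four years
actual = [h[-1] for h in history]   # held-out fifth year

global_mean = statistics.mean(r for h in train for r in h)
pred_history = [statistics.mean(h) for h in train]  # predict from own past ratings
pred_global = [global_mean] * n_prof                # ignore the past entirely

def mse(pred):
    return statistics.mean((p - a) ** 2 for p, a in zip(pred, actual))

print(f"MSE using past ratings: {mse(pred_history):.3f}; "
      f"using global mean: {mse(pred_global):.3f}")
```

Out of sample, the past-ratings predictor has much lower error than the global mean, which is the sense in which the data is information, not junk.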

If that’s correct, and if it is also correct (as I take it to be) that Williams professors work hard to teach well and regularly succeed in doing so, then I do not see how students would benefit by having access to the evaluations.

There are plausible reasons for not making this data public. It might not lead professors to change their behavior. (Laszlo Versenyi told us that he didn’t hand out course evaluation forms because doing so just frustrated him and the students.) It would probably make low-scoring professors feel bad. It could lead to unhelpful scheduling problems as students made more of an effort (than they do already) to avoid low-scoring professors.

But there is no doubt that students could use this information to select classes/professors that they would, at the end of the semester, rate more highly.

I would deem it unfortunate if a student drawn to the subject matter of a given course chose not to take that course because statistically insignificant evaluation results made it appear that the course was somehow mediocre — perhaps because it appeared to be of no more than average quality at Williams, but even if it appeared to be below average. Bottom line: it makes no sense to present pseudoinformation as though it were genuine information.

Again, this is not “pseudoinformation,” and anyone who tells you otherwise is misleading you. As a rule, Williams students are smart and use information wisely, especially if that information is packaged and explained appropriately.

Best plan: Try it out for a year. Restrict the data to just tenured professors and/or the top 25%. What’s the worst that could happen? Then, after a year, re-evaluate. Prediction: The plan will work and be highly popular among students.

The more transparent that Williams becomes, the better an education its students will receive.