Abstract
Scholarly debate about student evaluations of teaching (SETs) often focuses on whether SETs are valid, reliable and unbiased. In this article, we assume the most optimistic conditions for SETs that are supported by the empirical literature. Specifically, we assume that SETs are moderately correlated with teaching quality (student learning and instructional best practices), highly reliable, and do not systematically discriminate on any instructionally irrelevant basis. We use computational simulation to show that, under ideal circumstances, even careful and judicious use of SETs to assess faculty can produce an unacceptably high error rate: (a) a large difference in SET scores fails to reliably identify the better teacher in a pairwise comparison, and (b) more than a quarter of faculty with evaluations at or below the 20th percentile are above the median in instructional quality. These problems are attributable to imprecision in the relationship between SETs and instructor quality that exists even when they are moderately correlated. Our simulation indicates that evaluating instruction using multiple imperfect measures, including but not limited to SETs, can produce a fairer and more useful result compared to using SETs alone.
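The two error rates described in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' code or design. It assumes that SET scores and instructional quality are jointly standard normal, that a "moderate" correlation means 0.4, and that a "large" SET difference means a gap of at least 30 percentile points. These values, and the function names `simulate_scores`, `share_low_set_above_median` and `pairwise_accuracy`, are hypothetical choices made only for illustration.

```python
import numpy as np


def simulate_scores(rho=0.4, n=200_000, seed=0):
    """Draw (quality, set_score) pairs with correlation rho.

    Assumption: both variables are standard normal; rho = 0.4 is an
    illustrative stand-in for a "moderate" correlation.
    """
    rng = np.random.default_rng(seed)
    quality = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    set_score = rho * quality + np.sqrt(1.0 - rho**2) * noise
    return quality, set_score


def share_low_set_above_median(quality, set_score, cutoff=0.20):
    """Among instructors at or below the SET cutoff percentile,
    return the share whose quality is above the median."""
    low_set = set_score <= np.quantile(set_score, cutoff)
    above_median = quality > np.median(quality)
    return above_median[low_set].mean()


def pairwise_accuracy(quality, set_score, min_gap=0.30, seed=1):
    """Among random pairs whose SET percentiles differ by at least
    min_gap (an assumed threshold), return the share in which the
    instructor with the higher SET also has the higher quality."""
    n = len(set_score)
    # Percentile rank of each instructor's SET score, scaled to [0, 1].
    pct = np.argsort(np.argsort(set_score)) / (n - 1)
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    half = n // 2
    i, j = order[:half], order[half:2 * half]
    gap = pct[i] - pct[j]
    large = np.abs(gap) >= min_gap
    correct = np.sign(gap[large]) == np.sign(quality[i][large] - quality[j][large])
    return correct.mean()


if __name__ == "__main__":
    q, s = simulate_scores()
    print("Share of bottom-20% SET instructors above median quality:",
          round(share_low_set_above_median(q, s), 3))
    print("Accuracy of pairwise comparison given a large SET gap:",
          round(pairwise_accuracy(q, s), 3))
```

Varying `rho`, `cutoff` and `min_gap` in this sketch shows how the error rates respond to the strength of the SET-quality relationship. The exact figures depend on the distributional assumptions stated above.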
Acknowledgments
We thank Elizabeth Barre, Joshua Eyler, Bethany Morrison, Fred Oswald and Arthur Spirling for helpful suggestions and comments related to this project.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Justin Esarey
Justin Esarey is an Associate Professor of Politics and International Affairs at Wake Forest University. His area of specialization is political methodology, with a particular interest in hypothesis testing and the scientific ecosystem.
Natalie Valdes
Natalie Valdes is a student at Wake Forest University. Her research interests include Methodology and Comparative Politics, specifically the interactions between gender and politics.