Teaching the Difficult Past of Statistics to Improve the Future

Abstract In recent years, the discipline of statistics has begun reckoning with its difficult history. Institutions are reconsidering names that have honored key historical figures in statistics who have deep ties to eugenics movements and racial and class prejudice. These names, however, continue to appear in our classrooms, where we teach the methods created by these individuals, raising the question of how instructors should address their legacies. Three examples of famous statisticians and their work—Francis Galton’s use of conditional probabilities to demonstrate “hereditary talent,” Karl Pearson’s attempt to quantify the intelligence of Jewish immigrant students, and Ronald A. Fisher’s creation of the analysis of variance to de-emphasize environment in human development—highlight the intimate ties between statistics and eugenics. These examples, along with a discussion of the context of these men, eugenics movements, and the statisticians and scientists who opposed their eugenic programs, can humanize the field for students, teach them about the challenges in accurate and unbiased data collection and analysis, and connect historical mistakes to contemporary ethical issues. Confronting this history in the classroom can both improve the teaching of the statistical methodologies themselves and begin a broader conversation about the role of statistics in the world. Supplementary materials for this article are available online.


Introduction
Statistics, like many other disciplines, is facing a reckoning with its history. Among the topics of controversy: the Galton Lecture Theater and the Pearson Building (Langkjaer-Bain 2019), an image of Francis Galton and a department URL bearing his name (The University of Chicago Department of Statistics 2019a, 2019b), a prestigious statistical award named for Ronald A. Fisher (Tarran 2020b), and a stained-glass window showcasing Fisher's famous Latin Square (Evans 2020). These sources of recognition, among many others, have recently been debated, renamed, or removed amid discussion of the eugenic ideas and advocacy of their namesakes, as well as their more explicit racist and colonialist statements (Langkjaer-Bain 2019; Tarran 2020a). To redress these and other instances of historical and contemporary racism and imperialism, many scientific, research, and educational institutions have committed to antiracist principles, and now face the difficult task of fulfilling those commitments (Nobles et al. 2022). In statistics departments, among others, this work must include confronting the racist and discriminatory consequences of eugenics and race science, which included sterilization and institutionalization and formed a precursor to Nazi atrocities (Clayton 2020; Evans 2020). As the statistics profession and university communities decide how to reckon with the intimate historical ties between eugenics and statistics, the question arises for those who teach statistics: where does this topic fit in our classrooms?
Meanwhile, instruction in statistics and data science is growing rapidly at all levels of education. These courses and curricula are evolving to meet new challenges, including both ethical and technical demands (American Statistical Association Undergraduate Guidelines Workgroup 2014; GAISE College Report ASA Revision Committee 2016; Bargagliotti et al. 2020; Raman et al. 2023). Given these changes, and the importance of statistical reasoning skills in other disciplines and the world at large, it is a crucial time for students to understand not just statistical methods, but their role in society (American Statistical Association Undergraduate Guidelines Workgroup 2014). Educational guidelines emphasize the importance of pre-college students developing "a healthy dose of skepticism about findings based on data" (Bargagliotti et al. 2020, 5), college students developing "an awareness of ethical issues associated with sound statistical practice" (GAISE College Report ASA Revision Committee 2016, 11, emphasis in original), and college programs incorporating "[e]thical issues … throughout a program" (American Statistical Association Undergraduate Guidelines Workgroup 2014, 13). In particular, it is important to engage such issues at the introductory level to reach a wide variety of students (Raman et al. 2023).
The history of statistics provides a means for students to grapple with the role that statistics has played in society and envision what that role will, or should, look like in the future. In mathematics education, incorporating history can change "perceptions of mathematics" and help "explain the role of mathematics in society," although teachers may face challenges in broaching the issue (Fauvel 1991, 4). An inquiry into the history of eugenics at University College London, where Galton, Pearson, and Fisher all worked, noted the importance of not hiding eugenic pasts, but of teaching about them and "linking science to its wider context" (Solanke et al. 2020, 30).
Specifically for statistics, its history can add complexity to what has been called the "ritual" of the use of some statistical methods (Gigerenzer 2004, 587). In teaching p-values and null hypothesis testing, for example, describing the development of these methods can demonstrate that they are but one of many proposed models of inference, and that this development was shaped by conditions at the time (Kennedy-Shaffer 2019). Using historically significant data and experiments as pedagogical tools has also been suggested as a way to situate statistics within science more broadly and teach the importance of understanding the context of data before beginning analysis (Dickey and Arnold 1995; MacKay and Oldford 2000). And placing methods in their historical context can elucidate connections and contrasts between approaches that students might otherwise miss (Stanton 2001).
Incorporating lessons about the intertwined history of statistics and eugenics into undergraduate statistics courses provides students with a richer picture of the field. It also creates an opportunity for students to begin conceptualizing the complex interactions between data and society. In Section 2, I offer three examples of the ties between statistical methods, their developers, and eugenic applications that motivated or resulted from these methods. I also offer a few examples of those who used data and statistical reasoning to push back against these narratives, both contemporaneously and in the years since. In Section 3, I discuss the value that these examples can have in the classroom, connecting to and expanding upon key learning goals. In Section 4, I discuss some of the challenges and limitations statisticians may face in incorporating these examples into the classroom. While these are only some possible examples, the approach of incorporating historical and contemporary examples of the uses (and misuses) of statistics and data science is crucial to meeting our educational goals.

Francis Galton and the Origin of Correlation
Francis Galton is perhaps best known for two key concepts: eugenics and correlation, both terms he coined himself. Galton's statistical work, which placed the field on a different footing than prior work on averages and the theory of errors, arose directly out of his eugenic aims (MacKenzie 1981, 52). Galton saw the need for new mathematical tools to defend his theories of intelligence and the inheritance of mental characteristics. Unprepared to develop these tools himself, Galton enlisted mathematicians and, with them, fostered the rise of mathematical statistics (MacKenzie 1981, 96-101).
Demonstrating the importance of heredity was the primary goal of Galton's research. Hereditary Genius, his 1869 work on the relationship of success with family lineage, demonstrated Galton's commitment to applying statistics to support his eugenic theories. The book, which built on two 1865 articles he had written, did not attempt to find a mechanism for this heredity, but rather to establish heritability by identifying correlations in achievement between family members. Specifically, Galton (1869, 6) endeavored to show "how large is the number of instances in which men who are more or less illustrious have eminent kinsfolk." Galton considered men of prominence, which he took as the key mark of "genius," from a wide variety of fields, including politics, law, science, and the arts, and spent most of the book cataloguing their relations. In each section, Galton tallied the proportion of prominent men who had prominent relatives of varying degrees of closeness.
A summary table from his 1865 article showcases Galton's approach in a way that is easier to understand than the more detailed and notation-heavy tables in Hereditary Genius itself. This table, presented below in Figure 1, sums up the rate of occurrences of a prominent "near male relationship," a "distinguished son," or a "distinguished brother" among prominent men in six categories (Galton 1865, 163). From these counting exercises, Galton (1865, 162) did not attempt to exactly calculate the "value of hereditary influence" but noted that "it is enormous." This he determined based on the large share of those distinguished men with a distinguished relative. More specific quantification using a mathematical notion of correlation would come later (Kevles 1995, 17).
Immediately, this table, like those for each individual field of prominence in Hereditary Genius, provides an opportunity to discuss correlation and causation. Galton drew an implicit contrast between the probability of having a distinguished relative conditional on being distinguished oneself and the marginal probability of having a distinguished relative. From this, and the relative consistency of these conditional probabilities across types of distinction, he proclaimed the "enormous power of hereditary influence" (Galton 1865, 163). Galton gave a cursory nod to some other potential explanations, including nepotism and material advantages in life, but dismissed these for a few cases and concluded that the similar results across fields demonstrated the unimportance of these factors. Galton's faith in the importance of identifying correlations, rather than trying to identify specific causal factors, became a fundamental article of the biometric tradition that Karl Pearson would carry on (Porter 2004, 261).
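Galton's implicit contrast can be made concrete in a short calculation. The counts and base rate below are hypothetical illustrations in the spirit of his tallies, not his actual figures.

```python
# Hypothetical tally in the spirit of Galton's 1865 tables (not his data):
# among 100 "distinguished" men, suppose 31 have a distinguished close
# relative, and assume roughly 1 in 4,000 men in the general population
# is "distinguished" at all.
n_distinguished = 100
n_with_distinguished_relative = 31

# Conditional probability: P(distinguished relative | distinguished oneself)
p_conditional = n_with_distinguished_relative / n_distinguished
# Assumed marginal probability that a given man is distinguished
p_marginal = 1 / 4000

# Galton's argument rested on this gap; it says nothing about *why* the
# gap exists (heredity, wealth, nepotism, and selection bias all predict it).
ratio = p_conditional / p_marginal
print(f"conditional probability: {p_conditional:.2f}")
print(f"assumed marginal rate:   {p_marginal:.5f}")
print(f"ratio:                   {ratio:.0f}x")
```

Framing the table this way lets students see that a large conditional-to-marginal gap is equally consistent with hereditary and with social explanations.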
Modern students, on the other hand, can easily note several confounding factors, sources of selection bias, and alternative explanations to the hereditary causation proposed by Galton to explain the correlation. These include issues of wealth and power in Victorian society and the access to educational and material resources they provide, as well as the selection bias induced by Galton or the other collators of "notable men" in selecting from among their networks. Indeed, reviewers of Galton's work often pointed out these flaws, and the limitations of the use of reputational data to infer natural ability (Paul 1998, 31). Another lesson arises here for today's students: consider carefully what is measured by a variable and be clear in that definition in probability and correlation statements.
Galton's legacy in both statistics and eugenics, and their intersection, pervaded the fields both within and outside academia. These analyses formed only the early part of Galton's work. In later chapters of Hereditary Genius, Galton (1869) discussed the differences in abilities between races, largely by stereotypes and anecdotal reports of British travelers, and his political program for using eugenic measures to improve the trajectory of Great Britain and its "stock" (see also Clayton 2021, 135). His and his colleagues' later statistical work largely concerned quantitative measures, and, because of this, his examples generally used physical measurements of humans rather than mental or social characteristics. Nonetheless, he frequently applied results to such traits anyway, assuming that correlations for physical traits would apply equally to other characteristics (Kevles 1995, 18). He even carried over the normal distribution frequently found among measured physical traits to intelligence and "worth," using rankings of individuals and then fitting a normal curve to those rankings (Clayton 2021, 136).
Galton became the progenitor of an establishment in both British mathematical statistics and British eugenics; these programs would be carried on by Karl Pearson, the Eugenics Society, and his endowed chair and laboratory at University College London. In addition, he inspired eugenics work in the United States, including the extensive data collection and analysis conducted by Charles Davenport and his Eugenics Records Office at Cold Spring Harbor (Kevles 1995, 45), Madison Grant's 1916 white supremacist screed The Passing of the Great Race (Okrent 2019, 207-208), and the Galton Society, which, among other activities, advocated for restrictive immigration laws (Okrent 2019, 230-231).

Karl Pearson and the Quantification of Group Characteristics
Karl Pearson carried on the legacy of Francis Galton in many ways: in name, as the first Galton Chair of Eugenics at University College London; in deed, as his biographer and acolyte; and in science, expanding on his notions of regression, correlation, and inheritance. While Pearson's socialist politics were markedly different from Galton's patrician tendencies, they found a common eugenic ground in the imperialist project to improve the so-called British race (MacKenzie 1981, chap. 4).
Pearson's political and scientific views mixed in seeking "a high pitch of internal efficiency [of the nation] by insuring that its numbers are substantially recruited from the better stocks, and kept up to a high pitch of external efficiency by contest, chiefly by way of war with inferior races" (Pearson 1905, 46). The only way to "succeed in modifying the stock" was through struggle and survival of the fittest (Pearson 1905, 21), with a considerable role for state direction of fertility for eugenic purposes (MacKenzie 1981, 85). Along with the eugenic project at home, Pearson advocated colonialism and in some cases destruction of the "lesser races" abroad with explicitly racist language (Clayton 2021, 143-144).
How did Pearson determine these so-called better stocks and inferior races? His notion of a scientific or "objective" path to eugenics ran through quantification and curve-fitting. The collection of mass data on human characteristics was of paramount importance for his work, so that he could fit appropriate statistical distributions to them (Porter 2004, 259-261). He called, for example, for training medical mathematicians and embedding statistical departments within municipal agencies and boards (Pearson 1912, 15). In addition, identifying homogeneous racial groups was important from both a social and statistical perspective (Clayton 2021, 145). In his laboratories, Pearson and his colleagues took this work upon themselves. For example, they examined skull measurements, identifying racial differences but no correlation with intelligence. Finding physical measurements unhelpful for inferring intelligence, he attempted to quantify and measure it and other behavioral characteristics directly (Porter 2004, 263-265). The combination of these strains of Pearson's work (quantification of human characteristics, curve-fitting, the state of the British race, and differences between racial groups) is exemplified in a series of papers he published later in life with his laboratory employee Margaret Moul. The series, entitled "The Problem of Alien Immigration into Great Britain, Illustrated by an examination of Russian and Polish Jewish Children," appeared in parts from 1925 to 1928 in the Annals of Eugenics, a journal founded by Pearson himself (Pearson and Moul 1925a, 1925b, 1927a, 1927b, 1928a, 1928b). Pearson and Moul (1925a, 5) examined a broad array of characteristics of Jewish immigrant children for the purposes of informing immigration policy "from the standpoint of national eugenics."
They collected, analyzed, and presented a wide variety of both physical and mental measurements, sometimes comparing subgroups of the immigrant children and sometimes comparing them to Gentile children.
Part II, on intelligence, most clearly showed Pearson and Moul's fixation on quantification. Using teachers' evaluations of students, they categorized the children by intelligence into seven categories based in part on Pearson's previous work and the 1904 London School Board pioneer survey (Pearson and Moul 1925b, 57-59). They then fit these categories to a presumed normal distribution in which each category represents 100 "mental units," so the standard deviation is approximately 100 units as well (Pearson and Moul 1925b, 59-60). They then drew bell curves for boys and girls on top of the categories, which are reproduced here in Figure 2 (Pearson and Moul 1925b, 59-62). By taking the subjective categories, converting them into a seemingly continuous measure, and plotting normal curves to fit those data, Pearson and Moul added an objective veneer to the results. These curves then appeared quite similar to those of more easily quantifiable characteristics, like height and weight. From this, they concluded that "the intelligence of the Jewish girls [is] much below that of the Jewish boys" and thus, since the Gentile girls and boys performed similarly to one another, the Jewish girls must be "very seriously behind the Gentile girls" (Pearson and Moul 1925b, 125). The quantification of these categories also allowed Pearson and Moul (1925b, 63) to compute correlation coefficients between intelligence and a variety of other factors, which they claimed demonstrated the heritability of intelligence and character and discounted environmental or subjective factors.
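The rhetorical force of this conversion from ordinal ratings to "mental units" can be demonstrated with a minimal sketch; the category counts below are invented for illustration and are not Pearson and Moul's data.

```python
import numpy as np

# Invented counts across seven ordered "intelligence" categories (illustrative).
counts = np.array([5, 20, 60, 110, 65, 25, 7])

# Pearson and Moul's move: treat each ordinal category as spanning 100
# "mental units," producing a seemingly continuous scale.
midpoints = np.arange(len(counts)) * 100.0  # 0, 100, ..., 600 units

# Weighted mean and standard deviation on the manufactured scale.
mean = np.average(midpoints, weights=counts)
sd = np.sqrt(np.average((midpoints - mean) ** 2, weights=counts))

print(f"mean = {mean:.1f} units, sd = {sd:.1f} units")
# A normal curve with these parameters will visually "fit" almost any
# unimodal set of counts, lending an objective veneer to subjective ratings.
```

The normality is supplied by the construction, not discovered in the data: whatever the teachers' ratings, this pipeline yields a mean and standard deviation and a smooth bell curve to draw over the bars.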
For a direct comparison between the Jewish and Gentile children, Pearson and Moul examined the categorized intelligence for both boys and girls. Their Table 144, reproduced here as Figure 3, showed these relative percentages, based on implied similarities between categories across different surveys.
Different studies, schools, and teachers used different categories, which Pearson and Moul related to one another when necessary for numerical or categorical comparisons (see, e.g., Pearson and Moul 1925b, 57). They deduced from this that the Jewish immigrant population is "on the average … somewhat inferior physically and mentally to the native population" (Pearson and Moul 1925b, 126, emphasis in original). This type of analysis, comparing categorical variables across populations, had spurred Pearson's development of methods for the analysis of contingency tables, which are still used today. Their analysis of this data fit with the laboratory's earlier work on determining the "strength of heredity" from contingency tables, which rested on an analogy with the correlation of continuous variables as well as a refutation of environmental or confounding effects (MacKenzie 1981, 168-173; Kevles 1995, 31-32). From these analyses, Pearson and Moul (1925b, 127) concluded that the data did not indicate that "the maintenance and improvement of [the country's] stock … will follow the unrestricted admission of either Jewish or any other type of immigrant."
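Pearson's contingency-table machinery survives essentially unchanged in the chi-squared test of independence taught today. A minimal sketch, with invented counts rather than Pearson and Moul's, shows the computation:

```python
import numpy as np

# Invented 2x3 contingency table: two groups of children cross-classified
# by three ordered rating categories (illustrative counts only).
observed = np.array([[30, 50, 20],
                     [25, 55, 20]])

# Pearson's chi-squared statistic: sum of (O - E)^2 / E over all cells,
# where E is the expected count under independence of rows and columns.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals @ col_totals / observed.sum()

chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(f"chi-squared = {chi2:.3f} on {dof} degrees of freedom")
```

The statistic itself is neutral; as this episode shows, the conclusions depend on how the categories were constructed and which groups were chosen for comparison.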
Pearson and Moul collected a mass of data, categorized it, and analyzed it using Pearson's preferred techniques of curve-fitting and contingency table analysis. This led to political pronouncements that fit his preference for eugenics, although they claimed these were justified by "the cold light of statistical inquiry" (Pearson and Moul 1925a, 8). They reiterated this claim to objectivity repeatedly, stating that "there is no institution more capable of impartial statistical inquiry than the Galton Laboratory" and that, "We firmly believe that we have no political, no religious and no social prejudices" (Pearson and Moul 1925a, 8). These claims, however, are undermined by the convoluted eugenic positions the authors must take when confronted with the data, as mathematician Aubrey Clayton (2021, 146) describes: "it becomes clear from reading these papers that Pearson … had decided ahead of time what conclusion the data should support and then found a way to make it do just that." While his work was motivated by and supported these eugenic ideals, Pearson's philosophy of science and of statistics stated as his goal to measure without theorizing (MacKenzie 1981, 88). Modern students may recognize in such statements, and in Pearson and Moul's protestations of objectivity, similarities with today's data analysts. The exhortations to "follow the data" or "let the numbers tell the story" exemplify a goal of quantification broadly, and statistics in particular, of providing objectivity in contested settings (see, e.g., Gould 1981; Porter 1995; Desrosières 1998; Clayton 2021). However, as Pearson's example shows, this goal rests on several assumptions beyond even technical statistical ones. It requires that those collecting the data can do so objectively and consistently across different settings, which is particularly challenging in categorizing intelligence, where teachers and schools will often inherently use different references for their students. Moreover, it entails a larger claim of independence from social factors and prejudice that cannot hold true for anyone designing a research agenda and shaping the questions asked.
Like Galton, Pearson left a significant legacy in both statistics and eugenics. His relatively well-funded laboratories allowed him to seed the fields of statistics and genetics with his protégés (Kevles 1995, 38-39). His journals Biometrika and Annals of Eugenics allowed him to promote his own lab's methodologies and act as a gatekeeper for British academic research (Kevles 1995, 40). Both journals survive today, although the latter has been renamed the Annals of Human Genetics. The methodologies he designed (contingency tables, correlation coefficients, and tests of the goodness of fit of curves and distributions; Porter 2004, 252-261) last as well, and his goals of objectively quantifying the results of data analysis would be furthered by his eventual rival and partial successor, R. A. Fisher.

R. A. Fisher, Variance, and Heritability
The intersection of British statistics and eugenics culminated in the time and personage of Ronald Aylmer (R. A.) Fisher, who has been called "the single most important figure in 20th century statistics" (Efron 1998, 95) and who maintained the eugenic position long after it had fallen out of favor in Great Britain (Evans 2020). His pioneering work in agricultural statistics, the design of experiments, and evolutionary biology and genetics remains influential in those fields to this day. The Genetical Theory of Natural Selection, his 1930 magnum opus on genetics, particularly demonstrates the connection between his scientific advancements and his eugenic ideas, with several chapters devoted to his preferred eugenic policies (Fisher 1930; see also Aylward 2021). But the connection between his statistical ideas, particularly the analysis of variance (ANOVA), and his eugenics-infused interest in inheritance has important lessons for the modern statistics student.
Fisher's scientific career began with the help of the eugenics movement. As a student, Fisher founded the Cambridge University Eugenics Society and delivered his first paper in that forum (Mazumdar 1992, 98). In it, Fisher noted, "Biometrics then can effect a slow but sure improvement in the mental and physical status of the population" (quoted in MacKenzie 1981, 190). This goal would drive much of his scientific work and political advocacy. Leonard Darwin, leader of the Eugenics Society, son of Charles Darwin, and relative of Francis Galton, was a major benefactor of the society and proponent of Fisher's, helping Fisher publish his first major scientific article (MacKenzie 1981, 186-187). Fisher in turn dedicated Genetical Theory to him (Fisher 1930).
In his 1918 paper entitled "The Correlation between Relatives on the Supposition of Mendelian Inheritance," Fisher (1919) planted his foot firmly in statistical methodology research. In this article, Fisher described his ANOVA method to determine the heritability of human characteristics; this method began with eugenic research and would later permeate his agricultural work on experimental design (MacKenzie 1981, 211). ANOVA became a common data analysis technique in regression models and is still included in statistics textbooks and courses (see, e.g., Kuiper and Sklar 2013; GAISE College Report ASA Revision Committee 2016). The paper also put forward the importance of variance as a statistical quantity, supplementing its more widely used square root, the standard deviation (Tabery 2008, 723).
Using data collected and initially analyzed at the Galton Laboratory by Pearson and Alice Lee, Fisher aimed in this article to explain why the correlation coefficient for height between brothers is 0.54. He interpreted this correlation as meaning that 54% of the variance in height in the population is due to differences in ancestry. Then, he sought to explain the remaining 46% of variability in a way that minimized the environmental contribution (Mazumdar 1992, 109). He did this by calculating the variance of measurements of sibling pairs and "ascrib[ing] to the constituent causes fractions or percentages of the total variance which they together produce" (Fisher 1919, 399). Taking the data given, various estimates of correlations at different levels of relation, and Mendelian genetic principles, Fisher decomposed the remaining variance among individuals of the same ancestry into three genetic-related components. These were "essential genotypes" (i.e., the effect of different dominant inheritance from heterozygous parents), "dominance deviations" (i.e., the irregular behavior of inherited recessive characteristics due to ancestral inheritance), and "homogamy" (i.e., genetic correlations between parents). Interpretations of these factors in somewhat more contemporary terms, as well as more details of Fisher's calculations, are available in the commentary on Fisher's paper by Moran and Smith (1966). Fisher compiled his results into a table, reproduced here as Figure 4.
The table reproduced in Figure 4 is the forerunner of the modern ANOVA tables that students may encounter in a first or second statistics course. The quantities associated with each factor are explicitly written as percentages of the total variance. Notably, Fisher's genetic factors summed to 100%, leaving no room for environmental or social factors other than homogamy. In discussions of similar analyses for other traits, Fisher routinely left little room for environmental factors (Mazumdar 1992, 109).
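The decomposition behind such a table can be sketched with a modern one-way ANOVA on simulated data; the groups and effect sizes below are arbitrary choices for illustration, not Fisher's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated measurements from three arbitrary groups (illustrative only).
groups = [rng.normal(loc=mu, scale=1.0, size=50) for mu in (0.0, 0.5, 1.0)]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Fisher-style decomposition: the total sum of squares splits exactly into
# a between-group component and a within-group (residual) component.
ss_total = ((all_values - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

print(f"between groups: {100 * ss_between / ss_total:.1f}% of total variance")
print(f"within groups:  {100 * ss_within / ss_total:.1f}% of total variance")
```

The two components always sum to 100% by construction; the substantive question, which Fisher answered ideologically, is which factors get to define the groups in the first place.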
In 1924 remarks on this paper and its impact, Fisher further stated that "any environmental influence will tend to lower the correlations" based on an assumption that environment is randomly distributed (Fisher 1924a, 197). He also extended the work at that point to "mental and moral qualities" as categorized in school-based studies, like those used by Pearson and Moul (Fisher 1924a, 198). In this way, Fisher's scientific analyses and ideological views on the relative roles of genetics and environment in population fitness reinforced one another.
Modern students considering this analysis can see many deficiencies in Fisher's argument, reinforcing the limitations of ANOVA approaches. Fisher (1919, 400, 424) himself noted some of these limitations in the article: the assumption of a normally distributed dependent variable, the inability to interpret coefficients as causal relationships due to confounding, and omitted variable bias. Nonetheless, he made quite sweeping claims, including that "it is very unlikely that so much as 5% of the total variance is due to causes not heritable" (Fisher 1919, 424). In particular, Fisher failed to fully grapple with the assumption in his model that environmental effects would be purely additive (i.e., no gene-environment interaction), normally distributed, and uncorrelated among relatives (i.e., gene-environment independence) (Moran and Smith 1966, 49; Tabery 2008). Like Galton before him, Fisher was unable or unwilling to consider that environmental factors in a highly unequal society may substantially explain apparent correlations in outcomes between relatives (Tabery 2008, 728-729). This demonstrates an important lesson for students: choosing the factors included in a model is itself a key statistical step and encodes assumptions, both statistical and social, that can affect the outcome of the analysis.
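The gene-environment independence assumption can be probed with a small simulation; the setup below is a hypothetical illustration, not a model of Fisher's actual data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Simulate sibling pairs whose trait has NO genetic component at all:
# each sibling's outcome = shared family environment + individual noise,
# with equal variances for the two parts.
shared_env = rng.normal(size=n)
sib1 = shared_env + rng.normal(size=n)
sib2 = shared_env + rng.normal(size=n)

r = np.corrcoef(sib1, sib2)[0, 1]
print(f"sibling correlation = {r:.2f}")
# Reading this correlation (about 0.5 here) as heritable variance, as an
# additive model with independent environments invites, would attribute
# half the variance to ancestry even though heredity plays no role:
# environments correlated among relatives masquerade as heritability.
```

Students can vary the share of variance given to the shared environment to see how easily a purely environmental mechanism reproduces the familial correlations that Fisher attributed to genetics.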
Fisher expanded upon this work in his ongoing development of statistical methodology and genetics. Soon after this paper, Fisher began work at the Rothamsted Agricultural Research Station, where he would develop lasting work on statistical methodology, experimental design, and crop genetics. He was able to exercise more control over the environment in this setting, obviating the need for some of the eugenics-inflected assumptions he made when dealing with human population data (Mazumdar 1992, 124). Through this work, he began to realize a greater role for the variance of environmental factors and for environment-gene interactions (Tabery 2008, 725-726).
Throughout his career, however, Fisher remained committed to the eugenics cause and to developing research to support it. He stated that his ANOVA work and related regression coefficients calculated on the biometric data "shows us immediately what will be the effect of selection, in modifying the population" (Fisher 1924a, 196). He put this into practice on the Research Committee of the Eugenics Society. In 1924, building on the Society's work and his own calculations of the heritability of what was known as "feeble-mindedness" or "mental defect," he published an article entitled "The Elimination of Mental Defect" (Fisher 1924b). In the article, which the Society reproduced and disseminated, Fisher (1924b) argued forcefully for eugenics laws that would promote fertility among the upper classes rather than the lower classes. He still saw social classes in largely biological terms (Mazumdar 1992, 128-129), and he saw a role for mathematical calculation in informing social policy. Proponents of state-directed eugenics often cited his assessment that the incidence of "mental defect" could be reduced by 36% in one generation of sexual segregation of those deemed inferior (Mazumdar 1992, 130).
Fisher similarly wrote and analyzed data in support of voluntary eugenic sterilization of the "feeble-minded" (Bodmer et al. 2021, 569-571), although there were plenty of opponents warning that it would lead quickly to coerced sterilization and biased application of the law (Kevles 1995, 167-169). Through his work on segregation and sterilization, he became close to several Nazi statisticians and geneticists, even writing in defense of one after the war and after the horrors of the Holocaust had come to light (Bodmer et al. 2021, 571-572). He also opposed language in a prominent UNESCO statement on race science that suggested that there were no characteristic racial differences in intelligence and capacity (Bodmer et al. 2021, 572).
Like those of Galton and Pearson before him, Fisher's statistical innovations are seen, and debated, in many parts of the discipline today. Despite their early applications, they have become key statistical tools. Fisher's methodologies cannot be separated from their history, however, as their development and role in quantifying the goals of the eugenics movement demonstrate that scientific facts and methods "must always be interpreted in light of social assumptions and goals" (Paul 1998, 114). Indeed, Fisher's later reconsideration of the importance of nonadditive variances demonstrates to modern readers the importance of considering the assumptions built into models and statistical techniques; independence is both a statistical assumption necessary for some analyses and a statement about the population that may reflect preconceived notions or social biases.

The Opposition to Eugenics
While these three major figures in the history of statistics shaped their ideas in the crucible of eugenics, others were not so bullish on the prospect. Many individuals, both inside and outside academic fields, challenged eugenic assumptions, the science that supposedly supported them, and their implementation in the world.
Various geneticists, including Thomas Hunt Morgan, Hermann Muller, Julian Huxley, and J.B.S. Haldane, attacked what they saw as the inherent class and race bias in the eugenics movement in the 1920s and 1930s. Morgan noted that "material advantages and disadvantages" as well as "traditions, customs, …and prejudices" could confound studies, like Galton's familial correlation studies, purporting to show racial differences and superiority (quoted in Paul 1998, 116). Similarly, Muller argued that economic differences could obscure genetic differences (Paul 1998, 117). Political and scientific disputes led these scientists either to reject eugenics outright or, as in the case of Huxley and Haldane, to propose a more reform-minded eugenics research program that required biological and statistical methods that would resolve the issue of environmental confounding (Aylward 2021; Kevles 1995, 124; Mazumdar 1992, 183). These ideas presaged modern directions of human genetics research and planning, contrasting with Fisher's right-wing, class-focused eugenics (Paul 1998, 119).
Critics also attacked the data that were used to make eugenic claims. Columbia University anthropologist Franz Boas criticized the data used to justify immigration restrictions in the United States, compiling his own, far more comprehensive data to refute the claims of genetic determinism (Okrent 2019, 156). Along with psychologist Otto Klineberg and others, he also criticized the culturally and socially biased intelligence tests that were used to justify eugenic policies in the early 20th century (Kevles 1995, 134-138). Boas (1916, 473) wrote that "many of the data on which the theory of eugenics is based are unsatisfactory." Eventually, the Carnegie Institution, which had for decades funded the Eugenics Record Office, the leading American eugenics data collection effort, would deem the Office's data on family histories and mental characteristics "unsatisfactory for the study of human genetics" and "thoroughly unscientific" because of its biased and arbitrary data collection and compilation procedures (Okrent 2019, 375-376).
Lancelot Hogben, a biologist and mathematician at the London School of Economics, exemplifies several strains of this opposition, explicitly attacking the politics, data, and methodology of his contemporary Fisher and of the earlier eugenicists. He wrote of the "explicit social bias" that dominated British eugenics (quoted in Mazumdar 1992, 149). As early as 1919, Hogben (1919, 152) was writing that "the dynamic factor in social development must be pre-eminently emphasised in environment," in stark contrast to Fisher's paper of the prior year. Hogben and Fisher traded correspondence over the roles of both genetic-environment correlation and interaction (Tabery 2008, 737-740), and Hogben worked throughout the 1920s and early 1930s on incorporating the varied effects of environment on development into genetics research (Erlingsson 2016, 518-519).
In 1933, Hogben published "The Limits of Applicability of Correlation Technique in Human Genetics," squarely aimed at the methods and conclusions of Fisher and his fellow biometricians. Conceding that these studies could show some genetic component to various human characteristics, Hogben (1933b, 383) rejected the notion of quantifying that contribution by ANOVA, noting that "the ambiguity of the concept of causation completely obscured the basic relativity of nature and nurture." Hogben (1933b, 393) exposed the "limitations of statistical technique" in explaining differences in highly heterogeneous human societies, noting the unacknowledged deficiencies of social statistics as compared to experimental designs in the laboratory. The variance and its components, Hogben explained, depend on the extent of variation in the environment and on how similar or different the environment is for close relations. He expounded on this idea in books aimed at both practicing researchers and the general public (Hogben 1931a, 1931b, 1933a). This was at once a statistical statement and a political statement on the importance of improving living standards for the lower classes.
Moreover, Hogben seemed to challenge the entire notion of quantifying the fitness of an individual. In a 1927 textbook, he asked: "What is the good of the race? What is a desirable social quality? What is a 'morally and mentally fit' person?" (quoted in Mazumdar 1992, 155). These questions were rhetorical; Hogben (1919, 155) had already answered that "no single individual or group of individuals is qualified to decide upon what constitutes a desirable type of human being." His ideas had been shaped both through his rejection of apartheid during a professorship in South Africa and through his socialist politics and interactions with Soviet scientists (Werskey 1979, 105-109).
Decades later, geneticists and scientists confronted this question again, as studies of race and intelligence created controversy. American geneticist Norman Horowitz questioned the whole enterprise, remarking that such group-level "statistical considerations" have "nothing to do with the value of an individual human being" and that "in any case, the value of a person rests on much more than his IQ alone" (quoted in Kevles 1995, 283). Stephen Jay Gould's 1981 The Mismeasure of Man surveyed race science and concluded that it rests on a core fallacy of quantification: "the abstraction of intelligence as a single entity, …its quantification as one number for each individual, and the use of these numbers to rank people in a single series of worthiness," and the use of that quantification to ascribe inferiority to "disadvantaged groups" (Gould 1981, 24-25).
The opposition to eugenics, both contemporary with and subsequent to Galton, Pearson, and Fisher, demonstrates the contested nature of science and its role in society. Methodological disagreements, especially in the early development of those methods, are key to the progress of science and to understanding the assumptions that frequently limit the value of statistical techniques. Disagreements over data validity are core to the statistical process and to acknowledging the limits of quantification. And the larger discussion of the historically contingent and socially embedded nature of science continues today, along with discussion of the appropriate role of scientific knowledge, and of scientists themselves, in political policy-making.

The Value of Incorporating Eugenics History in the Classroom
Incorporating these, or other, examples of the historical connections between statistics and eugenics provides a wider context for course concepts and can engage students on broader questions about the role and practice of statistics. On a first level, the examples described here can exemplify methods taught in undergraduate courses and common mistakes in analyzing and interpreting data. Galton's correlational analysis shows the dangers of confounding and the distinction between correlation and causation. Pearson's quantification of subjective categories and use of contingency tables show the way statistics can provide a cover of rational objectivity to challenging social questions.
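The confounding danger lends itself to a short in-class simulation: two traits that share an environmental cause but have no direct link will nonetheless appear strongly correlated. This is a hypothetical sketch, not Galton's data; all variable names and parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A shared "environment" drives both traits; neither trait affects the other.
env = rng.normal(size=n)
x = env + rng.normal(size=n)
y = env + rng.normal(size=n)

# Marginal correlation is substantial (0.5 in theory) despite no causal link.
r_marginal = np.corrcoef(x, y)[0, 1]

# Removing the shared cause leaves essentially uncorrelated residuals.
r_adjusted = np.corrcoef(x - env, y - env)[0, 1]
```

Students see that the familial correlations Galton took as evidence of heredity could equally reflect the "material advantages and disadvantages" Morgan pointed to, unless the shared environment is measured and accounted for.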
Fisher's analysis of variance shows the difficulties of modeling, especially the role of omitted factors and interactions, as well as the (at times) oversimplified assumptions made by that and other regression methods. These examples can be used in a variety of courses, including introductory statistics, data science, or probability, or in a second course in statistics focusing on regression and modeling. Table 1 shows several course topics where these examples may fit naturally into course content.
As recommended in curriculum guidelines, these examples emphasize "real-world data and substantive applications" and show the importance of "understand[ing] issues of design, confounding, and bias" (American Statistical Association Undergraduate Guidelines Workgroup 2014). Correlation, two-way tables, and ANOVA all feature within introductory or second statistics courses, and the importance of their assumptions and the ethical implications of the analyses are highlighted here, fitting Goals 7 and 9 of the GAISE Report (GAISE College Report ASA Revision Committee 2016, 11). Correlation and the role of variance and its non-additivity under dependence arise in probability courses as well, allowing these examples to fit into those curricula.
Second, these examples show how contingent data and methods are on the time, place, and method of collection. They all demonstrate that researchers' questions of interest shape the data that are collected, which are in turn informed by social norms. Especially in the context of data science education, discussions of the data collection process are crucial (American Statistical Association Undergraduate Guidelines Workgroup 2014). The social context of, for example, Galton's "genius" designation may be clear to modern students, encouraging them to think more critically about the data and variables they are using in class. Similarly, Pearson's reliance on teachers' assessments demonstrates how the data collection process can be shaped by social prejudices.
This can lead to the overarching question, raised by opponents of eugenics for over a century: what can and should be measured? Daniel Kevles quotes Galton as frequently saying: "wherever you can, count" (Kevles 1995, 7). Pearson saw statistics as a means of measuring without theorizing, of putting forward theory-free descriptions of the world (MacKenzie 1981). Later generations of natural and social scientists, however, would challenge this "Trust in Numbers," as the title of historian Theodore Porter's (1995) book puts it. These anecdotes provide students with examples of when "trusting the data" can lead one to alarming conclusions. Indeed, as Alain Desrosières (1998, 1) notes in The Politics of Large Numbers, the "measurements, which are reference points" in social debates, "are also subject to debate themselves." Becoming a responsible user of statistics and data, as highlighted in Goal 9 of the GAISE College Report (2016, 11), must certainly include an understanding of what cannot reasonably be measured or quantified. Bringing these ethical questions into statistics and probability courses helps integrate them into the entire curriculum for statistics and data science majors, as well as non-majors (Baumer et al. 2022).
Moreover, by discussing the individuals behind the names, and their beliefs, these lessons can humanize the field and reduce the likelihood of an unquestioned faith in the results presented by statisticians and others. As in other mathematical fields, "using historical problems humanizes mathematics" and statistics by showing that "such problems are not created in a vacuum" (Liu 2003, 419). Encouraging this critical mindset is key to being an informed consumer of data and provides a richer engagement with a dynamic and ever-changing discipline. It can even allow students to see themselves more as active participants in the field than as passive recipients of established knowledge. Pearson himself saw the value of teaching the historical context of statistics, believing that understanding the birth of these methods would inspire students' imaginations (Porter 2004, 252-253).
In a liberal arts education, connections across disciplines provide crucial learning opportunities for students. These lessons inherently connect statistics education to the life sciences as well as to history, sociology, anthropology, and other fields. My own students found these connections valuable, noting how the material allowed them to relate to what they had learned in psychology and sociology courses. Specifically, the discussion of eugenics, and of these individuals in particular, arises in genetics and biology courses, providing more opportunity for cross-curricular engagement. In addition to his role in the history of statistics, Fisher is a prominent figure in the history of genetics and of biological and agricultural research. These connections allow statisticians to learn from the ways difficult histories have been addressed in various disciplines, including eugenics specifically in biology (see, e.g., Markstein and Davis 2020) and colonialism and imperialism in geology (see, e.g., Cartier 2021; Rogers et al. 2022).
This critical conversation around histories of discrimination and marginalization is a multi-disciplinary one, as exemplified in recent Nature editorials on the past and present of scientific research (Editorial 2022; Nobles et al. 2022). This cross-curricular engagement demonstrates that statistics, and mathematics more broadly, cannot stand apart from society, but exists within its social context. This also provides an opportunity for "cross-curricular work with other teachers or subjects" (Fauvel 1991, 4). As one focus group participant in the University College London inquiry into the history of eugenics at that institution put it: "Maths is not separate from politics. Maths is not an isolated objective little bubble" (Solanke et al. 2020, 30). The earlier students can grapple with this fact, the better stewards (or, when necessary, iconoclasts) of the discipline they are likely to be.
Finally, these historical examples have disturbing contemporary parallels. It is crucial to teach the ethical practice of statistics and data science alongside the technical methods (Elliott, Stokes, and Cao 2018; Tractenberg 2019), especially in introductory courses (Raman et al. 2023), and there are tools and case studies available to do so (see, e.g., Baumer et al. 2022; Tractenberg 2022). Historical examples can connect with present examples to reinforce the longstanding ethical challenges in the discipline. Moreover, specific issues with the analyses of Galton, Pearson, and Fisher may resonate with modern issues of algorithmic bias and other ethically dubious statistical practices. As one example, predictive algorithms used in policing and sentencing can suffer from biases due to the input data (who is likely to be arrested, and thus included in recidivism statistics, in an unequal society?) and confounded correlations (see, e.g., Angwin et al. 2016). The specific issues of quantifying intelligence, categorizing racial groups, and identifying genetic differences continue to arise in science and politics, and understanding this past has a direct bearing on how we interpret these arguments today (see, e.g., Gould 1981; Rutherford 2020, 156-161). These connect to contemporary political discussion over addressing the history of eugenic policies (Villarosa 2022) and eugenic-inflected ideas promoted in places like Silicon Valley (Dodds 2023). Further examples can be found in the references on contemporary issues of bias in statistics and data science in Appendix 1. Discussing challenging historical topics can equip students to face modern challenges.
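The input-data problem can itself be made concrete with a stylized simulation (invented numbers, not a model of any real policing system or dataset): two groups with identical underlying offense rates appear very different in arrest records when enforcement intensity differs.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

group = rng.integers(0, 2, size=n)      # two roughly equal-sized groups
offense = rng.random(n) < 0.10          # identical true rate: 10% for everyone

# Unequal enforcement: group 0 is policed three times as heavily as group 1.
detection_prob = np.where(group == 0, 0.60, 0.20)
arrested = offense & (rng.random(n) < detection_prob)

arrest_rate_0 = arrested[group == 0].mean()  # close to 0.10 * 0.60 = 0.06
arrest_rate_1 = arrested[group == 1].mean()  # close to 0.10 * 0.20 = 0.02
```

An algorithm trained on `arrested` as its outcome would "learn" a threefold difference between the groups that exists only in the measurement process, a modern analogue of the biased data collection that Boas and other critics of eugenics exposed a century ago.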

Challenges and Limitations in a Statistics Course
At the same time, a statistics instructor will face challenges and limitations in integrating these case studies into a standard statistics course. Studies of incorporating history into mathematics education have found both learning benefits and obstacles (Bütüner 2015). Perhaps the largest such challenge is time: doing justice to these examples and initiating a broader conversation about the role of statistics is a time-consuming endeavor. In addition, a fair treatment of the history requires building up historical context, possibly through additional readings for the students. Honest conversations require a safe and open environment in the classroom, and building that environment likewise takes time throughout the semester. Without adequate foundation and discussion, the material can easily feel disconnected from the rest of the course.
Indeed, although I shortened or excised some topics to make room for these discussions in my probability course, some students still felt that the material was disconnected or did not see its role in the course. It must be used in the classroom "as part of the lesson plans, not as an 'extra' activity" (Liu 2003, 419). Fitting these discussions into the methodological topics to which they relate may be more challenging from a scheduling perspective, but it makes the lessons feel essential to statistics learning rather than optional. This also provides opportunities to use these episodes as the key learning examples for new methods, showing that the use of statistics is not always desirable or done well.
In addition, statistics instructors generally will not be historians or sociologists of science with an existing background in the research techniques and frameworks of those fields. This can make it an especially daunting task to introduce such material in the classroom (Fauvel 1991, 4). Eugenics alone is a vast topic, encompassing myriad social movements, individuals, and organizations with a variety of scientific and political viewpoints (see, e.g., Kevles 1995). And the precise interactions of eugenics and statistics, which changed over time as well as by location and individual, are still contested by historians and sociologists of science (Louçã 2009). Short examples cannot even capture the complex and varied careers of the statisticians described here. To name only a few examples, Galton as a proponent of evolution, Pearson as a philosopher of science, Fisher as the father of experimental design, and Hogben as a popularizer of mathematics are all given short shrift. Thus, it is prudent to be wary of oversimplification.
With discussion of sensitive topics such as race, class, gender, and colonialism, the risks of misstatements or oversimplifications are even higher. Being willing to tell students honestly when questions go beyond one's areas of expertise and experience, and referring them to appropriate resources, is crucial to meeting this challenge. I have provided some resources here (see Appendix 1), and there are other reference lists available elsewhere (e.g., Tyner and Rice 2021) for students interested in learning more about the context of these examples, but it is also worthwhile to connect students with courses or faculty in other departments that may cover these time periods or themes in more detail. I have found students very responsive to my acknowledging my own limitations in teaching these topics. Inviting students into a broader education and consideration of these topics demonstrates that, even for those of us teaching and researching in the field, there are no easy answers about how to reckon with its past.
Finally, at a time when many instructors are actively trying to be more inclusive in statistics and data science education (see, e.g., Witmer 2021), it may seem discordant to center white male statisticians from over a century ago. Focusing on the harm done by statisticians to marginalized groups could have a demoralizing effect on students from those groups. I believe, however, that this also provides an opportunity to show the importance of diverse perspectives in the discipline. Margaret Moul, the one exception among the figures in these examples, provides an interesting venue for discussing the historical role of women in statistics as well: Pearson hired many women at the Galton Laboratory, although frequently into lower-status positions and at lower salaries than comparably qualified men (see Kevles 1995, 220; Paul 1998, 54). White middle- and upper-class women indeed played a large role throughout eugenics movements, both popularly and academically, in part because eugenics was seen as a field of science better suited to them (see, e.g., Kevles 1995, 55; Paul 1998, 54-57).
Modern examples of statisticians doing important work to address disparities, as well as of the more diverse (although still far from perfect) field as it stands today, can complement these topics. For example, a recent issue of Significance magazine highlighted historical and contemporary women in statistics (see, e.g., Lorenzo-Arribas and Alba 2022, 5), and a recent issue of AMSTAT News, the magazine of the American Statistical Association, highlighted David Blackwell alongside other Black statisticians (Hughes-Oliver 2020; Murphy 2020). In addition, instructors can discuss multicultural models of science and ethical traditions (Elliott, Stokes, and Cao 2018). Learning from challenging historical episodes as well as from positive examples can inspire hope for a better future.

Conclusion
The names of Galton, Pearson, and Fisher may come down from buildings and awards, but they will never be removed from the history of statistics. The discipline's foundational methodology is intimately bound up with the history of eugenics and its alarming political and social prescriptions. As sociologist Donald A. MacKenzie (1981, 11) put it: "The eugenic objectives of Galton, Pearson and Fisher were closely connected to their science." Those ties continue to this day, as we reckon with the implications of algorithms and data analyses that display the potential for "data-driven" bias. In the classroom, we cannot erase this history, but we can contextualize it and begin a discussion of its modern ramifications. This engagement in statistics and data science courses can improve students' connections to the technical material covered. More importantly, it can encourage them to consider broader questions about the role of data in society: what is it, and what should it be?

Supplementary Materials
A list of suggested references regarding the history of statistics and eugenics, as well as contemporary issues of bias in statistics and data science, can be found in Appendix 1.

Figure 2. Diagram 18 from Karl Pearson and Margaret Moul, "The Problem of Alien Immigration into Great Britain, Illustrated by an Examination of Russian and Polish Jewish Children" (1925b, 62), reproduced with permission of John Wiley & Sons.

Figure 3. Table CXLIV from Karl Pearson and Margaret Moul, "The Problem of Alien Immigration into Great Britain, Illustrated by an Examination of Russian and Polish Jewish Children" (1925b, 126), reproduced with permission of John Wiley & Sons.

Figure 4. Table from R. A. Fisher, "The Correlation between Relatives on the Supposition of Mendelian Inheritance" (1918, 424), reproduced with permission of Cambridge University Press.