Race, Sex, and their Influences on Introductory Statistics Education

ABSTRACT The Survey of Attitudes Toward Statistics (SATS) was administered for three consecutive years to students in an Introductory Statistics course at Cornell University. Questions requesting demographic information and expected final course grade were added. Responses were analyzed to investigate possible differences between sexes and racial/ethnic groups. The findings showed that female students had significantly lower average scores than their male counterparts in affect, cognitive competency, and subject difficulty. In addition, they expected lower average final course grades. When expected and achieved grades were compared, both male and female students overestimated their final scores, but female students did so to a lesser extent. No differences in attitudinal scores or grade expectations were found between racial/ethnic groups. However, significant differences between racial groups were found when comparing students' expected and actual grades. Asian students outperformed the other groups in both meeting their personal expectations and achieving significantly higher final grades. Latino and Black students had outcomes well below their expectations. These results suggest that educators should focus on differences between sexes when planning ways to improve students' confidence in their quantitative ability. They should also consider implementing strategies for minority students to achieve their expected final course grade.


Introduction
While there have been increases in the representation of women and minorities in science, technology, engineering, and mathematics (STEM), these groups are still largely marginalized in quantitative fields. This underrepresentation has often been attributed to classroom environments in which women and minorities do not feel comfortable enough to actively participate in class (Maher and Thompson 2001). These groups subsequently do not have confidence in their quantitative abilities and choose not to pursue careers in a STEM field. The differences between male and female students in attitudes toward mathematics widen as students age (Hyde et al. 2006). By the time students reach university, there is a large confidence gap between male and female students in their ability to succeed in quantitative courses. The purpose of this study is to investigate the role that sex and race play in attitudes toward learning statistics, and in student performance expectations. It is critically important to understand these differences in order to determine the best ways to intervene, so all students feel equally comfortable and are able to meet their personal expectations in statistics classrooms.
The attitudinal effect on student success in statistics courses has been studied in a number of variations. In a meta-analysis of 17 studies, Ramirez, Schau, and Emmioğlu (2012) documented that in 15 cases, the researchers found statistically significant relationships between selected attitude components (especially affect, cognitive competence, and value) and academic achievement. A comprehensive review of educational research into the teaching and learning of statistics (Zieffler et al. 2008) discussed a number of studies showing the impact of various noncognitive factors on student success. Garfield et al. (2002) also stressed that promoting a positive student attitude about statistics is one of the three major final outcomes, along with learning and persistence, which educators should strive for in their introductory courses. A positive attitude can encourage students to take upper-level courses, as well as make them more likely to use statistical tools in their professional life.
There have been a number of survey instruments proposed to measure student attitudes toward statistics. The Statistics Attitude Survey (SAS) by Roberts and Bilderback (1980), Attitude Toward Statistics (ATS) from Wise (1985), and Survey of Attitudes Toward Statistics (SATS) developed by Schau et al. (1995) are the three most commonly used in collegiate courses. SATS has been widely studied and is known to give high values of internal consistency as measured by Cronbach's alpha. SATS has been used to assess attitudinal factors in a variety of classroom situations. These include evaluating the use of a Massive Open Online Course (MOOC) in a blended classroom (Gibbs and Tayback 2014) and comparing traditional to flipped course models (White 2014). SATS has also been used to measure the attitudinal impact of active learning techniques (Carlson and Winquist 2011), student-designed data-collection methods (Carnell 2008), web-augmented and fully online courses (Gundlach et al. 2015), and simulation-based curricula (Chance, Wong, and Tintle 2016). SATS-28 is used in this study along with demographic and grade expectation questions.
Two of the key demographic variables studied here are gender and race/ethnicity. Ramirez, Schau, and Emmioğlu (2012) reviewed eight studies done in the United States that specifically investigated the influence of gender on attitudinal measures in statistics courses, and they did not find any differences (on average) between the male and female attitudes in any of the SATS components. Interestingly though, nine studies outside of the United States all reported gender differences (Tempelaar, Gijselaers, and Schim van der Loeff 2006; Tempelaar and Nijhuis 2007; Mahmud and Zainol 2008; Verhoeven 2009, 2011; Coetzee and van der Merwe 2010; Bechrakis, Gialamas, and Barkatsas 2011). In each case, the male students displayed higher positive scores in all four attitudinal categories. The effect of gender on achievement in introductory statistics classes has been reviewed with variable results (Scheaffer and Stasny 2004; Alldredge and Brown 2006; Haley, Johnson, and Kuennen 2007). While a study of gender differences in introductory business statistics showed that females tend to achieve higher scores than male students (Johnson and Kuennen 2006), and that students taught by a professor of the same gender do significantly better, these results do not address the attitudinal or confidence differences between genders.
There has been less research about racial differences in terms of attitudes toward statistics. As Wagler and Lesser (2011) noted, "Students from diverse cultural or language backgrounds may not always respond in the same way to traditional statistics classroom instruction." But the focus has been largely centered on the impact of language, specifically English as a second language (Ware 2004; Lesser and Winsor 2009). Verhoeven and Tempelaar (2014) focused on cultural diversity in statistics education using Hofstede's research on cultural dimensions (Hofstede 1986). They observed patterns among different cultural regions. For example, Dutch and Scandinavian students demonstrated high scores in affect, while Asian students scored lower in general on the other attitude measures. Their final conclusion stated that "strong patterns become visible that deserve our undivided research attention in the near future." Mvududu (2003) examined differences in attitudes of students from samples of American and Zimbabwean students, and also noted that "cross-cultural comparisons have the potential to generate new insights into statistical pedagogy and the role noncognitive sociocultural variables play in teaching statistics to college-age students." While these studies provide insights into cultural differences from a nationality standpoint, less work has been done regarding the influence of gender and race on success in introductory statistics classes among U.S. students.
In addition to comparing demographic impact, it is crucial to consider what instructors can do to improve students' attitudes toward statistics and their ability to demonstrate subject knowledge. A study on psychology students in a graduate statistics class suggested that the focus should be on their attitude toward cognitive competence (Dempster and McCorry 2009). Others focus on the demographic impact. Walton and Cohen (2007) studied the impact of 1-hr sessions aimed at helping African American students improve their sense of social belonging. These sessions resulted in increased GPAs and reduced the African American-White GPA gap by half. Cohen et al. (2009) aimed to curb stereotype threat by having college physics students write down values that were personally important to them. A second group was kept as a control and asked to list attributes that were generally important, but not specifically a priority for them. As a result of the intervention, the GPAs of female students in the treatment group increased by 0.33 points. Both of these interventions were designed to make students more comfortable in the classroom and increase their confidence, thus showing the influence of simple interventions.
These studies show that social-psychological interventions have the potential to make powerful impacts in the classroom. Understanding students' differences in perceptions toward statistics, along with the influence of gender and racial diversity, will allow educators to develop targeted interventions and teaching strategies that improve students' attitudes toward the subject.

Setting and Survey Participants
Students enrolled in the introductory statistics class offered by Cornell University's Dyson School of Applied Economics and Management participated in this study. The course consists of three 50-min lectures given by a faculty member and a 2-hr discussion section led by experienced undergraduate teaching assistants. The instructor is a female with a Ph.D. in Statistics and over 25 years of experience teaching introductory classes. All students enrolled in the course during the Fall 2012, 2013, and 2014 semesters were given the survey through an online course management system on the third day of class. It should be noted that in Fall 2012, the class was offered at an earlier time (9:05 instead of 1:25) and in a different building during a transition from a Spring to a Fall class. Because the class had also been offered the previous term, Fall 2012 had a smaller enrollment of 176 students, compared to the typical class size of approximately 240 students.
Combining all three classes, a total of 611 students filled out the survey, yielding a 94% response rate. Students were from over 15 different disciplines, with the largest number being Business Management majors (33%). The students were primarily freshmen, 20%, and sophomores, 61%. Due to the large number of freshmen in the sample, we did not ask for their current GPA on the survey. Table 1 contains demographic information for the students that participated in the survey. The gender balance was approximately equal with 51.1% female students and 48.9% male students. A disproportionate percent of the class, 49.3%, identified as White. The next largest racial group was Asian at 19.8%. The "Other" category includes students that self-identified as Pacific Islander, Native American, and multi-racial.

Survey Instrument
The survey consisted of questions from the SATS-28 as well as nine questions added for the purpose of this study. The attitudinal questions were divided into four dimensions for scoring: affect, cognitive competence, value, and difficulty. Questions in the affect category aim to measure how students feel about statistics. The cognitive competence category measures how students perceive their own intellectual abilities and how well they think these abilities apply to statistics. The value category gauges their view on the utility of statistics education. The difficulty category measures student perception of the complexity of the subject.
Questions addressing the different attitude categories were varied, and responses were on a 1 (strongly disagree) to 7 (strongly agree) Likert-scale. Some of the items are in the affirmative and some are negative. The responses on the negatively worded questions were reversed (e.g., 7 was coded as 1) before the subscale total scores were computed. Higher cumulative scores in all of these categories indicated a more positive attitude. Additional questions requesting demographic information and the final grade expected in the course were included. The full list of questions can be found in Appendix 1 (available in the online supplementary files).
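As a minimal sketch of the reverse-coding step described above, the following Python snippet recodes negatively worded items on the 7-point scale and sums a subscale. The item labels and responses are hypothetical and are not taken from the SATS-28 or from the study data.

```python
# Reverse-code negatively worded Likert items, then total a subscale.
# Item names and responses below are hypothetical illustrations only.
SCALE_MIN, SCALE_MAX = 1, 7


def reverse_code(response):
    """Map 7 -> 1, 6 -> 2, ..., 1 -> 7 on the 1-to-7 scale."""
    return SCALE_MIN + SCALE_MAX - response


def subscale_total(responses, negative_items):
    """Sum one subscale, reversing only the negatively worded items."""
    return sum(
        reverse_code(value) if item in negative_items else value
        for item, value in responses.items()
    )


# One hypothetical student's answers to three hypothetical affect items.
affect_responses = {"item_a": 6, "item_b": 2, "item_c": 5}
affect_negative = {"item_b"}  # assumed to be negatively worded

total = subscale_total(affect_responses, affect_negative)  # 6 + 6 + 5
print(total)
```

After recoding, a higher total always points in the same direction (a more positive attitude), which is what allows the items to be summed.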
One change was made to the survey over the course of this study. In the 2014 survey, students were asked to identify themselves. This enabled a comparison between the students' expected and actual final grades.

Analysis of SATS Questions
The statistical analyses were conducted using IBM SPSS Statistics 24. A MANOVA model using Wilks' lambda was executed using the four subscore dimensions (affect, cognitive competency, value, and difficulty) as the dependent variables. Sex, race, and year were the independent factors. The response for expected grade, measured on a 0.0 to 4.3 scale, was included as a covariate. Initial screening showed that the data satisfied the basic assumptions. That is, the total subscores for the attributes approximate an interval level scale of measurement, Kolmogorov-Smirnov tests showed approximate normality in all cases (p > 0.05), the observations are independent, and there is homogeneity of variance. The results in Table 2 show significant differences between the male and female students, and among the three years. The expected grade covariate was significant.
Further analyses using F tests of between-subject effects (Appendix 2) indicated significant gender differences at the 5% level of significance for every dimension category except the value of learning statistics (p = 0.49). Table 3 shows that males have higher average scores in all SATS categories except value. Female students thus find value in statistical education, but feel less confident in their cognitive ability and are more insecure about taking the class. There was no interaction between gender and the year of the course.
The results indicate that minority and White students are similar in their attitudes toward statistics, and there was no interaction between race and sex or the year of the study.
The F tests of between-subject effects for the years 2012, 2013, and 2014 showed significant differences for all dimension subscores except difficulty (p = 0.612). Bonferroni 95% pairwise comparisons showed that for all three significant subscores, students in the Fall of 2012 had lower mean values than those of 2013 and 2014. The latter two years did not show any mean differences. This could be explained by the fact that the 2012 course was a transitional year. The course was offered at what students consider a very early time period, was located in a different part of campus, and was primarily composed of freshmen and non-Business Management majors.
To understand the impact of the expected grade, correlations were computed with each of the subscore dimensions. Using the Pearson Coefficient and a t-test with 609 degrees of freedom, all were found to be significant at p < 0.001. The highest correlations were with affect (0.339) and cognitive competency (0.324). The lowest correlation was between expected grade and value (0.127), while the correlation with difficulty (0.174) was second smallest. This seems to imply that the more students thought they would do well, the more they had a positive attitude toward the material and their capabilities.
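The test statistic behind these correlation tests can be sketched as follows. This is an illustration under stated assumptions rather than the SPSS procedure used in the study: it applies the standard identity t = r * sqrt(df / (1 - r^2)) to the reported correlations, with df = 609. The function and variable names are chosen here for illustration.

```python
from math import sqrt


def t_from_r(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * sqrt(df / (1.0 - r * r))


DF = 609  # degrees of freedom reported in the text (n - 2 with n = 611)

# Correlations of expected grade with each subscore, as reported above.
reported_r = {
    "affect": 0.339,
    "cognitive competency": 0.324,
    "difficulty": 0.174,
    "value": 0.127,
}

for dimension, r in reported_r.items():
    print(f"{dimension}: r = {r:.3f}, t = {t_from_r(r, DF):.2f}")
```

Each t value would then be compared with a t distribution on 609 degrees of freedom; the resulting p-value depends on details, such as whether the test is one- or two-sided, that the text does not state.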

Analysis of Expected and Actual Grades
The expected grades were further analyzed to investigate if there are gender or racial/ethnic differences. Note again that students were asked this question before any assignments or tests were given. The majority of male students expected to receive an A or A+ while the majority of female students expected to receive an A or A- (Figure 1). The expected grades were converted to a GPA scale and an F test was conducted to look at the factors of sex, race, year, and possible interactions. The results shown in Appendix 3 indicate that at the 5% level of significance, both sex and race are significant, while year and interactions are not. The average grade male students expected to receive was 3.95 (± 0.03), while the average grade female students expected was 3.87 (± 0.04).
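The conversion of letter grades to a GPA scale can be sketched as below. The text states only that grades were measured on a 0.0 to 4.3 scale; the specific point value assigned to each letter grade here is an assumption (a common plus/minus convention), and the list of expected grades is hypothetical.

```python
from statistics import mean

# Assumed plus/minus mapping onto a 0.0-4.3 scale; the study does not
# list the exact point values it used.
GRADE_POINTS = {
    "A+": 4.3, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}


def to_gpa(letter):
    """Convert a letter grade to grade points under the assumed mapping."""
    return GRADE_POINTS[letter.strip().upper()]


# Hypothetical expected grades for three students.
expected_letters = ["A+", "A", "A-"]
expected_points = [to_gpa(g) for g in expected_letters]
print(mean(expected_points))
```

Under this assumed mapping, adjacent grade levels differ by 0.3 or 0.4 points, which gives a sense of scale for the group means reported above.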
A Bonferroni pairwise comparison test was used to analyze the differences between the racial/ethnic groups. The mean grade expected by Asian students (4.02 ± 0.03) was significantly higher than the means of all the other races. There were no other significant differences between the mean expected final grades. The conclusion here is that male students have more confidence in their ability to earn a high grade, with Asian students overall having the highest personal expectations.
Student names were attached to the 2014 survey, allowing for comparison of anticipated and achieved final grades. There were 229 respondents, 57% female and 43% male. Appendix 4 shows the gender and racial breakdown of the course that term. Appendix 5 gives the results of a two-way factorial F test on the actual grades earned that term using race and sex as factors. While there was not a significant difference between the male and female students (p = 0.099), it should be noted that the female students had a mean final average of 3.35 (± 0.07), while the male students had a mean of 3.12 (± 0.10). There was a significant difference between the races (p < 0.0001). A Bonferroni pairwise comparison showed that the Asian students had a significantly higher mean actual grade. There were no other differences among the other groups. See Table 4 for descriptive statistics for the expected and actual grades for each racial/ethnic group.
The differences between the students' actual and expected grades were analyzed. Using a paired t-test, there was a significant difference overall between students' expected and actual grades (df = 228, p < 0.001). The mean difference was 0.56 with a standard deviation of 0.83. Using plus/minus grading, this indicates that on average, students expected almost two grade levels higher.
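The paired t statistic can be reproduced approximately from the summary figures reported above (mean difference 0.56, standard deviation 0.83, df = 228, so n = 229). The sketch below uses the standard formula t = mean / (sd / sqrt(n)); because the inputs are rounded, the result is only approximate and is not the SPSS output itself.

```python
from math import sqrt


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic and degrees of freedom from summary statistics."""
    standard_error = sd_diff / sqrt(n)
    return mean_diff / standard_error, n - 1


# Values reported in the text: expected minus actual grade, 2014 cohort.
t_stat, df = paired_t_from_summary(mean_diff=0.56, sd_diff=0.83, n=229)
print(f"t = {t_stat:.2f}, df = {df}")
```

A t statistic of this size on 228 degrees of freedom is consistent with the reported p < 0.001.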
Given the large discrepancies between the number of male and female students in each racial group, gender and race were analyzed separately. A two-sample t-test was used to compare the grade differences between the male and female students. The large sample sizes (n_F = 130 and n_M = 99), and a verification of the assumption of homogeneity of variance, confirmed the appropriateness of the test. Though both groups overestimated their grades on average, male students did so to a significantly larger degree than female students (df = 227, p = 0.005). The average difference between expected and actual grades was 0.74 GPA points for males and 0.43 GPA points for females. While students overall tended to overestimate their grades, some did predict they would receive a lower grade. The minimum by which male students underestimated their grade was 0.3 GPA points while the minimum by which female students underestimated their grade was 1 GPA point.
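The reported degrees of freedom (227 = 130 + 99 - 2) are what an equal-variance (pooled) two-sample t-test would give. A minimal sketch of that statistic is shown below; the group standard deviations are not reported in the text, so the data here are hypothetical values used only to exercise the function.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    standard_error = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical expected-minus-actual grade differences (not study data).
male_diffs = [1.0, 0.7, 0.3, 1.3, 0.4]
female_diffs = [0.3, 0.7, 0.0, 0.4, 0.6, 0.3]

t_stat, df = pooled_t(male_diffs, female_diffs)
print(f"t = {t_stat:.2f}, df = {df}")
```

The pooled form is appropriate only when the variances are comparable, which is why the homogeneity-of-variance check mentioned above matters.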
Racial comparisons between actual and expected grades were performed. Due to the difference in sample sizes and the violation of the assumption of homogeneity of variance (Levene test statistic = 2.83, p = 0.025), the nonparametric Kruskal-Wallis test was used to compare the differences between the actual and expected grades for the 5 groups. This test found significant differences with an H value of 19.87 (df = 4, p = 0.001). Dunn's nonparametric pairwise comparisons found that Asian students' rankings were significantly different from all the other groups. They had smaller differences between their expected and actual grades, thus demonstrating realistic expectations. Students that self-identify as African American, Latino, White, and Other believe they can earn higher grades than they actually received. On average though, this difference was the largest for Latino and African American students. Both groups achieved an average grade 0.82 GPA points lower than expected.
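For readers unfamiliar with the Kruskal-Wallis test, the H statistic can be sketched in plain Python as follows. This is an illustrative implementation of the textbook formula with average ranks for ties and the usual tie correction, not the SPSS routine used in the study, and the group data are hypothetical.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic and degrees of freedom."""
    pooled = sorted(x for group in groups for x in group)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(
        sum(rank_of[x] for x in group) ** 2 / len(group) for group in groups
    )
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)

    # Tie correction (undefined if every observation is identical).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_sum / (n ** 3 - n)
    return h / correction, len(groups) - 1


# Hypothetical expected-minus-actual differences for three groups.
example_groups = [[0.0, 0.3, 0.1], [0.7, 0.4, 1.0], [1.3, 0.8, 0.6]]
h_stat, df = kruskal_wallis_h(example_groups)
print(f"H = {h_stat:.2f}, df = {df}")
```

H is compared with a chi-square distribution on (number of groups - 1) degrees of freedom, which matches the df = 4 reported for the five groups in the study.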

Discussion of Survey Results
It is important to carefully consider the setting of this study when examining the results. It was performed at Cornell University, a highly selective institution that draws top students from around the world. The fact that even at this elite institution, female students feel less confident in their ability and have less positive attitudes toward statistics classes than male students shows the systemic nature of the problem. This is important in light of the fact that there was no significant difference between the grades the two sexes received at the end of the semester. When professors first introduce students to statistics they should take time to make sure students understand what the subject entails. Many come in believing it will be just like any other math class, and students that have felt uncomfortable in math may carry this attitude into the statistics classroom. This misconception is particularly damaging for female students that have gone through their foundational education believing they are inferior to their male counterparts. Interestingly, there was no interaction between race and sex, showing that differences in attitudes toward statistics courses may be attributed primarily to sex. An additional point is that while there was a significant difference between the male and female students, given the Likert scale of the responses, the practical significance may be small. Nevertheless, the fact that the female students consistently scored lower than the male students is something to be considered.
All students underperformed as measured by the difference between the grade they expected and the one they actually received. The cause of this issue is unclear; perhaps, some students think an introductory class implies a less challenging course or freshmen may expect to do as well in college level classes as they did in high school. When grouped by race, the largest difference between expected and actual grades occurred for Latino and African American students, whose actual average grades were nearly an entire letter grade lower than expected. Alternatively, Asian students had the highest expected grades and subsequently met their expectations. This demonstrates the importance of considering both racial and gender differences in future studies.

Student Focus Group Discussion
After the survey was analyzed, two focus groups were held to better understand why some female students feel less confident than male students. The decision was made to form single-sex groups to encourage open dialogue. Students from the Fall 2014 class volunteered to be in the sessions. One group consisted of 12 female students and the other group consisted of 11 male students. These sessions were two months into the course, but before the students received their final grades.
Students were asked how they determined their expected grade. There was a striking difference between male and female students' answers. Male students said that they wrote down what grade they wanted to earn but not necessarily the one they thought they would receive. Many of them saw it as a goal-setting exercise and wanted to establish a high standard. Female students based their expected grade on previous grades they received in quantitative classes. This could suggest that the two groups of students perceived the question differently. In future studies, it will be important to clarify whether we are asking for a "goal" or a realistic expectation.
Students were also asked how they would rate their ability in quantitative subjects in comparison with the average Cornell student. All of the male students said their abilities were average or above with none rating themselves below average. The majority of female students rated their abilities as average. The focus groups were then asked if they planned to take additional quantitative courses (beyond those required by their major). The majority of the male group said they did plan to take additional quantitative courses, while the majority of female students said they would not take these courses. Finally, students were asked about the instructor and overall class design. All of them believed that the course was well taught and did not have any structural issues with the class.
The results of the focus group illustrate the different ways male and female students think about their abilities in quantitative subjects. Male students' confident answers were consistent with the findings from the survey and show that they set high goals for themselves. Female students' answers were more realistic and their expected grades tended to be closer to their achieved grades. Future focus groups should investigate the possible factors contributing to the racial differences in expectations and achievements.

Interventions and Strategies for Improvement
Steele's book Whistling Vivaldi: How Stereotypes Affect Us and What We Can Do (2010) describes a number of intervention strategies that could be employed to improve student success. Many of those specifically looking at math courses at the college level are based on the ground-breaking work of Treisman (1992), who did in-depth studies into the causes behind the underperformance of minority students in his calculus classes. After literally following students around for months, he noted some key racial differences in how students worked outside the classroom. For example, Asian students studied in groups, formal and informal, more than Black and White students. Asian students also made little distinction between their academic and social lives. White students studied more independently, but they readily sought help from other students and teaching assistants. African American students were intensely independent. After class they returned to their rooms, closed the door, and studied long hours (often longer than the Asian or White students). With no one to talk to, the only way to know whether they understood the concept was to check the answer in the back of the book. These same behaviors could be affecting student performance in statistics courses and are an important future area of study.
One promising strategy for improving students' confidence in their ability to succeed in statistics courses is showing testimonials of teaching assistants who struggled with the course material but ultimately were very successful. An "it gets better" message could give students hope that they can improve even if they initially perform below their expectations. This strategy is reinforced by the growth mindset work pioneered by Dweck (2006). Students in the focus group responded positively to this suggestion. Subsequently, a short video was made asking a diverse group of teaching assistants (undergraduate students from previous years) to discuss their experiences when they took the class, which were not all exceptional from the beginning. They also gave tips on how they succeeded in mastering the content. Further studies will attempt to assess the impact of this intervention.
Another tactic Steele noted would be to have students write down their strengths in the subject before they take exams. Reflecting on the aspects of statistics they feel they are strong in could give them more confidence and allow them to perform better. Students in the focus group responded fairly positively to this suggestion. While some of them did not think it would impact them, others said it might have a positive subconscious effect.
This study is useful in thinking about the ways race and gender affect both a student's attitude toward a statistics class, as well as their expectations and ultimate achievement. Educators should continue to explore pedagogical as well as noncognitive methods that will let all our students feel confident in achieving their personal goals.

Supplementary Materials
Appendix 1 presents the SATS survey questions grouped by subscore items, as well as demographic questions. Appendices 2 through 5 present additional ANOVA test results and demographic information. They are available as supplemental data and can be accessed on the publisher's website.