Original Articles

The Impact of Participation in the Advanced Placement Program on Students' College Admissions Test Scores

The Advanced Placement (AP) program is an educational program that permits high school students to take introductory college-level courses and receive college credit by passing a standardized end-of-course exam. Data were obtained from a statewide database of 2 high school graduating cohorts (N = 90,044). We used a series of propensity score analyses and marginal mean weighting through stratification to examine the impact of the AP program on students' academic achievement as measured by ACT scores. Results indicate that merely enrolling in an AP course produces very little benefit for students. Students who take and pass the AP exam, however, obtain higher ACT scores, even after controlling for a wide variety of academic, socioeconomic, and demographic variables. The authors conclude the article by discussing questions about the AP program that remain unanswered.

The Advanced Placement (AP) program was created in 1952 by the College Board for high-achieving high school students to potentially earn college credit upon passing a test in the corresponding subject. The program was originally intended to close the gap between secondary and college education; today the College Board offers 34 AP tests to more than 2 million students each year (College Board, 2012). The AP program offers classes and corresponding tests for many fields, ranging from biology to English literature to statistics. To pass, students must receive a score of 3, 4, or 5 on an AP test. However, the College Board does not award college credit for successfully passing an AP test; rather, the decision to award credit is made by a student's college or university.

There are many incentives offered to students to take AP tests. Students who score well on AP tests typically believe they are more likely to be accepted into college (Hallett & Venegas, 2011). Also, students can earn high school and college credit for passing the AP exam, which many students believe helps them graduate from college earlier and save money by reducing tuition costs (Hallett & Venegas, 2011). Another economic incentive to take courses comes from the College Board, which offers fee waivers to low-income students to cover the test cost. Similarly, 48 states provide financial assistance for students of low-income families (Dounay, 2007).

These incentives are apparently effective because the number of students taking the AP tests has increased substantially over the past decade. In fact, between 2000 and 2010, the number of students taking the tests more than doubled (College Board, 2010). According to the National Center for Educational Statistics (NCES; 2011b), in the year 2000, the average student was enrolled in only 0.58 AP courses; in 2009, that number rose to 1.08. By 2007, 59.0% of public high schools and public charter high schools in the United States offered AP courses (NCES, 2011a, p. 173), and by 2009, 36.3% of public high school graduates had successfully completed at least one AP course (NCES, 2011a, p. 235).

One possible reason for recent increased participation in the AP program is the initiative announced in 2000 by the then-U.S. Secretary of Education Richard Riley and the president of the College Board, Gaston Caperton, to have at least 10 AP courses offered in every high school in the United States (Lichten, 2010). A variety of other policies have been implemented in recent years that have also led to the AP program's spectacular growth. For example, two states—Arkansas and Indiana—have legal mandates that all high schools offer AP courses. Two additional states have mandates that all high schools offer AP courses with some caveats: in South Carolina, small high schools are exempt, and in West Virginia high school personnel have the choice of offering either AP courses or the (much less popular) International Baccalaureate program (Klopfenstein & Thomas, 2010). In addition to mandates, some state governments and the federal government also provide subsidies for students (usually from low-income families) to take AP exams (Klopfenstein & Thomas, 2010).

Benefits of the AP Program

It is widely claimed that students who pass an AP test are more likely to have later success in high school and college, a belief so widely held that it has been called “One of the fundamental underpinnings of the AP Program” (Morgan & Klaric, 2007, p. 1). This foundational assumption about the AP program has been supported by a variety of research studies. For example, students who take AP exams are more likely to enroll in a four-year college (Chawjewski, Mattern, & Shaw, 2011), earn higher grade point averages (GPAs; Flowers, 2008), and earn a bachelor's degree and subsequent higher incomes than non-AP students (Flowers, 2008). There is also an increased likelihood of obtaining an advanced degree for students who have successfully completed AP courses (Bleske-Rechek, Lubinski, & Benbow, 2004). Previous researchers have found that for college students enrolled in an introductory college science course, AP participation and success were associated with higher grades. Students who passed the corresponding AP test (i.e., earned a score of 3, 4, or 5) had the highest grades in the course, while students who were enrolled in the AP class in high school but did not take the AP test received lower grades (Sadler, 2010a; Sadler & Tai, 2007a). AP students earned higher introductory college science grades than their classmates who had enrolled in honors or regular level science courses in high school (Sadler, 2010a; Sadler & Sonnert, 2010; Sadler & Tai, 2007a).

When examining success of AP veterans in specific introductory science courses, Sadler and Sonnert (2010) found that students who had passed the AP biology, physics, or chemistry tests earned higher grades in the corresponding introductory science course than students who had either no high school course in the subject or a regular course. However, the authors also found that in some courses—such as chemistry and physics—certain groups of non-AP students obtained equal college grades in the same course as those who had passed the AP exam. Thus, Sadler and Sonnert (2010) contended that the AP program has its benefits, but that these benefits were not uniform.

Research from the College Board supports these positive findings about the AP program. One study showed that students majoring in nine different academic fields achieved higher college GPAs if they passed the AP exam for their respective introductory course in high school (Patterson, Packman, & Kobrin, 2011). Authors of another study sponsored by the College Board showed that students who score at least a 3 on English language, biology, calculus AB, and U.S. history tests had higher GPAs their first year of college and had higher retention rates for their second year of college (Mattern, Shaw, & Xiong, 2009). Mattern et al. (2009) concluded that “the results of this study do provide support for the role of participation in the AP Exam in subsequent college performance and success” (p. 12; for similar results, see Mattern & Klaric, 2007). Similarly, Scott, Tolson, and Lee (2010) found when measuring “students with similar high school rank or SAT scores, those with advanced placement credit significantly outperformed their peers with no advanced placement credit. Performance of AP students was higher, regardless of gender or ethnicity” (p. 30).

In addition, Scott et al. (2010) found that veterans of the AP program also received higher first semester GPAs than students without AP experience. However, while these results show positive impacts of the AP program, Scott et al. only examined student outcomes during the first semester of college; questions of long-term benefits were still unanswered. These questions were addressed by Dougherty, Mellor, and Jian (2006) who, in a study sponsored by the College Board, found that Texas students who passed AP tests in high school were more likely to graduate from college than students who did not enroll in AP classes. Morgan and Klaric (2007) similarly found that students who had successfully passed AP exams were more likely to graduate from college within four years and less likely to drop out or transfer to another university.

College Board–affiliated authors of another study found similar results when they compared the performance of two groups of students from 27 universities in college courses for which introductory courses were prerequisites. One group consisted of AP students who had passed the corresponding AP test and therefore bypassed the introductory course. The second group consisted of students who had instead taken and passed the introductory college course. The authors found that AP students earned equal or higher grades than their traditional college classmates, indicating that students with AP experience equaled or outperformed students who did not have such experience (Morgan & Klaric, 2007).

Another suggested benefit of the AP program may be that it helps students explore potential interests and college majors. This is supported by findings that students who succeeded on the AP calculus and science tests were more likely to earn a bachelor's degree in a science, technology, engineering, and mathematics or life sciences field than students who did not pass (Tai, Liu, Almarode, & Fan, 2010). Although this statistical relationship was not constant across all AP exams and majors of study, related research shows that students who took an AP exam were more likely to major in the same topic and less likely to start college with an undeclared major (Mattern et al., 2011; Morgan & Klaric, 2007).

Criticisms and Questions About the AP Program

Despite the noteworthy body of research in support of the AP program, the findings are not uniformly positive. The number of AP courses taken by a student, for example, has been shown to be unrelated to students' freshman college GPA after controlling for socioeconomic status (SES) and academic variables (Geiser & Santelices, 2004). Klopfenstein (2010) found that earning high grades in AP courses did not reduce students' time to college graduation, contrary to the widespread belief (Farkas & Duffett, 2009) that passing AP tests helps students graduate faster. This assertion has been supported by researchers who find no relationship between level of AP participation and college freshman GPA or retention in college after the freshman year after controlling for demographic and academic variables (Duffy, 2010; Klopfenstein & Thomas, 2009). In some non–College Board studies, AP participation also failed to correlate with degree attainment within five years of starting college (Duffy, 2010; Klopfenstein, 2010) and college GPA (Duffy, 2010). Perhaps these results were found because most AP students only take one or two AP courses and exams, a number that is neither sufficient to meaningfully reduce enrollment time in college nor high enough to differentiate between good college-bound students and truly exceptional ones. Moreover, many students who pass an AP course still take the corresponding introductory college course (Murphy & Dodd, 2009; Sadler & Sonnert, 2010), which does nothing to shorten students' college time or reduce expenses.

Because of these results, some experts have raised questions about the AP program's effectiveness. Some (e.g., Farkas & Duffett, 2009; Lichten, 2000, 2010; Tai, 2008) contend that the rapid expansion of the AP program has resulted in many students enrolling in AP courses who are not well prepared for advanced course work. This claim is supported by the fact that the expansion of the AP program has been accompanied by a drop in the percentage of students passing AP tests, which has decreased from 64.3% in 2001 to 59.8% in 2011 (College Board, 2012). Also, because the addition of AP courses to a high school's curriculum comes at a cost of resources for other uses, widespread failure of students on the AP test comes at a substantial economic expense for many high schools (Dougherty & Mellor, 2010). The desire to make AP courses available for all students is undoubtedly a well-intentioned and noble policy goal, but it rests heavily on assumptions of the benefits of the program. Klopfenstein and Thomas (2010) explained,

The expansion of the AP program is being conducted with the belief that the benefits of AP experience are universal and large enough to justify supportive subsidies and/or legislation, but AP stakeholders likely overestimate the benefits and underestimate the associated costs. (p. 183)

Some of these noneconomic costs include the larger class sizes for non-AP classes, the best teachers being assigned to teach AP classes, and fewer non-AP courses being offered at high schools (Klopfenstein & Thomas, 2010).

Questions about the AP program's effectiveness have also been raised by university administrators, some of whom are not satisfied with the performance of students who receive lower passing scores (i.e., 3 or 4) on AP exams. Because course content in AP classes is determined by high schools, some college personnel are skeptical that what is taught in the AP classroom meets the rigorous standards at elite universities (Duffy, 2010). Because of this, many selective universities do not award credit for some AP examinations unless the student achieves a score of 4 or even 5 (Duffy, 2010; Farkas & Duffett, 2009; Lichten, 2010; Murphy & Dodd, 2009). The reluctance of an increasing number of universities to grant AP credit for a test score of 3 or 4 may indicate a misalignment between AP exam scores and college course grades. A similar misalignment may also exist between AP exam scores and grades high school students earn in AP courses, as evidenced by the low correlations between the two variables (r = .314–.423; Sadler, 2010a). Thus, many students who earn high grades in their AP courses do poorly on the AP exam. However, this may indicate that AP courses as taught by high school teachers vary in quality, which would be a problem with the implementation of the program, not the design of the AP program as conceptualized by the College Board.

Even when scholars recognize the benefits of AP courses, it is sometimes due to characteristics of the classes, and not the AP program itself. For example, when speaking to students, Sadler (2010d) stated, “you will probably be in a class that has fewer students, those students will likely have stronger backgrounds, and there will be fewer discipline issues” (p. 266; for a similar observation, see Farkas & Duffett, 2009, p. 8). Similarly, Paek, Braun, Ponte, Trapani, and Powers (2010) found that AP biology courses tend to be taught by more experienced teachers and that class sizes were smaller (Farkas & Duffett, 2009). Because these student and teacher characteristics are often shown to be linked to positive academic outcomes, it is possible that any benefits of the AP program may be merely due to more prepared students and more experienced teachers, not the program itself.

Purpose of This Study

In light of the widespread popularity of the AP program and the growing number of questions about the program, we believe that a study of the impact of participation in the AP program on other measures of educational achievement is warranted. By conducting this study we hope to add to the growing body of literature on the AP program conducted by researchers who are unaffiliated with the College Board. This study is designed to examine the impact of the AP program on students' educational achievement.

Using data from every public high school student in Utah, we examined the impact of the level of AP participation in two subjects, English and calculus, on students' ACT scores. Through this study we hope to shed light on the extent of AP participation in Utah and the degree of the program's impact on individual students' academic achievement (as measured by college admissions test scores). The AP English and calculus subjects were chosen for analysis for two reasons. First, they are among the most popular AP courses and tests both in Utah and nationwide; only the United States history AP test compares in popularity (College Board, 2012). The popularity of these tests would likely produce larger group sizes, thereby making our statistical tests easier to conduct. Second, English and mathematics are widely recognized as two of the most important core subjects in many levels of the education curriculum.

We decided to use ACT scores as an outcome variable because previous researchers have found that students participating in the AP program earn higher ACT scores than students who do not take AP courses (Mo, Yang, Hu, Calaway, & Nickey, 2011), although these researchers did not control for confounding variables. Also, because test content is based on material taught in American high schools, as determined through a national survey of high school curricula in the United States (ACT, 2007; Zwick, 2006), the ACT is an excellent measure of academic achievement that is much more standardized than other measures of achievement often used as dependent variables in evaluations of the AP program, such as college grades (Camara & Michaelides, 2005).

We believe that this study is valuable for educational scholars because previous researchers have called for new research into the effectiveness of the AP program through the use of longitudinal state data sets (Klopfenstein & Thomas, 2010) and statistical methods that control for confounding variables (e.g., Sadler & Sonnert, 2010). This study was designed to meet these requests.

Methods

Data Source and Missingness

Data for this study were collected by the Utah State Office of Education (USOE) as part of their efforts to comply with state and federal statutes, such as the No Child Left Behind Act of 2001 (2002). Data were collected for the entire high school career of two different cohorts of students in Utah, each named for the year of expected high school graduation. The 2010 cohort contained a total of 45,448 students, and the 2011 cohort contained a total of 44,596 students. Both cohorts consisted of all students who spent at least part of their Grade 9–12 education in Utah public schools. For the 2010 cohort, data were collected from the 2006–2007 school year through the 2009–2010 school year. Data from the 2011 cohort were collected from the 2007–2008 through 2010–2011 school years. The cohorts were analyzed separately.

Variables collected from students are listed in Table 1. Almost all variables in Table 1 had some missing data. Reasons for the missing data were not always known, but included dropping out of high school, moving into or out of the state, leaving the public school system in order to participate in home schooling or private schools, and more. To compensate for missing data we used multiple imputation to create 20 imputed data sets for the 2010 cohort, which were then analyzed to produce the results in this study. Missing data theorists agree that as more data are missing from a data set, the number of imputations needed to compensate for the missingness increases (Bodner, 2008), although the exact number of imputations needed is not always known (von Hippel, 2005). Graham and his coauthors found that even when the missingness of information was 90%, parameter estimates in a Monte Carlo study were very accurate and power was only 6.0% lower than the power derived from 100 imputations (Graham, Olchowski, & Gilreath, 2007). Because of the large sample in this study, we believed that a 6% loss of power was acceptable and that 20 imputations would be sufficient. For details on which variables were imputed and used to impute missing values from other variables, see Table 1. For definitions of variables and how data were coded, see Table 2.

TABLE 1. Variables Included and Missing Data Percentages for Marginal Mean Weighting Through Stratification Analyses

TABLE 2. Definitions and Coding of Variables Used in Analyses

The SPSS Missing Values add-on was used to impute the missing data in the 2010 cohort. However, multiple imputation only functions properly when data are either missing completely at random (MCAR) or missing at random (MAR). If data are missing not at random (MNAR), then the results will likely be biased. Unfortunately, with any data set it is almost impossible to know with certainty whether the data are MCAR, MAR, or MNAR. Nevertheless, we believe that it is reasonably possible that our data are MAR because we included many variables that have been shown in other studies to be linked with school transfer and leaving, including attendance (Nichols, 2003), student mobility (Gasper, DeLuca, & Estacion, 2012), and SES (Carpenter & Ramirez, 2007), all of which were included in our multiple imputation models. Even so, because the imputation of missing data could still possibly produce statistically biased results, we thought it prudent to impute missing data only for the 2010 cohort and to use the 2011 cohort to replicate the analyses with only cases that have complete data on a more limited number of variables. If the results from the two sets of analyses were similar, then it would give us confidence that the use of multiple imputation was justified and our data were MAR.

Propensity Score Analysis

As experts on the AP program have mentioned in the past (e.g., Sadler, 2010b), we could not randomly assign students to program participation groups as part of a true experiment in order to infer the causal power of the AP program on students' later academic achievement. As has been stated previously, “The decision to take an AP course in high school is a form of self-selection. The most highly motivated and academically successful high school students are drawn to these challenging courses” (Sadler & Sonnert, 2010, p. 120). A failure to control for pre-existing differences in groups also makes the impact of the AP program appear larger than it really is, as has been demonstrated with empirical data (Klopfenstein & Thomas, 2009; Sadler & Tai, 2007a) and theoretically (Guo & Fraser, 2010). For this reason, we did not think it was beneficial to perform a simple group mean comparison (e.g., an analysis of variance [ANOVA]) in order to determine the impact of the AP program on ACT score differences. We chose to use propensity score matching (Guo & Fraser, 2010) in order to balance the groups as much as possible, an approach used by others who have studied the AP program (Sadler & Sonnert, 2010).

Propensity score analysis is a statistical technique that attempts to establish causal inference when data are nonexperimental in nature. A propensity score is the predicted probability that a unit (in this case, a student) would be assigned to a treatment condition (e.g., an AP class) given a set of observed covariates. This can be defined as

$$p(x) = \Pr(T = 1 \mid X = x),$$

where T is the treatment condition, X = x is a realized set of covariates, and p(x) is the conditional probability of a student attending an AP class.
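As an illustration only (this is not the software used in the study), the definition can be sketched as a logistic regression fitted by gradient descent in plain Python. The single covariate (a centered prior GPA), the toy values, and the function names `fit_logistic` and `propensity` are hypothetical and were chosen only to keep the example short.

```python
import math

def propensity(w, row):
    """Estimated p(x) = Pr(T = 1 | X = x) under a logistic model."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, t, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    X: list of covariate rows; t: list of 0/1 treatment indicators.
    Returns the weights, with w[0] as the intercept.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, ti in zip(X, t):
            err = propensity(w, row) - ti
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: centered prior GPA and AP enrollment (1 = enrolled).
X = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [-0.8], [0.8], [0.2]]
t = [0, 0, 1, 0, 1, 0, 1, 1]

w = fit_logistic(X, t)
scores = [propensity(w, row) for row in X]
```

Each entry of `scores` is the estimated probability that the corresponding student enrolls, given the covariate. A real analysis would use the full covariate set in Table 1 and an established statistics package rather than a hand-written optimizer.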

The next step would be to match members of the two groups through nearest neighbor matching (Arya, Mount, Netanyahu, Silverman, & Yu, 1998), caliper matching (Raynor, 1983), Mahalanobis metric matching (Rubin, 1980), or stratification matching (Rosenbaum & Rubin, 1983). Stratification on propensity scores can balance the students on all observed covariates, allowing the average treatment effect to be unbiased or statistically independent:

$$\bigl(Y(0), Y(1)\bigr) \perp T \mid p(x),$$

where $\perp$ is statistical independence and Y(0) and Y(1) are the outcomes of the students in the control and treatment conditions, respectively. This can easily be expanded to have any number of treatment groups. In other words, the propensity score is an analysis that unconfounds treatment status with the outcomes of interest, allowing causal inference to be made.

Propensity Score Analysis for Ordered Categories

Each cohort in the study was divided into four groups: (a) students who did not take an AP course; (b) students who took an AP course but did not take the AP exam; (c) students who took the AP course and the exam but did not pass the exam because they earned a score of 1 or 2; and (d) students who took the AP course and exam and passed the exam by earning a score of 3, 4, or 5. For the rest of this article these groups are called: (a) non-AP students, (b) exam nonparticipants, (c) exam nonpassers, and (d) exam passers, respectively. For the purposes of this article, exam nonpassers in English are students who took either the AP English literature or AP English language exam and did not earn a passing score on either exam. Students who took multiple AP English exams only had to pass one of them with a score of at least 3 to be labeled an exam passer. Similarly, exam nonpassers in calculus are students who took either the AP calculus AB or AP calculus BC and did not earn a passing score on either exam. Students who took both AP calculus exams only had to pass one of them with a score of at least 3 to be labeled as an exam passer. This decision was made because of complications that arose when students took multiple AP courses (e.g., AP English language and AP English literature two different years) or the same AP course two different years.
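The grouping rule above can be written as a short function. This is a minimal sketch rather than the authors' code: the function name `ap_group` and its arguments are hypothetical, and the sketch assumes that course enrollment is checked first, because the text does not say how a student who sat an exam without taking the course was classified.

```python
def ap_group(took_course, exam_scores):
    """Classify one student within one subject (English or calculus).

    took_course: True if the student enrolled in an AP course in the subject.
    exam_scores: AP exam scores (1-5) earned in the subject; empty if the
                 student took no exam.
    """
    if not took_course:
        # Assumption: enrollment is checked first (see the lead-in).
        return "non-AP student"
    if not exam_scores:
        return "exam nonparticipant"
    # One passing score (3 or higher) on any exam in the subject is enough.
    if max(exam_scores) >= 3:
        return "exam passer"
    return "exam nonpasser"

# A student who took both AP English exams and passed only one of them:
label = ap_group(True, [2, 4])
```

Taking the maximum score implements the rule that students who took multiple exams in a subject needed to pass only one of them to count as exam passers.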

Because there are four possible outcome groups, it was possible to create three propensity scores for each student. However, the goal of creating propensity scores is to reduce the impact of all the covariates into a single propensity score; creating three propensity scores because there are more than two outcome groups defeats the point of propensity score analysis and introduces problems (Guo & Fraser, 2010). To deal with this problem we used an analysis method called marginal mean weighting through stratification (MMW-S; Hong, 2010, 2012; Hong & Hong, 2009), which permits conclusions about the effectiveness of a treatment when it has been administered in varying dosages to participants. Hong and Hong (2009) explained that MMW-S

[d]oes not require estimating the average of the conditional treatment effects within subsets of homogeneous units. Rather, the analysis involves separately estimating the population average potential outcome—that is, the marginal mean outcome—of each treatment. The marginal mean outcome of treatment z, denoted by E[Y(z)], is the average outcome we would expect to see if the whole population has been assigned to this treatment. (p. 60)

We followed the steps of MMW-S as described by Hong (2012), which are briefly summarized below:

  1. Estimate a propensity score using multiple logistic regression to estimate the probability that each subject will receive each treatment dosage. In our study the treatment dosages were the four levels of AP participation.

  2. Identify a subset of cases (called the analytic sample) that have a nonzero probability of receiving each of the treatments. This necessitates excluding some cases from further analysis, which limits generalizability. However, this is required because cases that have zero (or near-zero) probability of receiving a given treatment have no comparison cases in the other treatment groups (Hong, 2010).

  3. Stratify the sample on the estimated propensity score. We divided our analytic sample into deciles and assigned each sample member a decile number ranging from 1 (for the lowest 10% of propensity scores) to 10 (for the highest 10% of propensity scores). Often, researchers who create groups on the basis of propensity scores create only four (e.g., Fan & Nowell, 2011) or five strata (e.g., D'Agostino, 1998). However, preliminary analysis showed that many covariates were still unbalanced when we divided our sample into a small number of groups, so we divided the sample into ten propensity score groups, which were each homogeneous enough to balance many more covariates.

  4. Calculate a marginal mean weight, which is a weight applied to each sample member. Marginal mean weights are analogous to sampling weights in survey research where certain sample members are weighted to compensate for sampling or response bias (Hong, 2012). In a similar vein, marginal mean weights compensate for the fact that members of various strata are not necessarily equally likely to receive a given dosage.

  5. Ensure that the propensity score matching and marginal mean weighting have balanced the covariates used to create the propensity scores.

  6. Analyze the data using ANOVA.
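Steps 3 and 4 can be illustrated with a small sketch. It assumes a commonly stated form of the marginal mean weight: for a student in stratum s who received treatment z, the weight is n_s × Pr(Z = z) / n_{z,s}, where n_s is the stratum size, Pr(Z = z) is the overall proportion of the sample receiving z, and n_{z,s} is the number of students in the stratum who received z. Readers should check this form against Hong (2010, 2012) before relying on it. For brevity the sketch uses a single set of strata and only two strata; the function names and toy data are hypothetical.

```python
from collections import Counter

def assign_strata(scores, n_strata=10):
    """Assign rank-based strata 1..n_strata of near-equal size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores) + 1
    return strata

def marginal_mean_weights(strata, treatments):
    """Weight = n_s * Pr(Z = z) / n_{z,s} (assumed form, see lead-in)."""
    n = len(treatments)
    n_s = Counter(strata)
    n_z = Counter(treatments)
    n_zs = Counter(zip(treatments, strata))
    return [n_s[s] * (n_z[z] / n) / n_zs[(z, s)]
            for z, s in zip(treatments, strata)]

def weighted_group_mean(outcomes, weights, treatments, z):
    """Weighted mean outcome among students who received treatment z."""
    pairs = [(y, w) for y, w, t in zip(outcomes, weights, treatments) if t == z]
    return sum(y * w for y, w in pairs) / sum(w for _, w in pairs)

# Hypothetical toy data: eight students, two treatment levels, two strata.
scores = [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90]
treatments = ["non-AP", "non-AP", "non-AP", "passer",
              "non-AP", "passer", "passer", "passer"]
act = [18, 19, 20, 24, 22, 27, 28, 30]

strata = assign_strata(scores, n_strata=2)
weights = marginal_mean_weights(strata, treatments)
non_ap_mean = weighted_group_mean(act, weights, treatments, "non-AP")
passer_mean = weighted_group_mean(act, weights, treatments, "passer")
```

Under this form the weights for each treatment level sum to that level's original group size, so weighting changes the composition of each group across strata without changing its overall size. This property holds only when every stratum contains at least one student at each treatment level, which is the reason step 2 restricts the analysis to the analytic sample.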

Data Source and Preparation

Data were collected by USOE and were given to the researchers as several files. One file consisted of student variables (e.g., ethnicity, SES, migrant status, special education status, cumulative high school GPA) and ACT scores. There was also a set of files that consisted of mathematics and English course enrollment data, including specific course grades. Another set of files consisted of AP scores for all AP tests reported to the state of Utah. The first demographic and course enrollment files were combined by matching cases on the basis of a unique student ID and all cases matched successfully. The files containing the AP scores did not contain the unique ID number; these files were combined with the others by matching on the basis of first name, last name, gender, and birth date. Cases that did not match automatically were examined manually by three of the authors to combine cases that were judged likely matches (e.g., two cases—one named Alex and the other Alexander—that had the same birth date, last name, and gender). In total, 99.13% of cases from the file containing the AP scores were matched to a case in the other files.
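The matching procedure for the AP score files can be sketched as an exact match on the four identifying fields, followed by a list of near matches for manual review. This is an illustrative sketch, not the procedure actually run on the USOE files: the field names, the lowercasing of names, and the toy records are all hypothetical.

```python
def link_key(rec):
    """Exact-match key: first name, last name, gender, and birth date."""
    return (rec["first"].strip().lower(), rec["last"].strip().lower(),
            rec["gender"], rec["birth_date"])

def partial_key(rec):
    """Looser key that ignores first name, used to suggest manual matches."""
    return (rec["last"].strip().lower(), rec["gender"], rec["birth_date"])

def link_records(ap_rows, student_rows):
    """Attach a student ID to each AP row that matches exactly."""
    # Assumes exact keys are unique among students; duplicates are ignored here.
    index = {link_key(s): s["student_id"] for s in student_rows}
    matched, unmatched = [], []
    for row in ap_rows:
        sid = index.get(link_key(row))
        if sid is None:
            unmatched.append(row)
        else:
            matched.append({**row, "student_id": sid})
    return matched, unmatched

def review_candidates(unmatched, student_rows):
    """Pair each unmatched AP row with students sharing the looser key."""
    by_partial = {}
    for s in student_rows:
        by_partial.setdefault(partial_key(s), []).append(s)
    return [(row, by_partial.get(partial_key(row), [])) for row in unmatched]

# Hypothetical toy records.
students = [
    {"student_id": 101, "first": "Alexander", "last": "Smith",
     "gender": "M", "birth_date": "1992-03-14"},
    {"student_id": 102, "first": "Maria", "last": "Lopez",
     "gender": "F", "birth_date": "1992-07-02"},
]
ap_scores = [
    {"first": "Alex", "last": "Smith", "gender": "M",
     "birth_date": "1992-03-14", "exam": "English Literature", "score": 4},
    {"first": "maria ", "last": "Lopez", "gender": "F",
     "birth_date": "1992-07-02", "exam": "Calculus AB", "score": 3},
]

matched, unmatched = link_records(ap_scores, students)
candidates = review_candidates(unmatched, students)
```

In this toy example the "Alex" record fails the exact match but is paired with the "Alexander" record for review, which mirrors the kind of case the authors resolved by hand.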

After missing values were imputed, all students in each cohort were included in the MMW-S analysis that investigated the impact of AP English on ACT scores. Logistic regression was used to calculate propensity scores for each comparison within the MMW-S analysis using the variables described in Table 1. It is important to note that AP class participation and exam passing variables were not imputed (although they were included in the statistical model that imputed missing values in other variables). Therefore, any students for whom data were missing during the years they would have taken an AP English course (usually students' junior or senior years of high school) were assumed to be non-AP students. Yet, ACT scores were imputed for all students. We thought this was a sensible strategy for handling missing data because this would only bias our estimates for the impact of the AP program on ACT scores by making them slightly more conservative, a risk we were willing to take given the widespread popularity of the AP program and the frequently positively biased results in favor of the AP program's effectiveness (Klopfenstein & Thomas, 2009).

To investigate the impact of AP calculus on ACT scores, we included only students from each cohort who had taken a precalculus course in their sophomore or junior year of high school. This was done because precalculus is a prerequisite course for AP calculus in Utah school districts. Moreover, when estimating the impact of AP calculus classes and tests, we did not find it reasonable to compare students who were never eligible to take calculus with those who had a choice of whether to take very advanced mathematics. Therefore the calculus analysis was conducted on only 10,652 students in the 2010 cohort and 10,948 students in the 2011 cohort. Similar to the AP English MMW-S analysis, AP scores were used to impute missing values in other variables in the 2010 cohort, but were not imputed themselves. Readers should note that because of the systematic elimination of students who did not take precalculus during their sophomore or junior years, we thought it was best to impute data for the two analyses separately because using the same imputed values for two different populations (one representative of all students in Utah public high schools, while the other was representative of only students eligible to take advanced mathematics) was not reasonable.

Balancing covariates

After all propensity scores necessary for conducting the four MMW-S analyses were calculated, it was necessary to examine whether the propensity scores were successful in balancing the covariates among the different groups. This was done by dividing the propensity scores into deciles and conducting a hypothesis test to examine whether the outcome group members within each propensity score decile were statistically equal in terms of the covariate (Fan & Nowell, 2011; Rosenbaum & Rubin, 1983). For these hypothesis tests, the covariate was the dependent variable in the regression analysis (either an analysis of variance, logistic regression, or multinomial logistic regression, depending on the level of data of the covariate), and both decile membership and final group outcome were the independent variables. Interaction effects were included in the models.
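The decile step described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the toy data, and the rank-based assignment of deciles are assumptions made for illustration. It only tabulates covariate means per outcome group within each decile, rather than fitting the regression models with interaction effects.

```python
from statistics import mean

def decile_strata(scores):
    """Assign each propensity score to a decile stratum (0-9) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, idx in enumerate(order):
        strata[idx] = min(9, rank * 10 // len(scores))
    return strata

def covariate_means_by_stratum(scores, groups, covariate):
    """Return {stratum: {group: mean covariate}} for a within-decile comparison."""
    strata = decile_strata(scores)
    cells = {}
    for s, g, x in zip(strata, groups, covariate):
        cells.setdefault(s, {}).setdefault(g, []).append(x)
    return {s: {g: mean(v) for g, v in by_group.items()}
            for s, by_group in sorted(cells.items())}

# Hypothetical toy data: 20 students, two outcome groups, one covariate (GPA).
scores = [i / 20 for i in range(20)]
groups = ["non-AP" if i % 2 == 0 else "AP" for i in range(20)]
gpa = [2.0 + 0.1 * i for i in range(20)]

table = covariate_means_by_stratum(scores, groups, gpa)
for stratum, by_group in table.items():
    print(stratum, by_group)
```

If the covariate means for the outcome groups are close within every decile, the propensity scores are doing their job for that covariate. A formal check would replace the printed means with the hypothesis tests described in the text.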

Results

Sample Demographic Information

Table 3 shows the demographic characteristics of the students in the two cohorts in the study. The demographic makeup of the two cohorts (labeled in the AP English columns) is quite similar. However, there are a few differences. First, the 2011 cohort is much more affluent than the 2010 cohort. In the older group of students, the percentage who were eligible for free or reduced-price lunch ranged from 21.9% to 24.4% across the different years of the study, with the proportion of low-SES students declining in later years. However, the 2011 cohort had only 9.0%–14.3% of students eligible for free or reduced-price lunch, with the highest proportion being in the students' senior year. Second, the 2011 cohort is slightly more diverse than the 2010 cohort, with the biggest difference being in multiracial students. It is not clear why the 2011 cohort consisted of more affluent students and more multiracial students than the 2010 cohort. Table 4 shows descriptive statistics of other selected variables, both before and after multiple imputation. We also compared the analytic and full samples for each analysis, which showed us that excluded cases tended to be students with very low academic achievement (results not shown).

TABLE 3. Demographic Data of Study Participants

TABLE 4. Descriptive Statistics of Selected Variables Before and After Multiple Imputation

To investigate the impact of AP calculus on ACT scores, the students from each cohort who had enrolled in precalculus during their sophomore or junior years were selected as a subgroup for further analysis. Table 3 shows that almost 1,800 more students were eligible for calculus in the 2011 cohort than in the 2010 cohort. A comparison of these groups in Table 3 shows other important differences. The students in the 2011 cohort who were eligible for calculus were much more diverse and less affluent than those in the 2010 cohort. The change in demographic composition of the calculus subgroup was due almost entirely to a large increase in the number of diverse students taking precalculus during their sophomore or junior years. It is unclear to us why the 2011 calculus subgroup is so much more diverse and less affluent than the corresponding students from the 2010 subgroup.

Balancing Covariates

Effect sizes were calculated for each regression analysis to determine the degree to which the propensity scores were effective in balancing AP groups on the covariates. Two-way interactions between the stratum and the outcome group were included in all models. Intervally scaled covariates with an η² value greater than 1.0% and nominal or ordinal covariates with an odds ratio less than 0.70 or greater than 1.43 (thresholds that are mathematically equivalent to an η² value of 1.0%) are reported in Table 5. We decided to use (admittedly arbitrary) effect size thresholds to judge whether covariates were balanced between outcome groups within propensity score deciles because our sample sizes were sometimes so large that p-values derived from null hypothesis tests would have made even trivial differences between groups statistically significant.
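The thresholds above can be expressed as a short Python sketch. This is an illustration only, under stated assumptions: `eta_squared` computes a simple one-way eta squared (between-group sum of squares divided by total sum of squares), not the two-way models with interactions used in the study, and both function names are hypothetical. The cutoffs 1.0%, 0.70, and 1.43 are taken from the text. The two odds ratio cutoffs are approximately reciprocal, since 1/0.70 rounds to 1.43.

```python
def eta_squared(groups):
    """One-way eta squared: between-group SS divided by total SS."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

def is_imbalanced(eta_sq=None, odds_ratio=None):
    """Flag a covariate using the thresholds stated in the text."""
    if eta_sq is not None:
        return eta_sq > 0.01                      # eta squared above 1.0%
    return odds_ratio < 0.70 or odds_ratio > 1.43  # odds ratio outside the band

# Hypothetical examples.
print(eta_squared([[0, 0], [1, 1]]))      # groups fully separated
print(is_imbalanced(eta_sq=0.0152))       # above the 1.0% threshold
print(is_imbalanced(odds_ratio=1.20))     # inside the 0.70-1.43 band
```

Using an effect size cutoff instead of a p-value keeps the decision independent of sample size, which is the motivation given in the paragraph above.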

TABLE 5. Covariates Imbalanced After Propensity Score Analysis

As is apparent in the table, the propensity score matching procedure was successful in balancing the vast majority of covariates across groups. The AP English analysis for the 2010 cohort had a total of three imbalanced covariates (of 66 covariates, or 4.5%) and the 2011 cohort had no imbalanced covariates. In the AP calculus analysis for the 2010 cohort, three covariates were imbalanced (of 63 covariates, or 4.8%), and the 2011 cohort had one imbalanced covariate (of six, or 16.7%). Therefore, of the four analysis groups, only the 2011 AP calculus cohort had more than the nominal alpha of 5% of covariates imbalanced. As Table 5 indicates, the effect size measuring this covariate imbalance was η² = 1.52%, which we believed was sufficiently small to have a trivial impact on our results.

Marginal Mean Weights

Marginal mean weights were calculated for each of the four analysis groups separately through the process described by Hong (2012). These weights were then applied to each stratum in order to compensate for the nonrandom dosage of the AP program that each student received. In general, weights were small, with a median weight of 1.122 and 90% of weights being below 6.5. It is important to recognize that for all four analysis groups, the highest weight was always for the students who had a high a priori probability of passing the AP test but never even took the AP course. The weights for these students ranged from 11.825 to 204.144, and so readers should remember that comparisons between non-AP students and exam passers may be somewhat unstable. Therefore, only results for adjacent groups (i.e., non-AP students and exam nonparticipants; exam nonparticipants and exam nonpassers; and exam nonpassers and exam passers) are shown. We also thought that this method of presenting the marginal mean outcome values was more practical for readers because, in the real world, students would be most likely to move into the next highest group as the AP program expands.
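To make the weighting step concrete, the following Python sketch computes marginal mean weights for a toy data set. It assumes the general form of the weight described for MMW-S, namely the stratum size times the marginal proportion of the treatment group, divided by the number of students in that stratum who received that treatment. It is not the code used in the study, and the function name, strata, and group labels are hypothetical.

```python
from collections import Counter

def mmws_weights(strata, treatments):
    """Weight for a unit in stratum s with treatment z: n_s * Pr(Z = z) / n_{z,s}."""
    n = len(treatments)
    n_s = Counter(strata)                    # units per stratum
    n_z = Counter(treatments)                # units per treatment group
    n_zs = Counter(zip(strata, treatments))  # units per stratum-by-treatment cell
    return [n_s[s] * (n_z[z] / n) / n_zs[(s, z)]
            for s, z in zip(strata, treatments)]

# Hypothetical toy data: two propensity strata, two outcome groups.
strata = [0, 0, 0, 0, 1, 1, 1, 1]
treatments = ["non-AP", "non-AP", "non-AP", "passer",
              "non-AP", "passer", "passer", "passer"]

weights = mmws_weights(strata, treatments)
for s, z, w in zip(strata, treatments, weights):
    print(s, z, round(w, 3))
```

In this toy example the lone non-AP student in the stratum dominated by passers receives the largest weight, which mirrors the pattern reported above: students who are rare within their stratum are weighted up, and very sparse cells can produce large, unstable weights.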

Impact of AP Participation on ACT Scores

AP English: 2010 cohort

Table 6 shows the estimated impact of different levels of AP program participation for adjacent student groups. The values in the table are the marginal mean outcome values, which are defined as the estimated average difference between the two adjacent ordered outcome groups in ACT scores in raw score values. As is apparent in the table, the largest differences between adjacent groups are between exam nonpassers and exam passers, with differences in ACT scores ranging from 1.861 to 3.832 points, equivalent to Cohen's d values from 0.43 to 0.77. Note that all Cohen's d values are calculated with the SD of the entire cohort's ACT scores used as the denominator.
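The conversion from raw ACT points to Cohen's d described above can be sketched in a few lines of Python. This is an illustration under assumptions: the function names and the toy cohort are hypothetical, and the sample standard deviation is used because the text does not say which SD estimator was applied. The last two lines back out the SDs implied by the reported end points (1.861 points with d = 0.43, and 3.832 points with d = 0.77). Because the reported d values are rounded, these implied SDs are only approximate.

```python
from statistics import stdev

def cohens_d_cohort_sd(mean_difference, cohort_scores):
    """Standardize a raw-score difference by the SD of the whole cohort."""
    return mean_difference / stdev(cohort_scores)

def implied_sd(mean_difference, d):
    """Back out the SD implied by a reported raw difference and d value."""
    return mean_difference / d

# Hypothetical toy cohort whose sample SD is exactly 4 points.
toy_cohort = [16, 20, 24]
d_toy = cohens_d_cohort_sd(2.0, toy_cohort)

# Approximate SDs implied by the values reported in the text.
sd_low = implied_sd(1.861, 0.43)
sd_high = implied_sd(3.832, 0.77)
print(d_toy, round(sd_low, 2), round(sd_high, 2))
```

Dividing by the SD of the entire cohort, rather than a pooled SD of the two adjacent groups, puts every comparison for the same ACT subtest on the same scale.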

TABLE 6. Total Marginal Mean Outcome Values for the Marginal Mean Weighting Through Stratification Analyses

AP calculus: 2010 cohort

Table 6 also displays results from the MMW-S analysis showing the impact of varying levels of participation in AP calculus on ACT scores. Just as in the results for the AP English analysis for the 2010 cohort, the largest differences between adjacent groups were for exam nonpassers and exam passers. The smallest difference between these two groups was for ACT reading scores (0.440 points, or d = .009) and the largest difference was in ACT math scores (1.844 points, or d = 0.42). Compared to the impact of AP English, these values would indicate that AP calculus has a slightly smaller impact on ACT scores than participation in AP English does.

AP English: 2011 cohort

Results for the AP English MMW-S analysis in Table 6 show the largest difference between adjacent groups was also between exam nonpassers and exam passers. Compared to the 2010 cohort's AP English analysis, the marginal mean outcome values for ACT scores are larger for the 2011 cohort, which is consistent with the much smaller number of covariates that were controlled for in the propensity score calculation (see Table 1). Likewise, ACT point differences are larger in the 2011 cohort's results than in the 2010 cohort's results, with the younger cohort's ACT score difference between exam passers and exam nonpassers ranging from 3.419 points (for ACT math) to 5.295 points (for ACT reading). These values correspond to Cohen's d values between 0.69 and 0.91.

AP calculus: 2011 cohort

Similar to the 2011 cohort's AP English analysis, the mean difference values for the AP calculus analysis for the 2011 cohort indicate stronger impacts of the AP program than were observed for the corresponding analysis in the 2010 cohort, most likely due to the smaller number of covariates controlled for in the statistical analysis. Similar to the previous three analyses, the largest difference between adjacent groups of students was for exam nonpassers and exam passers. The ACT score differences between these two groups ranged from 2.268 to 3.206 points, with Cohen's d values between 0.42 and 0.74.

Discussion

On the basis of the results displayed in Table 6, we can make a few broad conclusions about the impact of participating in the AP program on students' college admissions test scores. The impact is not slight: for AP English the marginal mean outcome value was about 2.8–4.1 points for ACT composite scores; for AP calculus, the impact was about 1–2.7 points for ACT composite scores. We believe that our study can be added to the rich body of literature that indicates that the AP program is beneficial for students (e.g., Chajewski et al., 2011; Duffy, 2010; Long, Conger, & Iatarola, 2012; Mo et al., 2011; Sadler & Sonnert, 2010; Sadler & Tai, 2007a, 2007b; Tai et al., 2010).

We find it interesting that for both cohorts, the effect sizes for the impact of AP English on ACT composite scores are remarkably similar (d = .73 for the 2010 cohort and d = .86 for the 2011 cohort). We believe that these independent results from two different cohorts (and using two different methods of handling missing data) strengthen our conclusion of the positive impact of participation in the AP program. However, these effect sizes are somewhat larger than those found by Sadler and Sonnert (2010) when comparing the differences in grades between college students who had passed the AP exam in biology, chemistry, and physics and their classmates who had never enrolled in a course in that subject. This may be a result of our imputation of the ACT scores for the 2010 cohort or the more academically diverse sample that we had compared to Sadler and Sonnert's sample, which consisted only of college students in introductory science courses.

We also find it very interesting that most of the marginal mean outcome values for the first comparison within each cohort are quite small—almost all are less than one raw score point with even a few negative values. This indicates that merely enrolling in an AP course is not very beneficial for students. Rather, high school students seem to reap the benefits of the AP program by taking the AP test—and especially by passing it by obtaining a score of 3, 4, or 5. Again, these results are consistent with the work of other researchers who have found that the greatest benefit occurs when students pass an AP test (Dougherty & Mellor, 2010; Geiser & Santelices, 2004; Sadler & Sonnert, 2010).

The fact that merely enrolling in an AP course seems to usually only boost students' scores by less than one point is telling from a practical standpoint. This is similar to the increase in ACT scores for students who experience traditional test coaching methods in preparation for college admissions exams (Briggs, 2001). Given our results, we question the wisdom of the College Board's recent push for AP for all students. Such a policy would put many unprepared students in an academically rigorous program that would not provide them with any major benefits (Dougherty & Mellor, 2010; Farkas & Duffett, 2009; Lichten, 2010). Similarly, many high schools are probably unprepared to give some students the academic experience that would lead to sizeable gains in their academic achievement (Geiser & Santelices, 2004; Klopfenstein & Thomas, 2009, 2010). Therefore, it is likely that many students in American high schools would have little chance of passing an AP exam. For these students, merely enrolling in the course may not be the best use of their and their school's time, money, and resources (Klopfenstein & Thomas, 2009).

It is important to realize, however, that despite our complex statistical methods, we were still unable to say that participation in the AP program caused students to have higher ACT scores. Such conclusions can only be drawn from true experiments in which students are randomly assigned to AP courses, something that we were unable to do in our secondary analysis of data from USOE. Nevertheless, there is no other study of the AP program that has controlled for as many confounding variables as we did in this article, and we believe that this makes our study among the strongest ever conducted on the AP program. Moreover, the fact that the results were obtained from the 2010 cohort and then replicated in the 2011 cohort strengthens our conclusions about the relationship between participation in the AP program and students' college admissions test scores.

Limitations

One major statistical assumption that underlies propensity score analysis is that the decision of subjects to enter one outcome group or another is dependent solely on the covariates used to create the propensity scores (Guo & Fraser, 2010; Long et al., 2012). Although we control for many different covariates in this study, we do not think that this assumption is tenable in this situation because AP students have been shown to differ in ways that were not controlled for in our study, such as levels of parental education (Sadler, 2010c) and motivation (Paek et al., 2010). Moreover, in this study we did not control for differences that exist between non-AP and AP classes, the latter of which tend to have higher quality teachers with advanced degrees and smaller class sizes (Klopfenstein & Thomas, 2010; Paek et al., 2010). Therefore, any differences between the various groups in the study would statistically be attributed entirely to the impact of the AP program. This leads us to conclude that the results in this article should be considered upper limits of the effectiveness of the AP program.

In many studies in which propensity score analyses have been conducted, the authors often conduct a sensitivity analysis to investigate the potential maximum impact of an unobserved covariate (Guo & Fraser, 2010). However, because of the nature of the data (i.e., large imputed datasets, multiple treatment categories) and the cutting-edge nature of the MMW-S method, we have been unable to find a sensitivity analysis method in the published literature that could be used in our study. Given the magnitude of the differences between exam passers and exam nonpassers (1.861–5.295 points), we believe it is highly unlikely that any unobserved covariate(s) could completely abrogate the impact of the AP test on ACT scores in our sample.

Also, many AP students take more than one AP course, whereas in this study we analyzed the impact of each course and test in isolation. Yet, if the impact of the AP program is cumulative (as is possible, given findings from Geiser & Santelices, 2004; Mo et al., 2011), then it is possible that the results we show in this study are overestimates of the impact of any individual AP exam. This, we believe, is another reason why readers should regard our results as the upper bound for the impact of AP courses and tests on student achievement.

Finally, the variables we examined in this study consisted solely of student variables. Therefore, important influences of school and teacher characteristics are completely neglected. An investigation of the AP program which uses hierarchical linear modeling (Warne et al., 2012) or other methods used to analyze nested data could help researchers understand whether AP courses have a unique impact on student achievement or whether offering AP courses is merely a symptom or consequence of high school quality.

Remaining Questions

Although this study makes a noteworthy contribution to the burgeoning literature on the AP program, further research needs to be conducted on other aspects of the AP program that this study does not address. For example, the consequences of policies such as weighting GPAs for AP participation, legal mandates that high schools provide AP courses, and financial incentives (such as fee subsidies) for exam participation need to be addressed further in the empirical literature. Researchers should also disentangle the many variables that seem to contribute to more favorable outcomes among AP students: student variables (e.g., motivation, prior academic preparation, SES variables), class characteristics (e.g., teacher experience, smaller class size, more financial support for AP courses), and the AP program itself (e.g., the standardized final exam, incentives for college credit, etc.). To date, no study has been conducted that explains which of these groups of variables is most influential in producing the favorable academic outcomes among AP students.

Conclusion

Despite its limitations, there are noteworthy positive aspects to this study. One is the sheer sample size drawn from every public school cohort member in a state, which makes this study among the most academically diverse on the impact of AP programs on academic achievement. Other studies (e.g., Geiser & Santelices, 2004; Tai et al., 2010) have used more academically elite portions of the student population to judge the impact of the AP program. Our study also has the added benefit of having a large proportion of students who never participated in the AP program to function as a control group. This study also joins the small number of studies (e.g., Sadler & Tai, 2007a, 2010) that distinguish among different levels of academic rigor in students' high school education, ranging from basic courses to taking an AP course and passing the exam.

Overall, we believe that we have found strong empirical evidence that participation in AP English and AP calculus courses is not beneficial to students who merely enroll in the courses, has some benefits to students who take the AP exam but do not pass it, and is most beneficial to those students who take and pass the exam. Our study leads us to concur with Dougherty and Mellor's (2010) opinion that “it matters greatly whether students take and pass AP exams. There is little evidence that simply increasing the number of students taking AP courses will have an impact … if students do not demonstrate mastery on the exams” (p. 220). We also believe that these results show the importance and benefits of the AP program for high school students hoping to go to college. Our MMW-S analysis provides the best quasi-experimental evidence available that students' ACT scores increase as a result of participation in the AP program. Thus, this study provides evidence that for college-bound high school students, proficiency in AP courses may be a helpful component to their high school education.

AUTHORS NOTE

Russell T. Warne is an Assistant Professor of Psychology in the Department of Behavioral Science at Utah Valley University. He is interested in advanced academics, gifted education, and quantitative methods. He has previously published in Educational Researcher, Gifted Child Quarterly, and Behavior Research Methods.

Ross Larsen has worked as a post-doctoral researcher and research scientist at the University of Virginia and as an Assistant Professor at Virginia Commonwealth University. Dr. Larsen is now an Assistant Professor in the Department of Instructional Psychology & Technology at Brigham Young University.

Braydon Anderson was affiliated with Utah Valley University and is currently working at Qualtrics as a Product Growth Manager.

Alyce J. Odasso was affiliated with Utah Valley University and is currently pursuing her doctoral degree in Research, Measurement, and Statistics at Texas A&M University.

REFERENCES

  • ACT. (2007). The ACT technical manual. Retrieved from http://www.act.org/aap/pdf/ACT_Technical_Manual.pdf
  • Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., & Wu, A. Y. (1998). An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM, 45, 891–923. doi:10.1145/293347.293348
  • Bleske-Rechek, A., Lubinski, D., & Benbow, C. P. (2004). Meeting the educational needs of special populations: Advanced Placement's role in developing exceptional human capital. Psychological Science, 15, 217–224. doi:10.1111/j.0956-7976.2004.00655.x
  • Bodner, T. E. (2008). What improves with increased missing data imputations? Structural Equation Modeling, 15, 651–675. doi:10.1080/10705510802339072
  • Briggs, D. C. (2001). The effect of admissions test preparation: Evidence from NELS:88. Chance, 14, 10–18. doi:10.1080/09332480.2001.10542245
  • Camara, W., & Michaelides, M. (2005). AP use in admissions: A response to Geiser and Santelices. Retrieved from http://research.collegeboard.org/sites/default/files/publications/2012/7/misc2005-1-ap-use-admissions-geiser-santelices.pdf
  • Carpenter, D. M. II, & Ramirez, A. (2007). More than one gap: Dropout rate gaps between and among Black, Hispanic, and White students. Journal of Advanced Academics, 19, 32–64. doi:10.4219/jaa-2007-705
  • Chajewski, M., Mattern, K. D., & Shaw, E. J. (2011). Examining the role of Advanced Placement exam participation in 4-year college enrollment. Educational Measurement: Issues & Practice, 30(4), 16–27. doi:10.1111/j.1745-3992.2011.00219.x
  • College Board. (2010). Annual AP program participation 1956–2010. Retrieved from http://professionals.collegeboard.com/profdownload/AP-Annual-Participation-2010.pdf
  • College Board. (2012). Summary reports: 2012. Retrieved from http://www.collegeboard.com/student/testing/ap/exgrd_sum/2012.html
  • D'Agostino, R. B. (1998). Tutorial in biostatistics: Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine, 17, 2265–2281. Retrieved from http://web.pdx.edu/~nwallace/EPA/Dagostino1998.pdf
  • Dougherty, C., & Mellor, L. T. (2010). Preparing students for Advanced Placement: It's a pre-K issue. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 219–232). Cambridge, MA: Routledge.
  • Dougherty, C., Mellor, L., & Jian, S. (2006). The relationship between Advanced Placement and college graduation (2005 AP Study Series, Report 1). Retrieved from http://www.nc4ea.org/files/relationship_between_ap_and_college_graduation_02-09-06.pdf
  • Dounay, J. (2007). Advanced Placement: Subsidies for testing fees. Retrieved from http://mb2.ecs.org/reports/Report.aspx?id=1003
  • Duffy, W. R. II. (2010). Persistence and performance at a four-year university. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 139–163). Cambridge, MA: Routledge.
  • Fan, X., & Nowell, D. L. (2011). Using propensity score matching in educational research. Gifted Child Quarterly, 55, 74–79. doi:10.1177/0016986210390635
  • Farkas, S., & Duffett, A. (2009). Growing pains in the Advanced Placement program: Do tough trade-offs lie ahead? Washington, DC: Routledge.
  • Gasper, J., DeLuca, S., & Estacion, A. (2012). Switching schools: Revisiting the relationship between school mobility and high school dropout. American Educational Research Journal, 49, 487–519. doi:10.3102/0002831211415250
  • Geiser, S., & Santelices, V. (2004). The role of Advanced Placement and honors courses in college admissions (Routledge Report No. 4.04). Retrieved from http://escholarship.org/uc/item/3ft1g8rz
  • Graham, J., Olchowski, A., & Gilreath, T. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213. doi:10.1007/s11121-007-0070-9
  • Guo, S., & Fraser, M. W. (2010). Propensity score analysis: Statistical methods and applications. Thousand Oaks, CA: Routledge.
  • Hallett, R. E., & Venegas, K. M. (2011). Is increased access enough? Advanced Placement courses, quality, and success in low-income urban schools. Journal for the Education of the Gifted, 34, 468–487. doi:10.1177/016235321103400305
  • Hong, G. (2010). Marginal mean weighting through stratification: Adjustment for selection bias in multilevel data. Journal of Educational and Behavioral Statistics, 35, 499–531. doi:10.3102/1076998609359785
  • Hong, G. (2012). Marginal mean weighting through stratification: A generalized method for evaluating multivalued and multiple treatments with nonexperimental data. Psychological Methods, 17, 44–60. doi:10.1037/a0024918
  • Hong, G., & Hong, Y. (2009). Reading instruction time and homogeneous grouping in kindergarten: An application of marginal mean weighting through stratification. Educational Evaluation and Policy Analysis, 31, 54–81. doi:10.3102/0162373708328259
  • Klopfenstein, K. (2010). Does the Advanced Placement program save taxpayers money? The effect of AP participation on time to college graduation. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 189–218). Cambridge, MA: Routledge.
  • Klopfenstein, K., & Thomas, M. K. (2010). Advanced Placement participation: Evaluating the policies of states and colleges. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 167–188). Cambridge, MA: Routledge.
  • Lichten, W. (2000). Whither Advanced Placement? Education Policy Analysis Archives, 8(29).
  • Lichten, W. (2010). Whither advanced placement—now? In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 233–243). Cambridge, MA: Routledge.
  • Long, M. C., Conger, D., & Iatarola, P. (2012). Effects of high school course-taking on secondary and postsecondary success. American Educational Research Journal, 49, 285–322. doi:10.3102/0002831211431952
  • Mattern, K. D., Shaw, E. J., & Ewing, M. (2011). Advanced Placement exam participation: Is AP exam participation and performance related to choice of college major? (College Board Research Report No. 2011–6). New York, NY: Routledge.
  • Mattern, K. D., Shaw, E. J., & Xiong, X. (2009). The Relationship between AP exam performance and college outcomes (College Board Research Report No. 2009–4). Retrieved from http://professionals.collegeboard.com/profdownload/pdf/RR2009-4.pdf
  • Mo, L., Yang, F., Hu, X., Calaway, F., & Nickey, J. (2011). ACT test performance by Advanced Placement students in Memphis City schools. The Journal of Educational Research, 104, 354–359. doi:10.1080/00220671.2010.486810
  • Morgan, R., & Klaric, J. (2007). AP students in college: An analysis of five-year academic careers (College Board Research Report No. 2007–4). New York, NY: Routledge.
  • Murphy, D., & Dodd, B. (2009). A comparison of college performance of matched AP and non-AP student groups (College Board Research Report No. 2009–6). New York, NY: Routledge.
  • National Center for Educational Statistics. (2011a). Digest of education statistics 2011 (NCES Report 2012–001). Retrieved from www.nces.ed.gov/pubs2012/2012001.pdf
  • National Center for Educational Statistics. (2011b). NAEP data explorer (Online data aggregator). Retrieved from http://nces.ed.gov/nationsreportcard/naepdata/
  • Nichols, J. D. (2003). Prediction indicators for students failing the State of Indiana high school graduation exam. Preventing School Failure, 47, 112120. doi:10.1080/10459880309604439 [Taylor & Francis Online][Google Scholar]
  • No Child Left Behind Act of 2001, Pub. L. No. 107–110, § 115, Stat. 1425 (2002). [Google Scholar]
  • Paek, P. L., Braun, H., Ponte, E., Trapani, C., & Powers, D. E. (2010). AP biology teacher characteristics and practices and their relationship to student AP exam performance. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 6384). Cambridge, MA: Routledge. [Google Scholar]
  • Patterson, B. F., Packman, S., & Kobrin, J. L. (2011). Advanced Placement exam-taking and performance: Relationships with first-year subject are college grades. New York, NY: Routledge. [Google Scholar]
  • Raynor, W. J., Jr. (1983). Caliper pair-matching on a continuous variable in case-control studies. Communications in Statistics-Theory and Methods, 12, 14991509. doi:10.1080/03610928308828546 [Taylor & Francis Online], [Web of Science ®][Google Scholar]
  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 4155. doi:10.2307/2335942 [Crossref], [Web of Science ®][Google Scholar]
  • Rubin, D. B. (1980). Bias reduction using Mahalanobis-metric matching. Biometrics, 293298. doi:10.2307/2529981 [Crossref], [Web of Science ®][Google Scholar]
  • Sadler, P. M. (2010a). Advanced high school coursework and college admission decisions. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 245261). Cambridge, MA: Routledge. [Google Scholar]
  • Sadler, P. M. (2010b). Advanced Placement in a changing educational landscape. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 316). Cambridge, MA: Routledge. [Google Scholar]
  • Sadler, P. M. (2010c). How are AP courses different? In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 5161). Cambridge, MA: Routledge. [Google Scholar]
  • Sadler, P. M. (2010d). Key findings. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 263270). Cambridge, MA: Routledge. [Google Scholar]
  • Sadler, P. M., & Sonnert, G. (2010). High school Advanced Placement and success in college coursework in the sciences. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 119137). Cambridge, MA: Routledge. [Google Scholar]
  • Sadler, P. M., Sonnert, G., Tai, R. H., & Klopfenstein, K. (2010). AP: A critical examination of the Advanced Placement program. Cambridge, MA: Routledge. [Google Scholar]
  • Sadler, P. M., & Tai, R. H. (2007a). Advanced Placement exam scores as a predictor of performance in introductory college biology, chemistry and physics courses. Science Educator, 16(2), 1–19.
  • Sadler, P. M., & Tai, R. H. (2007b). Weighting for recognition: Accounting for Advanced Placement and honors courses when calculating high school grade point average. NASSP Bulletin, 91, 5–32. doi:10.1177/0192636506298726
  • Scott, T. P., Tolson, H., & Lee, Y. (2010). Assessment of Advanced Placement participation and university academic success in the first semester: Controlling for selected high school academic abilities. Journal of College Admission, 208, 26–30.
  • Tai, R. H. (2008). Posing tougher questions about the Advanced Placement program. Liberal Education, 94(3), 38–43.
  • Tai, R. H., Liu, C. Q., Almarode, J. T., & Fan, X. (2010). Advanced Placement course enrollment and long-range educational outcomes. In P. M. Sadler, G. Sonnert, R. H. Tai, & K. Klopfenstein (Eds.), AP: A critical examination of the Advanced Placement program (pp. 109–118). Cambridge, MA: Routledge.
  • U.S. Census Bureau. (2011). Age and sex composition: 2010 (Census Report C2010cR-03). Retrieved from http://www.census.gov/prod/cen2010/briefs/c2010cr-03.pdf
  • Von Hippel, P. T. (2005). How many imputations are needed? A comment on Hershberger and Fisher (2003). Structural Equation Modeling, 12, 334–335. doi:10.1207/s15328007sem1202_8
  • Warne, R. T., Li, Y., McKyer, E. L. J., Condie, R., Diep, C. S., & Murano, P. S. (2012). Managing clustered data using hierarchical linear modeling. Journal of Nutrition Education and Behavior, 44, 271–277. doi:10.1016/j.jneb.2011.06.013
  • Zwick, R. (2006). Higher education admissions testing. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 647–679). Westport, CT: Routledge.