equateIRT Package in R

ABSTRACT Equating test scores between different achievement test versions is important to assure comparability between test takers’ scores. As many items are modelled with item response theory (IRT), it makes sense to also equate the test scores with IRT equating methods. The equateIRT package in R provides a set of functions that implement IRT equating methods, including newer extensions. This paper summarizes some of the advances in equating with IRT, reviews the equateIRT package, and demonstrates, through two illustrative examples, some of the key features of the package.


Introduction
When different test versions are used to measure the same ability, it is important to have methods to ensure that the test takers can be compared regardless of which test version they have taken. Equating refers to a family of statistical models and methods that are used to make test scores comparable among different test versions so that the scores can be used interchangeably (González & Wiberg, 2017). Many different equating methods are available, depending on the data collection design and the assumptions made. If items are modelled with item response theory (IRT; Lord, 1980), a common tool for test constructors when creating and analyzing tests, it makes sense to use IRT equating methods when equating test scores.
To perform IRT equating, one can use the equateIRT package in R (Battauz, 2015), which provides a set of commands that implement traditional IRT equating methods as well as newer extensions. The aim of this article is to summarize some of the advances in equating with IRT, review the equateIRT package, and demonstrate, through two illustrative examples, the capability of the package. In the next section, some theory of equating with IRT will be summarized, then a brief description of the equateIRT features is given. Next, two illustrative examples are given and the paper ends with a conclusion that provides some limitations and suggestions for the future.

Equating with IRT
Assume we have a test form g. The probability that a test taker with ability $\theta_g$ answers item i correctly can then be modelled with the three-parameter logistic (3PL) IRT model
$$\pi_{gi}(\theta_g; a_{gi}, b_{gi}, c_{gi}) = c_{gi} + (1 - c_{gi}) \frac{\exp(D a_{gi}(\theta_g - b_{gi}))}{1 + \exp(D a_{gi}(\theta_g - b_{gi}))},$$
where $a_{gi}$ is the item discrimination, $b_{gi}$ is the item difficulty, and $c_{gi}$ is the pseudo item guessing in test form g. D is a constant which is commonly set to 1.7. If we set $c_{gi} = 0$, we get the 2PL IRT model, and if in addition $a_{gi} = 1$, we get the 1PL IRT model (Lord, 1980). In order to equate two different test forms, we need to set the parameter estimates on the same scale, which is done with the help of equating coefficients.
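The 3PL model above can be evaluated in a few lines of base R. The following is a minimal sketch, not part of equateIRT; the function name irf_3pl and the item parameter values are hypothetical and chosen only for illustration.

```r
# 3PL item response function; D = 1.7 is the scaling constant from the text.
# plogis(x) computes exp(x) / (1 + exp(x)).
irf_3pl <- function(theta, a, b, c = 0, D = 1.7) {
  c + (1 - c) * plogis(D * a * (theta - b))
}

# Hypothetical item with a = 1.2, b = 0.5, c = 0.2, evaluated at four abilities
theta <- c(-2, 0, 0.5, 2)
p3 <- irf_3pl(theta, a = 1.2, b = 0.5, c = 0.2)  # 3PL
p2 <- irf_3pl(theta, a = 1.2, b = 0.5)           # c = 0 gives the 2PL model
p1 <- irf_3pl(theta, a = 1, b = 0.5)             # a = 1 and c = 0 give the 1PL model
```

At $\theta_g = b_{gi}$ the logistic term equals 0.5, so the 3PL probability is $c_{gi} + (1 - c_{gi})/2$, which is a quick check on any implementation.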

Equating coefficients
Let $n_g$ be the number of items in test form g and $n_{g-1,g}$ be the number of items in common with another test form g − 1. By using the equating coefficients $A_{g-1,g}$ and $B_{g-1,g}$, the parameters estimated from test form g − 1 can be transformed to the scale of test form g as follows:
$$a_{gi} = \frac{a_{g-1,i}}{A_{g-1,g}}, \quad b_{gi} = A_{g-1,g}\, b_{g-1,i} + B_{g-1,g}, \quad \text{and} \quad \theta_g = A_{g-1,g}\, \theta_{g-1} + B_{g-1,g}.$$
The equating coefficients can be estimated either from methods based on response functions or from moments of item parameters (Kolen & Brennan, 2014, Chaps. 6.3.2-6.3.3). Methods implemented in equateIRT include the response function methods of Haebara and Stocking-Lord and the moment-based methods: mean-mean, mean-geometric mean, and mean-sigma. $B_{g-1,g}$ is estimated in the same way for all moment-based methods and is given by
$$B_{g-1,g} = \bar{b}_g - A_{g-1,g}\, \bar{b}_{g-1}, \tag{1}$$
where $\bar{b}_g$ and $\bar{b}_{g-1}$ are the mean difficulty estimates of the common items in test forms g and g − 1, and $A_{g-1,g}$ is method dependent. For the mean-sigma method, which will be used in the later illustrative examples, it is defined by the ratio of the standard deviations of the common-item difficulty estimates,
$$A_{g-1,g} = \sqrt{\frac{\sum_{i} (b_{gi} - \bar{b}_g)^2}{\sum_{i} (b_{g-1,i} - \bar{b}_{g-1})^2}}, \tag{2}$$
where the sums run over the common items. The asymptotic covariance matrix for the vector of estimates of the equating coefficients can be derived with the delta method and is summarized in Battauz (2015).
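To make the mean-sigma method concrete, the sketch below computes the two equating coefficients from common-item difficulties in base R and applies the resulting transformation. It is an illustration under assumptions, not the equateIRT implementation: the function name mean_sigma and all parameter values are hypothetical, and no standard errors are computed.

```r
# Mean-sigma equating coefficients from the difficulties of the common items.
# b_from: difficulties estimated on form g - 1; b_to: difficulties on form g.
mean_sigma <- function(b_from, b_to) {
  A <- sd(b_to) / sd(b_from)           # ratio of standard deviations
  B <- mean(b_to) - A * mean(b_from)   # shift that aligns the means
  c(A = A, B = B)
}

# Hypothetical common-item parameters on form g - 1
a_prev <- c(0.8, 1.1, 1.4, 1.0)
b_prev <- c(-1.0, -0.2, 0.4, 1.1)
# Hypothetical difficulties of the same items on form g (exact linear relation)
b_cur <- 0.9 * b_prev + 0.3

ec <- mean_sigma(b_prev, b_cur)

# Transform the form g - 1 parameters to the scale of form g
a_on_g <- a_prev / ec[["A"]]
b_on_g <- ec[["A"]] * b_prev + ec[["B"]]
```

Because the hypothetical difficulties are related by an exact linear transformation, the sketch recovers A = 0.9 and B = 0.3; with real estimates the relation holds only approximately.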
The equating coefficients described so far are used when a direct equating between two test forms is possible. If we have more than two test forms, and pairs of test forms share common items, it is possible to equate the test forms along a path, which gives an equating chain; we refer to this as indirect equating. Let the path from test form 0 to test form k be $p = \{0, 1, \ldots, k\}$. The chain equating coefficients can then be defined as
$$A_p = A_{0,1,\ldots,k} = \prod_{g=1}^{k} A_{g-1,g}$$
and
$$B_p = B_{0,1,\ldots,k} = \sum_{g=1}^{k} B_{g-1,g}\, A_{g,\ldots,k},$$
where $A_{g,\ldots,k} = \prod_{h=g+1}^{k} A_{h-1,h}$ is the equating coefficient which links test form g to test form k. The asymptotic covariance matrix of the equating coefficient estimates can be obtained similarly as in the case of direct equating. For more details, refer to Battauz (2015).
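The chain coefficients are the result of composing the direct linear transformations along the path. The base R sketch below illustrates this with hypothetical direct coefficients; the function name chain_coef is an assumption for illustration and is not a function in equateIRT.

```r
# Chain equating coefficients from direct coefficients along a path.
# A[g] and B[g] hold A_{g-1,g} and B_{g-1,g} for g = 1, ..., k.
chain_coef <- function(A, B) {
  # A_tail[g] is the product of A[h] for h = g + 1, ..., k (equal to 1 for g = k)
  A_tail <- rev(cumprod(rev(c(A[-1], 1))))
  c(A = prod(A), B = sum(B * A_tail))
}

# Hypothetical direct coefficients for the path 0 -> 1 -> 2 -> 3
A <- c(1.1, 0.9, 1.2)
B <- c(0.2, -0.1, 0.05)
ec <- chain_coef(A, B)

# Check: transforming an ability step by step gives the same result
theta0 <- 0.7
theta_step <- theta0
for (g in seq_along(A)) theta_step <- A[g] * theta_step + B[g]
theta_chain <- ec[["A"]] * theta0 + ec[["B"]]
```

Here theta_step and theta_chain agree, which shows that the closed-form chain coefficients are equivalent to applying the direct transformations one after another.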
If two test forms are linked through different paths, it is possible to average the equating relationships to obtain a single transformation, which is expected to be more accurate (Kolen & Brennan, 2014, p. 280). In the equateIRT package, the bisector method for equating (Battauz, 2013) yields a weighted average of the linear transformations obtained along the different paths, for which optional path weights can be specified; the resulting averaged equating coefficients are defined in Battauz (2013). Again, the asymptotic covariance matrix of the estimates is easy to obtain, and this feature is implemented in the equateIRT package.
In large-scale assessments, it is common to use several test versions which are linked together through a series of anchor items, i.e. an indirect equating. In order to link several test versions, one typically uses a linkage plan and the feature to see which test forms have common items is implemented in the equateIRT package.
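A linkage plan can be summarized as a matrix that counts the items shared by each pair of test forms. The base R sketch below shows the idea with hypothetical item labels; it is not the package's own implementation, only an illustration of what such an overview contains.

```r
# Hypothetical item labels for three test forms
items <- list(
  form1 = c("I1", "I2", "I3", "I4"),
  form2 = c("I3", "I4", "I5", "I6"),
  form3 = c("I5", "I6", "I7", "I8")
)

# Number of common items for every pair of forms (diagonal = form length)
link_plan <- sapply(items, function(x) {
  sapply(items, function(y) length(intersect(x, y)))
})
link_plan
```

In this hypothetical plan, forms 1 and 3 share no items, so they can only be equated indirectly through form 2.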

Equating
Two different equating methods are implemented in the equateIRT package: IRT true-score equating (TSE) and IRT observed-score equating (OSE). IRT TSE (Lord, 1980) uses the mean of the conditional score distributions, that is, the true score a test taker is assumed to have. An observed score can be defined as the true score plus an error term. The idea is to equate a true score $\tau_X$ associated with a given ability $\theta$ on test form X with the true score $\tau_Y$ on another test form Y. The equating transformation can thus be defined as
$$\tau_Y = T_Y(T_X^{-1}(\tau_X)),$$
with
$$\tau_X = T_X(\theta) = \sum_{j} \pi_{Xj}(\theta; \omega_{Xj}) \quad \text{and} \quad \tau_Y = T_Y(\theta) = \sum_{j} \pi_{Yj}(\theta; \omega_{Yj}),$$
where $\omega_j$ is a vector of item characteristics (e.g. item difficulty or item discrimination) (González & Wiberg, 2017). IRT OSE (Lord, 1980) uses the marginal score distributions, and IRT models are used to define the involved conditional score probabilities. First, one assumes distributions for the test takers' abilities, and the conditional score distributions are integrated or summed across ability levels to obtain marginal observed-score distributions for test forms X and Y. Then, equipercentile equating is applied to these distributions as follows.
$$\varphi(x) = F_Y^{-1}(F_X(x)),$$
where $F_X$ and $F_Y$ are the cumulative distribution functions of the scores on test forms X and Y, which can be obtained by using either the Lord and Wingersky (1984) algorithm or other approaches discussed in González, Wiberg, and von Davier (2016).
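The two ideas in this section can be sketched in base R under simplifying assumptions: a 2PL model, item parameters that are already on a common scale, and a standard normal ability distribution approximated on a grid. All function names (p2pl, tcc, tse, lord_wingersky) and parameter values are hypothetical; this is not the code used by equateIRT.

```r
# 2PL response probabilities for all items of a form at a single ability value
p2pl <- function(theta, a, b, D = 1.7) plogis(D * a * (theta - b))

# True score at ability theta: sum of the item response probabilities
tcc <- function(theta, a, b) sum(p2pl(theta, a, b))

# IRT TSE: find the ability giving tau_x on form X, then evaluate form Y there
tse <- function(tau_x, aX, bX, aY, bY) {
  theta <- uniroot(function(t) tcc(t, aX, bX) - tau_x,
                   interval = c(-10, 10), tol = 1e-10)$root
  tcc(theta, aY, bY)
}

# Lord-Wingersky recursion: distribution of the sum score given the
# item success probabilities p at one fixed ability value
lord_wingersky <- function(p) {
  f <- 1  # with zero items, score 0 has probability 1
  for (pj in p) f <- c(f * (1 - pj), 0) + c(0, f * pj)
  f       # probabilities of scores 0, 1, ..., length(p)
}

# Hypothetical item parameters, assumed to be on the same scale
aX <- c(1.0, 1.2, 0.8); bX <- c(-0.5, 0.0, 0.5)
aY <- c(0.9, 1.1, 1.0); bY <- c(-0.3, 0.2, 0.6)

tau_y <- tse(1.5, aX, bX, aY, bY)  # true score on Y equated to tau_x = 1.5

# Marginal score distribution on form X over a normal ability grid
theta_q <- seq(-4, 4, length.out = 41)
w_q <- dnorm(theta_q) / sum(dnorm(theta_q))
fX <- Reduce(`+`, Map(function(t, w) w * lord_wingersky(p2pl(t, aX, bX)),
                      theta_q, w_q))
FX <- cumsum(fX)                   # cumulative distribution of scores on X
```

The same steps applied to form Y give $F_Y$, after which equipercentile equating can be carried out; a continuization step for the discrete score distributions is needed in practice and is omitted here.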

The R package equateIRT
The R package equateIRT implements functions to estimate the equating coefficients and their corresponding standard errors (SEs) using the previously mentioned methods. The package allows you to get an overview of which test forms have common items and thus gives you the linkage plan. If you have two test forms with common items, it can perform direct equating, and if you have pairs of test forms with common items across multiple test forms, it allows you to use indirect or chain equating. The equateIRT package has both IRT OSE and IRT TSE implemented for dichotomous items. If you have several possible equating paths, you can get a single transformation by averaging the equating coefficients with the bisector method. The package supports the Rasch, 1PL, 2PL, and 3PL IRT models, but it does not estimate the item parameters or their covariance matrices. Instead, equateIRT allows you to import estimates of the item parameters and the covariance matrices from flexMIRT (Cai, 2013), IRTPRO (Cai, Thissen, & du Toit, 2011), and the R packages ltm (Rizopoulos, 2006) and mirt (Chalmers, 2012). To import item parameter estimates and the covariance matrix, one can use the functions import.ltm(), import.mirt(), import.flexmirt(), and import.irtpro(). The imported data are then used in the equating to obtain analytical SEs for direct, chain, and average equating coefficients. In the next two subsections, two illustrative examples will demonstrate some of the key features of the equateIRT package.

Illustrative equating example
To illustrate how to use equateIRT to perform IRT OSE and IRT TSE, we will use two test forms from a binary scored college admissions test, where each form contains 40 common anchor items in the first columns and 80 unique items in the following columns. The data used are freely available and can be downloaded using the provided links. We start by fitting a 2PL IRT model and obtaining the item parameter estimates and their SEs with the R package mirt. The obtained item parameter estimates and SEs are stored in the objects mADMx.2PL and mADMy.2PL, which are then read into equateIRT with the import.mirt() function as follows.
In the direc(mods, which, method = "mean-mean", ...) function, mods is an object of class modIRT, which contains the item parameter coefficients and their covariance matrix for the forms to be equated. The argument which tells the program which test forms to equate. Finally, method states which equating method is used; the alternatives are "mean-mean", "mean-sigma", "mean-gmean", "Haebara", or "Stocking-Lord". If one provides the covariance matrix, the column StdErr in the output gives the SEs of the equating coefficients A and B. If one has not provided the covariance matrix of the item parameter estimates, the output shows NA. We illustrate these two functions using the mean-sigma method with equating coefficients defined in Equations (1) and (2).
> m2plXY <- modIRT(coef = estXY.2PL, var = estXYVar, names = tests, display = FALSE)
> t12 <- direc(mods = m2plXY, which = c(1, 2), method = "mean-sigma")
> summary(t12)
Link: Test1.
Finally, the score(obj, method = , se = , w = 0.5) function is used to obtain the equated scores and their SEs using either IRT OSE or IRT TSE. For each of the methods, we display the equated scores for the last five values (scores = 76:80) when using the weight w = 1 in the synthetic population. First, we give the code for IRT OSE, where the columns in the output from left to right show the scores on Test 2, the equated scores, and their SEs (StdErr).

Illustrative example linkage plans and indirect equating
To illustrate linkage plans and chained (indirect) equating, we use the five data sets in data2pl, which comes with the equateIRT package. First, we estimate a 2PL IRT model for the five data sets with the R package mirt. The estimated item parameters and covariances are then read into equateIRT with the import.mirt() function. Below we give the lines for the first data set; the other four are estimated with mirt and imported into equateIRT in the same way, with the names est2, est3, est4, and est5.
> data("data2pl", package = "equateIRT")
> library(mirt)
> m1 <- mirt(data2pl[[1]], 1, itemtype = "2PL", SE = TRUE)
> library(equateIRT)
> est1 <- import.mirt(m1, display = FALSE)
Next, we create a list of coefficients and covariance matrices, name the test forms "test1-test5", and create an object of class modIRT. As in the first numerical illustration, we use the item parameter estimates and the covariance matrices as input in modIRT().
> estC5 <- list(est1$coef, est2$coef, est3$coef, est4$coef, est5$coef)
> estV5 <- list(est1$var, est2$var, est3$var, est4$var, est5$var)
> test5 <- paste("test", 1:5, sep = "")
> m2pl <- modIRT(coef = estC5, var = estV5, names = test5)
The five data sets have different items in common, and to get an overview of the linkage plan, we can use the linkp(coef) function, which calculates the number of common items between a list of test forms. From the output, we can see that test form 1 has 10 common items with test form 2 and test form 5. In order to estimate the direct equating coefficients and SEs between test forms 1 and 5, we would use the direc() function with the mean-sigma method.
> dir15 <- direc(mods = m2pl, which = c(1, 5), method = "mean-sigma")
> summary(dir15)
Link: test1.
Another feature in equateIRT is the possibility to calculate all direct equating coefficients and SEs using IRT methods between all pairs of test forms with common items by using the function alldirec(mods = , method = , ...).
The function takes as input mods, which is an object of class modIRT containing the item parameter coefficients and their covariance matrix for the forms to be equated. As with the direc() function, you can decide which equating method to use, and here we illustrate the function with the mean-sigma method.
> direclist1 <- alldirec(mods = m2pl, method = "mean-sigma")
> direclist1
Direct equating coefficients
Method: mean-sigma
Links:
test1.test2
test1.test5
test2.test1
test2.test3
test3.test2
test3.test4
test4.test3
test4.test5
test5.test1
test5.test4
Another possibility is to use the function chainec(r = NULL, direclist, f1 = NULL, f2 = NULL, pths = NULL) to estimate all chain (indirect) equating coefficients and SEs using IRT methods. The function allows you to specify the length of the chain, that is, the number of forms used for equating (r). It takes as input direclist, which is an object returned from the alldirec() function and contains the direct equating coefficients between pairs of test forms, as seen above. You can also specify the starting test form (f1) and the ending test form (f2), or the specific equating path you prefer to use (pths). First, we illustrate the function by estimating all chain equating coefficients of length r = 3 from test form 2 to test form 5.
> cec25 <- chainec(r = 3, direclist = direclist1, f1 = "test2", f2 = "test5")
> summary(cec25)
If one has different equating paths, the package provides the option to calculate average equating coefficients and SEs with the bisector method, given a set of direct and chain equating coefficients, using the function bisectorec(ecall = , mods = NULL, weighted = TRUE, unweighted = TRUE). The function takes as input ecall, which is a list of objects of the classes returned from either the function direc() or chainec(). The options weighted and unweighted are logical: if weighted is TRUE, weighted bisector coefficients are estimated, and if unweighted is TRUE, unweighted bisector coefficients are computed.
> ecall <- c(cec25, chainec2345)
> av25 <- bisectorec(ecall = ecall, weighted = TRUE, unweighted = TRUE)
> summary(av25)
Link: test2.
The output gives the equating coefficients and their SEs for the different paths, for the average obtained with the bisector method, and for the average obtained with the weighted bisector method.

Conclusions
The equateIRT package contains a number of suitable features for conducting IRT equating, both between two test forms and under a more complicated linkage plan, as is common in large-scale assessments. The package is also imported by the R package kequate (Andersson, Bränberg, & Wiberg, 2013), where it is used when performing IRT observed-score kernel equating. The package is very useful and has only minor limitations. It does not allow the user to use mixed item types. A possible way around this is to use a general model and constrain some of the parameters. Another limitation is that it only allows you to equate two forms, although possibly through different paths. This two-form limitation is resolved by the R package equateMultiple (Battauz, 2017), which allows the calculation of equating coefficients between multiple test forms. A third limitation might be that only analytical SEs are implemented. It would be useful if bootstrap SEs were offered as an option within the package. These are minor limitations, which can be addressed through the use of other packages.
The overall conclusion is that equateIRT is easy to use, flexible, and is a very useful package if researchers or test constructors want to perform equating with IRT.

Funding
This work was supported by the Swedish Research Council grant 2014-578.