equateIRT: An R Package for IRT Test Equating

This introduction to the R package equateIRT is a (slightly) modified version of Battauz (2015a), published in the Journal of Statistical Software. The R package equateIRT implements item response theory (IRT) methods for equating different forms composed of dichotomous items. In particular, the IRT models included are the three-parameter logistic model, the two-parameter logistic model, the one-parameter logistic model and the Rasch model. Forms can be equated when they present common items (direct equating) or when they can be linked through a chain of forms that present common items in pairs (indirect or chain equating). When two forms can be equated through different paths, a single conversion can be obtained by averaging the equating coefficients. The package calculates direct and chain equating coefficients. The averaging of direct and chain coefficients that link the same two forms is performed through the bisector method. Furthermore, the package provides analytic standard errors of direct, chain and average equating coefficients.


Introduction
In many testing programs, security concerns require that test forms be composed of different items, so that test scores are not directly comparable across administrations. The equating process permits the comparison of scores obtained from different forms. The term equating traditionally refers to the adjustment of scores from parallel forms, which are as similar as possible in content and statistical characteristics (Kolen and Brennan 2014, Chapter 1.1.2 and Chapter 1.2.3). The methods presented in this paper, implemented in the R (R Core Team 2014) package equateIRT (Battauz 2015b), allow more general forms of equating, such as horizontal and vertical scaling. For example, vertical scaling is intended to make scores comparable across different educational grades, where the content of the tests differs according to the educational level. The package is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=equateIRT.
Various statistical methods have been proposed to perform equating (for a broad review, see Kolen and Brennan 2014). The R package equateIRT focuses on item response theory (IRT) methods for dichotomous items. IRT models are statistical models that have as response variable the responses given to the items of a questionnaire or test. These responses are modeled as a function of a latent trait, for example the ability level, and the item parameters that are related to some characteristics of the items, such as the difficulty. The purpose of IRT models is to provide a measure of the latent trait under investigation. For a broad review of IRT models see van der Linden and Hambleton (1997). The IRT models included in the equateIRT package are the three-parameter logistic model, the two-parameter logistic model, the one-parameter logistic model and the Rasch model. When the parameters of the IRT model are estimated separately for each group of people taking a different test form, they are not comparable because the origin of the measurement scale is not identifiable. However, when two forms present a subset of common items, the parameters can be put on the same scale. There are two methods available to pursue this end. Concurrent calibration consists of estimating the item parameters together and yields measurements directly on a common metric. Alternatively, item parameters are estimated separately for each test form (separate calibration) and the estimates of the item parameters for the common items can be used to estimate the scale transformation. Both approaches can be extended to the case of multiple test forms, provided that the forms can be linked through common items. Concurrent calibration presents the advantage of not requiring any conversion after the estimation of the parameters as they are already expressed on the same scale. 
However, this approach requires the combination of the data from each form in a single dataset, which can become quite large when there are many forms or when each form is administered to a very large number of people. In some cases, the size of the dataset makes the estimation very slow or even infeasible. Instead, by equating the coefficients of separate calibrations, it is not necessary to re-estimate the parameters of previous administrations, thus avoiding the construction of a single dataset.
The conversion of parameter estimates is attained by applying a linear transformation and the coefficients of this transformation are called equating coefficients. For each pair of forms containing common items, direct equating coefficients can be calculated. Suppose that Forms 1 and 2 have common items and that Forms 2 and 3 have common items. Forms 1 and 3 can then be equated employing the chain going through Form 2. In this case, indirect equating coefficients linking Forms 1 and 3 can be calculated as a function of direct equating coefficients linking pairs of forms with common items. In general, when two forms can be linked through a chain of forms that present common items in pairs, indirect equating coefficients can be calculated that permit the conversion of parameter estimates of one form into the scale of the other form using a linear transformation. These coefficients are a function of direct equating coefficients. Furthermore, some linkage designs are quite complex and two forms can be linked through different chains and possibly a direct link. For every path that links the same two forms the equating coefficients can be computed, thus yielding different scale conversions. In this case, the equating coefficients can be averaged in order to obtain a single transformation.
For the computation of direct equating coefficients, the equateIRT package implements methods based on moments of item parameters, namely the mean-sigma (Marco 1977), the mean-mean (Loyd and Hoover 1980), and the mean-geometric mean (Mislevy and Bock 1990) methods, and response function methods, namely the Haebara (Haebara 1980) and the Stocking-Lord (Stocking and Lord 1983) methods. The package also computes indirect equating coefficients through a chain of forms. The bisector method is used to average equating coefficients derived from different paths (Holland and Strawderman 2011; Battauz 2013).
Since the estimated equating coefficients are subject to sampling variation, it is important to assess the magnitude of this variability in order to evaluate the accuracy of the equating process (Ogasawara 2011). This objective can be accomplished by observing the asymptotic standard errors of the equating coefficients. Despite the importance of verifying the precision of the equating performed, equating software generally does not provide standard errors of equating coefficients. To the best of my knowledge, just a few computer programs calculate standard errors based on bootstrap techniques. The computation of analytic standard errors of IRT equating coefficients is not implemented in any software, with the only exception being a set of subroutines that implement the methods developed by Ogasawara (2000) and Ogasawara (2001) for direct equating coefficients, available from the author. Instead, the equateIRT package provides analytical standard errors for direct, chain and average equating coefficients. It is important to note that analytical standard errors of equating coefficients can only be computed if the covariance matrix of item parameter estimates is available. This covariance matrix should be obtained from the software used to estimate the item parameters. The package provides functions to import data from flexMIRT (Cai 2013), IRTPRO (Cai, Thissen, and du Toit 2011) and the R packages ltm (Rizopoulos 2006) and mirt (Chalmers 2012). All of these IRT programs supply the covariance matrix of item parameter estimates.
Other R packages provide implementation of IRT equating methods. The package irtoys (Partchev 2014) performs IRT equating for dichotomous items and the package plink (Weeks 2010) implements unidimensional and multidimensional IRT equating for both dichotomous and polytomous items. The plink package performs chain equating by providing direct equating coefficients between forms that present common items, but does not provide indirect equating coefficients that permit the conversion from the first form into the last form of the path. Chain equating is not included in the irtoys package. Furthermore, the packages irtoys and plink do not provide average equating coefficients nor standard errors of direct, indirect and average coefficients.
The paper is structured as follows. The theory on IRT test equating is summarized in Section 2. Section 3 illustrates the equateIRT package and Section 4 concludes the paper.

IRT test equating
Consider a single test form that is denoted by $g$. In the three-parameter logistic model, the probability of a positive response on item $j$ in form $g$ for a person with ability $\theta$ is given by
$$\pi_{gj} = c_{gj} + (1 - c_{gj}) \frac{\exp\{D a_{gj} (\theta - b_{gj})\}}{1 + \exp\{D a_{gj} (\theta - b_{gj})\}}, \tag{1}$$
where $a_{gj}$ is the item discrimination parameter, $b_{gj}$ is the item difficulty parameter, $c_{gj}$ is the item guessing parameter and $D$ is a constant typically set to 1.7. We define the parameter vector of form $g$ as $\alpha_g = (\alpha_{g1}^\top, \ldots, \alpha_{g n_g}^\top)^\top$, where $\alpha_{gj} = (a_{gj}, b_{gj}, c_{gj})^\top$, $j = 1, \ldots, n_g$, and $n_g$ is the number of items of form $g$. Item parameters are estimated separately for each form by using the marginal maximum likelihood method (Bock and Aitkin 1981), regarding the person parameter $\theta$ as a random variable with standard normal distribution.
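The response function of the three-parameter logistic model can be sketched in a few lines of base R. The function name p3pl and the item values below are hypothetical and serve only as an illustration; they are not part of the equateIRT package.

```r
# Response probability under the three-parameter logistic model.
# theta: ability; a: discrimination; b: difficulty; c: guessing.
# D is the scaling constant (1.7 by default, as in the text).
p3pl <- function(theta, a, b, c, D = 1.7) {
  c + (1 - c) * plogis(D * a * (theta - b))
}

# Hypothetical item with a = 1.2, b = 0.5, c = 0.2
theta <- c(-2, 0, 0.5, 2)
probs <- p3pl(theta, a = 1.2, b = 0.5, c = 0.2)
round(probs, 3)
```

When the ability equals the difficulty, the logistic term is 1/2, so the probability is c + (1 - c)/2, which is 0.6 for this item. The probability approaches the guessing parameter c as the ability decreases.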

Direct equating
Let $g-1$ be another form that presents $n_{g-1,g}$ items in common with Form $g$. The parameters estimated for Form $g-1$ can be transformed to the scale of Form $g$ by using the following equations
$$\theta_g = A_{g-1,g}\, \theta_{g-1} + B_{g-1,g}, \tag{2}$$
$$a_{gj} = \frac{a_{g-1,j}}{A_{g-1,g}}, \tag{3}$$
$$b_{gj} = A_{g-1,g}\, b_{g-1,j} + B_{g-1,g}, \tag{4}$$
where $A_{g-1,g}$ and $B_{g-1,g}$ are the equating coefficients. These coefficients can be estimated by using moments of item parameters (Kolen and Brennan 2014, Chapter 6.3.2), or response function methods (Kolen and Brennan 2014, Chapter 6.3.3). In any case, direct equating coefficients are estimated using only the item parameter estimates for the items in common between two forms, irrespective of the items in common with other forms.
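The mean-sigma, mean-mean and mean-geometric mean methods are based on moments of item parameters. In the following expressions, sums and products run over the $n_{g-1,g}$ common items, and $\bar{b}_g$ and $\bar{b}_{g-1}$ denote the means of the difficulty parameters of the common items in Forms $g$ and $g-1$. In particular, the equating coefficient $A_{g-1,g}$ is given by
$$A_{g-1,g} = \left( \frac{\sum_j (b_{gj} - \bar{b}_g)^2}{\sum_j (b_{g-1,j} - \bar{b}_{g-1})^2} \right)^{1/2} \tag{5}$$
using the mean-sigma method,
$$A_{g-1,g} = \frac{\sum_j a_{g-1,j}}{\sum_j a_{gj}} \tag{6}$$
using the mean-mean method, and
$$A_{g-1,g} = \left( \prod_j \frac{a_{g-1,j}}{a_{gj}} \right)^{1/n_{g-1,g}} \tag{7}$$
using the mean-geometric mean method, while the equating coefficient $B_{g-1,g}$ is given by
$$B_{g-1,g} = \bar{b}_g - A_{g-1,g}\, \bar{b}_{g-1} \tag{8}$$
for all methods.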
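The three moment methods can be illustrated with a minimal base R sketch. The function moment_coef and the item values are hypothetical; the parameters of Form g are generated without error from A = 1.2 and B = 0.5, so that every method should recover these values exactly.

```r
# Hypothetical parameters of four common items on the scale of Form g-1
a_prev <- c(0.8, 1.0, 1.2, 1.5)
b_prev <- c(-1.0, -0.3, 0.4, 1.1)

# The same items on the scale of Form g (error-free, A = 1.2 and B = 0.5)
a_cur <- a_prev / 1.2
b_cur <- 1.2 * b_prev + 0.5

# Equating coefficients based on moments of the common item parameters
moment_coef <- function(a_prev, b_prev, a_cur, b_cur,
                        method = c("mean-sigma", "mean-mean", "mean-gmean")) {
  method <- match.arg(method)
  A <- switch(method,
    "mean-sigma" = sd(b_cur) / sd(b_prev),
    "mean-mean"  = mean(a_prev) / mean(a_cur),
    "mean-gmean" = exp(mean(log(a_prev / a_cur)))
  )
  B <- mean(b_cur) - A * mean(b_prev)
  c(A = A, B = B)
}

methods <- c("mean-sigma", "mean-mean", "mean-gmean")
res <- sapply(methods, function(m) {
  moment_coef(a_prev, b_prev, a_cur, b_cur, method = m)
})
res
```

With real estimates the three methods generally give different values, because the parameter estimates of the two forms are not exact linear transformations of each other.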
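The Haebara and Stocking-Lord methods are based on the response function. The equating coefficients using the Haebara method are obtained by minimizing the following function
$$\int_{-\infty}^{\infty} \sum_j \left( P_{gj}(\theta) - P^*_{gj}(\theta) \right)^2 h(\theta)\, d\theta, \tag{9}$$
where the sum runs over the common items, $h(\theta)$ is the density of a standard normal variable, $P_{gj}(\theta)$ is the response probability of Equation 1 computed with the parameters of item $j$ in Form $g$, and $P^*_{gj}(\theta)$ is the same probability computed with the parameters of item $j$ in Form $g-1$ converted to the scale of Form $g$, that is, with $a^*_{g-1,j} = a_{g-1,j}/A_{g-1,g}$ and $b^*_{g-1,j} = A_{g-1,g}\, b_{g-1,j} + B_{g-1,g}$. The Stocking-Lord method requires, instead, the minimization of the function
$$\int_{-\infty}^{\infty} \left( \sum_j P_{gj}(\theta) - \sum_j P^*_{gj}(\theta) \right)^2 h(\theta)\, d\theta. \tag{10}$$
The integrals in Equations 9 and 10 do not have an analytical solution and they are generally approximated using Gaussian quadrature.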
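The response function methods can be sketched in base R as follows. This is only an illustration under stated assumptions: the items follow a two-parameter logistic model, the integral is approximated on an equally spaced grid weighted by the standard normal density (not by Gaussian quadrature), the minimization relies on the general-purpose optimizer optim, and all object names are hypothetical.

```r
# 2PL response probabilities: rows are ability nodes, columns are items
prob_matrix <- function(theta, a, b, D = 1.7) {
  sapply(seq_along(a), function(j) plogis(D * a[j] * (theta - b[j])))
}

# Hypothetical common items: Form g-1 values and error-free Form g values
# generated with A = 1.2 and B = 0.5
a_prev <- c(0.8, 1.0, 1.2, 1.5)
b_prev <- c(-1.0, -0.3, 0.4, 1.1)
a_cur <- a_prev / 1.2
b_cur <- 1.2 * b_prev + 0.5

# Grid approximation of the integral over the standard normal density
nodes <- seq(-4, 4, by = 0.1)
wts <- dnorm(nodes) * 0.1
P_cur <- prob_matrix(nodes, a_cur, b_cur)

# Criterion as a function of par = c(A, B)
crit <- function(par, type = c("Haebara", "Stocking-Lord")) {
  type <- match.arg(type)
  P_star <- prob_matrix(nodes, a_prev / par[1], par[1] * b_prev + par[2])
  if (type == "Haebara") {
    sum(rowSums((P_cur - P_star)^2) * wts)       # item-level differences
  } else {
    sum((rowSums(P_cur) - rowSums(P_star))^2 * wts)  # test-level differences
  }
}

est_hae <- optim(c(1, 0), crit, type = "Haebara")$par
est_sl <- optim(c(1, 0), crit, type = "Stocking-Lord")$par
rbind(Haebara = est_hae, "Stocking-Lord" = est_sl)
```

Since the data are error-free, both criteria attain their minimum of zero at the generating values, and the two rows should be close to A = 1.2 and B = 0.5 up to the tolerance of the optimizer.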
Using the delta method, Ogasawara (2000) and Ogasawara (2001) derived the asymptotic covariance matrix for the vector of estimates $(\hat{A}_{g-1,g}, \hat{B}_{g-1,g})^\top$, that is given by
$$\mathrm{ACOV}\left( (\hat{A}_{g-1,g}, \hat{B}_{g-1,g})^\top \right) = \frac{\partial (A_{g-1,g}, B_{g-1,g})^\top}{\partial \alpha_{g-1,g}^\top}\, \mathrm{ACOV}(\hat{\alpha}_{g-1,g})\, \frac{\partial (A_{g-1,g}, B_{g-1,g})}{\partial \alpha_{g-1,g}}, \tag{11}$$
where $\alpha_{g-1,g} = (\alpha_g^\top, \alpha_{g-1}^\top)^\top$ is a vector containing all the item parameters related to Forms $g-1$ and $g$, $\partial (A_{g-1,g}, B_{g-1,g})^\top / \partial \alpha_{g-1,g}^\top$ is the matrix containing the partial derivatives of $A_{g-1,g}$ and $B_{g-1,g}$ with respect to the item parameters evaluated at their true values, and $\mathrm{ACOV}(\hat{\alpha}_{g-1,g})$ is the asymptotic covariance matrix of $\hat{\alpha}_{g-1,g}$. The derivatives depend on the method used to determine the equating coefficients and are given in Ogasawara (2000, 2011) for methods based on moments, and in Ogasawara (2001) for response function methods. The estimate of the asymptotic covariance matrix is obtained by inserting item parameter estimates into Equation 11.
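The delta method computation can be illustrated numerically in base R. The sketch below is an assumption-laden illustration, not the analytic procedure of the cited papers: it uses the mean-mean method, replaces the analytic derivatives with central finite differences, and takes a hypothetical diagonal covariance matrix for the item parameter estimates. All names are hypothetical.

```r
# Mean-mean coefficients as a function of the stacked parameter vector
# alpha = (a_cur, b_cur, a_prev, b_prev), each block of length n
mm_coef <- function(alpha, n) {
  a_cur <- alpha[1:n]
  b_cur <- alpha[(n + 1):(2 * n)]
  a_prev <- alpha[(2 * n + 1):(3 * n)]
  b_prev <- alpha[(3 * n + 1):(4 * n)]
  A <- mean(a_prev) / mean(a_cur)
  c(A = A, B = mean(b_cur) - A * mean(b_prev))
}

# Central finite-difference Jacobian: one column per element of x
num_jacobian <- function(f, x, ..., eps = 1e-6) {
  sapply(seq_along(x), function(k) {
    h <- replace(numeric(length(x)), k, eps)
    (f(x + h, ...) - f(x - h, ...)) / (2 * eps)
  })
}

n <- 3
alpha_hat <- c(rep(1, n), c(-0.5, 0, 0.5),       # Form g: a, b
               rep(1.2, n), c(-0.8, -0.4, 0))    # Form g-1: a, b
acov_alpha <- diag(0.01, 4 * n)  # hypothetical covariance of the estimates

J <- num_jacobian(mm_coef, alpha_hat, n = n)     # 2 x 4n matrix
acov_AB <- J %*% acov_alpha %*% t(J)
se_AB <- sqrt(diag(acov_AB))                     # standard errors of A and B
se_AB
```

In this example the derivative of A with respect to each discrimination parameter of Form g-1 is 1/3 and with respect to each discrimination parameter of Form g is -0.4, so the variance of A equals 0.01 * (3/9 + 3 * 0.16), which the numerical result should reproduce.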

Equating chains
Suppose that two forms are linked through a chain of forms that present common items in pairs. Define the path from Form 0 to Form $l$ as $p = \{0, 1, \ldots, l\}$. According to Battauz (2013), it is possible to obtain the equating coefficients transforming the scale of $\theta_0$ to that of $\theta_l$ as a function of the direct equating coefficients that link the forms with common items. These coefficients will be referred to as indirect or chain equating coefficients and they are given by
$$A_p = \prod_{g=1}^{l} A_{g-1,g} \tag{12}$$
and
$$B_p = \sum_{g=1}^{l} B_{g-1,g}\, A_{g,\ldots,l}, \tag{13}$$
where $A_{g,\ldots,l} = \prod_{h=g+1}^{l} A_{h-1,h}$ is the coefficient that links Form $g$ to Form $l$ (the empty product, for $g = l$, is equal to one). Similarly to the case of a direct link, the delta method can be exploited to obtain the asymptotic covariance matrix of the vector of estimates $(\hat{A}_p, \hat{B}_p)^\top$, that is
$$\mathrm{ACOV}\left( (\hat{A}_p, \hat{B}_p)^\top \right) = \frac{\partial (A_p, B_p)^\top}{\partial \alpha_p^\top}\, \mathrm{ACOV}(\hat{\alpha}_p)\, \frac{\partial (A_p, B_p)}{\partial \alpha_p}, \tag{14}$$
where $\alpha_p = (\alpha_0^\top, \alpha_1^\top, \ldots, \alpha_l^\top)^\top$ is the vector containing all the item parameters related to the forms that compose the path, $\partial (A_p, B_p)^\top / \partial \alpha_p^\top$ is the matrix containing the partial derivatives of $A_p$ and $B_p$ with respect to the item parameters evaluated at their true values, and $\mathrm{ACOV}(\hat{\alpha}_p)$ is the asymptotic covariance matrix of the estimate $\hat{\alpha}_p$. The derivatives are given in Battauz (2013). The estimate of the asymptotic covariance matrix is obtained by inserting item parameter estimates into Equation 14.
The computation of chain equating coefficients is then performed in two steps: first, the calculation of direct equating coefficients between consecutive forms of a path and second, the determination of indirect equating coefficients as a function of direct equating coefficients. Thus, in this process, only the items in common between two consecutive forms of a given chain are used. If, for example, Forms 0 and 2 present common items, this information is not used to estimate the chain equating coefficients for path p = {0, 1, . . . , l}. This link can be used to constitute a further path that connects Forms 0 and l and a different scale conversion can be derived for this path. In this case, there are two different equatings for the same two forms, and they can be averaged as explained in Section 2.3.
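The second step, which combines direct coefficients into chain coefficients, can be sketched in base R. The function chain_coef and the numerical values are hypothetical; the sketch also checks the result against the step-by-step composition of the linear transformations along the path.

```r
# Chain coefficients from the direct coefficients of consecutive links.
# A[g], B[g] convert the scale of Form g-1 to the scale of Form g.
chain_coef <- function(A, B) {
  l <- length(A)
  # A_tail[g]: product of the A coefficients of the links following link g
  # (equal to 1 for the last link)
  A_tail <- rev(cumprod(c(1, rev(A)[-l])))
  c(A = prod(A), B = sum(B * A_tail))
}

# Hypothetical direct coefficients for the path {0, 1, 2}
A_dir <- c(1.1, 0.9)
B_dir <- c(0.2, -0.1)
cc <- chain_coef(A_dir, B_dir)
cc

# Check: apply the direct conversions one after the other
theta0 <- 0.7
theta_l <- Reduce(function(th, g) A_dir[g] * th + B_dir[g],
                  seq_along(A_dir), theta0)
all.equal(unname(cc["A"] * theta0 + cc["B"]), theta_l)
```

For these values the chain coefficients are A = 1.1 * 0.9 = 0.99 and B = 0.2 * 0.9 - 0.1 = 0.08.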

Average equating coefficients
Suppose that two forms are linked through different paths. Define the set of paths that link two Forms 0 and $l$ as $P_{0l}$ and the linking coefficients related to path $p$ as $A_p$ and $B_p$, $p \in P_{0l}$. The equations that transform the scale of $\theta_0$ to that of $\theta_l$ are then
$$\theta_l = A_p\, \theta_0 + B_p, \quad p \in P_{0l}. \tag{15}$$
As observed by Kolen and Brennan (2014, p. 280) and Braun and Holland (1982, p. 44), the equating relationships provided by each path could be averaged to produce a single conversion that is expected to be more accurate. Battauz (2013) proposed using the bisector method, suggested by Holland and Strawderman (2011) to average the equating functions obtained by using different equating methods. The angle bisector, in case of two linear functions that intersect at a point, is the linear function that bisects the angle between them. The bisector method satisfies the symmetry property, which requires that the inverse function of the average equating function equals the average of the inverse functions. Instead, the (weighted) mean of the equating functions does not satisfy the symmetry property, making the bisector method preferable. The bisector method yields a weighted average of the linear transformations (15):
$$\theta^*_l = \frac{\sum_{p \in P_{0l}} (A_p\, \theta_0 + B_p)\, w_p}{\sum_{p \in P_{0l}} w_p}, \tag{16}$$
where
$$w_p = n_p\, (1 + A_p^2)^{-1/2} \tag{17}$$
and $n_p$ are optional weights associated with each path. The average equating coefficients are then
$$A^*_{0l} = \frac{\sum_{p \in P_{0l}} A_p\, w_p}{\sum_{p \in P_{0l}} w_p} \tag{18}$$
and
$$B^*_{0l} = \frac{\sum_{p \in P_{0l}} B_p\, w_p}{\sum_{p \in P_{0l}} w_p}. \tag{19}$$
The asymptotic covariance matrix of the vector of estimates $(\hat{A}^*_{0l}, \hat{B}^*_{0l})^\top$ can then be obtained by using the delta method as follows
$$\mathrm{ACOV}\left( (\hat{A}^*_{0l}, \hat{B}^*_{0l})^\top \right) = \frac{\partial (A^*_{0l}, B^*_{0l})^\top}{\partial \alpha^\top}\, \mathrm{ACOV}(\hat{\alpha})\, \frac{\partial (A^*_{0l}, B^*_{0l})}{\partial \alpha}, \tag{20}$$
where $\alpha = (\alpha_p)_{p \in P_{0l}}$ is the vector containing all the item parameters used in the equating process in at least one of the paths in $P_{0l}$, $\partial (A^*_{0l}, B^*_{0l})^\top / \partial \alpha^\top$ is the matrix containing the partial derivatives of $A^*_{0l}$ and $B^*_{0l}$ with respect to the item parameters evaluated at their true values, and $\mathrm{ACOV}(\hat{\alpha})$ is a block diagonal matrix composed of all $\mathrm{ACOV}(\hat{\alpha}_p)$, $p \in P_{0l}$, and it is the asymptotic covariance matrix of the estimate $\hat{\alpha}$. The derivatives are given in Battauz (2013). The estimate of the asymptotic covariance matrix is obtained by inserting item parameter estimates into Equation 20.
Obtaining average equating coefficients is then a process that takes place over three phases: first, the estimation of direct equating coefficients, then the calculation of chain equating coefficients for more than one path connecting two forms, and finally the computation of average equating coefficients. The distribution of common items across forms determines which forms can be directly linked in the first phase. From this follows the composition of paths connecting each pair of forms and the computation of chain equating coefficients. There are no restrictions on these connections: different paths can share some parts, or the same item can be used to link more than one form. The relationship between the average coefficients and the item parameters used to compute them is reflected in the derivatives $\partial (A^*_{0l}, B^*_{0l})^\top / \partial \alpha^\top$, which determine the covariance matrix of the average equating coefficients.
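A further issue concerns which weights $n_p$ to use in averaging. Battauz (2013) proposed determining weights by minimizing the average variance of $\hat{\theta}^*_l$, namely
$$E\{\mathrm{VAR}(\hat{\theta}^*_l \mid \theta_0)\} = \mathrm{VAR}(\hat{A}^*_{0l}) + \mathrm{VAR}(\hat{B}^*_{0l}), \tag{21}$$
assuming that $\theta_0$ has zero mean and variance equal to one.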
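The bisector average and its symmetry property can be checked with a short base R sketch. The function bisector_coef and the path coefficients are hypothetical, and the optional weights are all set to one, which corresponds to the unweighted bisector.

```r
# Bisector average of the linear conversions of several paths.
# A, B: coefficients of each path; n: optional path weights.
bisector_coef <- function(A, B, n = rep(1, length(A))) {
  w <- n / sqrt(1 + A^2)
  c(A = sum(A * w) / sum(w), B = sum(B * w) / sum(w))
}

# Hypothetical chain coefficients of two paths linking Form 0 to Form l
A_p <- c(0.9, 1.3)
B_p <- c(0.1, -0.2)
avg <- bisector_coef(A_p, B_p)

# Symmetry: the bisector of the inverse conversions (slope 1/A,
# intercept -B/A) equals the inverse of the bisector
avg_inv <- bisector_coef(1 / A_p, -B_p / A_p)
inv_avg <- c(1 / avg[["A"]], -avg[["B"]] / avg[["A"]])
all.equal(unname(avg_inv), inv_avg)

# The arithmetic mean of the slopes does not have this property
c(mean_of_inverses = mean(1 / A_p), inverse_of_mean = 1 / mean(A_p))
```

The symmetry holds because the bisector weight of an inverse conversion is the weight of the original conversion multiplied by its slope, so the slopes cancel in the ratio. The last line shows that the plain mean of the slopes gives two different answers depending on the direction of the conversion.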

The equateIRT package
In order to perform equating with the equateIRT package, it is necessary to have previously estimated item parameters. Item parameters can be estimated using R packages, or estimated using external software and then imported into R. To calculate standard errors of equating coefficients it is also necessary to have estimated the covariance matrix of the item parameter estimates. If the program used to estimate the IRT model does not provide the covariance matrix of item parameter estimates or the user is not interested in standard errors of equating coefficients, the covariance matrix can be set to NULL.

Data preparation
To the best of my knowledge, the computer programs for IRT model estimation that export the covariance matrix of parameter estimates are flexMIRT, IRTPRO and the R packages ltm and mirt. The equateIRT package provides functions to import data from these programs. Item parameters and the covariance matrices can also be imported from other programs. In this case, the user should import item parameter estimates and the covariance matrices using the general R function read.table.
For item parameter estimation, the programs flexMIRT, IRTPRO and the R packages ltm and mirt formulate the three-parameter logistic model as a latent trait model (Bartholomew, Knott, and Moustaki 2011),
$$\pi_j = c_j + (1 - c_j) \frac{\exp(\beta_{1j} + \beta_{2j} z)}{1 + \exp(\beta_{1j} + \beta_{2j} z)}, \tag{22}$$
where $\pi_j$ is the probability of a positive response for the $j$th item, $\beta_{1j} = -D \cdot a_j \cdot b_j$ are easiness parameters, $\beta_{2j} = D \cdot a_j$ are discrimination parameters, $c_j$ are guessing parameters and $z = \theta$ is the latent ability. In the following, this parameterization will be referred to as the latent trait parameterization. The two-parameter logistic model, the one-parameter logistic model and the Rasch model can be presented with the same formulation. The two-parameter logistic model can be obtained by setting $c_j = 0$, the one-parameter logistic model can be obtained by constraining also $\beta_{2j}$ to be constant across items, and the Rasch model can be obtained by setting $c_j = 0$ and $\beta_{2j} = 1$. Furthermore, for a three-parameter logistic model, if the guessing parameters are given under the parameterization
$$c_j = \frac{\exp(c^*_j)}{1 + \exp(c^*_j)}, \tag{23}$$
this will be called the logistic parameterization in the following. The IRT programs estimate the parameters $(c^*_j, \beta_{1j}, \beta_{2j})$, for every item $j$, and then calculate the item parameters of the usual IRT parameterization given in Equation 1 using Equation 23 and the equations
$$a_j = \frac{\beta_{2j}}{D}, \tag{24}$$
$$b_j = -\frac{\beta_{1j}}{\beta_{2j}}. \tag{25}$$
The covariance matrix of parameter estimates exported by the programs is related to the vector of parameters $(c^*_j, \beta_{1j}, \beta_{2j})$. The covariance matrix of $(c_j, b_j, a_j)$ can be obtained on the basis of the covariance matrix provided by the IRT programs by applying the delta method. The equateIRT package provides this functionality.
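The change of parameterization described above can be sketched in base R. The function to_irt_param is hypothetical and is not the package's own conversion routine; it only applies the relations between the easiness, discrimination and logit-guessing parameters and the usual difficulty, discrimination and guessing parameters stated in the text, with the constant D passed explicitly.

```r
# From (c*, beta1, beta2) to the usual IRT parameters (c, b, a)
to_irt_param <- function(cstar, beta1, beta2, D = 1.7) {
  cbind(c = plogis(cstar),      # logistic parameterization of guessing
        b = -beta1 / beta2,     # difficulty from easiness
        a = beta2 / D)          # discrimination on the scale with constant D
}

# Hypothetical item with a = 1.2, b = 0.5, c = 0.2 and D = 1.7
D <- 1.7
beta1 <- -D * 1.2 * 0.5
beta2 <- D * 1.2
cstar <- qlogis(0.2)

par_irt <- to_irt_param(cstar, beta1, beta2, D = D)
par_irt
```

The round trip recovers the values used to build the latent trait parameters. Because the transformation is nonlinear, the covariance matrix of the converted parameters cannot be obtained by simple rescaling, which is why the delta method is needed.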
The functions provided by the equateIRT package to import item parameter estimates and the covariance matrix are import.ltm, import.mirt, import.flexmirt and import.irtpro. Since ltm and mirt are R packages, functions import.ltm and import.mirt just extract item parameter estimates and the covariance matrix from an object previously created with these packages. The arguments of the functions are:
mod: Output object from functions rasch, ltm, or tpm of the ltm package, or from function mirt of the mirt package.
display: Logical value indicating whether the coefficients and the standard errors should be printed. The default is TRUE.
digits: Integer value indicating the number of decimal digits to be used if display is TRUE.
Instead, functions import.flexmirt and import.irtpro read external files previously created with the programs flexMIRT or IRTPRO. The arguments of the functions are:
fnamep: The name of the file containing the item parameter estimates. This is a file whose name ends with "-prm.txt".
fnamev: The name of the file containing the covariance matrix of item parameter estimates. This is a file ending with "-cov.txt".
fnameirt: The name of the file containing additional information to link item parameters with the covariance matrix and to identify parameters that have been constrained to a fixed value. This is a file ending with "-irt.txt".
display: Logical value indicating whether the coefficients and the standard errors should be printed. The default is TRUE.
digits: Integer value indicating the number of decimal digits to be used if display is TRUE.
The functions return a list with components:
coef: The matrix with item parameter estimates.
var: The covariance matrix of item parameter estimates.
Item parameters are imported under the parameterization given in Equations 22 and 23. The usual IRT parameterization can be obtained later by using function modIRT. An example of importing data from the ltm package is given in Appendix A.
The equateIRT package does not handle mixed item types, but this issue can be easily overcome by using the more general model and introducing constraints on some parameters. For example, in the case of some item responses modeled as a two-parameter logistic model and others as a three-parameter logistic model, the user can specify a three-parameter logistic model for all items and fix the guessing parameter to zero for some items.
If item parameter estimates are obtained with IRT programs other than flexMIRT, IRTPRO, ltm or mirt, the user has to create a matrix containing the item parameter estimates. These can be provided using the typical IRT parameterization (1) or with parameterization (22) and/or (23). In this matrix, guessing, difficulty and discrimination parameters should strictly be given in this order, each in a different column of the matrix. It is important that the names of the rows of the matrix are the names of the items because this information is used to link different forms. The covariance matrix is only necessary if the user is interested in obtaining the standard errors of the equating coefficients. It is important that the order of the items in the covariance matrix is the same as in the matrix with the item parameter estimates: guessing parameters come first, then difficulty parameters and finally discrimination parameters. An example of item parameters and covariance matrices organized in this way is given in the datasets contained in the equateIRT package. The package includes three simulated datasets for illustrative purposes containing item parameter estimates and covariance matrices of five forms. The item parameter estimates were obtained with package ltm. In particular, dataset est3pl contains parameters of a three-parameter logistic model, dataset est2pl contains parameters of a two-parameter logistic model and dataset estrasch contains parameters of a Rasch model. Each dataset is composed of a list of length 2 with components:
coef: A list of length 5 containing the matrices of item parameter estimates.
var: A list of length 5 containing the covariance matrices of item parameter estimates.
The following section explains how to obtain the equating coefficients.

Data analysis
In this section, we use the dataset est2pl to illustrate the use of the package. The code for loading the package and the data is
R> library("equateIRT")
R> data("est2pl", package = "equateIRT")
To perform the equating, it is necessary to reorganize the data using function modIRT. This function creates an object of class 'modIRT' consisting of a list with length equal to the number of forms, where each element contains a list with components:
coefficients: Item parameter estimates.
var: Covariance matrix of item parameter estimates.
itmp: Number of item parameters of the model.
The names of the forms can be specified in function modIRT using argument names. If names is not specified, function modIRT assigns the names. If item parameters are given under the latent trait parameterization, use option ltparam = TRUE. If guessing parameters of a three-parameter logistic model are given under the logistic parameterization, use option lparam = TRUE. The default value of both these arguments is TRUE, so the user does not need to specify them if these parameterizations were used in the estimation of the parameters. Using these options, the item parameters are returned under the usual IRT parameterization as in Equation 1. In this case, the covariance matrix of the item parameters under the parameterization of model (1) is obtained, by using the delta method, on the basis of the covariance matrix of item parameters under the parameterizations (22) and/or (23), supplied by the user. If item parameters already conform to the traditional parameterization, the options ltparam and lparam should be set to FALSE, and the function does not perform any transformation. Argument coef is used to specify the item parameter estimates. They should be given as a list of matrices (one for each form) whose row names are the names of the items. Argument var can be used to specify the covariance matrix of item parameter estimates and it should be a list of matrices. If it is left equal to NULL, standard errors of equating coefficients will not be computed. Option display = TRUE can be used to print item parameter estimates and the corresponding standard errors in order to compare them with those provided by the software used to estimate the item parameters. The coefficients can be extracted using the coef method. The linkage plan can be inspected using the function linkp.
Given a list of item parameter estimates with item labels as row names, this function computes the number of common items between all pairs of forms and returns a matrix whose elements indicate the number of common items between the forms. The diagonal of the matrix contains the number of items of each form. The output of the function shows that every form is composed of 20 items and presents 10 items in common with adjacent forms. Furthermore, Forms 1 and 5 present 10 common items.
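The kind of matrix described above can be reproduced with a base R sketch that counts shared item labels. This is not the package's linkp function; the form names and item labels are hypothetical, and in practice the labels would be the row names of the matrices of item parameter estimates.

```r
# Hypothetical item labels of three forms with four items each;
# adjacent forms share two items
forms <- list(test1 = paste0("I", 1:4),
              test2 = paste0("I", 3:6),
              test3 = paste0("I", 5:8))

# Number of common items for every pair of forms;
# the diagonal holds the number of items of each form
common <- sapply(forms, function(x) {
  sapply(forms, function(y) length(intersect(x, y)))
})
common
```

A zero off the diagonal, as for test1 and test3 here, indicates that the two forms cannot be linked directly and can only be equated through a chain.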
Function direc calculates direct equating coefficients between two forms with common items. Arguments mod1 and mod2 are objects of class 'modIRT' containing item parameter coefficients and their covariance matrix. Argument method indicates the equating method to use and should be one of "mean-mean", "mean-gmean", "mean-sigma", "Haebara" or "Stocking-Lord". For example, to calculate direct equating coefficients between Form 1 and Form 5 using the Haebara method for the est2pl dataset the code is
R> l15 <- direc(mod1 = mod2pl[1], mod2 = mod2pl[5], method = "Haebara")
R> l15
Direct equating coefficients
Method: Haebara
Link: test1.test5
The direc function approximates the integrals in Equations 9 and 10 using a Gauss-Hermite quadrature with 30 points by default. The number of quadrature points can be modified using argument nq. Alternatively, setting quadrature = FALSE the integrals are replaced with a sum over 40 equally spaced values ranging from −4 to 4 with an increment of 0.05 and weights equal to one for all values. Argument D can be used to specify the value of the constant D in model (1) used in the estimation of item parameters. A summary method for displaying the output of the function is available.

R> summary(l15)
Link: test1.
The package also provides function alldirec to calculate direct equating coefficients between all pairs of forms that present common items. Using the mean-mean method, the code for our example is
R> direclist2pl <- alldirec(mods = mod2pl, method = "mean-mean")
In order to compute chain equating coefficients, it is necessary to have previously computed direct equating coefficients using function alldirec. Chain equating coefficients can be computed using function chainec. This function requires the specification of the length of the chain using argument r. For example, to compute all chain equating coefficients of length 4 for the est2pl dataset the code is
R> cec4 <- chainec(r = 4, direclist = direclist2pl)
The summary function displays all the equatings performed.
R> pth2 <- paste("test", 1:5, sep = "")
R> pth2 <- data.frame(t(pth2), stringsAsFactors = FALSE)
R> cec12345 <- chainec(direclist = direclist2pl, pths = pth2)
When two forms can be linked through different paths, the equating coefficients can be averaged using function bisectorec, which implements the bisector method. Options weighted = TRUE and unweighted = TRUE can be used to calculate the weighted bisector coefficients and the unweighted bisector coefficients. Weights are determined in order to minimize function (21). For example, to calculate bisector and weighted bisector coefficients to equate Forms 1 and 4 through paths {1, 2, 3, 4} and {1, 5, 4}, and Forms 1 and 5 through paths {1, 2, 3, 4, 5} and {1, 5} the code is
R> ecall <- c(cec1234, cec154, cec12345, direclist2pl["test1.test5"])
R> fec <- bisectorec(ecall = ecall, weighted = TRUE, unweighted = TRUE)
For the output of function bisectorec it is also possible to extract a data frame containing all the coefficients, or to select the link and/or the path. Functions direc, alldirec and chainec also provide the scale conversion of item parameter estimates. This is included in the data frame tab, which is part of the output of these functions. Function itm extracts this data frame. Some examples for direct and chain equating coefficients are given below.
Since the bisectorec function returns the equating coefficients of several scale conversions, it does not provide the scale conversion of the item parameters. For average equating coefficients the user can employ function convert, which converts item and person parameters given the equating coefficients. The following code extracts the equating coefficients obtained with the bisector method to transform from the scale of Form 1 to the scale of Form 4 and performs the scale conversion of item parameters of Form 1 and of a hypothetical vector of person parameters.
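The arithmetic of such a conversion can be sketched in base R. The function convert_scale below is hypothetical and is not the package's convert function; the equating coefficients and the parameter values are made up for illustration. The sketch also verifies that response probabilities are unchanged when item and person parameters are converted together.

```r
# Linear scale conversion of item and person parameters
# given equating coefficients A and B
convert_scale <- function(A, B, a, b, theta) {
  list(a = a / A, b = A * b + B, theta = A * theta + B)
}

# Hypothetical equating coefficients and parameters on the original scale
A <- 1.15
B <- -0.3
a <- c(0.9, 1.4)
b <- c(-0.5, 0.8)
theta <- seq(-3, 3, by = 0.5)

conv <- convert_scale(A, B, a, b, theta)

# 2PL probabilities of the first item before and after the conversion
p_before <- plogis(1.7 * a[1] * (theta - b[1]))
p_after <- plogis(1.7 * conv$a[1] * (conv$theta - conv$b[1]))
all.equal(p_before, p_after)
```

The probabilities coincide because the product of the converted discrimination and the converted distance between ability and difficulty equals the original product.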

Conclusion
Linkage plans can be rather complex, involving many forms, several links, chains and the connection of forms through more than one path. Such plans require chain equating in order to link forms that do not present common items. Furthermore, when two forms can be connected through different paths it can be useful to synthesize the conversions obtained into a single one. The equateIRT package includes not only the computation of direct equating coefficients but also permits the computation of chain and average equating coefficients. Another important and exclusive feature of the equateIRT package is the provision of analytical standard errors of equating coefficients. Standard errors are an important tool for assessing the accuracy of the equating process and can also be used to perform further inferential analysis.