SNSequate : Standard and Nonstandard Statistical Models and Methods for Test Equating

Equating is a family of statistical models and methods that are used to adjust scores on two or more versions of a test, so that the scores from different tests may be used interchangeably. In this paper we present the R package SNSequate, which implements both standard and nonstandard statistical models and methods for test equating. The package construction was motivated by the need for a modular, simple, yet comprehensive and general software that carries out traditional and new equating methods. SNSequate currently implements the traditional mean, linear, and equipercentile equating methods, as well as the mean-mean, mean-sigma, Haebara, and Stocking-Lord item response theory linking methods. It also supports newer methods such as local equating, kernel equating, and item response theory parameter linking methods based on asymmetric item characteristic functions. Practical examples are given to illustrate the capabilities of the software. A list of other programs for equating is presented, highlighting the main differences between them. Future directions for the package are also discussed.


Introduction
Many of the decisions made by administrators or policy makers in an educational system are based on examinees' scores. A common practice when making such decisions is to compare scores on multiple forms of the same assessment. Equating is a family of statistical models and methods that are used to adjust scores on two or more versions of a test, so that scores on these versions may be used interchangeably (see, e.g., Holland and Rubin 1982; Kolen and Brennan 2004; von Davier, Holland, and Thayer 2004; Dorans, Pommerich, and Holland 2007; von Davier 2011b). The goal in equating is to obtain an appropriate transformation that maps the scores of one test form onto the scale of the other. Certain requirements concerning the measured construct, the reliability of the test forms, the symmetry of the transformation, and the equity and population invariance principles must be met for this mapping to be validly called an equating (for details on these requirements, see Kolen and Brennan 2004, Section 1.3; Dorans and Holland 2000).
Methods for test equating can be classified into two main classes: observed-score equating (OSE) and item response theory (IRT) equating. Examples of OSE methods are mean, linear, and equipercentile equating; the Tucker, Levine observed-score, and Levine true-score methods; and the (Gaussian) kernel method of equating, among others. IRT methods include IRT true-score and observed-score equating, and the class of IRT parameter linking methods such as the mean-mean, mean-sigma, Haebara, and Stocking-Lord methods. A good summary of the above-mentioned techniques can be found in Kolen and Brennan (2004) and von Davier et al. (2004). We refer to these two groups of traditional equating methods as standard.
The development of new, theoretically sophisticated equating methods is nowadays common in equating research (see von Davier 2011a,b,c). Some are completely novel, while others are extensions of the standard methods. For example, von Davier (2011b) contains methods in which topics typically found in statistics books (e.g., exponential families, Bayesian nonparametric models, time-series analysis, etc.) are explicitly put into an equating framework (González 2013). Also, hybrid methods such as local equating (van der Linden 2011) and the Levine nonlinear method (von Davier, Fournier-Zajac, and Holland 2007) have emerged as new possibilities in equating. We refer to this group of new and more theoretically sophisticated equating methods as nonstandard.
While nonstandard equating methods accommodate issues that standard methods do not handle well (von Davier 2011b), they have not been widely adopted by practitioners. One reason for this is the lack of software that implements new equating methods. The aim of this paper is to introduce the R (R Core Team 2014) package SNSequate (González 2014), which intends to fill this gap. The package supports both standard and nonstandard statistical models and methods for test equating. Currently, SNSequate implements the traditional mean, linear, and equipercentile equating methods; the mean-mean, mean-sigma, Haebara, and Stocking-Lord IRT parameter linking methods; and the (Gaussian) kernel method of equating (KE). Nonstandard methods such as local equating, IRT parameter linking based on asymmetric item characteristic functions, and the implementation of the logistic and uniform kernels in the KE framework are also available. Additionally, many other methods will be implemented in future versions of the package (see Section 5). Key distinguishing features that set SNSequate apart from current equating software are: (i) it is the only software that currently implements local equating, (ii) it is also the only software that implements IRT parameter linking based on asymmetric ICCs, and (iii) it includes many data sets that have appeared and been well studied in the equating literature, which helps in better understanding the implemented methods.
The rest of the paper is organized as follows: Section 2 gives an introduction to equating and briefly describes methods under both the OS and IRT approaches. In Section 3 the functions that are included in the R package SNSequate are described. All the capabilities of SNSequate are illustrated by several examples in Section 4. We conclude the paper with final comments and ideas for future research.

The statistical inference problem in equating
Let X and Y be the random variables denoting the scores on tests X and Y, which are to be equated. In what follows we assume that scores on X are equated to the scale of scores on Y; the arguments and formulas for the reverse equating are analogous. Let F (x) and G(y) be the associated cumulative distribution functions (CDFs) of the scores. We are interested in a transformation y = ϕ(x) that equates the scores on X to those on Y.
All the transformations in the equating literature are based on the so-called equipercentile function, defined as

ϕ(x) = G⁻¹(F(x)).    (1)

Sum scores (i.e., total number of correct responses) are commonly used test scores in measurement programs. Because the possible values that sum scores can take are consecutive integers, an evident problem with (1) is the discreteness of the score distributions, which renders their inverse functions unavailable. The common solution to this problem is to continuize the discrete distributions F and G. Different continuization methods can be used, each producing parametric, nonparametric, or semiparametric statistical inference about ϕ (González and von Davier 2013). Regardless of the statistical approach adopted, the parameters (which are either finite or infinite-dimensional, or a mixture of both) are estimated using empirical scores, producing sampling variability in the estimated equating functions, which is quantified by the standard error of equating (SEE).
An important aspect of the estimation of the equating transformation concerns the way in which data should be collected so that differences between the tests X and Y do not confound the assessment of differences in the examinees' abilities (von Davier 2013). Different approaches consider data collection designs that use either common examinees or common items. Among those that assume common examinees are the single group design (SG), in which the same group of examinees is administered two different forms; the equivalent groups design (EG), in which students are randomly assigned the form to be administered; and the counterbalanced design (CB), in which two random samples of examinees from a common population take both tests in different order. There also exist designs relying on common items, such as the nonequivalent groups with anchor test design (NEAT), where the two groups of students, each taking one of the test forms, are sampled from different populations (for details see, e.g., von Davier et al. 2004, Chapter 2; Kolen and Brennan 2004, Section 1.4).

Observed-score equating methods
In what follows, we first briefly describe observed-score equating methods, showing in each case the form of the equating transformation ϕ(·) to be estimated. Detailed descriptions of these methods can be found in Kolen and Brennan (2004), von Davier et al. (2004), and von Davier (2011b).

Mean and linear equating
In mean equating, the score distributions F and G are assumed to differ only in their means. Scores that are the same distance from their respective means are set equal: x − µ_x = y − µ_y. It follows that

ϕ(x) = x − µ_x + µ_y.    (2)

Linear equating further assumes that the score distributions differ in both their means and standard deviations. Scores that are an equal distance from their means in standard deviation units are set equal: (x − µ_x)/σ_x = (y − µ_y)/σ_y. It follows that

ϕ(x) = (σ_y/σ_x)(x − µ_x) + µ_y.    (3)

Theorem 1.1 in von Davier et al. (2004) summarizes the connection between (3) and (1). Note that in practice, the transformation of scores is made using the estimated equating function ϕ̂ = ϕ(x; π̂), where π̂ = (µ̂_x, µ̂_y, σ̂_x, σ̂_y) is directly estimated from the data. Moreover, because the equating transformation depends on parameters, both mean and linear equating methods are parametric.
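As a quick illustration, transformations (2) and (3) can be computed directly from the parameters involved. The following is a minimal base R sketch with made-up parameter values; mea.eq() and lin.eq() are the supported package implementations.

```r
# Mean equating: shift X scores by the difference in means.
mean_equate <- function(x, mu_x, mu_y) {
  x - mu_x + mu_y
}

# Linear equating: match both means and standard deviations.
linear_equate <- function(x, mu_x, mu_y, sigma_x, sigma_y) {
  (sigma_y / sigma_x) * (x - mu_x) + mu_y
}

mean_equate(25, mu_x = 20, mu_y = 22)                    # 27
linear_equate(25, mu_x = 20, mu_y = 22,
              sigma_x = 5, sigma_y = 4)                  # 26
```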

Equipercentile equating
The equipercentile method generalizes mean and linear equating by not only considering differences between the first two moments, but differences between the entire score distributions.
A function e_Y(x) is called an equipercentile transformation if the distribution of scores which results from the conversion of X scores to the Y scale is equal to G (the distribution of scores on Y in the population). It follows that

e_Y(x) = G⁻¹(F(x)).    (4)

To employ (4) for equating, both F and G are typically continuized by linear interpolation. Because ϕ is built from (distribution) functions, the equipercentile equating method is nonparametric.
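This idea can be sketched in a few lines of base R, using ecdf() for F and a linearly interpolated quantile type for G⁻¹ (an illustration only; eqp.eq() is the package implementation):

```r
# Equipercentile equating: send x0 through F, then through G^{-1}.
# x, y: vectors of observed scores for forms X and Y.
eqp_equate <- function(x0, x, y) {
  Fx <- ecdf(x)                           # empirical CDF of X scores
  # type = 4 gives linear interpolation of the empirical CDF of Y
  quantile(y, probs = Fx(x0), type = 4, names = FALSE)
}
```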

The kernel method of equating (KE)
In KE, the originally discrete score distributions F and G are continuized by using kernel techniques (Silverman 1986). Let h_X be a parameter controlling the degree of smoothness of the continuization. von Davier et al. (2004) showed that if V ∼ N(0, 1) (a standard normal random variable), X is the originally discrete score random variable, and the continuized variable is defined as

X(h_X) = a_X(X + h_X V) + (1 − a_X)µ_X,    where a_X² = σ_X²/(σ_X² + h_X²),

then the Gaussian kernel smoothing of F(x), defined as

F_{h_X}(x) = Σ_j r_j Φ((x − a_X x_j − (1 − a_X)µ_X)/(a_X h_X)),

is exactly the CDF of X(h_X). It should be mentioned that continuization with kernels other than the Gaussian is also possible (Lee and von Davier 2011). The conversion of scores is finally based on the estimated equating function

ê_Y(x) = G⁻¹_{h_Y}(F_{h_X}(x; r̂); ŝ),

where r̂ and ŝ are vectors of estimated score probabilities defined by r_j = P(X = x_j) and s_k = P(Y = y_k), respectively, with x_j and y_k taking values between 0 and the total number of items in X and Y, respectively. Both r̂ and ŝ are obtained using the so-called design functions (DF), which take the chosen data collection design into account in the estimation. This step is typically carried out after smoothing the (discrete) observed score frequency distributions (univariate and/or bivariate) using log-linear models.
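The Gaussian continuization can be sketched directly from this formula in base R, with a_X chosen to preserve the mean and variance of X (an illustration only; ker.eq() performs the actual computation in the package):

```r
# Gaussian-kernel continuization of a discrete score distribution.
# xj: possible scores; rj: score probabilities; h: bandwidth.
ke_cdf <- function(x, xj, rj, h) {
  mu <- sum(xj * rj)
  s2 <- sum((xj - mu)^2 * rj)
  a  <- sqrt(s2 / (s2 + h^2))      # a_X preserves mean and variance
  sum(rj * pnorm((x - a * xj - (1 - a) * mu) / (a * h)))
}
```

The result is a smooth, strictly increasing CDF that can be inverted numerically, which is what makes the equipercentile-type conversion in KE possible.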
In order to statistically assess the effectiveness of ϕ̂, von Davier et al. (2004) give a diagnostic measure called the percent relative error (PRE), which compares the moments of the distribution of the equated scores to the moments of the original discrete reference distribution. Also, the accuracy of the estimated ϕ̂(x) is measured by the SEE.
The main stages in the previous description of the KE method have been summarized in five steps (see, e.g., von Davier et al. 2004): (i) pre-smoothing, (ii) estimation of scores probabilities, (iii) continuization, (iv) computing and diagnosing the equating function, and (v) computation of accuracy measures.
Note that the estimation of ϕ involves both vector parameters (r and s) and function parameters (F and G), so the KE method is semiparametric.

Local equating
Instead of using the "marginal" distributions of scores, as is the case in the equipercentile method, local equating (LE) methods (van der Linden 2011, 2013) utilize the "conditional" (on ability) distributions of scores to obtain the transformation ϕ, leading to a family of ability-dependent equating transformations of the form

ϕ(x; θ) = G⁻¹_{Y|θ}(F_{X|θ}(x)),    θ ∈ Θ.

To estimate θ, maximum likelihood (ML), maximum a posteriori (MAP), or expected a posteriori (EAP) estimates are typically computed from the response patterns of individuals, assuming that an IRT model holds. The conditional score distributions are typically obtained by using an algorithm described by Lord and Wingersky (1984). Linear interpolation is used to continuize the resulting discrete conditional distributions. Because both vector and function parameters are involved, LE is also a semiparametric method.
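The Lord and Wingersky (1984) recursion mentioned above is short to state in code. The following base R sketch (illustrative only; le.eq() handles the conditional distributions internally) takes p, the vector of correct-response probabilities p_j(θ) at a fixed ability, and returns the conditional distribution of the sum score:

```r
# Lord-Wingersky recursion: conditional distribution of the sum score
# given ability, built up one item at a time.
lord_wingersky <- function(p) {
  f <- c(1 - p[1], p[1])                 # scores 0, 1 after item 1
  for (j in seq_along(p)[-1]) {
    # score k is reached by "wrong on item j" from k,
    # or by "right on item j" from k - 1
    f <- c(f * (1 - p[j]), 0) + c(0, f * p[j])
  }
  f                                      # probabilities of scores 0..J
}

lord_wingersky(c(0.5, 0.5))   # 0.25 0.50 0.25
```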

IRT parameter linking and equating methods
IRT models (van der Linden and Hambleton 1997; De Boeck and Wilson 2004) are widely used nowadays for analyzing and scoring tests. As many testing programs use IRT to assemble tests, the use of IRT equating is a natural option (Skaggs and Lissitz 1986; Cook and Eignor 1991; Lord 1980, Chapter 13). Using IRT in the equating process requires a preliminary step, referred to here as IRT item-parameter linking. Because the parameters from different test forms need to be on the same IRT scale, linking is conducted to place the IRT parameter estimates, obtained from separate calibrations of two test forms, on a common scale.
Let Y_ij be the random variable denoting the answer of individual i to item j. IRT models specify the respondent's probability of a correct answer to a test item based on both a person parameter θ_i and a vector of item characteristics (e.g., difficulty, discriminating power, etc.), ω_j. When it is assumed that θ_i ∼ N(0, σ²_θ), the model is specified as

P(Y_ij = 1 | θ_i, ω_j) = p(θ_i; ω_j),

where p is known as the item characteristic curve (ICC). In particular, the three-parameter logistic (3PL) model is based on the standard logistic function ψ(x) = 1/(1 + exp(−x)). Under this specification we have

p(θ_i; ω_j) = c_j + (1 − c_j) ψ(D a_j (θ_i − β_j)),

where ω_j = (a_j, β_j, c_j) and D is a scaling constant.
Other IRT models are special cases of the 3PL. For instance, the two parameter logistic model (2PL) is obtained by setting c j = 0 for all j, whereas the 1PL model additionally sets all a j to be equal to 1. For details on IRT parameter estimation methods and software the reader is referred to Fischer and Molenaar (1995); Baker and Kim (2004); Tuerlinckx, Rijmen, Molenberghs, Verbeke, Briggs, den Noortgate, Meulders, and De Boeck (2004).

Parameter linking
When an IRT model is used to fit two different test forms, a linear equation can be used to convert both sets of IRT parameter estimates to the same scale (see Kolen and Brennan 2004, Section 6.2). The corresponding linear relations are

θ_Y = A θ_X + B,    a_Y = a_X / A,    β_Y = A β_X + B,    c_Y = c_X,

where A and B are constants to be estimated, and the indices X and Y are used to differentiate between the scales. A detailed account of methods to calculate A and B can be found in Kolen and Brennan (2004, Chapter 6). A brief description of them is given next.

Mean-mean and mean-sigma methods
These methods use the means and standard deviations of the common-item parameter estimates to obtain the constants A and B. The mean-sigma method uses the difficulty estimates,

A = σ(β_Y)/σ(β_X),    B = µ(β_Y) − A µ(β_X),

whereas the mean-mean method obtains A from the discrimination estimates,

A = µ(a_X)/µ(a_Y),    B = µ(β_Y) − A µ(β_X).

In both cases, means and standard deviations are taken only over the set of common items between test forms X and Y.
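Given the common-item estimates, both sets of constants are one-line computations. A base R sketch (illustrative only; irt.link() is the package implementation), where aX, bX and aY, bY denote the discrimination and difficulty estimates of the common items on each scale:

```r
# Mean-sigma: A and B from the difficulty estimates only.
mean_sigma <- function(bX, bY) {
  A <- sd(bY) / sd(bX)
  c(A = A, B = mean(bY) - A * mean(bX))
}

# Mean-mean: A from the discrimination estimates instead.
mean_mean <- function(aX, aY, bX, bY) {
  A <- mean(aX) / mean(aY)
  c(A = A, B = mean(bY) - A * mean(bX))
}
```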

Characteristic curve methods
These methods rely on the ICCs and iteratively search for optimal A and B. The functions to be minimized for the Haebara (Hcrit) and Stocking-Lord (SLcrit) criteria are

Hcrit = Σ_i Σ_{j∈V} [p(θ_i; â_{Yj}, β̂_{Yj}, ĉ_{Yj}) − p(θ_i; â_{Xj}/A, A β̂_{Xj} + B, ĉ_{Xj})]²

and

SLcrit = Σ_i [Σ_{j∈V} p(θ_i; â_{Yj}, β̂_{Yj}, ĉ_{Yj}) − Σ_{j∈V} p(θ_i; â_{Xj}/A, A β̂_{Xj} + B, ĉ_{Xj})]²,

respectively. In both cases, V denotes the set of common items between test forms X and Y.
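The minimization can be handed to a general-purpose optimizer. The following base R sketch (an illustration only; irt.link() implements the actual methods) evaluates the Haebara criterion over an ability grid th for 3PL items and minimizes it with optim(); the parameter values are made up for the example:

```r
# 3PL ICC (D = 1.7) and the Haebara criterion as a function of (A, B).
p3 <- function(th, a, b, c, D = 1.7) c + (1 - c) / (1 + exp(-D * a * (th - b)))

hcrit <- function(AB, th, aX, bX, cX, aY, bY, cY) {
  A <- AB[1]; B <- AB[2]
  total <- 0
  for (j in seq_along(aX)) {   # sum over common items and grid points
    total <- total +
      sum((p3(th, aY[j], bY[j], cY[j]) -
           p3(th, aX[j] / A, A * bX[j] + B, cX[j]))^2)
  }
  total
}

# Example: form Y parameters generated from form X with A = 2, B = 1,
# so the minimizer should recover approximately (2, 1).
th <- seq(-4, 4, by = 0.1)
aX <- c(1.0, 1.5); bX <- c(-0.5, 0.5); cX <- c(0.2, 0.2)
fit <- optim(c(1, 0), hcrit, th = th, aX = aX, bX = bX, cX = cX,
             aY = aX / 2, bY = 2 * bX + 1, cY = cX)
round(fit$par, 2)   # close to (2, 1)
```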

IRT true-score and observed-score equating
After item parameters are estimated and placed on the same scale, IRT equating can be used to relate scores from two test forms in the following ways.

IRT true-score equating. A true score ξ from form X associated with a given θ_i is considered to be equivalent to the true score η from form Y that is associated with that same θ_i. The relation between ability and true scores is determined using the so-called test characteristic functions, T(θ_i), defined as

T(θ_i) = Σ_j p(θ_i; ω_j),

where the sum is over the items of the corresponding form. The resulting transformation to find the equivalent true score η of ξ is given by

η = ϕ(ξ) = T_Y(T_X⁻¹(ξ)).

In practice, estimates of the item parameters are used to produce an estimated true-score relationship for each test form by means of the test characteristic functions. Also, because true scores are not observable, the estimated true-score conversion is actually applied to the observed sum scores.
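The inversion of the form-X test characteristic function has no closed form, but since T is monotone in θ it can be done numerically. A base R sketch using uniroot() (illustrative only, assuming 3PL items; ξ must lie in the valid range (Σ_j c_j, J)):

```r
icc <- function(th, a, b, c, D = 1.7) c + (1 - c) / (1 + exp(-D * a * (th - b)))

# Test characteristic function of a form with item parameters par.
tcf <- function(th, par) sum(icc(th, par$a, par$b, par$c))

# True-score equating: xi -> theta (via uniroot) -> eta on form Y.
true_score_equate <- function(xi, parX, parY) {
  th <- uniroot(function(t) tcf(t, parX) - xi,
                interval = c(-10, 10), tol = 1e-8)$root
  tcf(th, parY)
}
```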
IRT observed-score equating. This method uses both the conditional (on ability) observed-score distributions and the ability distribution. The product of these two distributions is integrated (or summed) over all ability levels to produce a marginal observed-score distribution. Once this process has been completed for both X and Y, equipercentile equating is applied to relate scores between the two forms. Hence, if S_X and S_Y denote the marginal score distributions and G(θ) the ability distribution, the resulting transformation is

ϕ(x) = S_Y⁻¹(S_X(x)),

where

S_X(x) = ∫ S_X(x | θ) dG(θ),    S_Y(y) = ∫ S_Y(y | θ) dG(θ).

The SNSequate R package
SNSequate contains several functions written in R that carry out test equating for a variety of approaches. SNSequate is freely available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=SNSequate. In this section we briefly describe the functions currently available in SNSequate.
The functions mea.eq(), lin.eq(), and eqp.eq() perform mean, linear, and equipercentile equating, respectively. Their main arguments are the observed scores from the two test forms to be equated and the values on the scale that are to be equated.
The function le.eq() implements local equating. To obtain the equating transformation, this function first estimates the conditional score distributions and then performs equipercentile equating using the estimated conditional distributions.
The function ker.eq() implements the kernel method of equating under the EG, SG, CB, NEAT PSE, and NEAT CE designs. Currently, the Gaussian, uniform, and logistic kernels are available. ker.eq() makes calls to other functions in order to obtain the parameters that are needed for kernel equating (e.g., the bandwidth and score probability parameters h, r, and s, respectively). The following functions are available to perform the different tasks in the five steps of kernel equating: loglin.smooth() is a design-specific function that helps in the pre-smoothing (Step 1). The function also estimates the score probabilities (Step 2) needed for the computation of the equating function, and the C matrices needed in the calculation of the SEE. The bandwidth() function is both design and kernel specific and is used to obtain the optimal h values used in the continuization step. For the final stage (Step 5), the SEED() function computes the standard error of equating differences using two objects of class 'ker.eq' returned by ker.eq(). The function PREp() is also available to calculate the PRE diagnostic measure (see the information on KE in Section 2.1).
The irt.link() function implements four IRT parameter linking methods: the mean-mean, mean-sigma, Haebara, and Stocking-Lord methods. For the characteristic curve methods (i.e., Haebara and Stocking-Lord), besides the traditional logistic ICC used in IRT modeling, an asymmetric cloglog link is also available as an option.
Print and summary methods are provided for most of the objects returned by the functions described above.
In order to illustrate the use of the functions in SNSequate, six data sets widely analyzed in the equating literature are provided: Math20EG, Math20SG, and CBdata illustrate the KE method under the equivalent groups, single group, and counterbalanced designs, respectively (von Davier et al. 2004); and ACTmKB, KB36, and KB36.1PL illustrate mean, linear, and equipercentile equating as well as the mean-mean, mean-sigma, Haebara, and Stocking-Lord IRT parameter linking methods (Kolen and Brennan 2004).

Examples
In the following sections the main features of SNSequate are demonstrated using the provided data sets for each of the functions described in Section 3. Kolen and Brennan (2004) use data from two 40-item forms of the ACT mathematics test. Form X was administered to 4,329 examinees and form Y to 4,152. The following code can be used to reproduce Table 2.7 of Kolen and Brennan (2004), where equated scores are obtained by mean, linear, and equipercentile equating.

Mean, linear and equipercentile equating
R> data("ACTmKB", package = "SNSequate")
R> act.m <- mea.eq(rep(0:40, ACTmKB[, 1]), rep(0:40, ACTmKB[, 2]), 0:40)

The three functions mea.eq(), lin.eq(), and eqp.eq() receive as their first two arguments the observed scores from forms X and Y; the third argument corresponds to the values on the scale that are to be equated.

Kernel equating
As mentioned in Section 3, different functions are available to carry out the five steps that describe the KE method. In what follows we show examples of the use of loglin.smooth(), bandwidth(), ker.eq(), PREp(), and SEED() functions, for different kernel and equating designs.

Log-linear models for pre-smoothing
Log-linear models are mostly used for the pre-smoothing of score distributions (see, e.g., Holland and Thayer 1987, 2000; Moses 2011). In this step, the objective is to find a model that best describes the data (i.e., that adequately fits the score distribution) as parsimoniously as possible.
The loglin.smooth() function fits log-linear models of the general form

log(o_gh) = α + Σ_i β_i z_g^i + Σ_{i'} γ_{i'} w_h^{i'} + Σ_i Σ_{i'} δ_{ii'} z_g^i w_h^{i'},    (20)

where o_gh denotes the (joint) score probability at score values z_g and w_h. Equation 20 can be used to represent various models according to the different designs used. For instance, if the SG design is considered, then o = p, g = j, h = k, Z = X, W = Y, z = x, w = y, leading to the log-linear models used in an example below. In general, the possible values for the symbols in (20) depend on the chosen design. We illustrate the pre-smoothing step using the loglin.smooth() function and the Math20EG and Math20SG data for both univariate and bivariate frequency score distributions according to the EG and SG designs, respectively. The example replicates some results reported by von Davier et al. (2004), from where the data are obtained.
In the following example, a simple log-linear model with 2 power moments is used to obtain the estimated score probabilities r̂_j for the X scores. The degree = 2 argument means that a two-moment fit of the model is required and corresponds to the T_r term in the resulting log-linear equation

log(r_j) = α + Σ_{i=1}^{T_r} β_i x_j^i,    T_r = 2.

Similar code can be written in order to obtain the estimated score probabilities for the Y scores, ŝ_k. A useful tool in the selection of an appropriate log-linear model, which helps to assess discrepancies, is a plot showing both the observed and fitted score distributions. The following code can be used to obtain Figure 2.
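The two-moment fit just described is an ordinary Poisson log-linear regression, so its effect can be sketched with base R's glm() on a made-up frequency table (the counts below are invented for illustration; loglin.smooth() is the package implementation and additionally returns the C matrices):

```r
score <- 0:20
freq  <- c(1, 2, 5, 9, 15, 23, 33, 44, 54, 62, 66,   # invented counts
           62, 54, 44, 33, 23, 15, 9, 5, 2, 1)

# Degree-2 polynomial log-linear model, as with degree = 2 above.
fit <- glm(freq ~ poly(score, 2), family = poisson)

# Estimated score probabilities r_j (they sum to 1).
rj <- fitted(fit) / sum(fitted(fit))
```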

Selecting an optimal bandwidth parameter
The way to optimally select the bandwidth parameter h used in kernel equating is described in von Davier et al. (2004). The bandwidth() function automatically selects h by minimizing

PEN_1(h) + K · PEN_2(h),    (26)

where PEN_1(h) = Σ_j (r̂_j − f̂_h(x_j))², and PEN_2(h) = Σ_j A_j(1 − B_j). The terms A_j and B_j are such that PEN_2 acts as a smoothness penalty term that avoids rapid fluctuations in the approximated density (see, e.g., Chapter 10 in von Davier 2011b, for more details). The K term in (26) corresponds to the Kp argument of the bandwidth() function, and its default value is set to 1. The r̂ values are assumed to be estimated by polynomial log-linear models of a specific degree, which come from a call to loglin.smooth(). The following example shows how to obtain h_X:

R> hx.gauss <- bandwidth(scores = Math20EG[, 1], kert = "gauss", degree = 2,
+    design = "EG")
R> hx.gauss

Automatically selected bandwidth parameter:
[1] 0.6222771

Note that the bandwidth() function is design specific. That is, it will find the optimal values of the bandwidth parameters according to the selected design. For example, in the CB design, both h_X and h_Y depend on weights w_x and w_y, respectively. The arguments wx and wy can be varied to obtain, for instance, F_1 and G_1 or F_{1/2} and G_{1/2}, as shown in the following example:

R> data("CBdata", package = "SNSequate")
R> bandwidth(scores = CBdata$datx1y2, kert = "gauss",
+    degree = c(2, 2, 1, 1), design = "CB", Kp = 0, scores2 = CBdata$datx2y1,
+    J = 76, K = 77, wx = 1, wy = 1)

Automatically selected bandwidth parameter:
        hx        hy
1 0.5582462 0.6100749

R> bandwidth(scores = CBdata$datx1y2, kert = "gauss", degree = c(2, 2, 1, 1),
+    design = "CB", Kp = 0, scores2 = CBdata$datx2y1, J = 76, K = 77,
+    wx = 0.5, wy = 0.5)

Automatically selected bandwidth parameter:
        hx        hy
1 0.5580289 0.6246032

Note that in the previous examples, a call to loglin.smooth() is made in order to obtain r̂_j and ŝ_k by fitting log-linear models of power degree 2 for both X and Y and no interaction term.
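The PEN_1 part of criterion (26) is easy to sketch in base R using the density implied by the Gaussian continuization (an illustration only, with Kp = 0, i.e., no PEN_2 term; bandwidth() implements the full criterion):

```r
# Gaussian-kernel density implied by the KE continuization.
# xj: possible scores; rj: score probabilities; h: bandwidth.
ke_pdf <- function(x, xj, rj, h) {
  mu <- sum(xj * rj)
  s2 <- sum((xj - mu)^2 * rj)
  a  <- sqrt(s2 / (s2 + h^2))
  sum(rj * dnorm((x - a * xj - (1 - a) * mu) / (a * h))) / (a * h)
}

# PEN_1(h): squared distance between r_j and f_h at the score points.
pen1 <- function(h, xj, rj) {
  fh <- sapply(xj, ke_pdf, xj = xj, rj = rj, h = h)
  sum((rj - fh)^2)
}

# A crude bandwidth choice: minimize PEN_1 over a range of h values,
# e.g., optimize(pen1, interval = c(0.1, 3), xj = xj, rj = rj).
```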

Obtaining equated scores
As described in Section 2.1, the conversion of scores from test X to test Y is based on ê_Y(x) = G⁻¹_{h_Y}(F_{h_X}(x; r̂); ŝ), where r̂ and ŝ are estimated score probabilities obtained in the pre-smoothing stage (e.g., by using the loglin.smooth() function), and h_X and h_Y are bandwidth parameters (that can be optimally obtained by using the bandwidth() function). The ker.eq() function computes the equating transformation used in the KE method for various designs. The function makes calls to both bandwidth() (in case the bandwidth parameters are not specified by the user) and loglin.smooth(). For example, the mod.gauss object used in the following sections can be obtained as

R> mod.gauss <- ker.eq(scores = Math20EG, kert = "gauss", degree = c(2, 3),
+    design = "EG")

Further, the ker.eq() function calculates many useful quantities such as summary statistics for the data used, the actual equated values, the SEE, and the SEE vector used by SEED() to obtain standard errors of the equating difference between two equating functions, among others. A summary() method which summarizes the most important output is also available.

Evaluation of KE results
As mentioned in Section 2.1, the percent relative error (PRE) serves as a measure to assess the adequacy of the estimated equating function ê_Y(x). The measure is formally defined as

PRE(p) = 100 (µ_p(e_Y(X)) − µ_p(Y)) / µ_p(Y),

where µ_p(Y) = Σ_k (y_k)^p s_k and µ_p(e_Y(X)) = Σ_j (e_Y(x_j))^p r_j. Similar formulas apply when equating from Y to X. The PREp() function can be used for this purpose. It takes as arguments an object of class 'ker.eq' and the number of moments to be evaluated, and gives the corresponding PRE values as output. For example, using the mod.gauss object, the first 10 moments can be obtained as

R> PREp(mod.gauss, 10)
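The PRE formula itself is a short computation given the equated values and the score probabilities. A base R sketch (illustrative only; PREp() performs this for 'ker.eq' objects):

```r
# PRE of moment p: compare the p-th moment of the equated X scores
# (weights rj) with that of the Y scores (weights sk).
pre_p <- function(eq_x, y, rj, sk, p) {
  mu_y  <- sum(y^p * sk)
  mu_eq <- sum(eq_x^p * rj)
  100 * (mu_eq - mu_y) / mu_y
}
```

When the equated score distribution reproduces the Y distribution exactly, PRE(p) is 0 for every moment p.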
The ker.eq() function internally calculates the Jacobian matrices J_{e_Y} and J_{DF}. The C matrix is obtained as an output of loglin.smooth(). As shown in the previous examples, the SEE is part of the summary() method for 'ker.eq' objects.
A graphical display of the SEE's range for each score point can easily be obtained. Figure 3 shows an example when equating from X to Y; it was generated using the code

R> plot(score, mod.gauss$SEEYx, ylab = "SEEy(x)", xlab = "X raw Score")

In order to decide between two equating functions, von Davier et al. (2004) propose the use of a measure called the standard error of equating difference (SEED), formally defined as

SEED_Y(x) = sqrt(Var(ê_{Y,1}(x) − ê_{Y,2}(x))) = || J_{e_{Y,1}} J_{DF,1} C_1 − J_{e_{Y,2}} J_{DF,2} C_2 ||,

which is based on the SE-vector J_{e_Y} J_{DF} C of each equating function being considered in the comparison. The SE-vector is part of the output of ker.eq(). The SEED() function takes as arguments two objects of class 'ker.eq' and returns the SEED between them. For example, following Theorem 1.1 from von Davier et al. (2004), the KE function approximates the standard linear equating function for large values of the bandwidth parameters. The previously obtained mod.gauss object is used to evaluate the difference between the Gaussian KE function and the linear equating method when equating from X to Y, using SEED() as follows:

R> mod.linear <- ker.eq(scores = Math20EG, kert = "gauss", hx = 20, hy = 20,
+    degree = c(2, 3), design = "EG")
R> seed <- SEED(mod.gauss, mod.linear)$SEEDYx
R> seed

From the output it can be seen that the SEE differences range from 0.01 to 0.21. The difference between each of the equated values obtained using the Gaussian KE function and those obtained using the approximated linear equating function can be assessed for significance by comparing them to their uncertainty (i.e., the SEED). A graphical alternative is to plot such differences of equated values along with ±2SEED bands.
Figure 4 shows an example of this; it was generated using the following code:

R> Rx <- mod.gauss$eqYx - mod.linear$eqYx
R> plot(score, Rx, ylim = c(-0.8, 0.8), pch = 15)
R> abline(h = 0)
R> points(score, 2 * seed, pch = 0)
R> points(score, -2 * seed, pch = 0)
R> legend("topright", pch = c(0, 15), c("+-2SEED", "R(x)"))

If kernels other than the Gaussian are used for equating, the SEED is a useful tool to compare and decide between them. Figure 5 shows the difference between the Gaussian kernel equating function and both the logistic and uniform kernel equating functions. The figure was generated using analogous code and replicates results found in Lee and von Davier (2011).

Concluding remarks
The development of new equating models has become common in equating research. Motivated by the lack of software implementing traditional and new equating methods, this paper introduced the SNSequate R package which implements both standard and nonstandard statistical models and methods for test equating.
The examples used to demonstrate the capabilities of SNSequate were based on widely known data sets that have been analyzed in the equating literature, which helps illustrate the implemented methods. Moreover, all the examples can easily be modified to accommodate users' data.
Many improvements can be made in future versions of the package. For instance, plot() methods can be added for the objects returned by the loglin.smooth(), ker.eq(), and SEED() functions. This would allow one to easily obtain plots of the fitted score distributions, and of the observed and fitted conditional means and standard deviations of the two distributions being considered for the equating. In the case of ker.eq(), plots of the continuized distributions, as well as of the estimated equating functions evaluated at each score point, will be considered. For SEED(), this would allow one to reproduce a plot like the one shown in Figure 4. These improvements will be implemented in the next version of SNSequate.
The flexibility and modularity of SNSequate allow one to easily extend existing methods and implement new ones. The following methods are currently being investigated by the author and will be part of future versions of SNSequate:

Bayesian nonparametric methods of equating. These methods have shown good performance compared to alternative approaches (e.g., Karabatsos and Walker 2009). A Bayesian nonparametric model for test equating which allows the use of covariates in the estimation of the score distribution functions that lead to the equating transformation is described in González, Barrientos, and Quintana (2013). A major feature of this approach is that the complete shape of the score distribution may change as a function of the covariates. As a consequence, the form of the equating transformation can change according to covariate values.
Epanechnikov and adaptive kernels. Cid and von Davier (2009) examined potential boundary bias effects in the score distributions continuized by kernel smoothing. The use of the Epanechnikov and adaptive kernels in the framework of kernel equating is currently being investigated by the author.
IRT equating methods based on asymmetric ICCs. Although the use of asymmetric ICCs has been studied in IRT models (e.g., Samejima 2000), their role in IRT equating methods is unclear. IRT equating methods based on asymmetric ICCs are planned to be included in future versions of SNSequate. Additionally, besides the cloglog link currently implemented in the irt.link() function, other asymmetric links such as the skew-normal, will be included.