pcIRT: An R Package for Polytomous and Continuous Rasch Models

A key task in psychological assessment is the scaling of new psychological tests and questionnaires. Item response theory (IRT) is a well-established framework for this area of research. At present, IRT comprises a great variety of models, but to date, relatively little attention has been paid to the scaling of nominal and continuous data. The R package pcIRT provides functions for estimating IRT models for polytomous (nominal) and continuous data: the multidimensional polytomous Rasch model (Rasch 1961) and the continuous rating scale model (Müller 1987). Both models are extensions of the dichotomous logistic Rasch model (Rasch 1980) and retain its key feature of the separability of structural and nuisance parameters. The multidimensional polytomous Rasch model is suitable for nominal data under the assumption of a multidimensional space for the response categories, and the continuous rating scale model is a direct extension for continuous data of the rating scale model developed by Andrich (1978).


Introduction
Instruments such as tests and questionnaires that measure a latent trait are well-established research methods in the social sciences. Item response theory (IRT) provides a range of statistical models for scaling these measurement tools. IRT is the predominant scaling method used in large-scale assessments administered in an educational context (cf. OECD 2012; Olson, Martin, and Mullis 2008) and is also an important scaling method for psychological tests and questionnaires. To date, the majority of IRT models have focused on dichotomous and ordinal data. However, new and innovative response formats make it necessary to provide scaling methods for other data types as well. The newly developed package pcIRT (Hohensinn 2018), written in R (R Core Team 2018), makes scaling methods for polytomous (nominal) and continuous data available to the user. More specifically, package pcIRT provides easy-to-use functions for estimating the multidimensional polytomous Rasch model (MPRM), which handles polytomous (nominal) data. In addition, the package allows the estimation of the continuous rating scale model (CRSM), which scales continuous data. Package pcIRT is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=pcIRT.

IRT comprises a great variety of models, many of which have already been implemented in various R packages. An up-to-date overview of R packages for IRT measurement is provided by the CRAN Task View on Psychometric Models and Methods (Mair 2018).
The main characteristic of all IRT models is that they model the relationships between manifest behavior on test tasks or questionnaire items and latent traits. Thus, IRT is based on statistical models obtained by estimating the parameters of an item and of the respondents' characteristics (Baker and Kim 2004). In the (dichotomous logistic) Rasch model (Rasch 1980), the probability of a response of respondent v to item i, X_vi = 1, is given by the following logistic function:

P(X_vi = 1 | θ_v, β_i) = exp(θ_v − β_i) / (1 + exp(θ_v − β_i)),   (1)

where β_i is interpreted as the difficulty of task or question i and θ_v represents the ability or characteristic of the measured latent trait of respondent v.
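The logistic function in Equation 1 can be sketched in a few lines of base R; `rasch_prob` is an illustrative helper name, not a pcIRT function:

```r
# Minimal sketch of the dichotomous Rasch model of Equation 1:
# P(X_vi = 1) = exp(theta_v - beta_i) / (1 + exp(theta_v - beta_i)),
# which is the standard logistic function of (theta_v - beta_i).
rasch_prob <- function(theta, beta) {
  plogis(theta - beta)
}

# A respondent whose ability equals the item difficulty has a
# 50% chance of solving the item:
rasch_prob(theta = 0, beta = 0)   # 0.5
```

As the example shows, the solution probability increases in the ability θ_v and decreases in the difficulty β_i.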
The main characteristic of the Rasch model is the existence of sufficient statistics for the parameters and, consequently, the ability to estimate the structural (item) parameters separately from the nuisance (person) parameters. This separate estimation is realized through the derivation of the conditional likelihood for the Rasch model, in which the person parameters are conditioned out. The estimation of item parameters using the conditional likelihood is well known as conditional maximum likelihood (CML) estimation (see Baker and Kim 2004). Several extensions of the Rasch model given in Equation 1 have been derived that retain this key feature, whereas other IRT models, e.g., the 3-PL model developed by Birnbaum (1968), are no longer models of the exponential family. Like the eRm package (Mair, Hatzinger, and Maier 2015), the pcIRT package is based on CML estimation. Mair and Hatzinger (2007) discuss the advantages of extended Rasch models belonging to the exponential family and of CML estimation in great detail; please refer to their paper for an extensive explanation.
Extensions of the dichotomous logistic Rasch model that retain the characteristic of being a member of the exponential family can be divided into two broad categories: first, models for polytomous and continuous data, and second, multidimensional models that assume more than one latent trait θ. Among the larger set of R packages for IRT models, the eRm package (Mair et al. 2015) is the only one that implements CML estimation and includes functions for extended Rasch models that maintain parameter separability. Note that the eRm package offers a variety of functions for estimating unidimensional Rasch models for both dichotomous and polytomous data. By contrast, the pcIRT package presented here offers functions for estimating the multidimensional polytomous Rasch model (MPRM) and a (unidimensional) extension of the Rasch model for continuous data: the continuous rating scale model (CRSM).
In its conception, the MPRM is an exceptional model. Its main difference from the nominal response model developed by Bock (1972) (included, for example, in the mcIRT package, Reif 2014) is its assumption of a multidimensional latent space. Furthermore, its primary difference from all "typical" multidimensional IRT models is that the MPRM assumes a latent trait for each response category instead of latent traits for subsets of items.
Other R packages such as TAM (Kiefer, Robitzsch, and Wu 2015) and mirt (Chalmers 2012) allow the estimation of multidimensional IRT models but use marginal maximum likelihood (MML) estimation instead of a CML estimation procedure.
Moreover, package pcIRT allows the MPRM to be estimated in an easy-to-use way while still allowing the user to set a variety of constraints in the design matrix. In addition, the package offers functions for estimating the CRSM and the Rasch model. This paper focuses on describing the MPRM and its implementation in greater detail, with illustration through examples.

Multidimensional polytomous Rasch model
Discussions of multidimensional IRT (MIRT) models usually concern a specific family of models whose common feature is that each item is assigned to one specific latent trait or to more than one trait (for an overview of multidimensional IRT models, see Reckase 2009). The multidimensional polytomous Rasch model (MPRM), which was formulated by Georg Rasch himself (Rasch 1961), has a completely different model formulation. It is also a multidimensional model, in that it assumes a multidimensional latent trait space. However, in contrast to other well-known MIRT models, the MPRM assigns response categories, rather than items, to latent traits. Huang and Mislevy (2010) demonstrated how the MPRM provides additional information about a multiple-choice achievement test: They analyzed a multiple-choice mechanics test in physics and aimed at measuring both the competence level and the (mis)conceptions of students who are learning physics. Therefore, the distractors for the items were developed to represent different fundamental misconceptions (e.g., an Aristotelian instead of a Newtonian view). In this application, the MPRM assumes a latent dimension θ for each response category, that is, for each conceptual approach in physics. The results provide details regarding the difficulties associated with each distractor and information about the (mis)conceptions to which each student clings.
To emphasize the differences between the MPRM and "typical" MIRT models, as a first step, the very general model formulation developed by Adams, Wilson, and Wang (1997) is presented, which is given by

P(X_vih = 1 | θ_v, η) = exp(b_ih' θ_v + a_ih' η) / Σ_{l=1}^{m} exp(b_il' θ_v + a_il' η).   (2)

The probability of responding to item i in category h depends on the latent traits θ_d = (θ_1d, . . . , θ_Nd) and the item component parameters η_k; v indicates the person, and d indicates the latent trait. The vectors a_ih and b_ih are the rows of the design matrices A and B, respectively. Now, for deriving the MPRM from the general model given in Equation 2, the design matrices A and B must be specified in the following way: Each response category corresponds to a different underlying latent trait d. Consequently, the number of latent dimensions d must equal the number of response categories, and B is a three-dimensional diagonal matrix that maps each response category h exclusively to one θ_d, as shown in Table 1.

Table 1: An example of the design matrix B for a certain item i. Each response category h is assigned to a different latent dimension θ_d in the MPRM.

        θ_1  θ_2  θ_3
h = 1    1    0    0
h = 2    0    1    0
h = 3    0    0    1

Table 2: An example of the design matrix A for item i = 1, assuming four items with three response categories each (item component parameters η_1, . . . , η_12).

        η_1  η_2  η_3  η_4  η_5  η_6  η_7  η_8  η_9  η_10  η_11  η_12
h = 1    1    0    0    0    0    0    0    0    0    0     0     0
h = 2    0    1    0    0    0    0    0    0    0    0     0     0
h = 3    0    0    1    0    0    0    0    0    0    0     0     0
A design matrix A expresses the relations between each response category h and the item component parameters η. In the MPRM, an item parameter is assigned to each item for each category, which means that they are item category parameters. Continuing with the example of three response categories h = 1, 2, 3, four items i = 1, . . . , 4 are now assumed. This results in a total of 12 item component parameters η_k. Table 2 shows the design matrix A for item 1. There are three item parameters for item i = 1: η_1 for category 1, η_2 for category 2 and η_3 for category 3. Accordingly, η_4, η_5 and η_6 are the item parameters for item 2, and so on. For the MPRM, the term "item category parameter" expresses the meaning of the parameters more precisely than the general term "item component parameter". Therefore, these parameters will henceforth be referred to as the item category parameters β_ih. For example, the item component parameter η_2, which is assigned to item 1 and category 2, is denoted by β_12. Restricting the general MIRT model expressed in Equation 2 in this way leads to the MPRM (cf. Andersen 1995):

P(X_vih = 1 | θ_v, β_i) = exp(θ_vh + β_ih) / Σ_{l=1}^{m} exp(θ_vl + β_il).   (3)

The probability that a person v responds to item i in category h depends on a person parameter, θ_vh, and the item category parameter, β_ih. The MPRM described thus far suffers from over-parameterization. This is avoided by imposing the following constraints (see Fischer 1974):

β_im = 0 for i = 1, . . . , r,   Σ_{i=1}^{r} β_ih = 0 for h = 1, . . . , m,   θ_vm = 0,   (4)

where m is the highest response category, h = 1, . . . , m, and r is the number of items, i = 1, . . . , r. The item category parameters for the highest category are set to zero. In the example given above, this reduces the number of estimated η_k parameters to eight (see Table 2) and the number of estimated θ_d parameters for person v to two (see Table 1).
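The category probabilities of the MPRM can be sketched directly from Equation 3; `mprm_prob` below is an illustrative helper, not part of the pcIRT package:

```r
# Sketch of the MPRM response-category probabilities (Equation 3):
# P(X_vih = 1) = exp(theta_vh + beta_ih) / sum_l exp(theta_vl + beta_il),
# with theta and beta vectors of length m (one entry per category).
mprm_prob <- function(theta, beta) {
  num <- exp(theta + beta)
  num / sum(num)
}

# Three categories; the normalization fixes the highest category at zero:
theta <- c(0.5, -0.2, 0)   # person parameters theta_v1, theta_v2, theta_v3
beta  <- c(1.0,  0.3, 0)   # item category parameters beta_i1, beta_i2, beta_i3
p <- mprm_prob(theta, beta)
sum(p)   # the probabilities over the m categories sum to 1
```

The toy values follow the normalization of Equation 4, with the parameters of the highest category set to zero.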
As a consequence, it follows that the MPRM reduces to the dichotomous logistic Rasch model in the case of only two categories.
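This reduction can be checked numerically; the sketch below assumes the additive parameterization of Equation 3, under which the category-1 parameter acts as an easiness parameter (i.e., it corresponds to −β_i of Equation 1):

```r
# Numerical check: with two categories and the second category fixed at
# zero (Equation 4), the MPRM category-1 probability collapses to the
# dichotomous Rasch model. Note that beta1 here is an easiness
# parameter, i.e., beta1 = -beta_i in the notation of Equation 1.
mprm_prob2 <- function(theta1, beta1) {
  exp(theta1 + beta1) / (exp(theta1 + beta1) + 1)
}
rasch_prob <- function(theta, beta) plogis(theta - beta)

all.equal(mprm_prob2(theta1 = 0.8, beta1 = -1.2),
          rasch_prob(theta = 0.8, beta = 1.2))   # TRUE
```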
In contrast to most other MIRT models, the MPRM belongs to the family of Rasch models, which means that sufficient statistics for the parameters are available. For the MPRM given in Equation 3, the vector y_v = (y_v1, . . . , y_vm) that sums all responses of person v for each category h,

y_vh = Σ_{i=1}^{r} x_vih,   (5)

provides sufficient statistics for the person parameter vector of person v, θ_v. By virtue of the existence of the sufficient statistics y_v, the conditional likelihood for the data set X can be derived as follows (Andersen 1973):

L_c(β; X | Y) = Π_{v=1}^{n} exp( Σ_{i=1}^{r} Σ_{h=1}^{m} x_vih β_ih ) / γ_{y_v}(β),   (6)

with Y based on the data set X according to Equation 5. γ is the combinatoric function that yields the number of all possible response vectors compatible with a given marginal response vector y_v. Andersen (1972) developed a computationally efficient algorithm for the recursive calculation of γ. The conditional likelihood expressed in Equation 6 enables parameter estimation using the CML method.
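The recursive idea behind the calculation of γ can be sketched as follows; this plain recursion is for illustration only and is not the package's optimized routine:

```r
# Sketch of the recursive computation of the combinatoric function
# gamma (cf. Andersen 1972). gamma_y(beta) sums exp(sum_i beta[i, h_i])
# over all response vectors (h_1, ..., h_r) whose category counts equal
# the margin y. beta is an r x m matrix of item category parameters.
gamma_rec <- function(beta, y, i = nrow(beta)) {
  if (i == 0) {
    return(as.numeric(all(y == 0)))  # empty test: only the zero margin remains
  }
  total <- 0
  for (h in seq_along(y)) {
    if (y[h] > 0) {
      y_new <- y
      y_new[h] <- y[h] - 1           # item i responds in category h
      total <- total + exp(beta[i, h]) * gamma_rec(beta, y_new, i - 1)
    }
  }
  total
}

# With all item parameters zero, gamma simply counts the response
# vectors with a given margin, e.g., 2 vectors for two items and
# margin y = (1, 1):
gamma_rec(matrix(0, nrow = 2, ncol = 2), y = c(1, 1))   # 2
```

Processing the items one at a time in this way is what makes the recursive scheme efficient compared with enumerating all m^r response vectors.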
Of course, the MPRM is primarily a model for scaling nominal data, as it has been used, for instance, by Huang and Mislevy (2010). However, the MPRM also offers other interesting applications. Andersen (1973) and Fischer (1974), in particular, note the possibility of testing whether the assumed multidimensional space Θ can be reduced to a unidimensional one. This test for unidimensionality is performed by setting the item category parameters β i1 , . . . , β im for an item i as linearly dependent: β ih = β i · φ h . In this manner, one can test whether the assumption of ordinal scoring (an assumption that is always made, e.g., with regard to rating scales for questionnaires) holds and, furthermore, whether the scoring function of an ordinal scale can be estimated. This opens up an important field of application in the context of test and questionnaire construction.

General
Like most R packages, pcIRT uses S3 classes. The MPRM function enables the user to set constraints on the model in a flexible way by specifying a design matrix through the argument desmat as well as the arguments ldes and lp. How the restrictions on this model can be set is explained in greater detail and illustrated in Section 4.2. Furthermore, in Section 2, the unidimensionality test for the MPRM was mentioned. This test is implemented in an extended form. In its original conception, the test assumes the linear dependency β_ih = β_i φ_h for i = 1, . . . , r and h = 1, . . . , m. The MPRM function also allows linearly dependent parameters to be set for only a subset of user-specified items i and categories h. This leads to far greater flexibility in testing hypotheses regarding the unidimensionality of different response categories. The resulting conditional likelihood of the adapted MPRM is derived as follows:

L_c(β, φ; X | Y) = Π_{v=1}^{n} exp( Σ_{i=1}^{r} Σ_{h=1}^{m} x_vih [(1 − c_ih) β_ih + c_ih β_i φ_h] ) / γ_{y_v}(β, φ),   (7)

where c_ih is an indicator variable. Thus, c_ih = 1 indicates that the parameter of item i and category h is specified as linearly dependent. As in Equation 6, Y contains the sufficient statistics for the person parameters and is defined in Equation 5. This conditional likelihood is maximized by the optim function from the base R stats package, using the BFGS optimizer by default. It is possible to choose another available optimizer using the control argument.
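The general optimization pattern can be sketched with a toy example; the negative log-likelihood below is illustrative only and is not pcIRT's actual conditional likelihood:

```r
# Sketch of the optimization pattern used for CML estimation: a
# negative log-likelihood is minimized with optim() and the BFGS
# method. negll() is a stand-in toy function (a Bernoulli likelihood
# with a single logit parameter), not the package's internal code.
x <- c(1, 1, 0, 1, 0)   # toy dichotomous responses
negll <- function(par, x) {
  -sum(dbinom(x, size = 1, prob = plogis(par), log = TRUE))
}
res <- optim(par = 0, fn = negll, x = x, method = "BFGS")
plogis(res$par)   # close to the observed proportion mean(x) = 0.6
```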
The functions for estimating the model are supplemented with several additional functions that allow the user to check the fit of the model. A function for generating data sets in accordance with the MPRM is also included.

Accuracy of parameter estimation
The described functions of the pcIRT package were evaluated through simulation studies. The results of one of these simulations are presented here to illustrate the accuracy of parameter estimation.
Considering that the main focus of this paper is on the MPRM, a comparison of the parameter estimation accuracy of the pcIRT package with that of the TAM package is presented for this model. For this purpose, 1000 data sets were simulated with the number of items r = 15, the number of categories m = 3 and the sample size n = 1000. The β_ih were randomly drawn from N(0, 1.5) once and then fixed (parameters ∈ (−2.41; 4.01)). The θ_vh were drawn from N(0, 1). The MPRM was estimated using the pcIRT package and the TAM package. Figure 1 shows the differences between the fixed β_ih values and the estimated ones (the means and the quantiles at 0.05 and 0.95 are displayed). β_12,1 and β_14,2 show a larger difference compared with the other parameters because these parameters have the highest absolute values. The estimates produced by the two packages are so similar that the two facets of the plot look almost identical.
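One replication of this simulation design can be sketched in base R; the normalization of the last category is an assumption added here for consistency with Equation 4, and the code is a simplified sketch rather than the study's actual simulation script:

```r
# Sketch of one replication of the simulation design: fixed item
# category parameters (r = 15 items, m = 3 categories), person
# parameters drawn from N(0, 1), and responses drawn from the MPRM
# category probabilities of Equation 3.
set.seed(1)
r <- 15; m <- 3; n <- 1000
beta  <- matrix(rnorm(r * m, 0, 1.5), nrow = r)   # item category parameters
beta[, m] <- 0                                    # normalization (Equation 4)
theta <- matrix(rnorm(n * m), nrow = n)           # person parameters
theta[, m] <- 0                                   # normalization (Equation 4)

dat <- matrix(NA_integer_, nrow = n, ncol = r)
for (v in 1:n) {
  for (i in 1:r) {
    p <- exp(theta[v, ] + beta[i, ])              # unnormalized probabilities
    dat[v, i] <- sample.int(m, size = 1, prob = p / sum(p))
  }
}
table(dat[, 1])   # category frequencies for item 1
```

A data matrix generated in this way could then be passed to both estimation routines to compare the recovered β_ih with the fixed values.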
According to Figure 1, the mean square errors of parameter estimation are 0.01297 for the pcIRT package and 0.01285 for the TAM package. The median time required to estimate the MPRM was 2.55 seconds for the TAM package and 33.89 seconds for the pcIRT package.

Application of the MPRM function
This section illustrates the application of the MPRM with examples.

Estimation of the MPRM
As noted in Section 2, the MPRM function treats the input data as nominal. Thus, the main field of application of the MPRM is the scaling of nominal data.

Figure 1: Differences between the fixed and the estimated β_ih values (means and the 0.05 and 0.95 quantiles) for the TAM and pcIRT packages. The labels identify the parameters, e.g., the parameter β_11 for item 1 and category 1 is labeled as I1c1.

The application of Huang
and Mislevy (2010) described above illustrates this main purpose very well. Therefore, a different example was chosen here to provide insight into the variety of possible applications. The example concerns a fundamental problem in test and questionnaire construction: Very often, scoring functions are determined without further empirical evaluation. Subsequently, the MPRM is used to assess and, finally, to estimate the appropriate scoring weights of the response categories. This example follows an approach described in Fischer (1974).
Intelligence and competence tests are often administered as speed-and-power tests by additionally recording the time taken by the examinee to give a response. The purpose of this analysis is to determine whether a fast solution expresses a different competence or facet of ability than a slow solution. If this is true, then a fast solution would be linked to a different latent trait than a slow solution, and a scoring method that awards additional points for fast solutions (which is done in some intelligence tests) would be inappropriate. Otherwise, if all three categories represent only graduated levels of the same competency, such awarding of points would lead to a higher measurement precision. However, in either case, the selection of an appropriate scoring function for the response categories is crucial. This question is studied using the MPRM.
The data set reason is used, which contains the responses and response times for each of eleven items on the reasoning test "META" (Gatternig and Kubinger 1994). The respondents are asked to solve encoding tasks without time pressure. The data set was obtained from a low-stakes test situation and includes observations of 404 students. After excluding three respondents who quit the test and 21 persons who apparently only clicked through the test, data from 380 students are analyzed. As usual, the items are scored as "not solved" (X vi = 0) or "solved" (X vi = 1). In addition to the item responses, the response times in seconds for each item are stored.
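A three-category coding of this kind (fast solution, slow solution, not solved) could be derived from the responses and response times as sketched below; the item-wise median split of the solvers' times is an assumption made here purely for illustration and is not necessarily the coding used for the reason data:

```r
# Hedged sketch: deriving a three-category variable from 0/1 responses
# and response times, using an item-wise median split of the solvers'
# times as an illustrative (assumed) threshold.
set.seed(42)
n <- 20; r <- 4                                    # small toy data set
resp <- matrix(rbinom(n * r, 1, 0.7), n, r)        # 0/1 responses
rt   <- matrix(rexp(n * r, rate = 0.05), n, r)     # response times in seconds

cat3 <- matrix(NA_integer_, n, r)
for (i in 1:r) {
  cutoff <- median(rt[resp[, i] == 1, i])           # item-wise time threshold
  cat3[, i] <- ifelse(resp[, i] == 0, 3L,           # 3 = not solved
                 ifelse(rt[, i] <= cutoff, 1L, 2L)) # 1 = fast, 2 = slow solution
}
```

The resulting matrix with categories 1, 2, 3 has the nominal format expected by the MPRM function.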
The generic summary function displays the convergence status and the estimated item category parameters. Finally, 20 item parameters are estimated; the parameters of the last (eleventh) item and those for the highest category are fixed for normalization reasons (in accordance with Equation 4). The highest category is the reference category and is set to 0. For the fixed parameters, no SE is estimated, and the SE is therefore reported as NA in the summary. The normalization constraints mentioned above must be taken into account when interpreting the parameter estimates. For example, when comparing the different category parameters of item 1, one should be aware that these estimates are affected by the other item category parameters because of the normalization within a category to a sum of zero. With this in mind, item 1 seems to be a rather easy item in general: a response in the category "fast solution" is more likely than one in the category "slow solution", and both "solved" categories are more likely than the (reference) category "not solved". By contrast, for item 4, it appears distinctly more difficult to solve the item quickly than to solve it with a longer response time. The task of item 4 therefore seems to be considerably more difficult than that of item 1. Figure 2 displays the item characteristic curves for item 4. The plots show how the response probability for a certain response category varies with the location on the latent traits. For instance, the ICC shows that the response probability for category 3 increases with lower latent trait locations on the first two latent trait dimensions θ_1 and θ_2; this trait location corresponds to a high position on θ_3. These plots make it clear that the person parameters are interdependent, which is already obvious from the normalization constraints of the person parameters (cf. Equation 4).

In the next step, the person parameters θ_vh are estimated for all observed raw score vectors except those with extreme scores (that is, for vectors with at least one observation and at most r − 1 observations in each response category). The output displays part of a table providing the person parameter estimates for the observed raw score vectors. The row names indicate the raw score vectors. According to the normalization constraints given in Equation 4, the person parameters must be interpreted as intra-individual tendencies to choose a response category. For instance, respondents with a raw score vector of x_v = (1, 1, 9) have a strong tendency to respond in category 3, with θ_3 estimated at 1.69.
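The raw score vectors y_v (the sufficient statistics of Equation 5) underlying this table can be tabulated from a categorical data matrix as sketched below, using a tiny made-up matrix for illustration:

```r
# Sketch: tabulating the raw score vectors y_v (category counts per
# person) from an n x r matrix of nominal responses with categories
# 1..m. The two-row matrix here is a made-up example.
cat3 <- matrix(c(1, 2, 3, 1,
                 3, 3, 3, 2), nrow = 2, byrow = TRUE)
m <- 3
rawscores <- t(apply(cat3, 1, function(x) tabulate(x, nbins = m)))
rawscores
# person 1 responded twice in category 1 and once each in categories 2
# and 3; person 2 responded once in category 2 and three times in 3
```

Each row of `rawscores` corresponds to one raw score vector y_v, and identical rows share the same person parameter estimates.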
Next, it is tested whether the distinction between fast and slow solutions is a gradual or a qualitative difference. In the former case, the β_ih parameters should be linearly dependent for a given item i. This is tested by means of a likelihood-ratio test comparing the likelihood of the MPRM with the likelihood of a restricted MPRM that includes the linear dependency β_ih = β_i · φ_h described in Section 2. To test this general unidimensional hypothesis, the dLRT function is applied.
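The mechanics of such a likelihood-ratio test can be sketched as follows; the log-likelihood values and parameter counts below are made-up numbers for illustration, not results from the reason data:

```r
# Generic sketch of the likelihood-ratio test comparing the general
# MPRM to the restricted (linearly dependent) model. The numbers are
# hypothetical; in practice they come from the two fitted models.
logL_general    <- -1502.3   # hypothetical log-likelihood, 20 parameters
logL_restricted <- -1510.1   # hypothetical log-likelihood, 11 parameters
lr <- -2 * (logL_restricted - logL_general)
df <- 20 - 11
p  <- pchisq(lr, df = df, lower.tail = FALSE)
c(LR = lr, df = df)
```

A non-significant result supports the restricted (unidimensional) model, i.e., the assumption that the categories differ only gradually.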

Estimation of a constrained MPRM
This section illustrates how the user can easily set constraints on the MPRM. To explain the relevant functions, the example from Section 4.1 is continued. The example as presented thus far, a case of three-categorical scoring with bonus points for a fast solution, is appropriate for a reasoning test. Now, using a constrained MPRM, the optimal scoring should be determined empirically. For this purpose, constraints are set on the MPRM such that each item category parameter of category 2 is linearly dependent on the corresponding parameter of category 1: β_i2 = β_i1 · φ. Furthermore, the scoring parameter is set equal across all items, resulting in one general φ. In essence, this is a unidimensional polytomous model with a scoring parameter to be estimated.
To obtain this specification, three arguments must be passed to the function MPRM: First, the design matrix must be built. The easiest way to achieve this is by using the automatically built design matrix for the estimated general MPRM (from the estimation presented in Section 4.1) and applying the desired modifications.

R> design <- MPRM.res$design
Each column of the design matrix represents an item category parameter that must be estimated, whereas each row displays the resulting item category parameters (including fixed parameters or parameters resulting from normalization). For the general model, the resulting number of item category parameters is 33 (11 items multiplied by 3 categories). Because of the normalization constraints (see Equation 4), 20 item category parameters are estimated in the general MPRM. Now, in this application, only the category parameters for category 1 and the scoring parameter φ (that is, a total of 11 parameters) need to be estimated. In the first step, the item category parameters of categories 2 and 1 are set equal in the design matrix for all items:

R> s1 <- seq(2, 30, by = 3)
R> s2 <- seq(1, 20, by = 2)
R> for (s in 1:length(s1)) design[s1[s], s2[s]] <- 1
R> designC <- design[, -c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)]

Modifying the design matrix in the presented way and using it in the function MPRM would lead to a constrained MPRM, with the item category parameters of category 2 set equal to those of category 1. In our example, however, we wish to estimate a constrained MPRM with linearly dependent (not equal) item category parameters. Therefore, it must be specified exactly which item category parameters are to be set as linearly dependent. This is achieved by means of the argument ldes. For this purpose, a vector is created with a length equal to the number of resulting item category parameters; each position represents the corresponding item category parameter. For example, in the present case, position 5 in the vector represents the parameter for item 2 and category 2 (β_22). In this vector, at the position of each linearly dependent item category parameter, one enters the position of the item category parameter on which it depends.
For example, if the item category parameter β_22 is to be estimated as linearly dependent on β_21, the position of β_21 (which is 4) must be entered into the vector at the position of β_22 (which is 5).
The vector then looks as shown in the code output below:

R> ldc <- rep(0, nrow(designC))
R> ldc[c(2, 5, 8, 11, 14, 17, 20, 23, 26, 29)] <- c(1, 4, 7, 10, 13, 16, 19,
+   22, 25, 28)
R> ldc

Finally, the user uses the argument lp to specify exactly how many scoring parameters φ need to be estimated and which of these scoring parameters are equal. In our example, only one φ should be estimated for all items. The three arguments desmat, ldes and lp are passed to the MPRM function. The code output displays the estimated φ with its standard error. This result yields the empirically derived optimal scoring proportions of (0; 0.6; 1) for the reason data set. Thus, an appropriate scoring approach for the test is to award 2 points for a fast and correct solution, 1.2 points for a slow and correct solution, and 0 points for an incorrect response.

Future plans and outlook
There are plans to add additional possibilities for assessing the fit of item and response patterns -that is, item and person fit indices for the models implemented thus far. Another major feature planned for implementation is the extension of the model estimation for handling values that are missing by design. In analogy to other extensions of the Rasch model, the likelihood will be modified such that the MPRM can be estimated even if not all items were administered to all examinees (which is often the case in large-scale assessments, for example).