Latent Class Probabilistic Latent Feature Analysis of Three-Way Three-Mode Binary Data

The analysis of binary three-way data (i.e., persons who indicate which attributes apply to each of a set of objects) may be of interest in several substantive domains as sensory proﬁling, marketing research or personality assessment. Latent class probabilistic latent feature models (LCPLFMs) may be used to explain binary object-attribute associations on the basis of a small number of binary latent variables (called latent features). As LCPLFMs aim to model object-attribute associations using a small number of latent features they may be more suited to analyze data with many objects/attributes than standard multilevel latent class models which do not include such a dimension reduction. In this paper we describe new functions of the plfm package for analyzing binary three-way data with LCPLFMs. The new functions provide a ﬂexible modeling approach as they allow to (1) specify diﬀerent assumptions for modeling statistical dependencies between object-attribute pairs, (2) use diﬀerent assumptions for modeling parameter heterogeneity across persons, (3) conduct a conﬁrmatory analysis by constraining speciﬁc parameters to pre-speciﬁed values, (4) inspect results with print , summary and plot methods. As an illustration, the models are applied to analyze data on the perception of midsize cars, and to study the situational determinants of anger-related behavior.


Introduction
The analysis of three-way data is of interest in several substantive domains. In personality psychology, for instance, one may study the behavior of persons in different situations (Kuppens, Van Mechelen, and Meulders 2004;Mischel, Shoda, and Mendoza-Denton 2002;Vansteelandt and Van Mechelen 1998) or one may study the activation of emotion components (e.g, appraisals, action tendencies) when persons react to situations (Ceulemans, Kuppens, and Van Mechelen 2012). In consumer research one may study consumers' perception of products in terms of attributes in order to derive a perceptual mapping of the product space (DeSarbo, Grewal, and Scott 2008;Monteiro, Dibb, and Almeida 2012;Torres and Bijmolt 2009). In sensory science it is common to study the sensory characterization of food samples by panellists (Meilgaard, Civille, and Carr 1999;Varela 2014).
Data from three-way studies (e.g., persons who judge object-attribute relations) may be analyzed in many ways. A common approach is to apply classical multivariate techniques (e.g., principal components analysis, factor analysis, correspondence analysis) to two-way data obtained by aggregating the three-way data across one of the modes (Varela 2014). However, as the analysis of such aggregated data provides only a partial view, specific three-way methods have been developed as well. In particular, with ratio-scaled three-way data popular threeway methods as Tucker3 and Candecomp/Parafac (Carroll and Chang 1970;Harshman and Lundy 1994) can be used to summarize the entities of each mode using a limited number of components. Functions for estimating these models are available in the R packages ThreeWay (Giordani, Kiers, and Del Ferraro 2014), PTAk (Leibovici 2010) and multiway (Helwig 2018). Furthermore, as these classical three-way methods actually make the rather strict assumption that component scores assigned to entities of one mode are homogeneous across entities of other modes, methods that allow to capture such heterogeneity in the component scores have recently been proposed. In particular, one may use a clusterwise extension of the Parafac model (Wilderjans and Ceulemans 2013), or (clusterwise extensions of) simultaneous components analysis (De Roover, Ceulemans, Timmerman, Vansteelandt, Stouten, and Onghena 2012b;De Roover, Timmerman, Van Mechelen, and Ceulemans 2013b;De Roover, Ceulemans, Timmerman, Nezlek, and Onghena 2013a;Helwig 2013). Simultaneous components analysis and its (clusterwise) extensions can be estimated using the R package multiway or a software program based on MATLAB code (De Roover, Ceulemans, and Timmerman 2012a).
In addition to component-based methods, classification-based methods have been developed for three-way data to classify the elements of one mode while accounting simultaneously for information of the other two modes. In particular, Vichi (1999) described a least-squares approach to one-mode clustering of continuous three-way data and Vichi, Rocci, and Kiers (2007) further extended this approach to combine clustering and dimension reduction. Basford and McLachlan (1985) used a finite mixture of multivariate normal distributions to conduct one-mode classification of continuous three-way data. This approach was further extended by Hunt and Basford (1999) to model both continuous and categorical variables and by Hunt and Basford (2001) to deal with missing observations. Viroli (2011) considered finite mixtures of matrix-normal distributions to model continuous three-way data. Finally Vermunt (2007) used a hierarchical mixture approach and adapted multilevel latent class models to analyze continuous or categorical three-way data. Finite mixtures of normal distributions can be estimated with the R packages mclust (Scrucca, Fop, Murphy, and Raftery 2016;Fraley and Raftery 2002), mixtools (Benaglia, Chauveau, Hunter, and Young 2009) and flexmix (Leisch 2004;Leisch 2007, 2008). Hierarchical mixture models and special cases thereof can be estimated with Latent GOLD (Vermunt and Magidson 2005).
Although the analysis of continuous three-way data has received a lot of attention, the analysis of binary three-way data may be of special interest in studies where persons have to make object-attribute judgments. More specifically, check-all-that-apply (CATA) questions have recently gained in popularity in sensory product characterization with consumers (Ares, Deliza, Barreiro, Giménez, and Gámbaro 2010;Ares and Jaeger 2013). Responding to CATA questions (i.e., selecting the relevant attributes from a list for each product) has been reported to be easier for participants while providing stable and reproducible product characterizations Ares et al. 2014).
As with continuous three-way data, methods for binary three-way data are often extensions of existing two-way methods. The deterministic hierarchical classes (HICLAS) model (De Boeck and Rosenberg 1988), for instance, explains two-way binary data by describing objects and attributes in terms of a limited set of binary latent variables (further called latent features). Several deterministic three-way extensions of the HICLAS model have been described (Leenen, Van Mechelen, and De Boeck 1999;Ceulemans, Van Mechelen, and Leenen 2003;Ceulemans and Van Mechelen 2004;Wilderjans, Ceulemans, and Kuppens 2012). In addition, Maris, De Boeck, and Van Mechelen (1996) developed a stochastic likelihood-based extension of the HICLAS model to analyze two-way object-by-attribute frequency data (obtained by aggregating three-way binary object × attribute × person data across persons). This model is further called the probabilistic latent feature model (PLFM). Meulders (2013) described the R package plfm to estimate the PLFM and to evaluate its fit. Furthermore several (multilevel) latent class extensions were proposed to extend PLFMs to the case of binary three-way data (Meulders, De Boeck, Kuppens, and Van Mechelen 2002;Meulders, De Boeck, and Van Mechelen 2003;Meulders, Tuerlinckx, and Vanpaemel 2013).
The goal of this paper is to describe the LCplfm function that has been added to the R package (R Core Team 2018) plfm (Meulders 2013) and that implements the (multilevel) LCPLFMs described by Meulders et al. (2003) and Meulders et al. (2013). The proposed LCplfm function offers a flexible modeling approach. First, it allows to specify different assumptions for modeling statistical dependencies between object-attribute pairs as well as different assumptions for modeling parameter heterogeneity across persons. Second, as the LCplfm function allows to constrain specific parameters to pre-specified values, it can handle confirmatory LCPLFM analysis. Third, in contrast to standard multilevel latent class models for three-way data (Vermunt 2007), LCPLFMs are suited to provide a parsimonious description of the data as they aim to model objects-attribute associations in terms of a small number of latent features. Finally, user-friendly print, plot and summary methods are added to enhance data analysis with the LCplfm function.
In Section 2 we first describe the original PLFM. Next, in Section 3 we describe multilevel LCPLFMs as described in Meulders et al. (2003) and Meulders et al. (2013). In Section 4 we describe the LCplfm function for probabilistic latent feature analysis of three-way binary data. In Section 5 we provide an example on the perception of midsize cars, and in Section 6 we use LCPLFMs to study the situational determinants of anger-related behavior. In Section 7 we conduct a small simulation study to investigate to what extent data generated using a specific decision rule can be fitted using another decision rule. Finally, in Section 8 we discuss topics that may be interesting to pursue in future research.

Probabilistic latent feature models
Probabilistic latent feature models (PLFMs, Maris et al. 1996) can be used to model objectby-attribute frequency data that are derived by aggregating binary object × attribute × person data across persons. The models have been used to analyze data from several substantive domains as marketing research (Candel and Maris 1997;Meulders 2013), personality psychology (Meulders 2013), psychiatric diagnosis (Maris et al. 1996), cross cultural psycholoy (Meulders, De Boeck, Van Mechelen, Gelman, and Maris 2001) and the facial perception of emotions (Meulders, De Boeck, Van Mechelen, and Gelman 2005).
Observed object-attribute associations are described using the observed variable D ijk which equals 1 if object j (j = 1, . . . , J) has attribute k (k = 1, . . . , K) according to person i (i = 1, . . . , I), and 0 otherwise. To explain observed object-attribute associations, PLFMs assume that persons first classify objects and attributes in terms of F binary latent features and that, in a next step, observed associations are derived by applying a decision rule to these classifications.
As an example consider consumers who judge for each of a set of car models (e.g., 'Porsche Cayenne', 'Ford S-Max', 'Toyota Prius') and for each of a set of attributes (e.g., 'low CO2 emissions', 'high status', 'comfortable', 'city-focused', 'powerful', 'spacious', 'safe') whether or not an attribute applies to certain car model. To explain binary car-attribute associations, PLFMs assume that consumers classify car models and attributes in terms of so-called latent features. In this example, the latent features could represent basic car properties that may be ascribed to a car and that can be linked to certain car attributes. For instance, the latent feature 'environmentally friendly' is likely to be attributed to the 'Toyota Prius', and this feature is likely to be linked with attributes as 'low CO2 emissions', 'city-focused', and 'high fuel efficiency'. Furthermore, the latent feature 'family car' is likely to be linked with the attributes 'spacious', 'safe', 'easy to enter' and this feature is likely to be attributed to a car model as the 'Ford S-Max'. Next, PLFMs assume that car-attribute associations are obtained by applying a decision rule to the pattern of features perceived in a certain car and the pattern of features linked to the attribute.
As another example, consider persons who judge for each of a set of situations and for each of a set of behaviors whether or not they would display a certain behavior (e.g., suppress anger, shout, quarrel) when they are being frustrated by someone in a certain situation (e.g., you are falsely accused of cheating on an exam, you bang your shins against a park bench, you find out that someone has told lies about you). To explain situation-behavior judgments PLFMs assume that persons classify situations and behaviors in terms of a number of latent features. In this case the latent features could for instance represent latent situational encodings (e.g., whether or not you are alone in the situation, whether the person that frustrates you has a higher status than you) that may elicit a certain behavior (shout, become irritated, bottle up anger). PLFMs assume further that situation-behavior judgments are derived by applying a decision rule to situation and behavior classifications.
The two-stage process of PLFMs can be formally described as follows: 1. The classification of object j in terms of latent features is modeled using latent Bernoulli variables. In particular, X ijkf ∼ Bern(σ jf ) equals 1 if object j has latent feature f (f = 1, . . . , F ) according to person i when she judges object-attribute pair (j, k), and 0 otherwise. Likewise, the classification of attribute k in terms of latent features is modeled using latent Bernoulli variables. More specifically, Y ijkf ∼ Bern(ρ kf ) equals 1 if attribute k is linked to latent feature f (f = 1, . . . , F ) according to person i when she judges object-attribute pair (j, k), and 0 otherwise. Note that upper-case notation is used here to denote random variables and lower-case notation will be used to denote specific realizations of the corresponding random variable.
2. It is assumed that the observed judgment of person i on object-attribute pair (j, k) results from applying a decision rule C(·) to the latent object and attribute classifications. Formally the decision rule specifies the conditional probability that an object is associated to an attribute given the corresponding object and attribute classifications, that is, . For deterministic decision rules, the observed association directly follows from the object and attribute classifications, that is P (D ijk = 1|X ijk , Y ijk ) can only take the values 0 or 1 and On the other hand, for probabilistic decision rules P (D ijk = 1|X ijk , Y ijk ) takes values in the interval [0, 1], that is the decision rule specifies the degree of the object-attribute association given the object and attribute classifications. Maris et al. (1996) discuss two deterministic non-compensatory decision rules. First, a disjunctive communality rule (DC) implies that an object is associated to an attribute if the object has at least one latent feature that is linked to the attribute, that is, Second, a conjunctive dominance (CD) rule implies that an object is associated to an attribute if the object has all the latent features that are linked to the attribute, that is, In addition to the described deterministic non-compensatory decision rules, we also propose a probabilistic compensatory so-called additive (ADD) decision rule which assumes that the conditional probability of an object-attribute association equals the average number of features that the object and attribute have in common, that is: From the above model assumptions X ijkf ∼ Bern(σ jf ) and Y ijkf ∼ Bern(ρ kf ) one may derive the object-attribute association probability: For the disjunctive communality (DC) rule one may derive that Likewise, for the conjunctive dominance (CD) rule one may derive that Finally, for the additive decision rule one may derive that Depending on the application at hand, and depending on available substantive theory a particular decision rule may be more appropriate. A disjunctive rule, for instance, may be used to formalize the idea that, when a feature is linked to the attribute, the presence of the feature in the object is a sufficient condition to observe an object-attribute association. For instance, to explain binary patient-symptom associations on the basis of latent syndromes, a disjunctive rule can be used to formalize the idea that a patient will have a symptom if he/she has at least one syndrome that implies the symptom (Maris et al. 1996).
On the other hand, a conjunctive rule may be used to formalize the idea that, in order to observe an object-attribute association, all the features linked to the attribute need to be present in the object. For instance, a conjunctive decision rule may be used to test the hypothesis of configural encoding in the processing of facial expressions (Meulders et al. 2005). More specifically this hypothesis implies that an emotion will only be perceived in a facial expression if all the facial features linked to the emotion are also activated when processing the facial expression.
The fact that observed variables D ijk are derived as a mapping of independent (Bernoulli distributed) latent variables implies that the likelihood of the PLFM can be expressed as a product of Bernoulli distributed variables: Note that D ijk is used to indicate the observed random variable and d ijk represents a specific realization. The vector d comprises the I × J × K observations d ijk (i = 1, . . . ; I, j = 1, . . . , J; k = 1, . . . , K).
Furthermore, as the basic PLFM assumes homogeneity of object-attribute association probabilities across persons, it is easy to see that the PLFM is actually a model for the aggregated J × K two-way two-mode frequency matrix that is obtained by summing the I × J × K three-way three-mode array of binary observations across persons. The likelihood of the basic PLFM reads

Latent class probabilistic latent feature models
Due to the assumptions involved, the basic PLFM is not well suited to model the observed three-way binary associations. First, the assumption that all observations D ijk are statistically independent may be unrealistic because object-attribute associations with a common object or attribute tend to be correlated across persons. As dependencies between objectattribute associations with a common element may actually be caused by qualitative individual differences in object or attribute perception (i.e., persons differ in the pattern of features they ascribe to objects or attributes), and as such individual differences are likely to occur, dependence is a realistic model assumption. Second, the assumption that object-attribute association probabilities are homogeneous across persons may be unrealistic in practice as it implies that there are no individual differences. To deal with these problems, different types of latent class extensions of the PLFM have been proposed.

Modeling dependencies
To deal with unrealistic independence assumptions, Maris (1999) and Meulders et al. (2003) developed PLFMs with adapted stochastic assumptions. More specifically, to model dependencies between object-attribute associations with a common element, one may assume that object-feature classifications are not realized again at each new judgment, but that they remain constant when a person judges which attributes apply to a certain object. Likewise, one may model dependencies between observations with a common attribute by assuming that attribute-feature classifications are not realized again at each new judgment, but that they remain constant when a person judges whether the attribute applies to each of the objects.
Models that assume constant object-feature classifications or constant attribute-feature classifications per person, model dependencies between object-attribute pairs with a common element by including qualitative individual differences in the model. In particular, a model with constant object-feature classifications assumes that persons differ in the pattern of features they ascribe to an object. In the same way, a model with constant attribute-feature classifications assumes that the pattern of features linked to each attribute is a source of individual differences.
Formally, models in which object-feature classifications or attribute-feature classifications remain constant across all the judgments made by a person can be described as constrained latent class models (Formann 1992). Consider for instance a model in which object-feature classifications remain constant across the judgments made by a person and in which attributefeature classifications are renewed at each new judgment made by the person. Let x ij = (x ij1 , . . . , x ijF ) indicate the pattern of features assigned by person i to object j. Assuming that judgments of different persons are statistically independent, and that judgements D ij = (D ij1 , . . . , D ijK ) of person i about the attributes that apply to object j are conditionally independent given x ij , the likelihood of a model with constant object-feature classifications reads: and with π jk|x the conditional probability of an object-attribute association which depends on the decision rule at hand. For instance, assuming a disjunctive communality rule one may derive that

Latent Class Probabilistic Latent Feature Analysis
The model defined by (1), (2) and (3) can be regarded as a constrained latent class model with F binary latent variables (X) at the object-person level and with F binary latent variables (Y ) at the object-attribute-person level. The model classifies persons in 2 F classes for each object, and it models 2 F × K conditional probabilities with K × F attribute parameters. Note that a model with constant attribute-feature classifications can be obtained by switching the role of objects and attributes in Equation 1.
As an alternative one may consider a model in which attribute-feature classifications remain constant across the judgements made by a person and in which object-feature classifications are renewed at each new judgements made by the person. Let y ik = (y ik1 , . . . , y ikF ) indicate the pattern of features assigned by person i to attribute k. Assuming that judgements of different persons are statistically independent, and that judgements D ik = (D i1k , . . . , D iJk ) of person i are conditionally independent given y ik , the likelihood of the model reads: and with π jk|y the conditional probability of an object-attribute association which depends on the decision rule at hand. For instance, assuming a disjunctive communality rule one may derive that The model defined by (4), (5) and (6) can be regarded as a constrained latent class model with F binary latent variables (Y ) at the attribute-person level and with F binary latent variables (X) at the object-attribute-person level. The model classifies persons in 2 F classes for each attribute, and it models 2 F × J conditional probabilities with J × F object parameters.
Finally, we note that a latent class extension of the PLFM in which both object-feature classifications and attribute-feature classifications remain constant within a person is not feasible if the involved decision rule is deterministic because it is not always possible to define latent object and attribute classifications (x ij , y ik ) so that d i = C(x ij , y ik ).

Modeling heterogeneity
To relax the assumption of homogeneous object-attribute association probabilities among persons one may use a latent class extension of the basic PLFM that includes class-specific object and/or attribute parameters . Let Z it indicate a binary latent variable that equals 1 if person i belongs to class t (t = 1, . . . , T ), and 0 otherwise. It is further assumed that P (Z it = 1) = ξ t with t ξ t = 1. Assuming that judgements of different persons are independent and that, within persons, judgements are conditionally independent given latent class membership, the likelihood of a model with both class-specific object and attribute parameters equals with π jk|t being the conditional object-attribute association probability which depends on the decision rule of the model. For instance, using a disjunctive communality rule one may derive that The assumption that both object and attribute parameters are class-specific implies that the interpretation of the features may be different in each of the latent classes. As an alternative one may also constrain the object or attribute parameters in Equation 8 to be homogeneous across classes. In particular, using only class-specific object parameters the conditional probabilities equal Likewise, using only class-specific attribute parameters, the conditional probabilities equal A model with only class-specific object or only class-specific attribute parameters may be interesting from a substantive point of view. For instance, a model with class-specific object parameters and homogeneous attribute parameters implies that the features have the same interpretation for all persons, but that persons of different classes may have different probabilities to ascribe a particular feature to an object. When applying the model to analyze product-by-attribute judgements of consumers, for instance, it could be meaningful to assume that product perception is driven by the same set of features for all consumers (i.e., there is a common reality), but that the probability to ascribe a latent feature to a product depends on the latent class the person belongs to ).
Finally, note that the model defined by Equation 7 and Equation 8 can be regarded as a constrained latent class model with T binary latent variables (Z) at the person level and with 2 × F binary latent variables (X, Y ) at the object-attribute-person level. The model classifies persons in T classes, and it models J × K × T conditional probabilities with J × F × T object parameters and K × F × T attribute parameters.

Modeling both dependency and heterogeneity
To model heterogeneity in the object-attribute association probability among persons as well as dependencies between object-attribute pairs with a common element, Meulders et al. (2013) developed multilevel LCPLFMs with constant object-feature or attribute-feature classifications that include class-specific object and/or attribute parameters. Consider for instance a model that assumes constant object-feature classifications per person as well as class-specific object and attribute parameters. This type of model actually includes two types of person classifications. First, the model classifies persons in one of T latent classes to model heterogeneity in the object and attribute parameters. This (outer) classification of persons is represented by a T -element vector Z i = (Z i1 , . . . , Z iT ) in which Z it equals 1 if person i belongs to class t and in which Z it equals 0 otherwise. It is further assumed that P (Z it = 1) = ξ t with t ξ t = 1. Second, a model with constant object-feature classifications classifies persons in 2 F latent classes for each object. This (inner) classification is represented by x ij which indicates the pattern of latent features assigned by person i to object j. Assuming that judgements of different persons are independent, and that judgements of the same person are conditionally independent given latent class membership, the likelihood of this model can be expressed as follows: and with π jk|x,t a conditional probability that depends on the decision rule involved. More specifically, for a disjunctive communality rule one may derive that: Table 1 provides an overview of (multilevel) LCPLFMs that can be obtained by combining different dependency and heterogeneity assumptions. The basic PLFM (Maris et al. 1996;Meulders 2013) is indicated in Table 1 as m0. This model can be estimated with the plfm function of the plfm package. Models D and E have been described by Meulders et al. (2003) for three-way three-mode binary data, and by Maris (1999) for two-way two-mode binary data. Models A to C and m1 to m6 have been described by Meulders et al. (2013). In this paper we present the function LCplfm which has been added to the plfm package to estimate disjunctive, conjunctive and additive models m1 to m6 and models D and E. Models that only allow to model heterogeneity in the model parameters (i.e., models A to C) are not included in the LCplfm function because they are usually outperformed by models that allow to model both dependencies between object-attribute pairs with a common element and heterogeneity in the model parameters (i.e., models m1 to m6).

Taxonomy of models
Why do models m1 to m6 typically outperform model A to C on real data? First, in real data, object-attribute pairs with a common element (i.e., a common object or attribute) tend to be correlated and, unlike models A to C, models m1 to m6 are specifically designed to model such dependencies. Therefore, when applying the LCPLFMs to real data, models m1 to m6 typically have a higher likelihood than models A to C. Second, as the distribution of objectfeature classification patterns (in models m1 to m3) and the distribution of attribute-feature classification patterns (in models m4 to m6) is modeled using an independence model (see Equation 2 and Equation 5), modeling dependencies between object-attribute pairs with a common element does not increase the number of model parameters. That is, assuming T=t latent classes are used to model heterogeneity among object and/or attribute parameters, models {A, m1, m4} are equally complex. In the same way, models {B, m2, m5} and models {C, m3, m6} have the same number of model parameters. Therefore, if models m1 to m6 have a higher likelihood than models A to C they will also be selected by AIC or BIC because the models are equally complex.
As some of the models in Table 1 are equivalent, six different algorithms are sufficient to estimate 24 models (i.e., disjunctive, conjunctive and additive models of type m1 to m6, D and E). First, three different algorithms (i.e., for disjunctive models m1 to m3) are needed to estimate disjunctive models m1 to m6 and models D and E. More specifically, by switching the role of objects and attributes, model m5 can be obtained from m1, m4 can be obtained from m2, and m6 can be obtained from m3. Furthermore, models D and E can be obtained respectively by fitting models m1 and m4 with one latent class (T = 1). Second, as π CD jk (1 − σ, ρ) = 1 − π DC jk (σ, ρ) conjunctive models can be obtained by fitting the corresponding disjunctive model to transformed data D * ijk = 1 − D ijk and by applying a transformation to the object parameters, namely σ * = 1 − σ. Third, using the same reasoning as for disjunctive models, only three algorithms (i.e., for additive models m1 to m3) are needed to estimate additive models of type m1 to m6, D and E.

Components of the package
The R package plfm comprises the following components for conducting latent class probabilistic latent feature analysis: • The function LCplfm can be used to compute point estimates of disjunctive or conjunctive LCPLFMs with a specific number of features, a specific dependency assumption (i.e., constant object classification per person or constant attribute classification per person) and with a specific heterogeneity assumption (i.e., class-specific object parameters and/or class specific attribute parameters). In addition, the function computes standard errors of the parameters as well as criteria for model selection and descriptive model fit.
• The function stepLCplfm can be used to fit a series of disjunctive, conjunctive or additive LCPLFMs with different numbers of features and different numbers of latent classes.
• There are print and plot methods provided for each function. The plot method of the LCplfm function, can be used to visualize (class-specific) point estimates of object or attribute parameters and corresponding 95% confidence intervals (CI) for each feature.
The plot method of the stepLCplfm function can be used to visualize the value of model selection criteria (i.e., AIC, BIC) or descriptive fit measures for models with different numbers of latent classes and with different numbers of features.

Function for latent class probabilistic latent feature analysis
The function LCplfm can be used to compute parameter estimates for disjunctive, conjunctive or additive LCPLFMs with a particular number of latent features and with a specific number of latent classes. In addition, the function computes standard errors of the parameter estimates, as well as criteria for model selection, and measures of descriptive model fit. The LCplfm function takes the following arguments: • data is a I × J × K array of binary observations. Observation (i, j, k) equals 1 if object j (j = 1 . . . , J) is associated to attribute k (k = 1, . . . , K) according to person i (i = 1, . . . , I).
• F indicates number of latent features included in the model.
• T indicates the number of latent classes used to classify persons.
• M indicates number of runs that should be conducted using random starting points.
• emcrit1 is used to specify the convergence criterion for estimating M candidate models with the EM algorithm.
• emcrit2 is used to specify the convergence criterion for estimating the best model (out of M models) with the EM algorithm.
• model is used to specify the type of dependency and heterogeneity assumption included in the model. More specifically, model = 1, model = 2, and model = 3 represent models with a constant object classification per person and with, respectively, class-specific object parameters, class-specific attribute parameters, and class-specific object and attribute parameters (i.e., models m1, m2 and m3 in Table 1). Furthermore, model = 4, model = 5, and model = 6 represent models with a constant attribute classification per person, and with, respectively, class-specific object parameters, class-specific attribute parameters and class-specific object and attribute parameters (i.e., models m4, m5 and m6 in Table 1).
• start.objectparameters is an array of object parameters to be used as a starting point in each of M runs. The size of the array equals J × F × T × M when model = 1, 4, 3, 6 and J × F × M when model = 2, 5. If start.objectparameters = NULL, randomly generated starting values are used.
• start.attributeparameters is an array of attribute parameters to be used as a starting point in each of M runs. The size of the array equals K × F × T × M when model = 2, 5, 3, 6 and K × F × M when model = 1, 4. If start.attributeparameters = NULL, randomly generated starting values are used.
• start.sizeparameters is a T × M -vector of latent class size parameters to be used as starting point in each of M runs. If start.sizeparameters = NULL, randomly generated starting values are used.
• delta indicates the precision used to compute standard errors of the parameters with the method of finite differences.
• printrun is used to specify printing options. If printrun = TRUE the mapping rule (i.e., disjunctive, conjunctive or additive), the number of features (F), the number of latent classes (T) and the number of the run (out of M runs) are printed to the output screen, whereas printrun = FALSE suppresses the printing.
• update.objectparameters is a binary valued array that indicates for each object parameter whether it has to be estimated from the data or constrained to the starting value. A value of 1 means that the corresponding object parameter is estimated and a value of 0 means that the corresponding object parameter is constrained to the starting value provided by the user. The size of the array equals J × F × T when model = 1, 4, 3, 6 and J × F when model = 2, 5. If update.objectparameters = NULL all object parameters are estimated from the data.
• update.attributeparameters is a binary valued array that indicates for each attribute parameter whether it has to be estimated from the data or constrained to the starting value. A value of 1 means that the corresponding attribute parameter is estimated and a value of 0 means that the corresponding attribute parameter is constrained to the starting value provided by the user. The size of the array equals K × F × T when model = 2, 5, 3, 6 and K × F when model = 1, 4. If update.attributeparameters = NULL all attribute parameters are estimated from the data.
• Nbootstrap indicates the number of bootstrap iterations to be used for simulating the reference distribution of log odds-ratio dependency measures.

EM algorithm for computation of the posterior mode(s) of multilevel LCPLFMs
As the complete-data likelihood of LCPLFMs has a simple structure, it is convenient to compute maximum-likelihood estimates of the model parameters with an EM algorithm (Maris et al. 1996;Meulders et al. 2003Meulders et al. , 2013. As an example, Meulders et al. (2013) provide a detailed derivation of the EM algorithm for model m1 in Table 1. The derivation of the EM algorithm for other models in Table 1 is similar.
Furthermore, as maximum likelihood estimates will not always exist in the interior of the parameter space, it is convenient to impose a convex prior distribution on the model parameters as this will guarantee the existence of posterior mode estimates within the interior of the parameter space. In particular, for the LCPLFM with class-specific object and attribute parameters we specify the following conjugate convex prior distribution: with α σ = β σ = α ρ = β ρ = 1 and γ t = 2 (t = 1, . . . , T ). Note that these constants and their weights in Equation 12 are chosen to obtain the default prior used by the standard software Latent Gold (Vermunt andMagidson 2005, 2013).

Computation of asymptotic standard errors
Let θ = (σ, ρ, ξ) and letθ denote the posterior mode of the model. The standard error of the parameter θ i can be computed as follows: To compute the second derivative of the log posterior with respect to the parameter θ i we use a central difference approximation to numerically differentiate the gradient g(θ) = ∂log p(θ|d)/∂θ. More specifically, with θ (−i) a vector in which the i-th parameter has been omitted. Section A of the appendix describes how the gradient g(θ) can be analytically derived for the disjunctive model. For additive models, the derivation is similar.

Estimation strategy
For F > 1 the posterior distribution of LCPLFMs is always multimodal. Different local maxima may exist, and in addition, for each local maximum the posterior distribution of a model with F latent features contains F ! identical posterior modes because the labels of the latent features can be switched.
As the posterior distribution can be highly multimodal, a two-stage estimation strategy is used to efficiently locate the model with the highest posterior density. In a first stage, the EM algorithm is used to estimate M candidate models using random starting points and using the less strict convergence criterion emcrit1 with default value 10 −3 . In a second stage, the EMalgorithm is used to estimate the best model (i.e., with the highest posterior density among the M candidate models) using the more strict convergence criterion emcrit2 with default value 10 −8 . As many local maxima may exist, it is recommended to estimate a sufficiently high number of candidate models, that is specify M ≥ 50.

Model selection criteria
When applying LCPLFMs to real data one needs to select an appropriate model to describe the data. As there are many possible models one could start by fitting the basic PLFM to get an idea about the approximate number of features needed to explain the object × attribute associations. In a second step, LCPLFMs with different dependency and heterogeneity assumptions could be fitted and model selection criteria could be used to determine the best among the candidate models. In a third step, descriptive fit measures can be inspected to further evaluate the validity of the selected model. Finally, as LCPLFMs are often used as an exploratory tool, practical or substantive arguments can also play a role to select a final model, for instance, meaningful feature interpretation, sufficient size of extracted latent classes, accuracy of the parameter estimates, and so on.
To choose the best model among a set of candidate LCPLFMs one may use model selection criteria as AIC (Akaike 1974) or BIC (Schwarz 1978). Both criteria are computed as the sum of a badness-of-fit term (minus twice the log-likelihood of the fitted model) and a penalty term which measures the complexity of the model. For AIC and BIC the penalty term equals 2k and k log(I), respectively with k being the number of free model parameters and I being the number of persons who have judged the object-attribute pairs. AIC and BIC both have desirable properties. AIC is an efficient model selection criterion. This means that when the true model is infinite dimensional, the prediction error in the model selected with AIC is asymptotically the same as the prediction error of the best candidate model. On the other hand BIC is asymptotically consistent, which means that it will select the true model from a set of candidate models if the true model is finite dimensional and among the candidate models (Liu andYang 2011, p. 2075). A simulation study with LCPLFMs has shown that BIC has a good performance to select the true model from a set of candidate models (Meulders et al. 2003). However, in real data applications it is unlikely that the true model is among the candidate models and hence AIC may be more appropriate than BIC.

Measures of model fit
When evaluating the fit of LCPLFMs we are especially interested in two aspects, (1) the fit of the model to the marginal object-by-attribute table, and (2) to what extent observed individual differences are captured by the model.
First, to describe the fit of the model to the marginal object-by-attribute tablet, the LCplfm function computes the correlation and squared correlation (i.e., variance accounted for (VAF)) between observed and expected frequencies in the marginal object × attribute table.
Furthermore, the plfm function includes an absolute Pearson-χ 2 goodness-of-fit test that can be used to test the null hypothesis that model m0 fits the marginal object × attribute frequency table. However, with increasing sample size this test may become very powerful to detect differences between the true and the fitted model, and as the fitted model is usually incorrect in some sense the test is likely to be significant. Therefore, it may also be interesting to study the descriptive fit of the model. For the basic PLFM (model m0), the correlation between observed and expected frequencies in the object × attribute table is a useful measure of descriptive fit, because it indicates to what extent the latent features included in the model can model the object × attribute interaction.
In contrast to PLFMs, LCPLFMs do not directly fit the object × attribute table of frequencies.
For instance, an LCPLFM of type m1 models frequencies of patterns d ij . The relevant goodness-of-fit test for this type of model is a Pearson-χ 2 test that compares observed and expected frequencies of the patterns d ij . However, as LCPLFMs are usually applied to data sets with many objects and attributes, the number of patterns d ij is usually large (e.g., if K = 10 there are 2 10 = 1024 patterns for object j). As a result, the Pearson-chi square statistic will not have a χ 2 distribution because the table of observed frequencies is sparse and expected frequencies are too small.
Although the LCplfm function does not directly fit the frequencies in the object × attribute table it still reports the correlation between observed and expected frequencies for this marginal table, because this correlation gives an indication of how well the object × attribute interaction can be explained by the latent features included in the model. Second, LCPLFMs model observed individual differences by classifying persons with respect to the pattern of features they ascribe to an object or attribute, or by assuming class-specific object or attribute parameters. If the involved classifications represent the observed individual differences well, observed object-attribute associations will be conditionally independent (i.e., conditional on latent class membership), and dependencies between pairs of observed associations will match dependencies predicted by the model.
To check whether observed dependencies between pairs of object-attribute associations D ijk and D ij k are fitted by the model, one may use a parametric bootstrap procedure to simulate the distribution of the dependencies under the model, and check whether observed dependencies are in the corresponding 100(1 − α)% CI.
The dependency between D ijk and D ij k can be measured using the log-odds ratio computed on the D ijk × D ij k cross-table (de la Torre 2008): Note that we add 0.5 to each cell of the cross-table to make sure that log-odds ratio can be computed if a cell frequency equals 0.

Function for estimating a series of LCPLFMs
The function stepLCplfm can be used to fit a series of disjunctive or conjunctive latent class probabilistic latent feature models that assume minF to maxF latent features and minT to maxT latent classes. As stepLCplfm repeatedly calls the LCplfm function, it takes the same arguments except the number of features and the number of latent classes which are specified with the following arguments: • minF is the minimum number of latent features.
• maxF is the maximum number of latent features.
• minT is the minimum number of latent classes.
• maxT is the maximum number of latent classes.

Example: Perception of midsize cars
The list car2 contains data of 147 persons who indicated for each of 12 midsize cars whether or not they have each of 23 attributes. In an initial analysis we use the basic PLFM (i.e., model m0 in Table 1) to model car-attribute associations on the basis of binary latent features. More specifically, after loading the data, we use the function stepplfm to estimate disjunctive PLFMs with 1 up to 7 latent features. For each model M (=50) runs with random starting points are conducted. A set.seed() statement is used to guarantee the reproducibility of the results.
R> data("car2") R> set.seed(723756) R> car2.m0.lst <-stepplfm(freq1 = car2$freq1, freqtot = car2$freqtot, + M = 50, minF = 1, maxF = 7, emcrit1 = 10e-5, emcrit2 = 10e-8) A print of the results shows that a model with 5 latent features has lowest BIC (i.e., 36566) and that a model with six features has lowest AIC (i.e., 35952). Furthermore, the results indicate that models with 5 and 6 features fit the frequencies in the marginal car-by-attribute table rather well (i.e., the correlation between observed and predicted frequencies equals 0.968 and 0.983, respectively), although both models fail to fit the car-by-attribute frequencies in an absolute sense (χ 2 = 433, df = 101, p < 0.01 and χ 2 = 278, df = 66, p < 0.01, respectively). In a subsequent analysis, we use LCPLFMs to model person differences in car perception, and to model dependencies between car-attribute pairs with a common element. Although the sixfeature model has a lower AIC value, we present the results for the five-feature model because interpretation and descriptive fit of both models are similar. More specifically we use the function stepLCplfm to estimate disjunctive five-feature latent class PLFMs with only classspecific car parameters (T = 1, . . . , 3) and with either a constant car-feature classification per person (i.e., model m1) or a constant attribute-feature classification per person (i.e., model m4). The assumption that attribute parameters are constrained to be equal across classes is convenient as it implies that the latent features have the same interpretation for each of the latent classes.
Furthermore, among models with constant attribute-feature classification, there is evidence for heterogeneity of car parameters across respondents. In particular AIC is lowest for the three-class model (i.e., 34937), whereas BIC is lowest for the two-class model (i.e., 35693). In what follows, the parameters of the five-feature two-class model will be interpreted more in detail. We present the results of the two-class solution rather than the three-class solution because one of the classes in the three-class solution is very small (4% of the respondents), resulting in inaccurate estimates for the corresponding car parameters.
• logpost.runs is a list containing the logarithm of the posterior density for each of the M computed models.
• best is an index which indicates the model with the highest posterior density among each of the M computed models.
• objpar is a J × F matrix or a J × F × T array of estimated object parameters for the best model.
• attpar is a K × F matrix or a K × F × T array of estimated attribute parameters for the best model.
• sizepar is a T -vector of estimated class size parameters for the best model.
• SE.objpar is a J × F matrix or a J × F × T array of estimated standard errors for the object parameters of the best model.
• SE.attpar is a K × F matrix or a K × F × T array of estimated standard errors for the attribute parameters of the best model.
• SE.sizepar is a T -vector of estimated standard errors for the class size parameters of the best model.
• gradient.objpar is a J × F matrix or a J × F × T array containing the gradients of object parameters for the best model.
• gradient.attpar is a K × F matrix or a K × F × T array containing the gradients of attribute parameters for the best model.
• gradient.sizepar is a T -vector containing the gradients of class size parameters for the best model.
• fitmeasures is a list of fit measures for the best model including log-likelihood, log posterior density, deviance, information criteria (AIC and BIC), and two measures of descriptive fit on the object-by-attribute table of frequencies (i.e., correlation between observed and expected frequencies and variance accounted for by the model).
• postprob is a I × T matrix of posterior probabilities for the best model.
• margprob.JK is a J × K matrix of expected marginal object-attribute association probabilities for the best model.
• condprob.JKT is a J × K × T array of expected conditional object-attribute association probabilities (i.e., expected probability of object-attribute association given latent class membership).
• report.OR.attpair is a matrix that contains per object, for all attribute pairs, the observed OR dependency (OR.obs), the expected OR dependency (OR.mean) and the upper and lower bounds of the corresponding simulated 95 and 99 percent confidence interval (OR.p025, OR.p975, OR.p005, OR.p995).
• report.OR.objpair is a matrix that contains per attribute, for all object pairs, the observed OR dependency (OR.obs), the expected OR dependency (OR.mean) and the upper and lower bounds of the corresponding simulated 95 and 99 percent confidence interval (OR.p025, OR.p975, OR.p005, OR.p995).
In order to interpret the latent features, one may use the plot method to visualize point estimates and a 95% CI of car-and attribute parameters for a specific feature. For instance, to visualize the car-and attribute parameters for feature 1 one may use the following code: R> plot(car2.m4.lst [[1, 2]], element = "attribute", feature = 1, + positionlabel = -0.6, main = "Attribute parameters feature 1") R> plot(car2.m4.lst [[1, 2]], element = "object", feature = 1, + main = "Car parameters feature 1") The visualization of the parameters for other features is similar. The top panels of Figure 1 to Figure 5 show point estimates and 95% CIs for the attribute parameters of feature 1 to feature 5, respectively. The bottom panels of Figure 5 contain point estimates and 95% CIs for the car parameters of the two latent classes. Note that, classes 1 and 2 contain respectively 18% and 82% of the persons in the sample.
As can be seen from an inspection of estimated attribute parameters in Figure 1 to Figure 5, the latent features have a meaningful interpretation. In particular, feature 2 is ascribed to environmentally friendly cars with high fuel efficiency that have low CO 2 emissions (e.g., 'Toyota Prius'). Feature 1 and feature 3 are similar in that they are ascribed to comfortable and spacious cars that are suited for family use. However, these features also have a different interpretation as feature 1 is ascribed to cars that are also being perceived as safe and reliable (e.g., 'Volkswagen Passat') whereas feature 3 is ascribed to cars that are also easy to exit or enter. Feature 4 and feature 5 are similar in that they are ascribed to luxurious sporty cars with a nice design. In contrast, these features also have a different interpretation as feature 5 is ascribed to cars that are also being perceived as powerful and as having a high status and a high trade-in value (e.g., 'Mercedes C-Class', 'BMW Series 3', and 'Audi A4') whereas feature 4 (which is ascribed to 'Peugeot 508') is not linked to these attributes.
Further inspection of the car parameters indicates that, depending on the latent class they belong to, persons may have a different perception of specific car models. For instance, unlike persons of class 1, persons of class 2 have a much higher probability to ascribe feature 3 to a 'Citroen C5' (i.e., 0.57 versus 0.02). In contrast, persons of class 1 are more likely to ascribe feature 1 to a 'Citroen C5' (i.e., 0.34 versus 0.10). Hence persons of class 2 perceive the 'Citroen C5' as a comfortable and spacious family car that is also easy to exit or enter, but that is not particularly reliable or safe. On the other hand, persons of class 1 perceive the 'Citroen C5' as a comfortable, spacious family car that is also reliable and has good safety, but that is not particularly easy to exit or enter. As another example, we see that persons of class 2 are more likely to ascribe feature 5 (luxurious, sporty, nice design, powerful, high status, high trade-in value) to 'Mercedes C-class', 'Audi A4' and 'BMW Series 3' than persons of class 1. On the other hand, persons of class 1 are more likely to ascribe feature 4 (luxurious, sporty, nice design, comfortable but lacking high status, high trade-in value) to 'BMW Series 3' than persons of class 2.

Example: Determinants of anger-related behavior
As a second example we analyze data on the situational determinants of anger-related behavior (Kuppens et al. 2004). The data, which are stored in the list anger2, consist of the binary judgements of 115 first-year psychology students who indicated whether or not they would display each of 14 anger-related behaviors when being angry at someone in each of 9 situations. The 14 behaviors consist of 7 pairs that reflect a particular behavior type that can be elicited in situations in which one is angry at someone (1) anger-out (you flew off the handle, you started a fight), (2) avoidance (you avoided a confrontation, you went out of the other's way), (3) social sharing (you unburdened your heart to others, you told others what had happened), (4) assertive behavior (you said what was bothering you in a direct and sober way, you calmly explained what was bothering you), (5) indirect behavior (you showed something was bothering you without saying anything, you started to sulk), (6) anger-in (you suppressed your anger, you bottled up your anger), and (7) reconciliation (you reconciled, you talked things out). The 9 situations are constructed by crossing the levels of two factors with three levels: (1) the extent to which one likes the instigator of anger (like, unfamiliar, dislike), and (2) the status of the instigator of anger (lower status, equal status, higher status).
An important goal of personality research is to identify personality types with stable situationbehavior relations (Mischel et al. 2002). As will be illustrated in this section, LCPLFMs are especially suited to meet this challenge. First, they aim for a parsimonious description of situation-behavior associations using latent features that represent basic behavior types. Second, they allow to model heterogeneity in the situation-behavior association probability by assuming latent person classes with class-specific situation-feature and/or behavior-feature parameters. Note that in the present example behavior-feature parameters indicate to what extent a behavior reflects a certain behavior type, and situation-feature parameters indicate to what extent a certain behavior type is appropriate in a situation.
To study the occurrence of the 7 behavior types hypothesized in the design of the study across situations we will adopt a confirmatory approach. In particular we use a model with 7 latent features each of which represents a behavior type hypothesized in the study. Furthermore, we assume that a behavior-feature parameter equals 0 if the behavior does not reflect the behavior type hypothesized in the study. Table 2 shows the matrix of behavior-feature parameters to be estimated assuming that these parameters are equal across persons. On the other hand, as we want to study to what extent situations affect the occurrence of specific behavior types and how this differs among persons, we consider a model with class-specific situation parameters.
To evaluate whether the behaviors included in the study actually measure the hypothesised behavior types of the study design, we may inspect the estimated behavior parameters and the corresponding standard errors.
R> round(m1.constr.T2$attpar, 2) Hence, we can conclude that behavior parameters are accurately estimated (i.e., standard errors are small) and that the link between the behaviors and the assumed behavior type is generally rather strong, except for the behavior 'starting to sulk', which is only weakly linked (0.29) to the intended behavior type (i.e., indirect behavior).
As can be seen in Figure 6 and Figure 7 the probability that a certain behavior type is considered appropriate in a situation by a person depends on the latent class the person belongs to and on situation characteristics. First, it turns out that the two latent classes can be interpreted as an 'approach' and an 'avoidance' class. That is, when being angry at someone, persons in class 1 are generally more likely to consider approach behaviors appropriate, and persons in class 2 are more likely to consider avoidance behaviors and social sharing appropriate. Second, situation characteristics have a systematic effect on the extent to which behavior types are considered appropriate in the situation. Anger-in behaviors, for instance, are more considered appropriate if the status of the person one is angry with increases. In the same way, anger-out behaviors are more considered appropriate if the status of the person one is angry with decreases. Assertive behaviors are more likely considered adequate if the person one is angry with is liked or if he/she has a lower status. Reconciliation is especially considered appropriate if one likes the person one is angry with and it is considered inappropriate if the person one is angry with has a higher/equal status and is not liked/unfamiliar.

Example: Simulated data
When applying LCPLFMs one may wonder to what extent a model using a specific decision rule can fit data generated by another decision rule, and relatedly, which decision rule is more flexible to fit data generated by other decision rules. Furthermore, one may wonder about the quality of the parameter recovery when the fitted model involves a correct or incorrect decision rule.
To research these questions, we conduct a small simulation study. Data generated using a disjunctive, conjunctive or additive decision rule are fitted using models with a disjunctive, conjunctive or additive rule. Apart from the decision rule, true and fitted models are assumed to be equivalent. That is, true and fitted models always have the same number of features F (F= 1, . . . , 4) and always are of type m1 (i.e., assuming constant object-feature classification and class-specific object parameters) with T= 2. For each of 12 true models (i.e., disjunctive, conjunctive and additive models with 1 up to 4 features) 100 data sets are generated using randomly sampled true parameter values and assuming 10 (= J) objects, 15 (= K) attributes and 200 (= I) persons. Each model is fitted using M= 40 runs of the EM-algorithm with random starting points.
Third, the results indicate that when the true model is additive, the additive model only has a slightly better fit than the disjunctive or conjunctive model. In fact, especially the disjunctive model can fit data generated by the additive model well. Moreover, the difference in fit between additive, disjunctive and conjunctive models decreases if the number of features increases.
In sum, we may conclude that when F > 1 and with an increasing number of features, additive models become less flexible to fit data generated from disjunctive or conjunctive models. In addition, with an increasing number of features in the true model, conjunctive and especially disjunctive models can fit data generated by the additive model rather well. Finally, we note that other fit measures as the proportion of variance accounted for in the J × K table and the proportion of OR dependencies between object-attribute pairs with a common object or attribute that are in the simulated 95% CI lead to the same conclusions.
Finally, further inspection of (a sample of) the simulations shows that parameter recovery is usually very good (i.e., correlations between true and estimated object/attribute parameters of 0.99) when the decision rule of the fitted model is also used for generating the data. Note however, that for models with F > 2 it may be required to (manually) switch the feature/class labels of the estimated model so that they match the true feature/class labels.
On the other hand, when the decision rule of the estimated model is not the same as the decision rule of the true model, parameter recovery of object/attribute parameters is usually poor. In particular, fitting a disjunctive (conjunctive) model to data generated by a conjunctive (disjunctive) model often leads to a poor recovery of object and attribute parameters. This is probably because the probability of an observed association is a very different function of the object and attribute parameters in disjunctive and conjunctive models. On the other hand, when fitting a disjunctive (additive) model to data generated by an additive (disjunctive) model, correlations between true and estimated parameters may be rather high (i.e., > 0.85), but estimated object/attribute parameters may have considerable bias. Finally, note that recovery of true latent class sizes is generally quite good, also if the incorrect decision rule is used.
To summarize, parameter recovery of object/attribute parameters is expected to be satisfactory if the true model involving the correct decision rule is used for fitting the data and is expected to be poor if a model with an incorrect decision rule is fitted. Note that the poor recovery of models with an incorrect decision rule should not be considered problematic because the simulation study shows that models with a correct decision rule will generally fit better than models with an incorrect decision rule. In addition, a meaningful substantive interpretation of the features should also help to select the appropriate decision rule.

Discussion
For LCPLFMs several types of model extensions could be interesting to pursue in future research. A first extension is related to modeling correlations among the latent object or attribute variables. Currently, the models included in the LCplfm function all use an independence model for the marginal distribution of the latent feature patterns (see Equation 2 and Equation 5). Using an independence model has the important advantage that models are parsimonious and that objects and attributes are described using a limited number of parameters. However, if needed correlations among latent object or attribute variables can be modeled in several ways. As an example we consider modeling correlations among the latent object variables in models of type m1. A first approach to model correlations among the latent variables X ij1 , . . . , X ijF is to specify a saturated model for the 2 F patterns x ij , that is p(x ij |σ j ) = σ jq with q = 1, . . . , 2 F and q σ jq = 1 (Maris 1999). A disadvantage of this approach is that the number of parameters increases rapidly as a function of the number of latent features, leading to overparameterized models. A second approach is to model the multinomial probability p(x ij |σ j ) with a multinomial logit model including main-effects, second-order interactions, third-order interactions and so on (Vermunt and Magidson 2005). For instance, the following multinomial logit model includes main-effects and second-order interactions among the latent variables X ij1 , . . . , X ijF : with h = {1, . . . , F }. LCPLFMs using this kind of parameterization to include interactions among latent variables can be estimated using the syntax module of Latent Gold (Vermunt and Magidson 2013). For an example of estimating LCPLFMs with Latent Gold see . A possible drawback of models including interactions among latent variables is that selecting the model with the appropriate correlation structure may be difficult as there are many alternative models. A third approach to model correlation among the latent variables is to use a mixture of independence models with class-specific parameters, that is, assuming a mixture with Q components q = 1, . . . , Q: LCPLFMs that use a mixture to model the marginal distribution of the latent feature patterns can be estimated using the syntax module of Latent Gold (Vermunt and Magidson 2013).
A second extension that may be of interest is adapting LCPLFMs for other data structures. The models that are now included in the LCplfm function have been developed to analyze fully crossed binary three-way three-mode data. However, it could for instance be interesting to adapt the models to the case of nested binary two-way three-mode data. An example of such data are respondents i (i = 1, . . . , I j ) of country j (j = 1, . . . , J) who indicate for k (k = 1, . . . , K) aspects of national pride (e.g., the performance of the economy, the quality of the social security system, quality of the educational system, an effective health-care system, a powerful army, past achievements in sports, science etc.) whether or not they take pride in their country with respect to this aspect. Let the binary variable D ijk equal 1 if respondent i of country j is proud of his/her country with respect to aspect k, and 0 otherwise. Using an LCPLFM to model the data we assume that pride in the K aspects can be explained from F more basic sources of national pride (e.g., being proud about past achievements, being proud about nowadays achievements, etc.). Suppose the binary latent variable X ijf ∼ Bern(σ jf ) equals 1 if respondent i of country j takes pride in the basic source f , and 0 otherwise. Furthermore, the latent variable Y ijkf ∼ Bern(ρ kf ) equals 1 if aspect k measures the basic source of pride f when respondent i of country j judges statement k, and 0 otherwise. Using a disjunctive LCPLFM we assume that a respondent is proud of an aspect if he/she takes pride in at least one basic source of pride measured by the aspect. An LCPLFM that models dependencies among the responses of each respondent has the following likelihood: The previous model assumes measurement invariance because aspect-feature parameters ρ kf are the same for respondents of different countries. The measurement invariance assumption can now be relaxed by assuming that respondents belong to latent classes with class-specific aspect-feature parameters. Let Z ijt be equal to 1 if respondent i of country j belongs to class t (t = 1, . . . , T ), and 0 otherwise. Furthermore it is assumed that P (Z ijt = 1|ξ) = ξ t with t ξ t = 1 and that Y ijkf |Z ijt = 1 ∼ Bern(ρ kf t ). An LCplfm that models dependencies among the responses of each respondent, and that relaxes the assumption of measurement invariance by assuming class-specific aspect-feature parameters has the following likelihood: Third, the present paper uses a latent class extension of PLFMs for modeling qualitative individual differences in object or attribute classification (i.e., the pattern of features a person assigns to an object or attribute) and for modeling person heterogeneity in object and/or attribute parameters. As PLFMs are actually constrained latent class models, using a latent class approach for modeling qualitative individual differences and person heterogeneity in the model parameters was straightforward. An advantage of using a latent class approach is that individual differences can be described in terms of person types. In domains as personality psychology or marketing this may be a natural way to proceed. However, a disadvantage of the latent class approach is that many local maxima may exist leading to many different solutions with a comparable fit. Furthermore, the assumption that reality is categorical may be unrealistic. As alternative, it could be interesting to investigate how a random effects approach can be used for modeling person heterogeneity in object or attribute parameters of (LC)PLFMs. Furthermore, it would be interesting to compare existing LCPLFMs or future random effects extensions with (random effects) structural equations models that have been developed for binary three-way three-mode data (González, De Boeck, and Tuerlinckx 2008).
Finally, to speed up the computation of LCPLFMs it would be interesting to use parallel computing in the LCplfm function when using the EM algorithm to estimate M candidate models from random starting points (Hofert and Mächler 2016;Schmidberger, Morgan, Eddelbuettel, Yu, Tierney, and Mansmann 1978).