lslx: Semi-Confirmatory Structural Equation Modeling via Penalized Likelihood

Sparse estimation via penalized likelihood (PL) is now a popular approach to learn the associations among a large set of variables. This paper describes an R package called lslx that implements PL methods for semi-confirmatory structural equation modeling (SEM). In this semi-confirmatory approach, each model parameter can be specified as free/fixed for theory testing, or penalized for exploration. By incorporating either an elastic net or a minimax concave penalty, the sparsity pattern of the parameter matrix can be efficiently explored. Package lslx minimizes the PL criterion through a quasi-Newton method. The algorithm conducts a line search and checks the first-order condition in each iteration to ensure the optimality of the obtained solution. A numerical comparison with competing packages shows that lslx can reliably find PL estimates in the least time. The current package also supports other advanced functionalities, including a two-stage method with auxiliary variables for missing data handling and a reparameterized multi-group SEM to explore population heterogeneity.


Introduction
For the past two decades, statistical modeling with sparsity has become a popular approach to learn associations among a large number of variables. In a sparse model, only a relatively small number of parameters play an important role (Hastie, Tibshirani, and Wainwright 2015). From a substantive viewpoint, it is easier to interpret a sparse model than a dense one. From a statistical viewpoint, a sparse model can yield an estimator with smaller mean squared error (e.g., Knight and Fu 2000; Negahban, Ravikumar, Wainwright, and Yu 2012). Since the exact sparsity pattern of a model is generally unknown in advance, the model is often probed by a sparse estimation procedure with penalization (or regularization; e.g., Tibshirani 1996; Fan and Li 2001; Zhang 2010). Package glmnet (Friedman, Hastie, and Tibshirani 2010) and library LIBLINEAR (Fan, Chang, Hsieh, Wang, and Lin 2008) are well-known software solutions for sparse linear models with regularization.
Psychometric models mostly impose a strong sparsity assumption for identification or interpretation purposes (e.g., Thurstone 1947). Recently, several penalized likelihood (PL) based data-driven methods have been proposed to depict sparsity patterns in psychometric models (e.g., Chen, Liu, Xu, and Ying 2015; Hirose and Yamamoto 2014, 2015; Huang, Chen, and Weng 2017; Jacobucci, Grimm, and McArdle 2016; Tutz and Schauberger 2015). The present work describes an R (R Core Team 2017) package called lslx (latent structure learning extended) for conducting semi-confirmatory SEM via PL. Package lslx was constructed to overcome the drawbacks of existing packages for SEM with PL. The author believes that lslx has at least four advantages. (1) Model specification is relatively easy. It adopts a lavaan-like model syntax with carefully designed operators and prefixes. Through the model syntax, users can set each coefficient as free, fixed, or penalized. When the syntax is not convenient enough, built-in methods can also be used to modify the initially specified model. (2) The optimization algorithm in lslx is reliable. Motivated by the works of Friedman et al. (2010) and Yuan, Ho, and Lin (2012), lslx implements a quasi-Newton method that conducts a line search and checks the first-order condition in every iteration to ensure optimality. Furthermore, related numerical conditions can be plotted to monitor the optimization quality. (3) lslx is reasonably fast. The implemented quasi-Newton algorithm can achieve locally superlinear convergence under suitable conditions (Yuan et al. 2012). In addition, the core of lslx is written via Rcpp (Eddelbuettel and François 2011) and RcppEigen (Bates and Eddelbuettel 2013). As our numerical comparison shows, lslx is significantly faster than both lsl and regsem. (4) lslx has several additional functionalities. Like usual SEM packages, lslx provides formal statistical test results, including tests for goodness-of-fit and coefficients. Besides, lslx handles missing data via a two-stage method with auxiliary variables (Savalei and Bentler 2009) and conducts multi-group analysis with a reparameterized SEM formulation (Huang 2018). This paper is organized as follows. Section 2 describes the PL method for semi-confirmatory SEM. In Section 3, a quasi-Newton algorithm for optimizing the PL criterion is introduced. Section 4 demonstrates how to implement semi-confirmatory SEM with lslx. In Section 5, advanced functionalities of lslx are described, including a two-stage method for missing data and multi-group SEM analysis. Finally, merits, limitations, and further directions concerning lslx are discussed.

Semi-confirmatory SEM and PL
In this section, the SEM formulation adopted for lslx and the theory of a semi-confirmatory SEM via PL are described.

SEM and theory testing
Let η denote a (P + M)-dimensional random column vector, which we partition into a P-dimensional observable response y and an M-dimensional latent factor f, that is, η⊤ = (y⊤, f⊤). In lslx, the following SEM model formulation is adopted:

η = α + Bη + ζ,

where α is a (P + M)-dimensional intercept, B is a (P + M) × (P + M) regression coefficient matrix, and ζ is a (P + M)-dimensional random residual with zero mean and covariance matrix Φ. Let θ denote a Q-dimensional parameter vector with general component θ_q. The parameter vector contains the non-trivial elements from α, B, and Φ. Under the assumption that (I − B)^{-1} exists, the model-implied mean and covariance of y can be written as

μ(θ) = G(I − B)^{-1}α,    (1)
Σ(θ) = G(I − B)^{-1}Φ(I − B)^{-⊤}G⊤,    (2)

where I is the identity matrix and G is a selection matrix such that y = Gη. Equations 1 and 2 can be understood as the RAM formulation (McArdle and McDonald 1984), covering the well-known Jöreskog-Keesling-Wiley model (Keesling 1972; Jöreskog 1973; Wiley 1973) and the Bentler-Weeks model (Bentler and Weeks 1980). Many statistical techniques can be represented in this framework, including linear regression with random covariates, path analysis, factor analysis, and growth curve models. Note that lslx users are not required to understand how Equations 1 and 2 represent any of these models. They only need to correctly specify the relationship between y and f via operators and variables (see Section 4.1 for model syntax).
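To make Equations 1 and 2 concrete, the implied moments can be computed directly in R. The following is a minimal numeric sketch for a hypothetical one-factor, three-indicator model; all values are illustrative and the code is not part of lslx:

R> P <- 3; M <- 1
R> B <- matrix(0, P + M, P + M)
R> B[1:P, P + 1] <- c(1, 0.8, 0.7)           # loadings of y1-y3 on the factor
R> alpha <- c(0.1, 0.2, 0.3, 0)              # intercepts; factor mean fixed at 0
R> Phi <- diag(c(0.5, 0.5, 0.5, 1))          # residual variances; factor variance 1
R> G <- cbind(diag(P), matrix(0, P, M))      # selection matrix with y = G eta
R> Binv <- solve(diag(P + M) - B)
R> mu_model <- G %*% Binv %*% alpha                       # Equation 1
R> Sigma_model <- G %*% Binv %*% Phi %*% t(Binv) %*% t(G) # Equation 2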
The aim of a SEM analysis is to verify whether there exists a θ* such that the population mean and covariance of y are closely represented by the implied moments, i.e., μ ≈ μ(θ*) and Σ ≈ Σ(θ*). Because μ, Σ, and θ* are unknown population quantities, their sample counterparts m, S, and θ̂ are considered for actual calculations. Given a random sample Y = {y_n}_{n=1}^N of size N, the most commonly used estimation procedure is to minimize the maximum likelihood (ML) loss function

D(θ) = tr(SΣ(θ)^{-1}) − log|SΣ(θ)^{-1}| − P + (m − μ(θ))⊤Σ(θ)^{-1}(m − μ(θ)).    (3)

The plausibility of the specified model can be evaluated by the likelihood ratio (LR) test on N × D(θ̂) and by many other goodness-of-fit indices (see West, Taylor, and Wu 2012, for a review). If the specified model does not fit the data well, the model should be abandoned.
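Continuing the sketch above, the ML loss in Equation 3 can be evaluated from the sample moments m and S; lslx performs this computation internally, so the function below is purely illustrative:

R> dml <- function(m, S, mu, Sigma) {
+    A <- S %*% solve(Sigma)
+    sum(diag(A)) - log(det(A)) - nrow(S) +
+      drop(t(m - mu) %*% solve(Sigma) %*% (m - mu))
+  }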
In practice, an initially specified model is rarely abandoned immediately. Jöreskog (1993) distinguished three situations for applying SEM: strict theory confirmation, model comparison, and model generation. He argued that model generation is the most common case. Under model generation, users successively improve the initially specified model via modification indices (e.g., Sörbom 1989) or other strategies. Discussing whether a confirmatory or exploratory study is more appropriate is beyond the scope of this paper. From the author's perspective, however, several instances of SEM analysis are both confirmatory and exploratory: on the one hand, the analyst aims to test the core hypotheses in the specified model; on the other hand, the analyst seeks an optimal pattern for the exploratory part to avoid the price of model misspecification (e.g., Bentler and Mooijaart 1989; Yuan, Marshall, and Bentler 2003). The author's preference is best called a semi-confirmatory approach. Huang et al. (2017) proposed a semi-confirmatory method for SEM via PL. In this method, a SEM model contains two parts: a confirmatory part and an exploratory part. The confirmatory part includes all freely estimated parameters and fixed parameters that are allowed for theory testing. The exploratory part is composed of a set of penalized parameters to describe relations that cannot be determined by existing substantive theory. By implementing a sparsity-inducing penalty and choosing an optimal penalty level, the relationships in the exploratory part can be efficiently determined. This semi-confirmatory method can be seen as a methodology that links the traditional SEM with the exploratory SEM (Asparouhov and Muthén 2009).

PL for semi-confirmatory SEM
To conduct the semi-confirmatory approach for SEM, lslx considers the following optimization problem:

min_θ U(θ, λ),    (4)

where U(θ, λ) is a PL objective function with the form

U(θ, λ) = D(θ) + R(θ, λ).    (5)

Here, R(θ, λ) is a penalty term (or regularizer), and λ > 0 is a regularization parameter to control the penalty level. In particular, the penalty term can be written as R(θ, λ) = Σ_{q=1}^{Q} c_{θ_q} ρ(|θ_q|, λ), with ρ(|θ_q|, λ) being a penalty function and c_{θ_q} ∈ {0, 1} being an indicator of whether θ_q is penalized. The first version of lslx implements the minimax concave penalty (MCP; Zhang 2010)

ρ(|θ_q|, λ) = λ|θ_q| − θ_q²/(2δ)  if |θ_q| ≤ δλ,
ρ(|θ_q|, λ) = δλ²/2               if |θ_q| > δλ,    (6)

where δ is a parameter to control the convexity of the MCP. The MCP produces a sparse estimator and has many good theoretical properties (see Mazumder, Friedman, and Hastie 2011; Zhang 2010). If δ → ∞, the MCP reduces to the L1 penalty, or LASSO (least absolute shrinkage and selection operator; Tibshirani 1996). On the other hand, a small value of δ makes the MCP behave like hard thresholding. In linear regression with standardized covariates, δ must be larger than one. However, when incorporating the MCP in SEM, the lower bound of δ depends on the Hessian matrix of the ML loss function (see Section 3.2). After version 0.6.6, lslx can also implement the elastic net (EN; Zou and Hastie 2005)

ρ(|θ_q|, λ) = λ(δ|θ_q| + (1 − δ)θ_q²/2),    (7)

where δ ∈ [0, 1] is a parameter that controls the relative importance of the L1 and L2 penalties. If δ = 1, the EN reduces to the LASSO. Conversely, δ = 0 makes the EN equivalent to the ridge (Hoerl and Kennard 1970). Note that the ridge penalty cannot result in a sparse estimate.
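The two penalty functions are easy to visualize. Below is a minimal sketch of Equations 6 and 7 (illustrative only; the EN form assumes the λ(δ|θ| + (1 − δ)θ²/2) parameterization written above):

R> rho_mcp <- function(theta, lambda, delta) {
+    t <- abs(theta)
+    ifelse(t <= delta * lambda,
+           lambda * t - t^2 / (2 * delta),   # quadratic spline near zero
+           delta * lambda^2 / 2)             # flat beyond delta * lambda
+  }
R> rho_en <- function(theta, lambda, delta) {
+    lambda * (delta * abs(theta) + (1 - delta) * theta^2 / 2)
+  }
R> curve(rho_mcp(x, lambda = 0.2, delta = 3), from = -2, to = 2)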
Given a penalty level λ and a convexity level δ, a PL estimator θ̂ ≡ θ̂(λ, δ) is defined as a solution of Equation 4. In practice, an optimal pair (λ̂, δ̂) is often selected by minimizing an information criterion (or cross-validation error) over Λ × ∆, where Λ and ∆ are two user-defined sets formed by λ_1 < λ_2 < ... < λ_K and δ_1 < δ_2 < ... < δ_L, respectively. lslx utilizes several information criteria that can be written as

IC(λ, δ) = D(θ̂(λ, δ)) + C_N × df(θ̂(λ, δ)),    (8)

where C_N is a sequence that depends on the sample size N (e.g., C_N = 2/N for AIC and C_N = log(N)/N for BIC) and df(θ̂) is the degrees of freedom. In general, statistical inference after a data-driven model selection is problematic (e.g., Leeb and Pötscher 2006; Pötscher 1991). An exception is if the procedure can yield an oracle estimator (i.e., an estimator that performs as well as if the true sparsity pattern were known in advance); then the associated statistical inferences become valid. It has been shown, both theoretically and empirically, that PL with the MCP and the BIC selector asymptotically results in an oracle estimator (Huang et al. 2017). Despite this, the oracle property might not hold under small sample size scenarios. Users should be cautious about the hypothesis testing and confidence interval results for the penalized parameters.
Algorithm 1: Semi-confirmatory structural equation modeling via penalized likelihood. Model specification: determine which elements in α, B, and Φ should be free, fixed, or penalized.
The overall procedure of semi-confirmatory SEM via PL is shown in Algorithm 1.

Optimization algorithm
In lslx, a quasi-Newton method is implemented to solve the problem in Equation 4. The method is based on Yuan et al. (2012), who modified the coordinate descent algorithm in glmnet (Friedman et al. 2010) to ensure its convergence for L1-penalized logistic regression. This section describes how this modified algorithm can be extended to minimize a PL criterion with the MCP for SEM. Roughly speaking, the algorithm consists of outer iterations and inner iterations. The implementation of the quasi-Newton method is summarized in Algorithm 2. For the procedure to optimize the PL criterion with the EN penalty, please see Appendix B.

Outer iteration
Let θ̂(t) ≡ θ̂(t)(λ, δ) denote the estimate at the t-th outer iteration under λ and δ. Suppose a corresponding quasi-Newton direction d is available. The outer iteration aims to find a step size ŝ such that the updated estimate θ̂(t+1) = θ̂(t) + ŝ × d satisfies

U(θ̂(t) + s × d, λ) − U(θ̂(t), λ) ≤ c_Armijo × s × [g(t)⊤d + R(θ̂(t) + d, λ) − R(θ̂(t), λ)],    (9)

where 0 < c_Armijo < 1 is a specified Armijo constant and g(t) denotes the gradient of D(θ̂(t)) (see Section 3.2). With some given 0 < s̄ < 1, lslx adopts ŝ as the largest element in {s̄^j}_{j=0}^J such that Equation 9 is satisfied. According to the first-order optimality condition, a PL estimate θ̂ should satisfy

∂U(θ̂, λ)/∂θ_q = ∂D(θ̂)/∂θ_q + c_{θ_q} × sign(θ̂_q) × ρ′(|θ̂_q|, λ) = 0, q = 1, 2, ..., Q,    (10)

where sign(•) is a function that extracts the sign of a number and ρ′ is the derivative of ρ with respect to its first argument. Note that U(θ, λ) is not differentiable in the usual sense if θ_q = 0 for some c_{θ_q} = 1; in that case, ∂U(θ̂, λ)/∂θ_q represents the q-th component of the sub-gradient of U(θ, λ) evaluated at θ̂. In lslx, the outer iteration stops if the following condition is satisfied:

max_q |∂U(θ̂(t), λ)/∂θ_q| < ε_out,    (11)
where ε_out > 0 is a specified tolerance for convergence of the outer loop. Let vech(•) denote an operator that stacks the non-duplicated elements of a symmetric matrix, and let τ(θ)⊤ = (μ(θ)⊤, σ(θ)⊤) with σ(θ) = vech(Σ(θ)). The sub-gradient of D can be obtained via the chain rule

∂D(θ)/∂θ = (∂τ(θ)/∂θ⊤)⊤ × ∂D(θ)/∂τ(θ),    (12)

where

∂D(θ)/∂μ(θ) = −2Σ(θ)^{-1}(m − μ(θ)),
∂D(θ)/∂σ(θ) = −D_P⊤ vec(Σ(θ)^{-1}(S + (m − μ(θ))(m − μ(θ))⊤ − Σ(θ))Σ(θ)^{-1}),    (13)

with D_P being the duplication matrix of size P² × P(P + 1)/2 (see Magnus and Neudecker 1999).
The specific form of the model Jacobian ∂τ(θ)/∂θ⊤ can be found in Bentler and Weeks (1980) and in Neudecker and Satorra (1991).
The success of the outer iteration relies on a good starting value θ̂(0). In lslx, if θ̂(λ_k, δ_l) is computed after deriving a PL estimate in the neighborhood of (λ_k, δ_l), θ̂(0)(λ_k, δ_l) will be set as θ̂(λ_k, δ_{l+1}) or θ̂(λ_{k−1}, δ_l) for a warm start. The warm start approach makes θ̂(λ_k, δ_l) readily available (see also Friedman et al. 2010). If no appropriate PL estimate is available, a default θ̂(0) is calculated by the method of McDonald and Hartmann (1992). The author's experience shows that this method works well if the scales of the response variables are similar and the variances of the latent factors are approximately one.

Inner iteration
Consider the following quadratic approximation for U(θ̂(t) + d, λ) − U(θ̂(t), λ):

q(t)(d) = g(t)⊤d + (1/2)d⊤H(t)d + R(θ̂(t) + d, λ) − R(θ̂(t), λ),    (14)

where g(t) and H(t) denote the gradient and the Hessian matrix (or an approximation thereof) of D(θ̂(t)), respectively. The inner iteration looks for a quasi-Newton direction d by minimizing Equation 14. Because the quadratic approximation is not differentiable at θ̂ when θ̂_q = 0 for some c_{θ_q} = 1, the proposed quasi-Newton method minimizes Equation 14 by a coordinate descent method. Let d̂(r+(q−1)/Q) denote the estimated direction at inner iteration r + (q − 1)/Q. The inner iteration updates d̂(r+(q−1)/Q) by d̂(r+(q−1)/Q) + ẑ_q × e_q with an appropriate step size ẑ_q, where e_q is the q-th vector in the standard basis.
The step size ẑ_q can be obtained by solving the following univariate problem:

min_z h(z) = a × z + (1/2) b × z² + c_{θ_q} ρ(|c + z|, λ),    (15)

with a = g(t)_q + (H(t)d̂(r+(q−1)/Q))_q, b = H(t)_qq, and c = θ̂(t)_q + d̂(r+(q−1)/Q)_q. Under the MCP, the solution of Equation 15 is

ẑ_q = S(b × c − a, λ)/(b − 1/δ) − c  if |c − a/b| ≤ δλ,
ẑ_q = −a/b                           if |c − a/b| > δλ,    (16)

where S(x, t) = sign(x) max(|x| − t, 0) is the soft-thresholding operator. If no penalty is imposed on θ_q, ẑ_q is simply −a/b. It should be noted that the problem in Equation 15 is convex if and only if b − 1/δ > 0. For b − 1/δ ≤ 0, the proposed algorithm may fail. As the value of b is generally unknown in advance, it is better to start with larger values of δ.
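For readers who want to trace the update, the following R sketch implements the univariate MCP solution in Equations 15 and 16 under the definitions of a, b, and c given above (a didactic re-implementation, not lslx's internal C++ code):

R> z_mcp <- function(a, b, c, lambda, delta, penalized = TRUE) {
+    if (!penalized) return(-a / b)
+    soft <- function(x, t) sign(x) * max(abs(x) - t, 0)
+    u_free <- c - a / b                  # unpenalized minimizer of u = c + z
+    u <- if (abs(u_free) <= delta * lambda) {
+      soft(b * c - a, lambda) / (b - 1 / delta)  # requires b - 1/delta > 0
+    } else {
+      u_free                             # MCP is flat beyond delta * lambda
+    }
+    u - c                                # step size z_q
+  }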
Let ẑ_1, ẑ_2, ..., ẑ_Q denote the directions obtained in some inner iteration. In lslx, the inner loop stops if max_q |ẑ_q| < ε_in, where ε_in > 0 is a specified tolerance for the inner iteration. The implementation of the inner iteration relies on the choice of H(t). The exact Hessian is generally too expensive to calculate. A natural choice for a replacement is the expected Hessian (or the expected Fisher information matrix), which results in a Fisher scoring-type algorithm. A simple alternative is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) approximation, which yields a BFGS-type algorithm.
Based on the results of our numerical comparison, the two choices of H(t) yield similar performance in terms of computation time.

lslx package
This section describes how to specify a model and conduct semi-confirmatory SEM with lslx. The main function lslx is a class generator that initializes an lslx object. An lslx object is constructed via the R6 system (Chang 2017), adopting an encapsulated object-oriented programming style. Although the commonly used S3 or S4 systems are also capable of constructing an equivalent object, the author adopts R6 for two main reasons. First, R6 methods belong to objects, which makes it straightforward to create many small functions to manipulate objects. Second, the mutability of R6 allows us to develop methods that modify the initially generated object without returning a new one. This is particularly useful when a user hopes to modify an initially specified model by freeing, fixing, or penalizing elements in some block of a coefficient matrix. As we shall see later, lslx objects can be flexibly manipulated through many built-in methods with a consistent naming scheme.
Algorithm 2: Quasi-Newton method for solving penalized likelihood with MCP.
In the simplest case, the use of an lslx object involves three major steps:

1. Initialize a new lslx object by specifying a model and a data set.

r6_lslx <- lslx$new(model, data)

2. Fit the specified model to the data with given fitting options.

r6_lslx$fit(penalty_method, lambda_grid, delta_grid)

3. Summarize the fitting results under a chosen penalty level.

r6_lslx$summarize(selector)
By default, lslx implements the MCP as the penalty function (penalty_method = "mcp"), with Λ and ∆ constructed by the method described in Section 4.4 and Appendix C. The other arguments must be specified by the user (i.e., they have no default values). After an lslx object is initialized, the print method can be used (e.g., r6_lslx$print()) to hint at how the object can be further manipulated.

Model syntax
The model syntax in lslx is highly motivated by lavaan (Rosseel 2012): the relationships among observed responses and latent factors are specified via an equation-like syntax. To demonstrate the model syntax, consider the following example of a regression model with one dependent variable (y) and four covariates (x1-x4):

R> model_reg <- "y <= x1 + x2
+    y <~ x3 + x4"
Here, the operator <= indicates that the regression coefficients from the right-hand side (RHS) variables should be set as freely estimated parameters. On the other hand, <~ makes the coefficients from the RHS variables penalized. The distinction between <= and <~ in lslx helps users set the type of each coefficient. The example model estimates the coefficients from x1 and x2 to y freely, and it sets the coefficients from x3 and x4 to y as penalized.
In addition, the model implicitly sets the following coefficients to be free: (1) the variances of x1-x4 and the residual variance of y; (2) the covariances among x1-x4; and (3) the intercepts of x1-x4 and y.
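Assuming a data frame data_reg with columns y and x1-x4 (hypothetical data), the regression example can be run end to end with the three steps introduced earlier:

R> r6_reg <- lslx$new(model = model_reg, data = data_reg)
R> r6_reg$fit(penalty_method = "mcp")
R> r6_reg$summarize(selector = "bic")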
Through the previous example, we can see that in lslx a model can be formulated by simply classifying the (parameter) types of the regression coefficients. The associated variances, covariances, and intercepts will be automatically determined according to the following rules (a combined example is given after the list).
1. The variances of exogenous variables and the residuals of endogenous variables will be set as freely estimated coefficients. The variance coefficients can be classified by the operators <=> or <~>. For example, the residual variance of y can be explicitly set as free via y <=> y. To penalize the variance, we can use y <~> y. Note that it is rare to treat variances as penalized.
2. The covariances among all the exogenous variables will be freely estimated. The covariances can also be classified through <=> or <~>. For example, the covariance between x1 and x2 can be set as free by x2 <=> x1.

3. The intercepts of all observed responses will be treated as free. The types of intercepts can be declared via a directed operator with the intercept variable 1 on the undirected side. For example, the freely estimated intercepts can be specified with y + x1 + x2 + x3 + x4 <= 1. Note that once the intercept variable 1 appears in the specified model, the intercept of every endogenous response must be declared. Otherwise, it will be set as zero by default.
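Putting these rules together, the implicit defaults of the regression example can also be written out explicitly. The following specification is redundant but legal (a sketch using only the operators introduced above); the covariances among x1-x4 are still added automatically as free parameters:

R> model_reg2 <- "y <= x1 + x2
+    y <~ x3 + x4
+    y <=> y
+    y + x1 + x2 + x3 + x4 <= 1"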
lslx always includes a mean structure in the specified model because it helps researchers interpret the values of estimates on their original scales. Note that the inclusion of the default saturated mean structure will not alter the fitting result relative to SEM with only a covariance structure.
The specified regression model can be represented in several equivalent ways. For example, it is possible to use reversed operators (e.g., x1 + x2 => y and x3 + x4 ~> y); in lslx, all directed operators can be reversed. Another way to declare parameter types is to use prefix operators (e.g., y <= free() * x1 + free() * x2 + pen() * x3 + pen() * x4).
Here, pen() and free() are both prefixes: pen() makes the corresponding coefficient penalized, and free() forces it to be freely estimated. A starting value can be placed inside the parentheses of a prefix. fix() is another important prefix that fixes the coefficient at the value specified in the parentheses. Note that a prefix can stand on either side of an operator; however, a prefix cannot simultaneously appear on both sides, which would result in an ambiguous specification. In lslx, the operators :=> and :~> (or their reversed counterparts <=: and <~:) are used to define latent factors. The operator :=> sets loadings to be freely estimated, and :~> makes them penalized. The factor model specified this way is depicted by the path diagram in Figure 1. All factor loadings are estimated: some of them are freely estimated (solid arrows) and some of them are penalized (broken arrows). The specification says that each response variable is mainly an indicator of one latent factor (represented by the freely estimated loadings), but the possibility of cross-loadings is not excluded (explored through the penalized loadings). The relationships among latent factors can also be specified via =>, ~>, <=>, and <~> (or their reversed versions) according to the syntax introduced earlier. Because the covariances among exogenous variables are automatically set as free, they do not have to be explicitly stated here. Only the variances of the latent factors are declared as fixed for scale setting. Note that lslx will not automatically fix appropriate loadings or variances to set the scales of latent factors. Users must do this job on their own for the purpose of clarity.
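The factor model referenced above and used in the next subsection is not reproduced here; a specification consistent with Figure 1, assuming the Holzinger-Swineford variable names, might read as follows (a sketch; the original model_fa may differ in details):

R> model_fa <- "visual  :=> x1 + x2 + x3
+               textual :=> x4 + x5 + x6
+               speed   :=> x7 + x8 + x9
+               visual  :~> x4 + x5 + x6 + x7 + x8 + x9
+               textual :~> x1 + x2 + x3 + x7 + x8 + x9
+               speed   :~> x1 + x2 + x3 + x4 + x5 + x6
+               visual  <=> fix(1) * visual
+               textual <=> fix(1) * textual
+               speed   <=> fix(1) * speed"

The nine leading loadings are free, the 18 cross-loadings are penalized, and the factor variances are fixed at one for scale setting.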
lslx also accepts basic lavaan operators, including ~, =~, and ~~; for example, the previous factor model can be equivalently respecified with them. The model parser automatically replaces the lavaan operators ~, =~, and ~~ with <=, :=>, and <=>, respectively. In addition, a numeric prefix makes the corresponding parameter fixed at the specified value.

$new(), $fit(), and $summarize() methods
To conduct a semi-confirmatory factor analysis, an lslx object can be initialized via the $new() method:

R> lslx_fa <- lslx$new(model = model_fa, data = lavaan::HolzingerSwineford1939)

Here, an lslx object called lslx_fa is created with the previously specified model_fa and the data set HolzingerSwineford1939 in lavaan. The data argument must be supplied with a data.frame whose column names match the names of the response variables specified in the model argument. If an lslx object is successfully initialized, the names of the response variables and the latent factors will be displayed in the console. To suppress this feedback, one can use verbose = FALSE. lslx also supports initialization by importing sample moments. In that case, the arguments sample_cov and sample_size must be supplied (and possibly sample_mean).
To fit the specified model to the imported data, the $fit() method can be used:

R> lslx_fa$fit(penalty_method = "mcp",
+    lambda_grid = seq(.01, .60, .01), delta_grid = c(1.5, 3.0, Inf))
CONGRATS: Algorithm converges under EVERY specified penalty level.
  Specified Tolerance for Convergence: 0.001
  Specified Maximal Number of Iterations: 100

The $fit() method requires users to specify mainly three arguments: penalty_method, lambda_grid, and delta_grid. The penalty_method argument specifies the penalty function by the values "none", "lasso", "ridge", "elastic_net" (for EN), or "mcp" (for MCP). The lambda_grid and delta_grid arguments set the penalty levels and the shape of the penalty function, respectively. The sample code above implements Algorithm 2 to compute PL estimates with the MCP under Λ × ∆ = {.01, .02, ..., .60} × {1.5, 3.0, ∞}. By default, a message is printed to show whether the optimization algorithm converges across all penalty and convexity levels. The fitting results are stored in lslx_fa and can later be manipulated by other built-in methods of the lslx class.
After deriving the fitting results, the easiest way to probe their content is the $summarize() method with a selector specified in the selector argument:

R> lslx_fa$summarize(selector = "bic", interval = FALSE)

The result shown is for fitting the model under (λ̂, δ̂) = (.14, 1.5) selected by the BIC. The confidence intervals of the coefficients are not printed because of interval = FALSE. The fit indices show that the final model fits the data reasonably well. Although 18 penalized loadings were estimated, only four of them were identified as non-zero: x5<-visual, x7<-visual, x9<-visual, and x1<-textual. By default, lslx always implements robust statistical inference provided that the raw data are available. This includes the mean-adjusted LR test (Satorra and Bentler 1994), mean-adjusted RMSEA (root mean square error of approximation) intervals (Brosseau-Liard, Savalei, and Li 2012; Li and Bentler 2006), and sandwich standard errors for the coefficients (e.g., Browne 1984; Yuan and Hayashi 2006).
However, these inference results should be cautiously interpreted because the final model is determined by some selection process.
An interesting property of PL is that a sufficiently large λ shrinks all the penalized parameters to zero. For example, lslx_fa$summarize(lambda = 0.6, delta = Inf) shows that all the penalized loadings are exactly zero, which coincides with the confirmatory factor analysis (CFA) model often used to fit HolzingerSwineford1939 (e.g., the example of cfa() in lavaan). In fact, this PL fitting result is numerically identical to the CFA fitted by lavaan (e.g., the same value of the LR statistic). On the other hand, a very small λ yields a result similar to an exploratory factor analysis. This is why PL can be seen as a methodology bridging the traditional SEM and the exploratory SEM.

Other built-in methods
In lslx, there are several built-in methods to adjust the inner status and manipulate the fitting results of an lslx object. These methods are listed in Tables 1 and 2. From the viewpoint of data analysis, the set-related methods and plot-related methods are the most relevant.
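As a quick preview of the calling convention (the coefficient-naming scheme used in the name argument is explained in the next paragraph; the calls are illustrative sketches):

R> lslx_fa$penalize_coefficient(name = "x4<-visual", verbose = FALSE)
R> lslx_fa$penalize_block(block = "y<->y", type = "fixed", verbose = FALSE)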
The set-related methods are designed to alter the initially specified model. To penalize coefficients by name (with given starting values), we may use $penalize_coefficient(). In lslx, every coefficient name is constructed from variable names joined by <- or <->, where <- describes a directed effect and <-> an undirected one. For example, x1<-visual is the name of the regression coefficient (or loading) from visual to x1. Note that <=: and <~: are not used for naming coefficients, and <- cannot be reversed. Penalizing many coefficients by their names may be tedious, however. Instead, we can use $penalize_block() to penalize coefficients by type in a given block. A block is formulated by y/f/1 and <-/<->, where y stands for response variables, f denotes latent factors, and 1 represents the intercept variable. For example, the block formulated by y<-1 contains all intercepts of the response variables. In the illustrative examples in Section 5, we will show how to use set-related methods to quickly modify the initially specified model. The plot-related methods can be used to plot the fitting results stored in an lslx object. For example, the $plot_numerical_condition() method displays the numerical conditions for assessing the quality of optimization. It can be called by

R> lslx_fa$plot_numerical_condition()
Figure 2 displays how the number of iterations, the maximum element of the absolute sub-gradient, and the minimum diagonal element of the approximated Hessian change with the penalty level and convexity level. We can see that the algorithm converges within a few iterations under all specified penalty and convexity levels, and there is no non-convexity problem (indicated by the positive values of all the diagonal elements of the approximated Hessian). Note that the non-convexity problem is detected via the approximated Hessian used in the optimization algorithm, not the exact one. Another useful plotting method is $plot_coefficient(), which draws solution paths.
The following code plots the solution paths of coefficients in the block of y<-f, i.e., the factor loadings.

R> lslx_fa$plot_coefficient(block = "y<-f")
In Figure 3, we can observe how the values of the loadings change with the penalty level and convexity level. Under δ = 1.5, the MCP shrinks the estimates sharply. In contrast, an infinite δ yields relatively smooth solution paths.

Practical guidelines
This subsection discusses several practical issues when using lslx, including the check of model identification, the initialization of Λ and ∆, and the choice of selectors.
The first issue concerns model identification. In a semi-confirmatory analysis, the specified model is sometimes not identified under the usual SEM framework (e.g., model_fa). However, PL can still estimate it because the penalty term introduces additional constraints on the penalized parameters. For example, with the LASSO, the optimization problem in Equation 4 can be equivalently represented as min_θ D(θ) subject to Σ_{q=1}^Q c_{θ_q}|θ_q| ≤ γ for some γ > 0 (see Tibshirani 1996). In fact, PL is often implemented as a solution to the P > N problem (an under-identified model) in regression analysis. Although there is no general rule to ensure identifiability before PL fitting, it can be checked empirically. Motivated by Shapiro and Browne (1983), the local identifiability of the selected model can be checked by examining whether the smallest singular value of ∂τ(θ̂)/∂ϑ⊤ is numerically larger than zero (see also Huang et al. 2017), where ϑ̂ is the vector formed by the freely estimated and penalized non-zero elements of θ̂, the so-called effective elements of θ̂. For example, our BIC-selected factor model can be checked by

R> moment_jacobian <- lslx_fa$extract_moment_jacobian(
+    selector = "bic", type = "effective")
R> min(svd(moment_jacobian)$d)
[1] 0.211

Because this value is evidently larger than zero, we conclude that the selected model is at least locally identified. Note that by default $extract_moment_jacobian() returns the whole model Jacobian matrix. To extract only the Jacobian with respect to the effective parameters, type = "effective" should be used.

The second issue is how to initialize Λ and ∆. Although lslx initializes them automatically via lambda_grid = "default" and delta_grid = "default", the discussion below can help users understand how lslx works. In linear regression, Λ is often initialized by (1) setting λ_1 to a small number (e.g., λ_1 ≈ log(N)/N in lslx) and λ_K to a minimal value that shrinks all the penalized parameters to zero, and (2) constructing K values decreasing from λ_K to λ_1 on the log scale (e.g., Friedman et al. 2010). However, making all penalized parameters zero is unnecessary in practice. Let t denote a specified upper bound such that λ_K can shrink any standardized and unpenalized |θ*_q| ≤ t to zero. Based on the rationale described in Appendix C, λ_K can be loosely approximated by

λ_K ≈ σ_max t / (σ_min (1 − r²_max)),

where σ_max and σ_min are the maximum and minimum standard deviations over both response variables and latent factors, respectively, and r²_max is the largest coefficient of determination among the endogenous variables. For example, with t = 0.3, r²_max = 0.6, and σ_max = σ_min = 1, we have λ_K ≈ 0.75. To initialize ∆, it should be kept in mind that a small δ may result in a non-convexity problem, whereas a very large δ suffers from biased estimation. A loose approximation for δ_1, the smallest element of ∆, is

δ_1 ≈ σ²_max (1 − r²_min) / σ²_min,

where r²_min is the smallest coefficient of determination among the endogenous variables. The largest element δ_L is often set to infinity to obtain a LASSO solution, which is used as a warm start for calculating θ̂ under a smaller δ (see Mazumder et al. 2011). In practice, very small values of δ often produce problematic fitting results (e.g., non-convergence or non-convexity of the approximated Hessian matrix). By default, lslx will not use such results in $summarize() or other methods relying on fitting results. To include these problematic results, users should set include_faulty = TRUE, although this is generally not recommended.

The third issue is the choice of selectors. The information criteria in lslx can be distinguished into three types: AIC-type (AIC and AIC3), BIC-type (BIC and CAIC), and mixed-type (HBIC and ABIC). Judging by their relative orders of C_N, BIC-type criteria are expected to be the most conservative (i.e., they result in the sparsest estimates at the price of lower goodness-of-fit), followed by mixed-type and then AIC-type criteria, which tend to choose relatively complex models. In theory, a BIC-type criterion asymptotically chooses a quasi-true model with probability one (e.g., Huang 2017a). However, under small sample sizes or weak signals, an AIC-type criterion generally yields better selection results (e.g., Vrieze 2012). The behavior of a mixed-type criterion is also mixed: it performs close to AIC under small sample sizes and becomes similar to BIC asymptotically. Although a mixed-type criterion might not outperform its competitors in the home field of AIC or BIC (e.g., small-sample or strong-signal settings), its overall performance across different conditions is good (e.g., Lin, Huang, and Weng 2017). If users do not have strong arguments for an AIC-type or a BIC-type criterion, the author recommends employing a mixed-type one.
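As a quick illustration of the grid bounds from the second issue above, the two approximations are simple enough to compute by hand; r²_min = 0.2 below is a hypothetical value added for the δ_1 bound:

R> t <- 0.3; r2_max <- 0.6; r2_min <- 0.2
R> sigma_max <- 1; sigma_min <- 1
R> lambda_K <- sigma_max * t / (sigma_min * (1 - r2_max))
R> lambda_K
[1] 0.75
R> delta_1 <- sigma_max^2 * (1 - r2_min) / sigma_min^2
R> delta_1
[1] 0.8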

Advanced functionality
In this section, two advanced functions of lslx are described: the two-stage method with auxiliary variables for handling missing data and the reparameterized multi-group SEM to explore population heterogeneity.

Two-stage method for missing data
When conducting SEM, one is likely to encounter missing values. In lslx, missing values can be handled by the listwise deletion (LD) method or the two-stage (TS) method (Yuan and Bentler 2000). LD uses only complete observations for further analysis. If the mechanism is missing completely at random (MCAR; Rubin 1976), LD yields a consistent estimator. However, LD suffers from a loss of efficiency because the dropped incomplete cases still carry information for estimation. The TS method, on the other hand, first maximizes the likelihood based on all available observations to calculate saturated moments; the obtained moment estimates are then used for the subsequent SEM analysis. Under the assumption of missing at random (MAR; Rubin 1976), TS is consistent. In addition, the standard errors of the coefficients can be consistently estimated (e.g., Yuan and Lu 2008). Compared with LD, TS is more generally valid (in terms of consistency) and more efficient (with respect to mean squared error); lslx therefore sets TS as the default. The current version also supports the inclusion of auxiliary variables to make the MAR assumption more plausible (Savalei and Bentler 2009). Specifically, let Y^o = {y^o_n}_{n=1}^N denote an observed random sample with y^o_n being the observed part of y_n. The first stage of TS aims to estimate the saturated mean μ(τ) and covariance matrix Σ(τ) based on Y^o, where τ⊤ = (μ⊤, σ⊤) is a saturated moment vector with σ = vech(Σ). To obtain τ̂, the TS method maximizes the observed-data log-likelihood

ℓ(τ) = −(1/2) Σ_{n=1}^N [ log|Σ_n(τ)| + (y^o_n − μ_n(τ))⊤ Σ_n(τ)^{-1} (y^o_n − μ_n(τ)) ],    (22)

where μ_n(τ) and Σ_n(τ) are the saturated mean and covariance structure for y^o_n. Equation 22 can be optimized by an expectation-maximization (EM) algorithm (Dempster, Laird, and Rubin 1977). In the second stage of TS, Equation 4 is solved by Algorithm 2 with m and S replaced by μ(τ̂) and Σ(τ̂), respectively.
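For intuition, the first stage can be sketched as a textbook EM algorithm for the saturated multivariate normal model. The function below is didactic only; it is not how lslx (which also handles auxiliary variables and robust corrections) implements the first stage:

R> em_saturated <- function(Y, n_iter = 50) {
+    # Y: an N x P matrix with NAs; returns saturated mean and covariance
+    P <- ncol(Y); N <- nrow(Y)
+    mu <- colMeans(Y, na.rm = TRUE)
+    Sigma <- diag(apply(Y, 2, var, na.rm = TRUE))
+    for (iter in seq_len(n_iter)) {
+      S1 <- numeric(P); S2 <- matrix(0, P, P)
+      for (n in seq_len(N)) {
+        o <- !is.na(Y[n, ]); m <- !o
+        y_hat <- mu; V <- matrix(0, P, P)
+        if (any(o)) {
+          y_hat[o] <- Y[n, o]
+          if (any(m)) {
+            reg <- Sigma[m, o, drop = FALSE] %*%
+              solve(Sigma[o, o, drop = FALSE])
+            y_hat[m] <- mu[m] + reg %*% (Y[n, o] - mu[o])  # E-step mean
+            V[m, m] <- Sigma[m, m, drop = FALSE] -
+              reg %*% Sigma[o, m, drop = FALSE]            # E-step covariance
+          }
+        } else V <- Sigma
+        S1 <- S1 + y_hat; S2 <- S2 + tcrossprod(y_hat) + V
+      }
+      mu <- S1 / N; Sigma <- S2 / N - tcrossprod(mu)       # M-step
+    }
+    list(mu = mu, sigma = Sigma)
+  }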
One may ask why lslx does not implement the so-called full-information (FI) approach to handle missing values (Enders and Bandalos 2001). The main reason is that the TS method is efficient in terms of computation time: the additional cost introduced by TS is only that of calculating τ̂ in the first stage. In contrast, the FI approach requires an expectation step before each outer iteration (Jamshidian and Bentler 1999). Additionally, current evidence shows that the FI approach has no particular advantage over TS, both theoretically (Yuan and Bentler 2000) and empirically (Savalei and Bentler 2009; Savalei and Falk 2014). We now demonstrate how to use lslx to conduct TS using the data from Holzinger and Swineford (1939) again. Because the original data set is complete, missing values are created according to the example of twostage() in package semTools (semTools Contributors 2016): missingness in x5 and x9 depends on the values of x1 and age, respectively. An lslx object is initialized with a relatively parsimonious CFA model. To include auxiliary variables, the auxiliary_variable argument should be declared; in this example, "ageyr" and "agemo" are set as auxiliary variables (a sketch, where model_miss and data_miss denote the CFA model and the incomplete data set):

R> lslx_miss <- lslx$new(model = model_miss, data = data_miss,
+    auxiliary_variable = c("ageyr", "agemo"), verbose = FALSE)

Since the CFA model may not fit the data well due to its independent cluster structure for the loadings, a correlated-residuals structure is considered, motivated by Bayesian SEM (Muthén and Asparouhov 2012). In fact, PL can also be seen as a maximum a posteriori (MAP) method under a Bayesian framework (see Meng 2008; Strawderman, Wells, and Schifano 2013). The $penalize_block() method can be used to penalize coefficients of a given type in a specified block:

R> lslx_miss$penalize_block(block = "y<->y", type = "fixed", verbose = FALSE)

This code sets all coefficients with type = "fixed" in block = "y<->y" as penalized. Although such a model is not identified under the usual SEM framework, PL can still estimate it. The CFA model with correlated residuals is fitted to the data via the $fit_lasso() method, a convenient wrapper for $fit() with penalty_method = "lasso":

R> lslx_miss$fit_lasso(verbose = FALSE)
By default, lslx implements the TS method for handling missing data; this can be set explicitly via missing_method = "two_stage". If any auxiliary variable is specified when initializing an lslx object, that variable will be included when estimating the saturated moments. Finally, the robust AIC is used to select an optimal penalty level:

R> lslx_miss$summarize(selector = "raic", style = "minimal")

From the summary with style = "minimal", we can see that 11 of the 36 penalized coefficients are identified as non-zero. To see which residual covariances are non-zero, the $extract_coefficient_matrix() method can be used.
R> lslx_miss$extract_fit_index(selector = "raic")
 rmsea    cfi   nnfi   srmr
0.0249 0.9971 0.9921 0.0316

In general, the author does not recommend the correlated-residuals structure, because the specified model is too exploratory and the resulting model may be difficult to interpret.

Multi-group SEM analysis
Multi-group SEM (MGSEM) is often used to examine the heterogeneity of model coefficients across several populations (Jöreskog 1971; Sörbom 1974). Suppose there are G populations sharing common moment structures μ(•) and Σ(•) but having possibly different values of the model parameter θ_g. Let θ_gq denote the q-th component of θ_g. Without loss of generality, we assume that θ_1q, θ_2q, ..., θ_Gq represent the same element selected from α, B, or Φ. Hence, a coefficient θ_gq is said to be homogeneous across the G populations if θ_1q = θ_2q = ... = θ_Gq. Otherwise, we call θ_gq heterogeneous.
Given a multi-group random sample Y = {{y_gn}_{n=1}^{N_g}}_{g=1}^G, ML estimation obtains θ̂_1, θ̂_2, ..., θ̂_G by minimizing the following multi-group loss function:

D(θ_1, θ_2, ..., θ_G) = Σ_{g=1}^G w_g D_g(θ_g),    (23)

where D_g is the ML loss in Equation 3 evaluated with the group-g sample moments m_g and S_g = (1/N_g) Σ_{n=1}^{N_g} (y_gn − m_g)(y_gn − m_g)⊤, and w_g = N_g/N with N = Σ_{g=1}^G N_g. To test the homogeneity/heterogeneity of parameters across groups, users may impose equality constraints on coefficients; to evaluate the appropriateness of the constraints, they can perform formal statistical tests or use goodness-of-fit indices. The current version of lslx cannot impose equality constraints on the model parameters, so it may seem incapable of examining coefficient homogeneity/heterogeneity. However, by using a reparameterized MGSEM with penalization, lslx can still explore homogeneity/heterogeneity patterns (see Huang 2018). In lslx, each group parameter θ_g is parameterized as

θ_g = θ + θ̃_g,

where θ is called the reference component and θ̃_g is called the increment component of group g. The meaning of θ and θ̃_g relies on the choice of the reference group. When there is no reference group, i.e., θ = 0, θ̃_g and θ_g are equivalent. If group j is set as the reference, θ̃_j is set to zero and θ̃_g then represents the difference between θ_g and θ_j, i.e., θ̃_g = θ_g − θ_j. Under this setting, θ_gq is homogeneous if and only if θ̃_gq = 0 for all g = 1, 2, ..., G (g ≠ j). Therefore, we can examine the homogeneity/heterogeneity of θ_gq by exploring the sparsity pattern of θ̃_1q, θ̃_2q, ..., θ̃_Gq.
In lslx, the MGSEM analysis is implemented by minimizing a PL criterion composed of the multi-group loss function in Equation 23 and a penalty term:

U(θ, θ̃_1, ..., θ̃_G, λ) = D(θ_1, θ_2, ..., θ_G) + R(θ, θ̃_1, ..., θ̃_G, λ),

where the penalty has the same form as before and is, in particular, imposed on the increment components so that homogeneity can be identified. This multi-group optimization problem can also be solved by Algorithm 2. After the PL estimates over Λ × ∆ are derived, an optimal pair (λ̂, δ̂) can be chosen by minimizing some information criterion of the form in Equation 8. In general, the implementation of semi-confirmatory MGSEM is similar to the single-group case, except for the emphasis on the homogeneity/heterogeneity of the coefficients across groups.
The following example shows how to use lslx to examine strong factorial invariance via MCP.
For simplicity, a commonly used independent cluster structure (i.e., each response is influenced by only one latent factor) is considered here. The model fixes the loadings of x1, x4, and x7 at one for scale setting. The argument group_variable specifies which variable should be used as the group label, and reference_group determines the reference group. The syntax for specifying a multi-group model is generally similar to that for a single-group model, except that a vectorized prefix can be used. A possible initialization for this purpose is (a sketch; the exact model syntax in the original example may differ):

R> model_mgfa <- "visual  :=> c(fix(0), fix(1)) * x1 + x2 + x3
+                 textual :=> c(fix(0), fix(1)) * x4 + x5 + x6
+                 speed   :=> c(fix(0), fix(1)) * x7 + x8 + x9"
R> lslx_mgfa <- lslx$new(model = model_mgfa,
+    data = lavaan::HolzingerSwineford1939,
+    group_variable = "school", reference_group = "Pasteur")

Here, c(fix(0), fix(1)) is a vectorized prefix that sets the corresponding coefficients to zero and one, respectively. In this example, the first group is "Grant-White" and the second is "Pasteur"; this order corresponds to the sort() result for the group names. Since "Pasteur" is set as the reference group, the model parameters in "Grant-White" are now increment components representing differences; if the reference_group argument were missing, the reference component would be set to zero, which is equivalent to the usual parameterization of MGSEM. The loadings are set to 1 in "Pasteur" because "Pasteur" is the reference group; for "Grant-White", the increment component should be restricted to 0 to make the corresponding loadings equal to 1 as desired. If c(fix(0), fix(1)) is replaced by fix(1), the lslx parser will still interpret fix(1) as c(fix(0), fix(1)). Of course, if the reference_group argument is not specified, fix(1) will be interpreted as c(fix(1), fix(1)), i.e., the two corresponding loadings will both be set to 1. By default, lslx treats all non-trivial model parameters as heterogeneous.
So far, the model specification is still not complete. A measurement satisfies the condition of strong factorial invariance if all loadings and intercepts are homogeneous across the considered groups (Meredith 1993). By default, lslx freely estimates all increment components in "Grant-White". To penalize some of them, we can use the $penalize_heterogeneity() method:

R> lslx_mgfa$penalize_heterogeneity(block = c("y<-f", "y<-1"),
+    group = "Grant-White", verbose = FALSE)

This code penalizes every coefficient belonging to the blocks "y<-f" and "y<-1" in "Grant-White" with respect to its reference component. Since restrictions on the loadings and intercepts are imposed, the intercepts of the latent factors in "Grant-White" can be safely estimated:

R> lslx_mgfa$free_block(block = "f<-1",
+    group = "Grant-White", verbose = FALSE)

To understand how to impose minimal constraints for identification in multi-group factor analysis, refer to Millsap (2011). The specified model is now fitted with the MCP through the $fit_mcp() method, a wrapper for $fit() with penalty_method = "mcp":

R> lslx_mgfa$fit_mcp(verbose = FALSE)
Finally, we display the values of the loadings and intercepts to evaluate whether they are invariant under the penalty and convexity levels selected by the HBIC:

R> loading <- lslx_mgfa$extract_coefficient_matrix(
+    selector = "hbic", block = "y<-f")
R> intercept <- lslx_mgfa$extract_coefficient_matrix(
+    selector = "hbic", block = "y<-1")
R> loading$"Grant-White" - loading$"Pasteur"

The result shows that the loadings are invariant across the two schools. However, the intercepts of x3 and x7 differ. We conclude that the condition of strong factorial invariance is violated; the measurement satisfies only the condition of weak factorial invariance (Meredith 1993), i.e., only the loadings are homogeneous across the two schools.

Conclusions
In this work, an R package lslx for semi-confirmatory structural equation modeling (SEM) via penalized likelihood (PL) is described. The package implements a quasi-Newton method to optimize the PL criterion with either the minimax concave penalty (MCP) or the elastic net (EN). To ensure the optimality of the obtained solution, the algorithm checks the first-order condition in each outer iteration. Our experience shows that lslx can reliably and efficiently find PL estimates.
Package lslx adopts a lavaan-like syntax for model specification. The current version also offers an S3 interface for using lslx objects, including a wrapper function plsem() and related S3 methods. The author believes that most lavaan users can easily learn to use lslx. The semi-confirmatory SEM is most appropriate when limited substantive theory is available for model specification. Although lslx is not the first package for SEM with PL, it is probably the most sophisticated one in terms of usability, dependability, efficiency, and functionality.
Even though the current version of lslx can fit a wide class of SEM models, there are still limitations. (1) lslx cannot impose linear or non-linear constraints on the model parameters. It is worth modifying the current algorithm to incorporate parameter constraints. (2) lslx can only handle an ordinal response by treating it as continuous. However, such an approach is valid only under limited conditions (e.g., Rhemtulla, Brosseau-Liard, and Savalei 2012). Implementing a natural PL method for ordinal SEM could enhance the applicability of the current package. (3) lslx utilizes inference methods that assume no model selection has been conducted, which may result in inflated Type I error rates or undercoverage of confidence intervals. Recent advances in post-selection inference allow valid inferences even after model selection (e.g., Berk, Brown, Buja, Zhang, and Zhao 2013; Lee, Sun, Sun, and Taylor 2016).
Future versions of lslx should implement these methods to control the proportion of false positive findings.

A. Standard errors and robust degrees of freedom
This appendix describes the technical details of computing the standard errors and the so-called robust degrees of freedom in lslx. Let ϑ̂ denote the vector formed by the freely estimated and penalized non-zero elements of a PL estimate θ̂. The expected Fisher information matrix under θ̂ is

F̂ = (∂τ(θ̂)/∂ϑ⊤)⊤ W(θ̂) (∂τ(θ̂)/∂ϑ⊤),

where W(θ) is the normal-theory weight matrix associated with the quadratic approximation of D (see, e.g., Browne 1984). Similarly, the corresponding observed information is

Ĥ = ∂²D(θ̂)/∂ϑ∂ϑ⊤.

In lslx, F̂ is computed via analytical formulas and Ĥ is obtained by numerical differentiation. By inverting F̂ or Ĥ, normal-theory standard errors can be obtained from the diagonal elements of the inverse matrix.
By default, lslx uses a sandwich covariance matrix to construct the standard errors. Let τ̂ denote a consistent estimate of the population moment vector τ such that √N(τ̂ − τ) → N(0, Π). If τ̂ is calculated by the method described in Section 5.1, Π can be estimated by a sandwich-type formula based on the individual contributions to the first-stage likelihood (e.g., Yuan and Lu 2008). In lslx, the sandwich covariance matrix is obtained as

V̂ = Ĥ^{-1} (∂τ(θ̂)/∂ϑ⊤)⊤ W(θ̂) Π̂ W(θ̂) (∂τ(θ̂)/∂ϑ⊤) Ĥ^{-1}.

Let v̂_ii denote the i-th diagonal element of V̂; then √(v̂_ii/N) can be used as a standard error for ϑ̂_i, the i-th element of ϑ̂. Without the presence of penalization and model selection, the use of V̂ for two-stage estimation can be justified (Yuan and Bentler 2000).
The robust degrees of freedom is defined as the asymptotic expectation of the LR statistic under the null hypothesis. This expectation can be approximated by

tr(Û Π̂),  with Û = W(θ̂) − W(θ̂)(∂τ(θ̂)/∂ϑ⊤)F̂^{-1}(∂τ(θ̂)/∂ϑ⊤)⊤W(θ̂)

(e.g., Yuan et al. 2007). This robust degrees of freedom (or an equivalent) is often used for model selection with a misspecified likelihood (e.g., Konishi and Kitagawa 1996; Stone 1977; Varin and Vidoni 2005).

B. PL procedure with elastic net
To optimize the PL criterion with the EN, the first-order optimality condition becomes

∂U(θ̂, λ)/∂θ_q = ∂D(θ̂)/∂θ_q + c_{θ_q} λ(1 − δ)θ̂_q + c_{θ_q} λδ sign(θ̂_q) = 0, q = 1, 2, ..., Q,

where, as before, ∂U(θ̂, λ)/∂θ_q represents the q-th component of the sub-gradient when θ̂_q = 0. The inner iteration is modified accordingly: the univariate problem in Equation 15 with the EN in place of the MCP is convex for any b > 0, and its solution has the familiar soft-thresholding form.
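A sketch of the corresponding coordinate update, analogous to the MCP update in Section 3.2 under the same definitions of a, b, and c and the EN parameterization in Equation 7:

R> z_en <- function(a, b, c, lambda, delta) {
+    soft <- function(x, t) sign(x) * max(abs(x) - t, 0)
+    # closed-form minimizer at u = c + z; L2 part augments the curvature
+    u <- soft(b * c - a, lambda * delta) / (b + lambda * (1 - delta))
+    u - c
+  }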

C. Bounds for penalty and convexity level
This appendix describes how to obtain approximate values of λ_K and δ_1 for initializing Λ and ∆. Let β_ij denote the (i, j) element of B. According to the ECM algorithm for SEM with PL (Huang et al. 2017), the thresholding rule for β_ij under the MCP sets the update to zero whenever

ŵ_βij |β̃_ij| ≤ λ,    (33)

where β̃_ij is the current unpenalized estimate and ŵ_βij is a working weight for β_ij. Under uncorrelated residuals, the working weight can be written as ŵ_βij = ĉ²_j/φ̂²_i, where φ̂²_i is the current estimate of φ²_i, the residual variance of η_i, and ĉ²_j is the j-th diagonal element of Ĉ = E(N^{-1} Σ_{n=1}^N η_n η_n⊤ | Y, θ). Let μ_i and σ_i denote the mean and standard deviation of η_i, the i-th element of η. Then ŵ_βij ≈ (σ²_j + μ²_j)/(σ²_i(1 − r²_i)), where r²_i is the coefficient of determination for η_i; the approximation is based on ĉ²_j ≈ σ²_j + μ²_j and φ̂²_i ≈ σ²_i(1 − r²_i). The standardized β̃_ij can be written as β*_ij = (σ_j/σ_i)β̃_ij. If we hope to shrink all |β*_ij| ≤ t to zero, Equation 33 indicates that λ_K should at least satisfy

λ_K ≥ ŵ_βij (σ_i/σ_j) t    (34)

for every penalized β_ij. For a system in which all exogenous variables are centered, a loose approximation can be obtained by λ_K ≈ σ_max t/(σ_min(1 − r²_max)). By the convexity condition in Section 3.2 (with b = ŵ_βij), ŵ_βij − 1/δ must be larger than zero. Therefore, δ_1 can be approximated by

δ_1 ≈ 1/min ŵ_βij.    (35)

Again, without considering μ_j, we can loosely use δ_1 ≈ σ²_max(1 − r²_min)/σ²_min.
In principle, lslx initializes Λ and ∆ based on the approximations in Equations 34 and 35. When the variable scales are all the same, these approximations become very simple; hence, standardization is a good strategy to simplify the initialization problem. Note that the approximations could be further improved if exogenous and endogenous variables were distinguished. However, obtaining good estimates of σ²_i and r²_i without actual model fitting remains a challenging task.

Figure 2: Maximum element of the absolute sub-gradient (top), minimum diagonal element of the approximated Hessian (middle), and number of iterations (bottom) across all penalty and convexity levels.

Figure 3: Solution paths of the factor loading estimates across all penalty and convexity levels. Freely estimated and penalized loadings are represented by solid and broken lines, respectively.

Table 1: List of set-related, plot-related, and test-related methods in lslx. For details, please see the help page of lslx.

Table 2: List of get-related and extract-related methods in lslx. For details, please see the help page of lslx.