covsim : An R Package for Simulating Non-Normal Data for Structural Equation Models Using Copulas

In factor analysis and structural equation modeling non-normal data simulation is traditionally performed by specifying univariate skewness and kurtosis together with the target covariance matrix. However, this leaves little control over the univariate distributions and the multivariate copula of the simulated vector. In this paper we explain how a more flexible simulation method called vine-to-anything (VITA) may be obtained from copula-based techniques, as implemented in a new R package, covsim . VITA is based on the concept of a regular vine, where bivariate copulas are coupled together into a full multivariate copula. We illustrate how to simulate continuous and ordinal data for covariance modeling, and how to use the new package discnorm to test for underlying normality in ordinal data. An introduction to copula and vine simulation is provided in the appendix.


Introduction
Structural equation modeling (SEM) and factor analysis are regularly applied to data in the psychological, educational, business, behavioral, and medical sciences. The central component in these methods is the covariance matrix from which the model parameters are identified. In this article we present software for simulating from a class of distributions with a fixed covariance matrix, which therefore can be used in SEM simulation studies. This distributional class is more flexible than the methods currently in use, and may therefore extend the range of conditions investigated with simulations.
When the examined data are continuous, the most popular SEM estimation method used is the normal-theory based maximum likelihood (NTML) method (Jöreskog 1967) which is asymptotically efficient when data are normally distributed. NTML estimation is a special case of a class of moment based estimators known as minimum discrepancy function estimators (see, e.g., Shapiro 1983), and is therefore known to be consistent also under non-normality. When using classical standard errors with NTML, valid inference is attained mainly when the data are normal. While several standard error and test statistic formulas have been proposed in order to robustify inference with NTML and other minimum discrepancy estimators under non-normality (e.g., Satorra and Bentler 1988;Wu and Lin 2016;Marcoulides, Foldnes, and Grønneberg 2020), their performance depends heavily on the distribution and sample size of the data (e.g., Curran, West, and Finch 1996;Fouladi 2000;Foldnes and Olsson 2015;Grønneberg and Foldnes 2019b). In settings with ordered-categorical data, least squares estimation based on polychoric correlations is the most prevalent estimation method (Christoffersson 1977;Muthén 1984). Polychoric correlations are essentially the correlations among continuous bivariate normally distributed vectors underlying the observed ordinal data, but may be heavily biased outside normality Grønneberg 2019b, 2022b). Hence, under both continuous and ordinal data analyses, the normality assumption is a central starting point for estimation and inference.
Unfortunately, in most empirical research situations data are seldom drawn from populations in which the normality assumption holds exactly (Micceri 1989;Cain, Zhang, and Yuan 2017). And while several estimation and inference methods that do not assume normality have been suggested (for an overview, see Tarka 2018), it is in most conditions not feasible to analytically derive results on their performance as a function of the data generating distribution. Monte Carlo simulation studies have as a result become essential tools for evaluating the behavior of various aspects of SEM techniques, such as parameter and standard error bias, performance of test statistics, and power calculations, relative to distributional characteristics (Boomsma 2013). The external validity of these studies is weakened if the chosen data generation mechanism does not resemble the real-world distributions encountered in the relevant field of practice. To be able to model such distributions, we need simulation methods that can match a given covariance matrix but still offer distributional flexibility.
The aims of the present paper are twofold. Firstly, to present the R (R Core Team 2021) package covsim (Foldnes and Grønneberg 2022a), available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=covsim, which implements nonnormal data simulation methods proposed by Foldnes and Olsson (2016) and Grønneberg and Foldnes (2017). Both methods generate data with a prescribed covariance matrix. Our emphasis will be on the method called vine-to-anything (VITA), since it offers greater flexibility that we deem particularly useful to SEM methodologists. For instance, the flexibility of VITA renders it uniquely well-suited for being employed in simulation studies with ordinal SEM, as further discussed in Section 6.2.
VITA is a simulation method based on vine copulas. Copulas are multivariate distributions with uniform marginals, and vine copulas are a special type of copulas. Since copulas, vines and multivariate simulation theory are not well known to SEM practitioners and methodologists, the second aim of the paper is to introduce these topics during our presentation of the VITA method and the covsim package. We also include a technical appendix with an elementary though mathematically complete introduction to multivariate simulation theory with vine copulas, as this seems to be missing from the literature.
We next give an overview of statistical software for drawing data from non-normal multivariate distributions with a predefined covariance matrix. The classical and still most frequently used approach is that of Vale and Maurelli (1983), where the user specifies the univariate skewness and kurtosis. This method is currently the only option in popular commercial software such as EQS (Bentler 2006) and LISREL (Jöreskog and Sörbom 2006), and in the widely used R package lavaan (Rosseel 2012). Other approaches that also focus on controlling moments have recently been proposed. The independent generator approach proposed by Foldnes and Olsson (2016) can match pre-specified univariate skewness and kurtosis, and is more flexible than the Vale-Maurelli method. This method is available in the rIG() function in package covsim, and its use is described in a later section. Recently Qu, Liu, and Zhang (2019) used independent generator variables in a method which controls multivariate skewness and kurtosis, at the expense of control over univariate skewness and kurtosis. This method is available in package mnonr (Qu and Zhang 2020). A method that fully controls the univariate distributions (not only the lower-order moments) is the NORTA method of Cario and Nelson (1997), which is implemented in package SimCorMultRes (Touloumis 2016). The Vale-Maurelli, independent generator and NORTA approaches have the great benefit of being technically easy to analyze and implement. For instance, the technical tractability allows the asymptotic covariance matrix of the empirical covariances to be exactly calculated . The simplicity of the methods also allows for fast simulation. However, the simplicity and speed of these methods come at a cost: NORTA always has a normal copula (Cario and Nelson 1997), while Vale-Maurelli in most cases has a normal copula (Foldnes and Grønneberg 2015). This means that the true multivariate dependence structure does not depart from that of the multivariate normal distribution. In addition, only NORTA completely controls the univariate marginals. To the best of our knowledge, besides the approach taken in the present article, there is only one method that offers some control of the copula when simulating from distributions with a given covariance matrix. Mair, Satorra, and Bentler (2012) proposed a two-stage data-generation process where a very large sample is first simulated from a copula combined with marginal specification, whose distribution we denote by F pre . Then the inverse of a square root of the sample covariance matrix from this large sample is computed. To simulate data, in the second stage a sample of desired size is drawn from F pre , and multiplied by first the inverse of the square root matrix from the previous stage and then by a square root matrix of the target covariance. The two-stage approach guarantees that the rows are independent and identically distributed, and it follows from construction that the simulated vector has the correct population covariance matrix. Also, the simulated vector has a nonnormal copula, provided F pre was chosen to have a non-normal copula. However, both the margins and the copula are distorted by the post-multiplication of square root matrices. That is, although F pre is fully specified in terms of multivariate copula and univariate distributions, the simulated vector does not inherit this copula nor the margins and control is lost both in terms of copula and marginal distributions. Mair et al. (2012) illustrated their code using common multivariate copula families using Gumbel and Clayton copulas, as implemented in package copula (Hofert, Kojadinovic, Maechler, and Yan 2020). An implementation of this method is available in package simsem (Pornprasertmanit, Miller, Schoemann, and Jorgensen 2021). The flexibility of this implementation is presently limited to a rather restricted class of multivariate copulas, comprising elliptical, Archimedean, extreme-value and some other copula families available in package copula.
VITA improves upon the approach of Mair et al. (2012) by allowing complete control of the p marginal distributions, of the bivariate copulas of a chosen set of p − 1 pairs of variables, and of certain conditional bivariate copulas of the remaining (p − 1)(p − 2)/2 pairs of variables.
The increased degree of control and flexibility of our approach relative to existing methods is made possible by employing the powerful multivariate copula-based construction called a regular vine. A primary aim of the present article is to present and illustrate our approach using the newly developed covsim package.
The remainder of the article is organized in the following way. First, we explain the copula approach in a two-dimensional setting. We then demonstrate a very flexible copula-based approach to non-normal simulation in the two-dimensional case. An important advantage of this approach is that the copula class and the exact marginal distributions of the twodimensional case may be fully specified by the user. Next, we develop the full multivariate extension, which still allows for complete control of the marginal distributions, and considerable flexibility in the dependence structure. We then detail the implementation of the covsim package, and end the paper with two additional examples: First, we show how to simulate from a non-normal continuous SEM with fixed parameters, and then we show how to simulate data for an ordinal SEM. Example code is provided throughout the paper, and complete replication code is available in the online supplementary material. The appendix provides an introduction to implementing the simulations on a computer, and follows the progression of the paper.
Throughout this article we illustrate the capacity of covsim in the context of simple structural equation modeling settings. We use only a limited set of non-normal distributional conditions in each illustration, and we caution that the external validity of our findings is therefore limited. To enhance the validity a larger range of distributional and sample size conditions must be included.

The bivariate case
We start by considering the bivariate case. Our aim is to introduce the concept of a copula and how it can be used to simulate non-normal random variables with a given correlation. In subsequent sections we extend our simulation procedure to the general multivariate case. For a textbook treatment of copula theory, see Nelsen (2007). Note that this book, together with most books on copulas, assume that the reader has a strong mathematical background, including some measure theory, and that we do not assume such a background in the current presentation. Some useful introductory papers on copulas which can be read without such a background are Yan (2007); Genest and Favre (2007); Frees and Valdez (1998).
A copula is a distribution with uniform univariate margins. Copulas are used to describe the dependency structure between variables, when taking the marginal distributions out of the equation. There are many classes of copulas, and within each class there is typically a parameter that controls the strength of dependence. We start with the normal copula. Let Φ(x) denote the cumulative distribution function (CDF) of a standard normal distribution, and let Φ 2 (x, y; ρ) denote the CDF of the bivariate standard normal distribution with correlation parameter ρ, i.e., Φ 2 (x, y; ρ) = P (Z 1 ≤ x, Z 2 ≤ y) where Z 1 , Z 2 are bivariate normal and standardized, and have correlation ρ. Then the normal copula with parameter ρ ∈ (−1, 1) is given by As an example of a non-normal copula class, consider Clayton copulas, which are parameter-  ized by the dependence parameter θ ∈ (0, ∞): Clayton copulas are useful for modeling lower tail dependence, a measure of dependence between two variables in the lower left tail of the joint distribution. Figure 1 depicts random draws of size n = 1000 from each of these copulas. We set ρ = 0.8 for the normal copula and θ = 3.4 for the Clayton copula. In both Figures 1a and 1b we see that the marginal empirical distributions are close to uniform. A notable difference is the lower tail dependence in Figure 1b which does not appear in Figure 1a.
Bivariate copulas are important since they constitute one of two fundamental building blocks for bivariate distributions. The other building block consists of the two univariate marginal distributions. A fundamental theorem (Sklar 1959) guarantees that any bivariate distribution may be decoupled into a bivariate copula and the two marginal distributions, and vice versa; given two marginal distributions F 1 (x 1 ) and F 2 (x 2 ) and a copula C(u 1 , u 2 ; θ), then is a valid bivariate CDF, whose univariate margins are distributed according to F 1 (x 1 ) and F 2 (x 2 ). For instance, if F 1 and F 2 are the standard normal distribution, the bivariate distributions stemming from the normal copula with ρ = 0.8, and from the Clayton copula with θ = 3.4, will both result in bivariate distributions with standard normal marginals, and with a Pearson correlation of 0.8. That is, setting θ = 3.4 yields a Clayton copula such that when combined with standard normal marginals will yield a distribution with ρ = 0.8. Figure 2 shows random samples from these two distributions, obtained by applying the standard normal quantile function to the observations in Figure 1, which will change the marginals of the simulated data to be standard normal. Figure 2 depicts two very different bivariate distributions. Although sharing the same standard normal marginal distributions, and the same correlation coefficient 0.8, it is clear that the distribution in Figure 2b is far from the bivariate normal distribution in Figure 2a.
This illustration hints at the following process for a researcher that wants to simulate data from a bivariate distribution with pre-specified covariance and univariate marginals:  1. Specify marginal distributions F 1 and F 2 and specify a target covariance.
For a given set of marginals, and a given copula, the set of attainable covariances is usually constrained. Then, in step 3 there is no solution θ 0 . In such a case, the copula class or the marginal specifications should be adjusted.
The three steps are conducted in the covsim package in R as follows.
R> library("rvinecopulib") R> cov(rvine(10^5, calibrated.vita)) As indicated above, we may simulate from a distribution of the form C(F 1 (x 1 ), F 2 (x 2 ); θ) by first simulating (U 1 , U 2 ) from the copula C, and then apply the quantile functions of the marginals to each coordinate. That is, (F −1 1 (U 1 ), F −1 2 (U 2 )) has marginals F 1 , F 2 and copula C, meaning its full distribution equals C(F 1 (x 1 ), F 2 (x 2 ); θ). See the technical appendix for an explanation for why this is so and how to simulate from a copula.

The trivariate case: Introducing vines
In the previous section we studied bivariate copulas, and the calibration of their dependence parameter so that the coupling of given marginals will meet a target covariance. There are many classes of bivariate copulas, but few classes of higher-dimensional copulas. In this section we will circumvent the lack of parametric multivariate copula classes by using a statistical construction called a regular vine (Bedford and Cooke 2002). Vines allow us to construct multivariate copula distributions by combining two-dimensional copulas. For the purpose of covariance modeling and simulation, the procedure detailed here was originally proposed by Grønneberg and Foldnes (2017).
Let us first proceed to the case of three variables. Our goal is to construct distributions with given marginal univariate distributions for each of the three variables, and with a given 3 × 3 covariance matrix. Imagine a researcher is concerned with whether non-normal correlated errors in growth curve modeling may affect the quality of inference for the correlation ρ between the intercept and slope factors. The default in growth curve modeling is to assume that residual errors are mutually independent across measurement occasions. However, correlated errors may be meaningful as they represent carryover effects from previous occasions not accounted for by the intercept and the linear slope latent variable (Grimm and Widaman 2010;Marcoulides 2019). The issue of how residual error structures in latent growth curve modeling should be specified (e.g., as constrained, free, independent, autocorrelated, homogeneous, or non-homogeneous) is currently of great concern in the SEM literature, as it is now becoming much more recognized that considerable bias in the latent variable variancecovariance matrix can arise from the improper specification of these errors (Dimitrov 2002; The researcher wants to simulate data with correlated residual errors and sets up a simple linear growth model, see Figure 3, where the population values are indicated: There is zero correlation ρ between the slope and the intercept, all latent variables have unit variance, and the errors δ = (δ 1 , δ 2 , δ 3 ) ⊤ are correlated with covariance matrix The researcher is concerned with non-normality in the error vector δ, and therefore wants to construct a trivariate non-normal distribution whose covariance is Σ δ .
We remark that the illustrations in the present article do not discuss the reasons behind specific choices of marginal distributions and dependence structure. Such choices depend on the purpose of the simulation study. Our aim in this and following illustrative analyses is simply to demonstrate how vine constructions work. However, we note that routines exist to select best-fitting vine structures and bicopula families relative to an existing real-world dataset (e.g., function vinecop() in package rvinecopulib). This could be done to increase external validity of simulation studies. For an example of how to construct a VITA distribution based on a well-known empirical dataset, see Grønneberg and Foldnes (2017, Section 3.2). General papers on the selection and usage of copulas are Yan (2007); Genest and Favre (2007); Embrechts, Lindskog, and Mcneil (2003); Grønneberg and Hjort (2014), an influential paper on vine-based modeling is Aas, Czado, Frigessi, and Bakken (2009), and a book with practical issues on vine modeling is Kurowicka and Joe (2011).
First, the researcher considers the three univariate error distributions, and decides that the first error should be standard normally distributed, the second error should be a scaled chisquared distribution with one degree of freedom (DF), and the third error should follow a scaled Student's t distribution with five DFs. The scalings are necessary to obtain unit variance in the two latter distributions. Clearly, it might be questioned whether the case of different error distributions for the same variable at different measurement occasions is realistic, but since our main purpose here is to illustrate the flexibility of VITA, we proceed with three different error distributions.
Even though the marginal distributions in δ have now been specified, there are still many trivariate distributions with these marginals and with covariance Σ δ . The specification will be complete once the copula of δ is selected, but this must be done cautiously and in a way that ensures δ has covariance Σ δ . Let us denote the copula as V . The joint distribution of δ will result when we couple together three marginals using V . That is, the CDF F δ of δ is given by where G 1 and G 2 are the CDFs of the scaled chi-square and t distributions, respectively. As in the bivariate case, simulation from the above distribution involves first simulating (U 1 , U 2 , U 3 ) from V , and then applying the quantile functions of the marginals to each coordinate of this vector. That is, the final simulated vector will be δ = To construct the vine V the researcher decides to couple the uniform marginals U 1 and U 2 with a Clayton copula, and U 2 and U 3 with a Joe copula. The dependence parameters of each of these bivariate copulas is numerically determined as described in the previous section, so that 3, and to achieve this we next introduce the concept of a vine.
Vines are convenient graphical tree structure models that can be used to build up highdimensional distributions from conditional two-dimensional copulas. Vines therefore decompose the multivariate copula into a hierarchy of bivariate copulas. A vine on p variables can be represented as a set of connected trees V = {T 1 , . . . , T p−1 }, where the edges of tree j are the nodes of tree j + 1, j = 1, . . . , p − 2 and are used to facilitate the picking out of various distributional characteristics, see Figure 4 for our current illustration with p = 3. The first tree has the variables as its nodes, and an edge between two variables means that these two variables are unconditionally coupled as in the previous section. In our case, we chose at the beginning to couple U 1 with U 2 and to couple U 2 with U 3 . This corresponds to the tree at the bottom of Figure 4. The second tree has the edges of the first tree as its nodes. In our case the first tree has only two edges: U 1 , U 2 and U 2 , U 3 . The second tree must therefore join U 1 , U 2 and U 2 , U 3 . This tree has one single edge, which is denoted by U 1 , U 3 |U 2 . That is, the second tree specifies the copula between U 1 and U 3 , conditional on U 2 . Note that we could have chosen a different tree at the first level, with edges, say, U 1 , U 2 and U 1 , U 3 , which would yield a different distribution. Also the bivariate copulas chosen for coupling pairs of variables could have been chosen differently, yielding other types of vine distributions. However, we do not explore the flexibility of vines in the present paper. We see in Figure 4 that there are a total of three edges in the vine, and that the edges correspond to the pairwise correlations among the three variables. This holds also in higher-dimensional vines: There is an exact correspondence between the edges in the set of trees, and pairs of variables. So each off-diagonal element in the covariance matrix corresponds to a unique edge in the vine. The researcher's goal now is to define the distribution of U 1 and U 3 . As suggested in Figure 4, this is done by specifying the distribution of U 1 and U 3 , conditional on U 2 . The researcher chooses a Frank copula for this distribution.
Returning to the illustrative example, we sum up the researcher's specifications for the residual error vector δ: • δ 1 follows a standard normal distribution, δ 2 follows a scaled chi-square distribution with one DF, and δ 3 follows a scaled t distribution with five DFs.
• The vine structure is given in Figure 4.
A visualization of n = 1000 randomly drawn error vectors is presented in Figure 5. Note that, expectedly, the first marginal distribution is approximately standard normal, while the second and third marginal distributions are in accordance with scaled chi-square and Student's t distributions. Now, having constructed a VITA distribution for the residual errors, the researcher may use simulation to assess whether the quality of NTML inference for ρ, the correlation between the intercept and slope factors, deteriorates under non-normal residual errors. As a benchmark, the researcher simulates from a fully normal distribution on the observed variables x 1 , x 2 , and x 3 . For the non-normal case, the researcher first simulates VITA residual errors, and then combines these with simulated intercept and slope values, each drawn from standard normal distributions, to obtain simulated observations on x 1 , x 2 and x 3 . The researcher replicates 1000 samples of size n = 1000 from both the fully normal distribution and the distribution with residual errors stemming from VITA. 1 The growth model was estimated with seven free parameters: three correlated residual errors; the residual error variances, which were constrained to be the same at each of the three measurement occasions; the correlation ρ; and the means of the intercept and slope variables. The model has one DF. As a measure of inference quality for ρ the researcher decides to calculate the confidence interval coverage rate, at the 95% confidence level, for ρ = 0, using classical standard errors that assume exact normality. Under full normality, the coverage rate of 0.94 was close to nominal. With a nonnormal error vector, the coverage rate was 0.905. Hence, the researcher found some support for the claim that non-normality in the error vector may affect the quality of intercept-slope correlation NTML inference.

The independent generator approach
The covsim package exports, in addition to vita(), the function rIG(). This simulation function is not based on a copula perspective and does not allow for full specification of the univariate marginal distributions. Instead it is closer in approach to the method of Vale and Maurelli (Vale and Maurelli 1983), where only univariate skewness and kurtosis are prespecified. However, the independent generator (IG) algorithm (Foldnes and Olsson 2016) is more flexible than the Vale and Maurelli method, defining a larger class of non-normal distributions for each set of skewness and kurtosis values. Although the main focus of the present manuscript is the flexible use of bivariate copulas in simulating non-normal data with given marginals and covariance matrix, we here for completeness give a short introduction to the IG algorithm.
The IG transform represents the non-normal vector ξ stochastically as where A is a square matrix and X a vector consisting of mutually independent generator variables with unit variance. The user specifies desired skewness and kurtosis values in ξ, and the IG algorithm numerically determines the skewness and kurtosis in each generator variable : Scatterplots and histograms for a n = 1000 sample drawn from a three-dimensional IG distribution.
to match the desired values. The matrix A is a square root of the specified covariance matrix Σ. In rIG() the user may specify a triangular square or a symmetrical square root matrix, which gives two different distributions. Also the marginal distributions for X may be freely chosen, further expanding the distributional class defined by IG. In its current implementation, rIG() uses the Pearson family of distributions (Pearson 1895). Let us reconsider the marginal distributions used above. We note that the chi-square distribution with one DF and the Student's t distribution with five DFs have skewness √ 8 and 0, respectively, and kurtosis 12 and 6, respectively. In the following code we ask for an IG distribution that matches the first four moments of the three marginal distributions considered in the previous section. The scatterplot is given in Figure 6. R> set.seed(1) R> ig.sample <-rIG(N = 10^3, sigma.target = sigma.target, reps = 1, + skewness = c(0, sqrt(8), 0), excesskurtosis = c(0, 12, 6))

A six-dimensional growth curve illustration
In this section we use the flexibility of VITA to further study the effect of non-normality in growth curve residual error vectors on normal-theory based inference. We focus on the chi-square statistic of model fit. We consider a linear growth curve with scores across six time-points. We assume that the errors δ i , i = 1, . . . , 6, have unit variance, and that they are autocorrelated according to the following banded structure (i.e., a Toeplitz structure): We calibrate the following three VITA distributions for the δ vector: VITA1 δ 1 , δ 2 , δ 3 , δ 4 , δ 5 , and δ 6 are standard normal, and all 18 bivariate copulas are of Clayton type.
Using the vita() function in package covsim the code is as follows: R> residual.covariance <-toeplitz(1:6) R> residual.covariance[residual.covariance > 3] <-0 R> residual.covariance[residual.covariance == 2] <-0.5 R> residual.covariance[residual.covariance == 3] <-0.2 R> margins.nonnorm <-list(list(distr = "norm"), + list(distr = "chisq", df = 5), list(distr = "chisq", df = 4), + list(distr = "chisq", df = 3), list(distr = "chisq", df = 2), + list(distr = "chisq", df = 1)) R> margins.norm <-list(list(distr = "norm"), list(distr = "norm"), + list(distr = "norm"), list(distr = "norm"), + list(distr = "norm"), list(distr = "norm")) R> margin.variances <-c(1, 10, 8, 6, 4, 2) R> sigma.target <-diag(sqrt(margin.variances)) %*% residual.covariance %*% + diag(sqrt(margin.variances)) R> set.seed(1) R> vita1 <-vita(margins.norm, residual.covariance, family_set = "clayton") Data generation first simulates independent random draws from the standard normal for the intercept and slope variables, and then adds the residual errors simulated from VITA distributions. A growth model with 15 degrees of freedom, which correctly specifies the structure for Σ δ , is fitted to the data. Our research question is to what extent non-normality in the errors affects the sampling distribution, and in particular, the type I error control of the regular normal-theory chi-square statistic T NTML . Since VITA1, VITA2, and VITA3 are different distributions, with different mixes of marginal and copula non-normality, there might also be insights to draw from their differential effect on the chi-square test. One way of conducting this research is to use conventional small-sample simulations and to calculate rejection rates over many replications. Here we choose a different approach. We calculate the exact asymptotic distribution of T NTML in each distributional condition. This will also give us asymptotic type I error rates. 2 First, we simulate a very large n = 10 6 sample from each of the VITA distributions. Then the model is fitted to each of the three datasets, and we extract the eigenvalues of the matrix U Γ (see, e.g., Foldnes and Grønneberg 2018 for further details). Theory (Box 1954) dictates that T NTML is asymptotically distributed as the weighted sum of independent chi-square distributions, each with one degree of freedom, where the weights are the eigenvalues of U Γ. This allows us to calculate the density of T NTML under the three distributional conditions. Under multivariate normality, this density is that of the nominal chi-square distribution with 15 degrees of freedom, which is used to calculate asymptotic type I error control of T NTML . Figure 7 depicts the asymptotic sampling distribution of T NTML under four conditions, namely multivariate normality, and the distributions involving the three VITA error distributions. It is seen that T NTML becomes inflated as we move from multivariate normality, and as we progress through the three VITA error distributions. The vertical line represents the critical value when referring T NTML to its critical value at the α = 0.05 level of significance. VITA1 has standard normal marginals and a non-normal copula. In this condition the asymptotic type I error control is 7.3%, quite close to the nominal level. VITA2 has four non-normal marginals, and a normal copula, and affects T NTML to a larger extent than VITA1. The asymptotic rejection rate under VITA2 is 29.4%, which is far above the nominal 5% level. VITA3 introduces more non-normality compared to VITA2, by having a non-normal copula. The effect on T NTML is critical, whose asymptotic rejection rate is 50.6% under VITA3 errors. In sum we see that non-normality in the residual error vector may markedly inflate the rejection rates of T NTML , but we may speculate that the effect is mild as long as the univariate marginals are normally distributed.

The implementation of VITA in covsim
In this section we briefly explain how the function vita() implements the VITA algorithm. Grønneberg and Foldnes (2017) provided as supplementary material a VITA implementation using package VineCopula (Schepsmeier, Stoeber, Brechmann, Graeler, Nagler, and Erhardt 2021) for constructing and simulating from regular vines. This package is no longer in active development, and package rvinecopulib (Nagler and Vatter 2021) was instead used in vita().
The most important benefits of rvinecopulib relative to VineCopula for our purposes is a sleeker and more modern application programming interface (API) and shorter simulation runtimes. In experiments (see supplementary material) with a five-dimensional vine, on a computer with 4 CPU cores, simulation runtimes at a sample size of n = 1000 were shorter with rvinecopulib compared to VineCopula by a factor of four. Also, as explained below, the initial calibration of VITA parameters involves a series of large-sample random draws from regular vines, which means that VITA calibration is computationally demanding. The rootfinding routine provided by Grønneberg and Foldnes (2017) has been improved in vita(), by splitting it into a high-speed routine which identifies an interval for the root, followed by high-precision root-finding in this interval, based on curve-fitting. This two-stage root-finding routine is faster than the basic method in Grønneberg and Foldnes (2017). Combined with faster simulation times in package rvinecopulib, the calibration time for a five-dimensional vine using vita() instead of the original code provided by Grønneberg and Foldnes (2017) was reduced by a factor of 13.
The main arguments to vita() are • margins. A list that specifies the univariate marginal distributions.
• vc. A vine copula structure in the format defined by package rvinecopulib. That is, a specification of a hierarchy of p − 1 trees, and, for each tree node, a bivariate copula family. If not provided by the user, vita() will initialize vc as follows. The vine structure of vc is specified as the simplest regular vine, namely the D-vine on p dimensions. See Figures 4 and 10 for the D-vine with p = 3 and p = 4, respectively. In addition, the bivariate copula family in each node in the D-vine will be taken as the first element of the argument family_set.
• family_set. A vector that specifies which bivariate copula families are to be calibrated. If vc is provided by the user, and the algorithm can not identify a feasible solution for the family dictated by vc, the algorithm instead tries to calibrate the dependence parameter for the first family in family_set. If not successful, an attempt is made to calibrate the parameter in the second family, and so forth. If vc is not provided, the algorithm attempts first to calibrate the dependence parameter in the first member of family_set, and if not successful, the second member, and so forth.
The above arguments specify a class of VITA distributions, parameterized by the p(p − 1)/2 dependence parameters in the bivariate copulas. The task of vita() is to numerically determine the values of these dependence parameters so that the resulting VITA distribution has the required covariance matrix given by sigma.target. That is, vita() searches for a dependence parameter value θ in each of the copulas, so that the covariance matrix of the resulting full distribution is sigma.target, up to numerical precision. This search can be done in the same order as when simulating from a vine, building our way up the tree, connecting more and more distributions with pairwise conditional distributions. As shown in Grønneberg and Foldnes (2017), the correlation of each pair of variables is typically a strictly increasing function of θ for most single parameter copulas, making the numerical search well behaved. Unfortunately, there is no simple formula for the correlation matrix of a vine. Worse still, no simple formula can be derived for pairwise bivariate distributions connected at higher levels of the vine tree. In the implementation of VITA in vita(), we resort to Monte Carlo simulation to approximate the required correlation.
VITA calibrates each pairwise bivariate distribution and combines them to form the full vine distribution in a specific order. A formal algorithmic description of the methodology is given in Grønneberg and Foldnes (2017). We here informally summarize the main steps of the method, and later describe in technical detail the new implementation of the root finding procedure that underlies the calibration.
As explained in more detail in the appendix, each pair (i, j), where 1 ≤ i, j ≤ p, is connected once in the vine. This is also the case in the covariance matrix. Let (σ ij ) be the target covariance matrix in sigma.target, and let the parameter of the bivariate copula connecting the (i, j) distribution be parameterized by θ ij . The first pairs of bivariate distributions that are calibrated are those connected at the lowest level of the vine tree. For illustration, we consider the vine given in Figure 4 (p. 10). We see that (1, 2) are connected at the lowest tree. We may therefore simulate directly from this bivariate distribution. This distribution depends on a parameter θ 12 . We may choose this parameter in such a way that the covariance of the resulting bivariate distribution matches the required covariance given in σ 12 . This matching is non-trivial and is described in technical detail below. A similar matching may be done for all other marginals connected at the lowest level of the vine, which here is only (2, 3). These calibrations are done independently of each other. Now the vine in Figure 4 has only two levels, and there is only one bivariate margin left to be matched, namely (1, 3), which is connected at the topmost level. The distribution of (1, 3) is derivable from the full vine structure, and the conditional copula of (1, 3) given 2 is parameterized using a bivariate copula with parameter θ 13 . To simulate realizations from (1, 3) we need to know the distributions at the lower level of the tree, as well as specifying the value of θ 13 . The distributions of the lowest level of the tree have already been fixed, and we may, for varying values of θ 13 , simulate the full three-dimensional vine distribution, compute the covariance of (1, 3), and select the θ 13 value that yields a covariance equal to σ 13 . We have then calibrated the full three-dimensional vine.
For higher dimensions, this idea has to be iterated several times to identify all parameters in an order that enable us to always simulate from the required bivariate variables in the vine. In order to briefly illustrate how to calibrate a higher-dimensional vine, consider the four-dimensional vine in Figure 10 in the appendix (p. 41). Comparing the vine in Figures 4  and 10, we see that the three-dimensional vine in Figure 4 is included within the structure of the vine of Figure 10 as a subset of its connections. That is, the vine in Figure 4 is a sub-vine of the vine of Figure 10 that comprises the variables (1, 2, 3): The vines are equal, with the exception that the four-dimensional vine also has to connect marginal 4 with the remaining variables, which is done using the additional structure given in Figure 10. But to simulate only (1, 2, 3) from the four-dimensional vine in Figure 10, we only need to know the three-dimensional vine in Figure 4.
To calibrate the four-dimensional vine in Figure 10, we may therefore continue where we left off when calibrating the three-dimensional vine in Figure 4. After calibrating the new bivariate distribution connected at the lowest level, namely (3, 4), the next step is to calibrate all distributions at the second level. Only one such distribution is left, namely (2, 4). By the same reasoning as earlier, we may simulate from the sub-vine that enables the simulation of (2, 3, 4): The parameters of the (2, 3) and (3, 4) distributions have already been fixed. Therefore, the sub-vine expressing the full distribution of (2, 3, 4) only have one free parameter, namely θ 24 , which may be varied to get a covariance between (2, 4) to match up with σ 24 . As in the first level of the tree, the matching of the parameters at the second level of the tree are done independently of each other, and can be done in any order. However, to calibrate all connections at a given level, all connections at lower levels have already to be calibrated from before.
The final matching required for the four-dimensional vine is then to work with the single distribution connected at the third level of the vine, namely the (1, 4) distribution, so that the (1, 4) marginal has covariance equal to σ 14 . All parameters of the vine are now fixed with the exception of θ 14 , and this parameter may be varied until the required covariance is induced.
The calibration order of variable pairs in vita() is as follows: All copulas at the lowest level are calibrated, then the next level, and so on up to the highest level. As mentioned in Grønneberg and Foldnes (2017), other orders are possible, as exemplified above, but the order is immaterial as long as unique solutions for reaching the desired covariances exist.
In each calibration step, numerical integration done via Monte Carlo simulation and a search for the solution of an equation must be performed. We now detail how this is done in the implementation of vita(). Let (U i , U j ) be distributed according to the copula of the subvine required to simulate the (i, j) distribution as described above. Due to the order we have traversed the vine, there is always only one free parameter θ ij of this distribution that is free, and is used to match up the required covariance σ ij . In the following description we omit the ij subscript from θ, to reduce the notational burden. As explained in more detail in the appendix, we apply the corresponding inverse quantile functions according to the entries in margins to calculate the covariance induced by a given θ: Let us denote by σ ij (θ) the covariance between the resulting variables F −1 i (U i ) and F −1 j (U j ). Our aim is now to determine θ so that σ ij (θ) = σ ij . Unfortunately there are no analytical expressions available for σ ij (θ) except in very special cases, but the covariance may be approximated by simulating a large sample of size n of (U i , U j ), applying the quantile functions F −1 i and F −1 j to each simulated variable, and calculating the resulting sample covariance, which we denote byσ ij (θ). Thenσ ij (θ) will converge in probability to σ ij (θ) as n increases. However, with large n, simulating these samples is time-consuming, so vita() is implemented in two stages.
1. Initial high-speed calibration. In this stage we use the modest sample size n = 1500 to determineσ ij (θ), using function uniroot() in the stats package. That is, we approximate θ by finding the rootθ n of the discrepancy functionσ ij (θ) − σ ij . We expectθ n to be quite close to θ, but it contains random error, so we repeat this procedure and approximate θ a small number of times (the argument numrootpoints). This results in a number of root candidatesθ 1 n ,θ 2 n , . . . ,θ numrootpoints n , which are independent and identically distributed random variables. A standard t-based confidence interval for the dependence parameter θ is then constructed from these approximate roots, using a high level of confidence (as specified by the argument conflevel).

Final high-precision calibration.
In this stage, we evaluate the discrepancyσ ij (θ) − σ ij to a high precision at a small number (as specified by argument numpoints) of equally spaced points across the confidence interval determined in the first stage. The approximation is done by simulating from a very large sample (n is equal to the argument Nmax) in each of these points. We then fit a second degree polynomial to the discrepancy values and use uniroot() to locate the root of this polynomial, which yields our final estimate for θ.
If, for any pair of variables, the calibration does not find a solution θ, the algorithm changes the bivariate family to the next entry in family_set. If no solution is found, vita() terminates with an error message. This means that there is no VITA distribution with the given marginals, vine structure and bivariate families that can attain sigma.target. To proceed, the user could then rerun vita() with, e.g., a different vine structure.
As mentioned in the introduction, traditional approaches to non-normal covariance modeling only specify the lower-order univariate moments, and do not offer any control of the multivariate aspects of the simulated vector, with the exception of covariance matching. As we have demonstrated, the VITA approach is more flexible. However, the cost of increased flexibility is increased computing time necessary to calibrate the VITA distribution. The default values for arguments Nmax and numpoints in vita() guarantee a highly precise VITA calibration. That is, the calibrated VITA distribution will have a covariance matrix almost numerically indistinguishable from sigma.target. In higher dimensions, this precision comes at the cost of long calibration running times. Table 1 gives calibration times on a computer (2.3 GHz 8-Core Intel Core i9) using the default options in vita(), with target correlation among all variable pairs equal to ρ = 0.3, for increasing dimensionality. It is seen that approaching 20  Table 1: Calibration times in hours under the default Nmax = 10 6 . Simulation of 1000 samples, each of size n = 10 3 . dimensions, calibration time exceeds one hour, while simulating 1000 samples, each of size n = 1000, requires less than a minute. So the calibration step, which is only executed once, is time-consuming, while repeated simulation from the calibrated VITA is relatively fast. Foldnes and Grønneberg (2022b) calibrated and simulated from VITA distributions in twenty dimensions in an extensive simulation design. However, given that the median number of observed variables in empirical SEM studies is close to 20 (Li 2016), using vita() for larger models, with say 50 dimensions, will entail days of calibration time with the default options. In such cases the user may lower the argument Nmax from 10 6 to 10 5 , thereby reducing calibration time by a factor of 10. For instance, for dimension 40 calibration was achieved in 6.2 hours using option Nmax = 10 5 . Even with reduced precision, the calibrated VITA distribution has a covariance matrix almost equal to the target covariance. Among the 780 pair-wise correlations estimated in a n = 10 6 sample drawn from the 40-dimensional calibrated VITA distribution with Nmax = 10 5 , 737 were within a 0.005 distance of the target ρ = 0.3, and all (except for one) were within a 0.01 distance of ρ = 0.3.
If high precision is important in an application, a formal test of equality of covariance matrices should be performed. This may be done by computing as test statistic the quadratic form of the discrepancies between the sample covariances and the target covariances, weighted by the inverse of the estimated asymptotic covariance matrix of the covariances (Mair et al. 2012).
To precisely (Nmax = 10 6 ) calibrate VITA distributions with 50 or more dimensions, our current implementation will demand unrealistically long running times. The bottleneck of the calibration algorithm consists of simulating a large sample (Nmax) from a regular vine. This simulated sample is then used to compute a single covariance. If we distribute the largesample simulation to several computers, the desired covariance of all the simulated realizations across computers can be computed based on sums, cross-products and sums of squares from each computer. Hence, the VITA algorithm may conveniently be distributed across a network of computers. Such functionality may be included in future versions of covsim.

Further examples
We here consider some further applications of VITA. In Section 6.1, we consider a 20dimensional SEM example with continuous data. In Section 6.2 we discuss simulation of ordinal SEMs and show how this can be done with VITA.

Using VITA to simulate continuous data for SEM
In most SEM simulation studies the methodologist first specifies a SEM model together with its population parameter values. Then the study is conducted by drawing random samples from a distribution whose covariance matrix equals the model-implied covariance matrix. As an example, consider the SEM whose structural part is depicted in Figure 8. The model has five factors, and is representative of a medium-sized SEM in applied studies (Li 2016). Each factor has four indicators, yielding a total of 20 dimensions. The factor loadings for the indicators were set to 0.8, 0.7, 0.6 and 0.5 within each factor, and the corresponding residual variances were set to 0.5. The correlation was set to 0.3 between the two exogenous factors, each of which had unit variance. The residual variances for the endogenous factors were also set to 0.5. Using the package lavaan we can compute the target covariance matrix implied by these population parameters as follows.
R> marginsnorm <-lapply(X = sqrt(diag(sigma.target)), + function(X) list(distr = "norm", sd = sqrt(X))) R> vitadist <-vita(marginsnorm, sigma.target) R> randomsamples <-replicate(10^3, rvine(10^3, vitadist)) As discussed previously, the calibration step is time-consuming in higher dimensions. Here, with 20 variables, the calibration step required 1.8 hours (again using a 2.3 GHz 8-Core Intel Core i9 CPU). This step is only performed once. When completed, random samples can be drawn at a relatively fast rate. Producing 1000 samples each of size 1000 took one minute to complete. Finally, we note that the calibration step may be performed faster by specifying option Nmax = 10 5 when calling vita(), at the expense of reduced precision in covariance matching.

Using VITA to simulate ordinal-categorical data for SEM
A major approach for SEM with ordinal data is to impose a threshold model to the data, which postulates that the categorical data arise from discretization of an underlying continuous vector which is multivariate normally distributed. Many influential simulation studies (e.g., Quiroga 1994;Rhemtulla, Brosseau-Liard, and Savalei 2012;Flora and Curran 2004;Li 2016) have investigated the robustness of ordinal SEM against violation of non-normality, using the Vale-Maurelli approach. However, Grønneberg and Foldnes (2019a) showed that the Vale Maurelli approach is not suitable for ordinal data simulation in the context of covariance modeling. We here briefly show how to simulate ordinal bivariate data by discretizing bivariate VITA distributions. For an observed ordinal variable, there is no way to identify which underlying univariate distribution produced the data, since the thresholds may be transformed to accommodate all continuous univariate distributions.
As argued in Grønneberg (2019a, 2022b), it is advantageous to keep the marginals fixed during simulation. Since VITA offers exact control of marginals, it is uniquely suited for simulation studies with ordinal data for SEM. We will here set the marginals to standard normal. When simulating fully normal data, both the marginals and the copula are normal. We will let the copula be non-normal, but the marginals will be normal.
For illustration, we assume that the underlying correlation in a continuous bivariate distribution with standard normal marginals is ρ = 0.5, and we discretize into three categories using  Table 2: Population contingency tables after discretizing the distribution with standard normal marginals and a Clayton copula (left) and the distribution with standard normal marginals and a Joe copula (right).
thresholds τ 1 = 0 and τ 2 = 1. This means that we consider simulated data of the form for i = 1, 2, where (ξ 1 , ξ 2 ) is a continuous random vector simulated using VITA. Both ordinal variables have proportions 0.5, 0.34, and 0.16. We inquire whether the polychoric correlation estimator used in ordinal SEM becomes biased when we replace the bivariate normal with a Clayton or a Joe copula. So first, we determine parameters for the latter two copulas such that, when marginals are standard normal, the Pearson correlation is 0.5.
After discretizing the distribution with standard normal marginals and a Clayton copula shown in Figure 9a and the distribution with standard normal marginals and a Joe copula in Figure 9b, the resulting population contingency tables are given in Table 2. The two contingency tables in Table 2 indicate that under the Clayton and Joe copula the probabilities that both ordinal variables take their maximum value are 4.4% and 9.1%, respectively. Such discrepancies in the bivariate ordinal distribution affects the normal-theory based polychoric   Figure 9 is 0.5, this shows that the polychoric correlation may become strongly (downwards or upwards) biased when underlying normality is violated. In this illustration, the lower tail dependency in the Clayton copula, combined with the chosen thresholds, results in strong downward bias. And the upper tail dependency in the Joe copula leads to strong upward bias.
The sensitivity of the polychoric correlation to underlying non-normality poses a threat to the popular practice of conducting SEM with ordinal data based on the polychoric correlation matrix. Even though a proposed SEM model fits well to the underlying data, it might fit poorly to the polychoric correlation matrix, since the latter might be biased due to nonnormality in the underlying data. The result might be biased estimates and inflated type I error rates, and it is therefore important to assess whether the ordinal dataset is compatible with the underlying normality assumption. Foldnes and Grønneberg (2019b) proposed a bootstrap test for this purpose, which is implemented in the R package discnorm (Foldnes and Grønneberg 2021). It is used as follows.

R> library("discnorm") R> bootTest(clayton.disc)
We conducted a small simulation study on the type I error control and power of the bootstrap test in the context of the present bivariate illustration. We generated 500 samples of size n = 100 and of size n = 500 and collected the rejection rate of the bootstrap test for ordinal data stemming from discretization of a normal distribution, the Clayton VITA, and the Joe VITA, using the same set of thresholds as depicted in Figure 9. The rejection rates are given in Table 3. The bootstrap test maintains type I error rates well, but has low power to detect the underlying normality in both the Clayton and Joe VITA distributions at the smallest sample size. Expectedly, at the larger sample size n = 500 the power to detect the underlying non-normality is higher. The non-normality in the Joe VITA is more detectable than the non-normality in the Clayton VITA at both sample sizes.

Conclusion
The VITA approach, implemented in the R package covsim, is very flexible, since it accommodates user-specified marginal distributions and also offers a great deal of control over the bivariate dependencies in the simulated vector. This control is in contrast to more established methodology, like the Vale and Maurelli (1983) method. In most cases, Vale-Maurelli has an exactly normal copula (Foldnes and Grønneberg 2015), and does not allow the specification of the resulting distribution except the covariance matrix, skewness and kurtosis. The increased flexibility of VITA, however, comes at a cost. VITA, being based on the statistical concept of a regular vine, is a more complex construction than the Vale Maurelli transform. Also, VITA calibration is more computationally demanding than is the case for the Vale-Maurelli transform. In the appendix we have given an introduction to the statistical machinery underlying vines and VITA, such as bivariate copulas and their use in constructing regular vines. We also give an introduction to how VITA simulation is performed on a computer. Numerical illustrations of applying VITA to simulate non-normal residual error structures in growth curve modeling were presented, demonstrating the effects of different kinds of non-normality on inference. Also, we have illustrated how VITA may be used to simulate continuous non-normal data from a SEM, and to simulate ordinal data in a way that properly violates the underlying normality assumption. For ordinal data, we illustrated a new bootstrap test, implemented in the R package discnorm, for the central assumption of underlying normality.

A.1. How uni-and bivariate simulations are performed on a computer
Throughout the paper, we assume that the marginals of the distribution we simulate from are continuous. In SEM, this is mostly without loss of generality, as ordinal variables are usually modeled as discretizations of a continuous random vector, see Section 6.2. Note that this also applies to a large class of IRT models (see, e.g., Foldnes and Grønneberg 2019a; Takane and de Leeuw 1987).

Univariate simulation
We first review a standard method to simulate from a continuous univariate distribution with cumulative distribution function F 1 . This is covered in most standard statistics text books (see, e.g., Rice 2006, Proposition D, Section 2.3).
We assume further that F 1 is strictly increasing, and therefore has an inverse F −1 1 . Since F 1 is a continuous distribution function, it will be continuous and increasing, but not necessarily strictly increasing (i.e., there may be flat regions), necessitating the use of more complex arguments, such as those in Chapter 1 of Shorack and Wellner (2009). We will throughout the paper pedagogically assume such strict monotonicity, and thereby avoiding such complex arguments.
Recall that the inverse function F −1 1 is defined as the solution to the equation F 1 (x) = u with respect to x, and where 0 < u < 1. Clearly, this solution depends on u, and we therefore denote the solution as a function of u, that is, F −1 1 (u). Since F −1 1 (u) = x where F 1 (x) = u we get that F (F −1 (u)) = F (x) = u. We may compute F −1 1 (u) by solving for u. Since F 1 is increasing and continuous, with F 1 (−∞) = 0 and F 1 (∞) = 1, (2) has a single solution, which can be found either analytically, or using any standard root finding procedure.
Now let U be a univariate random variable with the uniform distribution on [0, 1], denoted by U ∼ U [0, 1]. This means that P (U ≤ x) = xI{0 ≤ x ≤ 1} + I{x > 1} where I{A} is the indicator function of A which is 1 if A is true, and zero otherwise. We may then let The distribution of X is F 1 . To see this, we start with P (X ≤ x) = P (F −1 1 (U ) ≤ x). Applying F 1 on both sides of the inequality is allowed since F 1 is increasing. Since F (F −1 1 (U )) = U , this gives Since F 1 is a cumulative distribution function, we have 0 ≤ F 1 (x) ≤ 1, and therefore the first indicator function is always one, while the second is always zero. Therefore, we conclude that P (X ≤ x) = F 1 (x) as required.

Imposing required marginals on copula-distributions
Let us now consider the more complex bivariate case, which is not as well-known as the univariate case. Firstly, let us assume that we are able to simulate from a copula C. That is, suppose (U 1 , U 2 ) ⊤ ∼ C, meaning Recall that a copula has uniform marginals. This means that U 1 ∼ U [0, 1] and U 2 ∼ U [0, 1]. Therefore, following the argument above, we may let X 1 = F −1 1 (U 1 ) and X 2 = F −1 2 (U 2 ), and see that the marginal distribution of X 1 is F 1 and the marginal distribution of X 2 is F 2 . This means P (X i ≤ x i ) = F i (x i ) for i = 1, 2, and does not say anything about the dependence between X 1 and X 2 . We now show that (X 1 , X 2 ) has the distribution as in (1), i.e., P (X 1 ≤ x 1 , X 2 ≤ X 2 ) = C(F 1 (x 1 ), F 2 (x 2 )). Using the definition of X 1 , X 2 and applying the functions F 1 and F 2 respectively on the two inequalities, we have where the last equality uses (3).

Bivariate copula simulation
Let us now review how to simulate (U 1 , U 2 ) from C. We follow a general method described in Nelsen (2007, Section 2.9), which uses the so-called multivariate quantile transform. In the univariate case, the central step in simulating from F 1 was to compute the function F −1 1 . In the bivariate case, the central step is to compute a function h −1 u 1 (u 2 ), which will later be shown to be the inverse of the distribution function of U 2 conditional on U 1 . After having written code which can evaluate this function, the details of which will be covered shortly, we may simulate from C as follows: Let U 1 ∼ U [0, 1] and V ∼ U [0, 1] be independent. Then set U 2 = h −1 U 1 (V ). The pair (U 1 , U 2 ) is then distributed according to the copula C (Nelsen 2007, Section 2.9).
We now define h −1 u 1 (u 2 ). For each value 0 < u 1 < 1, let Then, for each 0 < u 1 < 1, the function h −1 u 1 is the (generalized) inverse of h u 1 (u 2 ) as a function of u 2 . Recall that for a function H(x), its generalized inverse is defined as H −1 (y) = inf{x : H(x) > y}, a definition which agrees with the standard inverse when H is invertible. As in the univariate case, where it was simpler to work with the case when F 1 was continuous and strictly increasing, we will again assume that h u 1 is continuous and strictly increasing for all u 1 . We now show that this follows if C has a density c which is non-zero on (0, 1) 2 and continuous in each coordinate. Both assumptions on c are fulfilled for all popular copula classes, and will therefore be assumed also in the following. To see that these two assumptions imply that h u 1 is continuous and strictly increasing for any u 1 , notice that since we have h u 1 (u 2 ) = u 2 0 c(u 1 , x 2 ) dx 2 by the fundamental theorem of calculus. Since c(u 1 , u 2 ) > 0, and since the function x 2 → c(u 1 , x 2 ) is continuous for a given u 1 , the integral is strictly increasing and continuous in u 2 and hence h u 1 (u 2 ) is strictly increasing and continuous in u 2 . Further, we have h u 1 (0) = 0 and h u 1 (1) = 1, and h u 1 : [0, 1] → [0, 1] is therefore a bijection for each u 1 . That h u 1 (0) = 0 and h u 1 (1) = 1 can be seen from noticing that h u 1 (0) = ∂ ∂u 1 C(u 1 , 0) = ∂ ∂u 1 0 = 0 since C(u 1 , 0) = P (U 1 ≤ u 1 , U 2 ≤ 0) = 0, and that since C(u 1 , 1) = P (U 1 ≤ u 1 , U 2 ≤ 1) = P (U 1 ≤ u 1 ) = u 1 we also have h u 1 (1) = ∂ ∂u 1 C(u 1 , 1) = ∂ ∂u 1 u 1 = 1. Finally, we show that (U 1 , U 2 ) ⊤ ∼ C. We have not found an elementary presentation of this result in the literature (see Rüschendorf 2009, Section 3, for an authoritative account), and include it for completeness and since the following argument is short, and will be generalized progressively in the following. For compactness, we assume h u 1 (u 2 ) is invertible with respect to u 2 for all 0 < u 1 < 1. We have Explanations: (a) Apply h U 1 to both sides of the second inequality.
where the expectation is with respect to V only, and U 1 is fixed. Recalling that for a random variable X with CDF F X we have Fundamental theorem of calculus. (f) Since 0 ≤ U 1 ≤ 1 and continuous we have C(0, u 2 ) = P (U 1 ≤ 0, U 2 ≤ u 2 ) = P (U 1 = 0, U 2 ≤ u 2 ) = 0. Practically speaking, we may therefore simulate from any copula by computing h −1 u 1 (u 2 ), which requires the computation of a partial derivative and the inversion of a function. This may be done analytically in some cases, but in general, numerical approximation is required.

Identifying correlations from a bivariate distribution
We now consider how we may identify a θ 0 such that the distribution F (x 1 , x 2 ; θ 0 ) = C(F 1 (x 1 ), F 2 (x 2 ); θ 0 ) has a Pearson correlation ρ. The Höffding (1940) formula for correlation ρ(θ) gives where sd(F 1 ), sd(F 2 ) are the standard deviations of F 1 , F 2 . Most bivariate copula classes are such that C(u 1 , u 2 ; θ) is increasing in θ, a property fulfilled by all copulas cataloged in Section 5.1 of Joe (1997). This implies , Theorem 1) that ρ(θ) is an increasing function. It is therefore easy to solve for θ 0 via numerical root finding methods. The function ρ(θ) then has to be evaluated through numerical integration.

A.2. How trivariate vine simulations are performed on a computer
We here provide more technical details on how to simulate from the three-dimensional vine distribution used as an example in the main text. As is generally the case, we only need to simulate from the vine copula (which by definition has uniform marginals), as the marginals are easily transformed to any desired marginal distributions in the same way as explained in the univariate and bivariate case, i.e., by applying the quantile functions of the marginals to each of the coordinates of the simulated vector from a copula. When this is done, the resulting distribution has the desired marginals, and the same copula as before transformation. That is, if C is a d-dimensional copula, meaning C is a d-dimensional cumulative distribution function with uniform marginals, and (U 1 , U 2 , . . . , meaning X has marginals F 1 , F 2 , . . . , F d and C as its copula.
Let us first consider how to simulate from a three-dimensional copula in general. Constructing multivariate distributions can be difficult, and vines provide a general construction which is often useful. A useful feature of this class is that some of the properties of the resulting distribution are well suited for computation, and in spite of the flexibility and simplicity of constructing vine distributions, simulating from them is straightforward.
We first consider how to simulate from a general three-dimensional copula (U 1 , U 2 , U 3 ) ⊤ ∼ C, which does not have to be a vine distribution. We first simulate (U 1 , U 2 ) from the bivariate copula C 1,2 as described in the bivariate section above. Recall that C 1,2 can be easily computed from C, since 0 ≤ U 3 ≤ 1 implies that C 1,2 (u 1 , u 2 ) = P (U 1 ≤ u 1 , U 2 ≤ u 2 ) = P (U 1 ≤ u 1 , U 2 ≤ u 2 , U 3 ≤ 1) = C(u 1 , u 2 , 1). We may now simulate U 3 from the distribution of (U 1 , U 2 , U 3 ) after "conditioning away" the values of U 1 , U 2 . Again we may use the multivariate quantile transform. We simulate V which is uniform on [0, 1] and independent from previously generated variables, and then let Conditional distributions are conceptually complex, and will only be covered superficially here. While we will give some more needed technical details for conditional distributions in the next subsection, we follow common text-book treatments (e.g., Rice 2006), and use the fact that if C has a density c 1,2,3 , which we will assume, C 3|12 has density given by where c 1,2 (u 1 , u 2 ) = 1 0 c 1,2,3 (u 1 , u 2 , x 3 ) dx 3 .
We then have that As in the bivariate case, h u 1 ,u 2 (u 3 ) is seen to be invertible if we assume that c is continuous and non-zero on (0, 1) 3 , and generalized inverses are not needed when computing U 3 . If a simple formula for the copula CDF is available, which we note is often not the case, we may avoid integration when computing h u 1 ,u 2 , since When only the joint density of C is available, integration is in general needed for computing C 3|12 , and this is achieved by numerical approximations.
Following this recipe gives (U 1 , U 2 , U 3 ) ⊤ ∼ C 1,2,3 by a similar argument as in the bivariate case: Again we assume h u 1 ,u 2 (u 3 ) is invertible as a function of u 3 for all 0 < u 1 , u 2 < 1. We then have Let us now apply this to three-dimensional regular vines. The idea behind vines is described in Joe (1996) and Joe (2014, Sections 3.8 and 3.9), and is based on expressing cumulative distribution functions as mixtures of conditional distributions. The motivation for its construction will be sketched in the next subsection, and we here simply state the distribution for a trivariate copula in terms of its CDF.
The three-dimensional vine copula illustrated in Figure 4 has cumulative distribution C 1,2,3 (u 1 , u 2 , u 3 ) = u 2 0 C 1,3;2 (C 1|2 (u 1 |z 2 ), C 3|2 (u 3 |z 2 )) dz 2 , where C 1,3;2 is a chosen bivariate copula to bind marginals 1 and 3 together, when given 2, and where with C 1,2 and C 3,2 the chosen copulas, directly giving the copula of marginals 1, 2 and 3, 2 respectively. Note that C 1,3;2 is a standard bivariate copula, and does not depend on the value x 2 being integrated over in the above display. This is an important point which will be discussed further in the next subsection.
There is an important distinction between C 1,3|2 , which is the conditional distribution of (U 1 , U 3 ) given U 2 , and C 1,3;2 , which as we will discuss later is the copula of C 1,3|2 . The objects C 1,3|2 and C 1,3;2 need not be the same, and while C 1,2,3 has all uniform marginals, the marginals of the conditional distribution C 1,3|2 need not be uniform, which in turn implies that C 1,3|2 need not be a copula. We will return to this issue in the following. In order to further separate the two further, we will keep the C notation for functions such as C 1,3;2since these are actually copulas, but rather write F 1,3|2 to refer to the conditional distribution of (U 1 , U 3 ) given U 2 . That is, we will from now on write F 1,3|2 = C 1,3|2 .
A notable feature is that this expression only depends on bivariate distributions, which are usually computationally well-behaved.

A.3. More details on the vine construction in the trivariate case
We here provide a sketch of the vine construction of Joe (1996). We are unaware of an elementary presentation of this material in the literature, and presentations such as those in Joe (1996); Bedford and Cooke (2002); Joe (2014) require considerable technical training to read. We therefore include an elementary presentation of this material here, restricted to the trivariate case.
Using operations similar to the derivation on the validity of the general trivariate simulation method, we see that for any trivariate continuous copula C, we have for variables This calculation provides an expansion of the full distribution of (U 1 , U 2 , U 3 ) in terms of the conditional distribution of (U 1 , U 3 ) given U 2 . This conditional bivariate distribution F 1,3|2 (u 1 , u 3 |x 2 ) = P (U 1 ≤ u 1 , U 3 ≤ u 3 |U 2 = x 2 ) has marginals F 1|2 (u 1 |x 3 ) and F 3|2 (u 3 |x 2 ), which can be derived using properties of conditional distributions. A non-rigorous heuristic argument for the formula for F 1|2 (u 1 |x 2 ) is that using the uniformity of U 2 . Similarly, F 3|2 (u 3 |x 2 ) = ∂ ∂u 2 C 3,2 (u 3 , u 2 ). A formal argument justifying the formulas for F 1|2 and F 3|2 requires the general and rather complex mathematical framework of conditional probability, as developed by Kolmogorov, see Kallenberg (2002). A nice feature following from our focus on simulation is that an alternative justification for the formula for conditional distributions is provided by its successful application in simulation. Sklar's theorem applied for each given x 2 value to the conditional distribution shows that there is a class of copulas C 13;2 (u 1 , u 3 ; x 2 ) varying with x 2 , which is such that F 1,3|2 (u 1 , u 3 |x 2 ) = C 13;2 (F 1|2 (u 1 |x 2 ), F 3|2 (u 3 |x 2 ); x 2 ).
Using the formulas we identified for F 1|2 (u 1 |x 2 ), F 3|2 (u 3 |x 2 ) and recalling that we started with an expansion for C(u 1 , u 2 , u 3 ), we have shown that which holds in general. This expression can also be used to construct multivariate distributions from bivariate distributions: Based on bivariate copulas C 1,2 and C 3,2 we may compute ∂ ∂u 2 C 1,2 (u 1 , u 2 ), ∂ ∂u 2 C 3,2 (u 3 , u 2 ), and they may be combined using a family of copulas C 13;2 (u 1 , u 3 ; x 2 ) for every x 2 . For each x 2 , the Sklar theorem implies that is a proper distribution. However, the family of copulas C 13;2 (u 1 , u 3 ; x 2 ) has to be linked together via their x 2 dependence in such that the resulting C in (7) is a proper CDF. This may be challenging, and does not have a simple solution.
The vine copula construction assumes that the family C 13;2 (u 1 , u 3 ; x 2 ) is constant in x 2 , i.e., does not depend on x 2 . This is known as the simplifying assumption (Joe 2014). We therefore write C 13;2 (u 1 , u 3 ; x 2 ) = C 13;2 (u 1 , u 3 ), and see that we re-gain the vine CDF of (6). Since C 13;2 (u 1 , u 3 ) does not vary with x 2 , the combination from (6) always results in a valid CDF, as may be seen as follows. We may consider the algorithm for simulating from C 1,2,3 . After having simulated from (U 1 , U 2 ) using previously described bivariate techniques, we define U 3 = h −1 U 1 ,U 2 (V ) from an independent V ∼ U [0, 1]. Clearly, U 3 is a random variable, and by the above argument, the joint distribution of (U 1 , U 2 , U 3 ) is precisely C 1,2,3 from (6), and hence C 1,2,3 is indeed a valid distribution function since it is the CDF of a random vector. By (7) the constructed distribution has C 13;2 as the copula of the conditional distribution of (U 1 , U 3 ) given U 2 .

A.4. The density of a four-dimensional vine
In the main text, we gave the density of the three-dimensional vine in Figure 4 without a complete technical description. We here rectify this by deriving the density of the more general four-dimensional vine as depicted in Figure 10, and sketch how to form such densities in general. How to simulate from this four-dimensional vine will be the topic of the next section. Our discussion of this four-dimensional example ought to be sufficient to prepare the reader to understand general descriptions of simulation in, e.g., Dissmann, Brechmann, Czado, and Kurowicka (2013); Joe (2014), as well as fully understanding the vine based VITA simulation methodology of Grønneberg and Foldnes (2017).
The copula density of a vine is found by multiplying all copulas that are chosen as edge copulas. These copulas are evaluated at rather specific points, which will be discussed in the following. The edge copulas are the copulas of bivariate conditional distributions from the resulting full copula, with conditioning set indicated by the edge names. For example, the top-most edge connects (U 1 , U 4 ) and conditions on (U 2 , U 3 ), and represents the copula of F 1,4|2,3 . Its contribution to the full density therefore includes a multiplicative factor c 1,4;2,4 , where the use of a semi-colon indicates that this is the copula of a conditional distribution. As explained in the previous section, we write all conditional distributions of c such as the actual conditional distribution (here, a density) of (U 1 , U 4 ) conditional on (U 2 , U 3 ) using the notation f 1,4|2,3 for the density and F 1,3|2,3 for the CDF.
Conditional marginals are the key to the general description of writing down the density of a vine copula based on its vine, such as Figure 10, as they are included in the multiplicative contribution from each edge. For any edge on the vine, the edge may be denoted by (i, j|v) where i, j are the marginals connected by this edge, and v may contain several indices (or none, as is the case at the lowest tree) which are conditioned on. For example, the top-most tree in Figure 10 contains only one edge, where i = 1, j = 4 and v = {2, 3}. In contrast, the second tree in Figure 10 contains two edges. For the left-most edge, we have i = 1, j = 3 and v = {2}. For the right-most edge, we have i = 2, j = 4 and v = {3}. For the lowest tree, we do not condition on any indices, and so v is always empty. Going from left to right, the first edge has i = 1, j = 2, the next edge has i = 2, j = 3, and the final edge has i = 3, j = 4.
The multiplicative contribution of every edge i, j|v is c i,j|v (F i|v When v is empty, F i|v (u i |u v ) is the actual cumulative distribution of U i , which is uniform since c is a copula. In these cases, we have F i|v (u i |u v ) = u i . Therefore, the contributions of the lowest tree are simply c 1,2 (u 1 , u 2 )c 2,3 (u 2 , u 3 )c 3,4 (u 3 , u 4 ).
Combining this description for the vine in Figure 10, we see that c 1,2,3,4 (u 1 , u 2 , u 3 , u 4 ) = c 1,2 (u 1 , u 2 )c 2,3 (u 2 , u 3 )c 3,4 (u 3 , u 4 ) Now each of the bivariate copulas, i.e., each c i,j;v are chosen by us, and therefore does not require further calculation to be evaluated. In contrast, the conditional marginal distributions have to be computed, and we now explain how this is done. We note that for general regular vines, there is a simple recursive method to calculate the required conditional densities, see Section 2.4 of Dissmann et al. (2013).
Combining the above derivations gives a complete expression for the density of the vine in Figure 10. A notable feature is that numerical integration is avoided, at least when the densities and cumulative distribution functions of the bivariate copulas chosen by the user can be evaluated without numerical integration, which is usually the case for commonly used bivariate copulas. In cases where numerical integration is required, only bivariate numerical integration is needed, which is considerably less complex than general high-dimensional integration.

A.5. How to simulate from a vine
Since the three-dimensional case considered earlier is too simple to easily see the general pattern of how to simulate from a general multivariate regular vine, we here consider the four-dimensional vine of Figure 10. The four-dimensional case is sufficiently complex for the general case to be within reach after having studied it.

Simulation from a general p-dimensional copula
We start by providing a general algorithm for simulating from an arbitrary p-dimensional copula. This method, while general, will in high dimensions often be numerically infeasible, as there are no closed form expressions for the quantities required for applying the algorithm and numerical approximations have to be employed. In contrast, we will see that simulating from vines is computationally simpler, since high-dimensional numerical integration is not required in most cases. Vine simulation, illustrated via a four-dimensional example, will be explained in the next section.
as required.

Simulating from a four-dimensional vine
Let us now see how to apply this general technique to the four-dimensional vine copula distribution represented in Figure 10. Again, simulation can be performed without needing numerical integration. The general simulation approach from the multivariate quantile transform always simulates from F j|1,2,...,j−1 with j starting at 1 and increasing up to p. This will always work, but for vines, we may simulate directly from the bivariate conditional distributions specified in the vine. Simulating from a bivariate conditional distribution will amount to simulate from two conditional distributions that are connected via bivariate copulas. This will in total lead to the same steps as the multivariate quantile transform, but has the advantage of having computations that are simpler to follow, as we follow the structure of the vine. A general simulation algorithm for regular vines is given in Algorithm 2.2 of Dissmann et al. (2013).
The main insight we need is that we have direct knowledge of certain conditional distributions from how the vine distribution is specified. We have easy access to the following conditional (and unconditional) distributions These distributions are all bivariate, and, as we have seen from constructing the joint density of (U 1 , U 2 , U 3 , U 4 ), can be joined to produce the full joint distribution of (U 1 , U 2 , U 3 , U 4 ).
We already know how to simulate from bivariate distributions. Let us see how this can be extended to simulating from bivariate conditional distributions. Suppose therefore, that we have simulated (U 2 , U 3 ) in such a way that it has the required bivariate distribution, i.e., it has the cumulative distribution function C 2,3 (u 2 , u 3 ) = C(1, u 2 , u 3 , 1). We may do this directly using previously described techniques, since the copula and marginals of U 2 , U 3 are known and directly specified.
How should we simulate from F 1,4|2,3 ? We will show that if u 2 and u 3 are fixed to the already simulated U 2 and U 3 respectively, we may treat F 1,4|2,3 as if it is a standard (non-conditional) distribution, and use already described techniques to simulate from this bivariate distribution. The resulting variables will be valid simulations of the remaining U 1 , U 4 .
Since F 1,4|2,3 is a bivariate distribution with non-uniform marginals, we will as before split this simulation into two steps. Firstly, we simulate W 1 , W 4 from its copula, which is C 1,4;2,3 . By the simplifying assumption, this copula is a standard bivariate copula which does not depend on variables simulated earlier. We will then transform W 1 , W 4 using univariate quantile transforms so that they have distributions F 1|2,3 and F 4|2,3 respectively. (v 1 ) = D 1 C 1,4;2,3 (v 1 , w 4 ).
Again note that due to the simplifying assumption, this step does not depend on the already simulated U 2 , U 3 .
We next transform W 1 , W 4 so that they are F 1|2,3 and F 3|2,3 distributed respectively. An important point here is that this is where dependence from U 2 and U 3 is introduced. We again use the univariate quantile transform and set where F 1|2,3 is given in (10) (p. 40) and F 4|2,3 is given in (11) (p. 40), considering the already simulated U 2 , U 3 as fixed. It is not immediately apparent that U 1 , U 4 have uniform marginals. This can be shown directly, but we will now show the more general fact that (U 1 , U 2 , U 3 , U 4 ) ⊤ ∼ C, and since all marginals in C are uniform, this also implies that U 1 , U 4 have uniform marginals.