rTensor: An R Package for Multidimensional Array (Tensor) Unfolding, Multiplication, and Decomposition

rTensor is an R package designed to provide a common set of operations and decompositions for multidimensional arrays (tensors). We provide an S4 class that wraps around the base 'array' class and overloads operations familiar to users of 'array', and we provide additional functionality for tensor operations that are becoming more relevant in the recent literature. We also provide a general unfolding operation, of which the k-mode unfolding and the matrix vectorization are special cases. Finally, package rTensor implements common tensor decompositions such as the canonical polyadic decomposition, the Tucker decomposition, multilinear principal component analysis, and the t-singular value decomposition, as well as related matrix-based algorithms such as the generalized low rank approximation of matrices and the population value decomposition.


Introduction
Advances in medical imaging technology as well as telecommunication data-collection have ushered in massive datasets that make multidimensional data more commonplace. The multilinear structure of such datasets (e.g., individuals × traits × time) gives impetus for statistical techniques that preserve the dimensionality while still tying into the familiar framework of statistical inference and learning.
Flattening the data (treating one or more of the levels as simply more observations or more variables) and then applying traditional matrix-based methods is a common approach (Yang, Zhang, Frangi, and Yang 2004; Zhang and Zhou 2005); however, methods that preserve the structural integrity of the data often outperform in both model parsimony and predictive performance (Vasilescu 2009; Sheehan and Saad 2007; Lathauwer, Moor, and Vandewalle 2000b,a). Hence it is important to extend the statistical framework to datasets that inherently have multi-level structures.
The tensor framework generalizes the familiar notions of vectors and matrices and has been actively used and investigated in chemometrics, image sensing, facial recognition, and psychometrics. We point to Kolda (2006) for a comprehensive list of references on tensor uses in a variety of fields. More recently, there has been a rise of tensor usage in data mining (Acar, Dunlavy, Kolda, and Morup 2011a; Yilmaz, Cemgil, and Simsekli 2011; Acar, Kolda, and Dunlavy 2011b; Morup 2011; Anandkumar, Ge, Hsu, Kakade, and Telgarsky 2014) and computation (Acar et al. 2009; Golub and Van Loan 2012; Kang, Papalexakis, Harpale, and Faloutsos 2012). In fact, many techniques that were developed in the absence of a formal tensor setup have later been shown to be special cases of models based on the tensor structure (Sheehan and Saad 2007), which further supports the notion that the tensor framework is wide-reaching and important to develop further.
Many papers catalog and survey the use of tensor techniques, such as Kolda (2006); Kolda and Bader (2009); Lu, Plataniotis, and Venetsanopoulos (2011); Vasilescu (2009) and Grasedyck, Kressner, and Tobler (2013). Over the past few years, much has changed in the landscape of tensor analysis, which warrants a more serious consideration of adopting tensor methodology in the statistical community. Our aim in building this R (R Core Team 2018) package is to facilitate tensor manipulation, modeling, and methodological research amongst statisticians. We also believe that with so many different ways to represent a general tensor in matrix form (unfoldings), it is important to have a consistent notation and terminology for such representations. Existing packages, such as package PTAk, do not cover the population value decomposition (PVD), the t-product, or the t-singular value decomposition (t-SVD). Package rTensor aims to provide a tensor class with general matrix unfolding operations and fundamental tensor operations to support novel development of tensor methods, while package PTAk is more specific in its application to spatio-temporal decompositions. Of course, there is also the 'array' class, which supports multidimensional arrays. Our package aims to extend the functionality of this base 'array' class by adding k-mode multiplication, t-products, transposes, unfoldings, as well as various multidimensional decompositions. Package rTensor is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=rTensor.

Special care is taken to prevent too much slowdown in the code. For instance, consider the unfold function, which is central to all of the decompositions and most of the operations featured in our package. There are still significant speed differences between our unfold and that of MATLAB's Tensor Toolbox, but those are mostly due to speed differences between R's base aperm function and MATLAB's permute:

R> tnsr <- rand_tensor(rep(20, 6))
R> Rprof()
R> mtx <- unfold(tnsr, row_idx = c(4, 1, 3), col_idx = c(2, 5, 6))
R> Rprof(NULL)

>> T = rand(20, 20, 20, 20, 20, 20);
>> tic, b = permute(T, [4, 1, 3, 2, 5, 6]); b = reshape(b, 20^3, 20^3); toc
Elapsed time is 0.182706 seconds.

rTensor basics
A tensor used in data analysis is a multi-dimensional array (MDA). The modes of a tensor correspond to the dimensions of an MDA. A vector is a 1-tensor, a matrix is a 2-tensor, and tensors with 3 or more modes are generally called higher-order tensors. Decompositions of higher-order tensors are often called multi-way analysis or multi-linear models. In this section, we give an overview of tensor basics as well as how to perform basic tensor manipulation tasks in package rTensor.

Table 1: Slots of the 'Tensor' class.

Slot name   Type                               Description
num_modes   'integer'                          The number of modes, or K.
modes       'vector'                           The vector of modes/sizes/extents/dimensions.
data        'vector', 'matrix', or 'array'     The actual data of the tensor.

S4 class
Package rTensor exports the 'Tensor' S4 class, which extends the base 'array' class that ships with every version of R. The most accurate way to consider the 'Tensor' class is to see it as an API to the default R multidimensional array, allowing the user to easily create, manipulate and model tensors coherent with the set of terminology and algorithms set forth by Kilmer, Braman, Hao, and Hoover (2013); Vasilescu (2009); Kolda (2006); Bro (1997). The 'Tensor' class contains three slots which are given in Table 1.
Let K denote the number of modes of a tensor, and let $n_1 \times n_2 \times \cdots \times n_K$ denote the extents of the modes associated with a K-tensor; $n_k$ specifies the extent of the tensor along mode k. Creation of a 'Tensor' object is done mostly via as.tensor, which takes an 'array', 'matrix', or 'vector' as its argument. It is also possible to initialize a 'Tensor' object using the new("Tensor", num_modes, modes, data) command. Element-wise operations such as addition, subtraction, and element-wise exponentiation have been overloaded for the 'Tensor' class; in particular, addition and subtraction are defined element-wise for tensors of the same modes, while the Frobenius norm extends the matrix case in the usual manner. For binary element-wise operations, we can pair a 'Tensor' operand with an 'array' operand as long as the dimensions match up. The same rules apply for 'matrix' objects and 'numeric' vectors.
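As a brief sketch of these overloaded element-wise operations (the modes below are arbitrary choices for illustration):

R> tnsr <- rand_tensor(modes = c(3, 4, 5))
R> arr <- array(1, dim = c(3, 4, 5))
R> tnsr2 <- tnsr + arr     # 'Tensor' plus 'array' with matching dimensions
R> tnsr3 <- tnsr2 - tnsr   # element-wise subtraction of two tensors
R> tnsr4 <- tnsr^2         # element-wise exponentiation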
The Frobenius norm of a K-tensor can be obtained using the method fnorm, and we can sum or average across any mode of the tensor to obtain a (K − 1)-tensor using the methods modeSum or modeMean. The inner product of two tensors of equal modes can be calculated via the method innerProd.
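Continuing with tnsr from above, a short sketch of these methods (the mode choices are arbitrary):

R> fnorm(tnsr)             # Frobenius norm of the tensor
R> modeSum(tnsr, m = 2)    # sum across mode 2
R> modeMean(tnsr, m = 1)   # average across mode 1
R> innerProd(tnsr, tnsr)   # equals fnorm(tnsr)^2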
We can subset a 'Tensor' just as we would an 'array' object: simply invoke the subsetting operator [ and provide the indices of the subset. Any index that is left blank retrieves the entire range of that mode. To assign a value to a subset of a tensor, we use the same bracket syntax on the left-hand side of an assignment, just as with an 'array'. When a 'Tensor' object is printed to the screen, we display only the first few elements of the data, along with the number of modes and the mode vector. To save space, we omit the data output when printing tensors. The following code illustrates how to create and operate on a tensor:

R> library("rTensor")
R> indices <- c(10, 20, 30, 40)
R> arr <- array(rnorm(prod(indices)), dim = indices)
R> tnsr <- as.tensor(arr)
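Continuing, a short sketch of subsetting and subset assignment (the index choices are arbitrary):

R> tnsr[1, 2, 3, 4]          # a single element
R> tnsr[1, , , ]             # blank indices retrieve the full range of a mode
R> tnsr[1, 2, 3, 4] <- 100   # subset assignment works as with an 'array'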

Datasets
We include the AT&T database of faces (Cambridge 1994) in tensor format in this package, which contains images of 40 individuals under 10 different lighting conditions, each image with 92 × 112 pixels. We structured this dataset as a 4-tensor with modes 92 × 112 × 40 × 10, where the first two modes correspond to the image pixels, the third mode corresponds to the individual, and the last mode corresponds to the lighting condition. The data object can be accessed as faces_tnsr. It is also fairly simple to plot any of the images in this dataset using the function plot_orl. The following snippet of code generates the image corresponding to the 5th individual under the 10th lighting condition:

R> plot_orl(subject = 5, condition = 10)

Tensor unfolding
For K ≥ 3, it is often useful to be able to represent a K-tensor as a matrix or as a vector, especially as a first step in defining a tensor multiplication. This representation is often called unfolding or flattening. To represent a general K-tensor as a matrix, one can choose exactly which modes to map onto the rows and columns. While there are a few conventions that have prevailed in the tensor literature, package rTensor provides a general unfolding function that encompasses these conventions as special cases. The folding operations, which invert these unfoldings, are defined through the unfoldings themselves. It is important to note that the foldings operate on an arbitrary matrix, so it becomes necessary to specify the exact modes of the resulting tensor.
The general matrix unfolding maps a subset of the modes as indices in the rows and the remaining modes as indices in the columns. As such, it needs to know both which modes are mapped to the rows (row_idx =) and which are mapped to the columns (col_idx =). The orders of the indices within the rows and columns depend on the order given in these two parameters. Consider the following example. We first use the function rand_tensor to generate a 4-tensor consisting of i.i.d. random Normal(µ = 0, σ = 1) entries, then unfold it so that modes 1 and 2 map to the rows and modes 3 and 4 map to the columns:

R> tnsr <- rand_tensor(modes = c(3, 4, 5, 6))
R> unfold(tnsr, row_idx = c(1, 2), col_idx = c(3, 4))

In the general fold method, we need to specify the full modes of the original 'Tensor' object as well as row_idx and col_idx.
The default for col_idx is NULL, so it does not need to be specified; the remaining modes are then mapped to the columns. We also provide the convenience function vec, which maps all of the modes to the rows and returns the vectorization of the tensor.
The formal notation for this operation gives a mapping from the $(i_1, i_2, \ldots, i_K)$th element of the tensor to the $(i_k, j)$th element of the resulting matrix, where

$$j = 1 + \sum_{\substack{l=1 \\ l \neq k}}^{K} (i_l - 1) J_l, \qquad J_l = \prod_{\substack{m=1 \\ m \neq k}}^{l-1} n_m.$$

We stay consistent with the prevailing convention in the permutation of the indices $\{n_1, \ldots, n_{k-1}, n_{k+1}, \ldots, n_K\}$. For a 3-tensor, there are three k-mode unfoldings, denoted $X_{(1)}$, $X_{(2)}$, and $X_{(3)}$. Figure 1 shows how the extents of the original tensor map onto the rows and columns of the three unfoldings.
To invoke the k-mode unfolding in the mode k using package rTensor, the user can call unfold by specifying row_idx = k or use the convenience function k_unfold with m = k.
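A small sketch of both interfaces (the mode choice is arbitrary):

R> tnsr <- rand_tensor(modes = c(3, 4, 5))
R> k_unfold(tnsr, m = 2)   # the k-mode unfolding for k = 2
R> vec(tnsr)               # vectorization: all modes map to the rows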

Tensor multiplication
The k-mode product specifies multiplication between a K-tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ and a matrix $M \in \mathbb{R}^{J \times n_k}$, where $n_k$ is the extent of the kth mode of $\mathcal{X}$ (Kolda 2006). The result $\mathcal{Y} = \mathcal{X} \times_k M$ is a K-tensor in $\mathbb{R}^{n_1 \times \cdots \times n_{k-1} \times J \times n_{k+1} \times \cdots \times n_K}$. As the name suggests, this product definition is closely related to the k-mode unfolding. In fact,

$$\mathcal{Y} = \mathcal{X} \times_k M \iff Y_{(k)} = M \cdot X_{(k)},$$

where "·" denotes the usual matrix multiplication.
In other words, we can think about the k-mode product as a left matrix multiplication onto the k-mode vectors: each k-mode vector of the resulting tensor Y is a result of a matrix-vector multiplication between M and the corresponding k-mode vector of X . Note that if M is a vector (i.e., J = 1), then each k-mode vector of Y is the result of an inner product between two vectors, and Y will have the kth mode being 1 and is essentially a (K − 1)-tensor.
Also note that if $X \in \mathbb{R}^{n_1 \times n_2}$ were a matrix, then the k-mode products between X and $M_1 \in \mathbb{R}^{J_1 \times n_1}$, $M_2 \in \mathbb{R}^{J_2 \times n_2}$ are equivalent to the following matrix products:

$$X \times_1 M_1 = M_1 \cdot X, \qquad X \times_2 M_2 = X \cdot M_2^\top.$$

The function to perform k-mode multiplication is ttm, short for "tensor times matrix", the name of a function with similar usage in the MATLAB Tensor Toolbox (Bader and Kolda 2004). ttm takes in a 'Tensor' object, a 'matrix' object, and the mode for multiplication. It then proceeds to unfold the 'Tensor' object in the specified mode, perform matrix multiplication with the 'matrix' object on the left, and fold the resulting matrix back into a 'Tensor'. Naturally, the number of columns of the 'matrix' object must match the extent of the specified mode of the original 'Tensor' object.
R> tnsr <- rand_tensor(modes = c(4, 6, 8, 10))
R> mat <- matrix(rnorm(12), ncol = 6)
R> ttm(tnsr = tnsr, mat = mat, m = 2)
Numeric Tensor of 4 Modes
Modes: 4 2 8 10
Data:
[1] -1.4457715 -0.1249305 2.3950996 -0.7543450 -0.1727204 0.7295488

The k-mode product serves as the basis for many tensor decompositions and regression models, including the Tucker decomposition and the CP decomposition. It also bears mentioning that the Kronecker product permits a matrix view of the product between a general K-tensor and a list of matrices (Kolda 2006). For many tensor decompositions, there is frequently a need to perform a series of k-mode multiplications using multiple factor matrices. To this end, ttl is a function that takes in a single tensor $\mathcal{X}$, a 'list' of 'matrix' objects $\{M_1, M_2, \ldots, M_n\}$, and a vector of modes $(i_1, i_2, \ldots, i_n)$, and then returns the output

$$\mathcal{X} \times_{i_1} M_1 \times_{i_2} M_2 \cdots \times_{i_n} M_n.$$

The number of columns of each matrix must match the corresponding mode of $\mathcal{X}$.
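A brief sketch of ttl usage (the modes and sizes are arbitrary choices for illustration):

R> tnsr <- rand_tensor(modes = c(4, 6, 8))
R> mat_list <- list(matrix(rnorm(2 * 4), nrow = 2), matrix(rnorm(3 * 6), nrow = 3))
R> ttl(tnsr, mat_list, ms = c(1, 2))   # resulting modes: 2 3 8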

Figure 3: k-mode product of $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with matrices.

For more properties of the k-mode product, see Kolda (2006). While the k-mode product defines multiplication between a tensor and a matrix, it does not provide a natural way to multiply two 3-tensors. To this end, the t-product has recently been proposed by Kilmer and Martin (2011). First we must illustrate the block circulant matrix generated from the matvec(·) of a tensor. Let $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$; then

$$\operatorname{matvec}(\mathcal{X}) = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{n_3} \end{bmatrix}, \qquad \operatorname{circ}(\operatorname{matvec}(\mathcal{X})) = \begin{bmatrix} X_1 & X_{n_3} & \cdots & X_2 \\ X_2 & X_1 & \cdots & X_3 \\ \vdots & \vdots & \ddots & \vdots \\ X_{n_3} & X_{n_3 - 1} & \cdots & X_1 \end{bmatrix},$$

where $X_j = \mathcal{X}[:, :, j]$ is defined to be the jth slice of $\mathcal{X}$ along mode 3.
The t-product is defined via the block circulant structure and the matvec(·) operator, and allows for a direct multiplication of two 3-tensors. For $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $\mathcal{B} \in \mathbb{R}^{n_2 \times L \times n_3}$, the t-product is $\mathcal{A} * \mathcal{B} \in \mathbb{R}^{n_1 \times L \times n_3}$, where $\operatorname{matvec}(\mathcal{A} * \mathcal{B})$ results from the matrix multiplication $\operatorname{circ}(\operatorname{matvec}(\mathcal{A})) \cdot \operatorname{matvec}(\mathcal{B})$. Here $A_j = \mathcal{A}[:, :, j]$ and $B_j = \mathcal{B}[:, :, j]$ are the jth slices along mode 3 of $\mathcal{A}$ and $\mathcal{B}$, respectively. To get the tensor product $\mathcal{A} * \mathcal{B}$, we simply fold $\operatorname{matvec}(\mathcal{A} * \mathcal{B})$ using the inverse of matvec(·), denoted unmatvec(·). Hence

$$\mathcal{A} * \mathcal{B} = \operatorname{unmatvec}(\operatorname{circ}(\operatorname{matvec}(\mathcal{A})) \cdot \operatorname{matvec}(\mathcal{B})).$$
From the definition of the t-product, we can see that each mode-3 slice of the resulting tensor A * B is given by a sum of products of the mode-3 slices of A and B. In fact, the t-product A * B defines a linear map that takes B ∈ R n 2 ×L×n 3 to R n 1 ×L×n 3 (Martin, Shafer, and LaRue 2013). It also allows the extension of familiar linear algebra concepts such as the transpose, orthogonality, nullspace, and range. For detailed accounts of these properties, refer to Kilmer and Martin (2011). When n 3 = 1, we get back the usual matrix multiplication.
The t-product is implemented via the function t_mult. It takes in two 'Tensor' objects, $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ and $\mathcal{Y} \in \mathbb{R}^{n_2 \times L \times n_3}$, and returns $\mathcal{X} * \mathcal{Y} \in \mathbb{R}^{n_1 \times L \times n_3}$ (Algorithm 1). Note that our implementation takes the fast Fourier transform based approach found in Kilmer et al. (2013), which avoids creating the prohibitively large block circulant matrix involved in the definition of the t-product.
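A short sketch of t_mult (the sizes are arbitrary choices for illustration):

R> A <- rand_tensor(modes = c(3, 4, 5))
R> B <- rand_tensor(modes = c(4, 6, 5))
R> t_mult(A, B)   # a 'Tensor' with modes 3 6 5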

rTensor decompositions
In this section, we describe tensor decompositions that are implemented in package rTensor. These decomposition models represent the bulk of the tensor methodology used in facial recognition, data mining, and statistical analysis of images. Throughout this section, we illustrate the various decomposition functions in package rTensor using the AT&T face dataset (Cambridge 1994), which is included in the package as faces_tnsr.

CP, HOSVD, and Tucker
The CP decomposition stems independently from psychometrics (Carroll and Chang 1970) and chemometrics (Bro 1997), where the same method was separately named canonical decomposition (CANDECOMP) and parallel factors (PARAFAC).
A K-tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ is called rank-1 if it can be expressed as an outer product of K vectors. More precisely, the rank of a K-tensor is defined as the minimal value of r for which the tensor can be expressed as a sum of r rank-1 tensors:

$$\mathcal{X} = \sum_{i=1}^{r} v_{1i} \circ v_{2i} \circ \cdots \circ v_{Ki}, \qquad v_{ki} \in \mathbb{R}^{n_k}.$$

For matrices, the well-known Eckart-Young theorem provides the existence and form of an optimal lower-rank approximation. However, this type of result has been shown not to generalize to K-tensors for K ≥ 3 (Kolda 2003).
The CP decomposition provides an approximation of $\mathcal{X}$ using a rank-r tensor $\hat{\mathcal{X}}$, where r is given a priori. The goal is then to construct a rank-r tensor $\hat{\mathcal{X}}$ that minimizes the Frobenius norm of the difference between $\mathcal{X}$ and $\hat{\mathcal{X}}$:

$$\hat{\mathcal{X}} = \sum_{i=1}^{r} v_{1i} \circ v_{2i} \circ \cdots \circ v_{Ki} \quad (1)$$
$$= \sum_{i=1}^{r} \lambda_i \, u_{1i} \circ u_{2i} \circ \cdots \circ u_{Ki} \quad (2)$$
$$= \Lambda \times_1 U_1 \times_2 U_2 \cdots \times_K U_K, \quad (3)$$

with $U_k = [u_{k1} \; u_{k2} \; \cdots \; u_{kr}] \in \mathbb{R}^{n_k \times r}$ and $\Lambda \in \mathbb{R}^{r \times r \times \cdots \times r}$ a K-tensor that contains the λ's on the superdiagonal and 0 elsewhere, as seen in Figure 4.

Figure 4: CP decomposition for $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$. The first part shows a representation using a sum of rank-1 tensors. The second part shows a representation using factor matrices.

The equivalence of lines (1) and (2) above is due to the fact that each $u_{ki}$ vector is $v_{ki}$ normalized by its norm, with the norm information stored in the $\lambda_i$'s. Furthermore, we can store the $u_{ki}$ vectors as a factor matrix $U_k$ for each k = 1, . . . , K (Kroonenberg 2012), leading to the form in line (3). Note that here the $U_k$ matrices are not orthogonal. This relationship is illustrated for a 3-tensor in Figure 4.
The cp function implements the classical alternating least squares method to compute the CP decomposition of a general K-tensor. Note that this algorithm is not guaranteed to converge to the global minimum (Kolda 2006). The function returns lambdas and U_list: lambdas is a vector containing the elements on the superdiagonal of the core tensor Λ, while U_list is the list of factor matrices $U_1, \ldots, U_K$. We demonstrate the CP decomposition on one of the subjects (#14) in the AT&T face database; a sketch of the call is given after the Tucker discussion below.

The Tucker decomposition (Tucker 1966; Kroonenberg 2012; Lathauwer et al. 2000b) is still based on the idea of obtaining the best approximation of $\mathcal{X}$, but relaxes the constraint that $\hat{\mathcal{X}}$ must be expressed as a sum of r rank-1 tensors. Instead, the Tucker decomposition constructs $\hat{\mathcal{X}}$ to approximate $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ using a reduced core tensor $\mathcal{G} \in \mathbb{R}^{r_1 \times \cdots \times r_K}$ and K factor matrices, each of rank $r_k \leq n_k$, k = 1, . . . , K:

$$\hat{\mathcal{X}} = \mathcal{G} \times_1 U_1 \times_2 U_2 \cdots \times_K U_K.$$

If this looks similar to the CP decomposition, it is because the CP decomposition is a specialized version of the Tucker decomposition with all the ranks equal (i.e., $r = r_1 = \ldots = r_K$) (Kolda 2006). One way to compute the Tucker decomposition is via the higher-order orthogonal iteration (HOOI) (Lathauwer et al. 2000b), which constrains the factor matrices to be orthogonal.
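As promised above, a minimal sketch of the CP decomposition on subject 14 (the number of components is an illustrative assumption, not a recommendation):

R> subject14 <- as.tensor(faces_tnsr@data[, , 14, ])   # 92 x 112 x 10 3-tensor
R> cp_res <- cp(subject14, num_components = 10)        # rank-10 approximation
R> str(cp_res$lambdas)                                 # r = 10 weights
R> length(cp_res$U_list)                               # one factor matrix per mode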
Before we demonstrate HOOI, we first discuss the closely related higher-order singular value decomposition (HOSVD), provided in the seminal paper by Lathauwer et al. (2000a). HOSVD decomposes a K-tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ as

$$\mathcal{X} = \mathcal{G} \times_1 U_1 \times_2 U_2 \cdots \times_K U_K,$$

where each square matrix $U_k \in \mathbb{R}^{n_k \times n_k}$ is orthogonal and the core tensor $\mathcal{G} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_K}$ has the special property that for any k, 1 ≤ k ≤ K, it is

• all-orthogonal: $\langle \mathcal{G}_{i_k = \alpha}, \mathcal{G}_{i_k = \beta} \rangle = 0$ for any $\alpha \neq \beta$, and
• ordered: $\|\mathcal{G}_{i_k = 1}\|_F \geq \|\mathcal{G}_{i_k = 2}\|_F \geq \cdots \geq \|\mathcal{G}_{i_k = n_k}\|_F$.

All-orthogonality of $\mathcal{G}$ means that for any of the K modes, any two subtensors of order K − 1 with different indices (i.e., α and β) along the same mode have an inner product of 0. For example, if K = 3, then any two matrix slices with different indices along a mode have an inner product of 0. It is important to note that the core $\mathcal{G}$ is a diagonal tensor under the CP decomposition, but not necessarily so under Tucker.
The corresponding algorithm to compute the HOSVD illustrates its crucial connection with the k-mode unfolding: for each k, perform a matrix SVD so that $X_{(k)} = U_k \Sigma_k V_k^\top$ and take $U_k$ to be the matrix of left singular vectors; the core is then $\mathcal{G} = \mathcal{X} \times_1 U_1^\top \times_2 U_2^\top \cdots \times_K U_K^\top$, giving us the HOSVD. We demonstrate below how to do this in package rTensor. Note that we can also allow the orthogonal factor matrices to be truncated in a HOSVD (i.e., truncate each $U_k$ to its first $r_k$ columns for each 1 ≤ k ≤ K), thereby compressing $\mathcal{X}$ and forming an approximation. This procedure is called the truncated HOSVD (Lathauwer et al. 2000b), and it is done simply by specifying the reduced ranks in the hosvd function. The truncated HOSVD forms the conceptual basis for HOOI.
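A minimal sketch of the hosvd function (the modes and ranks are illustrative choices):

R> tnsr <- rand_tensor(modes = c(6, 7, 8))
R> full <- hosvd(tnsr)                        # full HOSVD
R> trunc <- hosvd(tnsr, ranks = c(3, 3, 4))   # truncated HOSVD
R> trunc$fnorm_resid                          # approximation error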
For HOOI and the remaining iterative algorithms we describe in this paper, we use the same convergence criterion. At each iteration of the algorithm, we check for convergence by first forming a current estimate of the original tensor based on the estimated U k 's, then measure the difference between the estimation error of the current iteration and the estimation error of the previous iteration. If this difference is smaller than the difference tolerance set by the user, then we stop iterating. By default there is also a maximum number of iterations set in place, which can also be adjusted by the user.
We apply the more general HOOI on the entire face dataset, compressing on all 4 modes (including the mode running across the individuals), as demonstrated in Figure 7.
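A sketch of the corresponding tucker (HOOI) call on the face dataset (the ranks below are illustrative assumptions, not necessarily those used for Figure 7):

R> hooi_res <- tucker(faces_tnsr, ranks = c(46, 56, 20, 5))   # compress all 4 modes
R> hooi_res$conv
R> hooi_res$norm_percent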

GLRAM, MPCA, and PVD
In this section, we discuss several statistical models that are unified under the Tucker framework. While some of these models explicitly use tensor notation and methodology, others use a series of matrices as inputs.
The generalized low rank approximation of matrices (GLRAM; Ye 2005) belongs in the latter category. For a series of matrices of the same size, $M_1, \ldots, M_{n_3} \in \mathbb{R}^{n_1 \times n_2}$, GLRAM constructs orthogonal matrices $L \in \mathbb{R}^{n_1 \times r_1}$, $R \in \mathbb{R}^{n_2 \times r_2}$ and a series of core matrices $G_j \in \mathbb{R}^{r_1 \times r_2}$ to minimize the quantity

$$\sum_{j=1}^{n_3} \| M_j - L \cdot G_j \cdot R^\top \|_F^2.$$

The parameters $r_1$ and $r_2$ need to be given a priori.
The series of images that GLRAM takes as input can be restructured into a 3-tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, where $\mathcal{X}[:, :, j] = M_j$. This was done in Sheehan and Saad (2007), where it is shown that, when structured in this way, GLRAM becomes a special case of HOOI. In GLRAM, we are essentially performing compression over only the first two modes, leaving the third mode uncompressed. We present the GLRAM algorithm using tensor notation in Algorithm 3 and demonstrate the function glram on subject 21 of the AT&T face dataset in Figure 8.
We turn next to the entire face database of 40 individuals and run MPCA on the 4-tensor, compressing on the first three modes (i.e., image row, image column, and individuals), using the same $r_1$ and $r_2$ as in GLRAM. We can also examine the percentage of the Frobenius norm recovered using MPCA; it seems that having more individuals (and hence a 4-tensor) did not help in recovery. This is also noticeable from the estimated images.
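A minimal sketch of an mpca call that would produce the mpca1 object below (the mode-3 rank here is an illustrative assumption; mpca leaves the last mode uncompressed):

R> mpca1 <- mpca(faces_tnsr, ranks = c(46, 56, 10))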

R> mpca1$norm_percent
[1] 84.54957
R> mpca2 <- tucker(faces_tnsr, ranks = c(46, 56, 10, 10))
R> mpca2$norm_percent
[1] 79.65943

PVD was recently proposed by Crainiceanu, Caffo, Luo, Zipunnikov, and Punjabi (2013) and provides a framework to construct population-level factor matrices for a series of images. We show in this section that PVD can be viewed as a variant of GLRAM. This point was first made by Lock, Nobel, and Marron (2011) in the discussion of the original PVD paper (Crainiceanu et al. 2013), to which Crainiceanu et al. replied that PVD differs from Tucker (or specifically, GLRAM) in many ways; most notably, the matrices P and D do not have to be orthogonal, and the default PVD has a closed-form solution. We first present PVD and the default algorithm suggested by the authors to construct the population matrices P and D, then examine the differences between PVD and GLRAM. Finally, we discuss how PVD might be cast into the tensor framework.
Like GLRAM, PVD is a model designed for a series of matrices rather than a 3-tensor. Given a sample of images $M_1, \ldots, M_{n_3} \in \mathbb{R}^{n_1 \times n_2}$ and $2n_3 + 2$ parameters, PVD constructs population-level matrices $P \in \mathbb{R}^{n_1 \times r_1}$ and $D \in \mathbb{R}^{n_2 \times r_2}$ such that $M_j = P \cdot V_j \cdot D^\top + E_j$, where the $V_j \in \mathbb{R}^{r_1 \times r_2}$, j = 1, . . . , n_3, are called the core matrices. In addition to the 2 parameters $r_i \leq n_i$, i = 1, 2, we also need to choose $2n_3$ compression parameters, $l_1, \ldots, l_{n_3}, h_1, \ldots, h_{n_3}$, that determine how much left and right truncation occurs for each of the $n_3$ matrices.
The PVD procedure starts with a separate SVD of each image, $M_j = U_j \Sigma_j W_j^\top$, truncating (possibly differently for each image) the left and right singular vectors to form $\tilde{U}_j$ and $\tilde{W}_j$. One then stacks the $\tilde{U}_j$'s column-wise to form a big matrix U and does the same with the $\tilde{W}_j$'s to form W. The final step is to conduct an eigenvalue decomposition of $U U^\top$ and $W W^\top$ to form the population-level matrices P and D. In the end, each $M_j$ has a projection $V_j = P^\top M_j D$ onto the population matrices. Unlike the usual algorithm needed to solve GLRAM, the algorithm to solve the default PVD is not iterative, although the computational cost of the model does scale with the number of images, since each image requires a separate SVD. Furthermore, with a large $n_3$, the $U U^\top$ and $W W^\top$ matrices may be intractable for a full eigenvalue decomposition. We present the matrix version of the default PVD algorithm below.
We decided that the input to a PVD model should be a 3-tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, with $n_3$ being the number of images in the series, and each $n_1 \times n_2$ slice $\mathcal{X}[:, :, j]$, 1 ≤ j ≤ n_3, being an image.
As with the ranks of HOOI and CP, optimizing the parameters required by the PVD model is currently mostly ad hoc. Recall that for $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, there are $2n_3 + 2$ parameters. At each of the $n_3$ SVDs of the individual images, $l_j$ and $h_j$ are the truncation indices for the left and right singular vectors, respectively; these are the uranks and wranks. At the end, we also have $r_1$ and $r_2$, which are the truncation indices for the two final eigenvalue decompositions of the two large covariance matrices; these are the parameters a and b required by the pvd function. Empirically, we have found that having $l_j > r_1$ or $h_j > r_2$ resulted in poor fits of the data. To illustrate pvd, we return to the AT&T face dataset, choosing subject #8 this time.
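A minimal sketch of the pvd call on subject 8 (all truncation choices below are illustrative assumptions):

R> subject8 <- as.tensor(faces_tnsr@data[, , 8, ])   # 92 x 112 x 10 3-tensor
R> pvd_res <- pvd(subject8, uranks = rep(46, 10), wranks = rep(56, 10),
+    a = 46, b = 56)
R> pvd_res$fnorm_resid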

t-SVD
The t-SVD decomposition is based on the t-product (recall its definition from Section 2.4) (Kilmer et al. 2013). Before we discuss the t-SVD, we first introduce the notion of the tensor transpose based on the t-product. Let $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$; then $\mathcal{X}^\top \in \mathbb{R}^{n_2 \times n_1 \times n_3}$ is obtained by transposing each of the mode-3 slices and then reversing the order of the slices 2 through $n_3$. It is easily verified that $(\mathcal{X}^\top)^\top = \mathcal{X}$. Furthermore, let the identity tensor $\mathcal{I} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ be defined with $\mathcal{I}[:, :, 1] = I_{n_1}$, the matrix identity of size $n_1$, and the rest of $\mathcal{I}$ set to 0. These two definitions then facilitate the notion of tensor orthogonality via the t-product: $\mathcal{Q} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ is orthogonal if and only if $\mathcal{Q}^\top * \mathcal{Q} = \mathcal{Q} * \mathcal{Q}^\top = \mathcal{I}$.
We have overloaded the transpose function t for the 'Tensor' class. Note, however, that this is only currently defined for 3-tensors and not for general K-tensors.
R> tnsr <- rand_tensor(c(4, 5, 6))
R> t(tnsr)

As shown in Kilmer et al. (2013), an orthogonal tensor preserves the Frobenius norm under the t-product. In other words, if $\mathcal{Q} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ is orthogonal and $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, then $\|\mathcal{Q} * \mathcal{X}\|_F = \|\mathcal{X}\|_F$. We can now describe the t-SVD: let $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$; then $\mathcal{X}$ admits a decomposition

$$\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^\top,$$

where $\mathcal{U}$, $\mathcal{V}$ are orthogonal tensors of sizes $n_1 \times n_1 \times n_3$ and $n_2 \times n_2 \times n_3$ respectively, and $\mathcal{S}$ is of size $n_1 \times n_2 \times n_3$ and consists of diagonal matrices along the third mode. When $n_3 = 1$, the t-SVD reduces to the matrix SVD of $X \in \mathbb{R}^{n_1 \times n_2}$ (Kilmer et al. 2013). This is a consequence of the fact that the t-product reduces to matrix multiplication when $n_3 = 1$.

Figure 12: The t-SVD of $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ results in two orthogonal tensors, $\mathcal{U} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ and $\mathcal{V} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$, and $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with diagonal faces along the third mode.
Algorithm 5 outlines the t-singular value decomposition (t-SVD): the input is $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, and the outputs are orthogonal tensors $\mathcal{U} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$, $\mathcal{V} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$, and $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ with diagonal slices along the third mode. The $\mathcal{S}$ tensor contains the eigentubes $\mathcal{S}[i, i, :]$, $1 \leq i \leq \tilde{n} := \min(n_1, n_2)$, each of which is a vector of length $n_3$. Similar to their matrix eigenvalue counterparts, these eigentubes are ordered by the Frobenius norm: $\|\mathcal{S}[1, 1, :]\|_F \geq \|\mathcal{S}[2, 2, :]\|_F \geq \cdots \geq \|\mathcal{S}[\tilde{n}, \tilde{n}, :]\|_F$. Computations involving the t-product are carried out not via the block circulant matrices but rather via the fast Fourier transform (FFT) based algorithms given by Kilmer and Martin (2011). The FFT facilitates the computation because the block circulant matrix becomes block diagonal in Fourier space, which allows for a significant reduction in the computation of the product. While complex values are involved in the computation, if $\mathcal{X}$ consists of real values, then $\mathcal{U}$, $\mathcal{V}$ and $\mathcal{S}$ are all real as well.
We demonstrate below how to compute the t-SVD using package rTensor. By using the fft function in R, some round-off error occurs when we transform a series using the FFT and then transform it back using the inverse FFT. Our calculations in the t-SVD inherit these round-off errors, as we observe in the example below.
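A minimal sketch of this round-off behavior (the sizes are arbitrary choices):

R> tnsr <- rand_tensor(modes = c(10, 10, 5))
R> tsvd_res <- t_svd(tnsr)
R> rec <- t_svd_reconstruct(tsvd_res)   # rebuild U * S * t(V)
R> fnorm(tnsr - rec)                    # small but nonzero due to FFT round-off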

Summary
We provide an R package that implements prevalent tensor unfolding/refolding, multiplication, and decomposition operations. This article is meant to guide users of our package as well as to connect several themes in the tensor literature. Tensor methodology is still under active development, and the landscape of tensor computation can change rapidly due to surging interest from the fields of statistics and machine learning.
The following table summarizes the decompositions and their inputs. These decompositions represent the bulk of the package, and we try to be consistent in the outputs of each decomposition. The output of every function is a standard list containing objects relevant to that decomposition.

Decomposition    Function     Input        Parameters
CP               cp           K-tensor     number of components r
HOSVD            hosvd        K-tensor     optional truncation ranks r_1, ..., r_K
Tucker (HOOI)    tucker       K-tensor     ranks r_1, ..., r_K
MPCA             mpca         K-tensor     ranks r_1, ..., r_{K-1}
GLRAM            glram        3-tensor     ranks r_1, r_2
PVD              pvd          3-tensor     uranks, wranks, a, b
t-SVD            t_svd        3-tensor     none
t-compression    t_compress   3-tensor     number of eigentubes retained
For the decompositions that allow for compression (cp, mpca, tucker, pvd, and t_compress), the output of each function is a list containing:

• est - the compressed estimate of the original 'Tensor' object.
• fnorm_resid - the Frobenius norm of the difference between est and the original 'Tensor' object.
• conv - whether or not the algorithm converged within the maximum number of iterations (only for the iterative algorithms such as cp, mpca, and tucker).
• all_resids - a vector of residuals at each iteration (only for the iterative algorithms).