Object-Oriented Software for Functional Data

This paper introduces the funData R package as an object-oriented implementation of functional data. It implements a unified framework for dense univariate and multivariate functional data on one- and higher dimensional domains as well as for irregular functional data. The aim of this package is to provide a user-friendly, self-contained core toolbox for functional data, including important functionalities for creating, accessing and modifying functional data objects, which can serve as a basis for other packages. The package further contains a full simulation toolbox, which is a useful feature when implementing and testing new methodological developments. Based on the theory of object-oriented data analysis, it is shown why it is natural to implement functional data in an object-oriented manner. The classes and methods provided by funData are illustrated in many examples using two freely available datasets. The MFPCA package, which implements multivariate functional principal component analysis, is presented as an example of an advanced methodological package that uses the funData package as a basis, including a case study with real data. Both packages are publicly available on GitHub and CRAN.


Introduction
Functional data analysis is a branch of modern statistics that has seen rapid growth in recent years. Technical progress in many fields of application makes it possible to collect data in increasingly fine resolution, e.g., over time or space, such that the observed data points form quasi-continuous, possibly noisy samples of smooth functions and are thus called functional data. One central aspect of functional data analysis is that the focus of the analysis is not a single data point, but the entirety of all data points that are considered to stem from the same curve. Researchers in functional data analysis have developed many new statistical methods for the analysis of this type of data, linking the concept of functional data also to related branches of statistics, such as the study of longitudinal data, which can be seen as sparse and often also irregular samples of smooth functions, or image data, which can be represented as functions on two-dimensional domains. New approaches focus on even more generalized functional objects (next generation functional data analysis, Wang et al., 2016). When it comes to the practical application of new methods to real data, appropriate software solutions are needed to represent functional data in an adequate manner, ideally in a way that allows new theoretical developments to be implemented easily.

The most widely used R package for functional data is fda (Ramsay et al., 2017), which is related to the popular textbook of Ramsay and Silverman (2005). Many other R packages for functional data build on it, e.g., Funclustering (Soueidatt, 2014), funFEM (Bouveyron, 2015) or funHDDC (Bouveyron and Jacques, 2014), or provide interfaces to fda, e.g., fda.usc (Febrero-Bande and Oviedo de la Fuente, 2012) or refund (Goldsmith et al., 2016). The fda package contains a class fd for representing dense functional data on one-dimensional domains together with many functionalities for fd objects, such as plotting or summaries.
It implements a variety of functional data methods, for example principal component analysis, regression models or registration. The fd objects represent the data as a finite linear combination of given basis functions, such as splines or Fourier bases, i.e., they store the basis functions and the individual coefficients for each curve. This representation of course works best if the underlying function is smooth and can be well approximated by a moderate number of basis functions.

The structure of this article is as follows: Section 2 contains a short introduction to the concept of object orientation in statistics and computer science and discusses how to adequately represent functional data in terms of software objects. The next section presents the object-oriented implementation of functional data in the funData package. Section 4 introduces the MFPCA package as an example of how to use the funData package for the implementation of new functional data methods. The final section contains a discussion and an outlook on potential future extensions.

Object orientation and functional data
Concepts of object orientation exist both in computer science and in statistics. In statistics, the term object-oriented data analysis (OODA) was introduced by Wang and Marron (2007). They define it as "the statistical analysis of complex objects" and focus their attention on what they call the atoms of the analysis. While in many parts of statistics these atoms are numbers or vectors (multivariate analysis), Wang and Marron (2007) argue that they can be much more complex objects such as images, shapes, graphs or trees. Functional data analysis (Ramsay and Silverman, 2005) is an important special case of object-oriented data analysis, where the atoms are functions. In most cases, they can be assumed to lie in L 2 (T ), the space of square integrable functions on a domain T . This space has infinite dimension, but being a Hilbert space, its mathematical structure has many parallels to the space R p of p-dimensional vectors, which allows many concepts of multivariate statistics to be transferred to the functional case in a quite straightforward manner. In computer science, object orientation (Booch et al., 2007; Armstrong, 2006; Meyer, 1988) is a programming paradigm which has profoundly changed the way software is structured and developed.
The key concept of object-oriented programming (OOP) is to replace the previously predominant procedural programs by computer programs made of objects, which can interact with each other and thus form, in a way, the "atoms" of the program. These objects usually consist of two blocks: first, a collection of data, which may have different data types, such as numbers, character strings or vectors of different lengths, and is organized in fields; second, a collection of methods, i.e., functions for accessing and/or modifying the data and for interacting with other objects. The entirety of all objects and their interactions forms the final program.

The main idea of the funData package is to combine the concepts of object orientation that exist in computer science and in statistics for the representation of functional data. The atom of the statistical analysis should thus be represented by the "atom" of the software program. The package therefore provides classes to organize the observed data in an appropriate manner. The class methods implement functionalities for accessing and modifying the data and for interaction between objects, which are primarily mathematical operations. The object orientation is realized in R via the S4 object system (Chambers, 2008). This system fulfills most of the fundamental concepts of object-oriented programming listed in Armstrong (2006) and is thus more rigorous than R's widely used S3 system, which is used e.g., by fda or fda.usc. In particular, it checks for example whether a given set of observation (time) points matches the observed data before constructing the functional data object.

For the theoretical analysis of functional data, the functions are mostly considered as elements of a function space such as L 2 (T ). For the practical analysis, the data can of course only be obtained in finite resolution. Data with functional features therefore will always come in pairs of the form (t ij , x ij ) with x ij = x i (t ij ), j = 1, . . . , S i , i = 1, . . . , N for some functions x 1 , . . . , x N that are considered as realizations of a random process X : T → R. The domain T ⊂ R d here is assumed to be a compact set with finite (Lebesgue) measure and in most cases, the dimension d will be equal to 1 (functions on one-dimensional domains), sometimes also 2 (images) or 3 (3D images). The observation points t ij ∈ T in general can differ in their number and location between the individual functions. When implementing functional data in an object-oriented way, it is thus natural to collect the data in two fields: the observation points {(t i1 , . . . , t iS i ) : i = 1, . . . , N } on the one hand and the set of observed values {(x i1 , . . . , x iS i ) : i = 1, . . . , N } on the other hand. Both fields together form the data block of the functional data object as an inseparable entity. This is a major advantage compared to non-object-oriented implementations, which can consider the observation points and the observed values as parameters in their methods, but cannot map the intrinsic dependence between the two.
In the important special case that the functions are observed on a one-dimensional domain and that the arguments do not differ across functions, they can be collected in a single vector (t 1 , . . . , t S ) and the observed values can be stored in a matrix X with entries x ij , i = 1, . . . , N, j = 1, . . . , S. The matrix-based concept can be generalized to data observed on common grids on higher dimensional domains. In this case, the observation grid can be represented as a matrix (or array) or, in the case of a regular and rectangular grid, as a collection of vectors that define the marginals of the observation grid. The observed data is collected in an array with three or even higher dimensions.
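This matrix-based layout for the one-dimensional case can be sketched in a few lines of base R, independent of any package. The variable names below are chosen for illustration only; this shows the storage concept, not the package's actual implementation:

```r
# Common grid of S = 101 observation points for all curves
argvals <- seq(0, 1, length.out = 101)

# N = 3 curves observed on this grid, stored row-wise in an N x S matrix
X <- rbind(sin(2 * pi * argvals),
           cos(2 * pi * argvals),
           argvals^2)

# The i-th row of X holds the values x_i(t_1), ..., x_i(t_S)
stopifnot(nrow(X) == 3, ncol(X) == length(argvals))

# Pointwise (cross-sectional) mean curve: one value per observation point
meanCurve <- colMeans(X)
length(meanCurve)  # 101
```

Storing curves as rows of a common matrix makes pointwise operations such as the cross-sectional mean a one-liner, which is exactly what the dense representation exploits.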
In recent years, the study of multivariate functional data, which takes multiple functions at the same time into account, has led to new findings. Each observation unit here consists of a fixed number p of functions, which can also differ in their domain (e.g., functions and images, Happ and Greven, 2018). Technically, the observed values are assumed to stem from a random process X = (X (1) , . . . , X (p) ) with random functions X (k) : T k → R, T k ⊂ R d k , k = 1, . . . , p, which we call the elements of X. Realizations x 1 , . . . , x N of such a process all have the same structure as X. If for example p = 2 and d 1 = 1, d 2 = 2, the realizations will all be bivariate functions with one functional and one image element. As data can only be obtained in finite resolution, observed multivariate functional data is of the form (t (k) ij , x (k) ij ) with x (k) ij = x (k) i (t (k) ij ), j = 1, . . . , S (k) i , i = 1, . . . , N, k = 1, . . . , p. Each element thus can be represented separately by its observation points and the observed values, and the full multivariate sample constitutes the collection of all p elements.

The funData package
The funData package implements the object-oriented approach for representing functional data in R. It provides three classes for functional data on one- and higher dimensional domains, multivariate functional data and irregularly sampled data, which are presented in Section 3.1. Section 3.2 presents the methods associated with the functional data classes based on two example datasets.

Three classes for functional data
For the representation of functional data in terms of abstract classes, which in turn define concrete objects, we distinguish three different cases.
1. Class funData for dense functional data of "arbitrary" dimension (in most cases the dimension of the domain will be d ∈ {1, 2, 3}) on a common set of observation points t 1 , . . . , t S for all curves. The curves may have missing values coded by NA.
2. Class irregFunData for irregularly sampled functional data with individual sampling points t ij , j = 1, . . . , S i , i = 1, . . . , N for all curves. The number S i and the location of observation points can vary across individual observations. At the moment, only data on one-dimensional domains is implemented.
3. Class multiFunData for multivariate functional data, which combines p elements of functional data that may be defined on different dimensional domains (e.g., functions and images).
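The three layouts can be made concrete with a small base-R mock-up using plain lists and matrices. This is illustrative only; the actual classes are S4 objects with slots and validity checks:

```r
# Case 1: dense data -- one common grid, observations in a matrix (funData-like)
dense <- list(argvals = list(1:5),
              X = matrix(rnorm(3 * 5), nrow = 3))  # N = 3 curves, S = 5 points

# Case 2: irregular data -- individual grids and values per curve (irregFunData-like)
irreg <- list(argvals = list(c(1, 2, 4), c(2, 3), c(1, 3, 4, 5)),
              X       = list(rnorm(3),   rnorm(2), rnorm(4)))

# Case 3: multivariate data -- a list of p = 2 dense elements (multiFunData-like)
multi <- list(dense, dense)

# Dense data has one common number of observation points; irregular data has
# one number per curve; multivariate data is simply a collection of elements.
stopifnot(ncol(dense$X) == 5)
stopifnot(lengths(irreg$argvals) == c(3, 2, 4))
stopifnot(length(multi) == 2)
```

The key structural difference is visible immediately: case 1 pairs one grid with a matrix, case 2 pairs a list of grids with a list of value vectors, and case 3 nests p objects of case 1.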
In the case of data on one-dimensional domains, the boundaries between the funData and the irregFunData class may of course be blurred in practice. The conceptual difference is that in case 1 all curves are ideally supposed to be sampled on the full grid T = {t 1 , . . . , t S } ⊂ T and differences in the number of observation points per curve are mainly driven by anomalies or errors in the sampling process, such as missing values, which can be coded by NA. In contrast, case 2 expects a priori that the curves can be observed at different observation points t ij , and that the number of observations per curve may vary.

[Figure 2. Left: Functional data on a two-dimensional domain T . The functions are observed on a common discrete grid having S x points in x- and S y points in y-direction, i.e., each observation forms an image with S x × S y pixels. Right: Representation of the data as a funData object. The @argvals slot is a list of length 2, containing the marginal sampling points. The slot @X is an array of dimension N × S x × S y .]
For funData and irregFunData, the data is organized in two fields or slots, as they are called for S4 classes (Chambers, 2008): The slot @argvals contains the observation points and the slot @X contains the observed data. For funData, the @argvals slot is a list, containing the common sampling grid for all functions and @X is an array containing all observations. In the simplest case of functions defined on a one-dimensional domain and sampled on a grid with S observation points, @argvals is a list of length one, containing a vector of length S and @X is a matrix of dimension N × S, containing the observed values for each curve in a row-wise manner. For an illustration, see Figure 1. If the funData object is supposed to represent N images with S x × S y pixels, @argvals is a list of length 2, containing two vectors with S x and S y entries, respectively, that represent the sampling grid. The slot @X is an array of dimension N ×S x ×S y , cf. Figure 2. For the irregFunData class, only functions on one-dimensional domains are currently implemented. The @argvals slot here is a list of length N , containing in its i-th entry the vector (t i1 , . . . , t iS i ) with all observation points for the i-th curve. The @X slot organizes the observed values analogously, i.e., it is also a list of length N with the i-th entry containing the vector (x i1 , . . . , x iS i ). An illustration is given in Figure 3. The multiFunData class, finally, represents multivariate functional data with p elements. An object of this class is simply a list of p funData objects, representing the different elements. For an illustration, see Figure 4. Given specific data, the realizations of such classes are called funData, irregFunData or multiFunData objects. We will use the term functional data object in the following for referring to objects of all three classes.

[Figure 4. Left: Bivariate functional data, i.e., each observation (red/blue) consists of two elements, a curve and an image. Right: Representation of the data as a multiFunData object. As the data is bivariate, the multiFunData object is a list of length 2, containing the two elements as funData objects.]

Methods for functional data objects
Essential methods for functional data objects include the creation of an object from the observed data, methods for modifying and extracting the data, plotting, arithmetic and other functionalities. The methods in the funData package are implemented such that they work on functional data objects as the atoms of the program, i.e., the methods accept functional data objects as input and/or have functional data objects as output. Moreover, all functions are implemented for the three different classes with appropriate sub-methods. This corresponds to the principle of polymorphism in Armstrong (2006), as different classes have their own implementation, e.g., of a plot function. In most cases, the methods for multiFunData objects will simply call the corresponding method for each element and concatenate the results appropriately.

Data used in the examples
The following code examples use the Canadian weather data (Ramsay and Silverman, 2005), which is available e.g., in the fda package, and the CD4 cell count data (Goldsmith et al., 2013) from the refund package. In both cases, the data is observed on a one-dimensional domain. Examples for image data are included in the description of the simulation toolbox at the end of this section. The Canadian weather data contains daily and monthly observations of temperature (in °C) and precipitation (in mm) for N = 35 Canadian weather stations, averaged over the years 1960 to 1994. We will use the daily temperature as an example for dense functional data on a one-dimensional domain. Moreover, it is combined with the monthly precipitation data to form multivariate functional data with elements on different domains (T 1 = [1, 365] for the temperature and T 2 = [1, 12] for the precipitation). The CD4 cell count data reports the number of CD4 cells per milliliter of blood for N = 366 subjects who participated in a study on AIDS (MACS, Multicenter AIDS Cohort Study). CD4 cells are part of the human immune system and are attacked in the case of an infection with HIV. Their number thus can be interpreted as a proxy for the disease progression. For the present data, the CD4 counts were measured roughly twice a year and centered at the time of seroconversion, i.e., the time point when HIV becomes detectable. In total, the number of observations for each subject varies between 1 and 11 in the period of 18 months before and 42 months after seroconversion. The individual time points also differ between subjects. The dataset thus serves as an example for irregular functional data. For more information on the data, please see Goldsmith et al. (2013).

Creating new objects and accessing an object's information
The following code creates funData objects for the Canadian temperature and precipitation data:

R> dailyTemp <- funData(argvals = 1:365,
+                       X = t(CanadianWeather$dailyAv[,, 'Temperature.C']))
R> monthlyPrec <- funData(argvals = 1:12,
+                         X = t(CanadianWeather$monthlyPrecip))

It is then very easy to create a bivariate multiFunData object, containing the daily temperature and the monthly precipitation for the 35 weather stations:

R> canadWeather <- multiFunData(dailyTemp, monthlyPrec)

The cd4 data in the refund package is stored in a data.frame with 366 × 61 entries, containing the CD4 counts for each patient on the common grid of all sampling points. Missing values are coded as NA. Since each patient has at least 1 and at most 11 observations, more than 90% of the dataset consists of missings. The irregFunData class stores only the observed values and their time points and is therefore more parsimonious. To create an irregFunData object cd4Counts, the observation points and the observed values are first extracted separately as lists and then passed to the irregFunData constructor.

The slots can be accessed directly via @argvals or @X. The preferable way of accessing and modifying the data in the slots, however, is via the get/set methods, following the principle of limited access (or encapsulation, Armstrong, 2006), for example:

R> argvals(monthlyPrec)[[1]]

 [1]  1  2  3  4  5  6  7  8  9 10 11 12

The names can be set and retrieved by the names function:

R> names(monthlyPrec) <- names(dailyTemp)
R> names(monthlyPrec)

 [1] "St. Johns"   "Halifax"     "Sydney"      "Yarmouth"    "Charlottvl"
 [6] "Fredericton" "Scheffervll" "Arvida"      "Bagottville" "Quebec"

The number of observation points is given by nObsPoints. Note that the functions in dailyTemp and canadWeather are densely sampled and the method therefore returns a single number or two numbers (one for each element). For the irregularly sampled data in cd4Counts, it returns a vector of length N = 366, containing the individual number of observations for each subject. The dimension of the domain can be obtained by the dimSupp method. Finally, a subset of the data can be extracted using the function extractObs. We can for example extract the temperature data for the first five weather stations or the CD4 counts for selected subjects. In both cases, the method returns an object of the same class as the argument with which the function was called (funData for dailyTemp and irregFunData for cd4Counts).

Coercion
As discussed earlier, there is no clear boundary between the irregFunData class and the funData class for functions on one-dimensional domains. The package thus provides coercion methods to coerce funData objects to irregFunData objects. Further, funData objects can also be coerced to multiFunData objects with only one element.

[Figure: First row: The daily temperature in 35 Canadian weather stations (funData object, left) and the CD4 counts for the first five individuals (irregFunData object, right). Second row: The Canadian weather data for ten weather stations (multiFunData object). See text for the commands used. For all plots the option theme_bw() has been added for optimal print results; all other parameters were kept as defaults.]

Mathematical operations for functional data objects
With the funData package, mathematical operations can directly be applied to functional data objects, with the calculation made pointwise and the return value being again an object of the same class. The operations build on the Arith and Math group generics for S4 classes. We can for example convert the Canadian temperature data from Celsius to Fahrenheit:

R> 9/5 * dailyTemp + 32

or calculate the logarithms of the CD4 count data:

R> log(cd4Counts)
Arithmetic operations such as sums or products are implemented for scalars and functional data objects as well as for two functional data objects. Note that in the last case, the functional data objects must have the same number of observations (in this case, the calculation is done with the i-th function of the first object and the i-th function of the second object) or one object may have only one observation (in this case, the calculation is made with each function of the other object). This is particularly useful e.g., for subtracting a common mean from all functions in an object as in the following example, using the meanFunction method:

R> canadWeather -meanFunction(canadWeather)
Some of the demeaned curves are shown in Figure 7. Note that the functions also need to have the same observation points, which is especially important for irregFunData objects.
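For dense data on a common grid, this demeaning corresponds to simple row-wise matrix arithmetic. The following base-R sketch mimics the pointwise computation of such a subtraction; it illustrates the idea, not the package internals:

```r
# N = 3 toy curves on a common grid of 12 points, stored row-wise
argvals <- 1:12
X <- rbind(10 + sin(argvals),
           12 + sin(argvals),
            8 + sin(argvals))

# Pointwise mean function, evaluated on the common grid
meanCurve <- colMeans(X)

# Subtract the mean curve from every curve (column-wise sweep, FUN = "-")
Xdemeaned <- sweep(X, MARGIN = 2, STATS = meanCurve)

# After demeaning, the pointwise mean is zero at every observation point
stopifnot(max(abs(colMeans(Xdemeaned))) < 1e-12)
```

The requirement that both operands share the same observation points translates here into the requirement that meanCurve has exactly one entry per column of X.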
The tensorProduct function calculates tensor products of functional data objects f 1 , f 2 on one-dimensional domains T 1 , T 2 , respectively, i.e., the function (f 1 ⊗ f 2 )(t 1 , t 2 ) = f 1 (t 1 ) f 2 (t 2 ), t 1 ∈ T 1 , t 2 ∈ T 2 . The following code calculates the tensor product of the two elements of the Canadian weather data; the result is a funData object on a two-dimensional domain:

R> tensorData <- tensorProduct(dailyTemp, monthlyPrec)

Two observations in tensorData are shown in Figure 11. Note that for image data, a single observation has to be specified for plotting. Another important aspect when working with functional data is integration. The funData package implements two quadrature rules, "midpoint" and "trapezoidal" (the default). The data is always integrated over the full domain and, in the case of multivariate functional data, the integrals are calculated for each element and the results are added:

R> integrate(dailyTemp, method = "trapezoidal")

For irregular data, the integral can be calculated on the observed points, or the curves can be extrapolated linearly to the full domain. For the latter, curves with only one observation point are assumed to be constant. Based on integrals, one defines the usual scalar product ⟨f, g⟩ 2 = ∫ T f(t)g(t) dt on L 2 (T ) and the induced norm ||f|| 2 = (⟨f, f⟩ 2 )^(1/2) for f, g ∈ L 2 (T ). For multivariate functional data on domains T 1 × . . . × T p , the scalar product can be extended to ⟨⟨f, g⟩⟩ = Σ p j=1 ⟨f (j) , g (j) ⟩ 2 with the induced norm |||f ||| = ⟨⟨f, f ⟩⟩^(1/2). The multivariate scalar product can further be generalized by introducing weights w j > 0 for each element (cf. Happ and Greven, 2018): ⟨⟨f, g⟩⟩ w = Σ p j=1 w j ⟨f (j) , g (j) ⟩ 2 . Scalar products and norms are implemented for all three classes in the funData package. Here also, the scalar product can be calculated for pairs of functions f 1 , . . . , f N and g 1 , . . . , g N , returning ⟨f i , g i ⟩ 2 , or for a sample f 1 , . . . , f N and a single function g, returning ⟨f i , g⟩ 2 . The norm function accepts some additional arguments, such as squared (logical, should the squared norm be calculated), obs (the indices of curves for which the norm should be calculated) or weight (a vector containing the weights w 1 , . . . , w p for multivariate functional data):

R> scalarProduct(dailyTemp, meanFunction(dailyTemp))
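The quadrature behind such integrals and the scalar products built on them can be sketched in base R. The helpers trapz, scalarProd2 and norm2 below are our own simplified stand-ins for functionality on a common one-dimensional grid, not the package's implementation:

```r
# Trapezoidal rule on a (possibly non-equidistant) grid
trapz <- function(argvals, y) {
  sum(diff(argvals) * (head(y, -1) + tail(y, -1)) / 2)
}

# Scalar product <f, g>_2 and the induced norm on L^2(T), evaluated on the grid
scalarProd2 <- function(argvals, f, g) trapz(argvals, f * g)
norm2 <- function(argvals, f) sqrt(scalarProd2(argvals, f, f))

t <- seq(0, pi, length.out = 1001)

# The integral of sin over [0, pi] is 2; the trapezoidal rule gets very close
abs(trapz(t, sin(t)) - 2) < 1e-5

# ||sin||_2 on [0, pi] is sqrt(pi / 2)
abs(norm2(t, sin(t)) - sqrt(pi / 2)) < 1e-5
```

Swapping trapz for a midpoint rule only changes the quadrature weights, which mirrors the choice between the "trapezoidal" and "midpoint" options described above.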

Simulation toolbox
The funData package comes with a full simulation toolbox for univariate and multivariate functional data, which is a very useful feature when implementing and testing new methodological developments. The data is simulated based on a truncated Karhunen-Loève representation of functional data, as for example in the simulation studies in Scheipl and Greven (2016) or Happ and Greven (2018). All examples in the following text use set.seed(1) before calling a simulation function for reasons of reproducibility. Example plots are given in the appendix. For univariate functions x i : T → R, the Karhunen-Loève representation of a function x i truncated at M ∈ N is given by

x i (t) = Σ M m=1 ξ i,m φ m (t), t ∈ T,

with orthonormal eigenfunctions φ 1 , . . . , φ M and independently sampled scores ξ i,m with variances λ m . For the eigenvalues λ m , one can choose between a linear (λ m = (M − m + 1)/M) or exponential (λ m = exp(−(m + 1)/2)) decrease, or the eigenvalues of the Wiener process. New eigenfunctions and eigenvalues can be added to the package in an easy and modular manner. The next code chunk simulates N = 8 curves on the one-dimensional observation grid {0, 0.01, 0.02, . . . , 1} based on the first M = 10 Fourier basis functions on [0, 1] and eigenvalues with a linear decrease:

R> simUniv1D <- simFunData(N = 8, argvals = seq(0, 1, 0.01),
+                          eFunType = "Fourier", eValType = "linear", M = 10)

The function returns a list with 3 entries: the simulated data (simData, a funData object shown in Figure 12 in the appendix), the true eigenvalues (trueVals) and the true eigenfunctions (trueFuns, also a funData object). For simulating functional data on a two- or higher dimensional domain, simFunData constructs eigenfunctions based on tensor products of univariate eigenfunctions. The user thus has to supply the parameters that relate to the eigenfunctions as lists, with one entry per marginal. The first simulated image is shown in Figure 12 in the appendix. As for functions on one-dimensional domains, the function returns the simulated data together with the true eigenvalues and eigenfunctions.
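The truncated Karhunen-Loève construction can also be reproduced by hand in base R. The following sketch uses M = 10 orthonormal Fourier-type basis functions on [0, 1], linearly decreasing eigenvalues and Gaussian scores; it is a hand-rolled illustration of the principle, not the package code:

```r
set.seed(1)
N <- 8; M <- 10
argvals <- seq(0, 1, 0.01)                # common grid with S = 101 points

# Orthonormal basis on [0, 1]: constant function plus sine/cosine pairs
Phi <- sapply(seq_len(M), function(m) {
  if (m == 1) rep(1, length(argvals))
  else if (m %% 2 == 0) sqrt(2) * sin(2 * pi * (m %/% 2) * argvals)
  else sqrt(2) * cos(2 * pi * (m %/% 2) * argvals)
})                                        # S x M matrix

lambda <- (M - seq_len(M) + 1) / M        # linear eigenvalue decrease

# Scores xi_{i,m} ~ N(0, lambda_m); column m of Xi has variance lambda_m
Xi <- matrix(rnorm(N * M, sd = rep(sqrt(lambda), each = N)), nrow = N)

# Simulated curves: x_i(t) = sum_m xi_{i,m} phi_m(t), stored row-wise
simX <- Xi %*% t(Phi)                     # N x S matrix of simulated curves

stopifnot(identical(dim(simX), c(8L, 101L)))
```

Changing lambda to exp(-(seq_len(M) + 1) / 2) would reproduce the exponential decrease mentioned above; the rest of the construction stays identical.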
Multivariate functional data can be simulated with the simMultiFunData function. One approach constructs multivariate eigenfunctions by splitting orthonormal univariate eigenfunctions into p parts (option type = "split"). The following code simulates N = 7 bivariate functions on T 1 = [−0.5, 0.5] and T 2 = [0, 1] based on M = 10 Fourier basis functions and linearly decreasing eigenvalues:

R> argvals <- list(seq(-0.5, 0.5, 0.01), seq(0, 1, 0.01))
R> simMultiSplit <- simMultiFunData(N = 7, argvals = argvals,
+                                   eFunType = "Fourier", eValType = "linear", M = 10,
+                                   type = "split")

As an alternative, multivariate eigenfunctions can be constructed as weighted versions of univariate eigenfunctions. With this approach, one can also simulate multivariate functional data on domains of different dimension, e.g., functions and images. It is implemented in funData's simMultiFunData method using the option type = "weighted". The following code simulates N = 5 bivariate functions on T 1 = [−0.5, 0.5] and T 2 = [0, 1] × [−1, 1]. The first elements of the eigenfunctions are derived from M 1 = 12 Fourier basis functions on T 1 and the second elements are constructed from tensor products of 4 eigenfunctions of the Wiener process on [0, 1] and 3 Legendre polynomials on [−1, 1], which together give M 2 = 12 eigenfunctions on T 2 . The scores are sampled using exponentially decreasing eigenvalues:

R> argvals <- list(seq(-0.5, 0.5, 0.01), list(seq(0, 1, 0.01), seq(-1, 1, 0.01)))
R> simMultiWeight <- simMultiFunData(N = 5, argvals = argvals,
+                                    eFunType = list("Fourier", c("Wiener", "Poly")),
+                                    eValType = "exponential", M = list(12, c(4, 3)),
+                                    type = "weighted")

In both cases, the result contains the simulated data as well as the true eigenfunctions and eigenvalues. The simulated functions are shown in Figures 13 and 14 in the appendix. For more technical details on the construction of the eigenfunctions, see Happ and Greven (2018). Once simulated, the data can be further processed by adding noise (function addErr) or by artificially deleting measurements (sparsification, function sparsify). The latter is done in analogy to Yao et al. (2005). Examples for modified versions of simulated functions are shown in the appendix (Figure 15 for the univariate case and Figure 16 for the multivariate case).

The MFPCA package
The MFPCA package implements multivariate functional principal component analysis (MFPCA) for data on potentially different dimensional domains (Happ and Greven, 2018). It builds heavily upon the funData package, i.e., all functions accept and return functional data objects. The MFPCA package thus illustrates the use of funData as a universal basis for implementing new methods for functional data. Section 4.1 gives a short review of the MFPCA methodology and Section 4.2 describes the implementation, including a detailed description of the main functions and a practical case study. For theoretical details, please refer to Happ and Greven (2018).

Methodological background
The basic idea of MFPCA is to extend functional principal component analysis to multivariate functional data on different dimensional domains. The data is assumed to consist of iid samples x 1 , . . . , x N of a random process X = (X (1) , . . . , X (p) ) as introduced in Section 2. Happ and Greven (2018) provide an algorithm to estimate multivariate functional principal components and scores based on their univariate counterparts. The algorithm starts with demeaned samples x 1 , . . . , x N and consists of four steps:

1. Calculate a univariate functional principal component analysis for each element j = 1, . . . , p. This results in principal component functions φ̂ (j) 1 , . . . , φ̂ (j) M j and associated scores ξ̂ (j) i,m for each observation i.
2. Combine all univariate scores into one matrix Ξ ∈ R N×(M 1 +...+M p ) , where the i-th row contains the scores of observation x i for all elements, and estimate the joint covariance matrix Z = (N − 1) −1 Ξ⊤ Ξ.
3. Calculate an eigendecomposition of Z, giving eigenvalues ν m and eigenvectors c m . The eigenvalues ν m are estimates of the multivariate eigenvalues.
4. Estimate the multivariate principal component functions as linear combinations of the univariate ones, using the entries of the eigenvectors c m as coefficients, and the multivariate scores as ρ̂ i,m = Ξ i c m , where Ξ i denotes the i-th row of Ξ.

The advantage of MFPCA with respect to a univariate FPCA for each element can be seen in steps 2 and 3: the multivariate version takes covariation between the different elements into account by using the joint covariance of the scores of all elements.
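Under simplifying assumptions (p = 2 elements on common one-dimensional grids, univariate FPCA via an eigendecomposition of the discretized covariance), the four steps can be sketched in base R. All names below are our own; this is an illustration of the estimation idea, not the MFPCA implementation:

```r
set.seed(2)
N <- 50; S <- 30; h <- 1 / (S - 1)           # grid step on [0, 1]
t <- seq(0, 1, length.out = S)

# Two toy elements with shared variation: both driven by a common score
score <- rnorm(N)
X1 <- outer(score, sin(pi * t)) + matrix(rnorm(N * S, sd = 0.1), N)
X2 <- outer(score, cos(pi * t)) + matrix(rnorm(N * S, sd = 0.1), N)

# Step 1: univariate FPCA per element (eigen of the discretized covariance)
uFPCA <- function(X, npc) {
  Xc <- sweep(X, 2, colMeans(X))             # demean
  e  <- eigen(crossprod(Xc) / (N - 1) * h, symmetric = TRUE)
  phi <- e$vectors[, 1:npc] / sqrt(h)        # approx. orthonormal in L^2
  list(phi = phi, scores = Xc %*% phi * h)   # scores via quadrature
}
u1 <- uFPCA(X1, 3); u2 <- uFPCA(X2, 3)

# Step 2: combine all univariate scores, estimate their joint covariance
Xi <- cbind(u1$scores, u2$scores)            # N x (M1 + M2) score matrix
Z  <- crossprod(Xi) / (N - 1)

# Step 3: eigendecomposition of Z (multivariate eigenvalues / eigenvectors)
ev <- eigen(Z, symmetric = TRUE)

# Step 4: multivariate eigenfunctions as linear combinations of univariate
# ones, element by element; multivariate scores as Xi %*% c_m
c1   <- ev$vectors[, 1]
psi1 <- list(u1$phi %*% c1[1:3], u2$phi %*% c1[4:6])
rho  <- Xi %*% ev$vectors

stopifnot(length(psi1) == 2, ncol(rho) == 6)
```

Because the common score drives both elements, the joint eigendecomposition in step 3 concentrates the shared variation in the first component, which a separate univariate FPCA per element could not capture.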
As discussed for the simulation in Section 3.2, the multivariate principal component functions have the same structure as the original samples, i.e., ψ̂ m = (ψ̂ m (1) , . . . , ψ̂ m (p) ). The univariate decompositions in step 1 are not restricted to FPCA: each element can alternatively be expanded in a fixed basis with K j basis functions, such as splines. In Happ and Greven (2018) it is shown how the algorithm can be extended to arbitrary basis functions in L 2 (T j ). Mixed approaches with some elements expanded in principal components and others for instance in splines are also possible. Another very likely case is that the elements of the multivariate functional data differ in their domain, range or variation. For this case, Happ and Greven (2018) develop a weighted version of MFPCA with weights w j > 0 for the different elements j = 1, . . . , p. The weights have to be chosen depending on the data and the question of interest. One possible choice is to use the inverse of the integrated pointwise variance as weights, as proposed in Happ and Greven (2018).

MFPCA implementation
The main function in the MFPCA package is MFPCA, which calculates the multivariate functional principal component analysis. It requires as input arguments a multiFunData object for which the MFPCA should be calculated, the number of principal components M to calculate and a list uniExpansions specifying the univariate representations to use in step 1. It returns an object of class MFPCAfit, which has methods for printing, plotting and summarizing. Before discussing the detailed options, we illustrate the usage of MFPCA with a real data application.

Case study: Calculating the MFPCA for the Canadian weather data
The following example calculates a multivariate functional principal component analysis for the bivariate Canadian weather data with three principal components, using univariate FPCA with five principal components for the daily temperature (element 1) and univariate FPCA with four principal components for the monthly precipitation (element 2). The univariate expansions are specified in a list with two entries (one for each element) and are then passed to the main function:

R> uniExpansions <- list(list(type = "uFPCA", npc = 5), # temperature element
+                        list(type = "uFPCA", npc = 4)) # precipitation element
R> MFPCAweather <- MFPCA(canadWeather, M = 3, uniExpansions = uniExpansions)

The full analysis takes roughly nine seconds on a standard laptop, with most of the time spent on the univariate decompositions (if the elements are e.g., expanded in penalized splines, the total calculation time reduces to one second).
The resulting object MFPCAweather contains the following elements: the multivariate mean function (meanFunction, as the data is demeaned automatically before the analysis), the empirical multivariate principal component functions (functions), the individual scores for each city (scores) and the estimated eigenvalues (values). Two additional elements can be used for calculating out-of-sample predictions (vectors and normFactors). The summary function gives a basic overview of the results.

R> summary(MFPCAweather)
3 multivariate functional principal components estimated with 2 elements, each.

The eigenvalues here are rapidly decreasing, i.e., the first principal component already explains almost 90% of the variability in the data. The decrease of the eigenvalues is graphically illustrated by the screeplot function (see Figure 17 in the appendix). All functions in MFPCAweather are represented as functional data objects and can thus be plotted using the methods provided by the funData package (see Figure 8). The mean function of the temperature element is seen to have low values below -10°C in the winter and a peak at around 15°C in the summer, while the mean of the monthly precipitation data is slightly increasing over the year. The first principal component is negative for both elements, i.e., weather stations with positive scores will in general have lower temperatures and less precipitation than the mean. The difference is more pronounced in the winter than in the summer, as both the temperature and the precipitation element of the first principal component have more negative values in the winter period. This indicates that there is covariation between the two elements, which can be captured by the MFPCA approach. An alternative visualization, plotting the principal component as a perturbation of the mean as in the fda package, can be obtained via plot(MFPCAweather) (see Figure 18 in the appendix). In total, the first bivariate eigenfunction can be associated with arctic and continental climate, characterized by low temperatures (especially in the winter) and less precipitation than on average. Weather stations with negative score values will show the opposite behavior, with higher temperatures and more rainfall than on average, particularly in the winter months. This is typical for maritime climate.
The estimated scores for the first principal component support this interpretation, as weather stations in arctic and continental areas mainly have positive scores, while stations in the coastal areas have negative values in most cases (see Figure 9). Moreover, weather stations in the arctic and pacific regions are seen to have more extreme score values than those in continental areas and on the Atlantic coast, meaning that the latter have a more moderate climate. An alternative visualization of the scores is given by the scoreplot function (see Figure 19 in the appendix).
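The named components of the MFPCAfit object can also be accessed directly for further processing. A minimal sketch, assuming the MFPCAweather object from the example above and the component names listed earlier:

```r
# Access components of the fitted MFPCAfit object (names as listed above).
MFPCAweather$values             # estimated eigenvalues
head(MFPCAweather$scores)       # score matrix (one row per weather station)
plot(MFPCAweather$functions)    # multivariate principal components (funData plot method)
plot(MFPCAweather$meanFunction) # estimated multivariate mean function
```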

Univariate basis expansions
In the above example, both univariate elements have been decomposed into univariate functional principal components in step 1. The MFPCA package implements some further options for the univariate expansions, which can easily be extended in a modular way. Currently implemented basis expansions are:

given: Given basis functions. This can for example be useful if a univariate FPCA has already been calculated for each element. For one element, the uniExpansions entry looks as follows:

R> list(type = "given", functions, scores, ortho)

Here functions is a funData object on the same domain as the data, containing the given basis functions. The parameters scores and ortho are optional. The first represents the coefficient matrix of the observed functions for the given basis functions in a row-wise manner, while ortho specifies whether the basis functions are orthonormal or not. If ortho is not supplied, the functions are treated as non-orthonormal.
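If a univariate FPCA has, for instance, been precomputed with the PACE function exported by the MFPCA package, its results can be reused via the "given" type. A hypothetical sketch; the use of the PACE return values functions and scores is an assumption about the precomputed object, not a fixed recipe:

```r
# Hypothetical sketch: reuse a precomputed univariate FPCA as a "given" basis.
uFPCAtemp <- PACE(canadWeather[[1]], npc = 5)  # univariate FPCA for element 1
uniExpansions <- list(
  list(type = "given", functions = uFPCAtemp$functions,
       scores = uFPCAtemp$scores, ortho = TRUE),  # element 1: precomputed basis
  list(type = "uFPCA", npc = 4)                   # element 2: FPCA on the fly
)
# MFPCAgiven <- MFPCA(canadWeather, M = 3, uniExpansions = uniExpansions)
```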
uFPCA: Univariate functional principal component analysis for data on one-dimensional domains. This option was used in the previous example. The list entry for one element has the form:

R> list(type = "uFPCA", nbasis, pve, npc, makePD, cov.weight.type)

The implementation is based on the PACE approach (Yao et al., 2005), with the mean function and the covariance surface smoothed with penalized splines (Di et al., 2009), following the implementation in the refund package. For this option, the MFPCA function returns the smoothed mean function, while for all other options, the mean function is calculated pointwise. Options for this expansion include the number of basis functions nbasis used for the smoothed mean and covariance functions (defaults to 10; for the covariance, this number of basis functions is used for each marginal); pve, a value between 0 and 1, giving the proportion of variance that should be explained by the principal components (defaults to 0.99); npc, an alternative way to specify the number of principal components explicitly (defaults to NULL, otherwise overrides pve); makePD, an option to enforce positive definiteness of the covariance surface estimate (defaults to FALSE); and cov.weight.type, which characterizes the weighting scheme for the covariance surface (defaults to "none").
splines1D and splines1Dpen: These options calculate a spline representation of functions on one-dimensional domains using the gam function in the mgcv package (Wood, 2011, 2017). When using this option, the uniExpansions entry for one element is of the form:

R> list(type = "splines1D", bs, m, k)
R> list(type = "splines1Dpen", bs, m, k, parallel)

For splines1Dpen, the coefficients are found by a penalization approach, while for splines1D the observations are simply projected on the spline space without penalization. Thus, the splines1Dpen option will in general lead to smoother representations than splines1D. Possible options for these expansions are bs, the type of basis functions to use (defaults to "ps" for possibly penalized B-spline functions); m, the order of the spline basis (defaults to NA, i.e., the order is chosen automatically); and k, the number of basis functions to use (the default value is -1, which means that the number of basis functions is chosen automatically). For the penalized version, there is an additional option parallel which, if set to TRUE, calculates the spline coefficients in parallel; in this case, a parallel backend must be registered beforehand (defaults to FALSE).
splines2D and splines2Dpen: These are the analogous options to splines1D and splines1Dpen for functional data on two-dimensional domains (images):

R> list(type = "splines2D", bs, m, k)
R> list(type = "splines2Dpen", bs, m, k, parallel)

The parameters bs, m and k for the type, order and number of basis functions can be either a single number/character string that is used for all marginals or a vector with the specifications for all marginals. For the penalized version, the function bam in mgcv is used to speed up the calculations and reduce the memory load. Setting parallel = TRUE enables parallel calculation of the basis function coefficients. As in the one-dimensional case, this requires a parallel backend to be registered beforehand.
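For image data, the marginal specifications can be set per dimension. A hypothetical sketch of a uniExpansions list for a bivariate object whose second element consists of images; the object name myImageData is illustrative and not part of the package:

```r
# Hypothetical sketch: spline expansions for curves (element 1) and images
# (element 2); k gives the number of marginal basis functions per dimension.
uniExpansions <- list(list(type = "splines1D", k = 10),       # curves
                      list(type = "splines2D", k = c(8, 8)))  # images
# res <- MFPCA(myImageData, M = 5, uniExpansions = uniExpansions)
```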
FCP_TPA: This option uses the functional CP-TPA algorithm of Allen (2013) to compute an eigendecomposition of image observations, which can be interpreted as functions on a two-dimensional domain. The algorithm assumes a CANDECOMP/PARAFAC (CP) representation of the data tensor X ∈ R^(N × Sx × Sy) containing all observations x_i with Sx × Sy pixels, each:

X = Σ_{m=1}^{M} d_m · u_m • v_m • w_m.

Here, d_m is a scalar, u_m ∈ R^N, v_m ∈ R^(Sx), w_m ∈ R^(Sy) are vectors and • denotes the outer product. We can thus interpret v_m • w_m as the m-th univariate eigenfunction evaluated at the same pixels as the originally observed data. The vector d_m · u_m ∈ R^N can in turn be interpreted as the score vector containing the scores for the m-th principal component function and each observation. The algorithm proposed in Allen (2013) includes smoothing parameters λ_u, λ_v, λ_w ≥ 0 to smooth along all dimensions, extending the approach of Huang et al. (2009) from one-dimensional to two-dimensional functions. As smoothing along the observations u_m ∈ R^N is not required in the given context, the parameter λ_u is fixed to zero and the smoothing is implemented only for the v and w directions. When decomposing images with this algorithm, the user has to supply a list of the following form for the corresponding element:

R> list(type = "FCP_TPA", npc, smoothingDegree, alphaRange,
+      orderValues, normalize)

Required options are npc, the number of eigenimages to be calculated, and alphaRange, the range of the smoothing parameters. The latter must be a list with two entries named v and w, giving the possible range for λ_v and λ_w as vectors with the minimal and maximal value, each (e.g., alphaRange = list(v = c(10^-2, 10^2), w = c(10^-3, 10^3)) would enforce λ_v ∈ [10^-2, 10^2] and λ_w ∈ [10^-3, 10^3]). Optimal values for λ_v and λ_w are found by numerically optimizing a GCV criterion (cf. Huang et al., 2009, in the one-dimensional case).
Further options are the smoothing degree, i.e., the type of differences that should be penalized in the smoothing step (smoothingDegree, defaults to second differences for both directions), and two logical parameters concerning the ordering and normalization of the principal components: If orderValues is TRUE (the default), the eigenvalues and the associated eigenimages and scores are ordered decreasingly, i.e., the first eigenimage corresponds to the highest eigenvalue that has been found, the second eigenimage to the second highest eigenvalue, and so on. The option normalize specifies whether the eigenimages should be normalized (defaults to FALSE).
UMPCA: This option implements the UMPCA algorithm (uncorrelated multilinear principal component analysis, Lu et al., 2009) for finding uncorrelated eigenimages of two-dimensional functions (images). Essentially, this ports the UMPCA toolbox for MATLAB (Lu, 2012) to R:

R> list(type = "UMPCA", npc)

The number of eigenimages to calculate has to be supplied by the user (npc). Note that this algorithm aims more at uncorrelated features than at an optimal reconstruction of the images and may thus lead to unsatisfactory results for the MFPCA approach.
DCT2D and DCT3D: These options calculate a representation of functional data on two- or three-dimensional domains in a tensor cosine basis. To speed up the calculations, the implementation is based on the fftw3 C library (Frigo and Johnson, 2005, developer version). If the fftw3-dev library is not available during the installation of the MFPCA package, the DCT2D and DCT3D options are disabled and throw an error. After installing fftw3-dev on the system, MFPCA has to be re-installed to activate DCT2D/DCT3D. The uniExpansions entry for a cosine representation of 2D/3D elements is:

R> list(type = "DCT2D", qThresh, parallel)
R> list(type = "DCT3D", qThresh, parallel)

The discrete cosine transform is a real-valued variant of the fast Fourier transform (FFT) and usually results in a huge number of non-zero coefficients that mostly model "noise" and can thus be set to zero without markedly affecting the representation of the data. The user has to supply a threshold between 0 and 1 (qThresh) that defines the proportion of coefficients to be thresholded. Setting, e.g., qThresh = 0.9 will set 90% of the coefficients to zero, leaving only the 10% of the coefficients with the highest absolute values. The coefficients are stored in a sparseMatrix object (package Matrix) to reduce the memory load for the following computations. The calculations can be run in parallel for the different observations by setting the parameter parallel to TRUE (defaults to FALSE), provided a parallel backend has been registered beforehand.
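Since the univariate expansions are specified per element, the different types listed above can be freely combined within one multivariate analysis, e.g., for data consisting of curves and images. A hypothetical sketch; the object myMultiData and the chosen parameter values are illustrative assumptions:

```r
# Hypothetical sketch: mixed expansion types for a bivariate functional object
# consisting of curves (element 1) and images (element 2).
uniExpansions <- list(list(type = "uFPCA", npc = 8),         # curves: FPCA
                      list(type = "DCT2D", qThresh = 0.95))  # images: cosine basis
# res <- MFPCA(myMultiData, M = 5, uniExpansions = uniExpansions)
```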

Further options for MFPCA
With the mean function, the principal components and the individual scores calculated in the MFPCA function, the observed functions x_1, . . . , x_N can be reconstructed based on the truncated Karhunen-Loève representation with plugged-in estimators as in Equation 5. The reconstructions can be obtained by setting the option fit = TRUE, which adds a multivariate functional data object fit with N observations to the result object, where the i-th entry corresponds to the reconstruction x̂_i of the observation x_i. For a weighted version of MFPCA, the weights can be supplied to the MFPCA function in the form of a vector weights of length p, containing the weights w_j > 0 for each element j = 1, . . . , p.
Both options can be combined, for example for the canadWeather data with weights based on the integrated pointwise variance, as discussed in Happ and Greven (2018).

If elements are expanded in fixed basis functions, the number of basis functions that are needed to represent the data well will in general be quite high, particularly for elements with higher dimensional domains. As a consequence, the covariance matrix of all scores in step 2 of the estimation algorithm can become large and the eigendecomposition in step 3 can become computationally very demanding. By setting the option approx.eigen = TRUE, the eigenproblem is solved approximately using the augmented implicitly restarted Lanczos bidiagonalization algorithm (IRLBA, Baglama and Reichel, 2005) implemented in the irlba package (Baglama et al., 2018). If the number M of principal components is low with respect to the number of observations N and the total number M_+ of univariate basis functions for all elements, the approximation will in general work very well. If, in contrast, M is close to N or to the total number of univariate basis functions, the approximation may be inappropriate. Following the recommendations in irlba, the approximation is not used if M is larger than min(N, M_+)/2 and a warning is thrown. On the other hand, if approx.eigen = FALSE and the total number of univariate basis functions exceeds 1000, MFPCA throws a warning, but continues calculating the exact eigendecomposition.
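A hedged sketch of such a weighted fit for the canadWeather data; the weight values shown are illustrative placeholders, not the values derived in Happ and Greven (2018):

```r
# Hypothetical sketch: weighted MFPCA with reconstructions (fit = TRUE).
# The weight values are illustrative; Happ and Greven (2018) propose the
# inverse of the integrated pointwise variance of each element.
uniExpansions <- list(list(type = "uFPCA", npc = 5),  # temperature
                      list(type = "uFPCA", npc = 4))  # precipitation
MFPCAweatherWeighted <- MFPCA(canadWeather, M = 3,
                              uniExpansions = uniExpansions,
                              weights = c(0.001, 1), fit = TRUE)
# MFPCAweatherWeighted$fit contains the reconstructed bivariate observations
```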

Bootstrap confidence bands
The MFPCA function implements a nonparametric bootstrap on the level of functions to quantify the uncertainty in the estimation of the MFPCA (cf. Happ and Greven, 2018). It calculates pointwise bootstrap confidence bands for the principal component functions and bootstrap confidence intervals for the associated eigenvalues. For elements with fixed basis functions that do not depend on the data (i.e., no principal component-related methods for the univariate decomposition), the algorithm efficiently reuses the results of step 1 by resampling from the scores on the level of curves. For all other elements, the principal components and associated scores are recalculated for each bootstrap sample. The confidence bands are calculated by setting the parameter bootstrap to TRUE. In this case, the user has to supply the number of bootstrap iterations (nBootstrap) and the significance levels for the bootstrap confidence intervals (bootstrapAlpha).
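A hedged sketch of such a bootstrap analysis for the canadWeather data; the number of iterations and the significance level are illustrative choices, not recommendations from the paper:

```r
# Hypothetical sketch: MFPCA with bootstrap uncertainty quantification.
MFPCAboot <- MFPCA(canadWeather, M = 3,
                   uniExpansions = list(list(type = "uFPCA", npc = 5),
                                        list(type = "uFPCA", npc = 4)),
                   bootstrap = TRUE, nBootstrap = 100, bootstrapAlpha = 0.05)
```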

Summary and outlook
The funData package implements functional data in an object-oriented manner. The aim of the package is to provide a flexible and unified toolbox for dense univariate and multivariate functional data with different dimensional domains as well as for irregular functional data. The package implements basic utilities for creating, accessing and modifying the data, upon which other packages can be built. This distinguishes the funData package from other packages for functional data, which either do not provide a specific data structure together with basic utilities or mix this aspect with the implementation of advanced methods for functional data.
The funData package implements three classes for representing functional data based on the observed values and without any further assumptions such as basis function representations. The classes follow a unified approach for representing and working with the data, which means that the same methods are implemented for all three classes (polymorphism). The package further includes a full simulation toolbox for univariate and multivariate functional data on one- and higher dimensional domains. This is a very useful feature when implementing and testing new methodological developments.

The MFPCA package is an example of an advanced methodological package that builds upon the funData functionalities. It implements a new approach, multivariate functional principal component analysis for data on different dimensional domains (Happ and Greven, 2018). All calculations relating to the functional data, including data input and output, use the basic funData classes and methods.

Both packages, funData and MFPCA, are publicly available on CRAN (https://CRAN.R-project.org) and GitHub (https://github.com/ClaraHapp). They come with comprehensive documentation, including many examples. Both use the testthat system for unit testing (Wickham, 2011) to make the software development safer and more stable, and reach a code coverage of over 95%.

Potential future extensions of the funData package include interfaces to other packages, e.g., fda, fda.usc or refund. This could make all methods for functional data implemented in these packages available for functional data objects of the funData package. At the same time, the results of these methods could be transformed back to functional data objects of the funData package in order to use the basic utility functions, such as plotting based on the ggplot2 graphics engine.
Further, one may think of extending the existing classes to represent an even broader group of functional data, e.g., by allowing the irregFunData class to have observation points in a higher dimensional space or by providing appropriate plotting methods for one-dimensional curves in 2D or 3D space. For the MFPCA, new basis functions, such as wavelets, could be implemented.

Figure 16: Transforming the simulated bivariate data. First row: A noisy version of the first observation of simMultiWeight (see Figure 14). Second row: All 7 observations of simMultiSplit after sparsification (see Figure 13). Solid lines show the original data, filled dots correspond to the observed values of the sparsified version. Note that the standard deviation of the noise as well as the degree of sparsification varies across elements.